# Twitter Datafeed Sentiment Analysis Using API

# Methodology:
1. Generate API keys and access tokens on the Twitter developer website (free of charge as of 22-09-22)
2. Use the Tweepy library to connect to the Twitter server with the API keys
3. Get the tweets from your account's home timeline and save them in a dataframe
4. Preprocess (clean) the tweets
5. Initialize a text classification transformer model from the Hugging Face Hub
6. Use the model to classify the tweets as Negative, Neutral, or Positive

# Install the tweepy library (it is not available in the default kernel)

In [1]:
!pip install tweepy

Collecting tweepy
  Downloading tweepy-4.10.1-py3-none-any.whl (94 kB)
Installing collected packages: tweepy
Successfully installed tweepy-4.10.1

# Import the necessary libraries

In [2]:
import tweepy
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
import numpy as np
from kaggle_secrets import UserSecretsClient
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

# Twitter API keys (free of charge)
1. Go to the Twitter developer website: https://developer.twitter.com/en/apply-for-access
2. Log in with your Twitter account (a phone number and an email address are required)
3. Create a standalone app (the app is a container for your API keys)
4. Save your API keys
5. Generate access tokens and save them
6. Set up user authentication and save the settings
7. If you get stuck on the steps above, check this video (I learnt from it, thanks to @Mehran): https://youtu.be/Lu1nskBkPJU
8. My API keys are stored as secrets attached to this notebook, so you cannot access them (sorry about that)

In [3]:
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("twt_api_key")
api_key_secret = user_secrets.get_secret("twt_api_key_secret")
access_token = user_secrets.get_secret("twt_access_token")
access_token_secret = user_secrets.get_secret("twt_access_token_secret")
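
`UserSecretsClient` is only available inside Kaggle. When running the notebook elsewhere, one option is to read the same four values from environment variables. The sketch below is an assumption, not part of the original kernel: the environment variable names and the `load_credentials` helper are hypothetical.

```python
import os

# Assumed mapping: notebook variable name -> hypothetical environment variable name.
ENV_NAMES = {
    "api_key": "TWT_API_KEY",
    "api_key_secret": "TWT_API_KEY_SECRET",
    "access_token": "TWT_ACCESS_TOKEN",
    "access_token_secret": "TWT_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, failing early if any is missing."""
    env = os.environ if env is None else env
    # Collect every variable that is absent or empty so the error names them all.
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in ENV_NAMES.items()}
```

Calling `creds = load_credentials()` would then give `creds["api_key"]` and the other three values, which can be passed on in place of the Kaggle secrets.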

# Create an authentication handler: a connection between this Python kernel and the Twitter server

In [4]:
auth = tweepy.OAuth1UserHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Access the public tweets from your account's home timeline and save them in a dataframe

In [5]:
public_tweets = api.home_timeline()
columns = ['time','user','tweet']
data = []
for tweet in public_tweets:
    data.append([tweet.created_at, tweet.user.screen_name, tweet.text])
df = pd.DataFrame(data, columns=columns)
df

index,time,user,tweet
0,2022-09-22 08:44:18+00:00,machinelearnflx,Robo-Advisers and the Future of Financial Advice - https://t.co/L7GRgLfqEV https://t.co/LinWS0aOB6 #Datascience
1,2022-09-22 08:24:17+00:00,machinelearnflx,IBM Applied AI course https://t.co/PkO4c7gtLZ #machinelearning #datascience #datascientist #datascientist… https://t.co/lEG1nBxmHB
2,2022-09-22 07:54:22+00:00,abhi1thakur,Join MLSpace Discord: https://t.co/61S68cYnjs
3,2022-09-22 07:12:34+00:00,machinelearnflx,Council Post: How Artificial Intelligence Can Improve Organizational Decision Making https://t.co/XFdA0UbxpS #ArtificialIntelligence
4,2022-09-22 06:38:05+00:00,machinelearnflx,eugeneyan/applied-ml: 📚 Papers &amp; tech blogs by companies sharing their work on data science &amp; machine learning in p… https://t.co/RPHQWOZkIs
5,2022-09-22 06:07:44+00:00,AdiPolak,RT @confluentinc: Join @AdiPolak at #Current22 as she discusses how to combine the worlds of Chaos Engineering + managing data stages in la…
6,2022-09-22 06:00:04+00:00,svpino,"Today's machine learning question:\n\n""Bagging or Boosting?.""\n\nYou can give it a try here ↓ https://t.co/XLLzdwsoka"
7,2022-09-22 05:31:25+00:00,machinelearnflx,Council Post: Five Ways Artificial Intelligence Is Reshaping Enterprise Sales Operations https://t.co/xO2qS5R5mZ #ArtificialIntelligence
8,2022-09-22 05:00:14+00:00,machinelearnflx,https://t.co/ctyUBkKsh7 TensorFlow Developer course https://t.co/NF9i5j1yqO #machinelearning #datascience… https://t.co/fPODEsLk0U
9,2022-09-22 04:02:14+00:00,machinelearnflx,Fintech ESG 7 Big Trends https://t.co/icMtI13UlY #Fintech
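
The loop above can also be wrapped in a small helper so the number of tweets is adjustable. This is a minimal sketch: `timeline_rows` is a hypothetical name, and it assumes the `api` object exposes `home_timeline` with a `count` keyword, as Tweepy's `API.home_timeline` does.

```python
def timeline_rows(api, count=20):
    """Return [created_at, screen_name, text] rows from the home timeline.

    `api` is assumed to behave like the tweepy.API object created earlier.
    """
    return [
        [tweet.created_at, tweet.user.screen_name, tweet.text]
        for tweet in api.home_timeline(count=count)
    ]
```

The rows can then go straight into `pd.DataFrame(timeline_rows(api, count=10), columns=columns)`.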


# Function to clean the raw tweets before feeding them into the model

In [6]:
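def tweet_preprocess(tweet):
    """Replace user mentions with '@user' and links with 'http'."""
    tweet_words = []
    for word in tweet.split(' '):
        # Mask mentions such as @someone (a lone '@' is left untouched)
        if word.startswith('@') and len(word) > 1:
            word = '@user'
        # Mask URLs
        elif word.startswith('http'):
            word = 'http'
        tweet_words.append(word)
    return ' '.join(tweet_words)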

In [7]:
df['clean_tweet'] = df.tweet.map(tweet_preprocess)
df.head(1)

index,time,user,tweet,clean_tweet
0,2022-09-22 08:44:18+00:00,machinelearnflx,Robo-Advisers and the Future of Financial Advice - https://t.co/L7GRgLfqEV https://t.co/LinWS0aOB6 #Datascience,Robo-Advisers and the Future of Financial Advice - http http #Datascience
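
Some tweets in the timeline contain HTML entities such as `&amp;`, which the cleaning function passes through unchanged. The sketch below is an optional variant, not the original function: it assumes decoding entities with the standard library's `html.unescape` before masking is acceptable for the model, and `tweet_preprocess_unescaped` is a hypothetical name.

```python
import html


def tweet_preprocess_unescaped(tweet):
    """Same masking as tweet_preprocess, but HTML entities are decoded first."""
    tweet_words = []
    for word in html.unescape(tweet).split(' '):
        if word.startswith('@') and len(word) > 1:
            word = '@user'
        elif word.startswith('http'):
            word = 'http'
        tweet_words.append(word)
    return ' '.join(tweet_words)


sample = "RT @confluentinc: Papers &amp; tech blogs https://t.co/RPHQWOZkIs"
print(tweet_preprocess_unescaped(sample))
# prints: RT @user Papers & tech blogs http
```

Note that the whole word is replaced, so punctuation attached to a mention (the colon in `@confluentinc:`) disappears along with the handle.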


# Download the transformer model checkpoint from the Hugging Face Hub (there are many checkpoints; select one)

In [8]:
# The transformers pipeline is a very nice shortcut for the code below, and I love it; however, I'm using the tokenizer and model directly for the sake of learning
checkpoint = 'cardiffnlp/twitter-roberta-base-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
labels = ['Negative', 'Neutral', 'Positive'] # Label order as declared in the model card on the website


# Function to convert the cleaned tweets into a sentiment label and the probability of that sentiment

In [9]:
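def tweet_analyze(tweet):
    # Tokenize the tweet into PyTorch tensors
    encoded_tweet = tokenizer(tweet, return_tensors='pt')
    output = model(**encoded_tweet)
    # output[0] holds the logits; take the first (and only) row of the batch
    scores = output[0][0].detach().numpy()
    # Convert the logits into probabilities
    scores = softmax(scores)
    # Pick the most probable class and return it with its probability
    label_id = np.argmax(scores)
    label = labels[label_id]
    return label, scores[label_id]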
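
To make the softmax and argmax steps concrete, here is a small worked example in plain Python. The logits are made-up values, not real model output, and `softmax_scores` is a hypothetical stand-in for `scipy.special.softmax`.

```python
import math

labels = ['Negative', 'Neutral', 'Positive']


def softmax_scores(logits):
    """Turn raw logits into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable without changing the result.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.0, 2.0, 0.5]  # made-up values for illustration
probs = softmax_scores(logits)
# Index of the largest probability, equivalent to np.argmax for a 1-D list
label_id = max(range(len(probs)), key=probs.__getitem__)
print(labels[label_id], round(probs[label_id], 3))
```

The largest logit always maps to the largest probability, so the predicted label is the same before and after softmax; softmax only adds an interpretable score.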

In [10]:
df.insert(2, 'sentiment', np.nan)

In [11]:
df['sentiment'] = df.clean_tweet.map(tweet_analyze)
df.head(50)

index,time,user,sentiment,tweet,clean_tweet
0,2022-09-22 08:44:18+00:00,machinelearnflx,"(Neutral, 0.84693974)",Robo-Advisers and the Future of Financial Advice - https://t.co/L7GRgLfqEV https://t.co/LinWS0aOB6 #Datascience,Robo-Advisers and the Future of Financial Advice - http http #Datascience
1,2022-09-22 08:24:17+00:00,machinelearnflx,"(Neutral, 0.8494098)",IBM Applied AI course https://t.co/PkO4c7gtLZ #machinelearning #datascience #datascientist #datascientist… https://t.co/lEG1nBxmHB,IBM Applied AI course http #machinelearning #datascience #datascientist #datascientist… http
2,2022-09-22 07:54:22+00:00,abhi1thakur,"(Neutral, 0.8410758)",Join MLSpace Discord: https://t.co/61S68cYnjs,Join MLSpace Discord: http
3,2022-09-22 07:12:34+00:00,machinelearnflx,"(Neutral, 0.54549533)",Council Post: How Artificial Intelligence Can Improve Organizational Decision Making https://t.co/XFdA0UbxpS #ArtificialIntelligence,Council Post: How Artificial Intelligence Can Improve Organizational Decision Making http #ArtificialIntelligence
4,2022-09-22 06:38:05+00:00,machinelearnflx,"(Neutral, 0.83235806)",eugeneyan/applied-ml: 📚 Papers &amp; tech blogs by companies sharing their work on data science &amp; machine learning in p… https://t.co/RPHQWOZkIs,eugeneyan/applied-ml: 📚 Papers &amp; tech blogs by companies sharing their work on data science &amp; machine learning in p… http
5,2022-09-22 06:07:44+00:00,AdiPolak,"(Neutral, 0.7882411)",RT @confluentinc: Join @AdiPolak at #Current22 as she discusses how to combine the worlds of Chaos Engineering + managing data stages in la…,RT @user Join @user at #Current22 as she discusses how to combine the worlds of Chaos Engineering + managing data stages in la…
6,2022-09-22 06:00:04+00:00,svpino,"(Neutral, 0.78525186)","Today's machine learning question:\n\n""Bagging or Boosting?.""\n\nYou can give it a try here ↓ https://t.co/XLLzdwsoka","Today's machine learning question:\n\n""Bagging or Boosting?.""\n\nYou can give it a try here ↓ http"
7,2022-09-22 05:31:25+00:00,machinelearnflx,"(Neutral, 0.67874235)",Council Post: Five Ways Artificial Intelligence Is Reshaping Enterprise Sales Operations https://t.co/xO2qS5R5mZ #ArtificialIntelligence,Council Post: Five Ways Artificial Intelligence Is Reshaping Enterprise Sales Operations http #ArtificialIntelligence
8,2022-09-22 05:00:14+00:00,machinelearnflx,"(Neutral, 0.86652267)",https://t.co/ctyUBkKsh7 TensorFlow Developer course https://t.co/NF9i5j1yqO #machinelearning #datascience… https://t.co/fPODEsLk0U,http TensorFlow Developer course http #machinelearning #datascience… http
9,2022-09-22 04:02:14+00:00,machinelearnflx,"(Neutral, 0.7858145)",Fintech ESG 7 Big Trends https://t.co/icMtI13UlY #Fintech,Fintech ESG 7 Big Trends http #Fintech
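
The `sentiment` column holds `(label, score)` tuples, which are awkward to filter or aggregate. A minimal sketch of splitting such pairs and summarizing them follows; the pairs below are made-up values in the same shape that `tweet_analyze` returns.

```python
from collections import Counter

# Made-up (label, score) pairs, shaped like the output of tweet_analyze
results = [('Neutral', 0.85), ('Positive', 0.61), ('Neutral', 0.79)]

# Split the pairs into two parallel sequences
sentiments, scores = zip(*results)

# Count how often each label occurs and average the confidence
counts = Counter(sentiments)
mean_score = sum(scores) / len(scores)

print(counts.most_common())
print(round(mean_score, 2))
```

On the dataframe itself, one option would be `df['score'] = df['sentiment'].map(lambda pair: pair[1])` followed by `df['sentiment'] = df['sentiment'].map(lambda pair: pair[0])`, which leaves one column per value.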


# I loved making this kernel and learnt how to use the Twitter API today! I'm just loving this entire ML journey; every day is kind of fun.