<a href="https://colab.research.google.com/github/annamm77/twitter_workshop/blob/main/Twitter_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twitter Sentiment Analysis

---



In [None]:
# Run once to install or upgrade the libraries we'll use (tweepy should be newer than 4.4.0)
!pip install --upgrade tweepy textblob wordcloud
import tweepy
import getpass

import pandas as pd
import re

from textblob import TextBlob

from wordcloud import WordCloud
import matplotlib.pyplot as plt

## Setting Up Tweepy Client

To run Tweepy on your own, you'll need to do the following:
* [Apply](https://developer.twitter.com/en/apply-for-access) for a Twitter Developer Account
* Generate a bearer token and save it
  * A new token can be generated from the [Dashboard page](https://developer.twitter.com/en/portal/dashboard) by clicking the key icon next to your project's app

If you do not have access to a Twitter Developer Account, you can still follow along with a prefetched data set: set `use_prefetched_tweets = True` in the Process and Sanitize Tweets cell.

In [None]:
# Run this cell to set up the Tweepy Client; paste your bearer token at the prompt (input is hidden)
bearer_token = getpass.getpass("Bearer token: ")
client = tweepy.Client(bearer_token=bearer_token)
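If you'd rather not paste the token every time you rerun the notebook, one option is to read it from an environment variable and only fall back to the hidden prompt when it is missing. This is a minimal sketch: `get_bearer_token` is a hypothetical helper, and `TWITTER_BEARER_TOKEN` is a variable name chosen for this example, not something Tweepy reads on its own.

```python
import getpass
import os


def get_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the bearer token from an environment variable, prompting only if it is unset.

    Hypothetical helper: TWITTER_BEARER_TOKEN is an assumed variable name,
    and Tweepy does not look it up automatically.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to an interactive prompt that does not echo the token
    return getpass.getpass("Bearer token: ")
```

You would then create the client with `client = tweepy.Client(bearer_token=get_bearer_token())`. Keeping the token out of the notebook also means it won't end up in a shared copy or a Git commit.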

## Fetching Tweets
In this cell we'll fetch recent Tweets with the Tweepy Client's `search_recent_tweets` method, which searches Tweets from the last seven days.

### Make it your own!
* Try using a different query. [See the documentation on building queries here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).
* Try a different method on the client. Note that some methods are only available with paid access. [See the available methods in the Tweepy Client documentation here](https://docs.tweepy.org/en/stable/client.html).
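A query is a plain string that combines keywords with operators such as `-is:retweet` (exclude Retweets) and `lang:en` (match a language). When experimenting, a small helper that assembles the string can make it easier to toggle operators. `build_query` is a hypothetical function for this notebook, not part of Tweepy; it only joins text and does not validate the query against the API.

```python
def build_query(keyword, exclude_retweets=True, lang="en"):
    """Compose a search query string for search_recent_tweets.

    Hypothetical helper: it joins documented query operators as text and
    assumes the keyword needs no extra quoting or escaping.
    """
    parts = [keyword]
    if exclude_retweets:
        parts.append("-is:retweet")  # drop Retweets so the same text isn't counted repeatedly
    if lang:
        parts.append(f"lang:{lang}")  # keep only Tweets classified in this language
    return " ".join(parts)
```

For example, `build_query("python")` returns `"python -is:retweet lang:en"`, which you could pass as `client.search_recent_tweets(query=build_query("python"), max_results=10)`. For this endpoint `max_results` must be between 10 and 100.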


In [None]:
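# Run this cell to fetch Tweets, or skip it and use the prefetched Tweets instead

# Example query (replace with your own): English-language Tweets mentioning "python", excluding Retweets
query = "python -is:retweet lang:en"

# Fetch matching Tweets from the last seven days; max_results must be between 10 and 100
tweets = client.search_recent_tweets(query=query, max_results=10)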


## Process and Sanitize Tweets

Now that we have results from the client, we need to process and clean up the data before running our analysis.

In [None]:
# If the Tweepy client is not set up, set this to True to use the prefetched Tweets
use_prefetched_tweets = False

# Create a DataFrame with a column for the raw tweets
if not use_prefetched_tweets:
    # tweets.data is None when the search returns no results, so fall back to an empty list
    df = pd.DataFrame([tweet.text for tweet in (tweets.data or [])], columns=["raw_tweet"])
else:
    prefetched_tweets = [
        "Cool website: https://t.co/zAZYAnaUaa",
        "I #love #hashtags they are #everything to #me",
        "Go to my awesome Twitter timeline for more cerebral thoughts https://t.co/By2r1hZ5NT",
        "Twitter Am I doing this right? Or no?",
        "🤠🤠🤠🤠🤠 howdy 🤠🤠🤠🤠",
        "The best thing I ate today was tirokafteri and pita it was super super good!!!!!",
        "There's two plants on my desk that need serious medical and horticultural attention",
        "How many tweets is too many tweets for demo purposes?",
        "Live coding is really exciting",
        "If this didn't work I'd be super sad",
    ]
    df = pd.DataFrame(prefetched_tweets, columns=["raw_tweet"])

def sanitizeTweetText(text):
    text = re.sub(r"@[A-Za-z0-9_]+", "", text)  # Remove @ mentions (handles may contain underscores)
    text = text.replace("#", "")  # Remove the hashtag sign but keep the word
    text = re.sub(r"(?:https?://|www\.)\S+", "", text)  # Remove http(s) and www links
    text = re.sub(r"\s+", " ", text).strip()  # Collapse repeated whitespace and trim the ends
    return text

# Add a column for the sanitized tweets
df["sanitized_tweet"] = df["raw_tweet"].apply(sanitizeTweetText)

# Remove any blank sanitized tweets and renumber the remaining rows
df = df[df.sanitized_tweet != ""].reset_index(drop=True)

df

## Perform Sentiment Analysis
In this cell we'll perform sentiment analysis using TextBlob's `sentiment.polarity` property, which scores text from -1.0 (most negative) to 1.0 (most positive).

### Make it your own!
* Try using another method in TextBlob to generate a new column. [See TextBlob documentation here.](https://textblob.readthedocs.io/en/dev/index.html)
* Print out the total number of positive, neutral and negative Tweets 
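One way to turn raw polarity scores into positive, neutral and negative totals is to bucket them by sign. This sketch assumes a score of exactly 0.0 counts as neutral, which is our choice rather than a TextBlob rule; you may prefer a small band such as ±0.05 around zero. `label_polarity` and `count_labels` are hypothetical helper names.

```python
from collections import Counter


def label_polarity(score, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Scores whose absolute value is <= neutral_band are neutral
    (assumption: with the default band of 0.0, only an exact 0.0 is neutral).
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def count_labels(scores, neutral_band=0.0):
    """Count how many scores fall into each label, always reporting all three labels."""
    counts = Counter(label_polarity(s, neutral_band) for s in scores)
    return {label: counts.get(label, 0) for label in ("positive", "neutral", "negative")}
```

Assuming your polarity scores live in a column named `polarity` (the column name is an assumption), `df["sentiment"] = df["polarity"].apply(label_polarity)` followed by `df["sentiment"].value_counts()` gives the totals with pandas, or `count_labels(df["polarity"])` returns them as a plain dict. For another TextBlob-based column, `TextBlob(text).sentiment.subjectivity` scores text from 0.0 (very objective) to 1.0 (very subjective).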

In [None]:
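# Create a function to determine the polarity of a tweet, from -1.0 (negative) to 1.0 (positive)
def getPolarity(text):
    return TextBlob(text).sentiment.polarity

# Add a column for the raw polarity score
df["polarity"] = df["sanitized_tweet"].apply(getPolarity)

df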

## Extra Fun

What else could you do with this data? 
* Generate a word cloud of the Tweets using the WordCloud library
* Plot a graph showing the sentiment distribution 
* Get the percentage of positive Tweets from total Tweets
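The percentage of positive Tweets is the number of positive scores divided by the total number of scores, times 100. A minimal sketch, assuming "positive" means a polarity strictly above zero; `positive_percentage` is a hypothetical helper name.

```python
def positive_percentage(scores):
    """Return the percentage of scores strictly greater than 0.0.

    Assumption: a Tweet counts as positive when its polarity is above zero.
    Returns 0.0 for an empty input instead of dividing by zero.
    """
    scores = list(scores)
    if not scores:
        return 0.0
    positive = sum(1 for s in scores if s > 0)
    return 100.0 * positive / len(scores)
```

For example, `positive_percentage([0.5, 0.0, -0.1, 0.2])` is `50.0`. For the sentiment distribution, assuming the scores are in a `polarity` column, `plt.hist(df["polarity"], bins=10, range=(-1, 1))` followed by `plt.show()` draws a histogram over the full polarity range.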

In [None]:
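# Combine all sanitized tweets into one string for the word cloud
allWords = " ".join(df["sanitized_tweet"])

# Build a word cloud of up to 100 words; WordCloud drops common English stopwords by default
wordCloud = WordCloud(max_font_size=50, max_words=100, background_color="white").generate(allWords)

# Display the word cloud image without axes
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis("off")
plt.show()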
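`WordCloud.generate` tokenizes the text and removes stopwords for you. If you want more control, for example to exclude the search keyword itself (it tends to dominate the cloud), you can count words yourself and pass the counts to `WordCloud.generate_from_frequencies`. `word_frequencies` is a hypothetical helper, and its tokenization (lowercase runs of letters, digits and apostrophes) is a deliberate simplification.

```python
import re
from collections import Counter


def word_frequencies(texts, stopwords=frozenset(), min_length=2):
    """Count lowercase word occurrences across texts.

    Skips words shorter than min_length and any word in stopwords
    (assumption: stopwords are given in lowercase).
    """
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            word = word.strip("'")  # drop quote marks around a word, keep inner ones like "there's"
            if len(word) >= min_length and word not in stopwords:
                counts[word] += 1
    return dict(counts)
```

With the `wordcloud` package you could combine its built-in stopword set with your own, e.g. `from wordcloud import STOPWORDS`, then `freqs = word_frequencies(df["sanitized_tweet"], stopwords=STOPWORDS | {"python"})` and `WordCloud(background_color="white").generate_from_frequencies(freqs)`.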