# Using Tweepy and NLTK to Analyze Tweets about Netflix #Punisher

In this Jupyter notebook, I stream Twitter data about The Punisher using Python's Tweepy library. I then flatten the tweets, load them into pandas, and analyze them using techniques including, but not limited to, NLTK.



A few of the modules used here (flatten_tweets, slistener) are checked in to my GitHub page.

## 1. Setup and Stream Tweets

First, import libraries and set up matplotlib to run inline.

In [1]:
%matplotlib inline

import json
import glob
import pandas as pd
import numpy as np
from tweepy import OAuthHandler, API, Stream
from slistener import SListener
from flatten_tweets import flatten_tweets, check_word_in_tweet
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer

Load credentials from a JSON file. Since these keys are personal, they are kept in a file that is not checked in.

In [2]:
def load_cred():
    with open('twitter_credentials.json') as cred_data:
        info = json.load(cred_data)
        consumer_key = info['CONSUMER_KEY']
        consumer_secret = info['CONSUMER_SECRET']
        access_key = info['ACCESS_KEY']
        access_secret = info['ACCESS_SECRET']
    
    return consumer_key, consumer_secret, access_key, access_secret

consumer_key, consumer_secret, access_token, access_token_secret = load_cred()
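
For reference, the credentials file read by load_cred would look roughly like the sketch below. The four key names come from the function above; the placeholder values, the helper name load_cred_from, and the temporary path are assumptions for illustration only.

```python
import json
import os
import tempfile

# Placeholder values only; real keys are issued per Twitter developer account.
example_credentials = {
    "CONSUMER_KEY": "your-consumer-key",
    "CONSUMER_SECRET": "your-consumer-secret",
    "ACCESS_KEY": "your-access-token",
    "ACCESS_SECRET": "your-access-token-secret",
}


def load_cred_from(path):
    """Read the four credential values from the JSON file at `path`."""
    with open(path) as cred_data:
        info = json.load(cred_data)
    return (info["CONSUMER_KEY"], info["CONSUMER_SECRET"],
            info["ACCESS_KEY"], info["ACCESS_SECRET"])


# Write the example to a temporary directory so a real credentials file is untouched.
with tempfile.TemporaryDirectory() as tmp_dir:
    cred_path = os.path.join(tmp_dir, "twitter_credentials.json")
    with open(cred_path, "w") as f:
        json.dump(example_credentials, f, indent=2)
    loaded = load_cred_from(cred_path)

print(len(loaded))
```

Keeping the file name in a .gitignore entry is one way to make sure it is never checked in by accident.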

Authenticate with the credentials and initialize the API object.

In [3]:
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = API(auth)

Set up the keywords to track (in this case, just #Punisher).

In [4]:
keywords_to_track = ['#Punisher']

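The SListener module is checked in to my GitHub page. Here, instantiate SListener and Stream, and begin collecting tweets. The Stream(auth, listener) call follows the Tweepy 3.x interface, and stream.filter blocks while tweets are being collected.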

In [None]:
listen = SListener(api)
stream = Stream(auth, listen)
stream.filter(track = keywords_to_track)

## 2. Data Intake and Processing

Load the JSON files. The tweets were collected for a few hours over a couple of days (1/20 - 1/21). Season 2 of The Punisher was released on 1/18/2019.

In [5]:
tweet_list = []

for file in glob.glob("streamer*.json"):
    with open(file, 'r') as tweet_data:
        tweets_json = filter(None, tweet_data.read().split("\n"))
        
    for tweet in tweets_json:
        tweet_obj = json.loads(tweet)
        tweet_list.append(tweet_obj)
        
print("{0} tweets being analyzed.".format(len(tweet_list)))

374 tweets being analyzed.
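
The loading cell above assumes that every non-empty line in a streamer file is one complete JSON object. If a file ends with a line that was cut off when the stream stopped, json.loads would raise an error. The sketch below is one possible way to tolerate that; the function name parse_json_lines and the sample string are made up for illustration and are not part of the original modules.

```python
import json


def parse_json_lines(raw_text):
    """Parse newline-delimited JSON, skipping blank and malformed lines."""
    parsed, skipped = [], 0
    for line in raw_text.split("\n"):
        line = line.strip()
        if not line:
            continue  # blank separator lines carry no tweet
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # e.g. a line truncated when collection stopped
    return parsed, skipped


# Made-up sample: two complete objects, one blank line, one truncated object.
sample = '{"id": 1, "text": "first"}\n\n{"id": 2, "text": "second"}\n{"id": 3, "te'
tweets_demo, skipped_demo = parse_json_lines(sample)
print(len(tweets_demo), skipped_demo)  # prints: 2 1
```

Counting the skipped lines rather than silently dropping them makes it easy to notice when a file is more damaged than expected.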


Flatten the tweets, load them into a pandas DataFrame, and print the text of the first 5 rows.

In [6]:
tweets = flatten_tweets(tweet_list)
ds_tweets = pd.DataFrame(tweets)
print('Text from first 5 tweets:')
print(ds_tweets['text'].head(5))

Text from first 5 tweets:
0    RT @Randomgamerma: My reaction to season 2 of ...
1    Someone give @benbarnes his goddamn Oscar omg ...
2                       #Punisher punish her real good
3    RT @venuspriestess: @benbarnes screams of terr...
4    The Punisher season 2 on @netflix is getting r...
Name: text, dtype: object
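
The flatten_tweets module itself is not shown in this notebook. As a rough idea of what flattening means here, the sketch below lifts a few nested fields of a tweet dictionary to the top level so that each becomes a DataFrame column. The function name, the output column names, and the sample tweet are assumptions; only the nested field names (user, extended_tweet, retweeted_status) follow the Twitter v1.1 tweet payload.

```python
def flatten_tweet_sketch(tweet):
    """Return a flat dict of selected text and user fields (illustrative only)."""
    flat = {
        "text": tweet.get("text", ""),
        "user-screen_name": tweet.get("user", {}).get("screen_name"),
    }
    # Long tweets carry their untruncated text under extended_tweet.full_text.
    if "extended_tweet" in tweet:
        flat["extended_tweet-full_text"] = tweet["extended_tweet"].get("full_text")
    # Retweets nest the original tweet under retweeted_status.
    if "retweeted_status" in tweet:
        original = tweet["retweeted_status"]
        flat["retweeted_status-text"] = original.get("text", "")
        flat["retweeted_status-user-screen_name"] = (
            original.get("user", {}).get("screen_name")
        )
    return flat


# Made-up tweet for demonstration; not taken from the collected data.
sample_tweet = {
    "text": "RT @someone: Season 2 of #Punisher is...",
    "user": {"screen_name": "viewer"},
    "retweeted_status": {
        "text": "Season 2 of #Punisher is out now",
        "user": {"screen_name": "someone"},
    },
}
flat_demo = flatten_tweet_sketch(sample_tweet)
print(sorted(flat_demo))
```

A list of such flat dictionaries can be passed straight to pd.DataFrame, which is the shape the cell above relies on.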


In [7]:
#punish = ds_tweets['text'].str.contains('#Punisher',case = False)
punish = check_word_in_tweet('#Punisher', ds_tweets)
print("Proportion of #Punisher tweets:", np.sum(punish) / ds_tweets.shape[0])
netflix = check_word_in_tweet('#Netflix', ds_tweets)
print("Proportion of #Netflix tweets:", np.sum(netflix) / ds_tweets.shape[0])

Proportion of #Punisher tweets: 1.0
Proportion of #Netflix tweets: 0.0855614973262032


A proportion of 1.0 shows that every tweet contains #Punisher somewhere, as expected, since that is the only keyword being tracked. About 8.6% of the tweets also contain #Netflix.
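
The proportion calculation can be illustrated without the DataFrame. The sketch below does a plain case-insensitive substring check, in the spirit of the commented-out str.contains(..., case = False) line above. The function name contains_word and the sample texts are made up, and the real check_word_in_tweet may look at more fields than the main text.

```python
def contains_word(word, texts):
    """Return one boolean per text: True if `word` occurs, ignoring case."""
    needle = word.lower()
    return [needle in text.lower() for text in texts]


# Made-up texts for demonstration; not taken from the collected tweets.
sample_texts = [
    "Watching #Punisher on #Netflix tonight",
    "#punisher season 2 is out",
    "New #PUNISHER episode",
    "#Punisher and #netflix again",
]

punisher_flags = contains_word("#Punisher", sample_texts)
netflix_flags = contains_word("#Netflix", sample_texts)

# Booleans sum as 0/1, so the mean of the flags is the proportion of matches.
punisher_share = sum(punisher_flags) / len(sample_texts)
netflix_share = sum(netflix_flags) / len(sample_texts)
print(punisher_share, netflix_share)  # prints: 1.0 0.5
```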

## 3. Sentiment Analysis

Instantiate a new SentimentIntensityAnalyzer and generate sentiment scores. Each score is a dictionary with neg, neu, pos, and compound entries.

In [8]:
sid = SentimentIntensityAnalyzer()
sentiment_scores = ds_tweets['text'].apply(sid.polarity_scores)
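
To make the shape of these scores concrete, the sketch below takes a few hand-written dictionaries in the same layout that polarity_scores returns and buckets them by their compound value. The numbers are invented for illustration and are not real VADER output; the function name label_compound, the "neutral" label, and the default threshold of 0.6 (matching the cutoff used in the next cell) are my own choices.

```python
def label_compound(compound, threshold=0.6):
    """Bucket a compound score (between -1 and 1) into a coarse label."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    return "neutral"


# Invented score dictionaries in the neg/neu/pos/compound layout.
example_scores = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.85},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.7},
    {"neg": 0.1, "neu": 0.8, "pos": 0.1, "compound": 0.0},
]

# Keep only the compound value, which summarizes the other three entries.
compounds = [score["compound"] for score in example_scores]
labels = [label_compound(c) for c in compounds]
print(labels)  # prints: ['positive', 'negative', 'neutral']
```

A threshold as strict as 0.6 keeps only strongly worded tweets in the positive and negative buckets, which makes them easier to inspect by eye.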

Let's take a look at some positive and negative tweets. <br>
**WARNING:** Potential spoilers in tweets.

In [9]:
# Keep only the compound score of each tweet (-1 is most negative, +1 is most positive)
ds_tweets['sentiment'] = sentiment_scores.apply(lambda score: score['compound'])

print("Print out the text of 5 positive tweets:")
print()
print(ds_tweets[ds_tweets['sentiment'] > 0.6]['text'].values[0:5])

print()
print("Print out the text of 5 negative tweets:")
print(ds_tweets[ds_tweets['sentiment'] < -0.6]['text'].values[0:5])

# Generate the average sentiment score for #Punisher
sentiment_punisher = ds_tweets['sentiment'].mean()
print("Average sentiment for #Punisher: {0}".format(sentiment_punisher))

Print out the text of 5 positive tweets:

['I LOVE marvel &amp; DC💕💕💕❤️ #InfinityWar #Marvel #BlackPanther #Thanos #IronMan #Hulk #CaptainAmerica #WinterSoldier… https://t.co/BolXvHdFRU'
 'I LOVE marvel &amp; DC💕💕💕❤️ #InfinityWar #Marvel #BlackPanther #Thanos #IronMan #Hulk #CaptainAmerica #WinterSoldier… https://t.co/HMrYwGQ8LJ'
 'I knew from the very start that Krista likes Billy 😂😂😂 oh Krista i wish u know what Billy did #Punisher'
 'Please help me 💀🖤 spread this \nand save the punisher ⬇️ https://t.co/pfNCPNU8Ce\n\n#thepunisher #punisher #marvel… https://t.co/AIiw0YeLbf'
 'Aww!!! Just getting to start the new season of #Punisher and I was so happy to see your name on it again @KitMoxie !!']

Print out the text of 5 negative tweets:
['RT @venuspriestess: @benbarnes screams of terror and pain during the last episode of Season 1 of #Punisher were fucking Oscar worthy and so…'
 'So much bloody, Punisher, bum.\n#Punisher #punisherseason2'
 'Watching season two of #Punisher and i gotta s

**Observation:** The "negative" tweets don't appear negative at all. Their low scores seem to come from words that describe the type of content in The Punisher rather than from the viewers' opinions, so the analysis needs adjusting.
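
One crude way to make that adjustment is to hide show-related words from the analyzer before scoring. The sketch below removes whole-word matches of a small term list from a tweet. The list is hypothetical: the words are picked from the "negative" tweets printed above, and it is only an assumption that they are what pulls the scores down. The function name mask_domain_terms is also made up.

```python
import re

# Hypothetical list of content-related words to hide from the sentiment scorer.
DOMAIN_TERMS = ["terror", "pain", "bloody"]


def mask_domain_terms(text, terms=DOMAIN_TERMS):
    """Remove whole-word, case-insensitive matches of `terms` from `text`."""
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    masked = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Collapse the gaps left behind by the removed words.
    return re.sub(r"\s+", " ", masked).strip()


masked_demo = mask_domain_terms("Screams of terror and pain in #Punisher")
print(masked_demo)  # prints: Screams of and in #Punisher
```

The masked text would then be passed to sid.polarity_scores in place of the raw text. This also discards genuine negativity expressed with the same words, so it is a starting point rather than a fix. Another option, assuming the analyzer's lexicon attribute can be edited as an ordinary dictionary, would be to lower the weight of such words there instead of deleting them.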