# Using Tweepy and NLTK to Analyze Tweets about Netflix #Punisher

In this Jupyter notebook, I intend to stream Twitter data about The Punisher using Python's Tweepy library. I will then flatten the tweets, load them into a Pandas DataFrame, and analyze them using techniques including, but not limited to, NLTK.



A few of the modules used (flatten_tweets, slistener) are checked in to my GitHub page.

## 1. Setup and Stream Tweets

First, import the libraries and set up matplotlib to run inline.

In [1]:
%matplotlib inline

import json
import glob
import pandas as pd
import numpy as np
from tweepy import OAuthHandler, API, Stream
from slistener import SListener
from flatten_tweets import flatten_tweets, check_word_in_tweet
import matplotlib.pyplot as plt

Load credentials from a JSON file. Since these keys are personal, they are kept in a file that is not checked in.

In [4]:
def load_cred():
    """Read the four Twitter API credentials from a local JSON file."""
    with open('twitter_credentials.json') as cred_data:
        info = json.load(cred_data)

    # Key names must match those stored in twitter_credentials.json
    return (info['CONSUMER_KEY'], info['CONSUMER_SECRET'],
            info['ACCESS_KEY'], info['ACCESS_SECRET'])

consumer_key, consumer_secret, access_token, access_token_secret = load_cred()
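
For reference, `load_cred` expects a JSON object containing the four keys it reads. The following is a minimal sketch of that layout, assuming placeholder values; the file name `twitter_credentials_example.json` and the temporary directory are hypothetical, chosen so that a real credentials file is never overwritten.

```python
import json
import os
import tempfile

# Placeholder values only; real keys must be supplied by the user.
example_credentials = {
    "CONSUMER_KEY": "your-consumer-key",
    "CONSUMER_SECRET": "your-consumer-secret",
    "ACCESS_KEY": "your-access-token",
    "ACCESS_SECRET": "your-access-token-secret",
}

# Hypothetical example file, written to the system temp directory
example_path = os.path.join(tempfile.gettempdir(),
                            "twitter_credentials_example.json")
with open(example_path, "w") as f:
    json.dump(example_credentials, f, indent=4)

# Read it back the same way load_cred reads the real file
with open(example_path) as cred_data:
    info = json.load(cred_data)

loaded_keys = sorted(info)
print(loaded_keys)
```

Keeping the real file out of version control (for example, by listing it in `.gitignore`) is what prevents the keys from being checked in.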

Authorize with the credentials and initialize the API object.

In [6]:
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = API(auth)

Set up the keywords to track (in this case, just #Punisher).

In [8]:
keywords_to_track = ['#Punisher']

The SListener module is checked in to my GitHub page. Here, instantiate SListener and Stream, and begin collecting tweets.

In [None]:
listen = SListener(api)
stream = Stream(auth, listen)
stream.filter(track=keywords_to_track)

## 2. Data Intake and Processing

Load the JSON files. These were collected for a few hours at a time over a couple of days.

In [2]:
tweet_list = []

# Each file holds one JSON-encoded tweet per line
for file in glob.glob("streamer* - Copy.json"):
    with open(file, 'r') as tweet_data:
        # Drop empty lines before parsing
        tweets_json = filter(None, tweet_data.read().split("\n"))

    for tweet in tweets_json:
        tweet_list.append(json.loads(tweet))
        
print("{0} tweets being analyzed.".format(len(tweet_list)))

156 tweets being analyzed.
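
The loading loop assumes every non-empty line is valid JSON. If a stream is interrupted mid-write, the last line of a file may be truncated, and `json.loads` would raise an error. The sketch below shows one way to tolerate that; `parse_json_lines` is a hypothetical helper, not part of the notebook or of any library, and the sample string is made up for illustration.

```python
import json

def parse_json_lines(raw_text):
    """Parse newline-delimited JSON.

    Returns a tuple (objects, bad), where bad counts the lines
    that could not be decoded.
    """
    objects, bad = [], 0
    for line in raw_text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines, as filter(None, ...) does
        try:
            objects.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1  # e.g. a line cut off when the stream stopped
    return objects, bad

# Made-up sample: two complete records, one blank line, one truncated record
sample = ('{"id": 1, "text": "first"}\n\n'
          '{"id": 2, "text": "second"}\n'
          '{"id": 3, "te')
parsed, skipped = parse_json_lines(sample)
print(len(parsed), "parsed,", skipped, "skipped")
```

Counting the skipped lines rather than silently dropping them makes it easy to notice if a file is badly damaged.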


Flatten the tweets, load them into a Pandas DataFrame, and print the first 5 rows of text.

In [3]:
tweets = flatten_tweets(tweet_list)
ds_tweets = pd.DataFrame(tweets)
print(ds_tweets['text'].head())

0    RT @Randomgamerma: My reaction to season 2 of ...
1    Someone give @benbarnes his goddamn Oscar omg ...
2                       #Punisher punish her real good
3    RT @venuspriestess: @benbarnes screams of terr...
4    The Punisher season 2 on @netflix is getting r...
Name: text, dtype: object
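
The `flatten_tweets` module itself is not shown in this notebook. As a rough illustration of what "flattening" can mean, the sketch below copies a few nested fields of a tweet to top-level keys so that each becomes its own DataFrame column. The function `flatten_tweet_sketch` and the hyphenated key names are assumptions made for this example and may differ from the actual module.

```python
def flatten_tweet_sketch(tweet):
    """Copy selected nested fields of one tweet dict to top-level keys.

    Hypothetical illustration only; the real flatten_tweets module
    may choose different fields and key names.
    """
    flat = dict(tweet)  # shallow copy so the input is not modified

    # Author's handle, if user information is present
    flat["user-screen_name"] = tweet.get("user", {}).get("screen_name")

    # Full text of long tweets
    if "extended_tweet" in tweet:
        flat["extended_tweet-full_text"] = tweet["extended_tweet"].get("full_text")

    # Text of the original tweet when this one is a retweet
    if "retweeted_status" in tweet:
        flat["retweeted_status-text"] = tweet["retweeted_status"].get("text")

    return flat

# Made-up sample tweet for illustration
sample_tweet = {
    "text": "RT @someone: The Punisher season 2 is...",
    "user": {"screen_name": "example_user", "followers_count": 10},
    "retweeted_status": {"text": "The Punisher season 2 is great #Punisher"},
}
flat = flatten_tweet_sketch(sample_tweet)
print(flat["user-screen_name"])
print(flat["retweeted_status-text"])
```

A list of dictionaries produced this way can be passed straight to `pd.DataFrame`, which is why flattening comes before the load step.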


In [4]:
# Alternative that checks only the text column:
# punish = ds_tweets['text'].str.contains('Punisher', case=False)
punish = check_word_in_tweet('#Punisher', ds_tweets)
print("Proportion of #Punisher tweets:", np.sum(punish) / ds_tweets.shape[0])

Proportion of #Punisher tweets: 1.0


A proportion of 1.0 shows that every tweet contains #Punisher somewhere, as expected, since the stream was filtered on that keyword.
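
The implementation of `check_word_in_tweet` is not shown here. One plausible way to build such a check, sketched under that caveat, is to search several text columns with Pandas' `str.contains` and combine the results. The function name `check_word_sketch`, the `columns` parameter, and the sample data are all hypothetical.

```python
import pandas as pd

def check_word_sketch(word, data, columns=("text",)):
    """Return a boolean Series: True where `word` appears in any listed column.

    Hypothetical illustration; not the actual check_word_in_tweet.
    """
    mask = pd.Series(False, index=data.index)
    for col in columns:
        if col in data.columns:
            # Case-insensitive literal match; missing values count as False
            found = data[col].str.contains(word, case=False, regex=False, na=False)
            mask = mask | found
    return mask

# Made-up sample data for illustration
sample = pd.DataFrame({
    "text": ["Loved #Punisher season 2",
             "RT @someone: watch this",
             "Nothing relevant here"],
    "retweeted_status-text": [None, "Watch this #punisher trailer", None],
})
mask = check_word_sketch("#Punisher", sample,
                         columns=("text", "retweeted_status-text"))
proportion = mask.sum() / len(sample)
print(proportion)
```

Checking more than the `text` column matters because a retweet's own text can be truncated, so the keyword may appear only in the original tweet's text.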