# _Test Notebook (October 19, 2019)_

This notebook is a playground of sorts for building something that connects to the Twitter API, streams 1,000 tweets, and then disconnects. I haven't figured out yet how to get it to repeat, but if this trial goes well, I'll start looking into that.

Alexander Galea's [blog](https://galeascience.wordpress.com/2016/03/18/collecting-twitter-data-with-python/) was instrumental in getting this going. A lot of the work below is adapted directly from his GitHub.

In [1]:
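# import personal tools (private holds the API keys, connection string, table name, etc.)
from joetools import private
from textblob import TextBlob
# SQLAlchemy error class caught when a database insert fails (dataset is built on SQLAlchemy)
from sqlalchemy.exc import ProgrammingError
import dataset
import tweepy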

In [2]:
# setup tweepy to authenticate with Twitter with the following code
auth = tweepy.OAuthHandler(private.TWITTER_APP_KEY, private.TWITTER_APP_SECRET)
auth.set_access_token(private.TWITTER_KEY, private.TWITTER_SECRET)
# create an API object to pull data from Twitter, pass in the authentication from above
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
print('API Set-up!')

# connect to database
db = dataset.connect(private.CONNECTION_STRING)
print('Database connected; defining MyStreamListener')

API Set-up!
Database connected; defining MyStreamListener
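The `dataset` library hides the SQL here: `db[private.TABLE_NAME]` hands back a table object, and dataset creates tables and columns as needed when rows are inserted. For comparison, here is a minimal sketch of the same "store one dict per tweet" idea using plain `sqlite3` from the standard library instead of dataset, with an in-memory database. The table name `tweets`, the fixed column list, and the helper names `make_db` and `insert_tweet` are hypothetical, and unlike dataset this version does not add columns automatically.

```python
import sqlite3

# hypothetical fixed schema; dataset would infer columns from the dict keys instead
COLUMNS = ["id_str", "user_name", "text", "polarity", "subjectivity"]

def make_db():
    """Open an in-memory SQLite database with a hypothetical 'tweets' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "id_str TEXT, user_name TEXT, text TEXT, polarity REAL, subjectivity REAL)"
    )
    return conn

def insert_tweet(conn, row):
    """Insert one tweet given as a dict, using named placeholders rather than string formatting."""
    placeholders = ", ".join(":" + c for c in COLUMNS)
    conn.execute(
        f"INSERT INTO tweets ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        {c: row.get(c) for c in COLUMNS},  # missing keys are stored as NULL
    )
    conn.commit()

conn = make_db()
insert_tweet(conn, {"id_str": "1185747517926858758", "user_name": "rubyrush29",
                    "text": "Probably a good idea", "polarity": 0.7, "subjectivity": 0.6})
rows = conn.execute("SELECT id, id_str, polarity FROM tweets").fetchall()
print(rows)  # [(1, '1185747517926858758', 0.7)]
```

Storing `id_str` as `TEXT` keeps tweet IDs exact, which is the reason Twitter supplies a string version of the ID at all.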


In [3]:
# override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(MyStreamListener, self).__init__(api)
        self.num_tweets = 0

    def on_status(self, status):
        # we don't want retweets; status.retweeted only says whether the
        # authenticating account retweeted it, so the 'RT @' check does most of the work
        if (not status.retweeted) and ('RT @' not in status.text):

            description = status.user.description
            loc = status.user.location
            text = status.text
            name = status.user.screen_name
            user_created = status.user.created_at
            followers = status.user.followers_count
            id_str = status.id_str
            created = status.created_at
            retweets = status.retweet_count
            # score the tweet with TextBlob: polarity in [-1, 1], subjectivity in [0, 1]
            blob = TextBlob(text)
            sent = blob.sentiment

            # get the table (dataset creates it, and any new columns, as needed)
            table = db[private.TABLE_NAME]
            # count this tweet
            self.num_tweets += 1

            # the count is bumped before this check, so only 999 tweets are stored
            # (hence count = 999 in describe() below); use <= 1000 for a full 1,000
            if self.num_tweets < 1000:
                try:
                    table.insert(dict(
                        user_description=description,
                        user_location=loc,
                        text=text,
                        user_name=name,
                        user_created=user_created,
                        user_followers=followers,
                        id_str=id_str,
                        created_at=created,
                        retweet_count=retweets,
                        polarity=sent.polarity,
                        subjectivity=sent.subjectivity
                    ))
                    return True
                except ProgrammingError as err:
                    print(err)
            else:
                # returning False from on_status disconnects the stream
                return False

    def on_error(self, status_code):
        if status_code == 420:
            # 420 means the app is being rate limited;
            # returning False in on_error disconnects the stream
            return False
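The keep/skip/stop decisions inside `on_status` can be tried out without a live stream. Below is a minimal sketch, assuming statuses are objects with a `text` attribute (faked here with `types.SimpleNamespace`); the class `TweetCollector` and its `handle()` method are hypothetical stand-ins for the listener, not part of tweepy. It also checks for `retweeted_status`, which Twitter attaches to retweets, and stops after exactly `limit` kept tweets.

```python
from types import SimpleNamespace

class TweetCollector:
    """Hypothetical stand-in for MyStreamListener's on_status logic."""

    def __init__(self, limit):
        self.limit = limit
        self.kept = []

    @staticmethod
    def is_retweet(status):
        # retweets carry a retweeted_status attribute; the 'RT @' text check mirrors the notebook
        return hasattr(status, "retweeted_status") or "RT @" in status.text

    def handle(self, status):
        """Return True to keep streaming, False to disconnect (tweepy's convention)."""
        if self.is_retweet(status):
            return True  # skip it, but keep listening
        self.kept.append(status.text)
        # False once exactly `limit` tweets have been kept
        return len(self.kept) < self.limit

collector = TweetCollector(limit=3)
statuses = [
    SimpleNamespace(text="a"),
    SimpleNamespace(text="RT @x: b"),
    SimpleNamespace(text="quoted", retweeted_status=object()),
    SimpleNamespace(text="c"),
    SimpleNamespace(text="d"),
    SimpleNamespace(text="e"),
]
for status in statuses:
    if not collector.handle(status):
        break
print(collector.kept)  # ['a', 'c', 'd']
```

Appending before comparing against the limit is what makes the count come out exact; incrementing first and then testing `< limit` drops the final tweet.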

In [4]:
stream_listener = MyStreamListener(api=api)
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
# filter() blocks until the listener returns False (after the tweet cap or a 420 error)
stream.filter(track=private.TRACK_TERMS)
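The intro mentions not yet knowing how to make the collection repeat. One simple approach, sketched here under assumptions, is to wrap a single streaming session in a function and call it in a loop with a pause between rounds. `collect_batch` is a hypothetical zero-argument callable, and the `sleep` parameter exists only so the loop can be exercised without actually waiting.

```python
import time

def run_repeatedly(collect_batch, rounds, pause_seconds, sleep=time.sleep):
    """Call collect_batch() `rounds` times, pausing between runs.

    collect_batch is a hypothetical callable that runs one streaming
    session and returns how many tweets it stored.
    """
    totals = []
    for i in range(rounds):
        totals.append(collect_batch())
        if i < rounds - 1:  # no need to wait after the last round
            sleep(pause_seconds)
    return totals

# In the notebook, collect_batch might look like this (untested sketch):
# def collect_batch():
#     listener = MyStreamListener(api=api)
#     tweepy.Stream(auth=api.auth, listener=listener).filter(track=private.TRACK_TERMS)
#     return listener.num_tweets

# usage with a fake batch function and a fake sleep that just records the pauses
pauses = []
counts = iter([999, 1000, 1000])
totals = run_repeatedly(lambda: next(counts), rounds=3, pause_seconds=60, sleep=pauses.append)
print(totals)  # [999, 1000, 1000]
print(pauses)  # [60, 60]
```

Creating a fresh listener per round resets its tweet counter, so each round gets its own cap.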

In [5]:
from joetools import private
from datafreeze import freeze
import dataset

# reconnect to the database and dump every stored tweet to a CSV file
db = dataset.connect(private.CONNECTION_STRING)

result = db[private.TABLE_NAME].all()
freeze(result, format='csv', filename=private.CSV_NAME)
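`freeze` writes each row returned by `table.all()` as one CSV line. If you'd rather not depend on `datafreeze`, a comparable export can be sketched with the standard library's `csv.DictWriter`, assuming the rows are dict-like and share the same keys. The function name `rows_to_csv` is hypothetical; it writes to any file-like object so it can be checked against an in-memory buffer.

```python
import csv
import io

def rows_to_csv(rows, fileobj):
    """Write an iterable of dicts to CSV, taking the header from the first row's keys.

    Returns the number of data rows written.
    """
    rows = iter(rows)
    first = next(rows, None)
    if first is None:
        return 0  # nothing to write, not even a header
    writer = csv.DictWriter(fileobj, fieldnames=list(first.keys()))
    writer.writeheader()
    writer.writerow(first)
    count = 1
    for row in rows:
        writer.writerow(row)
        count += 1
    return count

# for a real file: with open("tweets.csv", "w", newline="") as f: rows_to_csv(table.all(), f)
buf = io.StringIO()
n = rows_to_csv(
    [
        {"id": 1, "user_location": "United States", "polarity": 0.0},
        {"id": 2, "user_location": "crystal river, fl", "polarity": 0.25},
    ],
    buf,
)
print(n)  # 2
print(buf.getvalue())
```

`DictWriter` quotes fields that contain commas (such as `crystal river, fl`), which matters for free-text columns like `user_description` and `text`.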

## _Check Out CSV_

In [6]:
import pandas as pd
pd.set_option('display.max_columns', None)

data = pd.read_csv('tweets.csv')

In [7]:
data.head()

| | id | user_description | user_location | text | user_name | user_created | user_followers | id_str | created_at | retweet_count | polarity | subjectivity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 63 year old voting for the first time...want t... | United States | @DanCrenshawTX @MeghanMcCain And what about al... | CrumDesi | 2018-10-31T01:52:12 | 8 | 1185747516408643584 | 2019-10-20T02:40:12 | 0 | 0.0 | 0.0 |
| 1 | 2 | Bass Player/ Guitar, Singer,Football Fan Pro a... | crystal river, fl | @gtconway3d He really is..Trump Needs A Check ... | aspence5 | 2009-05-12T17:41:54 | 395 | 1185747516979105792 | 2019-10-20T02:40:12 | 0 | 0.25 | 0.2 |
| 2 | 3 | Government & Politics | | SCOTT, TRUMP NEEDS BE TOSSED HIS NAKED ASS, NOW!! | mark_sohlden | 2018-03-24T21:45:51 | 17 | 1185747516979064832 | 2019-10-20T02:40:12 | 0 | 0.0 | 0.4 |
| 3 | 4 | | | Probably a good idea | rubyrush29 | 2018-02-03T15:16:31 | 27 | 1185747517926858758 | 2019-10-20T02:40:13 | 0 | 0.7 | 0.6 |
| 4 | 5 | Follow me through the Anthropocene | los angeles | @Fahrenthold @realDonaldTrump One cancellation... | audiblevideo | 2008-04-06T10:52:31 | 511 | 1185747517952032774 | 2019-10-20T02:40:13 | 0 | 0.6 | 1.0 |
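Tweet IDs like `1185747516408643584` are larger than a 64-bit float can represent exactly (floats are exact only up to 2^53), so it's safer to treat `id_str` as text rather than a number; `data.describe()` otherwise summarizes it as if it were a quantity. A minimal sketch, assuming you want IDs kept as strings and timestamps parsed as datetimes, passes `dtype` and `parse_dates` to `pd.read_csv`. The two-row CSV is copied from the first two rows above so the example runs without `tweets.csv`, and the variable is named `sample` so it doesn't overwrite the notebook's `data`.

```python
import io
import pandas as pd

csv_text = (
    "id,user_name,id_str,created_at,polarity\n"
    "1,CrumDesi,1185747516408643584,2019-10-20T02:40:12,0.0\n"
    "2,aspence5,1185747516979105792,2019-10-20T02:40:12,0.25\n"
)

# keep tweet IDs as text so they are never rounded or summarized as numbers,
# and parse the timestamp column into datetime values
sample = pd.read_csv(io.StringIO(csv_text), dtype={"id_str": str}, parse_dates=["created_at"])

print(sample["id_str"].tolist())  # ['1185747516408643584', '1185747516979105792']
print(sample.dtypes)
```

Against the real file the same idea would be `pd.read_csv('tweets.csv', dtype={'id_str': str}, parse_dates=['created_at', 'user_created'])`.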


In [11]:
for tweet in data['text'][:5]:
    print(tweet)
    print('')

@DanCrenshawTX @MeghanMcCain And what about all tha BS Trump puts out everyday...they put this out and boy oh boy D… https://t.co/gsYqi3P4yA

@gtconway3d He really is..Trump Needs A Check up From the Neck up! #POTUS

SCOTT, TRUMP NEEDS BE TOSSED HIS NAKED ASS, NOW!!

Probably a good idea

@Fahrenthold @realDonaldTrump One cancellation of filling his own pockets doesn’t remedy the fact Trump’s bilked mi… https://t.co/eHIFe11fT4
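The first and last tweets above end in `…` plus a `t.co` link because, for tweets longer than 140 characters, the streaming API's `text` field is truncated and the complete text is delivered under `extended_tweet['full_text']`. Here is a minimal sketch of a helper that prefers the full text when present, assuming tweepy exposes `extended_tweet` as an attribute holding a plain dict (faked here with `SimpleNamespace`). The name `full_text_of` and the sample strings are made up for illustration.

```python
from types import SimpleNamespace

def full_text_of(status):
    """Return the untruncated tweet text when the payload includes it, else status.text."""
    extended = getattr(status, "extended_tweet", None)
    if extended and "full_text" in extended:
        return extended["full_text"]
    return status.text

# made-up statuses: one short tweet, one truncated long tweet
short = SimpleNamespace(text="Probably a good idea")
long_ = SimpleNamespace(
    text="A long tweet that got cut off…",
    truncated=True,
    extended_tweet={"full_text": "A long tweet that got cut off by the 140-character limit"},
)
print(full_text_of(short))  # Probably a good idea
print(full_text_of(long_))  # A long tweet that got cut off by the 140-character limit
```

Inside `on_status`, `text = full_text_of(status)` would replace `text = status.text`, so TextBlob scores the whole tweet rather than its first 140 characters.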



In [8]:
data.describe()

| | id | user_followers | id_str | retweet_count | polarity | subjectivity |
|---|---|---|---|---|---|---|
| count | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 |
| mean | 500.0 | 32620.77 | 1.185748e+18 | 0.0 | 0.043585 | 0.32223 |
| std | 288.530761 | 771070.6 | 127154900000.0 | 0.0 | 0.303054 | 0.350766 |
| min | 1.0 | 0.0 | 1.185748e+18 | 0.0 | -1.0 | 0.0 |
| 25% | 250.5 | 68.5 | 1.185748e+18 | 0.0 | 0.0 | 0.0 |
| 50% | 500.0 | 419.0 | 1.185748e+18 | 0.0 | 0.0 | 0.2 |
| 75% | 749.5 | 1703.0 | 1.185748e+18 | 0.0 | 0.125 | 0.6 |
| max | 999.0 | 24102450.0 | 1.185748e+18 | 0.0 | 1.0 | 1.0 |
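TextBlob's `polarity` runs from -1 (negative) to 1 (positive) and `subjectivity` from 0 (objective) to 1 (subjective). Both the 25th percentile and the median polarity above are 0.0, so a large share of these tweets score as neutral. One possible next step is to bucket polarity into coarse labels; here is a minimal standard-library sketch, where the function `label_polarity` and its `neutral_band` threshold of 0.05 are arbitrary choices, not anything defined by TextBlob.

```python
from collections import Counter

def label_polarity(polarity, neutral_band=0.05):
    """Map a TextBlob-style polarity in [-1, 1] to a coarse label (the threshold is arbitrary)."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

# polarity values from the five rows shown by data.head() above
polarities = [0.0, 0.25, 0.0, 0.7, 0.6]
counts = Counter(label_polarity(p) for p in polarities)
print(counts)
```

On the full DataFrame the same function could be applied with `data['polarity'].apply(label_polarity).value_counts()`.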
