# Twitter API Streaming

To stream tweets and save them to JSON files, I'll use tweepy, adapting [Brian Spiering's code](https://github.com/brianspiering/fun_with_twitter/blob/master/tweepy_writer/demo_tweepy_writer.ipynb).

In [73]:
reset -fs

In [1]:
my_consumer_key = '[]'
my_consumer_secret = '[]'
my_access_token = '[]'
my_access_token_secret = '[]'

In [None]:
import json
import os
import sys

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

auth = OAuthHandler(my_consumer_key, my_consumer_secret)
auth.set_access_token(my_access_token, my_access_token_secret)

class WriteToDiskListener(StreamListener):
    """Write stream listener to disk with limited number of Tweets.
    """

    def __init__(self, filename, limit=5):
        super().__init__()  # Let StreamListener set up its own state.
        self.counter = 0
        self.filename = filename
        self.limit = limit
        
    def on_data(self, data):
        """If under the limit, append the received data to disk."""
        if self.counter >= self.limit:
            return False  # Returning False tells tweepy to stop streaming.
        try:
            with open(self.filename.lower() + '.json', 'a') as f:
                f.write(data)
            self.counter += 1
        except Exception as e:
            # Report the problem but keep the stream alive.
            print("Error on_data: {}".format(e))
        return True
 
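    def on_error(self, status):
        "Print the HTTP status code; stop streaming if we are rate limited."
        print(status)
        if status == 420:
            return False  # Disconnect rather than retry while rate limited.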

if __name__ == '__main__':    
    track = sys.argv[1:] # Track is a list of search terms to stream.
    filename = "_".join([item.lower() for item in track])

    # # Remove existing file of tweets
    # try:
    #     os.remove(filename+'.json')
    # except OSError:
    #     pass

    listener = WriteToDiskListener(filename=filename, 
                                    limit=5)
    stream = Stream(auth, listener)

    try:
        stream.filter(track=track,
                      languages=['en'])
    except KeyboardInterrupt:
        # Disconnect cleanly on Ctrl-C; let other errors surface.
        stream.disconnect()

In [4]:
! python3 twitter_streaming.py train

In [5]:
! python3 twitter_streaming.py oakland

In [1]:
import json
from pprint import pprint

In [26]:
data = []

with open('oakland.json') as data_file:
    for line in data_file:
        tweet = json.loads(line)
        data.append(tweet)

In [27]:
# The listener's limit was 5, so the file should hold 5 tweets:
print(len(data))

5


In [28]:
# The fields available for each tweet
data[0].keys()

dict_keys(['id_str', 'truncated', 'timestamp_ms', 'contributors', 'is_quote_status', 'favorited', 'in_reply_to_status_id', 'place', 'retweeted_status', 'user', 'in_reply_to_status_id_str', 'in_reply_to_user_id_str', 'in_reply_to_user_id', 'extended_entities', 'in_reply_to_screen_name', 'lang', 'retweet_count', 'entities', 'retweeted', 'created_at', 'geo', 'coordinates', 'possibly_sensitive', 'id', 'source', 'text', 'filter_level', 'favorite_count'])

The fields that will be useful to us are the tweet's text (`text`), the time it was created (`created_at`), and possibly its location (`place`, `coordinates`), which is only populated if the user's settings allow it.
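As a rough sketch of pulling those fields out of one tweet, something like the following could work. The helper name `summarize_tweet` and the sample tweet are made up for illustration, and reading `full_name` from the `place` object is an assumption about the payload rather than something shown in the output above.

```python
from datetime import datetime

def summarize_tweet(tweet):
    """Return the text, creation time, and location (if any) of a tweet dict."""
    place = tweet.get('place') or {}   # 'place' is often None
    created = tweet.get('created_at')
    return {
        'text': tweet.get('text'),
        # Parse strings like 'Mon Jul 11 00:35:44 +0000 2016' into a datetime.
        'created_at': (datetime.strptime(created, '%a %b %d %H:%M:%S %z %Y')
                       if created else None),
        'place_name': place.get('full_name'),      # assumed field name
        'coordinates': tweet.get('coordinates'),
    }

# Hypothetical tweet shaped like the streamed data (text invented for illustration).
sample = {
    'text': 'Delays on the Richmond line',
    'created_at': 'Mon Jul 11 00:35:44 +0000 2016',
    'place': None,
    'coordinates': None,
}
summary = summarize_tweet(sample)
print(summary['created_at'].isoformat(), summary['place_name'])
```

Using `.get()` throughout means a tweet with no location simply yields `None` instead of raising a `KeyError`.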

In [29]:
for tweet in data:
    pprint(tweet['text'])
    pprint(tweet['created_at'])
    print()

('RT @SouthLoneStar: Oakland PD Front Door.\n'
 '#Blacklivesmatter are violent thugs. All PD should buy more robots to fight '
 'them! https://t.co/SD…')
'Mon Jul 11 00:35:44 +0000 2016'

'Bro you trippen Benihana fire  https://t.co/rjVNt5sYQW'
'Mon Jul 11 00:35:52 +0000 2016'

('RT @hautedamn: "Peaceful" protestors shutting down a highway in Oakland, '
 'California. https://t.co/sxBesgFCkE')
'Mon Jul 11 00:35:56 +0000 2016'

('250 W GOLDEN GATE, DETROIT, MI 48203 -Metro Detroit Real Estate, Oakland '
 'County Properties, Macomb County Properties…https://t.co/6cNrbSQhy3')
'Mon Jul 11 00:35:58 +0000 2016'

('RT @Mahi_The_Giant: So taco trucks are trash in the bay? 🤔🤔🤔 '
 'https://t.co/IIej5PKcPf')
'Mon Jul 11 00:35:59 +0000 2016'



Now let's try to hone our filter so that when we set it loose on our helpless Tweeters, we'll actually get helpful tweets.

In [30]:
! python3 twitter_streaming.py bart train

In [31]:
data = []

with open('bart_train.json') as data_file:
    for line in data_file:
        tweet = json.loads(line)
        data.append(tweet)

In [32]:
for tweet in data:
    pprint(tweet['text'])
    pprint(tweet['created_at'])
    print()

('RT @nzherald: Syrian refugees sang and danced in the aisles of a train on an '
 "'awesome' trip across the Otago countryside https://t.co/4c2X0…")
'Mon Jul 11 01:03:46 +0000 2016'

('@BarackObama @HillaryClinton @realDonaldTrump \n'
 'TRUMP TRAIN FIRED UP FULL STEAM AHEAD.\n'
 'GOOD BYE EXECUTIVE ORDERS!!!! https://t.co/nmF5QvRA9g')
'Mon Jul 11 01:03:47 +0000 2016'

'RT @GymGoers: Train together, stay together. ✌🏼️ https://t.co/i8dtHh2KGl'
'Mon Jul 11 01:03:47 +0000 2016'

('RT @kateefeldman: A guy just told his girlfriend to hold the train while he '
 'caught a Pokémon and she got on and left without him.')
'Mon Jul 11 01:03:48 +0000 2016'

'Yay awesome! https://t.co/mOY7on49XC'
'Mon Jul 11 01:03:50 +0000 2016'



The script passes 'bart' and 'train' as separate keywords, and the streaming filter matches a tweet containing either one, so naturally most of the tweets are about trains other than BART.
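To make the either/or behaviour concrete, here is a small local approximation of how track terms are matched: separate list items are ORed, while words inside a single item must all appear. The function `matches_track` is hypothetical and its tokenization is a simplification; the real matching happens on Twitter's servers.

```python
import re

def matches_track(text, track):
    """Approximate the track filter: any phrase may match (OR),
    but every word within a phrase must be present (AND)."""
    words = set(re.findall(r'\w+', text.lower()))
    return any(all(term in words for term in phrase.lower().split())
               for phrase in track)

# The first text comes from the output above; the second is invented.
tweets = ['Train together, stay together.',
          'BART train delayed at Oakland']

either = [matches_track(t, ['bart', 'train']) for t in tweets]
both = [matches_track(t, ['bart train']) for t in tweets]
print(either)
print(both)
```

Under this reading, passing the two words as a single quoted argument (one track phrase) would ask for tweets that mention both, which should cut out most of the unrelated train chatter.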

In [83]:
# After specifying the query in twitter_streaming.py
! python3 twitter_streaming.py

In [84]:
data = []

with open('bay area.json') as data_file:
    for line in data_file:
        tweet = json.loads(line)
        data.append(tweet)

In [85]:
for tweet in data:
    pprint(tweet['text'])
    pprint(tweet['created_at'])
    print()

("@jeaaabooty @_theyloveashhh they don't exist in Wackramento, getchoself a "
 'bay area nigga!!!! 12/10 highly recommended')
'Mon Jul 11 02:40:10 +0000 2016'

('RT @DarwinBondGraha: #Oakland protest has gained numbers. #BlackLivesMatter '
 'https://t.co/NCP4YKfGWG')
'Mon Jul 11 02:40:19 +0000 2016'

('Watching #MadBum crush this game, just like sonjachrista crushed this ice '
 'cream sandwich. #SF… https://t.co/Or2lF8woX5')
'Mon Jul 11 02:42:14 +0000 2016'

("I'm in San Francisco and I was super bummed I wasn't at the ball yard for "
 "Bumgardner's no no but... that's over now. ⚾️")
'Mon Jul 11 02:42:15 +0000 2016'

"Frisco's Ryan Cordell hit his 16th HR. @RidersBaseball @Rangers"
'Mon Jul 11 02:42:15 +0000 2016'

'I hate the San Francisco Giants'
'Mon Jul 11 02:42:21 +0000 2016'

('RT @DarenEpley: Devils win Pastime 17 U World Series championship 5-1 over '
 "South Oakland A's!!!  Fairfield with the win. Skibba the save.")
'Mon Jul 11 02:42:21 +0000 2016'



Collecting tweets about the Bay Area in general lets us filter later for tweets that might be relevant to BART passengers. Now let's open this up to consume more than 10 tweets! I'm going to go for 5,000...
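The later filtering step might look something like this minimal sketch. The keyword list and the `is_bart_relevant` helper are placeholders for illustration, not part of the original script, and the first and third sample tweets are invented.

```python
import re

# Hypothetical keyword list; tune it to whatever counts as BART-relevant.
BART_KEYWORDS = ['bart', 'sfbart', 'station']

def is_bart_relevant(tweet, keywords=BART_KEYWORDS):
    """True if any keyword appears as a whole word in the tweet text."""
    text = tweet.get('text', '').lower()
    return any(re.search(r'\b' + re.escape(k) + r'\b', text)
               for k in keywords)

sample_tweets = [
    {'text': 'BART is delayed at 12th St station'},
    {'text': 'I hate the San Francisco Giants'},
    {'text': 'Bartender recommendations in Oakland?'},
]
relevant = [t for t in sample_tweets if is_bart_relevant(t)]
print(len(relevant))
```

Matching on word boundaries keeps 'bartender' from counting as a BART mention, at the cost of missing hashtags or misspellings that a looser match would catch.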

__Checking what's in 'bay area.json':__

In [2]:
data = []

with open('/Users/brynstark/Stark gU/bay area.json') as data_file:
    for line in data_file:
        tweet = json.loads(line)
        data.append(tweet)

In [3]:
for tweet in data:
    pprint(tweet['text'])
    pprint(tweet['created_at'])
    print()

'bay area by my lonely. miss my boo 😞'
'Mon Jul 11 02:59:11 +0000 2016'

('New from bay area! jenni jiggles striptease aimed 2please chico escorts '
 '#Chico #escorts #adult #xxx https://t.co/RdYfGL9VhS')
'Mon Jul 11 02:59:14 +0000 2016'

('RT @prisonculture: Here are some good and useful suggestions. '
 'https://t.co/s3XMDxVDE6')
'Mon Jul 11 02:59:14 +0000 2016'

('#News: Darvish good after rehab start, could rejoin Rangers soon: FRISCO, '
 'Texas (AP) — Right-hander Yu Da... https://t.co/RbO1FbqYrX #TU')
'Mon Jul 11 02:59:15 +0000 2016'

('@Wink_Marvel @ESPNSteinLine pg:goran, sg: https://t.co/xps4oIbqSR, sf: '
 'winslow, pf: bosh/reed/ud, c:whiteside plus cap space dummy')
'Mon Jul 11 02:59:21 +0000 2016'

('RT @KianJcUpdates: Kian arriving at the venue in Richmond! '
 'https://t.co/iA1ZzwL9KR')
'Mon Jul 11 02:59:21 +0000 2016'

('RT @A_S_Alexander: So many ways to show up! 26 Ways to Be in the Struggle '
 'Beyond the Streets https://t.co/As2g4NMM3M #blacklivesmatter #mov…')
'Mon Ju

In [4]:
len(data)

479