# "Live" Tweets from Twitter's Streaming API with Tweepy

Preliminary steps before using Twitter's API:
1. Sign up for a Twitter account
2. Register a Twitter developer account (requires an email address or phone number)
3. Create a developer app (I went with the name BlockedRoads)
4. Obtain your consumer key, consumer secret, access token, and access token secret from the developer dashboard

## Part 1: Pull Live Tweets from Twitter using Tweepy and Streaming API - StreamListener

Some resources:
http://socialmedia-class.org/twittertutorial.html
<br>https://www.dataquest.io/blog/streaming-data-python/


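**Step 1: Install Tweepy and load imports**

In a terminal run `pip install tweepy` (or `! pip install tweepy` from a notebook cell). The code in this notebook uses the Tweepy 3.x interface (`tweepy.StreamListener`, `api.me()`), which was removed in Tweepy 4.0, so install a 3.x release (for example `pip install "tweepy<4"`) to run it unchanged.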

In [6]:
import datetime
import json
import shutil

import pandas as pd
import tweepy

**Step 2: Authenticate Twitter Credentials through Tweepy**

In [7]:
import config

In [8]:
# authenticate account with tweepy
auth = tweepy.OAuthHandler(config.consumer_key, config.consumer_secret)
auth.set_access_token(config.access_key, config.access_secret)

# create API object to pull data from Twitter, passing in the auth handler
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# now we should be free to make twitter api calls!
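
The `config` module imported above is not shown in this notebook. A minimal sketch of what it might contain, assuming it only needs to expose the four names used in the authentication cell (`consumer_key`, `consumer_secret`, `access_key`, `access_secret`). The environment variable names and the `missing_credentials` helper are hypothetical:

```python
# config.py (hypothetical layout; only the four attribute names come from the notebook)
import os

# The environment variable names below are assumptions chosen for this sketch.
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "")
access_key = os.environ.get("TWITTER_ACCESS_TOKEN", "")
access_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")


def missing_credentials():
    """Return the names of any credentials that are still empty."""
    values = {
        "consumer_key": consumer_key,
        "consumer_secret": consumer_secret,
        "access_key": access_key,
        "access_secret": access_secret,
    }
    return [name for name, value in values.items() if not value]


if missing_credentials():
    print("Missing credentials:", ", ".join(missing_credentials()))
```

Reading the keys from the environment keeps them out of the notebook and out of version control.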

**Step 3: Verify API is working with your account**

In [9]:
# check that authentication worked - the output is your Twitter display name
user = api.me()
print(user.name)

amy taylor


In [5]:
# To send a Tweet from your account (assuming you allowed your app to have read/write permissions):
# api.update_status(status = "Hey, I'm tweeting with Tweepy!")

**Step 4: Subclass `StreamListener` and start pulling tweets**
- Specify how each tweet is handled inside the `on_status` or `on_data` method
- `stream.filter` starts the download. Specify your search terms with the `track` parameter
- Other parameters available include: .....
- **Use EX 2 for now**. To stop the live stream, interrupt the kernel while the code block is running
- Continue to Part 2 to save the tweets into a JSON file
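
For the `track` parameter, Twitter's documentation describes each list entry as a phrase: a tweet matches when every space-separated term of at least one phrase appears in it, in any order and ignoring case. The sketch below approximates that rule locally so search terms can be sanity-checked before streaming. `matches_track` is a hypothetical helper, not part of Tweepy, and its tokenization is simpler than Twitter's (which also checks URLs, hashtags, and mentions).

```python
import re


def matches_track(text, phrases):
    """Approximate the track rule: any phrase matches if all of its terms appear."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(
        all(term in words for term in phrase.lower().split())
        for phrase in phrases
    )


track = ["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"]

# Terms may appear in any order and any letter case.
print(matches_track("The road into town is CLOSED until noon", track))
# No phrase has all of its terms present here.
print(matches_track("Traffic is flowing normally today", track))
```

Because terms can match in any order and anywhere in the tweet, a phrase like "road closed" also catches tweets that have nothing to do with road closures, which is worth keeping in mind when examining the results.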

EX 1. StreamListener: prints the text of each tweet directly in the cell output
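
No code is included for EX 1. A minimal sketch of the parsing step it would need, assuming `on_data` receives each stream message as a raw JSON string and that non-tweet messages (such as limit notices) carry no `text` field. `tweet_text` is a hypothetical helper and the sample messages are made up:

```python
import json


def tweet_text(raw_data):
    """Return the text of one raw stream message, or None if it is not a tweet."""
    message = json.loads(raw_data)
    return message.get("text")


# Inside a listener, this could be called from on_data, for example:
#     def on_data(self, raw_data):
#         text = tweet_text(raw_data)
#         if text is not None:
#             print(text)

sample_tweet = '{"id": 1, "text": "A41 road closed northbound"}'
sample_notice = '{"limit": {"track": 5}}'

print(tweet_text(sample_tweet))
print(tweet_text(sample_notice))
```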

EX 2. StreamListener VERIFIED: saves tweets in a json file

In [10]:
# verified code block, DON'T DELETE!!!!!!!!!!!!!!!!!!!!!!!

class StreamListener(tweepy.StreamListener):
    def on_data(self, status):
        # `status` is the raw JSON string for one message; append it to the log file
        try:
            with open('../data/AT_stream/scrape.json', 'a') as f:
                f.write(status)
        except OSError:
            print("Could not open file log")

    def on_error(self, status_code):
        # 420 means the app is being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False


stream_listener = StreamListener()
stream = tweepy.Stream(auth=auth, listener=stream_listener)
# starts the stream
stream.filter(track=["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"])

KeyboardInterrupt: 
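
Returning `False` from `on_error` on status 420 disconnects the stream, so a rate-limited session simply ends. If the stream is restarted afterwards, waiting progressively longer between attempts avoids hitting the limit again straight away. A minimal sketch of such a schedule follows. `backoff_delays` is a hypothetical helper, and the one-minute start, doubling factor, and 16-minute cap are assumed values rather than anything taken from this notebook:

```python
import itertools


def backoff_delays(start=60, factor=2, cap=960):
    """Yield wait times in seconds that grow by `factor` until they reach `cap`."""
    delay = start
    while True:
        yield delay
        delay = min(delay * factor, cap)


# Show the first few waits a reconnect loop would use.
first_delays = list(itertools.islice(backoff_delays(), 6))
print(first_delays)
```

A reconnect loop would call `time.sleep` with the next delay before calling `stream.filter` again.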

EX 3. StreamListener EXPERIMENTAL: testing out different params such as:
- extended tweets
- geolocated tweets
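
Both experimental options come down to inspecting fields of the parsed tweet. A minimal sketch, assuming the streaming payload layout in which a long tweet carries its complete text under `extended_tweet.full_text` and a geolocated tweet has a non-null `place`. `full_text` and `has_place` are hypothetical helper names and the sample tweets are made up:

```python
def full_text(tweet):
    """Prefer the untruncated text when the tweet carries an extended_tweet block."""
    extended = tweet.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    return tweet.get("text")


def has_place(tweet):
    """True when the tweet was tagged with a place."""
    return tweet.get("place") is not None


short_tweet = {"text": "Road closed at the bridge", "place": None}
long_tweet = {
    "text": "Road closed at the bridge because of...",
    "extended_tweet": {"full_text": "Road closed at the bridge because of flooding"},
    "place": {"full_name": "Example Town"},
}

for tweet in (short_tweet, long_tweet):
    print(has_place(tweet), full_text(tweet))
```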

In [23]:
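# EX 3: keep only tweets that carry a place
class StreamListener(tweepy.StreamListener):

    def __init__(self):
        super().__init__()
        self.tweet_count = 0

    def on_data(self, status):
        # on_data receives the raw JSON string, so parse it before inspecting fields
        try:
            tweet = json.loads(status)
        except ValueError:
            print("Could not parse message")
            return True

        # skip tweets without a place
        if tweet.get('place') is None:
            return True

        # append (mode 'a') so earlier tweets are not overwritten
        try:
            with open('../data/AT_stream/scrape.json', 'a') as f:
                f.write(json.dumps(tweet) + '\n')
        except OSError:
            print("Could not open file log")
            return True

        self.tweet_count += 1
        print("Downloaded {0} tweets".format(self.tweet_count))
        return True

    def on_error(self, status_code):
        if status_code == 420:
            return False


stream_listener = StreamListener()
stream = tweepy.Stream(auth=auth, listener=stream_listener)
# starts the stream
stream.filter(track=["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"])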

Could not open file log


KeyboardInterrupt: 

## Part 2: Save Tweet to a JSON file (from any of the StreamListener options)

In [11]:
def file_conversion():
    # create a zero-padded timestamp, e.g. 20190116_12_15
    now_str = datetime.datetime.now().strftime('%Y%m%d_%H_%M')

    # replace the name of our file with a new timestamped filename
    dest = '../data/AT_stream/scrape_' + now_str + ".json"
    shutil.move('../data/AT_stream/scrape.json', dest)

    # each non-empty line of the file holds one tweet as a JSON document
    with open(dest, "r") as f:
        jsons = [json.loads(line) for line in f if line.strip()]
    return jsons
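
The saved file is line-delimited JSON: the listener appends one JSON document per line, and `file_conversion` parses the file one line at a time. The round trip can be checked without touching the stream. This sketch uses a temporary folder and two made-up tweets, none of which come from the real data:

```python
import json
import os
import tempfile

tweets = [
    {"id": 1, "text": "road closed"},
    {"id": 2, "text": "street blocked"},
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "scrape.json")

    # Append one JSON document per line, the same way the listener writes.
    for tweet in tweets:
        with open(path, "a") as f:
            f.write(json.dumps(tweet) + "\n")

    # Read the file back line by line, skipping any blank lines.
    with open(path) as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == tweets)
```

This is also why the file has to be read with `lines=True` when it is loaded through `pd.read_json`.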

In [12]:
file_conversion()

[{'contributors': None,
  'coordinates': None,
  'created_at': 'Wed Jan 16 20:14:10 +0000 2019',
  'entities': {'hashtags': [], 'symbols': [], 'urls': [], 'user_mentions': []},
  'favorite_count': 0,
  'favorited': False,
  'filter_level': 'low',
  'geo': None,
  'id': 1085631317751066626,
  'id_str': '1085631317751066626',
  'in_reply_to_screen_name': None,
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'is_quote_status': True,
  'lang': 'en',
  'place': None,
  'quote_count': 0,
  'quoted_status': {'contributors': None,
   'coordinates': None,
   'created_at': 'Wed Jan 16 20:03:10 +0000 2019',
   'entities': {'hashtags': [],
    'symbols': [],
    'urls': [{'display_url': 'twitter.com/i/web/status/1…',
      'expanded_url': 'https://twitter.com/i/web/status/1085628549325864960',
      'indices': [116, 139],
      'url': 'https://t.co/eySlFQdCO8'}],
    'user_mentions': []},
   'extended_tweet':

`track1 = ["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"]`



|Stream| params| json file| # of tweets |search terms|
|---| --- | --- | --- | ---|
|verified| tweet_mode='extended'| ./tweepy_scrape_20190114_23_22.json | 2| track1|
|experimental| none?| ./tweepy_scrape_20190114_23_33.json | 3| track1|

In [13]:
# first group of tweets

# available in my folder only
# json_df = pd.read_json("./tweepy_scrape_20190111_11_5713.json", lines=True)
# json_df = pd.read_json("./tweepy_scrape_20190114_23_22.json", lines=True)
# json_df = pd.read_json("./tweepy_scrape_20190114_23_33.json", lines=True)

# available in data folder; lines=True reads one JSON document per line
json_df = pd.read_json("../data/AT_stream/scrape_20190116_12_15.json", lines=True)


# "extended tweets"




## Part 3: Examine Tweets

In [14]:
print(json_df.shape)
json_df.head()

(4, 32)


| | contributors | coordinates | created_at | entities | favorite_count | favorited | filter_level | geo | id | id_str | ... | quoted_status_permalink | reply_count | retweet_count | retweeted | retweeted_status | source | text | timestamp_ms | truncated | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | 2019-01-16 20:14:10 | `{'urls': [], 'user_mentions': [], 'symbols': [...` | 0 | False | low | | 1085631317751066626 | 1085631317751066624 | ... | `{'display': 'twitter.com/kil889/status/…', 'ex...` | 0 | 0 | False | | `<a href="http://twitter.com/download/iphone" r...` | Just block anybody who willingly listens to St... | 2019-01-16 20:14:10.863 | False | `{'name': 'Tragedy Khadafi verse on Strange Fru...` |
| 1 | | | 2019-01-16 20:14:13 | `{'urls': [], 'user_mentions': [{'name': 'Black...` | 0 | False | low | | 1085631330384334848 | 1085631330384334848 | ... | `{'display': 'twitter.com/rhyemswturtle/…', 'ex...` | 0 | 0 | False | `{'entities': {'urls': [{'display_url': 'twitte...` | `<a href="http://twitter.com/download/iphone" r...` | RT @BlkNrdProblems: Beloved, that's called str... | 2019-01-16 20:14:13.875 | False | `{'name': 'lmao y'all wild', 'screen_name': 'An...` |
| 2 | | | 2019-01-16 20:14:15 | `{'urls': [], 'user_mentions': [], 'symbols': [...` | 0 | False | low | | 1085631338693214208 | 1085631338693214208 | ... | | 0 | 0 | False | | `<a href="https://ifttt.com" rel="nofollow">IFT...` | TFL UPDATE: TfLTrafficNews: London Road SM4 is... | 2019-01-16 20:14:15.856 | False | `{'name': 'Bob', 'screen_name': 'thelondonbob01...` |
| 3 | | | 2019-01-16 20:14:51 | `{'urls': [], 'user_mentions': [{'name': 'Highw...` | 0 | False | low | | 1085631487955869697 | 1085631487955869696 | ... | | 0 | 0 | False | `{'quote_count': 0, 'id': 1085601354284961797, ...` | `<a href="http://twitter.com/download/android" ...` | RT @HighwaysNWEST: All traffic caught within t... | 2019-01-16 20:14:51.443 | False | `{'name': 'Cumbria charity truck fair', 'screen...` |


In [15]:
tweets_data = []
notParsed = []
# the with block closes the file even if parsing fails part-way
with open("../data/AT_stream/scrape_20190116_12_15.json", "r") as tweets_file:
    for line in tweets_file:
        if line.strip():
            try:
                tweet = json.loads(line)
                tweets_data.append(tweet)
            except ValueError:
                # keep lines that are not valid JSON for later inspection
                notParsed.append(line)
print(len(tweets_data))
print('Could not parse: ', len(notParsed))

4
Could not parse:  0


In [19]:
tweet_cols = ['coordinates', 'created_at',
#                'full_text',
              'text', 'geo', 'id', 'place', 'user']

# keep only the selected fields from each tweet (this is a list of dicts)
sample_tweets_dict = [{col: tweet[col] for col in tweet_cols} for tweet in tweets_data]

# To flatten all nested dictionaries
# DOES NOT FLATTEN LISTS
# Look for any instances of nested lists during cleaning
# pandas 1.0+ exposes this as pd.json_normalize; pd.io.json.json_normalize is the older, deprecated path
sample_tweets_df = pd.json_normalize(sample_tweets_dict)
sample_tweets_df

| | coordinates | created_at | geo | id | place | text | user.contributors_enabled | user.created_at | user.default_profile | user.default_profile_image | ... | user.profile_text_color | user.profile_use_background_image | user.protected | user.screen_name | user.statuses_count | user.time_zone | user.translator_type | user.url | user.utc_offset | user.verified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | Wed Jan 16 20:14:10 +0000 2019 | | 1085631317751066626 | | Just block anybody who willingly listens to St... | False | Sat Apr 03 01:38:23 +0000 2010 | False | False | ... | 333333 | True | False | MiqueOnTheMix | 138078 | | none | http://Soundcloud.com/miqueonthemix | | False |
| 1 | | Wed Jan 16 20:14:13 +0000 2019 | | 1085631330384334848 | | RT @BlkNrdProblems: Beloved, that's called str... | False | Sat Oct 24 15:46:11 +0000 2015 | False | False | ... | 0 | False | False | AnAcidTweeter | 1833 | | none | | | False |
| 2 | | Wed Jan 16 20:14:15 +0000 2019 | | 1085631338693214208 | | TFL UPDATE: TfLTrafficNews: London Road SM4 is... | False | Mon Jul 04 20:11:00 +0000 2016 | True | False | ... | 333333 | True | False | thelondonbob01 | 104997 | | none | | | False |
| 3 | | Wed Jan 16 20:14:51 +0000 2019 | | 1085631487955869697 | | RT @HighwaysNWEST: All traffic caught within t... | False | Sat Jul 23 20:06:00 +0000 2016 | True | False | ... | 333333 | True | False | Cctf2 | 3572 | | none | https://www.facebook.com/CumbriaCharityTruckFair/ | | False |
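
To make the flattening concrete, here is a rough pure-Python equivalent of what `json_normalize` does to a single record: nested dictionaries become dot-separated column names, while lists are left untouched. `flatten` is a hypothetical helper written for illustration and does not reproduce every `json_normalize` option. The sample record is made up:

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the prefix along.
            flat.update(flatten(value, name, sep))
        else:
            # Lists and scalar values are stored as they are.
            flat[name] = value
    return flat


sample_record = {
    "id": 1,
    "text": "road closed",
    "user": {"screen_name": "example_user", "entities": {"urls": ["https://example.com"]}},
}

flat_record = flatten(sample_record)
print(sorted(flat_record))
```

The `user.entities.urls` value is still a list after flattening, which is the case the comment in the cell above warns about.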


In [14]:
json_df.columns

Index(['contributors', 'coordinates', 'created_at', 'entities',
       'favorite_count', 'favorited', 'filter_level', 'geo', 'id', 'id_str',
       'in_reply_to_screen_name', 'in_reply_to_status_id',
       'in_reply_to_status_id_str', 'in_reply_to_user_id',
       'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'place',
       'quote_count', 'reply_count', 'retweet_count', 'retweeted',
       'retweeted_status', 'source', 'text', 'timestamp_ms', 'truncated',
       'user'],
      dtype='object')

In [15]:
# json_df.loc[:, 'lang':]

In [5]:
# avoid naming the variable `list`, which would shadow the built-in
texts = json_df.loc[:, 'text']
# print(texts)
for i, text in enumerate(texts):
    print(i, text)
    print("--------")

0 RT @mandy1714: @hertscc is this the same incident that has closed A41 North and Southbound. 3 hours of Utter chaos. No diversion in place b…
--------
