# "Live" Tweets from the Twitter Streaming API with Tweepy

Preliminary steps before using Twitter's API:
1. Sign up for a Twitter account
2. Register a Twitter developer account (requires an email address or phone number)
3. Create a developer app (I went with the name BlockedRoads)
4. Obtain your consumer key and consumer secret, plus your access token and access token secret, in the developer dashboard

## Part 1: Pull Live Tweets from Twitter using Tweepy and Streaming API - StreamListener

Some resources:
http://socialmedia-class.org/twittertutorial.html
<br>https://www.dataquest.io/blog/streaming-data-python/


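**Step 1: Install Tweepy and load imports**

In a terminal run `pip install "tweepy<4"` (inside a notebook cell, prefix it with `!`). The code in this notebook uses the Tweepy 3.x interface (`tweepy.StreamListener`, `api.me()`), which was removed in Tweepy 4.0.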

In [2]:
import pandas as pd
import tweepy
import json
import datetime
from tweepy.streaming import StreamListener
import shutil

**Step 2: Authenticate Twitter Credentials through Tweepy**

In [24]:
import config
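
The `config` module imported here is not shown in this notebook. A minimal sketch of what such a `config.py` might look like, assuming only the four attribute names that the authentication cell reads (`consumer_key`, `consumer_secret`, `access_key`, `access_secret`); the environment variable names and placeholder strings are hypothetical:

```python
# config.py - hypothetical sketch; keep this file out of version control
import os

# Each value is read from an environment variable (the variable names are assumptions),
# falling back to a placeholder that must be replaced with a real key.
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "YOUR_CONSUMER_KEY")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "YOUR_CONSUMER_SECRET")
access_key = os.environ.get("TWITTER_ACCESS_TOKEN", "YOUR_ACCESS_TOKEN")
access_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "YOUR_ACCESS_TOKEN_SECRET")
```

Keeping the keys in a separate module means the notebook itself can be shared without exposing them.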

In [25]:
# authenticate the account with tweepy using the keys stored in the config module
auth = tweepy.OAuthHandler(config.consumer_key, config.consumer_secret)
auth.set_access_token(config.access_key, config.access_secret)

# create the API object used to pull data from Twitter;
# wait_on_rate_limit makes tweepy sleep instead of failing when the rate limit is hit
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# now we should be free to make Twitter API calls!

**Step 3: Verify API is working with your account**

In [26]:
# check that authentication works - the output is your Twitter display name
user = api.me()
print(user.name)

amy taylor


In [5]:
# To send a Tweet from your account (assuming you allowed your app to have read/write permissions):
# api.update_status(status = "Hey, I'm tweeting with Tweepy!")

**Step 4: Instantiate the StreamListener class and start pulling tweets**
- Specify how tweets are handled inside the listener's `on_status` or `on_data` method: `on_status` receives a parsed `Status` object, while `on_data` receives the raw JSON string
- `stream.filter` starts the download. Specify your search phrases with the `track` parameter
- Other parameters available include: .....
- **Use EX 2 for now**. To stop the live stream, interrupt the kernel while the code block is running
- Continue to Part 2 to move the saved tweets into a timestamped JSON file
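
To get a feel for which tweets a `track` list will catch, here is a rough offline sketch of the matching rule as Twitter's streaming documentation describes it: phrases are ORed together, and a phrase matches when all of its space-separated terms appear in the tweet, in any order and ignoring case. The helper `matches_track` and its simple tokenizer are my own approximation, not part of Tweepy or of Twitter's actual matcher:

```python
import re

def matches_track(text, track):
    """Approximate Twitter's track matching for a single tweet text."""
    # Lowercase and split into word-like tokens (a simplification of Twitter's tokenizer)
    tokens = set(re.findall(r"\w+", text.lower()))
    # A phrase matches if every one of its terms is present; any matching phrase is enough
    return any(all(term in tokens for term in phrase.lower().split()) for phrase in track)

track = ["road closed", "street blocked", "roadclosed"]
print(matches_track("Road construction, right lane closed in #Dekalb", track))  # True
print(matches_track("The road is open again", track))                           # False
```

This is why `road closed` can also pull in tweets about a closed lane on a road under construction: the two words only need to appear somewhere in the tweet, not next to each other.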

EX 1. StreamListener: prints the stream of texts directly in the codeblock

In [6]:
# this block prints the text of each incoming tweet directly below the code block
class StreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # status is a parsed tweet; .text holds the (possibly truncated) tweet text
        print(status.text)

    def on_error(self, status_code):
        # 420 means the app is being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False

stream_listener = StreamListener()
stream = tweepy.Stream(auth=auth, listener=stream_listener)
stream.filter(track=["road blocked", "road blocking", "road closed", "street closed", "roadclosed", "detoured", "detour",
                     "road inaccessible"])

RT @SFFDPIO: AT 10:32 HRS, 2" GAS MAIN BREAK FROM CONSTRUCTION  IN MIDDLE OF STREET IFO 1570 BURKE AV CROSS OF 3RD ST/DEAD END THE 1500 BLO…
Road construction, right lane closed in #Dekalb on I 20 EB after Panola Rd #ATLTraffic https://t.co/bABElTW6T2


KeyboardInterrupt: 

In [27]:
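# scratch cell: `status` exists only as an argument inside the listener methods, so this raises a NameError
status.item()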

NameError: name 'status' is not defined

EX 2. StreamListener VERIFIED: saves tweets in a json file

In [7]:
# verified code block, DON'T DELETE!

class StreamListener(tweepy.StreamListener):
    def on_data(self, status):
        # on_data receives each message as a raw JSON string
        try:
            # append ('a') so each new tweet is added to the file instead of overwriting it
            with open('../data/AT_stream/scrape.json', 'a') as f:
                f.write(status)
        except OSError:
            print("Could not open file log")

    def on_error(self, status_code):
        # 420 means the app is being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False


stream_listener = StreamListener()
stream = tweepy.Stream(auth=auth, listener=stream_listener)
# starts the stream
stream.filter(track=["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"])

KeyboardInterrupt: 
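
Because `on_data` opens the file again for every incoming tweet, the mode passed to `open` decides whether earlier tweets survive. A small self-contained sketch of the difference between `'w'` and `'a'`, using a temporary file and made-up stand-ins for tweets (the helper `write_all` is hypothetical):

```python
import os
import tempfile

def write_all(lines, mode):
    """Open the file once per line, as a listener callback would, and return what is left."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        for line in lines:
            with open(path, mode) as f:
                f.write(line + "\n")
        with open(path) as f:
            return f.read().splitlines()
    finally:
        os.remove(path)

fake_tweets = ['{"id": 1}', '{"id": 2}', '{"id": 3}']  # stand-ins, not real tweets
print(write_all(fake_tweets, "w"))  # ['{"id": 3}'] - each write truncates the file
print(write_all(fake_tweets, "a"))  # ['{"id": 1}', '{"id": 2}', '{"id": 3}']
```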

EX 3. StreamListener EXPERIMENTAL: testing out different params such as:
- extended tweets
- geolocated tweets

In [18]:
# EXPERIMENTAL CODE BLOCK - trying to save the full (untruncated) tweet text
class StreamListener(tweepy.StreamListener):
    def on_data(self, status):
        # on_data receives a raw JSON string, so parse it before looking up fields
        tweet = json.loads(status)
        # for a retweet, the untruncated text lives on the original tweet
        source = tweet.get('retweeted_status', tweet)
        # long tweets keep their full text under extended_tweet; fall back to the plain text
        extended = source.get('extended_tweet') or {}
        # store the result under a new key so the saved line is still valid JSON
        tweet['full_text'] = extended.get('full_text', source.get('text'))

        try:
            with open('../data/AT_stream/scrape.json', 'a') as f:
                f.write(json.dumps(tweet) + '\n')
        except OSError:
            print("Could not open file log")

    def on_error(self, status_code):
        # 420 means the app is being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False


stream_listener = StreamListener()
stream = tweepy.Stream(auth=auth, listener=stream_listener)
# starts the stream
stream.filter(track=["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"])

KeyboardInterrupt: 
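
As a sketch of where the full text sits in a streamed tweet: in the v1.1 streaming payload a long tweet keeps a truncated `text` and carries the complete text under `extended_tweet['full_text']`, and a retweet nests the original tweet under `retweeted_status`. The helper name `get_full_text` and the trimmed sample payloads below are made up for illustration:

```python
import json

def get_full_text(raw_data):
    """Return the longest available text for one raw JSON tweet string."""
    tweet = json.loads(raw_data)
    # For a retweet, look at the original tweet, which holds the untruncated text
    source = tweet.get("retweeted_status", tweet)
    extended = source.get("extended_tweet") or {}
    return extended.get("full_text", source.get("text"))

# Invented payloads containing only the fields the helper reads
short = json.dumps({"text": "Road closed on Main St"})
long_rt = json.dumps({
    "text": "RT @someone: Road closed on Main St because of a b…",
    "retweeted_status": {
        "text": "Road closed on Main St because of a b…",
        "extended_tweet": {"full_text": "Road closed on Main St because of a burst water main"},
    },
})
print(get_full_text(short))
print(get_full_text(long_rt))
```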

In [23]:
# EX 4: save only tweets that carry a place (geolocated tweets)
class StreamListener(tweepy.StreamListener):

    def __init__(self):
        super().__init__()
        # running count of saved tweets
        self.tweet_count = 0

    def on_data(self, status):
        # on_data receives a raw JSON string, so parse it before checking fields
        tweet = json.loads(status)
        if tweet.get('place') is None:
            return
        try:
            with open('../data/AT_stream/scrape.json', 'a') as f:
                f.write(json.dumps(tweet) + '\n')
        except OSError:
            print("Could not open file log")
            return
        self.tweet_count += 1
        print("Downloaded {0} tweets".format(self.tweet_count))

    def on_error(self, status_code):
        # 420 means the app is being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False


stream_listener = StreamListener()
stream = tweepy.Stream(auth=auth, listener=stream_listener)
# starts the stream
stream.filter(track=["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"])


KeyboardInterrupt: 
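
Geolocated tweets come in two flavours in the v1.1 payload: an exact GeoJSON point under `coordinates`, or a `place` with a `bounding_box` polygon. A minimal sketch of turning either into one (longitude, latitude) pair; `tweet_location` is a hypothetical helper and the sample dictionaries are invented:

```python
def tweet_location(tweet):
    """Return (lon, lat) for a parsed tweet dict, or None if it has no location."""
    point = tweet.get("coordinates")
    if point:
        lon, lat = point["coordinates"]  # GeoJSON order: longitude first
        return lon, lat
    place = tweet.get("place")
    if place and place.get("bounding_box"):
        corners = place["bounding_box"]["coordinates"][0]
        # Use the centre of the bounding box as a rough location
        lons = [corner[0] for corner in corners]
        lats = [corner[1] for corner in corners]
        return (min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2
    return None

# Invented examples: one exact point, one place with a bounding box
exact = {"coordinates": {"type": "Point", "coordinates": [-84.25, 33.75]}, "place": None}
boxed = {"coordinates": None, "place": {"full_name": "Example City", "bounding_box": {
    "type": "Polygon",
    "coordinates": [[[-85.0, 33.0], [-85.0, 34.0], [-84.0, 34.0], [-84.0, 33.0]]]}}}
print(tweet_location(exact))
print(tweet_location(boxed))
```

For a road-closure use case the bounding-box centre is only a coarse hint, since a place can be as large as a whole city.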

## Part 2: Save Tweet to a JSON file (from any of the StreamListener options)

In [8]:
def file_conversion():
    # create a zero-padded timestamp such as 20190115_11_28
    now_str = datetime.datetime.now().strftime('%Y%m%d_%H_%M')

    # move the scrape file to a new timestamped filename
    dest = '../data/AT_stream/scrape_' + now_str + ".json"
    shutil.move('../data/AT_stream/scrape.json', dest)

    # each non-empty line of the file is one JSON-encoded tweet
    with open(dest, "r") as f:
        jsons = [json.loads(line) for line in f if line.strip()]
    return jsons
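
A note on the timestamp in the filename: `strftime` zero-pads every field, which gluing together `str(now.month)`, `str(now.hour)` and so on does not. A short sketch comparing the two on fixed example dates; both function names are hypothetical:

```python
import datetime

def stamp_concat(now):
    # hand-built stamp: always prefixes the month with '0' and pads nothing else
    return (str(now.year) + '0' + str(now.month) + str(now.day)
            + '_' + str(now.hour) + '_' + str(now.minute))

def stamp_strftime(now):
    # strftime pads month, day, hour and minute to two digits
    return now.strftime('%Y%m%d_%H_%M')

january = datetime.datetime(2019, 1, 15, 11, 28)
november = datetime.datetime(2019, 11, 5, 9, 3)
print(stamp_concat(january), stamp_strftime(january))    # 20190115_11_28 20190115_11_28
print(stamp_concat(november), stamp_strftime(november))  # 20190115_9_3 20191105_09_03
```

With the hand-built version, 5 November ends up with the same date prefix as 15 January, so files no longer sort or identify correctly.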

In [9]:
file_conversion()

[{'created_at': 'Tue Jan 15 19:27:16 +0000 2019',
  'id': 1085257126912708609,
  'id_str': '1085257126912708609',
  'text': 'RT @mandy1714: @hertscc is this the same incident that has closed A41 North and Southbound. 3 hours of Utter chaos. No diversion in place b…',
  'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  'truncated': False,
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user': {'id': 2542722057,
   'id_str': '2542722057',
   'name': "Mandhi's Ghandi's",
   'screen_name': 'mandy1714',
   'location': None,
   'url': None,
   'description': 'insane in the membrane',
   'translator_type': 'none',
   'protected': False,
   'verified': False,
   'followers_count': 43,
   'friends_count': 223,
   'listed_count': 1,
   'favourites_count': 2301,
   'statuses_count': 2684,
   'created_at': 'Mon May 12 08:57:34

`track1 = ["road closed", "roads closed", "road blocked", "street blocked", "roadclosed"]`

| Stream | Params | JSON file | # of tweets | Search terms |
|---|---|---|---|---|
| verified | tweet_mode='extended' | ./tweepy_scrape_20190114_23_22.json | 2 | track1 |
| experimental | none? | ./tweepy_scrape_20190114_23_33.json | 3 | track1 |

In [12]:
# first group of tweets

# available in my folder only
# json_df = pd.read_json("./tweepy_scrape_20190111_11_5713.json", lines=True)
# json_df = pd.read_json("./tweepy_scrape_20190114_23_22.json", lines=True)
# json_df = pd.read_json("./tweepy_scrape_20190114_23_33.json", lines=True)

# available in data folder; lines=True reads one JSON object per line
json_df = pd.read_json("../data/AT_stream/scrape_20190115_11_28.json", lines=True)


# "extended tweets"




## Part 3: Examine Tweets

In [13]:
print(json_df.shape)
json_df.head()

(1, 28)


Unnamed: 0,contributors,coordinates,created_at,entities,favorite_count,favorited,filter_level,geo,id,id_str,...,quote_count,reply_count,retweet_count,retweeted,retweeted_status,source,text,timestamp_ms,truncated,user
0,,,2019-01-15 19:27:16,"{'hashtags': [], 'urls': [], 'user_mentions': ...",0,False,low,,1085257126912708609,1085257126912708608,...,0,0,0,False,{'created_at': 'Tue Jan 15 19:15:38 +0000 2019...,"<a href=""http://twitter.com/download/iphone"" r...",RT @mandy1714: @hertscc is this the same incid...,2019-01-15 19:27:16.815,False,"{'id': 2542722057, 'id_str': '2542722057', 'na..."


In [14]:
json_df.columns

Index(['contributors', 'coordinates', 'created_at', 'entities',
       'favorite_count', 'favorited', 'filter_level', 'geo', 'id', 'id_str',
       'in_reply_to_screen_name', 'in_reply_to_status_id',
       'in_reply_to_status_id_str', 'in_reply_to_user_id',
       'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'place',
       'quote_count', 'reply_count', 'retweet_count', 'retweeted',
       'retweeted_status', 'source', 'text', 'timestamp_ms', 'truncated',
       'user'],
      dtype='object')

In [15]:
# json_df.loc[:, 'lang':]

In [16]:
# use a name other than `list` so the built-in is not shadowed
texts = json_df.loc[:, 'text']
# print(texts)
for i, text in enumerate(texts):
    print(i, text)
    print("--------")

0 RT @mandy1714: @hertscc is this the same incident that has closed A41 North and Southbound. 3 hours of Utter chaos. No diversion in place b…
--------
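
Several columns, such as `user` and `entities`, hold nested dictionaries rather than plain values. A minimal sketch of pulling a few nested fields out of parsed tweets into flat rows, in plain Python; the helper name `flatten_tweet` is hypothetical and the sample record is abridged from the tweet shown in Part 2:

```python
def flatten_tweet(tweet):
    """Pick a few top-level and nested fields from one parsed tweet dict."""
    user = tweet.get("user") or {}
    return {
        "id": tweet.get("id"),
        "text": tweet.get("text"),
        "screen_name": user.get("screen_name"),
        "followers_count": user.get("followers_count"),
        "is_retweet": "retweeted_status" in tweet,
    }

# Abridged sample record; the text is shortened for the example
sample = {
    "id": 1085257126912708609,
    "text": "RT @mandy1714: @hertscc is this the same incident ...",
    "user": {"screen_name": "mandy1714", "followers_count": 43},
    "retweeted_status": {"created_at": "Tue Jan 15 19:15:38 +0000 2019"},
}
rows = [flatten_tweet(sample)]
print(rows[0]["screen_name"], rows[0]["is_retweet"])
```

A list of such rows can be passed to `pd.DataFrame(rows)` to get one plain column per field.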
