# Streaming Twitter Activity With A Bot

## Introduction

When working with Twitter data, it is often useful to simply listen to all activity over a given period. Luckily, because Twitter is a data company, it makes this fairly easy through an Application Programming Interface (API). Through the API, we can automatically capture tweets as they happen for free, subject to limitations such as how many tweets you can capture and how far back in time you can retrieve data. Of course, if you're willing to pay, you can capture _significantly_ more data from Twitter. For our purposes, a free account is all we need to get going.

## Using This Streamer
Unfortunately, in its current state, you cannot use this streamer to capture Twitter data: you first need to create credentials to pass to the API. To do that, go to this [link](https://developer.twitter.com/en/apply-for-access), create an account, and follow the instructions. Once you have your API keys, you can fill in the `YOUR CREDENTIALS HERE` placeholders in the code below. This is, however, not required for this workshop, as we've already captured the data for you.
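
If you do create your own keys, you may prefer not to paste them directly into the notebook. The following is a minimal sketch of one way to read them from environment variables instead. The variable names and the `load_credentials` helper are hypothetical and not part of the workshop code; choose whatever names suit your setup.

```python
import os

# Hypothetical environment variable names (an assumption, not a Twitter or
# Tweepy convention). The streamer needs four values in total.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_VARS}
```

The returned values could then stand in for the four `YOUR CREDENTIALS HERE` placeholders: the first two for `tweepy.OAuthHandler` and the last two for `auth.set_access_token`.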

## Data We Captured

We set up our Twitter bot to capture all tweets from November 3rd to November 10th that originated within a box around the approximate GPS coordinates of Calgary. Every time someone made a public tweet, we captured it and saved it to a text file called `tweets.txt`. For each tweet, we recorded certain information about the user who tweeted, the GPS coordinates of the tweet's location if the user had provided them, and some additional metadata. Using the data captured with this streamer, we will guide you through some text mining and analysis using Python.
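
The streamer in this notebook appends each tweet to `tweets.txt` as the `str()` of a Python dictionary, one per line. As a rough sketch of how such a file could be read back, the hypothetical helper `load_tweets` below parses each line with `ast.literal_eval` from the standard library. It is an illustration under that assumption about the file format, not the loader used later in the workshop.

```python
import ast


def load_tweets(path="tweets.txt"):
    """Parse one dictionary per line, skipping blank or malformed lines."""
    tweets = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                # literal_eval only accepts Python literals, so it is a
                # safer choice than eval() for data read from a file.
                record = ast.literal_eval(line)
            except (ValueError, SyntaxError):
                continue  # skip lines that are not valid Python literals
            if isinstance(record, dict):
                tweets.append(record)
    return tweets
```

Because the result is a list of dictionaries, it can be handed to `pandas.DataFrame(...)` if a table is more convenient for analysis.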

In [None]:
import tweepy
import json
import pandas as pd
from tweepy import Stream
from tweepy import OAuthHandler
# StreamListener is part of Tweepy 3.x; it was removed in Tweepy 4.0.
from tweepy.streaming import StreamListener
import time

auth = tweepy.OAuthHandler('YOUR CREDENTIALS HERE', 
                           'YOUR CREDENTIALS HERE')
auth.set_access_token('YOUR CREDENTIALS HERE', 
                      'YOUR CREDENTIALS HERE')

# Waiting on rate limit is important so Twitter doesn't deactivate our free bot. 

api = tweepy.API(auth, wait_on_rate_limit=True)

class StdOutListener(StreamListener):
    def on_status(self, status):
        try:
            jsonData = status._json
            tweetID = jsonData.get("id_str")
            tweetData = api.get_status(tweetID)

            #check if tweet is valid (not a retweet)
            if ( (hasattr(tweetData, 'retweeted_status')) ):
                pass
            else:
                # Long tweets carry their full text in 'extended_tweet';
                # shorter tweets omit that key, so fall back to 'text'.
                extended = jsonData.get('extended_tweet', {})
                # Saving the data to a dictionary for convenience
                d = dict(coordinates=jsonData['coordinates'],
                         text=jsonData['text'],
                         created_at=jsonData['created_at'],
                         user_mentions=jsonData['entities']['user_mentions'],
                         hashtags=jsonData['entities']['hashtags'],
                         lang=jsonData['lang'],
                         name=jsonData['user']['name'],
                         screen_name=jsonData['user']['screen_name'],
                         user_location=jsonData['user']['location'],
                         extended_tweet=extended.get('full_text', jsonData['text']))

                
                   
                print('='*60)
                print("saving new tweet, text:")
                print(d['extended_tweet'])
                print()
                # Open our file, and add a new tweet to it
                with open('tweets.txt','a') as f:
                    f.write(str(d))
                    f.write('\n')
                    
                

        except tweepy.error.RateLimitError:
            # Back off for a minute if Twitter reports that we hit the rate limit.
            print("Rate limited; waiting for one minute")
            time.sleep(60)


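    def on_error(self, status):
        # Called with the HTTP status code when the stream reports an error
        # (for example, 503 when Twitter's servers are down). Errors are
        # ignored here; uncomment the print below to log them.
        #print('Error #:', status)
        pass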

In [None]:
# Create a GPS rectangle to listen for tweets in.
calgary = [-114.2445060672,50.8842538358,-113.8726833982,51.1844055291]
twitterStream = Stream(auth, StdOutListener(),tweet_mode='extended')
twitterStream.filter(languages=["en"], locations=calgary)
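
The `locations` list follows Twitter's bounding-box convention: longitude and latitude pairs with the southwest corner first, so `calgary` reads as `[west, south, east, north]`. As a small sketch of what that rectangle means, the hypothetical helpers below test whether a point, or a captured tweet's GeoJSON `coordinates` field, falls inside the box. They are illustrative only and are not part of the streamer.

```python
calgary = [-114.2445060672, 50.8842538358, -113.8726833982, 51.1844055291]


def in_bounding_box(lon, lat, box):
    """Return True if (lon, lat) lies inside box = [west, south, east, north]."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def tweet_in_box(tweet, box):
    """Check a captured tweet dict; tweets without coordinates return False."""
    point = tweet.get("coordinates")
    if not point:
        return False
    lon, lat = point["coordinates"]  # GeoJSON order: longitude, latitude
    return in_bounding_box(lon, lat, box)
```

Many captured tweets have no exact coordinates, so a check like this can only confirm the location of tweets where the user shared one.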