# Streaming Twitter Activity With A Bot

## Introduction

When working with Twitter data, it is often useful to simply listen to all activity over a given period. Luckily, because Twitter is a data company, it makes this fairly easy with an Application Programming Interface (API). Through the API, we can capture tweets automatically as they happen, for free, subject to limits on how many tweets you can capture and how far back in time you can retrieve data. If you're willing to pay, you can capture _significantly_ more data from Twitter, but for our purposes a free account is all we need.

## Using This Streamer
In its current state, this streamer cannot capture Twitter data, because you first need to create credentials to pass to the API. To do that, go to this [link](https://developer.twitter.com/en/apply-for-access), create an account, and follow the instructions. Once you have your API keys, you can fill in the `YOUR CREDENTIALS HERE` section of the code below. This is not required for this workshop, however, as we've already captured the data for you.
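
Rather than pasting keys directly into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are hypothetical and are not part of this streamer or of tweepy.

```python
import os

# Hypothetical environment variable names; choose whatever names you like.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_credentials(environ=None):
    """Return the four credentials as a tuple, or raise if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return tuple(environ[name] for name in CREDENTIAL_VARS)
```

The four values could then be assigned to `tweepy_oauth1`, `tweepy_oauth2`, `tweepy_token1`, and `tweepy_token2` in the code below.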

## Data We Captured

We set up our Twitter bot to capture all original tweets (retweets are skipped) posted from within a bounding box around the approximate GPS coordinates of Alberta. Every time someone posted a public tweet, we captured it and saved it to a text file. Each record includes some information about the user who tweeted, the GPS coordinates of the tweet's location if the user provided them, and some additional metadata. Using the data captured with this streamer, we will guide you through some text mining and analysis in Python.
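
As a rough illustration of what each saved record contains, the sketch below pulls a few fields out of a tweet dictionary and checks whether its coordinates fall inside a bounding box. The field names follow Twitter's standard tweet JSON (`id_str`, `text`, `user`, `coordinates`). The helper names `in_box` and `summarize_tweet` and the sample tweet are made up for this example and are not part of the streamer.

```python
# Bounding box as [west lon, south lat, east lon, north lat],
# the same values the streamer uses for Alberta.
ALBERTA_BOX = [-121.0, 48.0, -109.0, 61.0]

def in_box(lon, lat, box):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north

def summarize_tweet(tweet, box=ALBERTA_BOX):
    """Pick out the user, text, and (optional) GPS point from a tweet dict."""
    user = tweet.get("user") or {}
    point = tweet.get("coordinates")  # None unless the user shared a location
    lon = lat = None
    if point:
        lon, lat = point["coordinates"]  # GeoJSON order: longitude, latitude
    return {
        "id": tweet.get("id_str"),
        "user": user.get("screen_name"),
        "text": tweet.get("text"),
        "lon": lon,
        "lat": lat,
        "in_box": in_box(lon, lat, box) if point else None,
    }

# Made-up sample record, for illustration only (not real captured data).
sample = {
    "id_str": "1",
    "text": "Hello from Edmonton",
    "user": {"screen_name": "example_user"},
    "coordinates": {"type": "Point", "coordinates": [-113.49, 53.54]},
}
print(summarize_tweet(sample))
```

Tweets without a shared location simply get `None` for `lon`, `lat`, and `in_box`.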

In [None]:
import json
import pandas as pd
import time
from datetime import datetime
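# upload_file() is used by the monthly backup step in the listener below;
# uncomment this import if the swift_upload module is available.
#from swift_upload import upload_file

try: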
    import tweepy
    from tweepy import Stream
    from tweepy import OAuthHandler
    from tweepy.streaming import StreamListener
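except ImportError:
    # This notebook uses the tweepy 3.x API (StreamListener, api.me),
    # which was removed in tweepy 4, so install a 3.x release.
    !pip install "tweepy<4" --user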
    from importlib import reload
    import site
    reload(site)
    import tweepy
    from tweepy import Stream
    from tweepy import OAuthHandler
    from tweepy.streaming import StreamListener

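# Consumer API key and secret from your Twitter developer account
tweepy_oauth1 = "YOUR CREDENTIALS HERE"
tweepy_oauth2 = "YOUR CREDENTIALS HERE"

auth = tweepy.OAuthHandler(tweepy_oauth1,
                           tweepy_oauth2)

# Access token and access token secret
tweepy_token1 = "YOUR CREDENTIALS HERE"
tweepy_token2 = "YOUR CREDENTIALS HERE"

auth.set_access_token(tweepy_token1,
                      tweepy_token2)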
# Waiting on rate limit is important so Twitter doesn't deactivate our free bot. 
api = tweepy.API(auth, wait_on_rate_limit=True)


In [None]:
# Test whether the API is working by printing the authenticated user's name
user = api.me()
print(user.name)

In [None]:
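class StdOutListener(StreamListener):
    """Stream listener that appends each original tweet to a text file."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        # Separate API client used to look up each incoming tweet
        auth = tweepy.OAuthHandler(tweepy_oauth1,
                                   tweepy_oauth2)
        auth.set_access_token(tweepy_token1,
                              tweepy_token2)
        self.api = tweepy.API(auth, wait_on_rate_limit=True)
        self.time_start = datetime.now().minute
        # Tracks whether this month's backup has already been uploaded
        self.already_upload = False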

    def on_status(self, status):
        try:
            jsonData = status._json
            tweetID = jsonData.get("id_str")
            # Look the tweet up again so we can check whether it is a retweet
            tweetData = self.api.get_status(tweetID)

            # Only save original tweets; skip retweets
            if not hasattr(tweetData, 'retweeted_status'):
                with open(self.filename, 'a') as f:
                    # Note: str() writes the Python dict representation,
                    # one tweet per line, not strict JSON
                    f.write(str(jsonData))
                    f.write('\n')
                print(jsonData['text'])
                print('\n')

            # Back up the data on the first day of each month
            # (requires upload_file from swift_upload, see the imports above)
            today = datetime.now().day
            if today == 1 and not self.already_upload:
                upload_file("alberta_twitter_data", self.filename, self.filename)
                self.filename = str(datetime.now().date()) + "_start.txt"
                self.already_upload = True
                send_email("saving.txt")
            if today == 2:
                self.already_upload = False

        except tweepy.error.RateLimitError:
            print("Rate limited; waiting for one minute")
            time.sleep(60)

        except Exception as e:
            print("something went wrong")
            print(e)


    def on_error(self, status):
        # Errors (e.g. 503 when the servers are down) are ignored here;
        # uncomment the print below to see the status code
        #print('Error #:', status)
        pass

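import smtplib
from email.message import EmailMessage

# Placeholders: fill in your own addresses and password before using send_email
me = "YOUR EMAIL HERE"
you = "RECIPIENT EMAIL HERE"
password = "YOUR PASSWORD HERE"

def send_email(file):
    """Email the contents of `file` as a status message from the bot."""
    with open(file) as fp:
        msg = EmailMessage()
        msg.set_content(fp.read())
    msg['Subject'] = "Your Tweepy Bot at %s" % datetime.now().strftime("%d/%m/%Y %H:%M:%S")
    msg['From'] = me
    msg['To'] = you
    # Connect to Gmail's SMTP server and upgrade to TLS before logging in
    s = smtplib.SMTP('smtp.gmail.com', 587)
    s.ehlo()
    s.starttls()
    s.ehlo()
    s.login(me, password)
    s.send_message(msg)
    s.quit()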



# USE THIS TO READ THE DATA
# Each line was written with str(), so it is a Python dict literal rather than
# JSON; ast.literal_eval parses it without replacing quote characters.
# import ast
# data = []
# with open("testtweets.txt") as inputData:
#     for line in inputData:
#         line = line.rstrip('\n')
#         if line:
#             data.append(ast.literal_eval(line))
#
# json.dumps(data)

In [None]:
# Bounding box as [west longitude, south latitude, east longitude, north latitude]
alberta = [-121.000000, 48.000000, -109.000000, 61.000000]

filename = str(datetime.now().date()) + "_start.txt"
twitterStream = Stream(auth,
                       StdOutListener(filename),
                       tweet_mode='extended')

while True:
    try:
        # send_email("started_email.txt")
        twitterStream.filter(languages=["en"], locations=alberta)

    except Exception as e:
        # If something broke, report it and wait an hour before reconnecting
        print(e)
        # send_email("broke_email.txt")
        time.sleep(3600)
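
The loop above retries forever, waiting an hour after every failure. If you would rather give up after repeated failures, a retry helper along these lines is one option. This is only a sketch: `run_with_retries`, `max_failures`, and `wait_seconds` are hypothetical names, and `connect` stands in for a call such as `twitterStream.filter(...)`.

```python
import time

def run_with_retries(connect, max_failures=5, wait_seconds=3600, sleep=time.sleep):
    """Call connect() until it returns normally or fails max_failures times.

    Returns the number of failures seen. The sleep function is a parameter
    so that the waiting can be replaced in tests.
    """
    failures = 0
    while failures < max_failures:
        try:
            connect()
            return failures  # connect() ended without raising
        except Exception as e:
            failures += 1
            print("attempt %d failed: %s" % (failures, e))
            if failures < max_failures:
                sleep(wait_seconds)
    return failures
```

For example, `run_with_retries(lambda: twitterStream.filter(languages=["en"], locations=alberta))` would stop after five failed attempts instead of looping indefinitely. Unlike the loop above, it also returns if `connect` ends without raising an error.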