Notebook for collecting tweets and saving them to CSV for future use.

#### Goal:

Gather a "random" sample of tweets about Biden and Bernie in the 2020 election cycle, for sentiment analysis and other uses.

We will collect 5,000 tweets per day, sorted by relevancy.

Things we need to do:
- a function to move on to the next day once 5,000 tweets have been collected
- a function that makes the requests (waiting 1 second between requests)
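A minimal sketch of the arithmetic and day stepping behind the two helpers listed above, using only the standard library. It assumes the 500-tweet page size used for `max_results` further down, so 5,000 tweets a day means ten paginated requests; `day_windows`, `TWEETS_PER_DAY`, and `REQUESTS_PER_DAY` are hypothetical names, not part of the notebook.

```python
import datetime
import math

# Assumed figures: 5,000 tweets per day, at most 500 tweets per request.
TWEETS_PER_DAY = 5_000
MAX_RESULTS = 500
REQUESTS_PER_DAY = math.ceil(TWEETS_PER_DAY / MAX_RESULTS)  # 10 pages per day


def day_windows(start, end):
    """Yield (start_time, end_time) pairs one day apart, from start through end inclusive."""
    current = start
    while current <= end:
        nxt = current + datetime.timedelta(days=1)
        yield current, nxt
        current = nxt


if __name__ == "__main__":
    start = datetime.datetime(2015, 1, 1, tzinfo=datetime.timezone.utc)
    end = datetime.datetime(2015, 1, 3, tzinfo=datetime.timezone.utc)
    for window_start, window_end in day_windows(start, end):
        print(window_start.isoformat(), "->", window_end.isoformat())
    print("requests per day:", REQUESTS_PER_DAY)
```

Each window's end is the next window's start, so consecutive requests neither overlap nor leave gaps.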

In [1]:
%load_ext dotenv
%dotenv
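The `%dotenv` magic (from the python-dotenv package) reads a `.env` file into `os.environ`, which is where `BEARER_TOKEN` comes from in the next cells. For running the same code outside Jupyter, here is a deliberately simplified standard-library stand-in. `load_env_file` is a hypothetical helper: it handles only plain `KEY=value` lines and `#` comments, not the quoting, `export` prefixes, or variable expansion that python-dotenv supports.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Copy KEY=value pairs from a simple .env file into os.environ.

    Hypothetical, simplified stand-in for python-dotenv: blank lines and lines
    starting with '#' are skipped, and surrounding quotes are NOT stripped.
    Existing environment variables are kept unless override=True.
    Returns every pair that was parsed from the file.
    """
    parsed = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed
```

Usage: call `load_env_file()` once at startup, then `os.environ.get("BEARER_TOKEN")` works exactly as in the next cell.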

In [29]:
import os
import requests
import json
import datetime
import time
from textblob import TextBlob

In [14]:
bearer_token = os.environ.get("BEARER_TOKEN")

search_url = "https://api.twitter.com/2/tweets/search/all"

def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2FullArchiveSearchPython"
    return r

In [25]:
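def increment_day(date):
    """Return the same time one day later."""
    if not isinstance(date, datetime.date):
        raise TypeError(f"expected a date or datetime, got {type(date).__name__}")
    return date + datetime.timedelta(days=1)

def get_query_params(start_time, next_page=None):
    """Build the search parameters for the 24 hours starting at start_time."""
    params = {}
    # AND (a space) binds tighter than OR, so group the OR terms for lang:en to apply to all of them
    params['query'] = '(bernie sanders OR #feelthebern OR #bernie2016) lang:en'
    params['tweet.fields'] = 'created_at'
    params['sort_order'] = 'relevancy'
    params['start_time'] = start_time.isoformat()
    params['end_time'] = increment_day(start_time).isoformat()
    params['max_results'] = 500  # largest page size full-archive search allows
    if next_page is not None:
        params['next_token'] = next_page

    return params

def get_data(params):
    """Send one authenticated full-archive search request."""
    return requests.get(search_url, params=params, auth=bearer_oauth)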
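To see what a single day's request looks like without calling the API, the parameters can be encoded locally. `requests` builds its query string with the standard library's `urlencode`, so this sketch should be close to what is actually sent, though treat it as an approximation. `build_search_url` is a hypothetical helper. It also groups the OR terms in parentheses: the Twitter API v2 query rules evaluate AND (implicit, space-separated) before OR, so without the parentheses `lang:en` would attach only to `#bernie2016`.

```python
import datetime
from urllib.parse import urlencode

SEARCH_URL = "https://api.twitter.com/2/tweets/search/all"
# Parentheses make lang:en apply to every OR branch (AND binds tighter than OR).
QUERY = "(bernie sanders OR #feelthebern OR #bernie2016) lang:en"


def build_search_url(day_start, next_token=None):
    """Hypothetical helper: the full-archive search URL for one 24-hour window."""
    params = {
        "query": QUERY,
        "tweet.fields": "created_at",
        "sort_order": "relevancy",
        "start_time": day_start.isoformat(),
        "end_time": (day_start + datetime.timedelta(days=1)).isoformat(),
        "max_results": 500,
    }
    if next_token is not None:
        params["next_token"] = next_token
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    day = datetime.datetime(2015, 1, 1, tzinfo=datetime.timezone.utc)
    print(build_search_url(day))
```

Note that `:` and `+` in the timestamps are percent-encoded (`%3A`, `%2B`), which is why passing a `params` dict is safer than pasting timestamps into the URL by hand.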

In [30]:
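### Bernie Sanders

def get_bernie_tweets(start_date, end_date, data=None, next_page=None):
    """Collect up to 10 pages (5,000 tweets) per day from start_date through end_date.

    On a non-200 response, returns early with (data, next_page, start_date)
    so the collection can be resumed from that point.
    """
    if data is None:
        data = []  # fresh list per call; a mutable default would be shared across calls

    while start_date <= end_date:
        ## now we do each day
        print(start_date.isoformat())
        for _ in range(10):
            # make up to 10 requests of 500 tweets each
            params = get_query_params(start_date, next_page)
            res = get_data(params)

            if res.status_code != 200:
                print(res.status_code)
                print(res.json())
                return data, next_page, start_date

            body = res.json()
            data += body.get('data', [])  # 'data' is absent when a page has no tweets
            next_page = body.get('meta', {}).get('next_token')
            time.sleep(1)  # wait 1 second between requests
            if next_page is None:
                break  # no more pages for this day

        ## clean up variables, increment the day
        next_page = None
        start_date = increment_day(start_date)

    return data, next_page, start_date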
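The per-day loop pages through results with `next_token`. When a day has fewer than 5,000 matching tweets, the response eventually omits `next_token`, and requesting again without one would just re-fetch the first page. The sketch below isolates that pagination logic from the network: `collect_day` and `fetch_page` are hypothetical, with `fetch_page` standing in for `get_data(...).json()`, and the fake pages exist only to show the control flow.

```python
def collect_day(fetch_page, max_pages=10):
    """Collect up to max_pages pages of tweets for one day.

    fetch_page(next_token) must return a decoded response body shaped like
    {"data": [...], "meta": {"next_token": ...}}; "data" may be absent when
    nothing matched, and "next_token" is absent on the last page.
    """
    tweets = []
    next_token = None
    for _ in range(max_pages):
        body = fetch_page(next_token)
        tweets.extend(body.get("data", []))
        next_token = body.get("meta", {}).get("next_token")
        if next_token is None:
            break  # no further pages for this day
    return tweets


if __name__ == "__main__":
    # Fake three-page result, keyed by the token that requests each page.
    pages = {
        None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "a"}},
        "a": {"data": [{"id": "3"}], "meta": {"next_token": "b"}},
        "b": {"data": [{"id": "4"}], "meta": {}},
    }
    print([t["id"] for t in collect_day(pages.__getitem__)])
```

Keeping the fetch behind a callable makes the stopping rules testable without spending API quota.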

In [31]:
start_date = datetime.datetime(2015, 1, 1).astimezone()
end_date = datetime.datetime(2016, 11, 30).astimezone()

res = get_bernie_tweets(start_date, end_date)

2015-01-01T00:00:00+01:00
2015-01-02T00:00:00+01:00
2015-01-03T00:00:00+01:00
2015-01-04T00:00:00+01:00
2015-01-05T00:00:00+01:00
2015-01-06T00:00:00+01:00
2015-01-07T00:00:00+01:00
2015-01-08T00:00:00+01:00
2015-01-09T00:00:00+01:00
2015-01-10T00:00:00+01:00
2015-01-11T00:00:00+01:00
2015-01-12T00:00:00+01:00
2015-01-13T00:00:00+01:00
2015-01-14T00:00:00+01:00
2015-01-15T00:00:00+01:00
2015-01-16T00:00:00+01:00
2015-01-17T00:00:00+01:00
2015-01-18T00:00:00+01:00
2015-01-19T00:00:00+01:00
2015-01-20T00:00:00+01:00
2015-01-21T00:00:00+01:00
2015-01-22T00:00:00+01:00
2015-01-23T00:00:00+01:00
2015-01-24T00:00:00+01:00
2015-01-25T00:00:00+01:00
2015-01-26T00:00:00+01:00
2015-01-27T00:00:00+01:00
2015-01-28T00:00:00+01:00
2015-01-29T00:00:00+01:00
2015-01-30T00:00:00+01:00
2015-01-31T00:00:00+01:00
2015-02-01T00:00:00+01:00
2015-02-02T00:00:00+01:00
2015-02-03T00:00:00+01:00
2015-02-04T00:00:00+01:00
2015-02-05T00:00:00+01:00
2015-02-06T00:00:00+01:00
2015-02-07T00:00:00+01:00
2015-02-08T0
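The printed timestamps carry `+01:00` because calling `.astimezone()` on a naive `datetime` interprets it as local time and attaches the machine's UTC offset, so each window is a local calendar day, and a run on a machine in another timezone would shift the windows. If UTC days are preferred (an assumption about what the analysis needs), a minimal alternative is to attach `timezone.utc` explicitly:

```python
import datetime

UTC = datetime.timezone.utc

# Explicit UTC: the same midnight boundaries on every machine.
start_date = datetime.datetime(2015, 1, 1, tzinfo=UTC)
end_date = datetime.datetime(2016, 11, 30, tzinfo=UTC)

# Local time: the offset depends on the machine's timezone setting.
local_start = datetime.datetime(2015, 1, 1).astimezone()

print(start_date.isoformat())   # 2015-01-01T00:00:00+00:00
print(local_start.isoformat())  # offset varies by machine
print((end_date - start_date).days + 1, "daily windows")  # 700 daily windows
```

At 10 requests per window, the full range is about 7,000 requests, which is why the collection has to be spread across several sessions.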

In [33]:
data, next_page, start_date = res


{'created_at': '2015-01-26T18:27:42.000Z',
 'id': '559779729029726208',
 'text': 'Bernie Sanders Unveils A 12 Point Economic Plan To Break The Koch Oligarchs http://t.co/AMBSIgwuk9 via @politicususa'}
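Each element of `data` is a dict like the tweet shown above: `id` and `text`, plus `created_at` because it was requested via `tweet.fields`. The `created_at` value is a UTC timestamp with milliseconds and a trailing `Z`. A small sketch of turning it into a timezone-aware `datetime`, useful later for grouping tweets by day; `parse_created_at` is a hypothetical helper, and `strptime` is used because `datetime.fromisoformat` only accepts the `Z` suffix from Python 3.11 on.

```python
import datetime


def parse_created_at(value):
    """Parse an API timestamp such as '2015-01-26T18:27:42.000Z' into an aware UTC datetime."""
    parsed = datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=datetime.timezone.utc)


created = parse_created_at("2015-01-26T18:27:42.000Z")
print(created.isoformat())  # 2015-01-26T18:27:42+00:00
print(created.date())       # 2015-01-26
```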

In [37]:
start_date.isoformat()

'2015-02-20T00:00:00+01:00'

In [48]:
import csv

keys = data[0].keys()

# UTF-8 so emoji and other non-ASCII tweet text can be written on any platform
with open('bern2015-02-20.csv', 'w', newline="", encoding="utf-8") as output:
    dict_writer = csv.DictWriter(output, keys)
    dict_writer.writeheader()
    dict_writer.writerows(data)
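`csv.DictWriter` raises `ValueError` when a row contains a key that is not in `fieldnames`, so taking the header from `data[0]` alone only works while every tweet has the same fields. A hedged sketch that builds the header from the union of keys across all rows and leaves missing cells blank; `write_tweets_csv` is a hypothetical name. It opens the file as UTF-8, since tweet text often contains emoji that a platform's default encoding may not handle.

```python
import csv


def write_tweets_csv(rows, path):
    """Write tweet dicts to CSV, using every key seen in any row as a column."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    with open(path, "w", newline="", encoding="utf-8") as output:
        # restval fills columns a given row does not have
        writer = csv.DictWriter(output, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

Usage: `write_tweets_csv(data, 'bern2015-02-20.csv')` replaces the three `dict_writer` lines above.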

We hit our 15-minute rate limit and are tired...
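Instead of stopping when the rate limit is hit, the loop could pause until the window resets. A sketch, assuming the `x-rate-limit-remaining` and `x-rate-limit-reset` response headers that the Twitter API v2 documents (the latter a Unix timestamp in seconds) and a 429 status when the limit is exceeded. `seconds_until_reset` is a hypothetical helper, and the 60-second fallback is an arbitrary assumption; the caller would `time.sleep()` for the returned value and then retry.

```python
import time


def seconds_until_reset(status_code, headers, now=None):
    """Return how many seconds to wait before the next request (0.0 if none needed).

    Assumes Twitter-style rate-limit headers: 'x-rate-limit-remaining' and
    'x-rate-limit-reset' (Unix epoch seconds when the window resets).
    """
    now = time.time() if now is None else now
    if status_code == 429 or headers.get("x-rate-limit-remaining") == "0":
        reset = headers.get("x-rate-limit-reset")
        if reset is None:
            return 60.0  # assumed fallback when the reset header is missing
        return max(0.0, float(reset) - now) + 1.0  # +1 s safety margin
    return 0.0
```

Usage inside the request loop: `time.sleep(seconds_until_reset(res.status_code, res.headers))` before retrying the same `params`.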

For tomorrow: start_date = 2015-02-20.
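Since `get_bernie_tweets` returns `(data, next_page, start_date)` so that a run can pick up where it stopped, the resume point could also be written to disk rather than kept in a note. A minimal sketch; `save_checkpoint`, `load_checkpoint`, and any checkpoint filename are hypothetical.

```python
import datetime
import json


def save_checkpoint(path, start_date, next_page=None):
    """Store the day (and pagination token, if any) where the next session should resume."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"start_date": start_date.isoformat(), "next_page": next_page}, f)


def load_checkpoint(path):
    """Return (start_date, next_page) as saved by save_checkpoint."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    return datetime.datetime.fromisoformat(state["start_date"]), state["next_page"]
```

The loaded values can be passed straight back in as `get_bernie_tweets(start_date, end_date, next_page=next_page)`.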

Once tomorrow's data is collected, read in today's CSV, concatenate the two, and re-save to CSV.
Doesn't make sense? Ask Casey later.
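The plan of reading the earlier CSV back in, concatenating it with the next batch, and saving again can be sketched with the standard `csv` module. Deduplicating on `id` is an added assumption, guarding against the same tweet being fetched twice if a day is re-run; `merge_csvs` is a hypothetical helper.

```python
import csv


def merge_csvs(paths, out_path, key="id"):
    """Concatenate tweet CSV files, keeping only the first row seen for each key."""
    seen = set()
    rows = []
    fieldnames = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)  # union of columns, first-seen order
            for row in reader:
                if row[key] in seen:
                    continue  # duplicate tweet from an overlapping run
                seen.add(row[key])
                rows.append(row)
    # All inputs are fully read and closed before writing, so out_path may be one of them.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Usage: `merge_csvs(['bern2015-02-20.csv', <tomorrow's file>], <combined file>)`, with the file names chosen by whoever runs the next session.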
