## Get tweets
Twitter API: https://developer.twitter.com/

Python-Twitter: https://python-twitter.readthedocs.io/en/latest/

You need a Twitter developer account and its credential keys.

Install: pip install python-twitter

In [29]:
# Create the twitter.Api client from the account credentials
import pandas
import twitter

df=pandas.read_csv('Twitter API Credentials.csv')  # credentials are kept in this file, outside the notebook
api = twitter.Api(consumer_key=df.consumer_key[0],
                    consumer_secret=df.consumer_secret[0],
                    access_token_key=df.access_token_key[0],
                    access_token_secret=df.access_token_secret[0])
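
The cell above implies a credentials file with one header row and one data row. A minimal sketch of that layout, assuming the four column names used in the code; `read_credentials` is a hypothetical helper and the values are placeholders, not real keys:

```python
import csv
import io

# Placeholder values only; a real file would hold the account's actual keys
sample_csv = (
    "consumer_key,consumer_secret,access_token_key,access_token_secret\n"
    "KEY,SECRET,TOKEN,TOKEN_SECRET\n"
)

def read_credentials(csv_file):
    """Return the first data row of a credentials CSV as a dict."""
    return next(csv.DictReader(csv_file))

credentials = read_credentials(io.StringIO(sample_csv))
print(sorted(credentials))
```

The same function would accept an open file handle in place of the `io.StringIO` object.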

**How to use twitter.Api**

https://python-twitter.readthedocs.io/en/latest/twitter.html

twitter.Api.GetSearch(term=None, raw_query=None, geocode=None, since_id=None, max_id=None, until=None, since=None, count=15, lang=None, locale=None, result_type='mixed', include_entities=None, return_json=False)

***Note:***
- **Cannot** pull more than 100 tweets per request (`count` is capped at 100)
- **Cannot** pull tweets older than about 7 days with the standard search API

In [128]:
# Example of using twitter.Api to search - using normal parameters
api.GetSearch(
              term='AMZN', 
              raw_query=None, 
              geocode=None, 
              since_id=None, 
              max_id=None, 
              until='2021-05-02', 
              since=None, 
              count=3, 
              lang='en', 
              locale=None, 
              result_type='recent', 
              include_entities=None, 
              return_json=False
)

[Status(ID=1388644383797485578, ScreenName=thundervolt888, Created=Sat May 01 23:59:55 +0000 2021, Text='RT @Wario64: Animal Crossing: New Horizons - Timmy &amp; Tommy - Nintendo Switch Lite Skin is $4.16 on Amazon https://t.co/JFLW9YusCw #ad https…'),
 Status(ID=1388644383361220610, ScreenName=Orgetorix, Created=Sat May 01 23:59:55 +0000 2021, Text='Check out this book: "A Very Dangerous Woman: The Lives, Loves and Lies of Russia\'s Most Seductive Spy" by Deborah… https://t.co/5yPbXKxTcX'),
 Status(ID=1388644383101161478, ScreenName=LaShaWright1, Created=Sat May 01 23:59:55 +0000 2021, Text='@EloualiSabrine Infinite Wisdom \nhttps://t.co/adBjjZivsw\nInfinite Wisdom  II\nhttps://t.co/LqJt0RmXZe\nThe Fight With… https://t.co/IbeTCCfaKp')]

**Using only raw_query to search:** 

https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets

https://twitter.com/search-advanced

In [129]:
# Example of a search using just raw_query (everything after the “?” in the URL):
api.GetSearch(raw_query='q=AMZN%20&until=2021-05-02&lang=en&count=3')

[Status(ID=1388644383797485578, ScreenName=thundervolt888, Created=Sat May 01 23:59:55 +0000 2021, Text='RT @Wario64: Animal Crossing: New Horizons - Timmy &amp; Tommy - Nintendo Switch Lite Skin is $4.16 on Amazon https://t.co/JFLW9YusCw #ad https…'),
 Status(ID=1388644383361220610, ScreenName=Orgetorix, Created=Sat May 01 23:59:55 +0000 2021, Text='Check out this book: "A Very Dangerous Woman: The Lives, Loves and Lies of Russia\'s Most Seductive Spy" by Deborah… https://t.co/5yPbXKxTcX'),
 Status(ID=1388644383101161478, ScreenName=LaShaWright1, Created=Sat May 01 23:59:55 +0000 2021, Text='@EloualiSabrine Infinite Wisdom \nhttps://t.co/adBjjZivsw\nInfinite Wisdom  II\nhttps://t.co/LqJt0RmXZe\nThe Fight With… https://t.co/IbeTCCfaKp')]
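
The raw_query string above can also be assembled from keyword parameters rather than typed by hand. A minimal sketch, assuming the standard-library `urllib.parse.urlencode` is acceptable here; `build_raw_query` is a hypothetical helper and not part of python-twitter:

```python
from urllib.parse import urlencode

def build_raw_query(**params):
    """Return a query string (the part after '?') from keyword parameters.

    Hypothetical helper: parameters set to None are dropped, the rest are
    percent-encoded by urlencode.
    """
    kept = {key: value for key, value in params.items() if value is not None}
    return urlencode(kept)

raw_query = build_raw_query(q='AMZN', until='2021-05-02', lang='en', count=3)
print(raw_query)  # q=AMZN&until=2021-05-02&lang=en&count=3
```

The result could then be passed as `api.GetSearch(raw_query=raw_query)`.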

**Write tweets to csv**

*With these functions, specify (term, num_loads) to repeat the search over time, or (term, until) to fetch up to 100 of the most recent tweets posted before that date.*

In [173]:
def tweets_to_df(term=None, num_loads=1, until=None, lang=None,geocode=None):
    import time
    df = pandas.DataFrame(columns=['ID','Created','Text'] )

    for i in range(num_loads):
        print("loading data round=", i+1)
        results = api.GetSearch(
                                term=term,
                                until=until,
                                count=100,
                                lang=lang,
                                geocode=geocode
                                )
        # Build one row per returned Status object
        new_df = pandas.DataFrame({'ID':[status.id for status in results],
                                'Created':[status.created_at for status in results],
                                'Text':[status.text for status in results]})
        # DataFrame.append was removed in pandas 2.0; concat is the replacement
        df = pandas.concat([df, new_df], ignore_index=True)
        print("Total records = ", len(df))
        time.sleep(10)
    return df

def tweets_to_csv(term=None, num_loads=1, until=None, lang=None,geocode=None):
    import datetime
    
    fileName='1.pulledTweets-'+term+'-created at '+str(datetime.datetime.now())+'.csv'
    df=tweets_to_df(term=term, num_loads=num_loads, until=until, lang=lang,geocode=geocode)
    df.to_csv('../data/'+fileName)
    print('Saved in "'+fileName+'"')

*For now I don't see any difference between loads. The time.sleep may be too short, or each load may simply repeat the same query and get the same most recent tweets back.*

In [175]:
tweets_to_csv(term='AMZN',num_loads=2)

loading data round= 1
Total records =  100
loading data round= 2
Total records =  200
Saved in "1.pulledTweets-AMZN-created at 2021-05-04 02:33:06.073846.csv"
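
One possible reason the loads look identical is that each one sends the same query. A minimal sketch of paging backwards with `max_id` (a parameter listed in the `GetSearch` signature), assuming each result exposes an `id` attribute as used in `tweets_to_df`; `search_pages` and its `search` argument are hypothetical names, and `fake_search` stands in for the real API so the example runs offline:

```python
from types import SimpleNamespace

def search_pages(search, term, num_loads=1, count=100):
    """Collect results over several loads, each older than the previous one.

    `search` is any callable accepting term, count and max_id keywords,
    for example api.GetSearch.
    """
    collected = []
    max_id = None
    for _ in range(num_loads):
        results = search(term=term, count=count, max_id=max_id)
        if not results:
            break  # nothing older is available
        collected.extend(results)
        # max_id is inclusive, so step just below the oldest ID seen
        max_id = min(status.id for status in results) - 1
    return collected

# Offline stand-in for the API: 250 fake statuses with IDs 1..250
def fake_search(term, count, max_id=None):
    top = 250 if max_id is None else min(max_id, 250)
    return [SimpleNamespace(id=i) for i in range(top, max(top - count, 0), -1)]

tweets = search_pages(fake_search, 'AMZN', num_loads=3)
print(len(tweets), len({t.id for t in tweets}))  # 250 250
```

With the real client, `search_pages(api.GetSearch, 'AMZN', num_loads=2)` would be the equivalent call, still subject to the 7-day limit noted earlier.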


**Write tweets to json**

In [170]:
def tweets_to_jsonString(term=None, num_loads=1, until=None, lang=None,geocode=None):
    import json
    import time  # needed for the time.sleep call below

    jsonString=''
    
    for i in range(num_loads):
        print("loading data round=", i+1)
        results = api.GetSearch(
                                term=term,
                                until=until,
                                count=100,
                                lang=lang,
                                geocode=geocode,
                                return_json=True
                                )
        
        items=results.get('statuses', [])
        for item in items:  # renamed so it no longer shadows the outer loop counter i
            jsonLine = json.dumps(item)+'\n'
            jsonString+=jsonLine
        time.sleep(10)
        
    return jsonString

def tweets_to_json(term=None, num_loads=1, until=None, lang=None,geocode=None):
    import datetime

    fileName='1.pulledTweets-'+term+'-created at '+str(datetime.datetime.now())+'.json'
    jsonString= tweets_to_jsonString(term=term, num_loads=num_loads, until=until, lang=lang,geocode=geocode)
    # Fetch first, then write; the with block closes the file even if the write fails
    with open('../data/'+fileName, "w") as jsonFile:
        jsonFile.write(jsonString)
    print('Saved in "'+fileName+'"')

In [172]:
tweets_to_json(term='AMZN',num_loads=2)

loading data round= 1
loading data round= 2
Saved in "1.pulledTweets-AMZN-created at 2021-05-04 02:21:00.052347.json"
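
The file written above holds one JSON object per line, so it can be read back line by line. A minimal sketch, assuming each line is a tweet object with at least `id` and `text` keys; `parse_json_lines` is a hypothetical helper and the two sample records are made up for illustration:

```python
import json

def parse_json_lines(json_string):
    """Parse a string of newline-separated JSON objects into a list of dicts."""
    return [json.loads(line) for line in json_string.split('\n') if line.strip()]

# Made-up sample in the same one-object-per-line layout
sample = '{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}\n'
tweets = parse_json_lines(sample)
print([tweet["id"] for tweet in tweets])  # [1, 2]
```

For a saved file, the string would come from reading the file's contents with `open(...).read()`.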



In [177]:
# Pull sample data for later notebooks: 7 days x up to 100 tweets each, one JSON object per line
day1= tweets_to_jsonString(term='AMZN', until='2021-05-04')
day2= tweets_to_jsonString(term='AMZN', until='2021-05-03')
day3= tweets_to_jsonString(term='AMZN', until='2021-05-02')
day4= tweets_to_jsonString(term='AMZN', until='2021-05-01')
day5= tweets_to_jsonString(term='AMZN', until='2021-04-30')
day6= tweets_to_jsonString(term='AMZN', until='2021-04-29')
day7= tweets_to_jsonString(term='AMZN', until='2021-04-28')
jsonString= day1+day2+day3+day4+day5+day6+day7

# The with block closes the file automatically
with open('../data/1.pulledTweets.json', "w") as jsonFile:
    jsonFile.write(jsonString)

loading data round= 1
loading data round= 1
loading data round= 1
loading data round= 1
loading data round= 1
loading data round= 1
loading data round= 1
