# Capstone Project
Diane Kierce  
General Assembly  
DSI Seattle 02


### #covfefe: Finding Meaning in Usage
 
#### Data Source:
On June 13-14, 2017, I used twitterscraper (https://github.com/taspinar/twitterscraper) to download all available tweets that matched the search term "covfefe" through Twitter's search function. The initial data set contained 531,980 tweets. Exploratory data analysis (EDA) on that data set indicated that the tweet text was too messy to be useful, so I changed tactics and focused on the hashtags instead. (See Appendix A for the EDA on the tweet text.)

Because twitterscraper does not return hashtags separately from the text, and does not return the user's follower count or the number of favorites and retweets for each tweet, I then used Twitter's API to mine the available tweets with the hashtag "#covfefe." The main tradeoff was that the API limited me to the nine days of tweets it had available, whereas twitterscraper let me go back to the first usage of "covfefe" on May 31, 2017. Using the API on June 17, I mined 59,345 tweets returned by Twitter's search function for the search term "#covfefe," dated from June 8, 2017 through June 16, 2017.

#### In this notebook:
This notebook contains the code I used to mine the #covfefe tweets using the Twitter API with the python-twitter wrapper.

In [1]:
import datetime
import time

import pandas as pd
import twitter

# Diane's keys:
twitter_keys = {
    'consumer_key':        'redacted',
    'consumer_secret':     'redacted',
    'access_token_key':    'redacted',
    'access_token_secret': 'redacted'
}


# initializing the API

api = twitter.Api(consumer_key = twitter_keys['consumer_key'],
                  consumer_secret = twitter_keys['consumer_secret'],
                  access_token_key = twitter_keys['access_token_key'],
                  access_token_secret = twitter_keys['access_token_secret'],
                  sleep_on_rate_limit=True)
# sleep_on_rate_limit=True pauses the requests until the 15-minute rate-limit window resets,
# so the miner never goes over the rate limit
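
To get a feel for how long a rate-limited run might take, the arithmetic can be sketched as below. This is a rough estimate, not a measurement: it assumes the 180-requests-per-window limit mentioned in this notebook's comments, Twitter's documented maximum of 100 results per search request, and that every request comes back full. The function name `estimate_mining` is hypothetical.

```python
# Assumed limits (not measured in this notebook):
REQUESTS_PER_WINDOW = 180   # search requests allowed per 15-minute window
TWEETS_PER_REQUEST = 100    # maximum results Twitter returns per search request
WINDOW_MINUTES = 15


def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def estimate_mining(total_tweets):
    """Return (requests, windows) needed to page through total_tweets,
    assuming every request returns a full page."""
    requests_needed = ceil_div(total_tweets, TWEETS_PER_REQUEST)
    windows_needed = ceil_div(requests_needed, REQUESTS_PER_WINDOW)
    return requests_needed, windows_needed


requests_needed, windows_needed = estimate_mining(59345)

# Every window except the last ends with a pause until the limit resets.
min_wait_minutes = (windows_needed - 1) * WINDOW_MINUTES

print(requests_needed, windows_needed, min_wait_minutes)
```

Under these assumptions, 59,345 tweets need at least 594 requests spread over 4 windows, which means at least 45 minutes of waiting. That is in line with the `mined_at` values in the previews further down, which run from about 22:45 to 23:45.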

In [2]:
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

In [3]:
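class TweetMiner(object):
    """Pages through Twitter search results and collects selected fields from each tweet."""

    # Twitter's search API returns at most 100 tweets per request,
    # so result_limit values above 100 have no additional effect
    def __init__(self, keys_dict, api, result_limit = 20):

        self.api = api
        self.twitter_keys = keys_dict
        self.result_limit = result_limit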
        
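    # Mines search results one page at a time, walking backwards through tweet IDs.
    # The rate limit is 180 search requests per 15-minute window; with
    # sleep_on_rate_limit=True, a larger max_pages simply takes longer to finish.
    # Note: mine_retweets is accepted but not used, so retweets are always included.
    def mine_search_term(self, term=None, raw_query=None, mine_retweets=False, include_entities=True, max_pages=5):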

        data           =  []
        last_tweet_id  =  False
        page           =  1
        
        while page <= max_pages:
            
            if last_tweet_id:
                statuses   =   self.api.GetSearch(term=term, raw_query=raw_query, count=self.result_limit, 
                                                  max_id=last_tweet_id - 1, include_entities=include_entities)        
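            else:
                statuses   =   self.api.GetSearch(term=term, raw_query=raw_query, count=self.result_limit,
                                                  include_entities=include_entities)

            # Stop once the search returns nothing; further requests would only repeat the empty page
            if not statuses:
                break

            for item in statuses: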

                mined = {
                    'tweet_id':              item.id,
                    'user_screen_name':      item.user.screen_name,
                    'user_handle':           item.user.name,
                    'user_id':               item.user.id,
                    'user_location':         item.user.location,
                    'user_followers_count':  item.user.followers_count,
                    'user_friends_count':    item.user.friends_count,
                    'user_lang':             item.user.lang,
                    'retweet_count':         item.retweet_count,
                    'text':                  item.text,
                    'mined_at':              datetime.datetime.now(),
                    'created_at':            item.created_at,
                    'media':                 item.media,
                    'hashtags':              item.hashtags,
                    'urls':                  item.urls,
                    'favorite_count':        item.favorite_count,
                    'posting_method':        item.source,
                    
                }
                
                last_tweet_id = item.id
                data.append(mined)
                
            page += 1
            
        return data
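
The paging logic in `mine_search_term` can be tried without touching the API. The sketch below is a simplified stand-in, not the class above: `fake_search` and `mine_all` are hypothetical names, and the "tweets" are just integer IDs. It shows the same idea of passing `max_id` as one less than the oldest ID seen so far, so each page picks up where the previous one stopped.

```python
# Hypothetical stand-in for a search endpoint: 25 IDs, newest first (1000 ... 976).
FAKE_TWEET_IDS = list(range(1000, 975, -1))


def fake_search(count, max_id=None):
    """Return up to `count` IDs that are <= max_id, newest first."""
    matching = [i for i in FAKE_TWEET_IDS if max_id is None or i <= max_id]
    return matching[:count]


def mine_all(search, count, max_pages):
    """Collect results page by page until the search runs dry or max_pages is hit."""
    collected = []
    max_id = None
    for _ in range(max_pages):
        page = search(count=count, max_id=max_id)
        if not page:
            break  # no older results left, so stop spending requests
        collected.extend(page)
        max_id = min(page) - 1  # next page starts just below the oldest ID seen
    return collected


ids = mine_all(fake_search, count=10, max_pages=1000)
print(len(ids), len(set(ids)))  # total collected vs. unique IDs
```

Using `min(page)` rather than the last item on the page is a small design choice: it does not rely on the results arriving in strictly descending order.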
    

In [4]:
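# instantiating a TweetMiner
# result_limit is passed to GetSearch as count; Twitter returns at most 100 results
# per request, so a value of 500 is effectively capped at 100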

miner = TweetMiner(twitter_keys, api, result_limit=500)

miner

<__main__.TweetMiner at 0x1097de610>

In [6]:
tweets01 = miner.mine_search_term(term='#covfefe since:2017-05-30 until:2017-06-17', 
                                  raw_query=None, mine_retweets=False, max_pages=1000, 
                                  include_entities=True)

In [16]:
df = pd.DataFrame(tweets01)

df.head()

| | created_at | favorite_count | hashtags | media | mined_at | posting_method | retweet_count | text | tweet_id | urls | user_followers_count | user_friends_count | user_handle | user_id | user_lang | user_location | user_screen_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fri Jun 16 13:13:32 +0000 2017 | 1336 | `[{"text": "trump"}, {"text": "maga"}, {"text":...` | | 2017-06-17 22:45:08.039302 | `<a href="http://twitter.com" rel="nofollow">Tw...` | 899 | AMERICA AGREES! Gohmert: There Is No Collusion... | 875702885572595712 | `[{"expanded_url": "http://waynedupree.com/gohm...` | 192275 | 9517 | 🎙Wayne Dupree | 282695161 | en | USAF Desert Storm/Shield Vet | WayneDupreeShow |
| 1 | Fri Jun 16 16:53:35 +0000 2017 | 389 | `[{"text": "covfefe"}, {"text": "trump"}]` | | 2017-06-17 22:45:08.039313 | `<a href="http://twitter.com" rel="nofollow">Tw...` | 271 | JUST FIRE MUELLER! Deputy AG Just Dropped Warn... | 875758262959960068 | `[{"expanded_url": "http://waynedupree.com/depu...` | 192275 | 9517 | 🎙Wayne Dupree | 282695161 | en | USAF Desert Storm/Shield Vet | WayneDupreeShow |
| 2 | Fri Jun 16 16:25:06 +0000 2017 | 497 | `[{"text": "AmericaFirst"}, {"text": "covfefe"}...` | | 2017-06-17 22:45:08.039318 | `<a href="http://twitter.com" rel="nofollow">Tw...` | 281 | #AmericaFirst - Obama's Anchor Baby Program (D... | 875751095414521858 | `[{"expanded_url": "http://waynedupree.com/trum...` | 192275 | 9517 | 🎙Wayne Dupree | 282695161 | en | USAF Desert Storm/Shield Vet | WayneDupreeShow |
| 3 | Fri Jun 16 23:59:46 +0000 2017 | 0 | `[{"text": "AmericaFirst"}, {"text": "covfefe"}...` | | 2017-06-17 22:45:08.039322 | `<a href="http://twitter.com/download/iphone" r...` | 281 | RT @WayneDupreeShow: #AmericaFirst - Obama's A... | 875865514626646016 | `[{"expanded_url": "http://waynedupree.com/trum...` | 3268 | 4999 | Crystal kemp | 2481854702 | en | Pennsylvania, USA | ckemp1542400 |
| 4 | Fri Jun 16 23:59:17 +0000 2017 | 0 | `[{"text": "CNNisISIS"}, {"text": "AlBaghdadi"}...` | `[{"display_url": "pic.twitter.com/U5Wg4g1Ird",...` | 2017-06-17 22:45:08.039326 | `<a href="http://twitter.com/download/iphone" r...` | 4 | RT @politstrip: 'Caliphate News Network' mourn... | 875865396045389825 | `[]` | 110 | 85 | Elizabeth Wright | 3707893403 | en | | ErwrightWright |


In [17]:
df.shape

(59345, 17)
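
The `created_at` values come back as strings, so checking the date range of the data set means parsing them first. The sketch below is one way to do that with the standard library, written for Python 3 (where `%z` is supported by `strptime`) and assuming an English locale for the day and month abbreviations. The two sample strings are copied from the `df.head()` and `df.tail()` previews in this notebook; they are not necessarily the true newest and oldest tweets in the full data set.

```python
from datetime import datetime

# Format of the created_at strings, e.g. 'Fri Jun 16 13:13:32 +0000 2017'
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


# Sample values taken from the previews (assumed to bracket the data set)
newest = parse_created_at("Fri Jun 16 23:59:46 +0000 2017")
oldest = parse_created_at("Thu Jun 08 16:22:10 +0000 2017")

# Count calendar days inclusively: June 8 through June 16
calendar_days = (newest.date() - oldest.date()).days + 1

print(oldest.date(), newest.date(), calendar_days)
```

For these two samples the inclusive count is nine calendar days, matching the nine-day window described in the Data Source section.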

In [9]:
# Saving to .csv so that I can work with this data set without having to re-mine the tweets
df.to_csv('covfefe_hashtag_api_tweets.csv', sep=',', encoding='utf-8')

In [11]:
tweets01

[{'created_at': u'Fri Jun 16 13:13:32 +0000 2017',
  'favorite_count': 1336,
  'hashtags': [Hashtag(Text=u'trump'),
   Hashtag(Text=u'maga'),
   Hashtag(Text=u'covfefe')],
  'media': None,
  'mined_at': datetime.datetime(2017, 6, 17, 22, 45, 8, 39302),
  'posting_method': u'<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
  'retweet_count': 899,
  'text': u'AMERICA AGREES! Gohmert: There Is No Collusion With Trump And Russia; Drop Special Counsel https://t.co/Z6UaT3cVHm #trump #maga #covfefe',
  'tweet_id': 875702885572595712,
  'urls': [URL(URL=https://t.co/Z6UaT3cVHm, ExpandedURL=http://waynedupree.com/gohmert-drop-special-counsel/)],
  'user_followers_count': 192275,
  'user_friends_count': 9517,
  'user_handle': u'\U0001f399Wayne Dupree',
  'user_id': 282695161,
  'user_lang': u'en',
  'user_location': u'USAF Desert Storm/Shield Vet',
  'user_screen_name': u'WayneDupreeShow'},
 {'created_at': u'Fri Jun 16 16:53:35 +0000 2017',
  'favorite_count': 389,
  'hashtag

In [13]:
# Also saving to .json since the encoding for to_csv has been causing some problems with creating extra rows when I
# read in the .csv version
df.to_json('covfefe_hashtag_tweets_json.json')
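
The notebook does not pin down what caused the extra rows. One plausible cause, offered here only as an assumption, is tweet text that contains line breaks: a CSV writer quotes such a field correctly, but anything that reads the file line by line sees one record as several. The sketch below reproduces that effect with the standard library's `csv` module and made-up rows (the IDs and text are hypothetical, not from the data set).

```python
import csv
import io

# Hypothetical rows; the first text field contains an embedded newline
rows = [
    {"tweet_id": "1", "text": "first line\nsecond line #covfefe"},
    {"tweet_id": "2", "text": "a single-line tweet"},
]

# Write the rows to an in-memory CSV "file"
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["tweet_id", "text"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Naive approach: treat every physical line as a record (minus the header)
naive_record_count = len(csv_text.splitlines()) - 1

# CSV-aware approach: the quoted newline stays inside a single field
parsed = list(csv.DictReader(io.StringIO(csv_text, newline="")))

print(naive_record_count, len(parsed))
```

If this is what happened, comparing the row count after reading the file back against `df.shape` is a quick way to confirm that a given reader handles the quoted newlines.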

In [18]:
col_names = df.columns
col_names

Index([u'created_at', u'favorite_count', u'hashtags', u'media', u'mined_at',
       u'posting_method', u'retweet_count', u'text', u'tweet_id', u'urls',
       u'user_followers_count', u'user_friends_count', u'user_handle',
       u'user_id', u'user_lang', u'user_location', u'user_screen_name'],
      dtype='object')

In [20]:
df.tail()

| | created_at | favorite_count | hashtags | media | mined_at | posting_method | retweet_count | text | tweet_id | urls | user_followers_count | user_friends_count | user_handle | user_id | user_lang | user_location | user_screen_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59340 | Thu Jun 08 16:31:03 +0000 2017 | 0 | `[{"text": "covfefe"}]` | | 2017-06-17 23:45:37.287394 | `<a href="http://www.hootsuite.com" rel="nofoll...` | 0 | Comey: "I'm not gonna sit here and try to inte... | 872853491018072064 | `[]` | 653 | 1643 | Melissa Green | 14062950 | en | Oakland, CA | ProfCritic |
| 59341 | Thu Jun 08 16:30:18 +0000 2017 | 0 | `[{"text": "ComeyDay"}, {"text": "Covfefe"}, {"...` | | 2017-06-17 23:45:37.287399 | `<a href="http://twitter.com" rel="nofollow">Tw...` | 0 | @KellyannePolls @POTUS #ComeyDay You are stu... | 872853302492397568 | `[]` | 437 | 612 | Ruthann L. Chapin | 700981141390671872 | en | Ohio, USA | RetroVintage4Me |
| 59342 | Thu Jun 08 16:29:07 +0000 2017 | 0 | `[{"text": "Covfefe"}, {"text": "ComeyDay"}]` | | 2017-06-17 23:45:37.287404 | `<a href="http://twitter.com/#!/download/ipad" ...` | 0 | If the #Covfefe shits wear it. #ComeyDay https... | 872853003442819073 | `[{"expanded_url": "https://twitter.com/ac360/s...` | 526 | 546 | corliss jean cogan | 1053485478 | en | The banks of Medicine Creek | dogmomdogma1 |
| 59343 | Thu Jun 08 16:27:44 +0000 2017 | 0 | `[{"text": "Leakers"}, {"text": "Loser"}, {"tex...` | | 2017-06-17 23:45:37.287408 | `<a href="http://twitter.com" rel="nofollow">Tw...` | 0 | #Leakers #Loser #MAGA #Covfefe #ComeyDay https... | 872852656787771392 | `[{"expanded_url": "https://twitter.com/sassyca...` | 5360 | 5197 | 🇺🇸 ggg217 | 579254908 | en | Pennsylvania | ggg217 |
| 59344 | Thu Jun 08 16:22:10 +0000 2017 | 0 | `[{"text": "trustthisguy"}, {"text": "covfefe"}...` | | 2017-06-17 23:45:37.287413 | `<a href="http://twitter.com/download/android" ...` | 0 | @AskLyft I trust your statement as much as I #... | 872851254019579906 | `[{"expanded_url": "https://twitter.com/i/web/s...` | 1 | 2 | Jesu Berkeley | 870563485536616448 | en | | yourmamanfav |
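
Since the project focuses on hashtags, a natural next step is to pull the hashtag text out of each row. The sketch below assumes the `hashtags` column is available as JSON-encoded strings in the form shown in the previews (in memory, right after mining, the values are `Hashtag` objects instead). It also lowercases the tags so that "Covfefe" and "covfefe" count as the same hashtag, which is a choice made for this example rather than something the notebook specifies. `extract_hashtags` is a hypothetical helper name.

```python
import json
from collections import Counter

# Untruncated hashtag cells copied from the previews
hashtag_cells = [
    '[{"text": "covfefe"}, {"text": "trump"}]',
    '[{"text": "covfefe"}]',
    '[{"text": "Covfefe"}, {"text": "ComeyDay"}]',
]


def extract_hashtags(cell):
    """Return the lowercase hashtag strings from one JSON-encoded cell."""
    return [entry["text"].lower() for entry in json.loads(cell)]


# Tally hashtags across all cells
counts = Counter()
for cell in hashtag_cells:
    counts.update(extract_hashtags(cell))

print(counts.most_common(3))
```

Because every mined tweet matched "#covfefe", that tag dominates any such tally; the co-occurring hashtags are the ones that carry information about usage.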
