# Get Tweets

This code is heavily modified from a tutorial by https://github.com/alod83: https://github.com/alod83/data-science/tree/master/DataCollection/Twitter

This script extracts all tweets with the hashtag #HurricaneIan from the last four days and saves them to a `.csv.gz` file.

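First, I import the configuration file, `config.py`, which is located in the same directory as this script and defines the Twitter API credentials used below (`api_key`, `api_secret`, `access_token` and `access_token_secret`).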

```python
from config import *
import tweepy
import datetime
```

ModuleNotFoundError: No module named 'config'

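Next, I set up OAuth 1.0a user authentication and create an `API` object. With `wait_on_rate_limit=True`, tweepy sleeps until the rate limit resets instead of raising an error, and the retry settings make it retry a request up to 10 times, waiting 5 seconds between attempts, when Twitter responds with HTTP 503 (Service Unavailable).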

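```python
# OAuth 1.0a user context, using the credentials imported from config.py
auth = tweepy.OAuth1UserHandler(api_key, api_secret, access_token, access_token_secret)

# Sleep through rate limits; retry HTTP 503 responses up to 10 times, 5 s apart
api = tweepy.API(auth, wait_on_rate_limit=True, retry_count=10, retry_delay=5,
                 retry_errors={503})
```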
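To make the retry settings concrete, here is a simplified model of a fixed-delay retry policy using the same parameter values. It is not tweepy's implementation: `request_with_retries` is a hypothetical helper, `send` stands in for one HTTP request and returns its status code, and `sleep` is injectable so the example runs instantly.

```python
import time
from typing import Callable, Iterable, Tuple


def request_with_retries(send: Callable[[], int],
                         retry_count: int = 10,
                         retry_delay: float = 5,
                         retry_errors: Iterable[int] = (503,),
                         sleep: Callable[[float], None] = time.sleep) -> Tuple[int, int]:
    """Call send() until it succeeds, fails with a non-retryable status,
    or has been retried retry_count times. Returns (status, attempts)."""
    retryable = set(retry_errors)
    attempts = 0
    while True:
        status = send()
        attempts += 1
        succeeded = 200 <= status < 300
        out_of_retries = attempts > retry_count  # first attempt + retry_count retries
        if succeeded or status not in retryable or out_of_retries:
            return status, attempts
        sleep(retry_delay)  # fixed wait between attempts


# Usage: two 503 responses, then a success.
responses = iter([503, 503, 200])
waits = []
print(request_with_retries(lambda: next(responses), sleep=waits.append))  # (200, 3)
print(waits)  # [5, 5]
```

A request that fails twice with 503 and then succeeds is sent three times with two 5-second waits in between; a status such as 404 is not in `retry_errors`, so it is returned after the first attempt.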

Next, I set up the dates: today, yesterday, and the date four days ago, which marks the start of the search window.

```python
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
last_four = today - datetime.timedelta(days=4)
today, yesterday, last_four
```

(datetime.date(2022, 11, 13),
 datetime.date(2022, 11, 12),
 datetime.date(2022, 11, 9))
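Using the dates printed above, a small helper shows the query string they produce for the `since:`/`until:` search operators used in the next step, and which calendar days it should cover. The helper name `search_window` is hypothetical, and reading `since:` as inclusive and `until:` as exclusive is an assumption, though it is consistent with the newest collected tweet being timestamped 2022-11-12.

```python
import datetime


def search_window(today, days=4, hashtag="#HurricaneIan"):
    """Return the search query and the calendar days it is expected to cover."""
    since = today - datetime.timedelta(days=days)
    query = f"{hashtag} since:{since} until:{today}"
    # Assumption: since: is inclusive and until: is exclusive, so `today`
    # itself is not part of the window.
    covered = [since + datetime.timedelta(days=i) for i in range(days)]
    return query, covered


query, covered = search_window(datetime.date(2022, 11, 13))
print(query)                      # #HurricaneIan since:2022-11-09 until:2022-11-13
print([str(d) for d in covered])  # ['2022-11-09', '2022-11-10', '2022-11-11', '2022-11-12']
```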

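Next, I create a `tweepy.Cursor` over `api.search_tweets`. The query `q` combines the hashtag with the `since:` and `until:` date operators; `tweet_mode='extended'` returns the full, untruncated text of each tweet; and `lang='en'` keeps only English tweets. Despite its name, `tweets_list` is not a list: `.items()` returns a lazy iterator that requests pages of results from the API only as the loop below consumes them.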

```python
# Lazy iterator over every matching tweet; pages are fetched on demand
tweets_list = tweepy.Cursor(api.search_tweets,
                            q=f"#HurricaneIan since:{last_four} until:{today}",
                            tweet_mode='extended', lang='en').items()
```
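Because `.items()` is lazy, nothing is downloaded until the loop asks for the next tweet, and with no limit it keeps going until the search is exhausted. The sketch below mimics that behaviour with a plain generator; `paginate` and `fetch_page` are hypothetical stand-ins, not tweepy code, and the 100 items per page are made up. tweepy's `items()` also accepts a maximum number of items (for example `.items(1000)`), which plays the same role as `islice` here.

```python
from itertools import islice


def paginate(fetch_page, max_pages=None):
    """Yield items one by one, calling fetch_page(n) only when page n is needed.

    fetch_page is a hypothetical callable returning a list of items for
    page n; an empty list means there are no more results.
    """
    page_number = 1
    while max_pages is None or page_number <= max_pages:
        page = fetch_page(page_number)
        if not page:
            return
        yield from page
        page_number += 1


requested = []


def fetch_page(n):
    requested.append(n)  # record every simulated API call
    return [f"tweet {n}-{i}" for i in range(100)] if n <= 50 else []


items = paginate(fetch_page)          # no page requested yet
first_250 = list(islice(items, 250))  # requests pages 1, 2 and 3 only
print(len(first_250), requested)      # 250 [1, 2, 3]
```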

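Then I loop over `tweets_list` and, for each tweet, extract the full text, the author's screen name and profile description, the favourite and retweet counts, the creation date, the user being replied to (if any), whether the tweet has attached media, its hashtags, URLs and user mentions, and whether it is a quote tweet. Each tweet becomes a dictionary appended to a list called `output`. If tweepy raises an error, the loop prints it, waits with a linear backoff (60 s, then 120 s, and so on), and tries again.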
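The same per-tweet extraction can be written as a standalone function over a tweet's raw v1.1 JSON (what `tweet._json` holds), which makes it easy to test without calling the API. This is a sketch: the function name `extract_fields` and the sample tweet are hypothetical, and unlike tweepy's `created_at` attribute, which is already a `datetime`, the raw JSON stores the creation date as a string, so it is parsed here.

```python
import datetime


def extract_fields(tweet_json):
    """Flatten one tweet's raw JSON into the row format stored in `output`."""
    user = tweet_json["user"]
    entities = tweet_json.get("entities", {})
    return {
        "text": tweet_json["full_text"],
        "screen_name": user["screen_name"],
        "user_description": user["description"],
        "favourite_count": tweet_json["favorite_count"],
        "retweet_count": tweet_json["retweet_count"],
        # Raw JSON dates look like "Sat Nov 12 23:53:59 +0000 2022"
        "created_at": datetime.datetime.strptime(
            tweet_json["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        "replying_to": tweet_json.get("in_reply_to_screen_name"),
        "media": bool(entities.get("media")),  # "media" key exists only when media is attached
        "hashtags": entities.get("hashtags", []),
        "urls": entities.get("urls", []),
        "user_mentions": entities.get("user_mentions", []),
        "is_quote": tweet_json.get("is_quote_status", False),
    }


# Hypothetical, trimmed-down tweet for illustration
sample = {
    "full_text": "Checking in after #HurricaneIan",
    "user": {"screen_name": "example_user", "description": "Example account"},
    "favorite_count": 3,
    "retweet_count": 1,
    "created_at": "Sat Nov 12 23:53:59 +0000 2022",
    "in_reply_to_screen_name": None,
    "entities": {"hashtags": [{"text": "HurricaneIan", "indices": [18, 31]}],
                 "urls": [], "user_mentions": []},
    "is_quote_status": False,
}
row = extract_fields(sample)
print(row["media"], row["created_at"])  # False 2022-11-12 23:53:59+00:00
```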

```python
from time import sleep  # needed by the backoff below; missing in the original

output = []  # created once, outside the retry loop, so a retry keeps tweets already collected
backoff_counter = 1

while True:
    try:
        for tweet in tweets_list:
            text = tweet._json['full_text']
            screen_name = tweet._json['user']['screen_name']
            user_description = tweet._json['user']['description']
            favourite_count = tweet.favorite_count
            retweet_count = tweet.retweet_count
            created_at = tweet.created_at
            replying_to = tweet.in_reply_to_screen_name
            media = tweet.entities.get('media', [])  # only present when media is attached
            hashtags = tweet.entities['hashtags']
            urls = tweet.entities['urls']
            user_mentions = tweet.entities['user_mentions']
            is_quote = tweet.is_quote_status

            line = {'text': text,
                    'screen_name': screen_name,
                    'user_description': user_description,
                    'favourite_count': favourite_count,
                    'retweet_count': retweet_count,
                    'created_at': created_at,
                    'replying_to': replying_to,
                    'media': bool(media),  # True if the tweet has any attached media
                    'hashtags': hashtags,
                    'urls': urls,
                    'user_mentions': user_mentions,
                    'is_quote': is_quote}
            output.append(line)
        break  # the cursor is exhausted: every page has been read

    except tweepy.TweepyException as e:
        # Wait 60 s, then 120 s, ... and try again with the same cursor
        print(e)
        sleep(60 * backoff_counter)
        backoff_counter += 1
```

KeyboardInterrupt: 

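ERROR NOTES: As written, this cell does not finish cleanly; in the run above it was stopped by hand, hence the `KeyboardInterrupt`. A likely reason is that `.items()` has no limit, so the cursor keeps paging through results (and, with `wait_on_rate_limit=True`, sleeping through each rate-limit window) for a long time. The good news is that the collection itself works: every tweet processed before the interruption is already stored in `output`, which can then be saved as a CSV for later.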
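One way to make the collection stop on its own while keeping partial results is to cap the number of tweets, bound the retries, and catch `KeyboardInterrupt` so that a manual stop returns what was collected instead of raising. The sketch below is generic and hypothetical: `collect`, `TransientError` (standing in for `tweepy.TweepyException`) and `FlakyIterator` (standing in for the cursor) are illustrative names, `sleep` is injectable so the example runs instantly, and it assumes, as the loop above does, that the iterator can be resumed after an error. It keeps the notebook's linear backoff of 60 s, 120 s, and so on.

```python
import time


class TransientError(Exception):
    """Stand-in for tweepy.TweepyException in this sketch."""


def collect(items, extract, limit=None, max_retries=5, base_delay=60.0,
            sleep=time.sleep):
    """Pull items from an iterator, retrying transient errors with linear backoff.

    Stops when the iterator is exhausted, when `limit` items have been
    collected, after `max_retries` failures, or on Ctrl+C. In every case the
    items collected so far are returned.
    """
    output = []
    failures = 0
    try:
        while limit is None or len(output) < limit:
            try:
                item = next(items)
            except StopIteration:
                break  # no more results
            except TransientError as exc:
                failures += 1
                if failures > max_retries:
                    print(f"Giving up after {max_retries} retries: {exc}")
                    break
                sleep(base_delay * failures)  # 60 s, 120 s, ...
                continue
            output.append(extract(item))
    except KeyboardInterrupt:
        print(f"Stopped manually; keeping {len(output)} items.")
    return output


class FlakyIterator:
    """Hypothetical iterator that raises TransientError on chosen calls."""

    def __init__(self, data, fail_on_calls):
        self._data = list(data)
        self._index = 0
        self._calls = 0
        self._fail_on_calls = set(fail_on_calls)

    def __iter__(self):
        return self

    def __next__(self):
        self._calls += 1
        if self._calls in self._fail_on_calls:
            raise TransientError("503 Service Unavailable")
        if self._index >= len(self._data):
            raise StopIteration
        value = self._data[self._index]
        self._index += 1
        return value


waits = []
rows = collect(FlakyIterator(range(5), fail_on_calls={2}),
               extract=lambda x: x * 10, sleep=waits.append)
print(rows, waits)  # [0, 10, 20, 30, 40] [60.0]
```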

Next, I inspect `output`, then convert it to a pandas `DataFrame` and save the results.

```python
output
```

[{'text': 'Food Truck Park Madness in South West Florida https://t.co/DZBJp2JoBn\n.\n.\n.\n#keywest #keywestflorida #florida #hurricaneian #swflstrong #fmbstrong #sanibelstrong #floridakeys #conchrepublic #travel #travelflorida #staycation',
  'screen_name': 'KeyWestExpress',
  'user_description': '#KeyWestExpress Ferry sails from #FortMyersBeach year round and #MarcoIsland seasonally. The fastest and most fun way to visit #KeyWest!',
  'favourite_count': 0,
  'retweet_count': 0,
  'created_at': datetime.datetime(2022, 11, 12, 23, 53, 59, tzinfo=datetime.timezone.utc),
  'replying_to': None,
  'media': False,
  'hashtags': [{'text': 'keywest', 'indices': [76, 84]},
   {'text': 'keywestflorida', 'indices': [85, 100]},
   {'text': 'florida', 'indices': [101, 109]},
   {'text': 'hurricaneian', 'indices': [110, 123]},
   {'text': 'swflstrong', 'indices': [124, 135]},
   {'text': 'fmbstrong', 'indices': [136, 146]},
   {'text': 'sanibelstrong', 'indices': [147, 161]},
   {'text': 'floridake

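I store the raw results as gzip-compressed CSV (`.csv.gz`), since there is a lot of data to sort through later. `mode='a'` appends each new batch to the existing file, and `header=False` skips the header row so that repeated appends do not insert it again.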

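```python
import pandas as pd

df = pd.DataFrame(output)
# Append this batch to the compressed file without a header row.
# The default index=True also writes the row index, which reappears as an
# "Unnamed: 0" column when the file is read back with pd.read_csv.
df.to_csv('output_Oct_12_pm.csv.gz', mode='a', header=False, compression='gzip')
```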
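The append pattern can also be sketched with the standard library alone. Unlike the pandas call above, which never writes a header because `header=False` is passed on every call, the hypothetical helper `append_rows_gz` writes the header only when the file does not exist yet, so the first batch still gets column names. Opening a gzip file in append mode adds a new compressed member at the end, and Python's `gzip` module reads concatenated members back as one stream.

```python
import csv
import gzip
import os
import tempfile


def append_rows_gz(path, rows, fieldnames):
    """Append dict rows to a gzip-compressed CSV, writing the header only once."""
    write_header = not os.path.exists(path)
    # "at" = append in text mode: each call adds a new gzip member to the file
    with gzip.open(path, "at", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def read_rows_gz(path):
    """Read every appended batch back as a single list of dicts."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


fields = ["screen_name", "retweet_count"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets.csv.gz")
    append_rows_gz(path, [{"screen_name": "a", "retweet_count": 1}], fields)   # first batch
    append_rows_gz(path, [{"screen_name": "b", "retweet_count": 0},
                          {"screen_name": "c", "retweet_count": 5}], fields)  # second batch
    rows = read_rows_gz(path)
print([r["screen_name"] for r in rows])  # ['a', 'b', 'c']
```

Note that `csv.DictReader` returns every value as a string, so counts read back this way need converting with `int()`.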

Checking the shape of the DataFrame I just saved. This counts only the current batch, not everything appended to the file so far.

```python
df.shape
```

(628, 12)

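Looking at the last 10 rows of `df`. Because the search returns the newest tweets first, these are the oldest tweets collected in this batch, not the most recent.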

```python
df.tail(10)
```

| | text | screen_name | user_description | favourite_count | retweet_count | created_at | replying_to | media | hashtags | urls | user_mentions | is_quote |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 618 | $IVDN: World’s Most Energy Efficient House Wra... | ZacSmithNEWS | Everything I post here is for information purp... | 2 | 1 | 2022-11-10 19:38:05+00:00 | | False | [{'text': 'HouseWrap', 'indices': [134, 144]},... | [{'url': 'https://t.co/Mg8eLc6Zor', 'expanded_... | [] | True |
| 619 | As #FEMA continues supporting the State of Flo... | TwittnRob | 🤔 What if? #BeReady 👉 https://t.co/wGJ9OmAHSg ... | 0 | 0 | 2022-11-10 19:35:07+00:00 | | True | [{'text': 'FEMA', 'indices': [3, 8]}, {'text':... | [] | [] | False |
| 620 | RT @StevePetyerak: Patio of Pirates Cove buckl... | mary0611bb | Definitely NOT a Trumper but I am a Republican... | 0 | 116 | 2022-11-10 19:34:43+00:00 | | False | [{'text': 'Nicole', 'indices': [84, 91]}] | [] | [{'screen_name': 'StevePetyerak', 'name': 'Ste... | False |
| 621 | RT @BillMooreORL: #Hurricane #History repeatin... | SE_CaffreySmith | fiction writer, Bardian, reader, believer, adm... | 0 | 81 | 2022-11-10 19:31:50+00:00 | | False | [{'text': 'Hurricane', 'indices': [18, 28]}, {... | [] | [{'screen_name': 'BillMooreORL', 'name': 'Bill... | False |
| 622 | #ThankfulThursday 🍁 🍂\nWell, AmeriLife of Sara... | AmeriLifeofSB | Providing Life, Health, and Annuity Solutions ... | 1 | 0 | 2022-11-10 19:25:00+00:00 | | True | [{'text': 'ThankfulThursday', 'indices': [0, 1... | [] | [] | False |
| 623 | MOMENTS AGO: FPL: “All of our customers who we... | CBS12 | Covering Palm Beach, Martin, St. Lucie, Indian... | 3 | 0 | 2022-11-10 19:24:43+00:00 | | True | [{'text': 'HurricaneIan', 'indices': [62, 75]}... | [{'url': 'https://t.co/3H6XOEWx3i', 'expanded_... | [] | False |
| 624 | RT @BillMooreORL: #Hurricane #History repeatin... | MizHernandez01 | Es facil hallar amigos, muy difícil conservar,... | 0 | 81 | 2022-11-10 19:21:04+00:00 | | False | [{'text': 'Hurricane', 'indices': [18, 28]}, {... | [] | [{'screen_name': 'BillMooreORL', 'name': 'Bill... | False |
| 625 | RT @WxAndMovies: That’s actually scary how bas... | GibbyDierickx | Amateur meteorologist \| High school golfer \| 1... | 0 | 2 | 2022-11-10 19:20:37+00:00 | | False | [{'text': 'HurricanNicole', 'indices': [114, 1... | [] | [{'screen_name': 'WxAndMovies', 'name': 'Every... | True |
| 626 | Hello Everyone,\n1/20) Late breaking #news....... | Find_and_Bind1 | Amateur journalist, photographer, #bondage ent... | 0 | 0 | 2022-11-10 19:19:44+00:00 | | False | [{'text': 'news', 'indices': [36, 41]}, {'text... | [{'url': 'https://t.co/DHnQbmTvAI', 'expanded_... | [] | False |
| 627 | RT @StevePetyerak: Patio of Pirates Cove buckl... | spike3401 | 31 years old, proud father, proudly taken, ama... | 0 | 116 | 2022-11-10 19:18:47+00:00 | | False | [{'text': 'Nicole', 'indices': [84, 91]}] | [] | [{'screen_name': 'StevePetyerak', 'name': 'Ste... | False |
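To see the newest tweets instead, the rows can be ordered by `created_at`; in pandas, `df.sort_values('created_at', ascending=False).head(10)` does this. The sketch below does the same directly on the `output` list of dictionaries using only the standard library; `most_recent` is a hypothetical helper and the three sample rows are made up.

```python
import datetime
import heapq


def most_recent(rows, n=10):
    """Return the n rows with the latest created_at, newest first."""
    return heapq.nlargest(n, rows, key=lambda row: row["created_at"])


utc = datetime.timezone.utc
sample_rows = [
    {"screen_name": "older", "created_at": datetime.datetime(2022, 11, 10, 19, 18, tzinfo=utc)},
    {"screen_name": "newest", "created_at": datetime.datetime(2022, 11, 12, 23, 53, tzinfo=utc)},
    {"screen_name": "middle", "created_at": datetime.datetime(2022, 11, 11, 8, 0, tzinfo=utc)},
]
print([r["screen_name"] for r in most_recent(sample_rows, n=2)])  # ['newest', 'middle']
```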
