# Get Tweets

## Code references
- Setting up keys, basic cursor logic, and populating tweet data into a pandas dataframe: https://www.earthdatascience.org/courses/use-data-open-source-python/intro-to-apis/twitter-data-in-python/
- Retweet and favourite counts, and a better dataframe creator using Tweepy: https://towardsdatascience.com/how-to-build-a-dataset-from-twitter-using-python-tweepy-861bdbc16fa5
- Getting user and location: https://stackoverflow.com/questions/50366489/how-to-get-twitter-users-screen-name-or-userid-from-a-specific-geolocations
- Cleaning tweet text and checking whether a tweet is a retweet: https://stackoverflow.com/questions/50052330/tweepy-check-if-a-tweet-is-a-retweet
- Geo-coordinates: https://stackoverflow.com/questions/46044445/not-able-to-scrape-geo-coordinate-with-tweets-lat-lon
- Avoiding the Twitter API rate limit: https://stackoverflow.com/questions/21308762/avoid-twitter-api-limitation-with-tweepy
- Keeping authentication details secret: https://www.digitalocean.com/community/tutorials/how-to-create-a-twitterbot-with-python-3-and-the-tweepy-library

In [82]:
import os
import tweepy
import datetime
import pandas as pd

## 1. Get Twitter Data
There are two ways to load the Twitter data:
- 1.1. Use the Tweepy API (this can take hours)
- 1.2. Load the previously saved Twitter data
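
One way to switch between the two options is to check whether the saved file already exists. This is a minimal sketch rather than part of the original notebook: `load_or_fetch` and the `fetch` callable are hypothetical names, and the CSV path is whatever you pass in.

```python
import os
import pandas as pd


def load_or_fetch(csv_path, fetch):
    """Return tweets from csv_path if it exists, otherwise call fetch().

    `fetch` is a hypothetical zero-argument callable that returns a
    DataFrame (for example, a wrapper around the Tweepy steps in 1.1).
    """
    if os.path.exists(csv_path):
        # Option 1.2: reuse the previously saved data.
        return pd.read_csv(csv_path)
    # Option 1.1: query Twitter, then cache the result for next time.
    df = fetch()
    folder = os.path.dirname(csv_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    df.to_csv(csv_path, index=False)
    return df
```

With this shape, the slow Tweepy download only runs when no saved copy is available.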

### 1.1. Load data from Twitter 
#### 1.1.1 Twitter credentials file
I don't want to make my Twitter credentials public, so they are loaded from a credentials file that is not uploaded to GitHub.

To replicate this code, create a 'credentials.py' file with the following lines (using your own credential details):

    consumer_key = 'your_consumer_key'
    consumer_secret = 'your_consumer_secret'
    access_token = 'your_access_token'
    access_token_secret = 'your_access_token_secret'
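
An alternative to a `credentials.py` file is to keep the same four values in environment variables. The sketch below is only an illustration of that idea: the function name `credentials_from_env` and the `TWITTER_*` variable names are assumptions made for this example and are not defined by Tweepy or by this notebook.

```python
import os


def credentials_from_env(env=os.environ):
    """Read the four Twitter credentials from environment variables.

    The variable names below are an assumption for this sketch.
    """
    names = {
        'consumer_key': 'TWITTER_CONSUMER_KEY',
        'consumer_secret': 'TWITTER_CONSUMER_SECRET',
        'access_token': 'TWITTER_ACCESS_TOKEN',
        'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
    }
    # Fail early, naming every variable that is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: env[var] for key, var in names.items()}
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the names imported from `credentials.py`.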

In [None]:
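# Import only the four values defined in credentials.py
from credentials import (consumer_key, consumer_secret,
                         access_token, access_token_secret)

# Authenticate and build the API client; wait (and print a notice) when a rate limit is hit.
# This notebook uses the Tweepy 3.x interface (wait_on_rate_limit_notify was removed in Tweepy 4).
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)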

#### 1.1.2 Set date parameters
Using the search terms, I want to get all tweets from a start date of July 1, 2019 up to today's date.
- The start date is just before the day the Mayor made his speech, and today's date is the end point so that I can collect as many tweets as possible
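
The requested window can also be checked locally against each tweet's timestamp. This is a small sketch, assuming `created_at` values are `datetime` objects; `in_date_window` is a hypothetical helper, not part of the notebook. Bear in mind that the standard search endpoint is documented as covering only about the last seven days, so the tweets actually returned may span a much shorter period than the one requested.

```python
import datetime


def in_date_window(created_at, date_from, date_to=None):
    """Return True if a tweet timestamp falls within [date_from, date_to]."""
    if date_to is None:
        # Default to today, matching the open-ended window described above.
        date_to = datetime.date.today()
    return date_from <= created_at.date() <= date_to


date_from = datetime.date(2019, 7, 1)
window_days = (datetime.date.today() - date_from).days
print(window_days, 'days in the requested window')
```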

In [None]:
date_from = datetime.date(2019, 7, 1)
date_from

#### 1.1.3 Get tweets using a cursor
- First, define a function that loads tweets
- Create search terms to query Twitter; the results for each term are returned as a list of dictionaries
- Combine all returned results and use them to create a pandas dataframe

##### 1.1.3.1 Define get_tweets

In [None]:
import time

def get_tweets(search_words, my_api):
    """Return a list of dictionaries, one per tweet matching search_words.

    Relies on the module-level date_from defined in 1.1.2.
    """
    tic = time.perf_counter()
    # The cursor pages through the search results for us
    tweets = tweepy.Cursor(my_api.search,
                           q=search_words,
                           lang="en",
                           since=date_from).items()

    output = []
    for tweet in tweets:
        tweet_id = tweet.id
        text = tweet.text
        tweet_date = tweet.created_at
        user_id_str = tweet.user.id_str
        screen_name = tweet.user.screen_name
        user_name = tweet.user.name
        # Look up the author's profile to read their location.
        # This is one extra API request per tweet; it uses the my_api
        # argument rather than the global api object.
        user = my_api.get_user(user_id_str)
        user_location = user.location
        user_coordinates = tweet.coordinates
        favourite_count = tweet.favorite_count
        retweet_count = tweet.retweet_count

        line = {'tweet_id': tweet_id,
                'tweet_date': tweet_date,
                'tweeter_id': user_id_str,
                'tweeter_user_name': user_name,
                'tweeter_screen_name': screen_name,
                'tweeter_location': user_location,
                'tweeter_coordinates': user_coordinates,
                'message_text': text,
                'favourite_count': favourite_count,
                'retweet_count': retweet_count}
        output.append(line)

    toc = time.perf_counter()
    time_taken = toc - tic

    print('Time taken to process search term "{}": {:.2f} seconds'.format(search_words, time_taken))

    return output
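
The per-tweet `get_user` call above costs one extra request for every tweet, which is likely a large part of why the download is slow. A possible alternative, sketched here under the assumption that the `user` object embedded in each search result already exposes `location` (as it does `id_str`, `screen_name`, and `name`), is to build each row from the tweet alone. `tweet_to_row` and `fake_tweet` are hypothetical names, and the stand-in values are made up for illustration.

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Flatten one tweet-like object into the dictionary used for a dataframe row."""
    user = tweet.user
    return {'tweet_id': tweet.id,
            'tweet_date': tweet.created_at,
            'tweeter_id': user.id_str,
            'tweeter_user_name': user.name,
            'tweeter_screen_name': user.screen_name,
            # Assumption: the embedded user object carries the location,
            # so no extra get_user request is made here.
            'tweeter_location': user.location,
            'tweeter_coordinates': tweet.coordinates,
            'message_text': tweet.text,
            'favourite_count': tweet.favorite_count,
            'retweet_count': tweet.retweet_count}


# Stand-in object for illustration only; no request is sent to Twitter.
fake_tweet = SimpleNamespace(
    id=1, text='example text', created_at='2021-07-21 12:00:00',
    coordinates=None, favorite_count=0, retweet_count=1,
    user=SimpleNamespace(id_str='42', name='Example User',
                         screen_name='example_user', location='London'))
row = tweet_to_row(fake_tweet)
print(row)
```

Keeping the row-building logic in its own function also makes it possible to test it without network access.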

##### 1.1.3.2 Create a list of search terms and get tweets for each term

In [None]:
search_terms = ["London AND knife AND crime",
                "Khan AND knife AND crime",
                "London AND violent AND crime",
                "youth AND violent AND crime",
                "youth AND knife AND crime",
                "london AND youthcrime",
                "#knifecrime AND #khan",
                "#knifecrime AND #london",
                "#violence AND #khan",
                "#london AND #unsafe",
                "sadiq AND khan"]

all_tweets = []

for search_term in search_terms:
    current_tweets = get_tweets(search_term, api)
    all_tweets.append(current_tweets)


#### 1.1.4 Create the all_tweets_df dataframe

In [None]:
all_tweets_df = pd.DataFrame(columns=['tweet_id', 
                                      'tweet_date', 
                                      'tweeter_id', 
                                      'tweeter_user_name', 
                                      'tweeter_screen_name', 
                                      'tweeter_location',
                                      'tweeter_coordinates',
                                      'message_text',
                                      'favourite_count',
                                      'retweet_count'])

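# Build one dataframe per search term and concatenate them in a single step
# (DataFrame.append was removed in pandas 2.0)
all_tweets_df = pd.concat([all_tweets_df] + [pd.DataFrame(these_tweets) for these_tweets in all_tweets],
                          ignore_index=True)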
    
print(all_tweets_df.shape)
all_tweets_df.head()

all_tweets_df.to_csv('./DataSources/TwitterData/raw_tweets.csv', index=False)
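
Because several of the search terms overlap, the same tweet may be returned by more than one query. The sketch below shows one way to combine the per-term results and keep each `tweet_id` only once. It is an illustration under that assumption, not necessarily what the rest of the analysis does; `combine_and_deduplicate` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def combine_and_deduplicate(tweet_batches):
    """Combine per-search-term lists of row dicts and drop repeated tweets."""
    # Skip empty batches so they do not contribute empty frames.
    frames = [pd.DataFrame(batch) for batch in tweet_batches if batch]
    if not frames:
        return pd.DataFrame()
    combined = pd.concat(frames, ignore_index=True)
    # Keep the first occurrence of each tweet_id.
    return combined.drop_duplicates(subset='tweet_id').reset_index(drop=True)


# Made-up rows: tweet 2 is returned by two different search terms.
batches = [[{'tweet_id': 1, 'message_text': 'a'}, {'tweet_id': 2, 'message_text': 'b'}],
           [{'tweet_id': 2, 'message_text': 'b'}, {'tweet_id': 3, 'message_text': 'c'}],
           []]
deduplicated = combine_and_deduplicate(batches)
print(deduplicated.shape)
```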

### 1.2. Load previously saved Twitter data

In [83]:
all_tweets_df_new = pd.read_csv("./DataSources/TwitterData/raw_tweets.csv")
print(all_tweets_df_new.shape)
all_tweets_df_new.head()

(12893, 10)


| | tweet_id | tweet_date | tweeter_id | tweeter_user_name | tweeter_screen_name | tweeter_location | tweeter_coordinates | message_text | favourite_count | retweet_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1417823998335983616 | 2021-07-21 12:29:17 | 726740035 | Cal Parrish | CalParrish | Stoney Stanton | | RT @standardnews: Youth services funding has f... | 0 | 1 |
| 1 | 1417823360000708609 | 2021-07-21 12:26:45 | 950473173077823489 | Constantin St Helen’ll do🇬🇧🏴󠁧󠁢󠁥󠁮󠁧󠁿🇾🇪 | ConstantinStHe1 | Amongst the voting masses objecting to valuati... | | RT @KhanMustGo: 🚨🚨BREAKING: Primary school boy... | 0 | 10 |
| 2 | 1417823086225895425 | 2021-07-21 12:25:40 | 38142380 | Evening Standard | standardnews | London | | Youth services funding has fallen 70 per cent ... | 1 | 1 |
| 3 | 1417821512862683142 | 2021-07-21 12:19:25 | 1871162396 | David Cohen | cohenstandard | London | | Our @standardnews Special Investigation reveal... | 0 | 0 |
| 4 | 1417821022846439427 | 2021-07-21 12:17:28 | 1871162396 | David Cohen | cohenstandard | London | | A 7-year-old took knife into school. When knif... | 0 | 0 |
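
When reloading the saved file, it can help to state the column types explicitly, so that the long tweet and user IDs are kept as text and `tweet_date` is parsed as a timestamp. The following is a sketch using an in-memory stand-in for `raw_tweets.csv` with a subset of the columns and two rows taken from the output above; the choice of string IDs is a suggestion, not something the original cell does.

```python
import io
import pandas as pd

# Two-row stand-in for raw_tweets.csv (subset of columns, for illustration).
sample_csv = io.StringIO(
    "tweet_id,tweet_date,tweeter_id,tweeter_location,retweet_count\n"
    "1417823998335983616,2021-07-21 12:29:17,726740035,Stoney Stanton,1\n"
    "1417823086225895425,2021-07-21 12:25:40,38142380,London,1\n"
)

tweets = pd.read_csv(sample_csv,
                     dtype={'tweet_id': str, 'tweeter_id': str},  # keep IDs as text
                     parse_dates=['tweet_date'])                   # parse timestamps
print(tweets.dtypes)
```

The same `dtype` and `parse_dates` arguments can be passed when reading the real file path.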
