# Getting the Tweets

In this notebook I use the Tweets class to fetch tweets by their id with the Tweepy package. Each text is saved in two
versions: one fully processed by ```preProcess``` and one left unprocessed. Any non-English tweets are translated using the Google Cloud API.

In [1]:
from ProcessOrNot.Twitter import Tweets
import pandas as pd
from tqdm.notebook import tqdm
from decouple import config

# the .env file must be placed in the root of the project
tweets = Tweets(config('TWITTER_API_KEY'),
               config('TWITTER_API_SECRET'),
               config('TWITTER_ACCESS_TOKEN_KEY'),
               config('TWITTER_ACCESS_TOKEN_SECRET'),
               '../service_account.json')
# the language set collected at this stage
languages = {
    1: 'en',
    2: 'es',
    3: 'fr',
    4: 'de',
}

# Method

The loop below loads the 3 files in Data/FilteredTwitterIDs. The feature we are interested in is the ```tweet_id``` column.
Every file is known to contain 1000 tweets for each language.

The next step is to use the ```addTweets``` function to get the tweets by their id. Since the ```statuses_lookup``` function
accepts at most 100 ids per call, the ```ids``` list is sliced, for each language, into 10 lists of 100 ids each using the
```((j-1)*1000)+(k-1)*100:((j-1)*1000)+k*100``` slice.
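The slice arithmetic above can be checked in isolation. The sketch below is only an illustration: ```batch_bounds``` is a hypothetical helper that is not part of the Tweets class, and it assumes the ids in each file are stored in contiguous blocks of 1000 per language, in the order of the ```languages``` mapping, which is what the slice implies.

```python
def batch_bounds(lang_index, batch_index, per_language=1000, batch_size=100):
    """Return (start, stop) for one batch, using 1-based indices as in the loop."""
    start = (lang_index - 1) * per_language + (batch_index - 1) * batch_size
    return start, start + batch_size


# Stand-in for the id column: 4 languages x 1000 ids, stored back to back
# (an assumption about the file layout, not something read from the data).
ids = list(range(4000))

# All ten batches for language 2 ('es' in the notebook's mapping).
batches = [ids[slice(*batch_bounds(2, k))] for k in range(1, 11)]
```

Each batch holds exactly 100 ids, and the ten batches together cover that language's block of 1000 without gaps or overlap.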

Because translation changes the data inside the Tweets object, a temporary file holding the raw tweets needs to be
saved before translating.

After the unprocessed data is translated and saved, the temp file (untranslated raw tweet data) is reloaded,
pre-processed, translated and saved.
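The save, translate, reload pattern can be sketched with a toy object. ```FakeTweets``` below is a hypothetical stand-in written only for this illustration; its methods and its string transformations are assumptions and do not reflect how the real Tweets class translates or pre-processes text.

```python
import json
import os
import tempfile


class FakeTweets:
    """Hypothetical stand-in for the Tweets class; not its real implementation."""

    def __init__(self):
        self.texts = []

    def save(self, path):
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.texts, fh)

    def load(self, path):
        with open(path, encoding="utf-8") as fh:
            self.texts = json.load(fh)

    def translate(self):
        # Placeholder for the translation step, which overwrites the stored text.
        self.texts = [f"translated({t})" for t in self.texts]

    def pre_process(self):
        # Placeholder for pre-processing: trim whitespace and lower-case.
        self.texts = [t.strip().lower() for t in self.texts]

    def reset(self):
        self.texts = []


temp_path = os.path.join(tempfile.mkdtemp(), "temp.json")

tweets = FakeTweets()
tweets.texts = ["  Hola Mundo  "]
tweets.save(temp_path)        # snapshot of the raw text

tweets.translate()            # branch 1: translate only
not_processed = list(tweets.texts)

tweets.reset()
tweets.load(temp_path)        # restore the raw text from the snapshot
tweets.pre_process()          # branch 2: pre-process, then translate
tweets.translate()
processed = list(tweets.texts)
```

Without the snapshot, the second branch would start from text that has already been translated, so the two outputs would no longer derive from the same raw data.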

In [2]:
for i in tqdm([0, 1, 2]):
    df = pd.read_csv('Data/FilteredTwitterIDs/' + str(i) + '.csv')
    ids = df.tweet_id.values.tolist()
    for j in tqdm(languages):
        # fetch the 1000 ids of language j in 10 batches of 100
        for k in range(1, 11):
            tweets.addTweets(ids[((j-1)*1000)+(k-1)*100:((j-1)*1000)+k*100])
        # save the raw tweets first, since translation changes the object's data
        tweets.saveJSON('Data/Text/temp')
        # not processed: translate (English is left as is) and save
        if j != 1:
            tweets.translate(language=languages[j])
        tweets.saveJSON('Data/Text/NotProcessed/' + str(i) + str(j))
        tweets.reset()
        # processed: reload the raw tweets, pre-process, translate and save
        tweets.loadJSON('Data/Text/temp')
        tweets.preProcess(languages[j])
        if j != 1:
            tweets.translate(language=languages[j])
        tweets.saveJSON('Data/Text/Processed/' + str(i) + str(j))
        tweets.reset()


