# Getting the Tweets

In this notebook I use the ```Tweets``` class to fetch tweets by their id through the Tweepy package. The texts are
either fully processed by ```preProcess``` or left unprocessed. Any non-English tweets are translated using the Google Cloud API.

In [1]:
from Tweets import Tweets
import pandas as pd
from tqdm.notebook import tqdm
from decouple import config
import os

tweets = Tweets(   config('TWITTER_API_KEY'),
                   config('TWITTER_API_SECRET'),
                   config('TWITTER_ACCESS_TOKEN_KEY'),
                   config('TWITTER_ACCESS_TOKEN_SECRET'),
                   'service_account.json')

languages = {
                1: 'en',
                2: 'es',
                3: 'fr',
                4: 'de',
                5: 'nl',
                6: 'it',
            }

months = ['December', 'January', 'February', 'March', 'April', 'May']

# Method

In the loop below, the files in Data/FilteredTwitterIDs are loaded, one per month and day. The only feature we are interested in is the ```tweet_id``` feature.
We know for sure that every file contains 1000 tweets for each language.

The next step is to use the ```addTweets``` function to get the tweets by their id. Since the ```statuses_lookup``` function
accepts at most 100 ids per call, each language's block of 1000 ids is split into 10 lists of 100 ids using the
```((j-1)*1000)+(k-1)*100:((j-1)*1000)+k*100``` slice, where ```j``` is the language index and ```k``` runs from 1 to 10.
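
To make the slice arithmetic concrete, here is a minimal sketch that computes the same start and stop offsets for one language. The helper name ```batch_slices``` and the stand-in ```ids``` list are assumptions for illustration only; they are not part of the ```Tweets``` class.

```python
def batch_slices(language_index, per_language=1000, batch_size=100):
    """Return (start, stop) offsets covering one language's block of ids.

    Mirrors the notebook's slice: start = (j-1)*1000 + (k-1)*100,
    stop = (j-1)*1000 + k*100, for k = 1..10.
    """
    base = (language_index - 1) * per_language
    n_batches = per_language // batch_size
    return [(base + (k - 1) * batch_size, base + k * batch_size)
            for k in range(1, n_batches + 1)]


# Stand-in for the real id list: 6 languages x 1000 ids (hypothetical data).
ids = list(range(6000))

# Language 2 ('es') occupies positions 1000..1999 of the list.
slices = batch_slices(2)
batches = [ids[start:stop] for start, stop in slices]

print(slices[0], slices[-1], len(batches))
```

Each batch has exactly 100 ids, so every call stays within the lookup limit, and the 10 batches together cover the language's 1000 ids without overlap.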

Because the data is pre-processed, there is no need to store a copy of the unprocessed tweet data.

Each set of tweets is preprocessed, translated and saved for future use.
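
Because every month, day and language combination is saved to its own JSON file, an interrupted run can be resumed by skipping the files that already exist. The sketch below illustrates that check in isolation; the helpers ```output_path``` and ```pending``` are hypothetical names, and a temporary directory stands in for Data/Text.

```python
import os
import tempfile


def output_path(root, month, day, language):
    """Build the JSON path for one month/day/language combination."""
    return os.path.join(root, f"{month}{day}{language}.json")


def pending(root, months, days, languages):
    """List the combinations whose output file does not exist yet."""
    return [(m, d, lang)
            for m in months for d in days for lang in languages
            if not os.path.isfile(output_path(root, m, d, lang))]


with tempfile.TemporaryDirectory() as root:
    # Pretend that December, day 0, English was saved by an earlier run.
    with open(output_path(root, "December", 0, "en"), "w") as handle:
        handle.write("[]")
    todo = pending(root, ["December"], [0, 1], ["en", "es"])

print(todo)
```

Only the combinations without a saved file remain in ```todo```, so no tweets are fetched or translated twice.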

In [2]:
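for month in tqdm(months):
    for day in tqdm(range(5)):
        # Each file holds the filtered tweet ids for one month/day pair.
        df = pd.read_json('Data/FilteredTwitterIDs/' + str(month) + str(day) + '.json')
        ids = df.tweet_id.values.tolist()
        for j in languages:
            # Skip combinations that were already saved by an earlier run.
            if os.path.isfile('Data/Text/' + str(month) + str(day) + languages[j] + '.json'):
                continue
            # Fetch this language's 1000 ids in 10 batches of 100.
            for k in range(1, 11):
                tweets.addTweets(ids[((j-1)*1000)+(k-1)*100:((j-1)*1000)+k*100])
            tweets.preProcess(languages[j])
            # English tweets (j == 1) need no translation.
            if j != 1:
                tweets.translate(language=languages[j])
            tweets.saveJSON('Data/Text/' + str(month) + str(day) + languages[j])
            # Reset the Tweets object before the next language.
            tweets.reset()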


