# Sentiment Analysis using VADER

Using the processed data saved by the Get Tweets notebook, this notebook applies the VADER sentiment analyzer to compute a
compound sentiment score for each tweet. The compound score is added to each tweet JSON file, and the result is stored in
the Data/Analyzed folder.

The mean sentiment for each language and day is also stored as a JSON file (Data/MeanSentiment.json) in the Data folder.

In [2]:
import pandas as pd
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA
import nltk
from tqdm.notebook import tqdm
from collections import defaultdict
import json

nltk.download('vader_lexicon')

languages = {
    1: 'en',
    2: 'es',
    3: 'fr',
    4: 'de',
    5: 'nl',
    6: 'it',
}

months = ['December', 'January', 'February', 'March', 'April', 'May']

[nltk_data] Downloading package vader_lexicon to C:\Users\Aiden
[nltk_data]     Williams\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


The cell below loops over every tweet text file and computes the compound sentiment score for each tweet. The score is
added to the dataframe as a new column, and the dataframe is then saved in the Data/Analyzed folder.

In [2]:
analyzer = SIA()  # build the analyzer once instead of once per tweet
LanguageP = defaultdict(list)  # language id -> list of daily mean scores
for month in tqdm(months):
    for day in range(5):
        for language in languages:
            name = str(month) + str(day) + languages[language] + '.json'
            tweetsP = pd.read_json('Data/Text/' + name).T
            # One score per row; NaN for non-string text keeps scores aligned with their tweets
            tweetsP['Score'] = [
                analyzer.polarity_scores(text)['compound'] if isinstance(text, str) else np.nan
                for text in tweetsP['text']
            ]
            LanguageP[language].append(np.average(tweetsP['Score']))
            tweetsP.to_json('Data/Analyzed/' + name)

  0%|          | 0/6 [00:00<?, ?it/s]
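
The scoring step can be sketched independently of the files on disk. The helper below is hypothetical: `score_texts` and the stand-in `toy_scorer` are not part of this notebook, and in the real pipeline the scorer would be something like `lambda t: SIA().polarity_scores(t)['compound']`. The point it illustrates is that returning one value per input row, with NaN for rows whose text is not a string, keeps the scores aligned with the tweets they belong to.

```python
import math

def score_texts(texts, scorer):
    """Return one score per input, NaN where the text is not a string."""
    scores = []
    for text in texts:
        if isinstance(text, str):
            scores.append(scorer(text))
        else:
            scores.append(math.nan)  # keep the row so positions stay aligned
    return scores

def toy_scorer(text):
    # Stand-in for VADER, for illustration only:
    # +1 if 'good' appears, -1 if 'bad' appears (both cancel out to 0).
    lowered = text.lower()
    return float(('good' in lowered) - ('bad' in lowered))

example_scores = score_texts(['Good day', None, 'bad news', 'plain'], toy_scorer)
```

Because the output has the same length as the input, it can be assigned directly as a dataframe column without relying on index alignment.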

np.average returns NaN when a day's scores contain a missing value or when the file has no tweets. The cell below checks for this and replaces each NaN mean with 0, which is equivalent to a true neutral score.

In [3]:
for language in LanguageP:
    # Replace NaN means with 0 (neutral); keep all other means unchanged
    LanguageP[language] = [0 if np.isnan(mean) else mean for mean in LanguageP[language]]
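
One way a NaN mean can arise is that a single NaN score in a day's column propagates through an ordinary average. As an alternative sketch, not the method used in this notebook, the mean can skip NaN entries and fall back to 0 only when no valid score exists at all. `mean_ignoring_nan` is a hypothetical helper name introduced here for illustration.

```python
import math

def mean_ignoring_nan(scores, default=0.0):
    """Average the non-NaN scores; return `default` if none are valid."""
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return default  # nothing to average, treat the day as neutral
    return sum(valid) / len(valid)

day_mean = mean_ignoring_nan([0.5, math.nan, -0.1])   # averages 0.5 and -0.1 only
empty_mean = mean_ignoring_nan([math.nan, math.nan])  # falls back to the default
```

The trade-off is that a day with a few unscored tweets keeps the information from its valid scores instead of being flattened to 0. NumPy's `np.nanmean` provides similar NaN-skipping behaviour for arrays.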

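The mean sentiment is stored in this format, with one entry per language. The 'month' key holds the language id and 'text'
holds that language's daily means as a flat list, ordered by month and then by day:

{

0 : {'month': 1, 'text': [Month 0 Day 0, Month 0 Day 1 ... Month 5 Day 4]}
.
.
.
5 : {'month': 6, 'text': [Month 0 Day 0, Month 0 Day 1 ... Month 5 Day 4]}

}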

In [None]:
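to_save = {}
# Each key of LanguageP is a language id; the 'month' and 'text' field names are kept as-is
for i, language in enumerate(LanguageP):
    to_save[i] = {'month': language, 'text': LanguageP[language]}
with open('Data/MeanSentiment.json', 'w') as f:  # the file is closed once written
    json.dump(to_save, f)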
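
Because the scoring loop appends one mean per month and day in order, each language's list is flat. The sketch below regroups such a flat list into a mapping from month to daily means. It assumes the month-then-day ordering of the loop and 5 days per month; `nest_by_month` and the sample values are hypothetical. It also shows that `json` turns integer keys into strings on a round trip, which matters when reading the saved file back.

```python
import json

months = ['December', 'January', 'February', 'March', 'April', 'May']

def nest_by_month(flat_means, months, days_per_month):
    """Split a flat, month-major list of daily means into {month: [days]}."""
    nested = {}
    for m, month in enumerate(months):
        start = m * days_per_month
        nested[month] = flat_means[start:start + days_per_month]
    return nested

# Made-up means, one per (month, day), purely for illustration
flat = [round(0.01 * i, 2) for i in range(len(months) * 5)]
nested = nest_by_month(flat, months, days_per_month=5)

# Integer keys become strings once the data passes through JSON
round_trip = json.loads(json.dumps({1: nested}))
```

After loading a saved file, keys therefore need to be looked up as strings (for example `round_trip['1']`) or converted back with `int`.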
