## Sentiment analysis
This script takes the cleaned tweets and tags each one with a sentiment polarity score, ranging from -1 (negative) to 1 (positive). After comparing several sentiment analysis methods, we found that the [Google Natural Language sentiment analyzer](https://cloud.google.com/natural-language/docs) gave the most accurate results, judged by manual inspection of the output. Therefore, only this method is documented here; the other methods are covered in Rienje's portfolio.

In [1]:
# Import needed libraries
import pandas as pd
import numpy as np
from deep_translator import GoogleTranslator
from google.cloud import language_v1

# Load cleaned tweets from previous script
df = pd.read_csv('cleaned_sentiment_tweets.csv')

#### Translating tweets
Unfortunately, Dutch sentiment analysis is currently not possible due to a [bug](https://issuetracker.google.com/issues/180714982) in the Google Natural Language tool. Therefore, we need to translate the tweets to English before analysis. This reduces the accuracy of the sentiment analysis, although the Google API still yields better scores than Dutch-language sentiment analyzers (like [pattern.nl](https://github.com/clips/pattern/wiki/pattern-nl) or analyzers based on [classified Dutch tweets](https://github.com/cltl-students/Eva_Zegelaar_Emotion_Classification_Dutch_Political_Tweets)). We translate the tweets using the [deep-translator](https://pypi.org/project/deep-translator/) wrapper for the Google Translate API, which allows unlimited translation for free.
The code below takes several hours to run, so we suggest using the data provided in the next notebook rather than running it.

In [None]:
# Initiate a list
translatedList = []

# Create the translator once; deep-translator expects `target`, not `dest`
translator = GoogleTranslator(source='nl', target='en')

# This loop takes a long time (several hours!) to run for all 18k tweets
for tweet in df['text_for_translation']:
    translation = translator.translate(tweet)
    translatedList.append(translation)

# Stitch translation to df
df['translation'] = translatedList
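
Because the translation loop runs for hours, a single failed request or a batch of duplicate tweets can waste a lot of time. The sketch below is one possible way to make the loop more forgiving, not the method used in the project: `translate_unique` and `fake_translate` are hypothetical names, and the translator is passed in as a plain function so the example runs offline.

```python
from typing import Callable, Dict, Iterable, List, Optional


def translate_unique(
    texts: Iterable[str],
    translate_fn: Callable[[str], str],
) -> List[Optional[str]]:
    """Translate texts in order, calling translate_fn once per distinct text.

    Failed translations are stored as None, so one bad tweet does not
    abort a run that takes hours.
    """
    cache: Dict[str, Optional[str]] = {}
    results: List[Optional[str]] = []
    for text in texts:
        if text not in cache:
            try:
                cache[text] = translate_fn(text)
            except Exception:  # e.g. network error, rate limit, empty input
                cache[text] = None
        results.append(cache[text])
    return results


# Offline demonstration with a stand-in translator (hypothetical, not a real API)
calls = []


def fake_translate(text: str) -> str:
    calls.append(text)
    if not text.strip():
        raise ValueError("nothing to translate")
    return text.upper()


demo = translate_unique(["goede morgen", "goede morgen", " ", "dank je"], fake_translate)
print(demo)        # the blank tweet becomes None, the duplicate is reused
print(len(calls))  # number of actual translation calls
```

With the real translator, the function could be called as `translate_unique(df['text_for_translation'], GoogleTranslator(source='nl', target='en').translate)`, assuming the column holds plain strings.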

#### The sentiment analysis
The code below sends the translated tweets to Google Cloud and returns a sentiment score for each. It will fail if the user does not have a Google Cloud service account key. In addition, the sentiment analysis costs roughly 1 dollar per 1000 tweets, so running the block below is not recommended. Instead, the tweets with the tagged sentiments are loaded in the next notebook. The code is adapted from [Stack Overflow](https://stackoverflow.com/questions/61319178/how-can-i-send-a-batch-of-strings-to-the-google-cloud-natural-language-api).
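
To make the cost figure concrete, the short sketch below applies the rough rate of 1 dollar per 1000 tweets to the 18 000 tweets mentioned in the code comments. The function name `estimate_cost_usd` is hypothetical, and the rate is the approximate figure quoted above, not an official price.

```python
def estimate_cost_usd(n_tweets: int, usd_per_1000: float = 1.0) -> float:
    """Rough cost estimate, assuming a flat rate per 1000 tweets."""
    return n_tweets / 1000 * usd_per_1000


estimated = estimate_cost_usd(18_000)
print(f"Estimated cost: ~${estimated:.2f}")
```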


In [None]:

# Google natural language sentiment analysis
# Costs are roughly 1 dollar per 1000 tweets
# Running the google cloud takes a long time, ~4 hours for 18 000 tweets

# Instantiate the client once and reuse it for every request
client = language_v1.LanguageServiceClient.from_service_account_json("D:/Users/Rienje/Documents/MGI Wageningen/SmartEnvironmentDataScience/Project/Python/googleNL/sentiment-309511-2a1624b4263f.json")

# Create a function that retrieves sentiment score from Google NL API
def comment_analysis(comment):
    # Set parameters for analysis
    document = {"content": comment,
                "type_": language_v1.Document.Type.PLAIN_TEXT,
                "language": "en"}
    # Sentiment analysis
    annotations = client.analyze_sentiment(document=document)
    # Return only the sentiment score of the tweet
    total_score = annotations.document_sentiment.score
    return total_score

# Initiate list
GoogleCloudList = []

# Retrieve sentiment score for all tweets
for tweet in df['translation']:
    googlesentiment = comment_analysis(tweet)
    GoogleCloudList.append(googlesentiment)

# Add to df and save as csv
df['google_scores'] = GoogleCloudList
df.to_csv('final_sentiment_tweets.csv', header=True, index=False)
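
Since scoring is slow and paid, losing a half-finished run is expensive. The sketch below shows one way the scoring loop could save intermediate results and skip rows that already have a score when restarted. It is an illustration under assumptions, not the project's code: `score_with_checkpoints`, `flaky` and `retry` are hypothetical names, the scoring function is passed in so the example runs without Google Cloud, and the DataFrame index is assumed to be unique.

```python
import os
import tempfile
from typing import Callable

import pandas as pd


def score_with_checkpoints(
    df: pd.DataFrame,
    score_fn: Callable[[str], float],
    out_path: str,
    text_col: str = "translation",
    score_col: str = "google_scores",
    every: int = 500,
) -> pd.DataFrame:
    """Score rows that have no score yet, writing a CSV every `every` rows."""
    df = df.copy()
    if score_col not in df.columns:
        df[score_col] = float("nan")
    pending = df.index[df[score_col].isna()]
    for done, idx in enumerate(pending, start=1):
        try:
            df.at[idx, score_col] = score_fn(df.at[idx, text_col])
        except Exception:
            # Leave the score as NaN so a later run retries this row
            pass
        if done % every == 0:
            df.to_csv(out_path, index=False)
    df.to_csv(out_path, index=False)
    return df


# Offline demonstration with stand-in scoring functions (hypothetical)
def flaky(text: str) -> float:
    if text == "bad":
        raise RuntimeError("simulated API failure")
    return {"good": 0.8, "fine": 0.1}[text]


retry_calls = []


def retry(text: str) -> float:
    retry_calls.append(text)
    return -0.7


demo_df = pd.DataFrame({"translation": ["good", "bad", "fine"]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.csv")
    first = score_with_checkpoints(demo_df, flaky, path, every=2)
    # Restart from the saved file: only the failed row is scored again
    resumed = score_with_checkpoints(pd.read_csv(path), retry, path)

print(resumed)
```

In the real run, `comment_analysis` would take the place of the stand-in scoring functions.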

#### Reflection
There are many Python modules for sentiment analysis, but finding an *accurate* sentiment analyzer for Dutch is much more difficult. During the project, we estimated which of four approaches (machine learning, Google Cloud, Pattern and NLTK Vader) was the most 'accurate' for our data. In the end, as described above, we chose the Google Cloud analyzer. However, this API also has some downsides.

First, like many of Google's services, it is very much a black box. The user has little insight into how certain tweets are tagged, in contrast with academic classifiers (like [Pattern](https://github.com/clips/pattern) and, to a lesser extent, [NLTK](https://www.nltk.org/api/nltk.sentiment.html)). In addition, the texts had to be translated, which inevitably leads to a loss of context, sentence structure and thus sentiment. The database was too large (or time too limited) to check the translations.
Therefore, the results of the sentiment analysis should be taken with a grain of salt. In addition to reasons described above, tweet sentiment analysis is quite tricky in itself, as the sentences are short and often informal or sarcastic (especially when concerning complex political tweets).
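
Although checking every translation was not feasible, a small random sample can still give a rough impression of translation and scoring quality. The sketch below draws such a sample with a fixed seed so the same rows can be revisited. It was not part of the project: `review_sample` is a hypothetical helper, and the sample size and seed are arbitrary choices.

```python
import pandas as pd


def review_sample(
    df: pd.DataFrame,
    n: int = 50,
    seed: int = 42,
    cols=("text_for_translation", "translation", "google_scores"),
) -> pd.DataFrame:
    """Return up to n randomly chosen rows, keeping only the columns to inspect."""
    present = [c for c in cols if c in df.columns]
    return df[present].sample(n=min(n, len(df)), random_state=seed)


# Small made-up frame to show the call; real data would come from the saved csv
demo_reviews = pd.DataFrame({
    "text_for_translation": ["goed", "slecht", "prima"],
    "translation": ["good", "bad", "fine"],
    "google_scores": [0.8, -0.7, 0.1],
})
sample = review_sample(demo_reviews, n=2)
print(sample)
```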