# Sentiment Analysis on a Machine-Translated Icelandic Corpus

- Ólafur Aron Jóhannsson
- Eysteinn Örn
- Birkir Arndal



# Contents
1. [Abstract](#abstract)
2. [Introduction](#introduction)
3. [Machine Translations](#machine-translations)
4. [Google Translate](#google-translate)
5. [Miðeind](#miðeind)
6. [Pre-Processing and feature extraction](#pre-processing-and-feature-extraction)


## Abstract


Sentiment analysis of text machine-translated from English into low-resource languages has been studied extensively for many languages, yet Icelandic remains relatively unexplored in this context. We use a range of baseline classifiers and deep learning models to investigate whether sentiment is preserved across languages when the text is translated with machine translation services such as Google Translate and Miðeind machine translation.


## Introduction

We used an IMDB dataset of 50,000 reviews, each labelled as either positive or negative in sentiment. We translated these reviews with both Google Translate and Miðeind Translate, and then analysed all three datasets (the original English version and the two translations) with three baseline classifiers. The primary objective was to investigate whether machine translation affects the results of sentiment analysis, and to determine which of the Miðeind and Google translations performs better. In short, we wanted to assess how well sentiment transfers through machine translation.

## Machine Translations

We used the Google Translator API, which relies on Google's Neural Machine Translation featuring an LSTM architecture. We also used the Miðeind Vélþýðing API to machine-translate the reviews. The Miðeind Vélþýðing API is built on the multilingual BART model, which was trained using the Fairseq sequence modeling toolkit within the PyTorch framework.

### Google Translate

All the reviews were successfully translated using the API, and the only preprocessing step performed on the raw data was the removal of `<br />` tags. The absence of errors during translation could be attributed to the API's maturity and extensive user adoption. Nevertheless, the quality of the Icelandic reviews was occasionally idiosyncratic.

### Miðeind

The Miðeind Translator encountered challenges when translating the English corpus into Icelandic. To prepare the text for translation, several preprocessing steps were necessary. These steps included consolidating consecutive punctuation marks, eliminating all HTML tags, ensuring there was a whitespace character following punctuation marks, and removing asterisks. Subsequently, we divided the reviews into segments of 128 tokens, which were then processed in batches by the Miðeind translator.
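
The cleanup and segmentation steps listed above can be sketched roughly as follows. This is an illustration rather than the code used in the project: the function names `clean_review`, `chunk_tokens` and `batched` are hypothetical, and "tokens" are assumed here to be whitespace-separated words, which may differ from the tokenization the Miðeind translator actually uses.

```python
import re

PUNCT = r"[.,!?;:]"


def clean_review(text: str) -> str:
    """Apply the four cleanup steps to a single review."""
    text = re.sub(r"<[^>]+>", " ", text)                      # remove HTML tags such as <br />
    text = text.replace("*", "")                              # remove asterisks
    text = re.sub(rf"({PUNCT})\1+", r"\1", text)              # collapse repeated punctuation
    text = re.sub(rf"({PUNCT})(?=[^\W\d_])", r"\1 ", text)    # add a space before a following letter
    return re.sub(r"\s+", " ", text).strip()                  # normalise whitespace


def chunk_tokens(text: str, max_tokens: int = 128) -> list[str]:
    """Split a review into segments of at most max_tokens whitespace tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


def batched(items: list[str], batch_size: int):
    """Yield consecutive batches of segments to send to a translator."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    review = "Great!!!It was<br />fun...really *good*"
    cleaned = clean_review(review)
    print(cleaned)
    for batch in batched(chunk_tokens(cleaned, max_tokens=4), batch_size=2):
        print(batch)
```

Splitting on whitespace keeps the sketch dependency-free; a real pipeline would more likely count tokens with the translation model's own tokenizer so that no segment exceeds the model's limit.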


## Pre-Processing and feature extraction

We lowercased, tokenized and lemmatized the original English dataset and removed stop words. We applied the same steps to the Icelandic machine-translated corpora.
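
A minimal sketch of these preprocessing steps is shown below. The stop-word list is a tiny hypothetical sample, and lemmatization is left as a pluggable function (defaulting to the identity) because the lemmatizers used for English and Icelandic are not named in this document.

```python
import re
from typing import Callable

# Hypothetical, deliberately tiny stop-word list mixing English and Icelandic.
STOP_WORDS = {"the", "a", "and", "is", "was", "og", "að", "er", "var"}


def preprocess(
    text: str,
    stop_words: set[str] = STOP_WORDS,
    lemmatize: Callable[[str], str] = lambda token: token,
) -> list[str]:
    """Lowercase, tokenize, lemmatize and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())        # simple word tokenizer
    lemmas = [lemmatize(token) for token in tokens]  # placeholder lemmatizer
    return [lemma for lemma in lemmas if lemma not in stop_words]


if __name__ == "__main__":
    print(preprocess("The movie WAS great and fun!"))
    print(preprocess("Myndin var frábær og skemmtileg"))
```

Passing a real lemmatizer as the `lemmatize` argument would let the same function serve both the English and the Icelandic corpora.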

We created pipelines for the three classifiers, which serve as baselines for scoring the English dataset and the machine-translated Google and Miðeind datasets. All classifiers use a TF-IDF vectorizer, which weights each term by how frequently it occurs in a document, scaled down by how common the term is across all documents. The resulting weight reflects how important a term is to a document relative to the whole corpus.
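
To make the TF-IDF weighting concrete, here is a small pure-Python sketch of the textbook definition (term frequency multiplied by the logarithm of the inverse document frequency). It is an illustration only: library vectorizers such as scikit-learn's `TfidfVectorizer` apply additional smoothing and normalisation, so their numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [["good", "movie"], ["bad", "movie"]]
    for doc_weights in tf_idf(docs):
        print(doc_weights)
```

In this toy corpus the word "movie" appears in every document, so its weight is zero, while "good" and "bad" each receive a positive weight because they distinguish one document from the other.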

![](machine_learning.png)

# Baseline Classifier Evaluation

To evaluate the performance of the models we use the statistical measures defined in equations 1, 2, 3 and 4:

\begin{align}
&\text{Accuracy} = \frac{TP+TN}{TP+FP+TN+FN}
\\
&\text{Recall} = \frac{TP}{TP+FN}
\\
&\text{Precision} = \frac{TP}{TP+FP}
\\
&\text{F1 Score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall}+\text{Precision}}
\end{align}

Here, true positives (TP) are reviews correctly identified as positive, false positives (FP) are reviews incorrectly identified as positive, true negatives (TN) are reviews correctly identified as negative, and false negatives (FN) are reviews incorrectly identified as negative.
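
As a worked illustration, the four measures can be computed directly from the confusion counts. Accuracy is taken as the share of correct predictions, (TP + TN) over all predictions. The counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute accuracy, recall, precision and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for 100 reviews.
    for name, value in classification_metrics(tp=45, fp=5, tn=40, fn=10).items():
        print(f"{name}: {value:.4f}")
```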

![](English_Classification_Report.png)

![](Icelandic_Google_Classification_Report.png)

![](Icelandic_Miðeind_Classification_Report.png)

The classification reports above show that Support Vector Machines performed best on the data. The models were trained on 40,000 reviews and tested on 10,000. Taking SVC as our baseline comparison model and the weighted F1 score as our metric, the English dataset scores 89.5%, the Google-translated dataset 89.3%, and the Miðeind-translated dataset 88.1%. From these numbers we conclude that sentiment carries over very effectively through state-of-the-art machine translation APIs: the F1 score drops by only 0.2 percentage points for Google and 1.4 percentage points for Miðeind, so Google performs slightly better.
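
For reference, a weighted F1 score is the average of the per-class F1 scores, weighted by the number of test examples (the support) in each class. The sketch below uses hypothetical per-class scores and supports purely to show the arithmetic; they are not figures from our experiments.

```python
def weighted_f1(per_class_f1: dict[str, float], support: dict[str, int]) -> float:
    """Average per-class F1 scores, weighted by class support."""
    total = sum(support[label] for label in per_class_f1)
    return sum(per_class_f1[label] * support[label] for label in per_class_f1) / total


if __name__ == "__main__":
    # Hypothetical per-class scores and class sizes.
    scores = {"negative": 0.90, "positive": 0.80}
    sizes = {"negative": 6000, "positive": 4000}
    print(f"{weighted_f1(scores, sizes):.4f}")
```

When the classes are balanced, the weighted F1 reduces to the plain mean of the per-class F1 scores.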

### Original English Dataset

### Google Translate Icelandic Dataset

| Classifier            | Precision | Recall | F1-Score |
|-----------------------|-----------|--------|----------|
| *MultinomialNB*       |           |        |          |
| negative              |  0.8448   | 0.8810 | 0.8625   |
| positive              |  0.8770   | 0.8398 | 0.8580   |
| *SVC*                 |           |        |          |
| negative              |  0.8977   | 0.8907 | 0.8942   |
| positive              |  0.8926   | 0.8995 | 0.8960   |
| *Logistic Regression* |           |        |          |
| negative              |  0.8963   | 0.8808 | 0.8885   |
| positive              |  0.8840   | 0.8991 | 0.8915   |

### Miðeind Icelandic Dataset







## Naive Bayes

## Logistic Regression

Logistic Regression is a binary classification algorithm, where the result is defined as zero or one. When we train the classifier, it gives us a list of coefficients that represent the relationship between the input variables and the output variable in the model. Each coefficient can be interpreted as the relative importance of a word for the class it is associated with, in this case negative or positive.

The charts below show the top 10 negative and positive coefficients. For a sentence to be classified as positive in this case, it has to have a value of one.

Some examples from our tests:

- (hræðilegur frábær, "terrible great") Positive, score is 1.124940
- (slæmur vel besta, "bad well best") Positive, score is 4.491666
- (lélegur vel, "poor well") Negative, score is 0.107679
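
To illustrate how word coefficients combine into a prediction, the sketch below sums the coefficients of the words in a sentence and passes the total through the logistic function. The coefficient values are hypothetical, each word is assumed to contribute a feature value of 1 (the real pipeline feeds TF-IDF weights into the model), and the function names are illustrative, so this does not reproduce the scores listed above.

```python
import math

# Hypothetical coefficients; positive values push towards the positive class.
COEFFICIENTS = {
    "frábær": 2.0,       # "great"
    "vel": 0.5,          # "well"
    "hræðilegur": -1.5,  # "terrible"
    "lélegur": -2.5,     # "poor"
}


def positive_probability(tokens: list[str], coefficients: dict[str, float],
                         intercept: float = 0.0) -> float:
    """Logistic function of the summed coefficients of the known tokens."""
    z = intercept + sum(coefficients.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def predict(tokens: list[str], coefficients: dict[str, float]) -> int:
    """Return 1 (positive) if the probability is at least 0.5, else 0."""
    return 1 if positive_probability(tokens, coefficients) >= 0.5 else 0


if __name__ == "__main__":
    for sentence in ("hræðilegur frábær", "lélegur vel"):
        tokens = sentence.split()
        label = "Positive" if predict(tokens, COEFFICIENTS) == 1 else "Negative"
        print(sentence, label, round(positive_probability(tokens, COEFFICIENTS), 3))
```

With these made-up values, a strongly positive word can outweigh a weaker negative one in the same sentence, which is the same kind of behaviour the mixed-sentiment examples above exhibit.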

| Negative | Positive |
|:-------------------------:|:-------------------------:|
![Negative Score](SVC_English_Negative_Features.png)  | ![Positive Score](SVC_English_Positive_Features.png)  
![Negative Score](SVC_Google_Negative_Features.png)  | ![Positive Score](SVC_Google_Positive_Features.png)  
![Negative Score](SVC_Miðeind_Negative_Features.png)  | ![Positive Score](SVC_Miðeind_Positive_Features.png)  
![Negative Score](Logistic_Regression_English_Negative_Features.png)  | ![Positive Score](Logistic_Regression_English_Positive_Features.png)  
![Negative Score](Logistic_Regression_Google_Negative_Features.png)  | ![Positive Score](Logistic_Regression_Google_Positive_Features.png)  
![Negative Score](Logistic_Regression_Miðeind_Negative_Features.png)  | ![Positive Score](Logistic_Regression_Miðeind_Positive_Features.png)  








## Support Vector Machines

# Models

# Results

# Conclusions