\begin{center}
\includegraphics[scale=0.5]{HR_logo_hringur_transparent.png}

\LARGE{BSc Final Project} \\
\Large{Department of Computer Science}

\hfill

{\bfseries\Huge Sentiment Analysis on Icelandic Text Using Transformer Neural Networks and Machine Learning Classifiers}

\hfill

\textit{Ólafur Aron Jóhannsson} \\
olafuraj21@ru.is

\textit{Birkir Arndal} \\
birkir@ru.is

\textit{Eysteinn Örn} \\
eysteinn@ru.is

\hfill


\textit{Supervised by} Stefán Ólafsson and Hrafn Loftsson

\hfill


November 2023

\end{center}

\hfill

\hfill

\begin{center}

{\bfseries Abstract}

\end{center}

In this research paper, we evaluate several machine learning classifiers and Transformer-based language models for Icelandic sentiment analysis. We machine-translated English movie reviews from the IMDb dataset [1] into Icelandic using Google Translate and Miðeind Translate, and trained three types of classifiers: Support Vector Machines, Logistic Regression, and Naive Bayes. We also fine-tuned three pre-trained Transformer-based models, RoBERTa, IceBERT, and Electra, on both the original English text and the translated text to evaluate their performance. The Transformer-based models outperformed the machine learning classifiers on both the English and the translated data. The best-performing Transformer-based model was Electra fine-tuned on the Miðeind translations, which achieved an F1 score of approximately 93% on the test set. The best-performing machine learning classifier was the Support Vector Machine, which achieved an accuracy of approximately 89% on the test set.


# **Introduction**

Natural language processing is a highly dynamic area of research because of its wide-ranging applications across many domains. Within it, sentiment analysis has emerged as a particularly significant field of study. Beyond its role in scientific research, sentiment analysis has become a key input to business decisions and is used in applications such as social media monitoring, customer service, brand monitoring, and market research. It has also been applied in the financial sector to help predict stock prices and in politics to forecast election results.

Sentiment analysis uses natural language processing and text analysis to identify, extract, and quantify subjective information, such as whether a text expresses positive, negative, or neutral sentiment. It can be used to gauge public opinion, to help businesses measure and categorise customer satisfaction, and to extract insights from user-generated content on digital platforms, e.g., customer reviews, complaints, and comments.

Sentiment analysis is challenging because it requires an understanding of context. For example, "I am not happy" expresses negative sentiment even though it contains the positive word "happy": the negation "not" reverses the polarity, so a system that scores words in isolation will classify the sentence incorrectly.
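To make the negation problem concrete, the following sketch compares a naive word-counting scorer with one that flips the polarity of a sentiment word directly preceded by a negator. The tiny lexicon and negator list are hypothetical and chosen only for illustration; they are not part of the methods used in this work.

```python
# Hypothetical toy lexicon for illustration: +1 for positive words, -1 for negative words.
LEXICON = {"happy": 1, "good": 1, "great": 1, "sad": -1, "bad": -1, "boring": -1}
NEGATORS = {"not", "never", "no"}


def naive_score(sentence: str) -> int:
    """Sum word polarities, ignoring context entirely (bag of words)."""
    return sum(LEXICON.get(tok, 0) for tok in sentence.lower().split())


def negation_aware_score(sentence: str) -> int:
    """Flip the polarity of a sentiment word that immediately follows a negator."""
    tokens = sentence.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


print(naive_score("I am not happy"))           # 1  (wrongly positive)
print(negation_aware_score("I am not happy"))  # -1 (negative)
```

The one-word window already fails on "I am not very happy", which the negation-aware scorer still rates as positive. Handling such cases requires modelling wider context, which is one reason context-sensitive models such as Transformers are attractive for this task.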
 

##### Our hypotheses are as follows

- Sentiment classification on English text will achieve the best results with a fine-tuned RoBERTa-base model.
- Both the Transformer models and the machine learning classifiers will achieve higher accuracy when trained on translations generated by Google Translate than on those generated by Miðeind Translate.
- Sentiment classification on Icelandic text will achieve the best results with IceBERT fine-tuned on the Google Translate translations.

## Motivation

Our motivation for this research is that no open, readily accessible Icelandic dataset tailored for sentiment analysis exists, and creating one from scratch is expensive, especially for a low-resource language such as Icelandic. Machine translation offers an inexpensive way to create such a dataset and lets us explore whether sentiment analysis results comparable to those on the original English data can be achieved.

# **Related Work**

In *Learning Word Vectors for Sentiment Analysis*, Maas et al. (2011) used a mix of unsupervised and supervised techniques to learn word vectors that capture semantic term-document information as well as rich sentiment content [3]. They also introduced a large dataset of movie reviews to serve as a more robust benchmark for sentiment classification [1].

In the *Icelandair NLP Project Report*, Pétursson (2022) automatically classified Icelandair customers' survey answers using several baseline classifiers and Transformer-based models, and found that the results were reasonable on the English data but poor on the Icelandic data [2].

\newpage

# **Methods**

Our methodology involved translating the IMDb reviews into Icelandic with both Google Translate and Miðeind Translate. We then trained and evaluated three baseline classifiers and three Transformer-based models on each of the three resulting datasets: the original English reviews and the two Icelandic translations. The primary objective was to investigate whether machine translation affects sentiment analysis results and to determine which translation service, Miðeind or Google, yields better results. More broadly, we aimed to assess how well sentiment is preserved through machine translation.
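The paper does not state which features or library the baseline classifiers used. As an illustration of one of the three baselines, the sketch below is a from-scratch multinomial Naive Bayes over lowercased whitespace tokens with add-alpha (Laplace) smoothing. Both the tokenisation and the smoothing value are assumptions; a practical setup would more likely use a library implementation such as scikit-learn's `MultinomialNB` with count or TF-IDF features, and the same fit/predict interface applies to the SVM and Logistic Regression baselines.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over lowercased whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # alpha = 1.0 corresponds to Laplace smoothing

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNaiveBayes":
        self.class_counts = Counter(labels)        # documents per class, used for the priors
        self.token_counts = defaultdict(Counter)   # class -> token -> count
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.class_totals = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def _log_joint(self, tokens: list[str], label: str) -> float:
        """Unnormalised log posterior: log P(class) + sum of log P(token | class)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.class_totals[label] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen during training are skipped
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, texts: list[str]) -> list[str]:
        return [
            max(self.class_counts, key=lambda c: self._log_joint(t.lower().split(), c))
            for t in texts
        ]
```

Usage on a toy example: `MultinomialNaiveBayes().fit(train_texts, train_labels).predict(["great fun"])`. Working in log space avoids numerical underflow when multiplying many small probabilities for long reviews.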


## Data

We used the well-known IMDb dataset of 50,000 movie reviews [1], each labelled as either positive or negative (see Table 1.1 for examples). The dataset is balanced, with 25,000 positive and 25,000 negative reviews.

We additionally evaluated our classifiers and deep learning models on 1,111 reviews written in Icelandic and collected from an Icelandic website: 932 positive and 179 negative.
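Because this Icelandic review set is imbalanced, accuracy alone can overstate performance. As a purely illustrative calculation (not a measured result), a trivial classifier that labels all 1,111 reviews as positive would obtain the following scores, using the equivalent form $F1 = 2TP/(2TP + FP + FN)$ of the F1 score:

\begin{align*}
\text{Accuracy} &= \frac{932}{932 + 179} = \frac{932}{1111} \approx 0.839 \\
F1_{\text{pos}} &= \frac{2 \cdot 932}{2 \cdot 932 + 179 + 0} = \frac{1864}{2043} \approx 0.912 \\
F1_{\text{neg}} &= 0 \quad (\text{no review is ever predicted as negative}) \\
\text{Macro-}F1 &= \frac{F1_{\text{pos}} + F1_{\text{neg}}}{2} \approx 0.456
\end{align*}

Reporting the F1 score of each class therefore exposes a classifier that fails entirely on the minority (negative) class, even when its accuracy looks high.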

> Table 1.1 Example English movie reviews with sentiment

| Movie Review Text | Sentiment |
|-------------------|-----------|
| If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love this movie, hell even my mom liked it.<br /><br />Great Camp!!! | Positive |
| This film contain far too much meaningless violence. Too much shooting and blood. The acting seems very unrealistic and is generally poor. The only reason to see this film is if you like very old cars. | Negative |


### Machine Translations

We used the Google Translate API, which is based on Google's Neural Machine Translation (GNMT) system, originally described as an LSTM-based encoder-decoder architecture. We also used the Miðeind Vélþýðing (machine translation) API to machine-translate the reviews. The Miðeind translation model is built on the multilingual BART (mBART) model and was trained with the Fairseq sequence modelling toolkit, which is built on PyTorch.

#### Google Translate

All reviews were translated successfully through the API, and the only preprocessing step applied to the raw data was removing the `<br />` line-break tags. The absence of translation failures likely reflects the API's maturity and wide adoption. However, the quality of the resulting Icelandic reviews was occasionally idiosyncratic.
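The step above can be sketched as follows. The `<br />` removal mirrors the text, although replacing each tag with a space (rather than deleting it) is our assumption, made so that sentences on either side do not merge. The paper does not say which client library was used; the translation function assumes an object that behaves like `google.cloud.translate_v2.Client` from the `google-cloud-translate` package, whose `translate()` method accepts a list of strings and returns one dictionary per input with a `"translatedText"` key. Credential setup is not shown.

```python
import re

# Matches <br />, <br/> and <br> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def remove_br_tags(review: str) -> str:
    """Replace line-break tags with a space and collapse repeated whitespace."""
    return " ".join(BR_TAG.sub(" ", review).split())


def translate_reviews(client, reviews: list[str], target: str = "is") -> list[str]:
    """Translate English reviews into Icelandic in a single request.

    `client` is assumed to behave like google.cloud.translate_v2.Client.
    """
    cleaned = [remove_br_tags(r) for r in reviews]
    results = client.translate(
        cleaned, target_language=target, source_language="en", format_="text"
    )
    return [r["translatedText"] for r in results]


# Usage (requires Google Cloud credentials):
# from google.cloud import translate_v2 as translate
# icelandic = translate_reviews(translate.Client(), ["Great Camp!!!"])
```

In practice, translating 50,000 reviews would mean sending them in smaller batches rather than in one request, since the API limits the size of each request.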

#### Miðeind Translate

The Miðeind translator had difficulty translating the English corpus into Icelandic, so several preprocessing steps were needed to prepare the text: consolidating consecutive punctuation marks, removing all HTML tags, ensuring that a whitespace character follows each punctuation mark, and removing asterisks. We then divided the reviews into segments of at most 128 tokens, which the Miðeind translator processed in batches.
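A minimal sketch of the four cleaning steps and the segmentation follows. The exact rules are not given in the text, so the regular expressions are assumptions: runs of punctuation are collapsed to their first mark (which also turns "..." into "."), and a space is inserted only before a following letter so that decimals such as "3.5" stay intact. Tokens here are whitespace-separated words, whereas the Miðeind model presumably counts subword tokens from its own tokenizer, so real segment boundaries would differ.

```python
import re

PUNCT = r"[!?.,;:]"


def preprocess_for_mideind(text: str) -> str:
    """Apply the four cleaning steps described above (regexes are assumptions)."""
    text = re.sub(r"<[^>]+>", " ", text)                   # remove all HTML tags
    text = text.replace("*", "")                           # remove asterisks
    text = re.sub(rf"({PUNCT}){PUNCT}+", r"\1", text)      # "!!!" -> "!", "?!" -> "?"
    text = re.sub(rf"({PUNCT})(?=[^\W\d_])", r"\1 ", text) # space between punctuation and a letter
    return " ".join(text.split())                          # normalise whitespace


def split_into_segments(text: str, max_tokens: int = 128) -> list[str]:
    """Split a review into consecutive chunks of at most max_tokens whitespace tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def batched(items: list[str], batch_size: int) -> list[list[str]]:
    """Group segments into batches to send to the translator."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


print(preprocess_for_mideind("Great<br /><br />Camp!!! *Loved* it.Really"))
# Great Camp! Loved it. Really
```

After translation, the segments of each review would be joined back together in order to reconstruct the full Icelandic review.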

## Evaluation method

To measure model performance, we computed the F1 score for each class (positive and negative). Equations 1, 2, 3 and 4 define accuracy, recall, precision, and the F1 score.

\begin{align}
&\text{Accuracy} = \frac{TP+TN}{TP+FP+TN+FN}
\\
&\text{Recall} = \frac{TP}{TP+FN}
\\
&\text{Precision} = \frac{TP}{TP+FP}
\\
&\text{F1 Score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall}+\text{Precision}}
\end{align}

A true positive (TP) is a positive review correctly classified as positive, and a false positive (FP) is a negative review incorrectly classified as positive. A true negative (TN) is a negative review correctly classified as negative, and a false negative (FN) is a positive review incorrectly classified as negative. When computing the F1 score for the negative class, the roles of the two classes are swapped.
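The definitions above can be computed directly from lists of true and predicted labels. This is a plain-Python sketch; the label strings `"positive"` and `"negative"` are assumptions, and accuracy is computed as $(TP+TN)/(TP+FP+TN+FN)$, the fraction of correct predictions. Libraries such as scikit-learn provide equivalent functions, but writing them out shows how each class takes the positive role in turn.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, TN, FN), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn  # remaining pairs: neither true nor predicted `positive`
    return tp, fp, tn, fn


def _ratio(num, den):
    # Define 0/0 as 0.0, e.g. precision for a class that is never predicted.
    return num / den if den else 0.0


def accuracy(y_true, y_pred, positive="positive"):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return _ratio(tp + tn, tp + fp + tn + fn)


def class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class, with `label` in the positive role."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, label)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1 = _ratio(2 * recall * precision, recall + precision)
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true, y_pred, labels=("positive", "negative")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(class_scores(y_true, y_pred, c)["f1"] for c in labels) / len(labels)


y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "positive"]
print(accuracy(y_true, y_pred))                        # 0.6
print(class_scores(y_true, y_pred, "negative")["f1"])  # 0.5
```

Averaging the per-class F1 scores without weighting (macro F1) gives the minority class as much influence as the majority class, which matters for the imbalanced Icelandic review set.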


# Results and Analysis

# Discussion

## Conclusions

## Limitations

## Future work

# References

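[1] A. L. Maas et al. 2011. Large Movie Review Dataset. http://ai.stanford.edu/~amaas/data/sentiment/

[2] A. Pétursson. 2022. Icelandair NLP Project Report.

[3] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. 2011. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.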