Multilingual sentiment analysis and intent classification in Romanian (Bachelor's thesis)

CristianBudala/Multilingual-Sentiment-Analysis-and-Intent-Classification

Multilingual Sentiment Analysis and Intent Classification in Romanian

A translation approach to performing sentiment analysis and intent classification of social media texts in Romanian.

The method translates queries from the original language (Romanian) into English using Google's Cloud Translation API. The translated texts are then passed to a fine-tuned version of the RoBERTa model, which generates the sentiment predictions. This approach reached an accuracy of 90% on a test dataset of 100 fictional queries generated by the author.

For intent classification, a DistilBERT model was trained on 6,500 synthetic and augmented queries from a dataset created by the author. The dataset defines 13 intents specific to a fictional cosmetics company. Applied to the test dataset, the trained model reached an accuracy of 73%.
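The translate-then-classify idea can be sketched as a small pipeline that composes a translator with an English-language classifier, plus the accuracy metric used to report results. This is a minimal sketch, not the thesis code: `translate_then_classify`, `accuracy`, `toy_translate`, `toy_classify` and the example sentences are hypothetical, and the toy functions are stand-ins for the real translation service and fine-tuned RoBERTa model.

```python
from typing import Callable, Sequence

# Hypothetical type aliases: a translator maps Romanian text to English,
# a classifier maps English text to a label such as "positive"/"negative".
Translator = Callable[[str], str]
Classifier = Callable[[str], str]


def translate_then_classify(
    queries: Sequence[str], translate: Translator, classify: Classifier
) -> list[str]:
    """Translate each Romanian query to English, then classify the English text."""
    return [classify(translate(q)) for q in queries]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy stand-ins (NOT the thesis models): a lookup "translator" and a keyword
# "classifier", used only to exercise the pipeline end to end.
TOY_RO_EN = {
    "Îmi place mult crema nouă!": "I really like the new cream!",
    "Comanda mea a întârziat din nou.": "My order was late again.",
}


def toy_translate(text: str) -> str:
    # Fall back to the original text when no translation is known.
    return TOY_RO_EN.get(text, text)


def toy_classify(text: str) -> str:
    # Label as negative if any negative cue word appears, else positive.
    negative_cues = ("late", "bad", "broken")
    return "negative" if any(cue in text.lower() for cue in negative_cues) else "positive"


if __name__ == "__main__":
    queries = list(TOY_RO_EN)
    preds = translate_then_classify(queries, toy_translate, toy_classify)
    print(preds)  # ['positive', 'negative']
    print(accuracy(preds, ["positive", "negative"]))  # 1.0
```

Keeping the translator and classifier as plain callables lets the real backends be swapped in without changing the pipeline. For instance, `translate` could wrap the Cloud Translation Python client (`google.cloud.translate_v2.Client().translate(text, target_language="en")["translatedText"]`) and `classify` could wrap a Hugging Face `transformers` text-classification pipeline; which RoBERTa checkpoint the thesis used is not stated here. Under this metric, 90 correct predictions out of 100 queries gives the reported 90% accuracy.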

The methodology is explained in more detail in my paper: Budala, C., Multilingual Sentiment Analysis and Intent Classification in the Romanian language: an approach for enhanced Corporate Consumer Engagement on Social Media. 2024. Please cite if you use this method.

This research was part of my Bachelor's thesis at the Bucharest University of Economic Studies (ASE).