
ashishlamsal/sentiment-analysis


Sentiment Analysis for Nepali Text

In this project, we used MuRIL (Multilingual Representations for Indian Languages), a multilingual BERT model, to perform sentiment analysis on Nepali text.


About The Project


Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media, and to healthcare materials, for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts, where authors typically express their opinions and sentiment less explicitly.

Although a considerable amount of work has been carried out for other languages, very little has been done for Nepali. The major objective of this project is to perform sentence-level sentiment analysis of Nepali text and to carry out exploratory data analysis (EDA) on the available dataset.


Dataset

Source: the NepCOV19Tweets dataset, with 32,824 tweets in total:

  • positive class: 14,823 samples
  • neutral class: 4,591 samples
  • negative class: 13,410 samples
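
The counts above show that the neutral class is much smaller than the other two. As a minimal sketch (not taken from the project's code), the class shares can be computed directly from the reported counts. The inverse-frequency weights are one common way to compensate for imbalance during fine-tuning; they are an assumption for illustration, and the project does not state that it uses them.

```python
# Class counts as reported for the NepCOV19Tweets dataset above.
counts = {"positive": 14823, "neutral": 4591, "negative": 13410}

total = sum(counts.values())  # matches the reported 32,824 tweets

# Share of each class in the dataset.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights: total / (num_classes * class_count).
# Illustrative only; the project does not state that it uses class weights.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label:<8} share={shares[label]:.1%} weight={weights[label]:.2f}")
```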

Model

For this project, we used a deep-learning approach based on the MuRIL architecture. MuRIL (Multilingual Representations for Indian Languages) is a BERT model pre-trained on 17 Indian languages, including Nepali, and their transliterated counterparts. It uses the BERT base architecture, pre-trained from scratch on the Wikipedia, Common Crawl, PMINDIA, and Dakshina corpora. We then fine-tuned the model on the Nepali COVID-19 tweets dataset for sentiment analysis.
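
The setup described above can be sketched with the Hugging Face `transformers` library. This is a hedged illustration, not the project's training code: `google/muril-base-cased` is the public MuRIL checkpoint, while the label order in `LABELS` and the helper names `load_classifier` and `logits_to_prediction` are assumptions. The classification head created here is randomly initialised and would still need fine-tuning on the tweets dataset.

```python
import math

MODEL_ID = "google/muril-base-cased"  # public MuRIL checkpoint on the Hugging Face Hub
# Assumed label order; the fine-tuned model's actual mapping may differ.
LABELS = ["negative", "neutral", "positive"]


def load_classifier(model_id: str = MODEL_ID):
    """Load the tokenizer and a MuRIL encoder with a fresh 3-way classification head."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=len(LABELS)
    )
    return tokenizer, model


def logits_to_prediction(logits):
    """Turn one row of raw logits into (label, probability) via a stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

With PyTorch installed, the logits for a single sentence could be obtained as `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()` and passed to `logits_to_prediction`.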

Installation

Step 1: Clone the project

git clone https://github.com/ashishlamsal/sentiment-analysis.git

Step 2: Install and Run Backend Application

cd .\backend
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
python main.py

Note: You need to place the fine-tuned MuRIL model in \backend\ml\sentiment-model\3\.

Step 3: Install and Run Frontend Application

cd .\frontend
yarn install

Create a .env file inside the frontend directory and add the following environment variable:

VITE_APP_BASE_URL=http://localhost:8000/run/predict

Alternatively, if you are running the Gradio backend application, use the following value instead:

VITE_APP_BASE_URL=http://127.0.0.1:7860/run/predict

Finally, run the frontend application:

yarn run dev

Step 4: Open the application in a browser

http://127.0.0.1:5173/
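
Once the backend is running, the endpoint configured in VITE_APP_BASE_URL can also be exercised without the frontend. The sketch below assumes the backend accepts the JSON envelope used by Gradio 3.x `/run/predict` endpoints (`{"data": [...]}` in the request and in the reply). The README does not document the contents of the response, so the function returns the `data` list unchanged. `build_request` and `predict` are hypothetical helper names.

```python
import json
import urllib.request

# Same URL as VITE_APP_BASE_URL above; use port 7860 for the Gradio app.
PREDICT_URL = "http://localhost:8000/run/predict"


def build_request(text: str, url: str = PREDICT_URL) -> urllib.request.Request:
    """Build a POST request carrying the text in a {"data": [...]} envelope."""
    body = json.dumps({"data": [text]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, url: str = PREDICT_URL):
    """Send the text to the backend and return the "data" list from the JSON reply."""
    with urllib.request.urlopen(build_request(text, url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["data"]
```

For example, `predict("यो फिल्म राम्रो छ")` would send one Nepali sentence to the backend started in Step 2.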

Note that the Gradio app inside backend/gradio uses a private model from Hugging Face. To use a private model, create a .env file inside the backend/gradio directory and add the following environment variable:

HUGGINGFACE_TOKEN=<your-huggingface-token>
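
How the Gradio app reads this file is not shown in the README. As a minimal sketch under that caveat, the token could be picked up with a small standard-library parser; in practice a package such as python-dotenv is a common choice. `read_env_file` and `get_hf_token` are hypothetical helper names.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_hf_token(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("HUGGINGFACE_TOKEN")
    if token:
        return token
    if Path(env_path).is_file():
        return read_env_file(env_path).get("HUGGINGFACE_TOKEN")
    return None
```

The returned token would then be passed to whichever Hugging Face call downloads the private model.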


Evaluation

Figures: Classification Report and Metrics.


License

Distributed under the MIT License. See LICENSE for more information.


Contact

Ashish Lamsal, Janak Sharma


Acknowledgments

  • NepCOV19Tweets Dataset

    @article{sitaula2021deep,
    title={Deep learning-based methods for sentiment analysis on Nepali covid-19-related tweets},
    author={Sitaula, Chiranjibi and Basnet, Anish and Mainali, A and Shahi, Tej Bahadur},
    journal={Computational Intelligence and Neuroscience},
    volume={2021},
    year={2021},
    publisher={Hindawi}
    }
    
  • MuRIL: Multilingual Representations for Indian Languages

    @misc{khanuja2021muril,
        title={MuRIL: Multilingual Representations for Indian Languages},
        author={Simran Khanuja and Diksha Bansal and Sarvesh Mehtani and Savya Khosla and Atreyee Dey and Balaji Gopalan and Dilip Kumar Margam and Pooja Aggarwal and Rajiv Teja Nagipogu and Shachi Dave and Shruti Gupta and Subhash Chandra Bose Gali and Vish Subramanian and Partha Talukdar},
        year={2021},
        eprint={2103.10730},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
    }
    

