GitHub - ashishlamsal/sentiment-analysis: Sentiment Analysis for Nepali Text

Sentiment Analysis for Nepali Text

In this project, we used MuRIL (Multilingual Representations for Indian Languages), a multilingual BERT model, to perform sentiment analysis on Nepali text.
View Demo »

About The Project

Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media, and to healthcare materials, for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts, where authors typically express their opinions and sentiment less explicitly.

Although a good deal of sentiment analysis work exists for other languages, very little has been carried out for Nepali. The main objective of this project is to perform sentence-level sentiment analysis on Nepali text and to carry out exploratory data analysis (EDA) on the available dataset.


Dataset

Source: the NepCOV19Tweets dataset, with 32,824 tweets in total.

positive class: 14,823 samples
neutral class: 4,591 samples
negative class: 13,410 samples
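
The class counts above are uneven: the neutral class makes up roughly 14% of the tweets. The following sketch computes each class's share and a set of inverse-frequency class weights. It is only an illustration of the arithmetic; this README does not say whether class weighting was used during fine-tuning, and the function names are hypothetical.

```python
# Class counts as listed in this README for NepCOV19Tweets.
CLASS_COUNTS = {"positive": 14_823, "neutral": 4_591, "negative": 13_410}


def class_shares(counts):
    """Return the fraction of the dataset that falls in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count).

    Assumption: shown for illustration only, not taken from the project code.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, share in class_shares(CLASS_COUNTS).items():
        print(f"{label:>8}: {share:.1%}")
```

With weights like these, errors on the rarer neutral class count for more in a weighted loss than errors on the two larger classes.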

Model

For this project, we used a deep-learning approach based on the MuRIL architecture. MuRIL (Multilingual Representations for Indian Languages) is a BERT model pre-trained on 17 Indian languages and their transliterated counterparts. It uses a BERT base architecture pre-trained from scratch on the Wikipedia, Common Crawl, PMINDIA, and Dakshina corpora, and Nepali is one of the languages covered. We then fine-tuned the model on the Nepali COVID-19 tweets dataset for sentiment analysis.
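
As a rough sketch of the setup described above, the snippet below loads a public MuRIL checkpoint with a three-way classification head and turns output logits into a label. It assumes the Hugging Face `transformers` library and the `google/muril-base-cased` checkpoint; the project's actual training code, framework, and label order are not given in this README, so `LABELS`, `build_classifier`, and `decode` are hypothetical names.

```python
import math

# Assumed label order; the order used by the fine-tuned model is not stated here.
LABELS = ("negative", "neutral", "positive")
# Public MuRIL checkpoint on the Hugging Face Hub (assumed starting point).
MODEL_NAME = "google/muril-base-cased"


def build_classifier(num_labels=len(LABELS)):
    """Load MuRIL with a new, randomly initialized classification head."""
    # Imported lazily so the helpers below run without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=num_labels
    )
    return tokenizer, model


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map one row of classifier logits to (label, probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

The classification head returned by `build_classifier` is untrained, so it only becomes useful after fine-tuning on labeled tweets.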

Installation

Step 1: Clone the project

git clone https://github.com/ashishlamsal/sentiment-analysis.git

Step 2: Install and Run Backend Application

cd .\backend
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
python main.py

Note: Place the fine-tuned MuRIL model in \backend\ml\sentiment-model\3\.

Step 3: Install and Run Frontend Application

cd .\frontend
yarn install

Create a .env file inside the frontend directory and add the following environment variable:

VITE_APP_BASE_URL=http://localhost:8000/run/predict

Alternatively, if you are running the Gradio backend application, use the following value instead:

VITE_APP_BASE_URL=http://127.0.0.1:7860/run/predict

Finally, run the frontend application:

yarn run dev

Step 4: Open the application in browser

http://127.0.0.1:5173/
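
The backend can also be exercised without the frontend by posting to the prediction URL configured above. The sketch below uses only the Python standard library. It assumes the endpoint accepts the JSON body used by Gradio's `/run/predict` route, `{"data": [text]}`; the response shape is not documented in this README, so the parsed JSON is returned as-is. `build_payload` and `predict` are hypothetical helper names.

```python
import json
from urllib import request


def build_payload(text):
    """Encode one input sentence as a Gradio-style request body (assumed format)."""
    return json.dumps({"data": [text]}).encode("utf-8")


def predict(text, base_url="http://localhost:8000/run/predict", timeout=10):
    """POST the text to the prediction endpoint and return the parsed JSON reply."""
    req = request.Request(
        base_url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For the Gradio app, pass `base_url="http://127.0.0.1:7860/run/predict"` instead.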

Note that the Gradio app inside backend/gradio uses a private model from Hugging Face. To use a private Hugging Face model, create a .env file inside the backend/gradio directory and add the following environment variable:

HUGGINGFACE_TOKEN=<your-huggingface-token>
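
One way such a token could be read at startup is sketched below with the standard library only. How the Gradio app actually loads the variable is not shown in this README (it may well use a package such as python-dotenv), so `parse_env` and `load_token` are hypothetical helpers.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_token(path=".env", name="HUGGINGFACE_TOKEN"):
    """Prefer the process environment, then fall back to the .env file."""
    if name in os.environ:
        return os.environ[name]
    try:
        with open(path, encoding="utf-8") as fh:
            return parse_env(fh.read()).get(name)
    except FileNotFoundError:
        return None
```

Keep the .env file out of version control so the token is not committed.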


Evaluation


License

Distributed under the MIT License. See LICENSE for more information.


Contact


Ashish Lamsal	Janak Sharma


Acknowledgments

NepCOV19Tweets Dataset

@article{sitaula2021deep,
    title={Deep learning-based methods for sentiment analysis on Nepali covid-19-related tweets},
    author={Sitaula, Chiranjibi and Basnet, Anish and Mainali, A and Shahi, Tej Bahadur},
    journal={Computational Intelligence and Neuroscience},
    volume={2021},
    year={2021},
    publisher={Hindawi}
}

MuRIL: Multilingual Representations for Indian Languages

@misc{khanuja2021muril,
    title={MuRIL: Multilingual Representations for Indian Languages},
    author={Simran Khanuja and Diksha Bansal and Sarvesh Mehtani and Savya Khosla and Atreyee Dey and Balaji Gopalan and Dilip Kumar Margam and Pooja Aggarwal and Rajiv Teja Nagipogu and Shachi Dave and Shruti Gupta and Subhash Chandra Bose Gali and Vish Subramanian and Partha Talukdar},
    year={2021},
    eprint={2103.10730},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
