Contextual vs Non-Contextual Word Embeddings For Hindi NER With WebApp


This repo contains all the code and the dataset for the research paper, "Analysis Of Contextual and Non-Contextual Word Embedding Models For Hindi NER With Web Application For Data Collection".

If you use any part of the resources provided in this repo, kindly cite the following:
Barua, A., Thara, S., Premjith, B. and Soman, K.P., 2020, December. Analysis of Contextual and Non-contextual Word Embedding Models for Hindi NER with Web Application for Data Collection. In International Advanced Computing Conference (pp. 183-202). Springer, Singapore.
Thank you!

Abstract :

Named Entity Recognition (NER) is the process of taking a string and identifying relevant proper nouns in it. In this paper we report the development of a Hindi NER system, in Devanagari script, using various embedding models. We categorize embeddings as contextual and non-contextual, and further compare them inter- and intra-category. Under the non-contextual category, we experiment with Word2Vec and FastText, and under the contextual category, we experiment with BERT and its variants, viz. RoBERTa, ELECTRA, CamemBERT, Distil-BERT, XLM-RoBERTa. For non-contextual embeddings, we use five machine learning algorithms, namely Gaussian NB, Adaboost Classifier, Multi-layer Perceptron classifier, Random Forest Classifier, and Decision Tree Classifier, to develop ten Hindi NER systems: one per classifier with FastText and one per classifier with Gensim Word2Vec word embeddings. These models are then compared with Transformer-based contextual NER models using BERT and its variants. A comparative study among all these NER models is made. Finally, the best of all these models is used to build a web app that takes a Hindi text of any length, returns NER tags for each word, and collects feedback from the user about the correctness of the tags. This feedback aids our further data collection.
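For readers who want a feel for the non-contextual pipeline described above, here is a minimal, hypothetical sketch that pairs Gensim Word2Vec token vectors with scikit-learn's Random Forest classifier for per-token tag prediction (gensim 4.x API assumed). The toy sentences and BIO-style tags are placeholders for illustration only; the feature construction and hyperparameters used in this repo may differ.

```python
# Minimal sketch: per-token Word2Vec features + RandomForest for NER tagging.
# The sentences and BIO-style tags below are hypothetical placeholders, not the paper's dataset.
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier

# Toy tokenized Hindi sentences with one NER tag per token (illustrative only).
sentences = [
    ["राम", "दिल्ली", "गया"],
    ["सीता", "मुंबई", "में", "रहती", "है"],
]
tags = [
    ["B-PER", "B-LOC", "O"],
    ["B-PER", "B-LOC", "O", "O", "O"],
]

# Train a small Word2Vec model on the tokenized corpus (gensim 4.x keyword names).
w2v = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

# Build a per-token dataset: the word vector is the feature, the NER tag is the label.
X = [w2v.wv[token] for sent in sentences for token in sent]
y = [tag for sent_tags in tags for tag in sent_tags]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Tag a new sentence token by token (unseen words would need a fallback vector in practice).
test_sentence = ["राम", "मुंबई", "गया"]
print(list(zip(test_sentence, clf.predict([w2v.wv[t] for t in test_sentence]))))
```

The same per-token feature matrix can be reused with the other four classifiers mentioned in the abstract (Gaussian NB, Adaboost, MLP, Decision Tree) simply by swapping the estimator, and with FastText vectors in place of Word2Vec.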

Keywords :

Gaussian NB · Adaboost Classifier · Multi-layer Perceptron classifier · Random Forest Classifier · Decision Tree Classifier · Gensim Word2Vec · FastText · Transformer · BERT · RoBERTa · ELECTRA · CamemBERT · Distil-BERT · XLM-RoBERTa

Authors :

Aindriya Barua, Thara.S, Premjith B and Soman KP

Issue / Want to Contribute ? :

Open a new issue or create a pull request in case you are facing any difficulty with the code base or want to contribute to it.


About

Non-contextual: Word2Vec, FastText. Contextual: BERT, RoBERTa, ELECTRA, CamemBERT, Distil-BERT, XLM-RoBERTa. Analyzed these embedding models, used the best one to build a Flask web app for Hindi NER and data collection from user feedback, deployed on AWS.
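As a rough sketch of how the web app piece might be wired up, the Flask snippet below exposes one endpoint that tags a submitted Hindi text and another that records user feedback. The route names, the tag_sentence stub, and the on-disk feedback log are assumptions made for illustration; they do not necessarily match the app in this repo or its AWS deployment.

```python
# Minimal Flask sketch of an NER-tagging + feedback-collection service.
# Route names, the tag_sentence stub, and feedback.jsonl are illustrative assumptions.
import json
from flask import Flask, request, jsonify

app = Flask(__name__)

def tag_sentence(text):
    # Placeholder tagger: a real app would load the best model from the comparison
    # and return one NER tag per token.
    return [{"token": tok, "tag": "O"} for tok in text.split()]

@app.route("/ner", methods=["POST"])
def ner():
    # Expects JSON like {"text": "राम दिल्ली गया"} and returns per-token tags.
    text = request.get_json(force=True).get("text", "")
    return jsonify({"tags": tag_sentence(text)})

@app.route("/feedback", methods=["POST"])
def feedback():
    # Appends user corrections to a local file so they can grow the dataset later.
    with open("feedback.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(request.get_json(force=True), ensure_ascii=False) + "\n")
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```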
