Indic Language NLP 🌐📊📝

Indic Language NLP is a Python toolkit for natural language processing (NLP) on Indic languages. It provides tools and models for language identification 🌐 and transliteration, with named entity recognition 📊, part-of-speech tagging 📝, and sentiment analysis 🤔 in progress. The repository contains code samples, documentation, and data resources for building and customizing these models for different languages and domains.

Features

The main features of Indic Language NLP are:

Language Identification 🌐

Indic Language NLP includes models that identify the language of each word in an input text. Supported languages currently include Hindi, Bengali, Tamil, and Telugu, among others.
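To illustrate what word-by-word tagging looks like, here is a minimal sketch that labels each word by its dominant Unicode script. It is not the repository's model: the names `SCRIPT_RANGES`, `script_of_char`, and `tag_words` are hypothetical, and script is only a rough proxy for language (Devanagari, for example, is shared by Hindi, Marathi, and Nepali, and romanized Hindi would be labeled Latin).

```python
from collections import Counter

# Unicode block ranges for a few Indic scripts (assumed subset for illustration).
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
}


def script_of_char(ch):
    """Return the script name for one character, or None if unknown."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None


def tag_words(text):
    """Label every whitespace-separated word with its most frequent script."""
    tags = []
    for word in text.split():
        counts = Counter(s for s in map(script_of_char, word) if s)
        label = counts.most_common(1)[0][0] if counts else "Other"
        tags.append((word, label))
    return tags


if __name__ == "__main__":
    for word, label in tag_words("मैं office जा रहा हूँ"):
        print(word, label)
```

A trained model is still needed to separate languages that share a script or to handle romanized text; this sketch only shows the shape of the per-word output.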

Transliteration 🌟

Indic Language NLP includes models for transliteration, which convert text from one script to another. It currently supports transliteration of Hindi-English code-mixed text and can be extended to other languages.
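As a rough picture of what script conversion involves, the sketch below maps a small set of Devanagari characters to Latin letters using hand-written rules. It is an assumption-laden toy, not the repository's transliteration model: the tables `CONSONANTS`, `MATRAS`, and `VOWELS` cover only a few characters, the function name `transliterate` is hypothetical, and the romanization choices are illustrative rather than a standard scheme.

```python
# Tiny, incomplete character tables (assumed for illustration only).
CONSONANTS = {"क": "k", "ग": "g", "त": "t", "न": "n", "म": "m",
              "र": "r", "ल": "l", "स": "s", "ह": "h"}
MATRAS = {"ा": "aa", "ि": "i", "ी": "ee", "ु": "u", "े": "e", "ो": "o"}
VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "ए": "e"}
VIRAMA = "्"  # suppresses the inherent vowel of the preceding consonant


def transliterate(word):
    """Convert a Devanagari word to Latin letters with simple rules."""
    out = []
    inherent = False  # True while the last consonant still carries its implicit 'a'
    for ch in word:
        if ch in CONSONANTS:
            if inherent:
                out.append("a")
            out.append(CONSONANTS[ch])
            inherent = True
        elif ch in MATRAS:
            # A vowel sign replaces the inherent vowel.
            out.append(MATRAS[ch])
            inherent = False
        elif ch == VIRAMA:
            inherent = False
        else:
            if inherent:
                out.append("a")
            # Independent vowels are mapped; anything unknown passes through.
            out.append(VOWELS.get(ch, ch))
            inherent = False
    if inherent:
        out.append("a")
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("नमस्ते"))
```

Rule tables like this break down quickly on real Hindi, for example because the final inherent vowel is often dropped in speech, which is one reason learned models are used for code-mixed text.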

Named Entity Recognition (WIP🏗️) 📊

Indic Language NLP includes models for named entity recognition (NER), which can identify and classify entities such as people, organizations, locations, and more in a given input text. It supports several Indic languages and can be trained on custom datasets for specific domains.

Part-of-Speech Tagging (WIP🏗️) 📝

Indic Language NLP includes models for part-of-speech (POS) tagging, which can identify and tag the parts of speech in a given input text. It supports several Indic languages and can be trained on custom datasets for specific domains.

Sentiment Analysis (WIP🏗️) 🤔

Indic Language NLP includes models for sentiment analysis, which can classify the sentiment of a given input text as positive, negative, or neutral. It supports several Indic languages and can be trained on custom datasets for specific domains.

Models and Datasets 🧰

Indic Language NLP provides pre-trained models that can be used out-of-the-box for several tasks, along with resources for training custom models on specific domains and languages.

The pre-trained models were developed using data provided by NIC (National Informatics Centre), a department of the Indian Government. For this reason, the dataset cannot be publicly shared or distributed, and the models should only be used for research and non-commercial purposes. Users can still train their own models on their own datasets, or fine-tune the pre-trained models for their own domains and languages.

Usage 🚀

Indic Language NLP is a code pipeline for performing natural language processing tasks on Indic languages. The code is provided as notebooks (Hindi_English_Transliteration.ipynb, Hinglish_Script_and_Lang_Detection.ipynb, and Indic_Language_Classification.ipynb). To use one, open it in Google Colab and run the cells in order.

Contributing 🤝

We welcome contributions from the community to improve and extend Indic Language NLP. You can contribute by reporting issues, suggesting new features, or submitting pull requests.

License 🔖

Indic Language NLP is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgments 🙏

Indic Language NLP uses several open-source tools and libraries, including the Hugging Face Transformers library, spaCy, scikit-learn, and more. We would like to acknowledge the contributions of these communities to the field of natural language processing.

