Indic NLP Library
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing tasks in Indian languages. Indian languages share many similarities in script, phonology, and syntax, and this library is an attempt to provide a general solution for the toolsets most commonly required for Indian language text.
The library provides the following functionalities:
- Text Normalization
- Script Information
- Word Segmentation
- Script Conversion
The data resources required by the Indic NLP Library are hosted in a separate repository. Some modules need these resources in order to run. You can download them from the Indic NLP Resources project.
Add the project to the Python Path:
export PYTHONPATH=$PYTHONPATH:<project base directory>/src
Export the path to the Indic NLP Resources directory:
export INDIC_RESOURCES_PATH=<path to Indic NLP resources>
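The two exports above can be checked from Python before any library module is loaded. The following is only a minimal sketch using the standard library: the function name `check_indic_nlp_setup` is hypothetical and not part of the Indic NLP Library, and the test for a path entry ending in `src` is a rough heuristic based on the directory layout described above.

```python
import os
import sys


def check_indic_nlp_setup(environ=None, search_path=None):
    """Return a list of setup problems; an empty list means nothing was found.

    Hypothetical helper, not part of the Indic NLP Library.
    """
    environ = os.environ if environ is None else environ
    search_path = sys.path if search_path is None else search_path
    problems = []

    # INDIC_RESOURCES_PATH must be set and point at an existing directory.
    resources = environ.get("INDIC_RESOURCES_PATH")
    if not resources:
        problems.append("INDIC_RESOURCES_PATH is not set")
    elif not os.path.isdir(resources):
        problems.append("INDIC_RESOURCES_PATH is not a directory: " + resources)

    # Assumption: the project was added as "<project base directory>/src",
    # so at least one Python path entry should end in a "src" component.
    has_src = any(
        os.path.basename(os.path.normpath(entry)) == "src"
        for entry in search_path
        if entry
    )
    if not has_src:
        problems.append("no 'src' directory found on the Python path")

    return problems


if __name__ == "__main__":
    for problem in check_indic_nlp_setup():
        print("warning:", problem)
```

Running the script in the same shell where the exports were made prints one warning per detected problem and nothing if both settings look plausible.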
- Python API: See the project's IPython Notebook for examples
- Command-line interface: The command-line interface is documented on the project website
Anoop Kunchukuttan ( firstname.lastname@example.org )
0.3 : 21 Oct 2014: Supports morph-analysis for Indian languages
0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages
0.1 : 12 Mar 2014: Initial version. Supports text normalization.
Copyright Anoop Kunchukuttan 2013 - present
Indic NLP Library is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Indic NLP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with Indic NLP Library. If not, see http://www.gnu.org/licenses/.