LIMA - The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit
LIMA is a multilingual linguistic analyzer developed by the CEA LIST LVIC laboratory (Vision and Content Engineering Laboratory). LIMA is available under a dual licensing model.
The Free/Libre Open Source (FLOSS) version, available under the Affero General Public License (AGPL), is fully functional, with modules and resources to analyze English and French texts. You can thus use LIMA for any purpose, provided that the software linked to it, or running it through Web services, is also Free software.
The commercial version adds, on the one hand, modules useful to some CEA LIST industrial partners and, on the other hand, the modules and resources needed to analyze the other supported languages (Arabic, Chinese, German, Spanish, etc.). It is available directly from CEA LIST through R&D partnerships, or through our partner ANT'inno, with offers that include support and adaptation to customers' needs.
We welcome external contributions in the form of comments, suggestions, bug reports, bug fixes, resources, etc. However, please note that before merging your contributions, we will ask you to sign a Copyright Assignment Agreement so that the dual licensing model can work properly.
LIMA is composed of the following modules:
- lima_common: common usage libraries;
- lima_linguisticprocessing: linguistic processing libraries;
- lima_linguisticdata: linguistic resources (dictionaries, rules,...);
- lima_pelf: evaluation tools and resources editing tool;
- lima_annoqt: a corpus annotation graphical user interface.
LIMA provides the following features:
- tokenization;
- morphologic analysis, including:
  - full-form dictionaries;
  - hyphenated-word splitting;
  - concatenated-word splitting (e.g. we're);
  - idiomatic expression recognition;
- part-of-speech tagging (two taggers are available: the legacy LIMA tagger, which is slightly less effective but very useful for resource development, and an SVMTool++-based one);
- Named Entity Recognition;
- coreference resolution;
- syntactic analysis (surface rule-based dependency parsing);
- semantic analysis (disambiguation and semantic role labeling);
- manual corpus annotation GUI;
- regression testing;
- evaluation tools.
DOWNLOAD and INSTALLATION
We provide packages for two different Ubuntu GNU/Linux versions and for Microsoft Windows. We also provide instructions for building from source under GNU/Linux.
LIMA has never been tried under Mac OS X, but as it uses only portable libraries and code, it should work. Please report any success or failure!
Most of the available documentation is currently spread across the doc folders of the different modules. These are usually DocBook files. Some are still in French and should be translated soon.
Some information is nevertheless available on the project Wiki:
- The LIMA User Manual;
- Explanation of the Linguistic Processing Steps in LIMA;
- Explanation of the Linguistic Processing Steps Not Included in the AGPL Version of LIMA.
LIMA uses several open source libraries and linguistic resources. See the COPYING file for details.
The Free/Libre Open Source (FLOSS) version of LIMA is available under the Affero General Public License (AGPL). A commercial version exists too.
For any discussion, please use the mailing list.
You can also contact [the LIMA maintainer](mailto:gael DOT de-chalendar AT cea DOT fr) directly.