David Campos edited this page Oct 14, 2016 · 11 revisions


Neji is a flexible and powerful platform for biomedical information extraction from scientific texts, such as patents, publications and electronic health records.

Please use the right menu to access further documentation.

What is new in Neji 2?

  • Neji Web Server
    • Management of annotation services and their dictionaries and machine-learning models
    • Web page with interactive annotation for each service
    • REST API for each service
  • Gimli for machine learning NER training
    • Gimli is now easier to use, with faster training and processing times. Its functionality is integrated into Neji and retains the high accuracy it previously achieved
  • Support for multiple linguistic parsers, covering general text and multiple languages
  • Support for additional input and output formats, including BioC
  • SDK usability improvements
  • Performance improvements
  • Stability improvements

What can you do with Neji?

With Neji you can build text mining processing pipelines for:

  • Rapid creation of REST services and interactive web pages for text mining tasks
  • Concept recognition:
    • Dictionary-based, Machine learning-based and Rule-based
  • Training machine learning models for NER (Named Entity Recognition):
    • Normalization with dictionary matching and Stopword filtering
  • Linguistic parsing:
    • Sentence splitting, Tokenisation, Lemmatisation, Chunking and Dependency parsing
  • Conversion between corpus formats:
    • Input formats: BioC, XML, HTML and Text
    • Output formats: JSON, A1, BC2, Base64, BioC, CoNLL, IeXML, Pipe and PipeExtended
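As a rough illustration of dictionary-based concept recognition with stopword filtering, the Python sketch below matches a toy dictionary against a sentence and writes the result as simplified A1-style standoff lines. It does not use Neji's SDK: the `recognize` and `to_a1` functions, the dictionary entries and the `EX:` identifiers are all hypothetical, and the A1 output is reduced to text-bound lines only.

```python
import re

# Hypothetical toy dictionary: lowercase name -> (identifier, semantic group).
# The "EX:" identifiers are placeholders, not entries from a real resource.
DICTIONARY = {
    "brca1": ("EX:0001", "GENE"),
    "breast cancer": ("EX:0002", "DISO"),
    "cancer": ("EX:0003", "DISO"),
    "can": ("EX:0004", "GENE"),  # ambiguous short name, removed by the stopword list
}
STOPWORDS = {"can", "was", "and"}


def recognize(text, dictionary=DICTIONARY, stopwords=STOPWORDS):
    """Return (start, end, surface, identifier, group) tuples for dictionary matches."""
    # Drop names that are stopwords, then try longer names first so that
    # "breast cancer" is preferred over the shorter "cancer".
    names = sorted((n for n in dictionary if n not in stopwords), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    annotations = []
    for match in pattern.finditer(text):
        identifier, group = dictionary[match.group(0).lower()]
        annotations.append((match.start(), match.end(), match.group(0), identifier, group))
    return annotations


def to_a1(annotations):
    """Serialise annotations as simplified A1-style lines: ID, group, offsets, text."""
    return "\n".join(
        f"T{i}\t{group} {start} {end}\t{surface}"
        for i, (start, end, surface, _identifier, group) in enumerate(annotations, 1)
    )


found = recognize("BRCA1 can be mutated in breast cancer.")
print(to_a1(found))
```

The regular expression only returns non-overlapping matches, so this sketch cannot produce nested annotations; it is meant to show the matching and filtering idea, not a full recogniser.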

Build your processing pipeline

  1. Read documents
    • Raw, XML and BioC formats, supporting PubMed and BioMed Central articles.
  2. Process target data
    • Modules for sentence splitting, tokenization, dependency parsing, concept recognition (dictionary and machine learning), and more.
  3. Get concept tree
    • Concept tree that holds nested and intersecting annotations, each supporting multiple identifiers.
  4. Store information
    • Various known output formats: XML, A1, CoNLL, JSON, and BioC.
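To make steps 3 and 4 more concrete, the Python sketch below shows one possible way to arrange annotations into a tree and store it as JSON. It is not Neji's data model: the `Node` class, `build_tree` function and `EX:` identifiers are hypothetical, and the placement rule (a nested annotation becomes a child, while an intersecting one stays a sibling) is an assumption made for illustration. Annotations are assumed to have distinct spans.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    start: int
    end: int
    text: str
    ids: list = field(default_factory=list)       # one concept may carry several identifiers
    children: list = field(default_factory=list)  # annotations nested inside this span

    def contains(self, other):
        return self.start <= other.start and other.end <= self.end


def build_tree(sentence, annotations):
    """Place each (start, end, ids) annotation under the smallest span that contains it."""
    root = Node(0, len(sentence), sentence)
    # Sort by start offset, longest first, so a parent is inserted before its children.
    for start, end, ids in sorted(annotations, key=lambda a: (a[0], -a[1])):
        node = Node(start, end, sentence[start:end], list(ids))
        parent = root
        while True:
            host = next((c for c in parent.children if c.contains(node)), None)
            if host is None:
                break
            parent = host
        # Annotations that only partially overlap are not contained, so they stay siblings.
        parent.children.append(node)
    return root


sentence = "human BRCA1 gene mutation"
annotations = [
    (6, 11, ["EX:G2"]),           # "BRCA1", nested inside "BRCA1 gene"
    (6, 16, ["EX:G1", "EX:P1"]),  # "BRCA1 gene", with two placeholder identifiers
    (12, 25, ["EX:E1"]),          # "gene mutation", intersects "BRCA1 gene"
]
tree = build_tree(sentence, annotations)
as_json = json.dumps(asdict(tree), indent=2)  # step 4: store the tree as JSON
print(as_json)
```

In this example "BRCA1" ends up as a child of "BRCA1 gene", while "gene mutation" sits beside it at the top level because the two spans only partially overlap.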

Support and consulting


Please contact BMD Software for professional support and consulting services.

Copyright and license

Copyright (C) 2016 BMD Software and University of Aveiro

Neji is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
