David Campos edited this page Oct 13, 2016 · 11 revisions

Neji

Neji is a flexible and powerful platform for biomedical information extraction from scientific texts, such as patents, publications and electronic health records.

Please use the right menu to access further documentation.

What is new in Neji 2?

  • Gimli for machine learning NER training
  • Support for multiple linguistic parsers, covering general text and multiple languages
  • Support for additional input and output formats, including BioC
  • SDK usability improvements
  • Performance improvements
  • Stability improvements

What can you do with Neji?

With Neji you can build processing pipelines for:

  • Concept recognition:
    • Dictionary-based, machine learning-based and rule-based
  • Train machine learning models for NER (Named Entity Recognition):
    • Normalization with dictionary matching and stopword filtering
  • Linguistic parsing:
    • Sentence splitting, tokenization, lemmatization, chunking and dependency parsing
  • Convert between corpora formats:
    • Input formats: BioC, XML, HTML and Text
    • Output formats: JSON, A1, BC2, Base64, BioC, CoNLL, IeXML, Pipe and PipeExtended
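
To make the idea of dictionary-based concept recognition with stopword filtering concrete, here is a minimal Python sketch. It does not use Neji's API (Neji is a separate toolkit with its own SDK); the dictionary entries, identifiers, stopword list and the `find_concepts` function are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical toy dictionary: surface form -> identifier.
# These identifiers are made up for the example and are not real database IDs.
DICTIONARY = {
    "brca1": "GENE:0001",
    "was": "GENE:0002",            # a gene symbol that is also a common English word
    "breast cancer": "DISO:0001",
    "cancer": "DISO:0002",
}

# Hypothetical stopword list: dictionary terms found here are never matched.
STOPWORDS = {"was", "can", "for"}


def find_concepts(text, dictionary, stopwords):
    """Return sorted (start, end, surface, identifier) tuples.

    Matching is case-insensitive and restricted to whole words.
    Overlapping hits are all kept, so nested mentions survive.
    """
    hits = []
    for term, identifier in dictionary.items():
        if term in stopwords:
            continue  # stopword filtering: skip ambiguous everyday words
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0), identifier))
    return sorted(hits)


if __name__ == "__main__":
    sentence = "BRCA1 was linked to breast cancer."
    for start, end, surface, identifier in find_concepts(sentence, DICTIONARY, STOPWORDS):
        print(start, end, surface, identifier)
```

Without the stopword filter, the verb "was" in the sentence would be tagged as a gene mention, which is the kind of false positive that stopword filtering is meant to suppress.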

Build your processing pipeline

  1. Read documents
    • Raw, XML and BioC formats, supporting PubMed and BioMed Central articles.
  2. Process target data
    • Modules for sentence splitting, tokenization, dependency parsing, concept recognition (dictionary and machine learning), and more.
  3. Get concept tree
    • A concept tree that holds nested and intersecting annotations, each of which can carry multiple identifiers.
  4. Store information
    • Several widely used output formats: XML, A1, CoNLL, JSON, and BioC.
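
The concept tree in step 3 can be pictured with a small Python sketch. This is an illustration of the general idea only, not Neji's data structure or API: the `Annotation` class, the `relation` and `build_tree` functions, and the identifiers are hypothetical. The sketch assumes character offsets with an exclusive end, attaches a span to the most recently opened span that fully contains it, and keeps intersecting spans as siblings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    start: int                      # character offset, inclusive
    end: int                        # character offset, exclusive
    text: str
    identifiers: List[str]          # one annotation may carry several identifiers
    children: List["Annotation"] = field(default_factory=list)


def contains(outer, inner):
    """True when inner lies completely inside outer."""
    return outer.start <= inner.start and inner.end <= outer.end


def relation(a, b):
    """Classify two spans as 'disjoint', 'nested' or 'intersected'."""
    if a.end <= b.start or b.end <= a.start:
        return "disjoint"
    if contains(a, b) or contains(b, a):
        return "nested"
    return "intersected"


def build_tree(annotations):
    """Nest contained spans under their container; return the top-level spans."""
    roots, stack = [], []
    # Longer spans come first when several spans share a start offset.
    for ann in sorted(annotations, key=lambda a: (a.start, -a.end)):
        # Drop open spans that do not fully contain the current one.
        while stack and not contains(stack[-1], ann):
            stack.pop()
        (stack[-1].children if stack else roots).append(ann)
        stack.append(ann)
    return roots


if __name__ == "__main__":
    # Offsets refer to the text "breast cancer gene"; identifiers are made up.
    a = Annotation(0, 13, "breast cancer", ["DISO:0001", "DISO:0003"])
    b = Annotation(7, 13, "cancer", ["DISO:0002"])
    c = Annotation(7, 18, "cancer gene", ["GENE:0005"])
    for root in build_tree([a, b, c]):
        print(root.text, [child.text for child in root.children])
```

In this example "breast cancer" and "cancer gene" intersect, so both stay at the top level, while "cancer" is nested. A real implementation would need a policy for spans contained in more than one intersecting parent; this sketch simply picks the last one opened.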

Support and consulting

Please contact BMD Software for further support and consulting services.

Copyright and license

Copyright (C) 2016 BMD Software and University of Aveiro

Neji is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.