nikolamilosevic86/TabInOut

TabInOut (Table Information Out) - Framework for information extraction from tables

TabInOut is a framework for information extraction from tables and a GUI tool for generating information extraction rules from tables in the literature. The tool depends on TableDisentangler and forms the second step in the extraction pipeline. First, tables are processed, disentangled, and annotated using the TableDisentangler tool. TabInOut then uses the database created by TableAnnotator, together with the functional and structural annotations produced by TableDisentangler, to extract information from the tables. It also creates an additional table in the MySQL database, where it stores the extracted information.

The framework consists of:

  • Methodology and recipe for information extraction from tables
  • A language for describing the syntax of cell content and assigning values to parts of the cell content
  • A GUI wizard that makes describing an information extraction task easy

For more information, see the project's GitHub Wiki.

We are currently working on a paper that will present the methodology of TabInOut. It is based on a case study and a hybrid approach already presented at the BIOSTEC and BelBi conferences. The relevant papers we have published are listed below.

The project is part of my PhD project, funded by EPSRC and AstraZeneca.

The main application (the Wizard) is located in the Wizard folder. You can run it by starting the TkGUIFirstScreen.py file. Alternatively, you can start the TabInOut wizard by running TableInOutStarter.sh from the main directory.
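
As a rough illustration of the first way of starting the wizard, the sketch below builds and runs the command from Python. The folder and file names come from the text above; the helper names `wizard_command` and `launch_wizard` are hypothetical and not part of TabInOut, and the sketch assumes that the interpreter running it is also the one TabInOut's dependencies are installed for.

```python
import subprocess
import sys
from pathlib import Path

# Folder and file names as given in this README.
WIZARD_DIR = "Wizard"
WIZARD_SCRIPT = "TkGUIFirstScreen.py"


def wizard_command(repo_root="."):
    """Build the command that starts the wizard's first screen."""
    script = Path(repo_root) / WIZARD_DIR / WIZARD_SCRIPT
    # Assumption: the current interpreter is suitable for running TabInOut.
    return [sys.executable, str(script)]


def launch_wizard(repo_root="."):
    """Run the wizard and return its exit code, or None if the script is missing."""
    command = wizard_command(repo_root)
    if not Path(command[1]).is_file():
        print(f"Wizard script not found: {command[1]}")
        return None
    # Blocks until the GUI window is closed.
    return subprocess.run(command).returncode
```

Calling `launch_wizard(".")` from the main directory of a checkout would start the GUI. If the script is not found at the expected path, the function prints a message and returns `None` instead of raising.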

Relevant publications:

User guide

For more information on how to use and run TabInOut, please check our User Guide.