Installation and running instructions

Nikola Milosevic edited this page Jun 29, 2017 · 10 revisions

Prerequisites

Annotation requires MetaMap and WordNet to be installed.

Installation

  1. Download the zip file from https://github.com/nikolamilosevic86/TableAnnotator/releases/tag/0.2.1
  2. Unzip the archive.
  3. Check the settings.cfg file and edit it as needed.
  4. Check file_properties.xml (probably no edits are needed if WordNet is installed correctly and you are running on Windows).
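
Before the first run, it can help to confirm that the files named in the steps above are in place. The following is a minimal sketch, not part of TableAnnotator: check_install is a hypothetical helper, and it assumes that TableAnnotator.jar, settings.cfg and file_properties.xml all sit in the top level of the unzipped folder.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not shipped with TableAnnotator.
# Assumption: the jar and both configuration files live in the same folder.

# Print one "missing: <file>" line per absent file in the given folder
# (default: current directory). Returns 0 if all files exist, 1 otherwise.
check_install() {
    dir="${1:-.}"
    missing=0
    for f in TableAnnotator.jar settings.cfg file_properties.xml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_install with the path of the unzipped folder lists any file that still needs to be put in place. It does not validate the contents of either configuration file.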

Running

First, start MetaMap and DBPedia if you need tagging by them (this can be changed in the settings.cfg file).

Command to run on the DailyMed data set:

java -jar TableAnnotator.jar DrugLabelSmall\prescription dailymed makestats -compexclassify -doie -ld -databasesave

Command to run on the PMC data set:

java -jar TableAnnotator.jar PMCDataPath PMC makestats -compexclassify -doie -ld -databasesave

For both the PMC and DailyMed data, we created shell scripts that process the files. The scripts are called:

  • ProcessDailyMed.sh
  • ProcessPMC.sh

It should be possible to run them on both Windows and Linux operating systems.

ProcessPMC.sh takes data from the PMCSmall folder and processes it. Access to the database and other resources should be configured in settings.cfg, and the path to WordNet should be set in file_properties.xml. ProcessDailyMed.sh does the same, except that it takes data from the DrugLabelSmall\prescription folder. Example files are included in the release.
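
The contents of the shipped scripts are not reproduced on this page. As a rough illustration of what such a wrapper could look like, here is a minimal sketch that assumes the script only forwards a data folder and a reader type to the command line shown earlier. The helper names annotator_cmd and run_annotator are hypothetical; the flags are copied from the commands on this page.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; NOT the contents of ProcessPMC.sh or ProcessDailyMed.sh.
# Flags copied from the run commands documented on this page.
FLAGS="makestats -compexclassify -doie -ld -databasesave"

# Print the command line for a data folder ($1) and a reader type ($2),
# for example "PMC" or "dailymed". Useful for checking before a real run.
annotator_cmd() {
    echo "java -jar TableAnnotator.jar $1 $2 $FLAGS"
}

# Run the annotator, but only if the data folder exists.
run_annotator() {
    if [ ! -d "$1" ]; then
        echo "data folder not found: $1" >&2
        return 1
    fi
    # $FLAGS is left unquoted on purpose so it splits into separate arguments.
    java -jar TableAnnotator.jar "$1" "$2" $FLAGS
}
```

With these definitions, run_annotator PMCSmall PMC would mirror the PMC command above, and annotator_cmd PMCSmall PMC prints the same command line without executing it.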

Disentangling PDF documents

We implemented a table disentangling reader that takes HTML files converted from PDFs using BCL easyConverter SDK (version 5). In our search and testing, it proved to be the best available PDF-to-HTML converter. Unfortunately, it is a commercial and quite expensive tool, but it gave us the best results, so we used it. It is possible to write other readers for PDFs converted to XML or HTML.

To run it, use the following command:

java -jar TableAnnotator.jar Path/To/easyPDF2HTMLoutput easyPDF2HTML makestats -compexclassify -doie -ld -databasesave

Referencing

If you use TableAnnotator, you can reference the following papers: