Installation and running instructions
For annotation it is necessary to install MetaMap and WordNet.
- Download the zip file from https://github.com/nikolamilosevic86/TableAnnotator/releases/tag/0.2.1
- Unzip the archive
- Check the settings.cfg file and edit it where necessary
- Check file_properties.xml (it may need no changes if WordNet is installed correctly and you are running on Windows)
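Before the first run, it can help to confirm that the files mentioned above are actually in place. The following is only a minimal sketch: it assumes that TableAnnotator.jar, settings.cfg and file_properties.xml sit side by side in the unzipped release folder, and the function name `check_release_files` is made up for illustration (it is not part of TableAnnotator). It does not validate the contents of either configuration file.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the TableAnnotator release.
# Assumption: the jar and both configuration files live in the same folder.

check_release_files() {
    dir="${1:-.}"   # folder to check, defaults to the current directory
    missing=0
    for f in TableAnnotator.jar settings.cfg file_properties.xml; do
        if [ ! -f "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

Called from the release folder, `check_release_files . && echo "release files found"` prints the message only when all three files exist.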
Start MetaMap and DBPedia first if you need tagging by them (this can be changed in the settings.cfg file).
Command to run on the DailyMed data set:
java -jar TableAnnotator.jar DrugLabelSmall\prescription dailymed makestats -compexclassify -doie -ld -databasesave
Command to run on the PMC data set:
java -jar TableAnnotator.jar PMCDataPath PMC makestats -compexclassify -doie -ld -databasesave
For both the PMC and DailyMed data we created shell scripts that process the files. The scripts are called:
- ProcessDailyMed.sh
- ProcessPMC.sh
It should be possible to run them on both Windows and Linux operating systems.
ProcessPMC.sh takes data from the PMCSmall folder and processes it. Access to the database and other resources must be configured in settings.cfg, and the path to WordNet must be set in file_properties.xml. ProcessDailyMed.sh works the same way, except that it takes data from the DrugLabelSmall\prescription folder. Example files are included in the release.
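The bundled scripts are not reproduced here. As a rough sketch only, a wrapper of this kind could map a data set name to the input folder and reader argument and then call the jar. The function names `tableannotator_cmd` and `process_dataset` and the `DRY_RUN` variable are hypothetical; the folder names and arguments are copied from the commands in this README, and the forward slash in `DrugLabelSmall/prescription` is an assumption made so that the path also works in a Linux shell.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; the real ProcessPMC.sh and ProcessDailyMed.sh
# shipped with the release may look different.

# Print the TableAnnotator command line for a given data set name.
tableannotator_cmd() {
    case "$1" in
        pmc)
            input="PMCSmall"
            reader="PMC"
            ;;
        dailymed)
            input="DrugLabelSmall/prescription"
            reader="dailymed"
            ;;
        *)
            echo "unknown data set: $1" >&2
            return 1
            ;;
    esac
    # Trailing arguments copied verbatim from the commands in this README.
    echo "java -jar TableAnnotator.jar $input $reader makestats -compexclassify -doie -ld -databasesave"
}

# Run the command, or only print it when DRY_RUN=1 (an assumed convenience switch).
process_dataset() {
    cmd=$(tableannotator_cmd "$1") || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        # Unquoted on purpose: the string is split into words; paths must not contain spaces.
        $cmd
    fi
}
```

With `DRY_RUN=1` exported, `process_dataset pmc` only prints the command, which is a cheap way to check the arguments before starting a long run.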
We implemented a table disentangling reader that takes HTML files converted from PDFs using BCL easyConverter SDK (version 5). In our search and testing it proved to be the best available PDF to HTML converter. Unfortunately, it is a commercial and quite expensive tool, but it gave us the best results, so we used it. Readers for the output of other PDF to XML or HTML converters can also be written.
To run the reader on the converted files, use the following command:
java -jar TableAnnotator.jar Path/To/easyPDF2HTMLoutput easyPDF2HTML makestats -compexclassify -doie -ld -databasesave
If you use TableAnnotator, you can cite the following papers:
- Milosevic, N., Gregson, C., Hernandez, R., & Nenadic, G. (2016, June). Disentangling the Structure of Tables in Scientific Literature. In International Conference on Applications of Natural Language to Information Systems (pp. 162-174). Springer International Publishing.
- Milosevic, N., Gregson, C., Hernandez, R., & Nenadic, G. (2016). Extracting patient data from tables in clinical literature: Case study on extraction of BMI, weight and number of patients. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2016) - Volume 5: HEALTHINF (pp. 223-228). ISBN: 978-989-758-170-0.