Measurement Extraction Evaluation


Nithin Krishna edited this page Sep 14, 2016 · 1 revision

Set Up Repository

git clone https://github.com/USCDataScience/polar-deep-insights
cd polar-deep-insights
git fetch
git checkout kyle_extractor

Set Up Extractors

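# Build and start Apache Tika (the server runs in the foreground; use a new terminal for each following server)
cd [PROJECTS_DIR]
git clone https://github.com/apache/tika
cd tika
mvn clean install -Dmaven.test.skip=true
java -jar tika-server/target/tika-server-*.jar -p 9998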
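With the server running, text extraction can be requested over Tika's REST interface: a `PUT` to `/tika` with an `Accept: text/plain` header returns the document's plain text. The Python sketch below assumes the server is reachable at `http://localhost:9998`; the helper names `build_tika_request` and `extract_text` are hypothetical and not part of polar-deep-insights.

```python
import urllib.request

# Assumption: the Tika server started above listens on localhost:9998.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, url: str = TIKA_URL,
                       content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build (without sending) a PUT request that asks Tika for plain text.

    A generic content type lets Tika fall back to its own type detection.
    """
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain", "Content-Type": content_type},
    )


def extract_text(data: bytes, url: str = TIKA_URL, timeout: float = 120.0) -> str:
    """Send the document bytes to the running Tika server and return its text."""
    with urllib.request.urlopen(build_tika_request(data, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `extract_text(open("paper.pdf", "rb").read())` would return the text of a placeholder `paper.pdf` once the server is up. Building the request separately from sending it keeps the request shape inspectable without a live server.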

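# Set up grobid-quantities
cd [PROJECTS_DIR]
git clone https://github.com/kermitt2/grobid
cd grobid
# grobid-quantities is cloned inside the grobid directory
git clone https://github.com/kermitt2/grobid-quantities
# Build and install grobid itself
mvn -Dmaven.test.skip=true clean install
# Run the training and start commands below from grobid-quantities
cd grobid-quantities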

# TRAIN (only once; takes 20-30 minutes)
mvn generate-resources -Ptrain_quantities
mvn generate-resources -Ptrain_units

# START
mvn -Dmaven.test.skip=true jetty:run-war

# Start Stanford CoreNLP from the directory containing its unzipped jars
cd [STANFORD_CORENLP]
java -mx14g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer 9000 -timeout 500000
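The CoreNLP server reads its pipeline options from a JSON-encoded `properties` query parameter and takes the raw text as the POST body. A minimal Python sketch, assuming the server is on `http://localhost:9000/` and that you want to inspect named-entity tags such as `NUMBER`; `build_corenlp_request`, `annotate`, and `entity_tokens` are hypothetical helpers, not functions from this repository.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the CoreNLP server started above listens on localhost:9000.
CORENLP_URL = "http://localhost:9000/"


def build_corenlp_request(text: str, annotators: str = "tokenize,ssplit,pos,lemma,ner",
                          url: str = CORENLP_URL) -> urllib.request.Request:
    """Build a POST request: options go in `properties`, the text in the body."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        f"{url}?{query}",
        data=text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def annotate(text: str, timeout: float = 600.0) -> dict:
    """Send text to the running server and decode its JSON reply."""
    with urllib.request.urlopen(build_corenlp_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def entity_tokens(response: dict) -> list:
    """Return (word, ner) pairs for every token not tagged 'O' (outside)."""
    pairs = []
    for sentence in response.get("sentences", []):
        for token in sentence.get("tokens", []):
            if token.get("ner", "O") != "O":
                pairs.append((token["word"], token["ner"]))
    return pairs
```

With the server running, `entity_tokens(annotate("The ice is 3 km thick."))` lists whatever tokens the server tagged; the sketch makes no assumption about which tags it assigns.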

Evaluate Extractors

cd polar-deep-insights/insight-generator
python evaluate_measurements.py [FILE_WITH_MEASUREMENT_TEXT] > html/measurements.json
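Because the evaluation output is captured by shell redirection, any stray `print` in the script ends up in `measurements.json` and would likely break the results page. A quick sanity check, sketched in Python under the assumption that the file should hold a single JSON document; `check_json_file` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Load a JSON file, raising ValueError with the parse error location.

    Hypothetical helper: it only checks that the file is one valid JSON
    document, not that it matches the structure the results page expects.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"{path} is not valid JSON: {err}") from err
```

Running `check_json_file("html/measurements.json")` from `insight-generator` before starting the web server surfaces a parse error early.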

# In a new terminal window
cd polar-deep-insights/insight-generator/html
# Python 2; on Python 3 use: python3 -m http.server 8000
python -m SimpleHTTPServer 8000
# Open http://localhost:8000 in a browser to see the results
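`SimpleHTTPServer` exists only in Python 2; Python 3 replaced it with the `http.server` module. A minimal Python 3.7+ sketch that serves the `html` directory the same way; the `make_server` and `serve_in_background` names are hypothetical, and binding to `localhost` rather than all interfaces is a deliberate choice here.

```python
import functools
import http.server
import threading


def make_server(directory: str, port: int = 8000,
                host: str = "localhost") -> http.server.ThreadingHTTPServer:
    """Create a threaded HTTP server that serves static files from `directory`."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve_in_background(server: http.server.ThreadingHTTPServer) -> threading.Thread:
    """Run the server loop in a daemon thread; stop it with server.shutdown()."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling `make_server("html").serve_forever()` from `insight-generator` blocks like the original command; `python3 -m http.server 8000 --directory html` is the one-line equivalent.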

Information Retrieval and Data Science (IRDS) research group, University of Southern California.