Docker_Instructions

Mohamed Khemakhem edited this page Feb 10, 2018 · 13 revisions
  1. Get docker
  • For Windows 7 and 8 Users download from this link
  • For Windows 10 and later Users download (from Stable Channel) from this link
  • For macOS Users download (from Stable Channel) from this link
  • For Linux (Ubuntu) Users follow the instructions in this link
  2. Pull the image. Run in your terminal (for Windows 7 and 8, run in the Docker Quickstart Terminal; for Windows 10, run in Command Prompt):
docker pull medkhem/grobid-dictionaries
  3. You need the 'toyData' directory to create dummy models. You can get it from the GitHub repository

  4. We can now run the image with 'toyData' mounted as the container's 'resources' directory, shared between your machine and the container. Whatever you do in one of these directories is applied to both

  • For macOS users:
docker run -v PATH_TO_YOUR_TOYDATA/toyData:/grobid/grobid-dictionaries/resources -p 8080:8080 -it medkhem/grobid-dictionaries bash
  • For Windows users:
docker run -v //c/Users/YOUR_USERNAME/Desktop/toyData:/grobid/grobid-dictionaries/resources -p 8080:8080 -it medkhem/grobid-dictionaries bash
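Both variants are the same command with a different host-side path. A minimal sketch (HOST_TOYDATA is a placeholder; point it at your own toyData copy) that assembles and prints the command, so you can check the bind mount before running it:

```shell
#!/bin/bash
# Sketch: build the docker run command from a host path variable.
# HOST_TOYDATA is a placeholder -- adjust it to your machine.
HOST_TOYDATA="$HOME/Desktop/toyData"

CMD="docker run -v ${HOST_TOYDATA}:/grobid/grobid-dictionaries/resources \
-p 8080:8080 -it medkhem/grobid-dictionaries bash"

# Print the command instead of running it, so the mount can be inspected first.
echo "$CMD"
```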
  5. Create/train the first models with the toyData by running these commands

For Dictionary Segmentation model, run:

> mvn generate-resources -P train_dictionary_segmentation -e

For Dictionary Body Segmentation model, run:

> mvn generate-resources -P train_dictionary_body_segmentation -e

For Lexical Entry model, run:

> mvn generate-resources -P train_lexicalEntries -e

For Form model, run:

> mvn generate-resources -P train_form -e

For Sense model, run:

> mvn generate-resources -P train_sense -e

For the first stage model of processing etymology information (EtymQuote model), run:

> mvn generate-resources -P train_etymQuote -e

For the second stage model of processing etymology information (Etym model), run:

> mvn generate-resources -P train_etym -e
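
The seven training profiles above all follow the same pattern, so they can be driven by one loop. A sketch that prints each command in pipeline order (drop the echo to actually train):

```shell
#!/bin/bash
# Sketch: the seven training profiles listed above, in pipeline order.
PROFILES="train_dictionary_segmentation train_dictionary_body_segmentation \
train_lexicalEntries train_form train_sense train_etymQuote train_etym"

for p in $PROFILES; do
  # Print each command; remove the echo to run the training for real.
  echo "mvn generate-resources -P $p -e"
done
```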
  6. Run the web service to see the output of the dummy models
> mvn -Dmaven.test.skip=true jetty:run-war

You can see the running application in your web browser:

  • For Windows, make sure port 8080 is free; the web application is available at: http://192.168.99.100:8080

  • For macOS, the web application is running at:
    http://localhost:8080

To shut down the server, press Ctrl+C

  7. To generate training data from your own dictionary, copy the PDF directory corresponding to the target model and paste it under the corresponding model location in your toyData.

For Dictionary Segmentation model:

> java -jar /grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -dIn resources/DIRECTORY_OF_YOUR_PDF  -dOut resources -exe createTrainingDictionarySegmentation

For Dictionary Body Segmentation model:

> java -jar /grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -dIn resources/DIRECTORY_OF_YOUR_PDF  -dOut resources -exe createTrainingDictionaryBodySegmentation

For Lexical Entry model:

> java -jar /grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -dIn resources/DIRECTORY_OF_YOUR_PDF  -dOut resources -exe createTrainingLexicalEntry

For Form model:

> java -jar /grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -dIn resources/DIRECTORY_OF_YOUR_PDF  -dOut resources -exe createTrainingForm

For Sense model:

> java -jar /grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -dIn resources/DIRECTORY_OF_YOUR_PDF  -dOut resources -exe createTrainingSense

For EtymQuote model:

> java -jar /grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -dIn resources/DIRECTORY_OF_YOUR_PDF  -dOut resources -exe createTrainingEtymQuote

For Etym model:

> java -jar /grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -dIn resources/DIRECTORY_OF_YOUR_PDF  -dOut resources -exe createTrainingEtym
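
As with training, the seven generation commands differ only in the -exe name, so they can be scripted. A sketch that prints each command (the jar path comes from the commands above; DIRECTORY_OF_YOUR_PDF remains a placeholder):

```shell
#!/bin/bash
# Sketch: generate training data for every model with one loop.
JAR=/grobid/grobid-dictionaries/target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar
PDF_DIR=resources/DIRECTORY_OF_YOUR_PDF

EXES="createTrainingDictionarySegmentation createTrainingDictionaryBodySegmentation \
createTrainingLexicalEntry createTrainingForm createTrainingSense \
createTrainingEtymQuote createTrainingEtym"

for exe in $EXES; do
  # Print each command; remove the echo to generate the data for real.
  echo "java -jar $JAR -dIn $PDF_DIR -dOut resources -exe $exe"
done
```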
  • If you are using macOS, you might need to remove the '.DS_Store' file, which prevents the jar from running (it is mistaken for a PDF)

  • Note also that the choice of pages is important: they should be varied

  • The above commands create training data to be annotated from scratch (files ending in tei.xml). It is also possible to generate pre-annotations using the current model, to be corrected afterwards (this mode is recommended once the model becomes more precise). To do so, the last token of the above commands should include Annotated. For example: createTrainingDictionarySegmentation -> createAnnotatedTrainingDictionarySegmentation
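
The renaming rule above can be applied mechanically; a small sketch (the helper name to_annotated is hypothetical, not part of the tool):

```shell
#!/bin/bash
# Sketch: derive the pre-annotation variant of an -exe name by inserting
# "Annotated" after "create", as described above.
to_annotated() {
  echo "$1" | sed 's/^createTraining/createAnnotatedTraining/'
}

to_annotated createTrainingDictionarySegmentation  # -> createAnnotatedTrainingDictionarySegmentation
to_annotated createTrainingForm                    # -> createAnnotatedTrainingForm
```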

  8. Annotate your files

  9. Move your tei.xml files to your toyData/MODEL_NAME/corpus/tei directory and the rest (except the rng and css files) to your toyData/MODEL_NAME/corpus/raw directory

  10. Retrain the model (see step 5)

  11. Don't forget to put the same files under evaluation: tei.xml files under your toyData/MODEL_NAME/evaluation/tei directory and the rest (except the rng and css files) under your toyData/MODEL_NAME/evaluation/raw directory. If you have carried out your annotation correctly, you should see 100% in the evaluation table displayed at the end of the model training

  12. Run the web app to see the result
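
The file-sorting rule above (tei.xml files to corpus/tei, everything else except rng and css files to corpus/raw) can be sketched as a small shell routine. MODEL_NAME and the 'generated' source directory are placeholders; the helper name sort_training_files is hypothetical:

```shell
#!/bin/bash
# Sketch: sort generated training files into corpus/tei and corpus/raw.
# MODEL_DIR and SRC are placeholders -- adjust to your model directory and
# to wherever the files from the generation step were written.
MODEL_DIR="toyData/MODEL_NAME"
SRC="generated"

sort_training_files() {
  mkdir -p "$MODEL_DIR/corpus/tei" "$MODEL_DIR/corpus/raw"
  for f in "$SRC"/*; do
    [ -e "$f" ] || continue                             # skip if SRC is empty
    case "$f" in
      *.tei.xml)   mv "$f" "$MODEL_DIR/corpus/tei/" ;;  # annotated TEI files
      *.rng|*.css) ;;                                   # schema/style files: leave out
      *)           mv "$f" "$MODEL_DIR/corpus/raw/" ;;  # everything else goes to raw
    esac
  done
}

sort_training_files
```

The same routine, pointed at evaluation/tei and evaluation/raw, covers the evaluation step as well.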
