This repository has been archived by the owner on Jan 13, 2023. It is now read-only.
Home
Honghan Wu edited this page Feb 18, 2018 · 21 revisions
Welcome to the SemEHR wiki!
A typical SemEHR process contains the following steps:
- query a database to get the documents for processing
- NLP processing (e.g., using bio-yodie to annotate UMLS concepts)
- index contextualised concepts into an Elasticsearch instance
- patient-centric indexing to integrate all of a patient's documents and annotations
The easiest way to run the process is to:
- initialise the SemEHR index using the mapping file (do this only ONCE).
- set up the database view from which SemEHR will pull documents.
- edit the process configuration file using this template.
- run the script:
python semehr_processor.py PATH_TO_YOUR_CONFIGURATION
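The one-off index initialisation in the first step above can be sketched roughly as follows. This is a minimal illustration, not SemEHR's own tooling: it assumes the mapping file holds the JSON body that Elasticsearch expects when an index is created with `PUT /<index>`, and the server URL, index name and example field are hypothetical placeholders. The exact shape of the mapping body depends on the Elasticsearch version in use.

```python
import json
import urllib.request


def build_create_index_request(es_url, index_name, mapping_body):
    """Build (but do not send) a PUT request that creates an index.

    es_url, index_name and mapping_body are placeholders; mapping_body is
    assumed to be the parsed JSON content of the SemEHR mapping file.
    """
    url = "{}/{}".format(es_url.rstrip("/"), index_name)
    data = json.dumps(mapping_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical mapping: in practice, load the real mapping file instead,
    # e.g. with json.load(open(PATH_TO_MAPPING_FILE)).
    example_mapping = {
        "mappings": {"properties": {"example_field": {"type": "keyword"}}}
    }
    req = build_create_index_request(
        "http://localhost:9200", "semehr_example", example_mapping
    )
    print(req.get_method(), req.full_url)
    # To actually create the index (run this only ONCE):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode("utf-8"))
```

The request is only built here, so the sketch can be inspected safely before anything is sent to a real server.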
The configuration file includes the following sections:
- env - system variables for running SemEHR
  - java_home - path to the JRE
  - gcp_home - path to GCP (GATE Cloud Processing toolkit)
  - gate_home - path to GATE
  - yodie_path - path to bio-yodie
  - ukb_home - path to UKB (used by bio-yodie to do PageRank computation for disambiguation)
- yodie - settings for running the bio-yodie NLP pipeline on documents
  - "os" - the operating system type; possible values: win, linux
  - "gcp_run_path" - bio-yodie working folder
  - "input_doc_file_path" - (optional) path to a folder containing a text document that lists all document IDs to be processed
  - "thread_num" - number of concurrent threads used to run bio-yodie
  - "memory" - maximum memory for running bio-yodie, e.g., 30g or 600m
  - "config_xml_path" - the full path at which to store the bio-yodie configuration file (the file is generated automatically)
  - "output_file_path" - (optional) path to the folder where JSON dumps of bio-yodie output will be saved
  - "output_destination" - output type of bio-yodie; possible values: sql, json. sql - annotations are saved to a database server; json - annotations are saved as dump files in JSON format.
  - "output_dbconn_setting_file" - path to a JSON configuration file for the database that annotations are saved to; check this example.
  - "output_table" - the name of the table to save annotations to when using sql output, e.g., [kconnect_annotations].
  - "output_concept_filter_file" - (optional) path to a text document containing the concept IDs that should be saved; all other concepts will be discarded. The format is one UMLS CUI per line.
  - "input_source" - where to read documents from; possible values: sql, elasticsearch. The system uses a different input handler for running bio-yodie depending on this value. sql - read from a database; elasticsearch - read from an Elasticsearch server specified in the semehr section of this configuration.
  - "input_dbconn_setting_file" - (optional) configuration of the input document database; only needed when input_source is sql. Check this example.
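To make the env and yodie settings above more concrete, here is a minimal sketch that assembles them into a configuration and runs a few sanity checks. It assumes the configuration file is JSON with `env` and `yodie` as top-level sections; every path and value is a placeholder rather than a SemEHR default, the `check_config` helper is hypothetical and not part of SemEHR, and the memory pattern is only inferred from the two examples given (30g, 600m). The semehr section is not covered.

```python
import json
import re

# All paths and values below are placeholders, not SemEHR defaults.
example_config = {
    "env": {
        "java_home": "/path/to/jre",
        "gcp_home": "/path/to/gcp",
        "gate_home": "/path/to/gate",
        "yodie_path": "/path/to/bio-yodie",
        "ukb_home": "/path/to/ukb",
    },
    "yodie": {
        "os": "linux",
        "gcp_run_path": "/path/to/yodie_work",
        "thread_num": 4,  # assumed to be a number; check the template
        "memory": "30g",
        "config_xml_path": "/path/to/yodie_work/config.xml",
        "output_destination": "sql",
        "output_dbconn_setting_file": "/path/to/output_db.json",
        "output_table": "[kconnect_annotations]",
        "input_source": "sql",
        "input_dbconn_setting_file": "/path/to/input_db.json",
    },
}

# Allowed values as listed in the settings description.
ALLOWED = {
    "os": {"win", "linux"},
    "output_destination": {"sql", "json"},
    "input_source": {"sql", "elasticsearch"},
}


def check_config(config):
    """Hypothetical helper: return a list of problems in the env/yodie sections."""
    problems = []
    for section in ("env", "yodie"):
        if section not in config:
            problems.append("missing section: " + section)
    yodie = config.get("yodie", {})
    for key, allowed in ALLOWED.items():
        value = yodie.get(key)
        if value not in allowed:
            problems.append(
                "{} must be one of {}, got {!r}".format(key, sorted(allowed), value)
            )
    # Assumption: memory is a number followed by g or m, as in 30g or 600m.
    if not re.fullmatch(r"\d+[gm]", str(yodie.get("memory", ""))):
        problems.append("memory should look like 30g or 600m")
    if yodie.get("input_source") == "sql" and not yodie.get("input_dbconn_setting_file"):
        problems.append("input_dbconn_setting_file is needed when input_source is sql")
    if yodie.get("output_destination") == "sql" and not yodie.get("output_table"):
        problems.append("output_table is needed when output_destination is sql")
    return problems


if __name__ == "__main__":
    print(json.dumps(example_config, indent=2))
    print(check_config(example_config))
```

Saved to a file, a configuration like this would then be passed to the script as `python semehr_processor.py PATH_TO_YOUR_CONFIGURATION`.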
- If no concepts are indexed for patients, double-check the index mapping to make sure it matches the mappings defined in the script.
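One way to carry out this check is to compare the mapping from the mapping file with the mapping the live index reports (Elasticsearch returns it from `GET /<index>/_mapping`). The helper below is a hypothetical sketch that only compares two already-parsed JSON objects; the field names in the example are invented for illustration and are not SemEHR's actual fields.

```python
def mapping_differences(expected, actual, path=""):
    """Recursively list the paths where `actual` differs from `expected`.

    expected: mapping parsed from the mapping file
    actual:   mapping reported by the live index
    """
    diffs = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(set(expected) | set(actual)):
            child = path + "/" + str(key)
            if key not in actual:
                diffs.append(child + ": missing from the live index")
            elif key not in expected:
                diffs.append(child + ": not in the mapping file")
            else:
                diffs.extend(mapping_differences(expected[key], actual[key], child))
    elif expected != actual:
        diffs.append(
            "{}: expected {!r}, found {!r}".format(path or "/", expected, actual)
        )
    return diffs


if __name__ == "__main__":
    # Invented field names, for illustration only.
    expected = {
        "properties": {
            "example_patient_id": {"type": "keyword"},
            "example_annotations": {"type": "nested"},
        }
    }
    actual = {"properties": {"example_patient_id": {"type": "text"}}}
    for line in mapping_differences(expected, actual):
        print(line)
```

An empty result means the two mappings agree; any reported path points at a field whose mapping may explain why concepts are not being indexed.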