NOTE NLP Proposal POC
This page documents implementation guidance for sites participating in the NOTE_NLP proposal proof of concept (POC).
See here for the details of the NOTE_NLP proposal
The proposal aims to give NLP-derived data an actionable home within the OHDSI ecosystem. The NOTE_NLP table is currently not integrated into Atlas or any of the OHDSI methods libraries, so NLP-derived data deposited into NOTE_NLP cannot be used in OHDSI network studies without ad hoc workarounds. The proposal creates conventions that enable NLP-derived data to be used more widely within the OHDSI community.
Summary of the NOTE_NLP proposal:
- Add a convention to deposit NLP-derived events in the appropriate clinical event table and set that table's type_concept_id field to the ‘NLP’ type concept.
- Add a polymorphic foreign key enabling the NOTE_NLP table to point to entries/rows in any clinical event table (similar to MEASUREMENT, COST and FACT_RELATIONSHIP).
- Remove the “offset” field and replace it with two integer fields: offset_start and offset_end.
The proposal does not provide prescriptive guidance on the NLP stack to use at a site. It simply outlines conventions a site should adhere to if they seek to deposit NLP-derived data in an OMOP instance.
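The three conventions summarized above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions, not the proposal's DDL: the tables are cut down to a few columns, the column names `event_id` and `event_field_concept_id` for the polymorphic foreign key are assumed for this example, every concept ID is a made-up placeholder, and treating `offset_end` as exclusive is also an assumption. Consult the DDLs referenced on this page for the actual definitions.

```python
import sqlite3

# All concept IDs below are hypothetical placeholders, NOT real vocabulary IDs.
# Look up the real 'NLP' type concept and field concepts in your vocabulary.
NLP_TYPE_CONCEPT_ID = 900001              # stands in for the 'NLP' type concept
MEASUREMENT_ID_FIELD_CONCEPT_ID = 900002  # stands in for "measurement.measurement_id"
WHO_GRADE_IV_CONCEPT_ID = 900003          # stands in for a WHO grade concept

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement (
    measurement_id              INTEGER PRIMARY KEY,
    person_id                   INTEGER NOT NULL,
    measurement_concept_id      INTEGER NOT NULL,
    measurement_date            TEXT    NOT NULL,
    measurement_type_concept_id INTEGER NOT NULL
);
CREATE TABLE note_nlp (
    note_nlp_id            INTEGER PRIMARY KEY,
    note_id                INTEGER NOT NULL,
    lexical_variant        TEXT    NOT NULL,
    offset_start           INTEGER,  -- replaces the single "offset" field
    offset_end             INTEGER,  -- assumed exclusive in this sketch
    event_id               INTEGER,  -- assumed name: row in a clinical event table
    event_field_concept_id INTEGER   -- assumed name: which table/field event_id refers to
);
""")

note_text = "FINAL DIAGNOSIS: Glioblastoma, WHO grade IV."
mention = "WHO grade IV"
start = note_text.index(mention)
end = start + len(mention)

# Convention 1: the NLP-derived event goes in the clinical event table,
# flagged with the 'NLP' type concept.
conn.execute(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    (1, 42, WHO_GRADE_IV_CONCEPT_ID, "2015-06-01", NLP_TYPE_CONCEPT_ID),
)

# Conventions 2 and 3: NOTE_NLP points back at that row and records
# where the mention starts and ends in the note.
conn.execute(
    "INSERT INTO note_nlp VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, 1001, mention, start, end, 1, MEASUREMENT_ID_FIELD_CONCEPT_ID),
)

# Resolve the polymorphic link for rows that point at MEASUREMENT.
row = conn.execute(
    """
    SELECT m.measurement_id, n.offset_start, n.offset_end
    FROM note_nlp AS n
    JOIN measurement AS m ON m.measurement_id = n.event_id
    WHERE n.event_field_concept_id = ?
      AND m.measurement_type_concept_id = ?
    """,
    (MEASUREMENT_ID_FIELD_CONCEPT_ID, NLP_TYPE_CONCEPT_ID),
).fetchone()

print(row, repr(note_text[row[1]:row[2]]))
```

Because the link is polymorphic, a query has to filter on the field concept before joining, as in the `WHERE` clause above; the same NOTE_NLP table could then point at rows in other clinical event tables by using a different field concept.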
The OMOP CDM Working Group has given provisional approval to the NOTE_NLP proposal. The remaining steps before final approval are:
- Generate the DDLs to support the proposed changes. (https://github.com/OHDSI/CommonDataModel/pull/576)
- ETL some data into the new table/fields.
- Create an example cohort that uses the changes in the NOTE_NLP proposal and clearly shows how they can be applied in a real study.
The POC consists of performing the following steps at participating sites:
- Compile the NOTE_NLP supporting DDLs relevant to your site's OMOP instance.
- Run your NLP stack on the POC target variables for glioma brain tumor patients.
- Target variables, with mappings to OMOP standard concepts:
- ICDO3 site
- ICDO3 histology
- WHO Grade
- Target variable constraints:
- Use the 2016 World Health Organization Classification of Tumors of the Central Nervous System.
- The 2021 WHO Classification of Tumors of the Central Nervous System is not used because it is not present in the OHDSI vocabulary.
- Restrict the date range to 1/1/2010 through 1/1/2020 so that the histologies are present within the data.
- Restrict the notes to surgical pathology notes only, covering both inside and outside pathology notes.
- NLP-derived data points from outside pathology reports should carry the date on which the specimen was originally obtained.
- Document and verify that your NLP pipeline adheres to the validation methodology for placing NLP-derived data within an OMOP CDM. (TBD:#7)
- ETL the target variable NLP outputs into your OMOP instance's clinical event tables and NOTE_NLP table, using the proposal's new polymorphic foreign key so that each NOTE_NLP row points to its corresponding row in a clinical event table.
- Optionally use the NOTE_NLP_MODIFIER extension table and the forthcoming generic ETL solution to move data from NOTE_NLP/NOTE_NLP_MODIFIER to the clinical event tables. See NOTE_NLP_MODIFIER.
- Run the cohort characterization SQL/package on your NLP-enriched OMOP instance. (TBD:#6)
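The target variable constraints listed in the steps above (date range, note type, and specimen dating for outside reports) could be applied as a pre-filter before running the NLP stack. The sketch below is hypothetical: the `PathologyNote` fields and both helper functions are invented for illustration, the window bounds are treated as inclusive, the window is applied to the event date rather than the note date, and the fallback to the note date when no specimen date is available is an assumption rather than part of the proposal.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# POC date window from the constraints; inclusive bounds are an assumption.
WINDOW_START = date(2010, 1, 1)
WINDOW_END = date(2020, 1, 1)


@dataclass
class PathologyNote:
    """Hypothetical, simplified view of a note; not an OMOP table definition."""
    note_id: int
    note_date: date
    is_surgical_pathology: bool
    is_outside_report: bool
    specimen_collection_date: Optional[date] = None


def event_date(note: PathologyNote) -> date:
    """Date to attach to NLP-derived data points from this note.

    Outside reports use the date the specimen was originally obtained;
    falling back to the note date when that is missing is an assumption.
    """
    if note.is_outside_report and note.specimen_collection_date is not None:
        return note.specimen_collection_date
    return note.note_date


def in_poc_scope(note: PathologyNote) -> bool:
    """True if the note is a surgical pathology note dated inside the window."""
    if not note.is_surgical_pathology:
        return False
    return WINDOW_START <= event_date(note) <= WINDOW_END


notes = [
    PathologyNote(1, date(2015, 6, 1), True, False),
    PathologyNote(2, date(2015, 7, 1), True, True, date(2015, 5, 20)),
    PathologyNote(3, date(2015, 6, 1), False, False),  # not surgical pathology
    PathologyNote(4, date(2021, 3, 1), True, False),   # outside the date window
]
selected = [n.note_id for n in notes if in_poc_scope(n)]
print(selected)
```

Sites whose note metadata identifies surgical pathology or outside reports differently would replace the two boolean flags with their own lookup.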