NOTE NLP Proposal POC

Michael Gurley edited this page Jun 14, 2023 · 1 revision

This page documents implementation guidance for sites participating in the NOTE_NLP proposal proof of concept (POC).

NOTE_NLP Proposal Overview

See here for the details of the NOTE_NLP proposal

The proposal aims to give NLP-derived data an actionable home within the OHDSI ecosystem. The NOTE_NLP table is not integrated into Atlas or any of the OHDSI methods libraries. Consequently, NLP-derived data deposited into NOTE_NLP cannot be integrated into OHDSI network studies without ad hoc workarounds. This proposal aims to create conventions to enable NLP-derived data to be more widely used within the OHDSI community.

Summary of the NOTE_NLP proposal:

  • Add a convention to deposit NLP-derived events in the appropriate clinical event table and set that table's type_concept_id field to the ‘NLP’ type concept.
  • Add a polymorphic foreign key enabling the NOTE_NLP table to point to entries/rows in any clinical event table (similar to MEASUREMENT, COST and FACT_RELATIONSHIP).
  • Remove the “offset” field and replace it with two integer fields - offset_start and offset_end.
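
As a rough illustration of the last two items, the sketch below models a NOTE_NLP row carrying the two offsets and a polymorphic pointer to a clinical event row. The field names `event_id` and `event_field_concept_id`, the concept ID constants, and the offset convention (0-based start, exclusive end) are assumptions made for this example; the proposal and its DDLs are the authority on the actual names and conventions.

```python
from dataclasses import dataclass

# Hypothetical placeholder: replace with the concept ID that identifies the
# condition_occurrence.condition_occurrence_id field in your vocabulary.
CONDITION_ID_FIELD_CONCEPT_ID = -1


@dataclass
class NoteNlpRow:
    note_nlp_id: int
    note_id: int
    lexical_variant: str
    offset_start: int            # assumed: 0-based index of the first character
    offset_end: int              # assumed: index one past the last character
    event_id: int                # hypothetical name: primary key of the linked event row
    event_field_concept_id: int  # hypothetical name: says which table/field event_id refers to


def locate_mention(note_text: str, mention: str) -> tuple[int, int]:
    """Return (offset_start, offset_end) for the first occurrence of mention."""
    start = note_text.find(mention)
    if start == -1:
        raise ValueError(f"{mention!r} not found in note text")
    return start, start + len(mention)


note_text = "MRI shows a left frontal glioma."
start, end = locate_mention(note_text, "glioma")

row = NoteNlpRow(
    note_nlp_id=1,
    note_id=10,
    lexical_variant="glioma",
    offset_start=start,
    offset_end=end,
    event_id=500,  # illustrative condition_occurrence_id
    event_field_concept_id=CONDITION_ID_FIELD_CONCEPT_ID,
)

# With both offsets stored, the mention can be recovered by slicing the note.
recovered = note_text[row.offset_start:row.offset_end]
```

Storing two integers rather than a single offset value lets a consumer recover the exact text span without re-running the NLP pipeline.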

The proposal does not prescribe which NLP stack a site should use. It only outlines the conventions a site should follow if it seeks to deposit NLP-derived data in an OMOP instance.

The OMOP CDM Working Group has given provisional approval to the NOTE_NLP proposal. The remaining steps before final approval are:

  • Generate the DDLs to support the proposed changes. (https://github.com/OHDSI/CommonDataModel/pull/576)
  • ETL some data into the new table/fields.
  • Create an example cohort that uses the proposed NOTE_NLP changes and clearly demonstrates how they can be used in a real study.

POC Steps

The POC consists of performing the following steps at each participating site:

  • Compile the NOTE_NLP supporting DDLs relevant to your site's OMOP instance.
  • Run your NLP stack on the POC target variables for glioma brain tumor patients.
  • Document and verify that your NLP pipeline adheres to the validation methodology for placing NLP-derived data within an OMOP CDM. (TBD:#7)
  • ETL the target variable NLP outputs into your OMOP instance's clinical event tables and NOTE_NLP table, using the proposal's new polymorphic foreign key to point each NOTE_NLP row to its corresponding row in a clinical event table.
    • Optionally, use the NOTE_NLP_MODIFIER extension table and the forthcoming generic ETL solution to move data from NOTE_NLP/NOTE_NLP_MODIFIER to the clinical event tables. See NOTE_NLP_MODIFIER.
  • Run the cohort characterization SQL/package on your NLP-enriched OMOP instance. (TBD:#6)
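
The ETL step above can be sketched end to end with an in-memory SQLite database. This is a minimal sketch, not the POC's ETL: the tables are heavily simplified, the column names `event_id` and `event_field_concept_id` are hypothetical stand-ins for the proposal's polymorphic foreign key, and all concept IDs are placeholders to be replaced with values from your vocabulary tables.

```python
import sqlite3

# Placeholder concept IDs (hypothetical values, not real vocabulary entries).
NLP_TYPE_CONCEPT_ID = 9001         # stand-in for the 'NLP' type concept
CONDITION_FIELD_CONCEPT_ID = 9002  # stand-in for condition_occurrence.condition_occurrence_id
GLIOMA_CONCEPT_ID = 9003           # stand-in for a glioma condition concept

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (
    condition_occurrence_id   INTEGER PRIMARY KEY,
    person_id                 INTEGER NOT NULL,
    condition_concept_id      INTEGER NOT NULL,
    condition_start_date      TEXT    NOT NULL,
    condition_type_concept_id INTEGER NOT NULL
);
CREATE TABLE note_nlp (
    note_nlp_id            INTEGER PRIMARY KEY,
    note_id                INTEGER NOT NULL,
    lexical_variant        TEXT    NOT NULL,
    offset_start           INTEGER,
    offset_end             INTEGER,
    event_id               INTEGER,  -- hypothetical polymorphic key: row ID
    event_field_concept_id INTEGER   -- hypothetical polymorphic key: target field
);
""")


def load_nlp_condition(conn, person_id, note_id, concept_id, start_date,
                       mention, offset_start, offset_end):
    """Insert an NLP-derived condition and a NOTE_NLP row that points back to it."""
    cur = conn.execute(
        "INSERT INTO condition_occurrence (person_id, condition_concept_id, "
        "condition_start_date, condition_type_concept_id) VALUES (?, ?, ?, ?)",
        (person_id, concept_id, start_date, NLP_TYPE_CONCEPT_ID),
    )
    event_id = cur.lastrowid
    conn.execute(
        "INSERT INTO note_nlp (note_id, lexical_variant, offset_start, offset_end, "
        "event_id, event_field_concept_id) VALUES (?, ?, ?, ?, ?, ?)",
        (note_id, mention, offset_start, offset_end, event_id,
         CONDITION_FIELD_CONCEPT_ID),
    )
    return event_id


event_id = load_nlp_condition(conn, 1, 10, GLIOMA_CONCEPT_ID, "2020-01-01",
                              "glioma", 25, 31)

# Find NLP-derived conditions and trace each one back to its source note.
rows = conn.execute(
    """
    SELECT co.person_id, n.note_id, n.lexical_variant
    FROM condition_occurrence AS co
    JOIN note_nlp AS n
      ON n.event_id = co.condition_occurrence_id
     AND n.event_field_concept_id = ?
    WHERE co.condition_type_concept_id = ?
    """,
    (CONDITION_FIELD_CONCEPT_ID, NLP_TYPE_CONCEPT_ID),
).fetchall()
```

Because the NLP-derived event lives in the clinical event table, a cohort definition can select it like any other condition, while the join through NOTE_NLP preserves provenance back to the note and text span.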