Alexander edited this page Jun 6, 2023 · 3 revisions

 

This page describes the planned maintenance and improvement activities for the OHDSI Standardized Vocabularies. Treat it as a forecast. Below you can find the content of each release and an overview of the planned improvement activities (detailed content to be posted separately).

 

Roadmap 2023 Q1 - 2024 Q2:

Roadmap 2023 - 24

 

Principles

1. Stable cadence

Because most community members refresh the Vocabularies and their data annually or semi-annually, releases follow a twice-yearly cadence. Such a schedule supports higher productivity, more transparent release content, and better version alignment across the community. The two releases (August and February) are timed to the source release schedules. An intermediate release in May 2023 is planned for work already accomplished.

2. Combination of maintenance and process improvement

Vocabulary work balances:

  • Routine maintenance,
  • Automation, usually across concepts and vocabularies of one domain at a time (overhauls, machinery improvements, etc.),
  • Process improvement (e.g., community contribution guidelines or version control).

3. Prioritization based on the needs of the community

The roadmap is based on a continuous assessment of the community's needs, covering both vocabulary maintenance and process improvement.

4. Transparency

The roadmap is made publicly available.

   

Activities per release

The plan for 2023 Q1 - 2024 Q2 includes the refreshes of the following commonly used vocabularies: SNOMED-CT, ICD family, Read, RxNorm, CVX, LOINC, HCPCS, ICD10PCS, MedDRA, MeSH, NAACCR, dm+d, as well as improvement activities tailored to the most commonly reported problems described above.

Table 1 outlines the vocabularies included in each release as per the roadmap above.

 

Table 1. Vocabularies and activities included in each release.

Activity | Vocabulary version and modification | Name
Spring release, May 2023
CVX | refresh (20230222 version) and refactored code | Maria, Timur
dm+d | refresh (20220927 version) and refactored code | Oleg, Timur
HCPCS | improvement + refresh (Apr 2023 version) | Masha, Timur
MeSH | refresh (2022 version) and refactored code | Timur
NAACCR | mapping addition | Vlad, Timur
NDC | refresh (20230319 version) | Oleg
RxNorm | refresh (20230306 version) | Oleg, Timur
RxNorm Extension | refresh (May 2023 version) | Oleg
Smoking hierarchy | mapping addition | Maria, Timur
SPL | refresh (20230319 version) | Oleg
Summer release, August 2023
CPT4 | refresh (Spring 2023 version) | Masha, Timur
LOINC | refresh (2.74 version) | Maria, Timur
NDC | refresh (Aug 2023 version) | Oleg
RxNorm | refresh (Aug 2023 version) | Oleg, Timur
RxNorm Extension | refresh (Aug 2023 version) | Oleg
SPL | refresh (Aug 2023 version) | Oleg
VANDF | refresh (20230306 version) | Oleg, Varvara, Timur
Community contribution guidelines (part 1) | coverage of basic use cases | Anna, Alex, Christian, Timur
Vocabulary Quality System (part 1) | conformance checks publicly available with each release | Alex, Anna, Christian, Timur
Winter release, February 2024
CVX | refresh (Summer-Fall 2023 version) | Maria, Timur
LOINC | refresh (Summer-Fall 2023 version) | Maria, Timur
Read | mapping refresh | Maria, Irina
HCPCS | refresh (Oct 2023 version) | Masha, Timur
ICD10PCS | refresh (2023 version) | Masha, Maria, Timur
MedDRA | improvement + refresh (version 26, Mar 2023) | Mikita, Timur
NDC | refresh (Jan 2023 version) | Oleg
RxNorm | refresh (Dec 2023 version) | Oleg, Timur
RxNorm Extension | refresh (Feb 2023 version) | Oleg, Timur
SPL | refresh (Jan 2023 version) | Oleg
SNOMED overhaul | overhaul | Oleg, Timur
SNOMED UK | refresh (Spring-Summer 2023 version) | Oleg, Timur
SNOMED Int | refresh (Spring 2023 version) |
SNOMED US | refresh (Feb 2023 version) |
ICD | machinery improvement | Irina, Oleg, Timur
ICD9(CM) | mapping improvement | Irina, Oleg
ICD10(CM) | refresh (2022/2023 versions) |
ICD10 (int) | mapping improvement |
ICD10CN (China) | mapping improvement |
ICD10GM (Germ) | refresh (2023 version) |
CIM10 (France) | refresh (2023 version) |
Community contribution guidelines (part 2) | coverage of complex use cases | Anna, Alex, Christian, Timur
Vocabulary Quality System (part 2) | standardized system with more complex assessment | Alex, Anna, Christian, Timur
Summer release, August 2024
ATC | overhaul + refresh (2024 version) | Anna, others tbd
CPT4 | refresh (2024 version) | Masha, Timur
CVX | refresh (2024 version) | Maria, Timur
HCPCS | refresh (April 2024 version) | Masha, Timur
ICD9(CM) | mapping improvement | Maria, Irina
ICD10(CM) | refresh (2023/2024 versions) |
ICD10 (int) | mapping improvement |
ICD10CN (China) | mapping improvement |
ICD10GM (Germ) | refresh (2023/2024 versions) |
CIM10 (France) | refresh (2023/2024 versions) |
LOINC | refresh (2024 version) | Maria, Timur
MedDRA | refresh (2024 version) | Mikita, Timur
NDC | refresh (Aug 2024 version) | Oleg
OMOP Invest Drug | refresh (2024 version) | Oleg, Varvara, Timur
Read | mapping refresh | Maria
RxNorm | refresh (Feb 2024 version) | Oleg, Timur
RxNorm Extension | refresh (Aug 2024 version) | Oleg
SNOMED Int | refresh (Spring 2024 version) | Masha, Timur
SNOMED UK | refresh (Spring-Summer 2024 version) |
SNOMED US | refresh (Feb 2024 version) |
SPL | refresh (Aug 2024 version) | Oleg
VANDF | refresh (2024 version) | Varvara, Timur

 

Improvement activities

Vocabulary-specific overhauls and improvements include:

1. SNOMED overhaul

  • Stable domain and concept class id assignment.
  • Alignment of the validity dates with the source.
  • Fix of replacement relationships (such as “Concept replaced by”) that have no corresponding “Maps to” links, a gap that prevents users from automatically following “Maps to” relationships from non-standard concepts to their standard counterparts.
  • Clean-up of existing legacy “Maps to” relationships originating from “Concept is a possible equivalent to”.
  • De-standardize and map the concepts in Drug and other (Race, Provider) domains to the standard concepts so that they can be effectively used in the sources that use SNOMED-CT (such as CPRD).
  • Split up the pre-coordinated concepts (such as lab test with the results, allergies to the specific substances) and map them over to the respective concepts.
  • Documentation of SNOMED-CT processing, domain assignment and quality assurance.
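
The “Maps to” gap mentioned in the list above can be illustrated with a small sketch. It is not the Vocabulary team's implementation. It only shows what a user currently has to do by hand when a deprecated concept carries a “Concept replaced by” link but no “Maps to” link. The function name `resolve_standard`, the in-memory tuple layout and the concept IDs in the usage note are hypothetical. The relationship names “Maps to” and “Concept replaced by” are the ones used in the text.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# (concept_id_1, relationship_id, concept_id_2); a simplified, assumed
# in-memory stand-in for rows of the concept_relationship table.
Relationship = Tuple[int, str, int]


def resolve_standard(
    concept_id: int,
    relationships: Iterable[Relationship],
    standard_ids: Set[int],
    max_hops: int = 10,
) -> Optional[int]:
    """Follow "Maps to" links, falling back to "Concept replaced by",
    until a standard concept is reached. Returns None if no standard
    concept is found within max_hops or if a cycle is detected.

    Simplification (assumption): only the first target per concept is
    kept, although real mappings can be one-to-many.
    """
    maps_to: Dict[int, int] = {}
    replaced_by: Dict[int, int] = {}
    for source_id, relationship_id, target_id in relationships:
        if relationship_id == "Maps to":
            maps_to.setdefault(source_id, target_id)
        elif relationship_id == "Concept replaced by":
            replaced_by.setdefault(source_id, target_id)

    current = concept_id
    seen: Set[int] = set()
    for _ in range(max_hops):
        if current in seen:
            return None  # cycle, give up
        seen.add(current)
        if current in standard_ids:
            return current
        if current in maps_to:
            current = maps_to[current]
        elif current in replaced_by:
            # The fallback hop that a missing "Maps to" link forces on users.
            current = replaced_by[current]
        else:
            return None
    return None
```

With hypothetical rows `[(1, "Concept replaced by", 2), (2, "Maps to", 3)]` and `standard_ids={3}`, calling `resolve_standard(1, rows, {3})` reaches concept 3 only through the fallback hop. Once the overhaul adds the missing “Maps to” links, a single lookup should be enough.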

 

2. ICD family improvement

  • Mapping re-use across ICD family to identify the discrepancies and similarities across different versions of ICD and improve the consistency of mappings.
  • Incorporation of the mappings provided by SNOMED-CT and other sources.
  • Fix of the source (CIAML) file processing to capture the ICD concepts currently missing.
  • Documentation of the current procedures for mapping and quality assurance.

 

3. MedDRA improvement

  • Design and document a model that would allow MedDRA to be used as both a source and a Classification terminology in the Condition Domain.
  • Development of a system that would allow us to re-use the mappings of various sources (MedDRA-SNOMED initiative, UMLS), build our own based on user needs, annotate them with metadata using SSSOM or other standards, and automatically transform them, using the generated metadata, into both horizontal and hierarchical relationships.
  • Build of “Maps to” relationships from MedDRA to SNOMED.
  • Build of hierarchical relationships between MedDRA and SNOMED.

 

4. ATC overhaul

  • Adopt the data-driven approach of attribute selection (RxNorm and RxNorm Extension attributes for ATC codes) based on the data sources that have ATC codes (Z index, JMDC, others).
  • Identification of discrepancies and similarities between code assignment in different data sources to establish more consistent and accurate mappings from ATC to RxNorm (Ext).
  • Validation of the vocabulary using data-driven approaches (including currently existing comparison for 1:1 matching to Clinical Drug Form and further expansion to comparison of the assignments for Clinical Drug, Branded Drug and 1:many matching).
  • If feasible, incorporation of WHO ATC-drug product links and DDD represented in the machine-readable form.
  • Hierarchy review, fix and documentation.

 

Process improvement activities include:

5. Community contribution guidelines

We divide the guidelines and processes into two parts, with the first part rolled out by the August 2023 release and the second part rolled out by the February 2024 release.

The first part will handle simple use cases such as changing “Maps to”, changing concept names and domains, adding or deprecating relationships or adding small vocabularies with no internal hierarchy. We will establish the pipeline for incoming requests with clear communication on when they will be incorporated. The pipeline involves submitting a request on GitHub with filled templates that follow stage tables’ structure to facilitate incorporation, instructions on how to fill them and quality assurance checks that need to be performed on the requester side. GitHub requests will facilitate version control and serve for educational purposes for other contributors. We will use existing requests that have not been fulfilled (such as ethnicity codes provided by the Health Equity WG, NIH provider codes and vocabulary, etc.) for dry runs and illustrative purposes.

The second part will target more complex use cases, such as adding new vocabularies and changing hierarchies. These require more comprehensive approaches (a common development environment, automated scripts for quality assurance, maintenance scripts if applicable) that build into a system for community contribution. Potential use cases for dry runs include ICPC2, which consists of adding a vocabulary, new codes and mappings to existing standard concepts.

As we have a standardized system for incorporating drug vocabularies (which, as opposed to other domains, influence standard vocabularies [RxNorm Extension] and therefore require more robust QA), drug vocabularies will be separated into a distinct chapter in the guidelines following the existing guides for contributors.

Community contribution guidelines will also include the guidance and best practices on how to locally add new concepts (in the form of 2 billion codes) and relationships (in the form of source_to_concept_map or concept_relationship) or modify relationships to enable research in those organizations and teams that require such modifications before they are released.
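
As a rough illustration of the local additions described above, the sketch below assigns concept IDs from the range above 2 billion to local codes and builds matching source_to_concept_map rows. It is a minimal sketch under assumptions, not the guidance the team will publish. The field names follow the OMOP CDM source_to_concept_map table. The helper `build_local_mappings`, the starting ID and the example vocabulary name in the usage note are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Iterable, List, Optional, Tuple

# Assumption: local concept IDs start just above 2 billion.
LOCAL_CONCEPT_ID_START = 2_000_000_001


@dataclass(frozen=True)
class SourceToConceptMapRow:
    """One row shaped like the OMOP CDM source_to_concept_map table."""
    source_code: str
    source_concept_id: int
    source_vocabulary_id: str
    source_code_description: str
    target_concept_id: int
    target_vocabulary_id: str
    valid_start_date: date
    valid_end_date: date
    invalid_reason: Optional[str]


def build_local_mappings(
    local_codes: Iterable[Tuple[str, str, int, str]],
    source_vocabulary_id: str,
    valid_start_date: date = date(1970, 1, 1),
) -> List[SourceToConceptMapRow]:
    """Build mapping rows for local codes.

    Each item of local_codes is a tuple of
    (source_code, description, target_concept_id, target_vocabulary_id).
    Every local code receives the next free ID above 2 billion.
    """
    next_id = count(LOCAL_CONCEPT_ID_START)
    rows = []
    for code, description, target_id, target_vocabulary in local_codes:
        rows.append(
            SourceToConceptMapRow(
                source_code=code,
                source_concept_id=next(next_id),
                source_vocabulary_id=source_vocabulary_id,
                source_code_description=description,
                target_concept_id=target_id,
                target_vocabulary_id=target_vocabulary,
                valid_start_date=valid_start_date,
                valid_end_date=date(2099, 12, 31),
                invalid_reason=None,
            )
        )
    return rows
```

For example, `build_local_mappings([("LOCAL_A", "Hypothetical local code", 123, "SNOMED")], "MY_LOCAL_VOCAB")` returns one row whose `source_concept_id` is 2000000001. The target concept ID 123 is a placeholder, not a real concept.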

The guidelines and approaches will be shared with the committee and subsequently with the community for feedback.

 

6. Vocabulary Quality System

We divide the Vocabulary Quality System into two parts, with the first part rolled out by the August 2023 release and the second part rolled out by the February 2024 release.

The first part (quality control) includes describing the existing procedures and making the documentation publicly available. Each release will also be accompanied by reports on the conformance checks passed and by descriptive statistics (structure of the vocabularies, mapping coverage, gaps in hierarchies, orphan codes and more). It also includes expanding the tests to ensure comprehensive coverage based on the previously reported problems.
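
Two of the descriptive statistics named above, mapping coverage and orphan codes, can be sketched as follows. The definitions used here are assumptions made for illustration, not the definitions the Vocabulary Quality System will adopt. Mapping coverage is taken as the share of non-standard concepts with at least one “Maps to” link, and an orphan is taken as a standard concept that takes part in no hierarchical link. The function names and input shapes are hypothetical.

```python
from typing import Iterable, Set, Tuple


def mapping_coverage(
    non_standard_ids: Set[int],
    maps_to_pairs: Iterable[Tuple[int, int]],
) -> float:
    """Share of non-standard concepts with at least one "Maps to" link.

    maps_to_pairs holds (source_concept_id, target_concept_id) pairs.
    Returns 1.0 for an empty input set, since nothing is left unmapped.
    """
    if not non_standard_ids:
        return 1.0
    mapped = {source for source, _ in maps_to_pairs if source in non_standard_ids}
    return len(mapped) / len(non_standard_ids)


def orphan_concepts(
    standard_ids: Set[int],
    hierarchy_pairs: Iterable[Tuple[int, int]],
) -> Set[int]:
    """Standard concepts that appear in no (child, parent) hierarchy pair."""
    linked: Set[int] = set()
    for child_id, parent_id in hierarchy_pairs:
        linked.add(child_id)
        linked.add(parent_id)
    return standard_ids - linked
```

With hypothetical inputs, `mapping_coverage({1, 2, 3, 4}, [(1, 10), (2, 10), (2, 11)])` gives 0.5, and `orphan_concepts({10, 11, 12}, [(10, 11)])` gives `{12}`.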

The second part (quality management system) includes designing a quality system with more complex completeness and plausibility checks and external validation. A systematic approach needs to be developed, and the existing practices in other ontologies will be taken into consideration. As there is a lack of frameworks (analogous to Kahn’s framework for data quality) for complex systems that harmonize and align multiple ontologies, this part will require more research and collaboration among the experts in the OHDSI community.
