Review multimodal linkage #30

jshoughtaling · 2024-06-03T16:49:12Z

Feedback on PR from @evan-phelps (thanks!!):

Under "What this SOP does not do," there's reference to data linking after sites have submitted data. Out of curiosity, is there specific, planned linking after submission? Is this referring to potential post-submission Privacy-Preserving Record Linkage (PPRL)?

Great question. We have currently done central linking of waveform files to OMOP data, but we expect sites to do more linkage locally as the project progresses. That statement is not referring to PPRL; that topic will likely need to be covered in a different SOP, as it would relate to matching patients across sites in addition to linking their data modes together.

For what it's worth, I disagree with the recommendation to conflate file_id and procedure_occurrence_id values, which introduces an unnecessary and potentially misleading coupling of different concepts. If they are conflated, then at a minimum, I would hope that tools and code are not developed in a way that exploits this value-equivalence across two differently purposed variables/columns. Unfortunately, if it's recommended, then many people will probably write code that requires the value-equivalence between file and procedure occurrence identifiers.

Agreed. I'll soften the wording and emphasize that it's perfectly fine to decouple those identifiers.
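To illustrate what decoupled identifiers look like in practice, here is a minimal sketch. It assumes a hypothetical registry row type (`FileRegistryRow`) that stores file_id and procedure_occurrence_id as two separate columns, and a hypothetical helper (`files_for_procedure`) that links files to a procedure only through the explicit procedure_occurrence_id column, never by assuming the two values are equal.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical registry row: file_id and procedure_occurrence_id are
# separate columns, so neither value is derived from the other.
@dataclass(frozen=True)
class FileRegistryRow:
    file_id: int
    procedure_occurrence_id: int
    file_path: str

def files_for_procedure(registry: List[FileRegistryRow],
                        procedure_occurrence_id: int) -> List[str]:
    """Return file paths linked to a procedure via the explicit link column,
    never by assuming file_id == procedure_occurrence_id."""
    return [row.file_path for row in registry
            if row.procedure_occurrence_id == procedure_occurrence_id]

# Illustrative rows only; paths and IDs are made up.
registry = [
    FileRegistryRow(file_id=1, procedure_occurrence_id=5001, file_path="wave/a.dat"),
    FileRegistryRow(file_id=2, procedure_occurrence_id=5001, file_path="wave/b.dat"),
    FileRegistryRow(file_id=3, procedure_occurrence_id=5002, file_path="img/c.dcm"),
]

print(files_for_procedure(registry, 5001))  # ['wave/a.dat', 'wave/b.dat']
```

A side effect of keeping the columns separate is that several files can point at the same procedure occurrence, which a one-to-one value-equivalence could not express.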

Is the separation of blocks of values for image file identifiers vs. waveform file identifiers necessary? Or is that an artifact of assuming that file id assignments of one might not be "aware" of the other? Put differently, is it required that all file ids draw from a common range of global file identifiers?

The main concern is having identifier values clash once they reach the PROCEDURE_OCCURRENCE table. If, as you suggest in (2), we decouple the file_id values from the procedure_occurrence_id values, this range allocation is moot and the file_id values can be arbitrary as long as they're unique. But we do expect the data engineers to ensure that procedure_occurrence_id remains a proper primary key with no duplicate values after inserting data from the registry tables.
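As a minimal sketch of that uniqueness check, assuming plain Python lists stand in for the ID columns (the values and the `duplicate_ids` helper are hypothetical), a data engineer could look for clashes before inserting registry-derived rows. In a real database, a PRIMARY KEY constraint on procedure_occurrence_id would reject the duplicates outright.

```python
from collections import Counter
from typing import Iterable, List

def duplicate_ids(existing_ids: Iterable[int], new_ids: Iterable[int]) -> List[int]:
    """Return procedure_occurrence_id values that would appear more than once
    after appending new_ids to existing_ids (sorted for stable output)."""
    counts = Counter(existing_ids)
    counts.update(new_ids)
    return sorted(i for i, n in counts.items() if n > 1)

# Hypothetical values for illustration.
existing = [101, 102, 103]                            # already in PROCEDURE_OCCURRENCE
from_waveform_registry = [2_000_000_001, 2_000_000_002]
from_image_registry = [2_000_000_002, 2_100_000_001]  # clashes with a waveform ID

clashes = duplicate_ids(existing, from_waveform_registry + from_image_registry)
if clashes:
    print(f"Refusing insert; duplicate procedure_occurrence_id values: {clashes}")
```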

In our databases, including our main OMOP instances, we have needed to convert many IDs to bigint and were concerned about the potential of some existing programs or OHDSI tools truncating them. I'd suggest assuming bigint for new initiatives to avoid the accidental development of tools that use narrower integer representations, which will break when the scale of data inevitably grows and requires bigint. As I mentioned, we're already seeing it.

Many sites are facing similar issues, and we have modified the OMOP CDM DDL on the central cloud accordingly to handle bigints. The 2B+ selection was arbitrary, and mostly stems from the OHDSI convention for custom concept id assignments. If you're already using bigints you can go wild with your ID selections :) It would just be useful to know what ranges you end up using so we can sort them out centrally.
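A quick sketch of why bigint matters here. A block starting at 2,000,000,000 leaves only 147,483,647 values before a signed 32-bit INTEGER overflows. The `wrap_to_int32` helper below is an assumption: it emulates one possible failure mode (silent two's-complement wraparound), which is what a tool that stores the ID in a 32-bit integer could do instead of raising an error.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1   # signed 32-bit INTEGER range
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1   # signed 64-bit BIGINT range

def fits_int32(value: int) -> bool:
    return INT32_MIN <= value <= INT32_MAX

def fits_int64(value: int) -> bool:
    return INT64_MIN <= value <= INT64_MAX

def wrap_to_int32(value: int) -> int:
    """Emulate silent two's-complement truncation to 32 bits
    (one possible failure mode, assumed for illustration)."""
    return ((value + 2**31) % 2**32) - 2**31

# Values left before a 32-bit column overflows, for a block starting at 2B.
headroom = INT32_MAX - 2_000_000_000   # 147,483,647

big_id = 2_200_000_000
print(fits_int32(big_id), fits_int64(big_id))   # False True
print(wrap_to_int32(big_id))                    # -2094967296
```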

I don't think "Be an integer" should be a sub-item of "IF you are using file_id value as procedure_occurrence_id." If it's a procedure_occurrence_id value, then it has to be an integer anyways. Since all other OMOP ids are integers, and since you're thinking of creating an OMOP extension specification, I'd suggest requiring it to be an integer generally.

Agreed. Will update wording accordingly.
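If the integer requirement applies to file_id generally, a loader could enforce it up front. This is a minimal sketch with a hypothetical `validate_file_id` helper. It assumes the target column is a signed 64-bit BIGINT, in line with the bigint discussion above.

```python
def validate_file_id(value) -> int:
    """Reject anything that is not a plain integer in the signed 64-bit range."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"file_id must be an integer, got {type(value).__name__}")
    if not -(2**63) <= value <= 2**63 - 1:
        raise ValueError(f"file_id {value} does not fit in a 64-bit BIGINT")
    return value

print(validate_file_id(2_000_000_001))  # 2000000001
```

Strings such as `"42"` and values like `2**63` are rejected rather than silently coerced, so a malformed registry row fails at load time instead of downstream.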

Will the intended "real-world" idea of how to optimally group files be specified? Providing guidance on how to group files optimally would be beneficial. It ensures uniformity and helps sites understand the best practices for data organization, which is crucial for downstream processing.

This is somewhat dependent on the file format chosen, so it was intentionally vague. Now that it seems like WFDB will be the winning format I can provide guidance/examples.

Regarding procedure_concept_ids for Imaging Procedure and Monitoring Procedure, if sites aren't already mapping to more granular concepts under those two broad ancestor concepts, I recommend assuming more granular mapping with respect to how code is written and new tools are developed -- i.e., even if most sites are mapping to those two general concepts, they should be approached through the concept hierarchy, guaranteeing appropriate rollup of more granular mappings, from the beginning.

Absolutely. I will add this important caveat. We need to establish a solid feedback loop here, though, in order to design cohort definitions dependent on multimodal data that apply across all sites.
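To show what "approaching through the concept hierarchy" means in code, here is a minimal sketch using an in-memory SQLite database. The table and column names (CONCEPT_ANCESTOR with ancestor_concept_id and descendant_concept_id) follow the OMOP CDM. The concept IDs are hypothetical placeholders, not real vocabulary IDs. As in OMOP, each concept is also listed as its own ancestor, so a single join picks up both broad and granular mappings.

```python
import sqlite3

# Hypothetical concept IDs for illustration only; real IDs come from the
# OMOP vocabularies.
IMAGING_PROCEDURE = 1000
CT_OF_HEAD = 1001            # hypothetical, more granular child concept
MONITORING_PROCEDURE = 2000

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
CREATE TABLE procedure_occurrence (procedure_occurrence_id INTEGER PRIMARY KEY,
                                   procedure_concept_id INTEGER);
""")
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?)", [
    (IMAGING_PROCEDURE, IMAGING_PROCEDURE),        # each concept is its own ancestor
    (IMAGING_PROCEDURE, CT_OF_HEAD),
    (CT_OF_HEAD, CT_OF_HEAD),
    (MONITORING_PROCEDURE, MONITORING_PROCEDURE),
])
conn.executemany("INSERT INTO procedure_occurrence VALUES (?, ?)", [
    (1, IMAGING_PROCEDURE),      # a site mapping to the broad concept
    (2, CT_OF_HEAD),             # a site mapping to a granular concept
    (3, MONITORING_PROCEDURE),
])

def procedures_under(conn, ancestor_id):
    """Select procedures whose concept is the ancestor itself or any descendant."""
    rows = conn.execute("""
        SELECT po.procedure_occurrence_id
        FROM procedure_occurrence AS po
        JOIN concept_ancestor AS ca
          ON ca.descendant_concept_id = po.procedure_concept_id
        WHERE ca.ancestor_concept_id = ?
        ORDER BY po.procedure_occurrence_id
    """, (ancestor_id,)).fetchall()
    return [r[0] for r in rows]

print(procedures_under(conn, IMAGING_PROCEDURE))  # [1, 2]
```

A naive filter on `procedure_concept_id = IMAGING_PROCEDURE` would return only row 1 and miss the granular mapping in row 2.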

Regarding procedure_source_value, I'd suggest specifying a concatenation pattern or, alternatively, a metadata standard of specifying the concatenation pattern so that generic code can be written. Standardizing the concatenation pattern can enhance consistency and facilitate the development of generic code, making the system more robust and interoperable.

Agreed. Will update accordingly.
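One way to combine both suggestions is for each site to publish a small metadata record that declares its delimiter and field order. Generic code can then build and parse procedure_source_value without site-specific logic. This is a sketch only. The `PATTERN` structure, its field names, and the `|` delimiter are hypothetical and not part of any agreed standard.

```python
from typing import Dict, List

# Hypothetical per-site metadata describing how procedure_source_value
# is concatenated. Field names and delimiter are illustrative assumptions.
PATTERN = {"delimiter": "|", "fields": ["modality", "device", "file_name"]}

def build_source_value(parts: Dict[str, str], pattern=PATTERN) -> str:
    """Concatenate fields in the declared order, refusing values that
    contain the delimiter (they would make parsing ambiguous)."""
    delim = pattern["delimiter"]
    values: List[str] = []
    for field in pattern["fields"]:
        value = parts[field]
        if delim in value:
            raise ValueError(f"{field!r} contains the delimiter {delim!r}")
        values.append(value)
    return delim.join(values)

def parse_source_value(text: str, pattern=PATTERN) -> Dict[str, str]:
    """Split a source value back into named fields using the same metadata."""
    values = text.split(pattern["delimiter"])
    if len(values) != len(pattern["fields"]):
        raise ValueError(f"expected {len(pattern['fields'])} fields, got {len(values)}")
    return dict(zip(pattern["fields"], values))

sv = build_source_value({"modality": "ECG", "device": "monitor-7", "file_name": "rec001.hea"})
print(sv)                      # ECG|monitor-7|rec001.hea
print(parse_source_value(sv))  # {'modality': 'ECG', 'device': 'monitor-7', 'file_name': 'rec001.hea'}
```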

jshoughtaling and others added 9 commits May 16, 2024 11:08

Add MIMIC linkage description (ad7332b)
Add linkage diagram and per-step descriptions (4c92547)
Quick edits (8e29a21)
edits for audience and file/folder structure (348ee1a)
Update Multimodal-Linkage.mdx (e53e6ea)
Minor updates based on group feedback (c4ccdb6)
Update Multimodal-Linkage.mdx (567bced)
Update Multimodal-Linkage.mdx (0f8ca05)
Update Scope section on Multimodal-Linkage.mdx (3ff4832)

jshoughtaling linked an issue Jun 3, 2024 that may be closed by this pull request: [SOP Document] Multimodal Data Linkage #28 (Closed)

jshoughtaling self-assigned this Jul 12, 2024

jshoughtaling added 3 commits July 15, 2024 15:29

Implement changes suggested in feedback (#30) (5a22cfb)
Implement changes suggested in feedback (#30) (c4d083e)
Implement requested changes from tooling meeting (#30) (9447bd1)

jshoughtaling merged commit 42ab26b into main Jul 19, 2024
