Improve string delimiter detection in mapping pipeline #45

Open
2 tasks
callahantiff opened this issue Oct 9, 2020 · 1 comment
Assignees
Labels
bug (Something isn't working), release v2.0 (work related to v2.0)

Comments


callahantiff commented Oct 9, 2020

Describe the Bug

The pipeline assumes that all concept synonym and ancestor information is input in an aggregated format, with each aggregated concept separated by a | delimiter. That assumption is brittle and should be relaxed. Examples of specs for input data can be found here: resources/clinical_data/README.md

EXAMPLE:
Input Data
The CONCEPT_SYNONYM column below displays data in the expected input format

CONCEPT_ID: 37018594
CONCEPT_SOURCE_CODE: snomed:80251000119104
CONCEPT_LABEL: Complement level below reference range
CONCEPT_SOURCE_LABEL: Complement level below reference range
CONCEPT_SYNONYM: Complement level below reference range | Complement level below reference range (finding)

Example of Data that Breaks Assumptions:
The CONCEPT_SYNONYM column below displays data in an unexpected input format (i.e., it mixes two types of delimiters: | and ;)

CONCEPT_ID: 40771573
CONCEPT_SOURCE_CODE: loinc:69052-9
CONCEPT_LABEL: Flow cytometry specialist review of results
CONCEPT_SYNONYM: Flow cytometry specialist review of results | Flow cytometry specialist review | Dynamic; Impression; Impression/interpretation of study; Impressions; Interp; Interpretation; Misc; Miscellaneous; Narrative; Other; Point in time; Random; Report; To be specified in another part of the message; Unspecified
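
To make the failure concrete, here is a small illustration of what a pipe-only split does to a string shaped like the LOINC synonyms above. This is not code from omop2obo; the variable names are hypothetical and the synonym string is shortened for readability.

```python
# Hypothetical illustration; not taken from omop2obo.
# The synonym string is an abbreviated version of the LOINC row above.
synonyms = (
    "Flow cytometry specialist review of results | "
    "Flow cytometry specialist review | "
    "Dynamic; Impression; Interp; Unspecified"
)

# Splitting on "|" alone keeps the whole ";"-separated run as one "synonym".
pipe_only = [s.strip() for s in synonyms.split("|")]

for item in pipe_only:
    print(repr(item))
```

The last element still contains every ;-separated term, so an exact-match lookup on it would probably not find any of the individual terms.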

Impact Level

LOW - the string similarity mapping pipeline correctly handles all delimiter types, which allows it to recover the mappings missed by the exact match part of the pipeline.

Impacted Scripts

omop2obo/clinical_concept_annotator.py

Solution

  • Add a parameter to pass delimiter type
  • Improve tests to better vet delimiter handling
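
A minimal sketch of what the first bullet could look like, assuming the delimiters are passed in as a collection of literal strings. The function name `split_synonyms` and its `delimiters` parameter are hypothetical and are not part of omop2obo/clinical_concept_annotator.py.

```python
import re
from typing import Iterable, List


def split_synonyms(value: str, delimiters: Iterable[str] = ("|",)) -> List[str]:
    """Split an aggregated synonym string on any of the given delimiters.

    Hypothetical helper for illustration; the default mirrors the current
    pipe-only assumption described in this issue.
    """
    delims = [d for d in delimiters if d]
    if not delims:
        # Nothing to split on: return the whole string as a single entry.
        stripped = value.strip()
        return [stripped] if stripped else []
    # Escape each delimiter so characters such as "|" are matched literally.
    pattern = "|".join(re.escape(d) for d in delims)
    # Trim whitespace around each piece and drop empty pieces.
    return [p.strip() for p in re.split(pattern, value) if p.strip()]


print(split_synonyms("Interp | Impression; Narrative", delimiters=("|", ";")))
```

Keeping "|" as the default would leave existing inputs unchanged, while sources with mixed delimiters could pass ("|", ";") explicitly. A test along these lines could cover both cases.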
callahantiff added the bug and release v2.0 labels Oct 9, 2020
callahantiff added this to Needed Coding in Coding Tasks via automation Oct 9, 2020
callahantiff self-assigned this Oct 9, 2020
@callahantiff
Owner Author

  • Temporary workaround provided for release v1.0, which handles the unusual LOINC synonym strings in the SQL query
