initial mapping structure #2

Merged
merged 137 commits into from
Apr 11, 2024

Conversation

teslajoy
Member

@teslajoy teslajoy commented Jan 16, 2024

Pull Request Description

Goals:

  1. Created the initial structural definition of the GDC available-field mappings for case, project, and file: initializes the JSON structure that enables key mappings from source -> destination.
  2. Created a pydantic class for the mappings list in the schema (and for the schema itself): enables fast access by key name to each Map in the mappings list for updating and/or auditing Maps, and adds validation functions for the Map and Schema pydantic models.
  3. Created unit tests for the class functions and validation, including tests against invalid pydantic models.
  4. Pulled the GDC available fields into this repo: enables version control of the data while it is being mapped (NOTE: the FHIR python library version in setup.py pulls data from R5).
  5. Created a function that saves content mappings in resources, for example gender, race, primary_site, etc. (NOTE: contains old code; has to move to a new issue.)
  6. Updated the Schema based on current functionality while keeping old mappings (cleanup process).
  7. Created a README file with setup instructions and a high-level diagram of mapping relations.
  8. Added cli.py for click commands; instructions placed in an md file or the README.
  9. Added valid type mapping, using pydantic types for concise typing (NOTE: has to move to a new issue).
  10. Added $schema for each Schema class.
  11. Added some initial mappings; these will possibly need an in-depth review in another pull request.
  12. Cleaned up old code (cleanup process).
  13. Added an example of a full source -> destination map.
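
For readers who want a concrete picture of goals 1 and 2, here is a minimal sketch of a mappings list with lookup by key name. It is not the code in this pull request: it uses standard-library dataclasses in place of the pydantic models so that it runs without dependencies, and the names `Map`, `Schema`, `find`, `update`, and `unmapped`, as well as the destination strings, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Map:
    """One source -> destination key mapping (field names are illustrative)."""
    source: str
    destination: Optional[str] = None

    def __post_init__(self) -> None:
        # Minimal validation: a map without a source key is meaningless.
        if not isinstance(self.source, str) or not self.source.strip():
            raise ValueError("Map.source must be a non-empty string")


@dataclass
class Schema:
    """A titled list of maps plus an index keyed by source name."""
    title: str
    mappings: List[Map] = field(default_factory=list)
    _index: Dict[str, Map] = field(default_factory=dict, init=False, repr=False)

    def __post_init__(self) -> None:
        # Build the index once so lookups do not scan the list.
        for m in self.mappings:
            if m.source in self._index:
                raise ValueError(f"duplicate source key: {m.source}")
            self._index[m.source] = m

    def find(self, source: str) -> Optional[Map]:
        """Return the Map for a source key, or None if it is not present."""
        return self._index.get(source)

    def update(self, source: str, destination: str) -> None:
        """Set the destination of an existing Map (KeyError if absent)."""
        self._index[source].destination = destination

    def unmapped(self) -> List[str]:
        """Audit helper: source keys that have no destination yet."""
        return [m.source for m in self.mappings if m.destination is None]


# Hypothetical usage with made-up destination names.
schema = Schema("project", [Map("project_id"), Map("name", "ResearchStudy.name")])
schema.update("project_id", "ResearchStudy.identifier")
```

Keeping the list as the serialized form and the dictionary as a derived index means the JSON stays a plain list of maps while updates and audits avoid a linear search.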

Features:

  1. Enable user mapping: facilitates simple mappings, e.g. project name or project_id, from source -> destination while preserving GDC data and hierarchy (improvements to come to break down the FHIR hierarchy).
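
As a rough illustration of this feature, the sketch below copies values from a nested source record to destination paths. The helper names (`get_path`, `set_path`, `apply_mappings`), the dotted-path convention, and the destination paths are assumptions made for the example, not this repository's API.

```python
from typing import Any, Dict


def get_path(record: Dict[str, Any], path: str) -> Any:
    """Read a dotted path such as 'project.project_id' from nested dicts."""
    node: Any = record
    for part in path.split("."):
        node = node[part]
    return node


def set_path(record: Dict[str, Any], path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = record
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def apply_mappings(source: Dict[str, Any], mappings: Dict[str, str]) -> Dict[str, Any]:
    """Build a destination record from {source_path: destination_path} pairs."""
    out: Dict[str, Any] = {}
    for src_path, dst_path in mappings.items():
        try:
            value = get_path(source, src_path)
        except (KeyError, TypeError):
            continue  # source field absent in this record: skip it
        set_path(out, dst_path, value)
    return out


# Hypothetical record and mapping; the values and paths are made up.
gdc_case = {"case_id": "c-1", "project": {"project_id": "P-1", "name": "Example"}}
mappings = {
    "project.project_id": "ResearchStudy.identifier",
    "project.name": "ResearchStudy.name",
}
result = apply_mappings(gdc_case, mappings)
```

Source fields that are absent from a record are skipped rather than raising, which suits partially populated records; a stricter variant could collect the missing paths for auditing instead.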

Fixes:

  1. issue add testing #4
  2. issue add content mappings to destination enum's and/or codings #5
  3. issue add update schema, audit by key, delete by key functionality #6
  4. issue FHIR functions #7
  5. issue resolve many to one and one to many mappings #9
  6. issue case -> Patient #12
  7. issue remove GenomicStudy #13
  8. issue add reference for linking and parent tracking #14

Unit testing:

NOTE: this testing is not exhaustive; it covers the major changes only.
We can add exhaustive testing in a separate pull request.

  1. git clone repo
  2. cd fhirizer
  3. python setup.py install
  4. pytest --cov

@teslajoy
Member Author

teslajoy commented Jan 16, 2024

Context

This pull request addresses issue #384 in the bmeg-etl repository.

Checklist

  • Create the initial key annotation mappings JSON structure.
  • Add Python utility functions for mappings.
  • Manual mapping.
  • Add content annotations for demographics, e.g. race, ethnicity, etc. https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic
  • Check key compatibility (e.g. description vs. definition) and the FHIR hierarchy naming convention.
  • Implement programmatic mapping using Python APIs, aligned with the schema versions currently used by the project.
  • Add a high-level entity and modules diagram to the README.
  • Rename category to parent category in GDC (moved to issue #10, "rename category to parent category in GDC").
  • Double-check that enums are mapped to existing FHIR codes (TBD, issue created).
  • Fix bug in get_key_hierarchy.
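
The content-annotation item (mapping source enum values to destination codes) could look roughly like the sketch below. The table and the `map_content` helper are hypothetical; the target codes follow the FHIR administrative-gender value set (male, female, other, unknown), while the GDC values listed are assumptions that should be checked against the data dictionary linked above.

```python
from typing import Dict

# Assumed GDC `demographic.gender` values -> FHIR administrative-gender codes.
# Collapsing several GDC values into "unknown" is a choice made for this sketch.
GENDER_TO_FHIR: Dict[str, str] = {
    "female": "female",
    "male": "male",
    "unknown": "unknown",
    "unspecified": "unknown",
    "not reported": "unknown",
}


def map_content(value: str, table: Dict[str, str]) -> str:
    """Translate one source enum value, failing loudly on unmapped values."""
    key = value.strip().lower()
    if key not in table:
        # Surfacing the gap keeps new or misspelled values from being
        # silently coerced to a default code.
        raise KeyError(f"no content mapping for value: {value!r}")
    return table[key]
```

Raising on an unmapped value, rather than defaulting to "unknown", makes it easier to notice when the source dictionary gains a value the table does not cover.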

@teslajoy changed the title from "initial mapping test structure" to "initial mapping structure" on Jan 26, 2024
"namespace",
"name",
"short_name",
"center_type"

Does this need to be required? There could be use cases where new data doesn't have a clear center type (i.e. category of center). It does seem that short_name (name of center) could hold similar info. In other words, would it make sense to make center_type optional as long as short_name is required?

Member Author

Great point! I will definitely keep that in mind. For now, all required elements are defined by GDC or FHIR. In this case, the data dictionary in resources is a local copy of the GDC dictionary, pulled down to keep track of the data version as I go through the mappings (the mapping list of source -> destination dictionaries).
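
To make the trade-off in this thread concrete, here is a minimal sketch of checking a record against a strict and a relaxed required list. The four field names come from the snippet under review; the `missing_required` helper, the two tuples, and the example record are hypothetical and not part of this repository.

```python
from typing import Any, Dict, Iterable, List

# Required list as quoted in the review snippet.
REQUIRED_STRICT = ("namespace", "name", "short_name", "center_type")
# The reviewer's suggestion: center_type becomes optional.
REQUIRED_RELAXED = ("namespace", "name", "short_name")


def missing_required(record: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty in the record."""
    return [key for key in required if record.get(key) in (None, "")]


# Made-up record without a center type.
center = {"namespace": "ns", "name": "Example Center", "short_name": "EC"}
strict_gaps = missing_required(center, REQUIRED_STRICT)
relaxed_gaps = missing_required(center, REQUIRED_RELAXED)
```

With the strict list the example record is rejected only because of center_type, while the relaxed list accepts it; which behaviour is right depends on whether the requirement is inherited from GDC or FHIR, as noted in the reply above.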


@bwalsh bwalsh left a comment

@bwalsh

bwalsh commented Apr 10, 2024

  • test a "fresh" install and ensure pytest passes

@teslajoy
Member Author

@bwalsh it's ready for another go - hopefully I caught them all.

@matthewpeterkort

matthewpeterkort commented Apr 11, 2024

If the end user is not meant to change the output of the "convert" command, it might be better to fold this functionality into the "generate" command to simplify things for the end user.

@teslajoy
Member Author

Sounds good! I added issue #17 to address Matthew's suggestion. Still using the output of convert for my data mappings. @bwalsh ready for your input.

@teslajoy teslajoy merged commit cbd7f99 into main Apr 11, 2024
@teslajoy teslajoy deleted the mappings branch April 12, 2024 15:50

4 participants