How to populate year_of_birth when it is missing from the source #92

Closed
burrowse opened this issue Apr 1, 2024 · 4 comments · Fixed by #116
@burrowse

burrowse commented Apr 1, 2024

How to populate year_of_birth when it is missing from the source

CDM or THEMIS convention?

Themis

Is this a general convention?

No

Summary of issues

  • There may be cases in the source data where the year of birth is not explicitly available but an age bucket, group, or categorization is available.

Summary of answer

  • For data sources where the year of birth is not available, the approximate year of birth could be derived based on age group categorization, if available.

Related links

@waydes
Collaborator

waydes commented Apr 5, 2024

Issue # and location

NA

Issue summary

The lack of year_of_birth creates a dilemma on how to process those records. If an age group categorization is available, the approximate year of birth can be derived. I could not find guidance on how to estimate year of birth from age group categorization.
The age of a patient is so important to observational research that we have the convention to exclude patients without known age. The recommendation is to eliminate those records from a study.

Discussions in the forums indicate that setting year of birth to NULL precludes finding those records in SQL queries. Setting year of birth to 0 produces incorrect and inconsistent results: when year_of_birth is 0, Postgres calculates an age of 2021 years, but SQL Server calculates an age of 122 years because it treats year 0 as 1900-01-01.

Setting every unknown year of birth to a specific year creates problems in network studies, because the tools and algorithms used in those studies do not include control structures (if/then or switch statements) to recognize a placeholder year as meaning "unknown year of birth". Modifying the code in tools to accommodate the idiosyncrasies of individual databases creates problems and requires additional work. The same issue occurs when year of birth is set to 0 or NULL.

The lack of year of birth also raises the issue of a year of birth known to be incorrect. Examples include a year of birth after the current year, or a year of birth after the most recent year of visit or other fields with a year.
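The two examples of an incorrect year of birth can be checked mechanically. The following is a minimal sketch, not part of the convention: the function name `year_of_birth_problems`, its parameters, and the wording of the returned messages are all hypothetical, and it assumes the years of a person's visits (or other dated records) have already been collected into a list.

```python
from datetime import date
from typing import List, Optional


def year_of_birth_problems(
    year_of_birth: Optional[int],
    event_years: List[int],
    current_year: Optional[int] = None,
) -> List[str]:
    """Return a list of plausibility problems for one person's year_of_birth.

    event_years is assumed to hold the years of the person's visits or other
    dated records. All names here are illustrative only.
    """
    if current_year is None:
        current_year = date.today().year

    if year_of_birth is None:
        return ["missing"]

    problems = []
    # A year of birth later than the current year cannot be correct.
    if year_of_birth > current_year:
        problems.append("after current year")
    # A year of birth later than the most recent recorded event is suspect.
    if event_years and year_of_birth > max(event_years):
        problems.append("after most recent event year")
    return problems
```

For example, `year_of_birth_problems(2022, [2019, 2021], current_year=2024)` flags the record because the year of birth is later than the most recent visit year. How flagged records are then handled is a separate decision that this sketch does not make.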

Convention type

Table

CDM table

Person

CDM field

year_of_birth

Links to issue discussion

Provenance of data.

General

The ratified convention

For data sources with date of birth, the year should be extracted. For data sources where the year of birth is not available, the approximate year of birth could be derived based on age group categorization, if available. If no year of birth is available, all the person’s data should be dropped from the CDM instance.
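The convention does not say how to turn an age group into a year. The sketch below shows one possible approach under stated assumptions: the age-group labels and bounds in `AGE_GROUP_BOUNDS` are invented for illustration, the midpoint of the group is used as the approximate age (the low or high bound would be equally defensible), and `reference_year` is assumed to be the year in which the age group was recorded. The function names and row keys are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical age-group labels and their (min_age, max_age) bounds.
# Real sources will use their own categories.
AGE_GROUP_BOUNDS: Dict[str, Tuple[int, int]] = {
    "0-17": (0, 17),
    "18-34": (18, 34),
    "35-49": (35, 49),
    "50-64": (50, 64),
    "65-79": (65, 79),
}


def approximate_year_of_birth(
    age_group: Optional[str], reference_year: int
) -> Optional[int]:
    """Approximate year of birth from an age group (assumption: use the midpoint age)."""
    bounds = AGE_GROUP_BOUNDS.get(age_group) if age_group else None
    if bounds is None:
        return None  # no usable age group, so nothing can be derived
    low, high = bounds
    midpoint_age = (low + high) // 2
    return reference_year - midpoint_age


def build_person_rows(source_rows: List[dict]) -> List[dict]:
    """Keep persons with a known or derivable year of birth; drop the rest."""
    persons = []
    for row in source_rows:
        year_of_birth = row.get("year_of_birth")
        if year_of_birth is None:
            year_of_birth = approximate_year_of_birth(
                row.get("age_group"), row["reference_year"]
            )
        if year_of_birth is None:
            continue  # per the convention, the person is not loaded at all
        persons.append(
            {"person_id": row["person_id"], "year_of_birth": year_of_birth}
        )
    return persons
```

Dropping the person in `build_person_rows` only covers the PERSON row; in a real ETL the same person's records in every other table would also need to be excluded.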

Date of ratification/published

4/9/2024

Downstream implications

No

Link to DQD check

Yes - isRequired.

Related conventions/further information

Other helpful information, if needed. i.e. related conventions, queries to evaluate source or CDM data, or any additional information

#Tags
birthdate, birthyear, year_of_birth

@waydes waydes self-assigned this Apr 5, 2024
@clairblacketer
Collaborator

@waydes is this one ready to review?

@waydes
Collaborator

waydes commented Apr 9, 2024 via email

@clairblacketer
Collaborator

Opened PR #116


3 participants