SURVEY data in OMOP CDM - updated #137

Closed
ColinOrr2006 opened this issue Nov 23, 2017 · 7 comments

@ColinOrr2006

Adding Patient Reported Outcome data to CDM

  • Requestor: Colin Orr and Catherine Kerr, ICON plc
  • Revising party: Joshua Ransom, Anna Corning, Emelly Rusli, Rayhnuma Ahmed, Aaron Stern; SHYFT Analytics
  • Discussion: here

Background

ICON plc is currently engaged in a project with ICHOM (International Consortium for Health Outcomes Measurement, http://www.ichom.org/).
ICHOM's mission is to unlock the potential of value-based healthcare by defining global Standard Sets of outcome measures that really matter to patients for the most relevant medical conditions and by driving adoption and reporting of these measures worldwide.
ICHOM brings together patient representatives, clinician leaders, and registry leaders from all over the world to develop Standard Sets, comprehensive yet parsimonious sets of outcomes and case-mix variables for specific medical conditions that ICHOM recommends all providers track.
Each Standard Set focuses on patient-centered results, and provides an internationally-agreed upon method for measuring each of these outcomes. ICHOM believes that standardized outcomes measurement will open up new possibilities to compare performance globally, allow clinicians to learn from each other, and rapidly improve the care provided to patients.
ICHOM Standard Sets include baseline conditions and risk factors to enable meaningful case-mix adjustment globally, ensuring that comparisons of outcomes will take into account the differences in patient populations across not just providers, but also countries and regions. They also include high-level treatment variables to allow stratification of outcomes by major treatment types. A comprehensive data dictionary, as well as scoring guides for patient-reported outcomes is provided for each Standard Set.

Proposal

ICON plc is developing a platform to ingest, store and analyse the patient outcome measures and is using the OMOP Common Data Model to store the data. The current CDM satisfies many of the requirements, but there are some gaps, specifically:

  • We need to store data relating to each Patient Reported Outcome (PRO) questionnaire that is completed by a patient. Examples of this type of data are: the timestamp of when the questionnaire was completed, whether the patient completed it with assistance, the role of the person who completed the questionnaire, etc. We also need to store the attributes related to the timing of the survey in relation to the treatment the patient received, for example 'baseline' or 'six month follow-up'. This additional contextual data allows us to compare outcomes over time. To store this data, we propose introducing a new SURVEY table. Each row in the table represents an instance of a completed survey and serves to link a number of survey questions and answers together. Individual questions and their answers are stored as name-value pairs in the OBSERVATION table. The OBSERVATION table requires some additional columns in order to maintain the relationship with the patient questionnaire (SURVEY) as described below.

SURVEY table

The SURVEY table is used to store an instance of a completed survey or questionnaire. It captures details of the individual questionnaire such as who completed it, when it was completed and which patient treatment or visit it relates to (if any). Each SURVEY has a SURVEY_CONCEPT_ID, a concept in the CONCEPT table identifying the questionnaire, e.g. EQ5D, VR12, SF12. Each questionnaire should exist in the CONCEPT table. Each SURVEY can optionally be related to a specific patient visit in order to link it to a specific patient assessment or treatment.

| Field | Required | Type | Description |
|---|---|---|---|
| SURVEY_OCCURRENCE_ID | Yes | integer | Unique identifier for each completed survey |
| SURVEY_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of survey. |
| PERSON_ID | Yes | integer | A foreign key identifier to the Person in the PERSON table about whom the survey was completed |
| VISIT_OCCURRENCE_ID | No | integer | A foreign key to the visit in the visit_occurrence table during which the survey was completed |
| RESPONSE_TO_VISIT_OCCURRENCE_ID | No | | A foreign key to the visit in the visit_occurrence table during which treatment was carried out that relates to this survey. |
| SURVEY_START_DATE | No | date | Date on which the survey was started |
| SURVEY_START_DATETIME | No | Timestamp | Date and time at which the survey was started |
| SURVEY_END_DATE | Yes | Date | Date on which the survey was completed |
| SURVEY_END_DATETIME | No | Timestamp | Date and time at which the survey was completed |
| ASSISTED_CONCEPT_ID | No | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies indicating whether the survey was completed with assistance or not (Yes / No) |
| ASSISTED_SOURCE_VALUE | No | varchar(100) | Source value representing whether the patient required assistance to complete the survey. Example: “Completed without assistance”, “Completed with assistance”. |
| RESPONDENT_TYPE_CONCEPT_ID | No | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the respondent type. Example: Research Associate, Patient |
| RESPONDENT_SOURCE_VALUE | No | varchar(100) | Source code representing the role of the person who completed the survey. |
| TIMING_CONCEPT_ID | No | integer | A foreign key that refers to a timing Concept identifier in the Standardized Vocabularies. Example: 3 month follow-up, 6 month follow-up, … |
| TIMING_SOURCE_VALUE | No | varchar(100) | Text string representing the timing of the survey. Example: Baseline, 6-month follow-up |
| COLLECTION_METHOD_CONCEPT_ID | No | varchar(10) | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the data collection method (e.g. Paper, Telephone, Electronic Questionnaire) |
| COLLECTION_METHOD_SOURCE_VALUE | No | varchar(100) | The collection method as it appears in the source data. |
| SURVEY_SOURCE_VALUE | No | varchar(100) | The survey name/title as it appears in the source data. |
| SURVEY_SOURCE_IDENTIFIER | No | varchar(100) | Unique identifier for each completed survey in the source system |
| VALIDATED_SURVEY_CONCEPT_ID | No | Integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the validation status of the survey. |
| SURVEY_VERSION_NUMBER | No | Varchar2(20) | Version number of the questionnaire / survey used. |
| PROVIDER_ID | | | |

OBSERVATION table

Patient responses to survey questions are stored in the OBSERVATION table. Each record in the OBSERVATION table represents a single question/response pair and is linked to a specific SURVEY (questionnaire) record through its SURVEY_OCCURRENCE_ID. Each record is the response to a specific question identified by the OBSERVATION_CONCEPT_ID, a concept in the CONCEPT table that uniquely identifies the question. A single survey question can have multiple responses (e.g. which of these items relate to you: a, b, c, …?). Each response is stored as a separate record in the OBSERVATION table.
The question / answer observation record is linked to the patient questionnaire used for collecting the data using two new fields in the OBSERVATION table: DOMAIN_ID and DOMAIN_OCCURRENCE_ID. For any survey-related observation, DOMAIN_ID contains the text ‘Survey’ and DOMAIN_OCCURRENCE_ID contains the SURVEY_OCCURRENCE_ID of the specific survey. This domain construct can also be used for other observation groupings.
The OBSERVATION table can also store survey scoring results. Many validated PRO questionnaires have scoring algorithms (many of which are proprietary) that return an overall patient score based on the answers provided. Survey scores are identified by their OBSERVATION_CONCEPT_ID and are linked back to the scored survey using the same DOMAIN construct described above.
In the name/value pair model, the name (question) is stored as OBSERVATION_CONCEPT_ID and the value (answer) is stored in one of four fields: VALUE_AS_CONCEPT_ID where the answer is categorical and is defined as a concept in the CONCEPT table, VALUE_AS_NUMBER where the answer is numeric, VALUE_AS_STRING where the answer is a free text string, or VALUE_AS_DATETIME where the answer is a point in time.
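
As a rough illustration of the name/value pair model, the sketch below picks the value field for a single answer. The function name `answer_to_value_fields` and its signature are hypothetical and not part of the proposal; the field names follow the VALUE_AS_* columns used in the amendments table and the example OBSERVATION record.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def answer_to_value_fields(
    answer: Any, answer_concept_id: Optional[int] = None
) -> Dict[str, Any]:
    """Return the VALUE_AS_* fields for one answer; unused fields stay None.

    Hypothetical helper: a categorical answer is passed with the concept id
    already resolved, anything else is routed by its Python type.
    """
    fields: Dict[str, Any] = {
        "VALUE_AS_CONCEPT_ID": None,
        "VALUE_AS_NUMBER": None,
        "VALUE_AS_STRING": None,
        "VALUE_AS_DATETIME": None,
    }
    if answer_concept_id is not None:
        # Categorical answer defined as a concept in the CONCEPT table.
        fields["VALUE_AS_CONCEPT_ID"] = answer_concept_id
    elif isinstance(answer, datetime):
        # Answer expressed as a point in time.
        fields["VALUE_AS_DATETIME"] = answer
    elif isinstance(answer, (int, float)):
        # Numeric answer, for example a survey score.
        fields["VALUE_AS_NUMBER"] = answer
    else:
        # Everything else is treated as free text.
        fields["VALUE_AS_STRING"] = str(answer)
    return fields


# The "Moderate" answer from the HOOSPS example maps to concept -2023.
moderate = answer_to_value_fields("Moderate", answer_concept_id=-2023)
```

Only one of the four fields is populated per record in this sketch, which matches the one-response-per-record approach described above.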

Amendments required to the OBSERVATION table are as follows:

| Change | Field | Required | Type | Description |
|---|---|---|---|---|
| New | DOMAIN_OCCURRENCE_ID | No | integer | A foreign key to SURVEY table |
| New | DOMAIN_ID | No | | ‘Survey’ |
| New | VALUE_AS_DATETIME | No | Timestamp | The observation result stored as a datetime value. This is applicable to observations where the result is expressed as a point in time. |

Other Considerations

  • Extensions to the concept table include the survey and response data that is not currently contained in the standard libraries. All custom extensions to the concept table have been stored in the negative address space so as not to conflict with the currently defined standard. These extensions are not included in the definition of this proposal but should be considered for future work.
  • There is no formal definition of the relationship between a questionnaire/survey and the questions presented on that survey. There is an implicit relationship created when survey/response data is stored. If an explicit relationship is required, this can be achieved using the FACT_RELATIONSHIP table.

Use Cases

The example below describes the data to be stored for a question on the HOOSPS (Hip Disability and Osteoarthritis Outcome Score) patient questionnaire.
The question asks the degree of difficulty in descending stairs due to the patient's hip problem. The patient answers "Moderate".
The CONCEPT table contains domain data for the survey HOOSPS, question (HPS1) plus all the potential values that a patient can respond with.

CONCEPT table – example

CONCEPT_ID CONCEPT_NAME DOMAIN_ID VOCABULARY_ID CONCEPT_CLASS_ID STANDARD_CONCEPT CONCEPT_CODE
-2020 HPS1 Metadata Domain Domain   ICHOM generated
-2021 None HPS1 ICHOM Observation PRO Measure S 0
-2022 Mild HPS1 ICHOM Observation PRO Measure S 1
-2023 Moderate HPS1 ICHOM Observation PRO Measure S 2
-2024 Severe HPS1 ICHOM Observation PRO Measure S 3
-2025 Extreme HPS1 ICHOM Observation PRO Measure S 4
-3501 HOOSPS Metadata ICHOM Survey Domain   ICHOM generated

The patient response is captured in the questionnaire as a code (2 in this instance). The CONCEPT_ID is determined by finding the row in the CONCEPT table whose DOMAIN_ID matches the specific question (HPS1) and whose CONCEPT_CODE matches the response value (2).
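
The lookup described above can be sketched as follows. This is only an illustration: `CONCEPT_ROWS` is an in-memory copy of four columns from the example CONCEPT rows, and `find_answer_concept_id` is a hypothetical helper name, not part of the proposal.

```python
from typing import Optional

# Subset of the example CONCEPT rows above:
# (CONCEPT_ID, CONCEPT_NAME, DOMAIN_ID, CONCEPT_CODE)
CONCEPT_ROWS = [
    (-2021, "None", "HPS1", "0"),
    (-2022, "Mild", "HPS1", "1"),
    (-2023, "Moderate", "HPS1", "2"),
    (-2024, "Severe", "HPS1", "3"),
    (-2025, "Extreme", "HPS1", "4"),
]


def find_answer_concept_id(question: str, code: str) -> Optional[int]:
    """Match DOMAIN_ID to the question and CONCEPT_CODE to the response code."""
    for concept_id, _name, domain_id, concept_code in CONCEPT_ROWS:
        if domain_id == question and concept_code == code:
            return concept_id
    return None  # no matching answer concept for this question


print(find_answer_concept_id("HPS1", "2"))  # -2023, the concept for "Moderate"
```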

SURVEY table - example

| Column | Value | Comment |
|---|---|---|
| SURVEY_OCCURRENCE_ID | 19073 | |
| SURVEY_CONCEPT_ID | -3501 | Concept for HOOSPS survey |
| PERSON_ID | 21405 | |
| VISIT_OCCURRENCE_ID | | |
| RESPONSE_TO_VISIT_OCCURRENCE_ID | 13403 | |
| SURVEY_START_DATE | | |
| SURVEY_END_DATE | 2016-07-14 | |
| ASSISTED_CONCEPT_ID | -3601 | Concept for "Completed without assistance" |
| ASSISTED_SOURCE_VALUE | Complete w/o assistance | |
| RESPONDENT_TYPE_CONCEPT_ID | 3611 | Concept for "Patient-reported" |
| RESPONDENT_SOURCE_VALUE | P-REP | Source system value for "Patient-reported" |
| TIMING_CONCEPT_ID | -3621 | Concept for "BASELINE" timing |
| TIMING_SOURCE_VALUE | Baseline | |
| COLLECTION_METHOD_CONCEPT_ID | -3631 | Concept for "Electronic questionnaire" |
| COLLECTION_METHOD_SOURCE_VALUE | E-QUEST | Source system value for "Electronic questionnaire" |
| SURVEY_SOURCE_VALUE | HOOSPS | |
| SURVEY_SOURCE_IDENTIFIER | HS001234 | |
| VALIDATED_SURVEY_CONCEPT_ID | -3701 | Concept for "Validated survey" |
| SURVEY_VERSION_NUMBER | | |
| PROVIDER_ID | | |

OBSERVATION table - example

| Column | Value | Comment |
|---|---|---|
| OBSERVATION_ID | 794657 | |
| PERSON_ID | 21405 | |
| OBSERVATION_CONCEPT_ID | -2020 | Concept for HPS1 |
| OBSERVATION_DATE | 2016-07-14 | |
| OBSERVATION_DATETIME | | |
| OBSERVATION_TYPE_CONCEPT_ID | XXX | CONCEPT_ID to indicate PRO survey response |
| VALUE_AS_CONCEPT_ID | -2023 | Concept for "Moderate" |
| VALUE_AS_STRING | | |
| VALUE_AS_NUMBER | | |
| VALUE_AS_DATETIME | | |
| PROVIDER_ID | | |
| VISIT_OCCURRENCE_ID | | |
| OBSERVATION_SOURCE_VALUE | degree of difficulty in …. | |
| OBSERVATION_SOURCE_CONCEPT_ID | | |
| DOMAIN_ID | Survey | |
| DOMAIN_OCCURRENCE_ID | 19073 | |
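
To show how the two example records above would be queried together, here is a minimal sketch using an in-memory SQLite database. It assumes the proposal's column names and keeps only a handful of columns from each table; the lower-case table and column names, the TEXT date columns and the choice of SQLite are illustrative assumptions rather than part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced versions of the proposed SURVEY and amended OBSERVATION tables.
conn.executescript(
    """
    CREATE TABLE survey (
        survey_occurrence_id INTEGER PRIMARY KEY,
        survey_concept_id    INTEGER NOT NULL,
        person_id            INTEGER NOT NULL,
        survey_end_date      TEXT    NOT NULL,
        timing_source_value  TEXT
    );
    CREATE TABLE observation (
        observation_id         INTEGER PRIMARY KEY,
        person_id              INTEGER NOT NULL,
        observation_concept_id INTEGER NOT NULL,
        observation_date       TEXT    NOT NULL,
        value_as_concept_id    INTEGER,
        domain_id              TEXT,
        domain_occurrence_id   INTEGER
    );
    """
)

# Values taken from the SURVEY and OBSERVATION examples above.
conn.execute(
    "INSERT INTO survey VALUES (19073, -3501, 21405, '2016-07-14', 'Baseline')"
)
conn.execute(
    "INSERT INTO observation VALUES "
    "(794657, 21405, -2020, '2016-07-14', -2023, 'Survey', 19073)"
)

# Fetch every question/answer pair that belongs to one completed survey.
rows = conn.execute(
    """
    SELECT s.survey_occurrence_id, s.timing_source_value,
           o.observation_concept_id, o.value_as_concept_id
    FROM survey AS s
    JOIN observation AS o
      ON o.domain_id = 'Survey'
     AND o.domain_occurrence_id = s.survey_occurrence_id
    WHERE s.survey_occurrence_id = ?
    """,
    (19073,),
).fetchall()

print(rows)  # [(19073, 'Baseline', -2020, -2023)]
```

The join needs both DOMAIN_ID and DOMAIN_OCCURRENCE_ID because the same pair of fields is meant to support other observation groupings as well.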
@vojtechhuser
Collaborator

The proposal argues for:

Each SURVEY has a SURVEY_CONCEPT_ID, a concept in the CONCEPT table identifying the questionnaire e.g. EQ5D, VR12, SF12. Each questionnaire should exist in the CONCEPT table. Each SURVEY can be optionally related to a specific patient visit in order to link it to a specific patient assessment or treatment.

This will need maintenance!
Efforts under the name "Common Data Elements" (for research) try to standardize such questionnaires (for example, the Kansas City Cardiomyopathy Questionnaire).

@don-torok

Whether the CDM should or should not include Health Outcome Measures is a different question from whether this proposal should be accepted. I plan to say the proposal should not be accepted, based upon the implementation. The proposal tries to fit Outcome Measures into the existing CDM, but the CDM does not handle it very well.
Weaknesses of the current proposal are: no relation between a survey and its questions except the backward relation from the newly proposed columns domain_occurrence_id and domain_id; no relation between questions and allowable answers; using concepts to define questions stretches the idea of what a concept is; adding domain_occurrence_id and domain_id to Observation to tie the question/answers to the survey is not how the survey/question relationship should be defined.
If it is determined that Health Outcome Measures should be added to the CDM, a new set of tables designed to model the patient survey, questions, answers and patient responses will be a much cleaner solution.

@ColinOrr2006
Author

@don-torok, thanks for that input. Just to set the context a little further, the thinking behind the design is to keep it at the same level of normalization as the other OMOP domains and to minimize the number of new entities or domains. This approach is consistent with the overall OMOP philosophy. Taking each point in turn:

  1. The core domain is survey, which represents the instance of someone completing a patient questionnaire or survey. The answers are stored in observations and, as you point out, the relationship between the two is through a key in the observations table. I have used this design to simplify querying and data analysis. An alternative, of course, would be to use the relationship table, which would have an impact on performance.
  2. Relationship between questions and allowable answers. I have explicitly omitted this from the scope of the proposal as I see it more as a vocabulary definition than as part of the survey data model. The vocabulary aspect is obviously important but it will be different for every use case. If you look at PROMIS, the questions and their allowable answers are defined in the LOINC vocabulary and hence in the CONCEPT table (if they are loaded into your specific instance). Each use case will have its specific vocabulary that is either currently available or needs to be created. In my case, I am working on creating an ICHOM-specific vocabulary that is included in my concept tables.
  3. Using concepts to define questions and allowable answers is already being done within OMOP. As per (2) above, LOINC has defined PROMIS in its vocabulary and it is presented in OMOP using the CONCEPT table.
  4. Using DOMAIN_OCCURRENCE_ID as the link to Survey. This is the same as point 1, where I hope I have explained the reasoning clearly.
  5. I believe it has been determined to add PRO to OMOP.

I would be interested to get your view on which use cases the current proposal does not support. Are there specific scenarios that you are dealing with or have considered that cannot be supported using the proposed model?

@clairblacketer clairblacketer added this to In progress in CDM v6.0 Jul 16, 2018
@clairblacketer clairblacketer moved this from In progress to on Dev in CDM v6.0 Jul 25, 2018
@clairblacketer clairblacketer moved this from on Dev branch/in development to Done in CDM v6.0 Sep 25, 2018
This was referenced Oct 11, 2018
@clairblacketer
Contributor

added in v6.0

@mqunell8

mqunell8 commented Mar 1, 2021

@clairblacketer We hope to use the survey table and observation table as prescribed here. The BigQuery DDL for v6.0 though is missing the new fields for the observation table, even though these are mentioned in the published standard. Can you clarify the status? Is there updated DDL, or should we just add it ourselves?

@nseth04

nseth04 commented Jun 2, 2023

@clairblacketer - Can you provide an updated version of the discussion and proposal above? Is the above still applicable and valid? If this was versioned as v6.0, I do not see survey_occurrence_id field or domain_occurrence fields in 6.0 or 5.4

@gkennos

gkennos commented Jun 17, 2024

@nseth04 , @mqunell8 not sure if you're still looking for this, but the only place I could find the full spec of the added fields is here

Look for survey_conduct table instead of survey_occurrence and look for observation.obs_event_field_concept_id and observation.observation_event_id instead of the domain / domain_occurrence_id fields though.
