CDM Results Schema #212

Closed
clairblacketer opened this issue Sep 18, 2018 · 10 comments

Comments

@clairblacketer
Contributor

Creation of CDM Results schema


Proposal

Relevant tables: COHORT, COHORT_DEFINITION, COHORT_ATTRIBUTE, ATTRIBUTE_DEFINITION

The four tables listed above need to be editable by the user. As it stands now, most CDMs live in read-only schemas, which renders these tables effectively useless. A formal 'results' schema that gives users write access should be created to house them. To take it a step further, the COHORT_ATTRIBUTE and ATTRIBUTE_DEFINITION tables should be removed altogether, as they have no existing use cases and are not currently being used by anyone in the community (to our knowledge).

In summation, we propose to:

  1. Move COHORT and COHORT_DEFINITION to a formal CDM 'results' schema
  2. Remove COHORT_ATTRIBUTE and ATTRIBUTE_DEFINITION

Consequences

A separate results DDL would need to be supplied alongside the current CDM DDL with each new release.
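
As a rough illustration of what a separate results DDL could look like, here is a minimal sketch using Python's built-in `sqlite3` module, with two attached in-memory databases standing in for the `cdm` and `results` schemas. The schema names, the `build_connection` helper and the simplified column types are assumptions made for this example. The column lists are meant to mirror the v5.x COHORT and COHORT_DEFINITION tables, and the actual v6.0 DDL may differ. Grants and read-only enforcement are platform-specific and are not shown.

```python
import sqlite3

# Hypothetical results DDL, shipped separately from the CDM DDL.
# Types are simplified for SQLite; dates are stored as ISO-8601 text.
RESULTS_DDL = """
CREATE TABLE results.cohort_definition (
    cohort_definition_id          INTEGER NOT NULL,
    cohort_definition_name        TEXT    NOT NULL,
    cohort_definition_description TEXT,
    definition_type_concept_id    INTEGER NOT NULL,
    cohort_definition_syntax      TEXT,
    subject_concept_id            INTEGER NOT NULL,
    cohort_initiation_date        TEXT
);
CREATE TABLE results.cohort (
    cohort_definition_id INTEGER NOT NULL,
    subject_id           INTEGER NOT NULL,
    cohort_start_date    TEXT    NOT NULL,
    cohort_end_date      TEXT    NOT NULL
);
"""


def build_connection():
    """Return a connection with separate 'cdm' and 'results' schemas (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates its own independent in-memory database.
    conn.execute("ATTACH DATABASE ':memory:' AS cdm")
    conn.execute("ATTACH DATABASE ':memory:' AS results")
    # Stand-in for the CDM DDL: a single, heavily reduced PERSON table.
    conn.execute("CREATE TABLE cdm.person (person_id INTEGER PRIMARY KEY)")
    # The results DDL is applied as its own script.
    conn.executescript(RESULTS_DDL)
    return conn
```

With this layout, cohort rows are written only to `results.cohort` and joined back to the CDM by `subject_id`, so the CDM schema itself never needs write access from end users.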

@clairblacketer clairblacketer self-assigned this Sep 19, 2018
@clairblacketer clairblacketer added this to To do in CDM v6.0 via automation Sep 19, 2018
@vojtechhuser
Collaborator

The question of which tables are read-only and which are writable is hard. We have part of that problem with METADATA as well: a user may need to add a new annotation, but the table is in the CDM schema.

@gowthamrao
Member

If the CDM-constructed cohort and cohort_definition tables are moved to the results schema, there may be a risk of conflict with the cohort and cohort_definition tables that WebAPI/Atlas constructs and manages.

In the current design, I think that if an external (non-Atlas) application or a manual query writes to the existing results.cohort or results.cohort_definition tables, it causes Atlas to throw errors.

If we go down the route of a writable cohort table shared across multiple applications, maybe we need additional fields for 'application lineage', i.e. Atlas would only look for records in the cohort table where the application id matches the application id of Atlas?
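
To make the 'application lineage' idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `application_id` column, the `ATLAS_APP_ID` and `OTHER_APP_ID` constants and both helper functions are hypothetical. None of them exist in the CDM or in WebAPI/Atlas today. The sketch only shows how each application could restrict itself to its own rows in a shared cohort table.

```python
import sqlite3

# Hypothetical application identifiers; not defined anywhere in the CDM or WebAPI.
ATLAS_APP_ID = 1
OTHER_APP_ID = 2


def make_shared_cohort_table(conn):
    """Create a shared cohort table with a hypothetical lineage column."""
    conn.execute(
        """
        CREATE TABLE cohort (
            application_id       INTEGER NOT NULL,  -- hypothetical lineage field
            cohort_definition_id INTEGER NOT NULL,
            subject_id           INTEGER NOT NULL,
            cohort_start_date    TEXT    NOT NULL,
            cohort_end_date      TEXT    NOT NULL
        )
        """
    )


def cohort_rows_for(conn, application_id, cohort_definition_id):
    """Return only the rows written by the given application for one cohort."""
    return conn.execute(
        "SELECT subject_id, cohort_start_date, cohort_end_date FROM cohort "
        "WHERE application_id = ? AND cohort_definition_id = ? "
        "ORDER BY subject_id",
        (application_id, cohort_definition_id),
    ).fetchall()
```

Under this assumption, two applications could reuse the same `cohort_definition_id` without seeing each other's records, at the cost of every reader having to filter on the extra column.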

@gklebanov

gklebanov commented Sep 22, 2018

In general, I do not believe it is a good idea to make any table in the OMOP CDM schema (which is really a type of "data mart") writable. In typical enterprise environments, DWs and DMs cannot be updated by users and can only be updated by incoming ETL processes. That is, the data needs to flow into the CDM via ETL.

If I understand correctly, the original idea behind having the COHORT and COHORT_DEFINITION tables in OMOP CDM was that if the source data DOES HAVE cohort definitions as part of the raw data set, those definitions can be transferred into OMOP CDM (I have seen these in a few data sets). With that concept in mind, the tables were not meant to be updated by end users, who would instead use ATLAS to define user-specific cohorts: the definitions go into the OHDSI schema and the generated results go into the RESULTS schema. I think it is not a bad idea to have those tables IF we properly describe the use cases.

As far as METADATA and ANNOTATION go, that is an interesting question. If we follow the pattern and want to be consistent, METADATA should be populated by ETL and ANNOTATION should really move into RESULTS, since it is meant to be updated by the end user. ATLAS can then be extended with annotation functionality that writes into the ANNOTATION table in RESULTS and leaves the CDM schema untouched. This will also work if we decide to move ACHILLES data into the METADATA table.

Another pattern that is emerging is that we always have to create two schemas: one for CDM and one for RESULTS, one for data and one for data analysis. I think we should always discuss both in our CDM WG, since RESULTS is always linked to and cannot be detached from the CDM schema.

@clairblacketer clairblacketer moved this from To do to Done in CDM v6.0 Sep 25, 2018
@cgreich
Contributor

cgreich commented Sep 30, 2018

Annotation table, @gklebanov? I'm not aware we have such a thing.

@gklebanov

@cgreich
Contributor

cgreich commented Sep 30, 2018

Do we have this in a proposal here?

@mgurley

mgurley commented Sep 30, 2018

Is there corresponding public documentation of the results schema, along the lines of what is done for the CDM: https://github.com/OHDSI/CommonDataModel/wiki?

@cgreich
Contributor

cgreich commented Oct 1, 2018

We have to add documentation about schemas. Right now, the documentation completely avoids prescribing implementation recommendations, such as schemas and the privileges on them. Will do.

@clairblacketer
Contributor Author

@mgurley the results schema will be new for CDMv6.0 so the documentation will be updated accordingly with the release.

@cgreich the Metadata and Annotation tables do not have a fully fleshed out proposal yet because their structure is still being discussed. Plans are for Ajit to present at the November workgroup meeting.

@gowthamrao I don't believe there will be conflict with the WebAPI if we move these tables to a results schema. We have been using a results schema for a while and it has worked out very well. ATLAS writes all of our cohorts to ohdsi_results.cohort and so far we haven't had any issues (@fdefalco correct me if I'm wrong)

This was referenced Oct 11, 2018
@clairblacketer
Contributor Author

added in v6.0
