Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cost table changes (add PERSON_ID, dates and normalize) #81

Closed
clairblacketer opened this issue Jul 10, 2017 · 37 comments
Closed

Cost table changes (add PERSON_ID, dates and normalize) #81

clairblacketer opened this issue Jul 10, 2017 · 37 comments

Comments

@clairblacketer
Copy link
Contributor

clairblacketer commented Jul 10, 2017

Please see Gowtham's updated Cost table proposal in the comments below

Cost Table Changes (Add Person_id, Dates and normalize)

Proposal Owner: Gowtham Rao, Chris Knoll, Klaus Bonadt

Discussion: forum post

Cost table description: COST

Proposal overview:

  • Add person_id
  • Add billed_datetime and paid_datetime
  • Change Cost table structure - to "normalized" version
  • Add new field cost_domain_concept_id

1. Add Person_id to Cost table

  • Design justification: We have a design principle that all domain related data tables are person-centric where for each record the person_id and date are captured at a minimum. We deviate from this design principle in the COST table. Addition of person_id will make the SQL statement in database systems faster. A missing person_id prevents other lightweight non-database systems to process all transaction tables chunk wise. In certain database environments (specifically Massively Parallel Processing (MPP) that we see in redshift and MS PDW (and possibly HiveDB on Hadoop)) we want to be able to hash on a field of the table. By NOT having a person id in the cost table, we can't hash the data such that the data for a given person can be co-located on the same cluster. By HAVING this person_id field, we can define the table as hashed on person_id.
  • Discussion for and against: However, adding the person id and date will further de-normalize the OMOP CDM Schema. Local deployments of OMOP may add person_id on their own. This was countered by: but that will fail the standardization principle. The query optimizer may not leverage a hashing column if the column isn't applied in the query - standard OHDSI applications will not use person_id if it is not a standard OMOP CDM standard. While trying to calculate an average cost of something by person from the cost table, we have to go to the drug_exposure table, join back to cost (on the domain="drug") and the drug_exposure_id (neither of these are hashed). This will cause a sort of 'shuffle' move of data from the cost table (which can only be hashed on a cost_id) UNLESS we say where cost.person_id = drug_exposure.person_id. Cost-table is already denormalized with many cost entries (allowed amount, paid by payer, paid by patient etc). Alternative would be to redesign the cost table to cost_type and one cost_value instead of many COST entries. Addition of person_id may not impact the already denormalized table, but would rather enable us to process huge datasets without the need of spending cost expensive hardware and software resources for building up complex queries thru multiple inefficient joins.
  • Data integrity checks: To avoid the possibility of database integrity conflicts, e.g. the person_id passed through the event table might be different from the one in the COST table. We need to add it to the constraints (slow), a warning in the description or data quality tools like ACHILLES HEELS, which should watch that kind of thing.

2. Add incurred_date, incurred_datetime, billed_datetime, paid_datetime:

  • Add three new datetime fields in the cost table. billed_datetime and paid_datetime
  • Incurred_date or service_date (match event table) - This is in-line with the design principle.
  • Both fields are optional (Required = No), they are date time fields (Type = DateTime - following OMOP conventions)
  • Costs are associated with a visit_occurrence, procedure_occurrence, drug_occurrence, observation, device etc.
  • There are generally three types of dates associated with costs. Incurred date, Billed Date and Paid date.
  • Incurred date is the date of the service. They are captured in the respective visit, procedure etc.
  • Billed date and paid date are not captured in OMOP CDM. Use cases for these are listed below

Use cases:

  • Health economics and actuarial analysis use incurred date, billed date and paid date in the formulas
  • Incurred But not reported: https://en.wikipedia.org/wiki/Incurred_but_not_reported amount owed by an insurer to all valid claimants who have had a covered loss but have not yet reported it. Very important for claim reserves estimation (represent the money which should be held by the insurer so as to be able to meet all future claims arising from policies currently in force and policies written in the past.)
  • Completion factor trend analysis

Analytic questions:

  • Trend analysis for operational efficiency - claims adjudication rate, completion factor, IBNR, claims reserve estimation

Importance:

  • New use cases around financial and actuarial departments of organizations
  • This will expand the OHDSI/OMOP footprint

Consequence of doing it:

  • Adoption of Cost table may increase.
  • New use cases that serve the needs of financial entities in an organization will expand the adoption of the OMOP CDM.

Consequences of not doing it:

  • Query optimization is difficult.
  • Analyst has to write complex queries. Development of standardized tools may be delayed

3. Leverage cost_type_concept_id and remove cost type columns

  • Proposed structure of new table is below
  • The key idea here is that instead of making each cost-type a column (i.e. wide representation), lets convert to long representation. We will leverage cost_type_concept_id for this.
  • cost_type_concept_id will be used to identify standard cost types: total_charge, total_cost, total_paid, paid_by_payer, paid_by_patient, paid_patient_coinsurance, paid_patient_deductible, paid_by_primary, paid_ingredient_cost, paid_dispensing_fee, payer_plan_period_id, amount_allowed. This allows to generalize the cost table and we represent arbitrary number of cost types. We can also represent international and custom use cases. The concept_ids that would need to be created are:
    • Copayment amount
    • Coinsurance amount
    • Charged by the provider
    • Recovered by the provider
    • Allowed by the primary payer
    • Paid by the primary payer
    • Allowed by the secondary payer
    • Paid by the secondary payer
    • Allowed by all payers
    • Paid by all payers
    • Charged to the patient
    • Paid by the patient total out of pocket
    • Paid by the patient towards co-insurance
    • Paid by the patient towards copay
    • Paid by the patient towards deductible
    • Fee for pharmacy dispensing
    • Cost of pharmacy ingredient
    • Average Wholesale Price amount

4. Add new field cost_domain_concept_id

  • The original cost_domain_id is 'character' field, we want a new integer field called cost_domain_concept_id which is an integer that currently may be one of the following:
    • 8 (Visit)
    • 10 (Procedure)
    • 13 (Drug)
    • 17 (Device)
    • 19 (Condition)
    • 21 (Measurement)
    • 27 (Observation)
    • 36 (Specimen)
Field Required Type Description
cost_id Yes integer A unique identifier for each COST record.
person_id Yes integer A unique identifier for each person.
cost_event_id Yes integer A foreign key identifier to the event (e.g. Measurement, Procedure, Visit, Drug Exposure, etc) record for which cost data are recorded.
cost_domain_id Yes string(50) The concept id representing the domain of the cost event, from which the corresponding table can be inferred that contains the entity for which cost information is recorded.
cost_domain_concept_id Yes integer A foreign key identifier to a concept in the CONCEPT table representing the domain of the cost event
cost_type_concept_id Yes integer A foreign key identifier to a concept in the CONCEPT table for the provenance or the source of the COST data: Calculated from insurance claim information, provider revenue, calculated from cost-to-charge ratio, reported from accounting database, etc.
currency_concept_id Yes integer A foreign key identifier to the concept representing the 3-letter code used to delineate international currencies, such as USD for US Dollar.
total_cost Yes float The actual financial cost amount
incurred_date Yes date The first date of service of the clinical event as in table capturing the information (e.g. date of visit, date of procedure, date of condition, date of drug etc).
incurred_datetime No datetime
billed_datetime No datetime The date and time a bill was generated for a service or encounter
paid_datetime No datetime The date and time payment was received for a service or encounter
revenue_code_concept_id No integer A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for Revenue codes.
drg_concept_id No integer A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for DRG codes.
revenue_code_source_value No string(50) The source code for the Revenue code as it appears in the source data, stored here for reference.
drg_source_value No string(50) The source code for the 3-digit DRG source code as it appears in the source data, stored here for reference.
@gowthamrao
Copy link
Member

gowthamrao commented Aug 11, 2017

~~@cgreich next CDM meeting? ~~

For cost_domain_concept_id - we used the concept-id of the domain that represents the costs. This is more efficient.
8 Visit
10 Procedure
13 Drug
17 Device
19 Condition
21 Measurement
27 Observation
36 Specimen

changed proposal from cost_domain_concept_id to cost_source_table_concept_id

@cgreich
Copy link
Contributor

cgreich commented Aug 14, 2017

@gowthamrao:

You mean you are using numeric IDs instead of the string ones to gain performance?

@gowthamrao
Copy link
Member

Yes. That's correct. We felt that the Integer representations leads to performance gain, makes joins efficient, and are probably the reason why OMOP CDM is so fast. I don't have benchmarks, but compared to text, pretty sure there is exponential gain.

Any reason to the text? Cost tables can go to millions/billions of rows

@abedtashh
Copy link

I agree with using Concept IDs instead of the text due to better performance. It also applies to other columns in the CDM representing an ID, like Relationship_ID, Class_ID, etc. I prefer using integer instead of text as it would follow database design best practices; however, it makes the tables less human readable before joining the tables with the description of the IDs.

@cgreich
Copy link
Contributor

cgreich commented Aug 14, 2017

You are totally right. But: We went from numeric IDs to alphanumeric ones for the vocabulary_id, domain_id, relationship_id, concept_class_id. Reason is that it is so much easier to do manual queries (you otherwise have to constantly join the reference tables vocabulary, relationship, concept_class etc.) and these are not that many, so you can just index by them (and essentially turn them internally into sequential numerical IDs).

@gowthamrao
Copy link
Member

gowthamrao commented Aug 14, 2017

changed proposal from cost_domain_concept_id to cost_source_table_concept_id

@cgreich I think it is fine for the vocabulary tables because they are relatively small and don't run over billions of rows. But in the clinical tables run over billions of rows, using text as a ID is an inefficient use of the database space. Plus text dont support MPP systems - so you can't shard the data over many nodes. We need to use integers like @abedtashh said.

For the users who want to use 'human readable' alphanumeric ones, they can always do:

NEW

select cost.*
from
       <schema>.cost as cost
inner join
      ( select domain_concept_id
        from <schema>.domain
        where domain_id = 'VISIT'
     ) as domain
on a.cost_domain_id = domain.domain_concept_id
;

so they don't have to memorize the concept-id's, and can retain the human readable.

Current

select cost.*
from
       <schema>.cost as cost
where cost_domain_id = 'VISIT';

@gowthamrao
Copy link
Member

@cgreich @abedtashh @fdefalco what are your thoughts on 'cost_type_concept_id and remove cost type columns' From actuarial, economical and financial analytics perspective - this would really help because we can now support incurred_date, paid_date, billed_date AND we can support so many different types of cost_type's using standardized concept_id's.

@cgreich would like to discuss this at next cdm vocab workgroup - if possible.

@clairblacketer
Copy link
Contributor Author

Hi @gowthamrao you have been added to the agenda for 9/5

@cgreich
Copy link
Contributor

cgreich commented Aug 18, 2017

@gowthamrao:

Actually, bring up both: The numerical ID and the cost. Even though I am not getting what you mean by "cost_type_concept_id and remove cost type columns"

@gowthamrao
Copy link
Member

gowthamrao commented Aug 18, 2017

Will discuss them in the workgroup, but for those who read in advance.

remove wide columns that represent cost types: the proposal is to convert wide to long tables. i.e. currently we have certain types of costs represented in cost table. total_charge total_cost total_paid paid_by_payer paid_by_patient paid_patient_copay paid_patient_coinsurance paid_patient_deductible paid_by_primary paid_ingredient_cost paid_dispensing_fee amount_allowed
some of these are only relevant to drug_occurrence table paid_ingredient_cost paid_dispensing_fee
some these are redundant total_cost total_paid paid_by_payer paid_by_primary if not redundant, they are just not clean. We have an opportunity to make it clean. The number of cost types can be very long -- in the world of 'payment innovation', lot of new type of payment systems are happening - including 'care coordination', 'bonus', 'shared savings', etc. How many columns are we going to add?

We cant add more columns -- we need make this table long form, by creating a cost type concept-id. That way we can do have 'source_concept_id' and 'standardized_concept_id' -- and do stanardized/distributed OHDSI cost studies, AND, support local studies.

@gowthamrao
Copy link
Member

@cgreich @abedtashh another point in favor of using integers for the *_domain_id is we are using integers for _domain_id in FACT_RELATIONSHIP table

@cgreich
Copy link
Contributor

cgreich commented Aug 24, 2017

Why is that any better?

@gowthamrao
Copy link
Member

@cgreich why use concept_id vs concept name? because concept_id's are integer, and concept_name are text -- and integers are more efficient

@gowthamrao
Copy link
Member

@clairblacketer is there a way for me to edit the proposal?

@clairblacketer
Copy link
Contributor Author

Hi @gowthamrao unfortunately not within github itself. However, you can either send me your edits and I can add them here or you can create a new issue with your edits referencing this one.

@gowthamrao
Copy link
Member

gowthamrao commented Aug 24, 2017

changed proposal from cost_domain_concept_id to cost_source_table_concept_id

@clairblacketer then could you please do the following changes to the proposal

cost_domain_id --> cost_domain_concept_id to clarify based on comments above. Original cost_domain_id is 'character' field, we want a new integer field called cost_domain_concept_id which is an integer that currently may be one of the following.

8 Visit
10 Procedure
13 Drug
17 Device
19 Condition
21 Measurement
27 Observation
36 Specimen

cost_concept_id relies on new to be created concept_id's

  • Copayment amount
  • Coinsurance amount
  • Charged by the provider
  • Recovered by the provider
  • Allowed by the primary payer
  • Paid by the primary payer
  • Allowed by the secondary payer
  • Paid by the secondary payer
  • Allowed by all payers
  • Paid by all payers
  • Charged to the patient
  • Paid by the patient total out of pocket
  • Paid by the patient towards co-insurance
  • Paid by the patient towards copay
  • Paid by the patient towards deductible
  • Fee for pharmacy dispensing
  • Cost of pharmacy ingredient
  • Average Wholesale Price amount

@clairblacketer
Copy link
Contributor Author

@gowthamrao done!

@gowthamrao
Copy link
Member

gowthamrao commented Sep 18, 2017

Anybody know the rationale behind leaving revenue_code/drg in cost_table? I think observation table with ability to link to visit tables and cost (via visit table) is the right place for revenue_code/drg

@gowthamrao
Copy link
Member

gowthamrao commented Sep 24, 2017

@clairblacketer after discussing wtih @cgreich - want to change the proposal as a follows. Withdraw the field cost_domain_concept_id and replace with cost_record_table_concept_id.

Explanation:
_domain_concept_id --> was proposed, but is being withdrawn. This is because it is "dirty", as more than one table may belong to the same domain e.g. drug and drug_era, condition_occurrence and condition_era, or visit_occurrence and visit_detail. We cannot tell if the cost_event_id is referencing the visit_occurrence or visit_detail, condition_occurrence .

But we need an alternative, because current cost CDM uses cost_domain_id which is as dirty as _domain_concept_id but also is inefficienct because it is a text search instead of an efficient integer search.

To solve linkage with clinical-event tables (visit_occurrence_id -- cost_event_id or condition_occurrence_id -- cost_id) we are proposing a new field called cost_source_table_concept_id. This proposal is both a new field in Cost CDM, and also new vocabulary.

New vocabulary: For every table in the OMOP CDM we will need to create a new concept_id. See attached. It will be a one time task of inserting into omop vocabulary with maintenance when a new table is added to the omop cdm. See proposed concepts for event_table_concept_id

Advantages: The cost table will now be linkable to any clinical event table using a combination of cost_source_table_concept_id and cost_event_id - where cost_event_id is the FK to the PK of the table represented in the cost_source_table_concept_id.

This will make the queries faster and cleaner.

There are other advantages - e.g. cohort table. The subject_id in the cohort_table maybe the person_id, provider_id, visit_occurrence_id, visit_detail_id --- as a cohort maybe more than just a person (although we currently use it almost exclusively as a person). Adding this table concept_id (may - not part of this proposal) allow us to create cohorts that using a combination of subject_id and _source_table_concept_id is linkable to the table.

@gowthamrao
Copy link
Member

gowthamrao commented Oct 3, 2017

Cost Table Changes (Add Person_id, Dates and normalize)

Proposal Owner: Gowtham Rao, Chris Knoll, Klaus Bonadt

Discussion: forum post

Cost table description: COST

Proposal overview:

  • Add person_id and inccurred_date -- (design principle)
  • Add incurred_date, billed_date and paid_date
  • Change Cost table structure - from wide to tall. ("normalized", replace multiple cost columns with with cost_concept_id and cost_source_concept_id)
  • Add new field event_table_concept_id -- to link to other OMOP CDM tables

1. Add Person_id to Cost table

  • Design justification: We have a design principle that all domain related data tables are person-centric where for each record the person_id and date are captured at a minimum. We deviate from this design principle in the COST table. Addition of person_id will make the SQL statement in database systems faster. A missing person_id prevents other lightweight non-database systems to process all transaction tables chunk wise. In certain database environments (specifically Massively Parallel Processing (MPP) that we see in redshift and MS PDW (and possibly HiveDB on Hadoop)) we want to be able to hash on a field of the table. By NOT having a person id in the cost table, we can't hash the data such that the data for a given person can be co-located on the same cluster. By HAVING this person_id field, we can define the table as hashed on person_id.
  • Discussion for and against: However, adding the person id and date will further de-normalize the OMOP CDM Schema. Local deployments of OMOP may add person_id on their own. This was countered by: but that will fail the standardization principle. The query optimizer may not leverage a hashing column if the column isn't applied in the query - standard OHDSI applications will not use person_id if it is not a standard OMOP CDM standard. While trying to calculate an average cost of something by person from the cost table, we have to go to the drug_exposure table, join back to cost (on the domain="drug") and the drug_exposure_id (neither of these are hashed). This will cause a sort of 'shuffle' move of data from the cost table (which can only be hashed on a cost_id) UNLESS we say where cost.person_id = drug_exposure.person_id. Cost-table is already denormalized with many cost entries (allowed amount, paid by payer, paid by patient etc). Alternative would be to redesign the cost table to cost_type and one cost_value instead of many COST entries. Addition of person_id may not impact the already denormalized table, but would rather enable us to process huge datasets without the need of spending cost expensive hardware and software resources for building up complex queries thru multiple inefficient joins.
  • Data integrity checks: To avoid the possibility of database integrity conflicts, e.g. the person_id passed through the event table might be different from the one in the COST table. We need to add it to the constraints (slow), a warning in the description or data quality tools like ACHILLES HEELS, which should watch that kind of thing.

2. Add billed_date, paid_date:

Add two new date fields in the cost table. billed_date and paid_date
Both fields are optional (Required = No), they are date fields
Costs are associated with a visit_occurrence, procedure_occurrence, drug_occurrence, observation, device etc.
There are generally three types of dates associated with costs. Incurred date, Billed Date and Paid date.
Incurred date is the date of the service. They are captured in the respective visit, procedure etc.
Billed date and paid date are not captured in OMOP CDM. Use cases for these are listed below

Use cases:
Health economics and actuarial analysis use incurred date, billed date and paid date in the formulas
Incurred But not reported: https://en.wikipedia.org/wiki/Incurred_but_not_reported amount owed by an insurer to all valid claimants who have had a covered loss but have not yet reported it. Very important for claim reserves estimation (represent the money which should be held by the insurer so as to be able to meet all future claims arising from policies currently in force and policies written in the past.)
Completion factor trend analysis

Analytic questions:
Trend analysis for operational efficiency - claims adjudication rate, completion factor, IBNR, claims reserve estimation

Importance:
New use cases around financial and actuarial departments of organizations
This will expand the OHDSI/OMOP footprint

Consequence of doing it:
Adoption of Cost table may increase.
New use cases that serve the needs of financial entities in an organization will expand the adoption of the OMOP CDM.

Consequences of not doing it:
Query optimization is difficult.
Analyst has to write complex queries. Development of standardized tools may be delayed

3. Change Cost table structure - from wide to tall by using cost_concept_id and cost_source_concept_id

Proposed structure of new table is below
The idea is that instead of making each cost-type a column (i.e. wide representation), lets convert to long representation. We will leverage concept_id for this. _source_concept_id will represent the source-vocabulary, which will be crosswalked to standard _concept_id for standardized analytics.
cost_concept_id will be used to identify standard cost types. This allows to generalize the cost table and we represent arbitrary number of cost types and supports many different cost types by adding concept-id's for cost. We can also represent international and custom use cases. The concept_ids proposed in this spreadsheet

4. Add new field event_table_concept_id

This field will allow cost table to be linked to any OMOP CDM table for which cost is being represented. cost_table_concept_id will be used to infer the source of the cost information thru vocabulary look-up and then joining by the respective pk. e.g. if the event_table_concept_id points to visit_occurrence table, then cost_event_id = visit_occurrence_id. See attached concepts spreadsheet with proposed new OMOP concepts

Field Required Type Description
cost_id Yes integer A unique identifier for each COST record.
person_id Yes integer A unique identifier for each person.
cost_event_id Yes ? No integer A foreign key identifier to the event (e.g. Measurement, Procedure, Visit, Drug Exposure, etc) record for which cost data are recorded.
cost_domain_id Yes string(50) The concept id representing the domain of the cost event, from which the corresponding table can be inferred that contains the entity for which cost information is recorded.
cost_event_table_concept_id Yes integer A foreign key identifier to a concept in the CONCEPT table representing the identity of the table whose primary key is equal to cost_event_id
cost_concept_id Yes integer A foreign key that refers to a Standard Cost Concept identifier in the Standardized Vocabularies.
cost_type_concept_id Yes integer A foreign key identifier to a concept in the CONCEPT table for the provenance or the source of the COST data: Co-ordination of benefits, Calculated from insurance claim information, provider revenue, calculated from cost-to-charge ratio, reported from accounting database, etc.
cost_source_concept_id Yes integer A foreign key to a Cost Concept that refers to the code used in the source.
currency_concept_id Yes integer A foreign key identifier to the concept representing the 3-letter code used to delineate international currencies, such as USD for US Dollar.
cost Yes float The actual financial cost amount
incurred_date Yes date The first date of service of the clinical event corresponding to the cost as in table capturing the information (e.g. date of visit, date of procedure, date of condition, date of drug etc).
billed_date No date The date a bill was generated for a service or encounter
paid_date No date The date payment was received for a service or encounter
revenue_code_concept_id No integer A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for Revenue codes.
drg_concept_id No integer A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for DRG codes.
revenue_code_source_value No string(50) The source code for the Revenue code as it appears in the source data, stored here for reference.
drg_source_value No string(50) The source code for the 3-digit DRG source code as it appears in the source data, stored here for reference.
payer_plan_period_id No integer A foreign key to the PAYER_PLAN_PERIOD table, where the details of the Payer, Plan and Family are stored. Record the payer_plan_id that relates to the payer who contributed to the paid_by_payer field.

@don-torok
Copy link

Whenever possible the description for '_concept_id' fields should include the vocabulary or domain to aid in the ETL getting the correct concepts.
For example drg_concept_id: Standard concepts where vocabulary_id = 'DRG'
revenue_code_concept_id: Standard concept where vocabulary_id = 'Revenue Code'
currency_concept_id: ... vocabulary id = 'Currency'

Description for event_source_table_concept_id should state that this is for a new set of concepts, and since the proposal cannot be accepted with out the concepts being defined can also give the new vocabulary id. BTW, would not 'event_table_concept_id 'be a better name. When I see '_source_concept_id' I then look for a '_concept_id (e.g. drug_concept_id, drug_source_concept_id pairing)

If the cost_concept_id is to in reference to a new set of concept, proposal should say that and again provide the vocabulary name. I also think the new concepts need a definition for those of us that are not so closely tied to the payer space but still need to implement the ETL. It was fairly simple at one time because most of the column name in claim tables matched the names in the cost table, but that does not apply to EHR systems. Also is 'Paid by all payers' a summary, or is this a value that is likely to exist in a claims database?

Are we moving to always use date time?

cost_source_concept_id: I do not understand this column, assuming it is the 'source' that corresponds to the cost_concept_id, what concepts are these going to point to?

In proposed concepts Concepts.xlsx, I think the domain should be 'Metadata' instead of 'Type Concept.

@gowthamrao
Copy link
Member

Recap from the Oct 3rd 2017 CDM workgroup, forums and symposium discussion

  1. Presented the need for person_id and incurred_date: discussion held, most in agreement. Details in first post.
  2. Add billed_date and paid_date: the proposed table will have total of three dates (incurred, billed, paid). @cgreich questioned the use-case supporting it. @pbr6cornell supported use-case of financial anlaytics. Details in first post.
  3. Change Cost table structure - to "normalized" version by using cost_concept_id and cost_source_concept_id: we went half-way thru this conversation, till we ran out of time. Discussions continued here . Most liked the pivoting of cost table. Concern 1: There was concern that this would increase the work during ETL, but the new structure allows for more flexibility and is not constrained by the cost-types restricted to legacy claims systems. @don-torok and ericaVoss. Concern 2: There was concern that end-users may add up cost components along with total-costs - leading to erroneous calculation due to double counting. It was proposed to have separate columns for total cost and component costs to avoid that, but that would make standardized analytics difficult. Many data sources do not have both components of cost and total cost for an encounter, and even if they do - many times components don't always add-up to total cost. To avoid ambiguity, the field name was changed from originally proposed 'Total Cost' to just 'Cost'. Follow-up conversations were held, to use the vocabulary to handle the concept-standardization.

@gowthamrao
Copy link
Member

@don-torok
for event_table_concept_id please review this spreadsheet for proposed concept_id's. This is a thought --
why not assign every table in OMOP CDM should have a corresponding concept_id. @cgreich @pbr6cornell thoughts? The orange section is the proposed vocabulary, and the green section are the details. Discussion is here . These new concept_id's will be used for the proposed event_table_concept_id, but using this approach, it may also be useful in many visualization applications - where the app only needs to know the concept_id.

@gowthamrao
Copy link
Member

@don-torok is this what you were recommending? Cost concepts

@don-torok
Copy link

Yes and no (:
You have to understand that I am looking at this as someone who will be doing the ETL. So I will be trying to align what I see in the source documentation to columns, or in this case attribute concepts, in the CDM. The first part of the definition for co-payment is helpful

Co-payment is the fixed amount that is paid for a covered health care service. This payment is usually in addition to deductible. Also known as 'copays'. Copayments may differ based on services within the same plan, like drugs, lab tests, and visits to specialists.

It supplies a brief description, gives a couple of options for source of how the data may be described in the source documentation, co-payment or copays and it tells me it is not the same as a deductible.

However, this part of the description offers little help in doing the ETL defining the attribute in the CDM.
Generally plans with lower monthly premium have higher copayments, and plans with higher monthly premiums have lower copayments.

Another example of where a detailed definition needed is to distinguish between Pharmacy ingredient (the amount charged by the wholesale distributor or manufacturer ) and Average Wholesale. I would like sufficient information to help me determine if a column labeled 'Wholesale cost' should go into Pharmacy ingredient or Average Wholesale.

@gowthamrao
Copy link
Member

Recap from CDM workgroup on Nov 7th 2017

  • clarified that final proposal that includes feedback from participants is here
  • Recap: we recapped discussion from prior workgroup October meeting, and discussants voiced consensus around adding person_id, incurred_date, paid_date and billed_date. We went thru the f/u conversations on the online forums here , and continued where we left off.
  • wide vs long format: Group felt that the long-format (vs wide-format) is a better way of representing data, it would make cost analysis easier. Most systems use wide-format, some organizations (discussed Mark_Danese post on forum) use both forms. It was agreed that OHDSI community would not use both wide and long form. The long-form had advantages of being able to support any number of cost types. It was felt that there may be an initial resistance to adoption (because of unfamiliarity).
  • use of concepts: @don-torok expressed concern that the cost concepts should be clearly and unambiguously described for the proposal to be accepted. Group agreed to come back with a list of concept's for cost. @don-torok to help clarify.
  • adjourned to the next discussion.

@gowthamrao
Copy link
Member

@don-torok please check the updated spreadsheet here

Please provide input on cost concepts being proposed.

@clairblacketer
Copy link
Contributor Author

added in v6.0

@jenniferduryea
Copy link

I know this thread is old and closed, but I do not see the specs for the updated COST table in the V6.0 documentation, specifically here: https://ohdsi.github.io/CommonDataModel/cdm60.html#COST. @clairblacketer , @gowthamrao - is this just a mistake? Where can I get updated specs for the cost table for v6? Thanks!

@cgreich
Copy link
Contributor

cgreich commented Mar 25, 2021

What's missing, @jenniferduryea?

@clairblacketer
Copy link
Contributor Author

I see the problem @cgreich @jenniferduryea. There is no person_id or dates as the issue suggests. I’ll push a fix in the next few days.

@jenniferduryea
Copy link

@clairblacketer and @cgreich both COST tables under the v6.0 and v5.3.1 specs are the same, even though the table has been converted from Wide to Long (unless you kept it wide? which was not my impression). The accompanying wording for ETL convention and description of the table does not reflect the "wide to long" conversion. Also, v6 still shows columns for paid_patient_copay, paid_patient_coinsurance, etc which I believe are now stored under the cost.cost_concept_id field for V6. Basically, the whole COST table has not been updated in the V6 spec from V5.3.1 to reflect all of the changes above.

@clairblacketer
Copy link
Contributor Author

Labelled as Documentation to be fixed in the 2021 CDM Hackathon.

Final Decision

Fix the COST table in the current v6.0 specs and moving forward with v5.4. The correct specification can be found here.

@aandryc aandryc moved this from Ready to In Progress in CDM v5.4 Aug 18, 2021
ablack3 added a commit that referenced this issue Aug 20, 2021
* Add github actions workflow to build package and run tests.

* update Description file

* rename .Rproj file.

* Consolidate 'create' functions into one file.

* Add tests for create functions.

* update description

* removed spaces in file and folder names. Regenerated ddl output. Tried to fix Field_Level.csv file.

* consolidate write functions into one file. Add execute function.

* update docs

* add tests for write and execute functions

* update documentation

* Add windows and linux runners in github actions.

* update github actions

* download drivers before running tests

* fix small error in setup test file.

* debug github actions

* debug github actions

* debug github actions

* debug github actions

* fix tiny bug

* comment out execute ddl test

* fix bug in test

* Add execute test back in

* revert accidental change in description

* add print statement for debugging schema error on github actions.

* Fix schema environment variable name

* Add comment to github actions workflow file.

* remove placeholder text in function documentation.

* Rename createdDdl.R to createDdl.R

* Hack-a-thon updates

Closes #81, #387, #239, #412, #391, #330, #408, #365, #306, #264

* Changed bigint to integer for consistency

* Updated DDLs

* Add tests for redshift. Clean up test setup file.

* Foreign key fixes

* Add imports and update docs.

* Fix bug in setup test script.

* update setup file

* Add tests for oracle and sql server. Move setup.R file.

* fix bug in setup

* debug tests on github

* debug github actions

* debug actions.

* debug actions

* debug actions.

* Add missing secrets to yaml!!

* debug actions

* test connection on all platforms

* add ddl execution

* add windows and linux runners

Co-authored-by: Adam Black <adam.black@odysseusinc.com>
Co-authored-by: Clair Blacketer <mblacke@its.jnj.com>
ablack3 added a commit that referenced this issue Aug 20, 2021
* Add github actions workflow to build package and run tests.

* update Description file

* rename .Rproj file.

* Consolidate 'create' functions into one file.

* Add tests for create functions.

* update description

* removed spaces in file and folder names. Regenerated ddl output. Tried to fix Field_Level.csv file.

* consolidate write functions into one file. Add execute function.

* update docs

* add tests for write and execute functions

* update documentation

* Add windows and linux runners in github actions.

* update github actions

* download drivers before running tests

* fix small error in setup test file.

* debug github actions

* debug github actions

* debug github actions

* debug github actions

* fix tiny bug

* comment out execute ddl test

* fix bug in test

* Add execute test back in

* revert accidental change in description

* add print statement for debugging schema error on github actions.

* Fix schema environment variable name

* Add comment to github actions workflow file.

* remove placeholder text in function documentation.

* Rename createdDdl.R to createDdl.R

* Hack-a-thon updates

Closes #81, #387, #239, #412, #391, #330, #408, #365, #306, #264

* Changed bigint to integer for consistency

* Updated DDLs

* Add tests for redshift. Clean up test setup file.

* Foreign key fixes

* Add imports and update docs.

* Fix bug in setup test script.

* update setup file

* Add tests for oracle and sql server. Move setup.R file.

* fix bug in setup

* debug tests on github

* debug github actions

* debug actions.

* debug actions

* debug actions.

* Add missing secrets to yaml!!

* debug actions

* test connection on all platforms

* add ddl execution

* add windows and linux runners

* Allow user to specify output location in buildRelease

* replace outputpath with outputfolder for consitent argument names in the package.

* Add test for buildRelease.

* replace outputpath with outputfolder for consistency. update documentation.

* move ddl folder to inst so it is accessible from tests

* update documentation

Co-authored-by: Adam Black <adam.black@odysseusinc.com>
Co-authored-by: Clair Blacketer <mblacke@its.jnj.com>
ablack3 added a commit that referenced this issue Aug 20, 2021
* Add github actions workflow to build package and run tests.

* update Description file

* rename .Rproj file.

* Consolidate 'create' functions into one file.

* Add tests for create functions.

* update description

* removed spaces in file and folder names. Regenerated ddl output. Tried to fix Field_Level.csv file.

* consolidate write functions into one file. Add execute function.

* update docs

* add tests for write and execute functions

* update documentation

* Add windows and linux runners in github actions.

* update github actions

* download drivers before running tests

* fix small error in setup test file.

* debug github actions

* debug github actions

* debug github actions

* debug github actions

* fix tiny bug

* comment out execute ddl test

* fix bug in test

* Add execute test back in

* revert accidental change in description

* add print statement for debugging schema error on github actions.

* Fix schema environment variable name

* Add comment to github actions workflow file.

* remove placeholder text in function documentation.

* Rename createdDdl.R to createDdl.R

* Hack-a-thon updates

Closes #81, #387, #239, #412, #391, #330, #408, #365, #306, #264

* Changed bigint to integer for consistency

* Updated DDLs

* Add tests for redshift. Clean up test setup file.

* Foreign key fixes

* Add imports and update docs.

* Fix bug in setup test script.

* update setup file

* Add tests for oracle and sql server. Move setup.R file.

* fix bug in setup

* debug tests on github

* debug github actions

* debug actions.

* debug actions

* debug actions.

* Add missing secrets to yaml!!

* debug actions

* test connection on all platforms

* add ddl execution

* add windows and linux runners

* Resolving conflicts

* Removing unnecessary file

* Trying again to remove .DS_Store, adding to gitignore

* Allow user to specify output location in buildRelease

* replace outputpath with outputfolder for consitent argument names in the package.

* Add test for buildRelease.

* replace outputpath with outputfolder for consistency. update documentation.

* move ddl folder to inst so it is accessible from tests

* update documentation

* Add OMOP header genearator function

Co-authored-by: Adam Black <adam.black@odysseusinc.com>
Co-authored-by: Clair Blacketer <mblacke@its.jnj.com>
Co-authored-by: clairblacketer <clairblacketer@users.noreply.github.com>
ablack3 added a commit that referenced this issue Aug 20, 2021
* Add github actions workflow to build package and run tests.

* update Description file

* rename .Rproj file.

* Consolidate 'create' functions into one file.

* Add tests for create functions.

* update description

* removed spaces in file and folder names. Regenerated ddl output. Tried to fix Field_Level.csv file.

* consolidate write functions into one file. Add execute function.

* update docs

* add tests for write and execute functions

* update documentation

* Add windows and linux runners in github actions.

* update github actions

* download drivers before running tests

* fix small error in setup test file.

* debug github actions

* debug github actions

* debug github actions

* debug github actions

* fix tiny bug

* comment out execute ddl test

* fix bug in test

* Add execute test back in

* revert accidental change in description

* add print statement for debugging schema error on github actions.

* Fix schema environment variable name

* Add comment to github actions workflow file.

* remove placeholder text in function documentation.

* Rename createdDdl.R to createDdl.R

* Hack-a-thon updates

Closes #81, #387, #239, #412, #391, #330, #408, #365, #306, #264

* Changed bigint to integer for consistency

* Updated DDLs

* Add tests for redshift. Clean up test setup file.

* Foreign key fixes

* Add imports and update docs.

* Fix bug in setup test script.

* update setup file

* Add tests for oracle and sql server. Move setup.R file.

* fix bug in setup

* debug tests on github

* debug github actions

* debug actions.

* debug actions

* debug actions.

* Add missing secrets to yaml!!

* debug actions

* test connection on all platforms

* add ddl execution

* add windows and linux runners

* Resolving conflicts

* Removing unnecessary file

* Trying again to remove .DS_Store, adding to gitignore

* Allow user to specify output location in buildRelease

* replace outputpath with outputfolder for consitent argument names in the package.

* Add test for buildRelease.

* replace outputpath with outputfolder for consistency. update documentation.

* move ddl folder to inst so it is accessible from tests

* update documentation

* Add OMOP header genearator function

Co-authored-by: Adam Black <adam.black@odysseusinc.com>
Co-authored-by: Clair Blacketer <mblacke@its.jnj.com>
Co-authored-by: clairblacketer <clairblacketer@users.noreply.github.com>
CDM v5.4 automation moved this from In Progress to Done Sep 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

7 participants