Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicated row IDs related to "Parameter 'vocab_schema' not found in SQL"? #18

Closed
turbomam opened this issue Mar 24, 2019 · 1 comment
Closed

Comments

@turbomam
Copy link

turbomam commented Mar 24, 2019

After doing a hybrid of the Linux bulk load and some R scripts (#17), I'm seeing duplication of row IDs in some tables. For example, my observations table has

observation_id|person_id|observation_concept_id|observation_date|observation_datetime|observation_type_concept_id|value_as_number|value_as_string|value_as_concept_id|qualifier_concept_id|unit_concept_id|provider_id|visit_occurrence_id|visit_detail_id|observation_source_value|observation_source_concept_id|unit_source_value|qualifier_source_value| 
--------------|---------|----------------------|----------------|--------------------|---------------------------|---------------|---------------|-------------------|--------------------|---------------|-----------|-------------------|---------------|------------------------|-----------------------------|-----------------|----------------------| 
             1|        8|               4174876|      1985-09-13|          1985-09-13|                   38000280|               |               |                  0|                   0|              0|          0|                211|              0|424213003               |                      4174876|                 |                      | 
             1|        1|                     0|      2010-07-26|          2010-07-26|                   38000276|               |               |                  0|                   0|              0|          0|                 10|              0|233604007               |                       255848|                 |                      | 

Does person 8 (0141bae5-c190-4e83-aab0-eed8dc2e91bd) have a 424213003 observation from visit 211 (acf6725e-23ea-4e10-a1f6-d8e2196d51fb) on 1985-09-13 or does person 1 (00002c66-a365-4e88-8e80-d52bcad4869e) have a 233604007 from visit 10 (120aa894-4465-4b04-af96-1928191f1c36) on 2010-07-26?

ubuntu@ip-172-31-88-67:~/synthea/output/csv$ grep 424213003 * | grep 0141bae5-c190-4e83-aab0-eed8dc2e91bd
allergies.csv:1985-09-13,,0141bae5-c190-4e83-aab0-eed8dc2e91bd,acf6725e-23ea-4e10-a1f6-d8e2196d51fb,424213003,Allergy to bee venom

ubuntu@ip-172-31-88-67:~/synthea/output/csv$ grep 233604007 * | grep 120aa894-4465-4b04-af96-1928191f1c36
conditions.csv:2010-07-26,,00002c66-a365-4e88-8e80-d52bcad4869e,120aa894-4465-4b04-af96-1928191f1c36,233604007,Pneumonia
encounters.csv:120aa894-4465-4b04-af96-1928191f1c36,2010-07-26T08:39:03Z,2010-07-27T08:39:03Z,00002c66-a365-4e88-8e80-d52bcad4869e,69c66b65-96fc-3702-a7d1-ea4ba3ac3b87,inpatient,32485007,Hospital admission,105.37,233604007,Pneumonia

Same thing for drug_exposure and measurement.

drug_exposure_id|person_id|drug_concept_id|drug_exposure_start_date|drug_exposure_start_datetime|drug_exposure_end_date|drug_exposure_end_datetime|verbatim_end_date|drug_type_concept_id|stop_reason|refills|quantity|days_supply|sig|route_concept_id|lot_number|provider_id|visit_occurrence_id|visit_detail_id|drug_source_value|drug_source_concept_id|route_source_value|dose_unit_source_value|
----------------|---------|---------------|------------------------|----------------------------|----------------------|--------------------------|-----------------|--------------------|-----------|-------|--------|-----------|---|----------------|----------|-----------|-------------------|---------------|-----------------|----------------------|------------------|----------------------|
               1|        1|        1539464|              2003-12-25|                  2003-12-25|            2004-12-24|                2004-12-24|       2004-12-24|            38000177|           |      0|       0|        365|   |               0|0         |          0|                 11|              0|316672           |               1539464|                  |                      |
               1|        1|       40213154|              2008-09-24|                  2008-09-24|            2008-09-24|                2008-09-24|       2008-09-24|              581452|           |      0|       0|          0|   |               0|0         |          0|                 40|              0|140              |              40213154|                  |                      |
               1|        1|              0|              1981-09-03|                  1981-09-03|            1981-09-03|                1981-09-03|                 |              581452|           |      0|       0|          0|   |               0|0         |          0|                  4|              0|92691004         |                200970|                  |                      |

BTW, source_to_concept_map is empty

Could this have something to do with my vocabulary choices? I'm using everything that can be downloaded without requiring a licence. I have loaded the remotely-downloaded CPT codes.

Possibly relevant warnings?

> local.LoadCDMTables(cd,"cdm_synthea10","native")
Connecting using PostgreSQL driver
Running: insert_person.sql
  |======================================================================| 100%
Executing SQL took 0.00914 secs
Running: insert_observation_period.sql
  |======================================================================| 100%
Executing SQL took 0.027 secs
Running: insert_visit_occurrence.sql
  |======================================================================| 100%
Executing SQL took 0.153 secs
Running: insert_condition_occurrence.sql
  |======================================================================| 100%
Executing SQL took 20.2 secs
Running: insert_observation.sql
  |======================================================================| 100%
Executing SQL took 21.7 secs
Running: insert_measurement.sql
  |======================================================================| 100%
Executing SQL took 8.85 mins
Running: insert_procedure_occurrence.sql
  |======================================================================| 100%
Executing SQL took 47.6 secs
Running: insert_drug_exposure.sql
  |======================================================================| 100%
Executing SQL took 1.3 mins
Running: insert_condition_era.sql
  |======================================================================| 100%
Executing SQL took 0.152 secs
Running: insert_drug_era.sql
  |======================================================================| 100%
Executing SQL took 25.2 secs
Warning messages:
1: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
2: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
3: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'synthea_schema' not found in SQL
4: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
5: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'synthea_schema' not found in SQL
6: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
7: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'synthea_schema' not found in SQL
> 

@turbomam
Copy link
Author

turbomam commented Mar 25, 2019

closing... I found that running the R scripts after a failed bulk load might have led to duplicate row IDs, mixed up concept realms and concept_id values of 0 in the clinical tables (#17). In the end, I just set the schema search path to include native and cdm_synthea10 and hand-ran the queries in ETL/SQL/*.sql, starting with "creating visit logic tables..." (ETL/SQL/AllVisitTable.sql)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant