Duplicated row IDs related to "Parameter 'vocab_schema' not found in SQL"? #18

turbomam · 2019-03-24T22:04:32Z

After doing a hybrid of the Linux bulk load and some R scripts (#17), I'm seeing duplication of row IDs in some tables. For example, my observations table has

observation_id|person_id|observation_concept_id|observation_date|observation_datetime|observation_type_concept_id|value_as_number|value_as_string|value_as_concept_id|qualifier_concept_id|unit_concept_id|provider_id|visit_occurrence_id|visit_detail_id|observation_source_value|observation_source_concept_id|unit_source_value|qualifier_source_value| 
--------------|---------|----------------------|----------------|--------------------|---------------------------|---------------|---------------|-------------------|--------------------|---------------|-----------|-------------------|---------------|------------------------|-----------------------------|-----------------|----------------------| 
             1|        8|               4174876|      1985-09-13|          1985-09-13|                   38000280|               |               |                  0|                   0|              0|          0|                211|              0|424213003               |                      4174876|                 |                      | 
             1|        1|                     0|      2010-07-26|          2010-07-26|                   38000276|               |               |                  0|                   0|              0|          0|                 10|              0|233604007               |                       255848|                 |                      |

Does person 8 (0141bae5-c190-4e83-aab0-eed8dc2e91bd) have a 424213003 observation from visit 211 (acf6725e-23ea-4e10-a1f6-d8e2196d51fb) on 1985-09-13 or does person 1 (00002c66-a365-4e88-8e80-d52bcad4869e) have a 233604007 from visit 10 (120aa894-4465-4b04-af96-1928191f1c36) on 2010-07-26?

ubuntu@ip-172-31-88-67:~/synthea/output/csv$ grep 424213003 * | grep 0141bae5-c190-4e83-aab0-eed8dc2e91bd
allergies.csv:1985-09-13,,0141bae5-c190-4e83-aab0-eed8dc2e91bd,acf6725e-23ea-4e10-a1f6-d8e2196d51fb,424213003,Allergy to bee venom

ubuntu@ip-172-31-88-67:~/synthea/output/csv$ grep 233604007 * | grep 120aa894-4465-4b04-af96-1928191f1c36
conditions.csv:2010-07-26,,00002c66-a365-4e88-8e80-d52bcad4869e,120aa894-4465-4b04-af96-1928191f1c36,233604007,Pneumonia
encounters.csv:120aa894-4465-4b04-af96-1928191f1c36,2010-07-26T08:39:03Z,2010-07-27T08:39:03Z,00002c66-a365-4e88-8e80-d52bcad4869e,69c66b65-96fc-3702-a7d1-ea4ba3ac3b87,inpatient,32485007,Hospital admission,105.37,233604007,Pneumonia

Same thing for drug_exposure and measurement.

drug_exposure_id|person_id|drug_concept_id|drug_exposure_start_date|drug_exposure_start_datetime|drug_exposure_end_date|drug_exposure_end_datetime|verbatim_end_date|drug_type_concept_id|stop_reason|refills|quantity|days_supply|sig|route_concept_id|lot_number|provider_id|visit_occurrence_id|visit_detail_id|drug_source_value|drug_source_concept_id|route_source_value|dose_unit_source_value|
----------------|---------|---------------|------------------------|----------------------------|----------------------|--------------------------|-----------------|--------------------|-----------|-------|--------|-----------|---|----------------|----------|-----------|-------------------|---------------|-----------------|----------------------|------------------|----------------------|
               1|        1|        1539464|              2003-12-25|                  2003-12-25|            2004-12-24|                2004-12-24|       2004-12-24|            38000177|           |      0|       0|        365|   |               0|0         |          0|                 11|              0|316672           |               1539464|                  |                      |
               1|        1|       40213154|              2008-09-24|                  2008-09-24|            2008-09-24|                2008-09-24|       2008-09-24|              581452|           |      0|       0|          0|   |               0|0         |          0|                 40|              0|140              |              40213154|                  |                      |
               1|        1|              0|              1981-09-03|                  1981-09-03|            1981-09-03|                1981-09-03|                 |              581452|           |      0|       0|          0|   |               0|0         |          0|                  4|              0|92691004         |                200970|                  |                      |

BTW, source_to_concept_map is empty

Could this have something to do with my vocabulary choices? I'm using everything that can be downloaded without requiring a licence. I have loaded the remotely-downloaded CPT codes.

Possibly relevant warnings?

> local.LoadCDMTables(cd,"cdm_synthea10","native")
Connecting using PostgreSQL driver
Running: insert_person.sql
  |======================================================================| 100%
Executing SQL took 0.00914 secs
Running: insert_observation_period.sql
  |======================================================================| 100%
Executing SQL took 0.027 secs
Running: insert_visit_occurrence.sql
  |======================================================================| 100%
Executing SQL took 0.153 secs
Running: insert_condition_occurrence.sql
  |======================================================================| 100%
Executing SQL took 20.2 secs
Running: insert_observation.sql
  |======================================================================| 100%
Executing SQL took 21.7 secs
Running: insert_measurement.sql
  |======================================================================| 100%
Executing SQL took 8.85 mins
Running: insert_procedure_occurrence.sql
  |======================================================================| 100%
Executing SQL took 47.6 secs
Running: insert_drug_exposure.sql
  |======================================================================| 100%
Executing SQL took 1.3 mins
Running: insert_condition_era.sql
  |======================================================================| 100%
Executing SQL took 0.152 secs
Running: insert_drug_era.sql
  |======================================================================| 100%
Executing SQL took 25.2 secs
Warning messages:
1: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
2: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
3: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'synthea_schema' not found in SQL
4: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
5: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'synthea_schema' not found in SQL
6: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'vocab_schema' not found in SQL
7: In SqlRender::render(sqlQuery, cdm_schema = cdmDatabaseSchema, synthea_schema = syntheaDatabaseSchema,  :
  Parameter 'synthea_schema' not found in SQL
>

The text was updated successfully, but these errors were encountered:

turbomam · 2019-03-25T00:24:02Z

closing... I found that running the R scripts after a failed bulk load might have led to duplicate row IDs, mixed up concept realms and concept_id values of 0 in the clinical tables (#17). In the end, I just set the schema search path to include native and cdm_synthea10 and hand-ran the queries in ETL/SQL/*.sql, starting with "creating visit logic tables..." (ETL/SQL/AllVisitTable.sql)

turbomam mentioned this issue Mar 25, 2019

Broken search paths and bad SQL file names in Linux bulk load #17

Closed

turbomam closed this as completed Mar 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duplicated row IDs related to "Parameter 'vocab_schema' not found in SQL"? #18

Duplicated row IDs related to "Parameter 'vocab_schema' not found in SQL"? #18

turbomam commented Mar 24, 2019 •

edited

Loading

turbomam commented Mar 25, 2019 •

edited

Loading

Duplicated row IDs related to "Parameter 'vocab_schema' not found in SQL"? #18

Duplicated row IDs related to "Parameter 'vocab_schema' not found in SQL"? #18

Comments

turbomam commented Mar 24, 2019 • edited Loading

turbomam commented Mar 25, 2019 • edited Loading

turbomam commented Mar 24, 2019 •

edited

Loading

turbomam commented Mar 25, 2019 •

edited

Loading