SHC_Core_2023 + LPCH_Core_2023 data updates #399
Just finished the first renaming. Here's the code I used for generating the SQL commands:

```
!bq ls shc_core_2023 | grep '^ shc_' > tables_to_rename.txt
prefix = 'shc_'
with open('tables_to_rename.txt', 'r') as f:
```
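The snippet above is truncated, but the idea is to list the prefixed tables and emit one rename statement per table. A minimal sketch of that generator (the function name `generate_rename_sql` and the `ALTER TABLE ... RENAME TO` output format are my assumptions, not taken from the original snippet):

```python
def generate_rename_sql(tables, prefix='shc_', dataset='shc_core_2023'):
    """Emit BigQuery ALTER TABLE ... RENAME TO statements that strip a prefix.

    Hypothetical sketch: `tables` stands in for the lines read from
    tables_to_rename.txt in the original (truncated) snippet.
    """
    statements = []
    for table in tables:
        table = table.strip()
        if table.startswith(prefix):
            new_name = table[len(prefix):]
            statements.append(
                f'ALTER TABLE `{dataset}.{table}` RENAME TO `{new_name}`;')
    return statements

if __name__ == '__main__':
    for stmt in generate_rename_sql(['shc_encounter', 'shc_demographic']):
        print(stmt)
```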
The second renaming is finished. The code snippet used for this step has been added as Item 2 in https://github.com/HealthRex/CDSS/blob/master/setup/BigQueryDataUpdateGuide.MD
Great. Rather than put all the code in the issue, keep it in the actual code base and just link to the files from here.
That will help us find and collate everything when we need to do this again in the next iteration.
Thanks,
Jonathan
From: Fatemeh Amrollahi
Sent: Friday, February 9, 2024 7:55 PM
Subject: Re: [HealthRex/CDSS] SHC_Core_2023 + LPCH_Core_2023 data updates (Issue #399)
The second renaming is finished. Here is the code snippet that I used.
```python
from google.cloud import bigquery
from google.cloud.bigquery import dbapi
import subprocess
import json


def All_TimeStamp_Columns(table_name, dataset_name='shc_core_2023', project_id='som-nero-phi-jonc101'):
    # Extract all columns of the table that have a DATETIME type
    command = f'bq show --schema --format=prettyjson {project_id}:{dataset_name}.{table_name}'
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    # Command was successful
    if process.returncode == 0:
        datetimescols = []
        schema = json.loads(stdout)
        for field in schema:
            if field['type'] == 'DATETIME':
                datetimescols.append(field['name'])
        return datetimescols
    else:
        print(f'Error running bq command: {stderr.decode("utf-8")}')
        return None


def All_Available_tables(dataset_name='shc_core_2023', project_id='som-nero-phi-jonc101'):
    # Get all the table names available in the dataset
    command = f'bq ls --format=json {project_id}:{dataset_name}'
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    # Command was successful
    if process.returncode == 0:
        table_names = []
        schema = json.loads(stdout)
        for field in schema:
            if field['type'] == 'TABLE':
                table_names.append(field['tableReference']['tableId'])
        return table_names
    else:
        print(f'Error running bq command: {stderr.decode("utf-8")}')
        return None


def Time_to_UTC(columns, table_name, dataset_name='shc_core_2023', project_id='som-nero-phi-jonc101'):
    def Auth(project_id='som-nero-phi-jonc101'):
        client = bigquery.Client(project_id)
        return client

    client = Auth(project_id)
    conn = dbapi.connect(client)
    cursor = conn.cursor()
    new_table_name = table_name
    query = f'CREATE OR REPLACE TABLE `{project_id}.{dataset_name}.{new_table_name}` AS SELECT * '
    # Add a UTC column alongside each DATETIME column
    for col in columns:
        new_col = col + '_utc'
        query += f",\n timestamp(FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', timestamp({col})), 'America/Los_Angeles') as {new_col}"
    query += f'\n from `{project_id}.{dataset_name}.{table_name}`;'
    # DDL statement: replaces the table in place, returns no rows
    cursor.execute(query)
    cursor.close()
    conn.close()


if __name__ == '__main__':
    project_id = 'som-nero-phi-jonc101'
    dataset_name = 'shc_core_2023'
    table_names = All_Available_tables(dataset_name, project_id)
    for table in table_names:
        datetimecolumns = All_TimeStamp_Columns(table_name=table, dataset_name=dataset_name, project_id=project_id)
        if datetimecolumns:
            Time_to_UTC(columns=datetimecolumns, table_name=table, dataset_name=dataset_name, project_id=project_id)
```
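The `FORMAT_TIMESTAMP`/`timestamp(...)` pattern in the generated SQL reinterprets each naive local DATETIME as an America/Los_Angeles instant, which BigQuery then stores as UTC. The same conversion can be sketched in plain Python with the standard-library `zoneinfo` (illustration only, not part of the pipeline above):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def local_to_utc(dt_naive, tz='America/Los_Angeles'):
    # Attach the local timezone to the naive datetime, then convert to UTC
    return dt_naive.replace(tzinfo=ZoneInfo(tz)).astimezone(ZoneInfo('UTC'))

# July is PDT (UTC-7), so noon local becomes 19:00 UTC
print(local_to_utc(datetime(2023, 7, 1, 12, 0)))  # 2023-07-01 19:00:00+00:00
```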
[ ] Separate out lpch_core_2023: move these tables out to a separate "lpch_core_2023" dataset/database to match. Migrate over this data from Research IT to our secure compute databases.
[ ] Rename the datasets/tables to remove the shc_ and lpch_ prefixes, so that the naming convention matches prior years (instead just store them in separate shc_core_2023 and lpch_core_2023 databases/datasets)
[ ] Add UTC version of all datetimes
[ ] Extract numerical values from flowsheets
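For the last item, one plausible approach (purely illustrative; the thread does not specify how extraction will be done) is a small regex helper that pulls the first numeric token out of a free-text flowsheet value:

```python
import re

def extract_numeric(value):
    """Pull the first numeric token out of a free-text flowsheet value.

    Hypothetical helper: flowsheet values often mix numbers with units or
    annotations (e.g. '98.6 F', 'BP 120/80'), so this recovers only the
    first number and returns None when no number is present.
    """
    match = re.search(r'-?\d+(?:\.\d+)?', str(value))
    return float(match.group()) if match else None

print(extract_numeric('98.6 F'))    # 98.6
print(extract_numeric('BP 120/80')) # 120.0
print(extract_numeric('refused'))   # None
```

A real pipeline would likely push this into SQL (e.g. `REGEXP_EXTRACT` plus `SAFE_CAST` in BigQuery) rather than row-by-row Python, but the parsing rule is the same.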