SHC_Core_2023 + LPCH_Core_2023 data updates #399

Closed
jonc101 opened this issue Feb 8, 2024 · 4 comments

Comments

@jonc101
Collaborator

jonc101 commented Feb 8, 2024

Migrate over this data from Research IT to our secure compute databases.
[ ] Rename the datasets/tables to remove the shc_ and lpch_ prefixes, so that the naming convention matches prior years (instead just store them in separate shc_core_2023 and lpch_core_2023 databases/datasets)
[ ] Add UTC version of all datetimes
[ ] Extract numerical values from flowsheets
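
The last two items could be sketched roughly as follows. This is only a minimal sketch under assumptions not stated in this issue: that the source datetimes are naive local Pacific time (`America/Los_Angeles`), that UTC columns get a `_utc` suffix, and that flowsheet values arrive as free-text strings. The helper names `utc_column_sql` and `extract_numeric` and the column name in the usage lines are hypothetical.

```python
import re

# Assumption: source DATETIME columns hold naive local Pacific time.
LOCAL_TZ = "America/Los_Angeles"


def utc_column_sql(column, local_tz=LOCAL_TZ):
    """Build a BigQuery SELECT expression that adds a UTC TIMESTAMP copy of `column`."""
    # TIMESTAMP(datetime, tz) interprets the naive DATETIME in the given zone.
    return f"TIMESTAMP({column}, '{local_tz}') AS {column}_utc"


# Matches an optionally signed integer or decimal, e.g. "98.6", "-3", ".5".
_NUMBER = re.compile(r"[-+]?(?:\d+(?:\.\d+)?|\.\d+)")


def extract_numeric(value):
    """Return the first number found in a free-text value as a float, or None."""
    if value is None:
        return None
    match = _NUMBER.search(value)
    return float(match.group()) if match else None


# Hypothetical column name, for illustration only.
print(utc_column_sql("recorded_time"))
print(extract_numeric("98.6 F"))
```

Note that `extract_numeric` keeps only the first number, so a value such as "120/80" yields 120.0; compound readings would need their own handling.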

jonc101 added this to To Do in DevOps via automation Feb 8, 2024
@jyx-su

jyx-su commented Feb 9, 2024

Just finished the first renaming. Here's the code I used to generate the SQL commands:

`!bq ls shc_core_2023 | grep '^ shc_' > tables_to_rename.txt

prefix = 'shc_'

with open('tables_to_rename.txt', 'r') as f:
    for line in f:
        # The table name is the first space-separated field of each bq ls row
        table_name = line.strip().split(' ')[0].strip()
        # Strip the prefix to get the new table name
        print(f'ALTER TABLE shc_core_2023.{table_name} RENAME TO {table_name[len(prefix):]};')`

@fatemeh91
Contributor

fatemeh91 commented Feb 10, 2024

The second renaming is finished. The code snippet used for it has been added as Item 2 in the BigQuery data update guide:

https://github.com/HealthRex/CDSS/blob/master/setup/BigQueryDataUpdateGuide.MD

@jonc101
Collaborator Author

jonc101 commented Feb 10, 2024 via email

@jonc101
Collaborator Author

jonc101 commented May 28, 2024

[ ] Separate out lpch_core_2023
Looks like a bunch of "lpch_" tables are currently under the shc_core_2023 dataset/database.
These are the Lucile Packard Children's Hospital data, and should be separate (just like the other tables were labeled as shc_* tables before).

Move these out to a separate "lpch_core_2023" dataset/database to match that convention.
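
One possible way to do the move, as a minimal sketch: since `ALTER TABLE ... RENAME TO` renames a table within its dataset, the statements generated below copy each table into the new dataset and then drop the original. It assumes the `lpch_core_2023` dataset already exists and that the `lpch_` prefix should be stripped in the destination, as in the earlier renaming. The function name `move_statements` and the example table names are hypothetical.

```python
def move_statements(table_names, src="shc_core_2023",
                    dst="lpch_core_2023", prefix="lpch_"):
    """Yield SQL that copies prefixed tables from src to dst, then drops the source."""
    for name in table_names:
        if not name.startswith(prefix):
            continue  # leave tables without the prefix where they are
        new_name = name[len(prefix):]
        yield f"CREATE TABLE {dst}.{new_name} COPY {src}.{name};"
        yield f"DROP TABLE {src}.{name};"


# Hypothetical table names, for illustration only.
for stmt in move_statements(["lpch_demographic", "shc_encounter"]):
    print(stmt)
```

It is probably safer to run the `CREATE TABLE ... COPY` statements first and check the copies before running any of the `DROP TABLE` statements.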

DevOps automation moved this from To Do to Done May 28, 2024