Feat: Onboard Mimiciii dataset #449

Closed
wants to merge 6 commits into from

Naveen130
Collaborator

@Naveen130 Naveen130 commented Aug 17, 2022

Description

This PR onboards the mimiciii dataset with 25 pipelines, using Airflow v2 operators only.

Checklist

  • (Required) This pull request is appropriately labeled
  • Please merge this pull request after it's approved

Use the sections below based on what's applicable to your PR and delete the rest:

Feature

  • I'm adding or editing a feature
  • I have updated the README accordingly
  • I have added/revised tests for the feature

Data Onboarding

  • I'm adding or editing a dataset
  • The Google Cloud Datasets team is aware of the proposed dataset
  • I put all my code inside datasets/mimiciii and nothing outside of that directory

Code cleanup or refactoring

  • I'm refactoring or cleaning up some code

@Naveen130
Collaborator Author

This PR makes two changes to the dataset:

  1. Removes mimicIII, which uses an Airflow v1 operator that is no longer accepted
  2. Adds mimiciii, which uses an accepted Airflow v2 operator

@Naveen130 Naveen130 added the "data onboarding" label (Onboard a dataset or submit a pipeline) Aug 17, 2022
@Naveen130 Naveen130 self-assigned this Aug 17, 2022
@adlersantos
Member

Use the BQ data transfer service. There are examples in the repo that you can just reuse:
https://github.com/GoogleCloudPlatform/public-datasets-pipelines/tree/main/datasets/scalable_open_source/pipelines/_images/bq_data_transfer
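For context on the suggestion above, here is a minimal sketch of what a BigQuery Data Transfer dataset-copy configuration could look like. It is not taken from the linked examples. The helper name `build_dataset_copy_config` and the project and dataset names are hypothetical placeholders. The field names follow the `TransferConfig` fields of the BigQuery Data Transfer API as I understand them (`cross_region_copy` being the data source ID for dataset copies), so check them against the repo's own examples before reuse.

```python
from typing import Any, Dict


def build_dataset_copy_config(
    source_project: str,
    source_dataset: str,
    target_dataset: str,
    schedule: str = "every 24 hours",
) -> Dict[str, Any]:
    """Build the fields of a dataset-copy transfer config as a plain dict.

    Hypothetical helper for illustration only. The resulting dict mirrors the
    fields one would set on a BigQuery Data Transfer `TransferConfig`.
    """
    required = {
        "source_project": source_project,
        "source_dataset": source_dataset,
        "target_dataset": target_dataset,
    }
    # Fail early on empty identifiers rather than submitting a bad config.
    for name, value in required.items():
        if not value:
            raise ValueError(f"{name} must be a non-empty string")

    return {
        "display_name": f"Copy {source_project}.{source_dataset} to {target_dataset}",
        # Assumed data source ID for BigQuery dataset copies.
        "data_source_id": "cross_region_copy",
        "destination_dataset_id": target_dataset,
        "schedule": schedule,
        "params": {
            "source_project_id": source_project,
            "source_dataset_id": source_dataset,
        },
    }


# Placeholder names, not the real source project or dataset for this PR.
config = build_dataset_copy_config("my-source-project", "mimiciii_source", "mimic_iii")
print(config["params"])
```

A dict like this would then be handed to whichever transfer client or Airflow operator the pipeline uses. Copying the whole dataset in one transfer would replace the per-table ETL that the 25 pipelines perform.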

@adlersantos
Member

Also, please rename to mimic_iii

Member

@adlersantos adlersantos left a comment

Please use BQ data transfer and rename to mimic_iii

Collaborator

@nlarge-google nlarge-google left a comment

Please remove extra lines

"Fetching the source tables from bq. Each pipeline will be undergoing ETL"
)
source_table_names = fetch_source_tables(source_project, source_dataset)

Collaborator

Please remove extra lines

@Naveen130
Collaborator Author

Closing this pull request as there is a new one, and this one is obsolete.

@Naveen130 Naveen130 closed this Aug 25, 2022
Labels
data onboarding: Onboard a dataset or submit a pipeline
3 participants