Feat: Onboard Human Variant Annotation dataset #438

Merged

vijay-google (Collaborator) commented Aug 9, 2022

Description

Dataset: human_variant_annotation, pipeline: clinvar
Dataset: human_variant_annotation, pipeline: db_snp

Checklist

  • (Required) This pull request is appropriately labeled
  • Please merge this pull request after it's approved
  • I'm adding or editing a dataset
  • The Google Cloud Datasets team is aware of the proposed dataset
  • I put all my code inside datasets/human_variant_annotation and nothing outside of that directory

vijay-google self-assigned this Aug 9, 2022
vijay-google added the data onboarding (Onboard a dataset or submit a pipeline) label Aug 9, 2022
vijay-google changed the title from "Human variant annotation" to "Feat: Onboard Human Variant Annotation dataset" Aug 9, 2022
nlarge-google (Collaborator) left a comment

Minor changes. Please make the changes and retest.

gcs_bucket: str,
target_gcs_folder: str,
pipeline: str,
):
Collaborator

Always include a return type annotation. For a function that returns nothing, use:

def fn(...) -> None:
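A minimal sketch of the suggestion applied to the first signature above. The function name transfer_pipeline_files and its logging body are hypothetical (the real function name and body are not shown in the excerpt); only the three parameters come from the diff. The -> None annotation documents that the function is called for its side effects, and it is visible both to type checkers and at runtime:

```python
import logging
import typing


def transfer_pipeline_files(
    gcs_bucket: str,
    target_gcs_folder: str,
    pipeline: str,
) -> None:
    # Hypothetical body: the real function's work is not shown in the PR excerpt.
    logging.info(
        "Transferring %s files to gs://%s/%s", pipeline, gcs_bucket, target_gcs_folder
    )


# The annotation is stored on the function object and read by type checkers.
# typing.get_type_hints() reports a bare None annotation as type(None).
print(typing.get_type_hints(transfer_pipeline_files)["return"])  # <class 'NoneType'>
```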

Collaborator Author

Ok done

source_url = base_url + f"archive_{version}/{date_time.strftime('%Y')}/{file_name}"
source_file = f"./files/{folder}/{file_name}"
status_code = download_gzfile(source_url, source_file)
if status_code == 200:
Collaborator

else:
    pass
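A minimal sketch of the download step with an explicit else branch, assuming download_gzfile takes (url, local_path) and returns an HTTP status code, as the call in the excerpt suggests. To keep the sketch self-contained, the downloader is passed in as a hypothetical download parameter, and the helper names build_source_url and fetch_archive_file are illustrative, not taken from the PR:

```python
import datetime
import logging
from typing import Callable


def build_source_url(
    base_url: str, version: str, date_time: datetime.datetime, file_name: str
) -> str:
    # Same layout as the excerpt: <base_url>archive_<version>/<YYYY>/<file_name>
    return base_url + f"archive_{version}/{date_time.strftime('%Y')}/{file_name}"


def fetch_archive_file(
    base_url: str,
    version: str,
    date_time: datetime.datetime,
    file_name: str,
    folder: str,
    download: Callable[[str, str], int],
) -> bool:
    """Return True if the file was downloaded (HTTP 200), False otherwise."""
    source_url = build_source_url(base_url, version, date_time, file_name)
    source_file = f"./files/{folder}/{file_name}"
    # `download` stands in for download_gzfile: (url, local_path) -> status code.
    status_code = download(source_url, source_file)
    if status_code == 200:
        return True
    else:
        # Explicit branch for non-200 responses, e.g. a file missing from the archive.
        logging.warning("Skipping %s: HTTP %s", source_url, status_code)
        return False
```

Passing the downloader in also makes the non-200 path easy to exercise in a test without network access.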

Collaborator Author

ok done

folder: pathlib.Path,
gcs_bucket: str,
target_gcs_folder: str,
):
Collaborator

Add a return type of None here as well.
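A sketch of the second signature with the requested -> None annotation, assuming the function uploads each regular file in folder to gs://<gcs_bucket>/<target_gcs_folder>/. The name upload_folder and the injected upload callable are hypothetical; the real upload logic is not shown in the PR excerpt:

```python
import pathlib
from typing import Callable

# Hypothetical upload step: (local_path, bucket_name, blob_name) -> None.
Uploader = Callable[[pathlib.Path, str, str], None]


def upload_folder(
    folder: pathlib.Path,
    gcs_bucket: str,
    target_gcs_folder: str,
    upload: Uploader,
) -> None:
    """Upload every regular file directly inside `folder` under target_gcs_folder."""
    prefix = target_gcs_folder.strip("/")
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        blob_name = f"{prefix}/{path.name}" if prefix else path.name
        upload(path, gcs_bucket, blob_name)
```

In real code, upload could wrap the google-cloud-storage client, for example storage.Client().bucket(gcs_bucket).blob(blob_name).upload_from_filename(str(path)).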

Collaborator Author

ok done

nlarge-google (Collaborator) left a comment

Great job. In the future, though, please reduce the number of blank lines between code blocks to 0. Thanks!

nlarge-google merged commit ebfe4de into GoogleCloudPlatform:main Aug 25, 2022
2 participants