chore: Create ml_datasets_uscentral1 for penguins table #204
Conversation
cc @ivanmkc
LGTM, but @tswast should probably be final reviewer since he has additional context.
LGTM
Technically, the GCS-to-GCS step is not needed because the samples data bucket already has a copy in gs://cloud-samples-data-us-central1. But since we want this to also be useful as a potential example for future data sources where that is not the case, I think it makes sense to keep it.
@tswast Thanks a lot! I had to add the intermediate GCS-to-GCS step because I started getting this error on the DAG when loading to
But yes, this can serve as a reference pattern for others who encounter the same issue.
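For anyone reusing the pattern discussed above, here is a minimal sketch of the decision it implies: copy the source file into a bucket that matches the BigQuery dataset's location, then load from that copy. This is not the pipeline configuration from this PR. The `Step` type, the `plan_load` function, and the example bucket and file names are all hypothetical, and the sketch assumes the location strings being compared are known up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str          # "gcs_to_gcs" or "gcs_to_bq" (hypothetical labels)
    source: str
    destination: str


def plan_load(source_uri: str, source_location: str, dataset_location: str,
              regional_bucket: str, destination_table: str) -> list:
    """Return the ordered steps needed to load source_uri into destination_table.

    Assumption: a load should read from a bucket in the same location as the
    destination dataset, so a copy step is planned when the locations differ.
    """
    steps = []
    load_uri = source_uri
    if source_location.lower() != dataset_location.lower():
        # Keep the object path, swap the bucket for the regional one.
        object_path = source_uri[len("gs://"):].split("/", 1)[1]
        load_uri = f"gs://{regional_bucket}/{object_path}"
        steps.append(Step("gcs_to_gcs", source_uri, load_uri))
    steps.append(Step("gcs_to_bq", load_uri, destination_table))
    return steps


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    for step in plan_load(
        "gs://example-source-bucket/penguins/data.csv",
        "US",
        "us-central1",
        "example-regional-bucket",
        "ml_datasets_uscentral1.penguins",
    ):
        print(step)
```

When the source bucket is already in the dataset's location, the plan contains only the load step, which matches the reviewer's note that the copy is technically optional in this case.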
Description
Note: This PR is based on a feature branch (#203) that now supports adding alternative BQ datasets.
BQ datasets are location-specific, but we (internally) need an ml_datasets_uscentral1 dataset for upcoming ML guides and tutorials. This PR adds that dataset, which loads the same penguins table under it.
Checklist
datasets/<YOUR-DATASET> and nothing outside of that directory.