feat: Onboard NASA wildfire #275

Merged
merged 14 commits on May 19, 2022

@gkodukula commented Jan 24, 2022

Description

Pipeline: past_week

Checklist

Note: Delete items below that aren't applicable to your pull request.

  • Please merge this PR for me once it is approved.
  • If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.
  • If this PR adds or edits a dataset or pipeline, I put all my code inside datasets/nasa_wildfire and nothing outside of that directory.
  • This PR is appropriately labeled.

download_file(source_url, source_file)

logging.info("Reading file ...")
df = pd.read_csv(str(source_file))

Should this read be chunked? What is the typical number of records and file size?
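For context on the question above, here is a minimal sketch of what a chunked read could look like with pandas. It is not the code from this PR: the function name `process_in_chunks`, the `target_file` parameter, and the default chunk size are assumptions made for illustration. Any per-chunk transformation would go inside the loop.

```python
import pandas as pd


def process_in_chunks(source_file, target_file, chunksize=100_000):
    """Read source_file in chunks and append each chunk to target_file.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    total_rows = 0
    # Passing chunksize makes read_csv return an iterator of DataFrames
    # instead of loading the whole file into memory at once.
    chunks = pd.read_csv(str(source_file), chunksize=chunksize)
    for chunk_number, chunk in enumerate(chunks):
        first = chunk_number == 0
        # Write the header once, then append the remaining chunks.
        chunk.to_csv(
            str(target_file),
            index=False,
            mode="w" if first else "a",
            header=first,
        )
        total_rows += len(chunk)
    return total_rows
```

Whether chunking is worth the extra code depends on the answer to the reviewer's second question: for a source file of modest size, a single `pd.read_csv` call is simpler and likely sufficient.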

TARGET_GCS_BUCKET: "{{ var.value.composer_bucket }}"
TARGET_GCS_PATH: "data/nasa_wildfire/past_week/data_output.csv"
PIPELINE_NAME: "past_week"
CSV_HEADERS: >-

Change this to a multiline value.

@gkodukula

@adlersantos @happyhuman @nlarge-google please review the code again; I have made the changes requested in the review comments.

@nlarge-google left a comment

One change: remove the storage bucket in dataset.yaml.

datasets/nasa_wildfire/pipelines/dataset.yaml (outdated, resolved)

@nlarge-google left a comment

Approved

@adlersantos merged commit f593161 into GoogleCloudPlatform:main on May 19, 2022