Feat: Onboard EPA Historical Air Quality dataset #301

Merged Apr 8, 2022 (68 commits)
Changes from 63 commits
b717e9d
feat: Added annual_summaries, tested locally. Errors in AF
nlarge-google Oct 27, 2021
bb846aa
feat: Added co_daily_summaries. Not ready for production
nlarge-google Oct 27, 2021
953d1cd
feat: Added co_hourly_summary. Not ready for production.
nlarge-google Oct 27, 2021
e3966b4
fix: Changed dataset name
nlarge-google Oct 27, 2021
71a9d26
fix: Attempt to resolve AF load_to_bq errors
nlarge-google Oct 27, 2021
5efa0b5
fix: Resolves issues with AF failure to execute DAG, also, some datat…
nlarge-google Oct 28, 2021
7a64407
feat: Added HAP Hourly Summary. Fixed schema issues in HAP Daily Sum…
nlarge-google Oct 28, 2021
27d962e
fix: datatype fixes
nlarge-google Oct 28, 2021
38d4e6a
feat: Added no2 daily and hourly.
nlarge-google Oct 28, 2021
ac99d56
feat: Added NONOxNOy pipelines daily and hourly
nlarge-google Oct 28, 2021
f4c09c8
feat: Added multiple pipelines and assigned destination table to a va…
nlarge-google Oct 29, 2021
9625f2d
fix: Added terraform files for new pipelines.
nlarge-google Oct 29, 2021
c2fb025
fix: Regenerated some dags.
nlarge-google Oct 29, 2021
9535847
fix: Resolved variable issues in pipeline.yaml files so that they ope…
nlarge-google Nov 1, 2021
54c3d7f
fix: clean-up code
nlarge-google Nov 1, 2021
7d93681
fix: Reduced CHUNKSIZE in order to prevent memory outage in AF, preve…
nlarge-google Nov 1, 2021
7d0f21e
fix: Resolved incorrect path entry in ozone daily summary pipeline.ya…
nlarge-google Nov 1, 2021
3869b2d
fix: Requested changes as per PR code review
nlarge-google Nov 3, 2021
9caa687
fix: Resolved black hook issue
nlarge-google Nov 3, 2021
642ab67
fix: Reduced resources used in both lead daily summary and pressure d…
nlarge-google Nov 3, 2021
4636273
fix: Tiered start time for each DAG by converting start time to chron…
nlarge-google Nov 3, 2021
6a5b3f1
fix: Resolved invalid folder path in pipeline.yaml
nlarge-google Nov 4, 2021
cdd1ca7
Merge remote-tracking branch 'upstream/main' into EPA
nlarge-google Nov 4, 2021
9d7c753
fix: Removed out of date terraform file
nlarge-google Nov 4, 2021
fcb1503
fix: attempting to resolve code check issues.
nlarge-google Nov 4, 2021
9e73eee
Merge remote-tracking branch 'upstream/main' into EPA
nlarge-google Nov 4, 2021
34d67fa
fix: Missed one change specified in code review.
nlarge-google Nov 4, 2021
5bf3062
fix: Increase memory and CPU for pressure_daily_summary and reduced b…
nlarge-google Nov 29, 2021
ba4bff6
fix: Accidental check-in resolved
nlarge-google Dec 8, 2021
ebd268f
Fix: Modified to run in AF 2 environment. Fixed issue with source fi…
nlarge-google Feb 4, 2022
a21e94d
fix: checkin. incomplete
nlarge-google Feb 14, 2022
f2acf41
fix: modified to new structure and added schema files
nlarge-google Feb 18, 2022
d7f7f87
feat: Added schema JSON files
nlarge-google Feb 22, 2022
fe1c15e
fix: need to push to git for adlers debug
nlarge-google Feb 22, 2022
f29a843
Merge branch 'GoogleCloudPlatform:main' into main
nlarge-google Feb 23, 2022
26e2473
Merge remote-tracking branch 'upstream/main' into EPA
nlarge-google Feb 23, 2022
3194464
fix: added schema files
nlarge-google Feb 23, 2022
80b09f7
fix: Submitting changes to structure while trying to test new version…
nlarge-google Feb 23, 2022
8e6451f
fix: Annual summaries now works with new structure
nlarge-google Feb 24, 2022
81d19e8
fix: working on cluster changes in each pipeline
nlarge-google Feb 24, 2022
ba95f41
fix: (1) changed pipeline.yaml for all pipelines to use clusters (2) …
nlarge-google Feb 25, 2022
0b06fe1
fix: resolved flake issues
nlarge-google Feb 25, 2022
1697757
fix: resolved cluster reference in pipeline.yaml for co_hourly
nlarge-google Feb 25, 2022
4eabd38
fix: Added WRITE_TRUNCATE to the load to bq function. Fixed variable…
nlarge-google Feb 28, 2022
dc1a3af
fix: fixes for pipeline yaml in no2 hourly and nonoxnoy daily
nlarge-google Feb 28, 2022
f717622
fix: Re-engineered code to process each file individually instead of…
nlarge-google Mar 1, 2022
1ff2ac6
fix: misc fixes to csv transform, plus, changed pipeline yaml files t…
nlarge-google Mar 1, 2022
53602f6
fix: checks table has data before loading. If table has data for the…
nlarge-google Mar 2, 2022
8c3f55d
fix: resolved typo in pipeline.yaml.
nlarge-google Mar 2, 2022
c0e0eae
fix: Updated upload function to include file path of source file
nlarge-google Mar 3, 2022
8594162
Merge branch 'GoogleCloudPlatform:main' into main
nlarge-google Mar 11, 2022
d7ac308
Merge branch 'GoogleCloudPlatform:main' into main
nlarge-google Mar 18, 2022
2f526b6
fix: Updated code
nlarge-google Mar 25, 2022
b0ca51e
fix: Integrated all pipelines into new pipeline
nlarge-google Mar 28, 2022
275ea09
fix: Misc fixes also removed individual pipelines from project.
nlarge-google Mar 29, 2022
e335fb3
Merge branch 'GoogleCloudPlatform:main' into main
nlarge-google Mar 29, 2022
94b7cfa
Merge branch 'GoogleCloudPlatform:main' into main
nlarge-google Mar 30, 2022
fe3287e
Merge remote-tracking branch 'origin/main' into EPA
nlarge-google Mar 30, 2022
8b25b3d
Merge remote-tracking branch 'upstream/main' into EPA
nlarge-google Mar 30, 2022
45eeddc
Merge branch 'main' into EPA
nlarge-google Mar 30, 2022
48c7af9
fix: black issue
nlarge-google Mar 30, 2022
8c8da9f
fix: clean-up.
nlarge-google Mar 30, 2022
72bf125
fix: black issue
nlarge-google Mar 30, 2022
d228a0c
fix: added table ids to pipeline.yaml as per code review.
nlarge-google Apr 6, 2022
b6d66ab
fix: Resolved isort issue
nlarge-google Apr 6, 2022
c37d071
fix: misc
nlarge-google Apr 6, 2022
dfc9ba9
fix: Resolved isort issue
nlarge-google Apr 6, 2022
f856cde
fix: changes as-per code review
nlarge-google Apr 8, 2022
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_annual_summaries" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "annual_summaries"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "annual_summaries"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_co_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "co_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "co_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_co_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "co_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "co_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_hap_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "hap_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "hap_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_hap_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "hap_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "hap_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_lead_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "lead_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "lead_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_no2_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "no2_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "no2_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_no2_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "no2_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "no2_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_nonoxnoy_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "nonoxnoy_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "nonoxnoy_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_nonoxnoy_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "nonoxnoy_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "nonoxnoy_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_ozone_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "ozone_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "ozone_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_ozone_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "ozone_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "ozone_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pm10_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pm10_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pm10_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pm10_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pm10_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pm10_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pm25_frm_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pm25_frm_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pm25_frm_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pm25_nonfrm_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pm25_nonfrm_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pm25_nonfrm_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pm25_nonfrm_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pm25_nonfrm_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pm25_nonfrm_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pm25_speciation_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pm25_speciation_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pm25_speciation_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pm25_speciation_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pm25_speciation_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pm25_speciation_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pressure_daily_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pressure_daily_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pressure_daily_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
@@ -16,15 +16,10 @@


 resource "google_bigquery_table" "epa_historical_air_quality_pressure_hourly_summary" {
-  project    = var.project_id
-  dataset_id = "epa_historical_air_quality"
-  table_id   = "pressure_hourly_summary"
-
+  project     = var.project_id
+  dataset_id  = "epa_historical_air_quality"
+  table_id    = "pressure_hourly_summary"
   description = "epaspc"
-
-
-
-
   depends_on = [
     google_bigquery_dataset.epa_historical_air_quality
   ]
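Every hunk above applies the same change to one table's resource block: the `project`, `dataset_id` and `table_id` lines are rewritten next to `description`, and the stray blank lines are dropped. As a rough illustration only, the resulting layout could be rendered per table along these lines. The function name `render_table_resource` is hypothetical and not part of this PR, the column alignment of `=` is an assumption about how the formatter lays the block out, and the closing brace is assumed because the hunks are cut off before it.

```python
def render_table_resource(dataset_id: str, table_id: str, description: str) -> str:
    """Render one google_bigquery_table block in the post-change layout (sketch)."""
    # Attribute order follows the hunks above.
    attrs = {
        "project": "var.project_id",
        "dataset_id": f'"{dataset_id}"',
        "table_id": f'"{table_id}"',
        "description": f'"{description}"',
    }
    # Assumption: '=' signs are aligned on the longest attribute name.
    width = max(len(name) for name in attrs)
    lines = [f'resource "google_bigquery_table" "{dataset_id}_{table_id}" {{']
    lines += [f"  {name.ljust(width)} = {value}" for name, value in attrs.items()]
    lines += [
        "  depends_on = [",
        f"    google_bigquery_dataset.{dataset_id}",
        "  ]",
        "}",  # assumed: the hunks above end before the closing brace
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Table IDs taken from the first three hunks above.
    for table in ("annual_summaries", "co_daily_summary", "co_hourly_summary"):
        print(render_table_resource("epa_historical_air_quality", table, "epaspc"))
        print()
```

Calling the function once per table ID reproduces the repeated pattern in the hunks without copying the block by hand; it is not a claim about how the repository's own generator is implemented.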