Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tmp Schemas need to account for Location/Region when orchestrating BigQuery #1369

Closed
ReadytoRocc opened this issue Dec 6, 2022 · 0 comments · Fixed by #1416
Closed

Tmp Schemas need to account for Location/Region when orchestrating BigQuery #1369

ReadytoRocc opened this issue Dec 6, 2022 · 0 comments · Fixed by #1416
Assignees
Labels
priority/high High priority
Milestone

Comments

@ReadytoRocc
Copy link

Describe the bug
I am unable to run queries using the astro.sql.transform decorator when they do not exist in the region where tmp_astro is created. In the below example error, I am running a query on a table in us-central1, when tmp_astro exists in europe-west2:

sqlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 404 Not found: Dataset PROJECT_ID:tmp_astro was not found in location us-central1

Version

  • Astro: 1.3.0
  • OS: Debian

To Reproduce
Steps to reproduce the behavior:

  • Run the below DAG on using a TABLE_PATH in one region.
  • Then rerun it again using a different table path in another region.
"""
gcp_bq
DAG auto-generated by Astro Build.
"""

from airflow.decorators import dag
from astro import sql as aql
from astro.sql.table import Table
import pendulum


@aql.transform(conn_id="bq_fe_sa", task_id="cell_2")
def cell_2_func():
    return """SELECT * FROM TABLE_PATH LIMIT 10;"""

@dag(
    schedule_interval="0 0 * * *",
    start_date=pendulum.from_format("2022-12-02", "YYYY-MM-DD").in_tz("America/Phoenix"),
)
def gcp_bq():
    cell_2 = cell_2_func()

dag_obj = gcp_bq()

Expected behavior
I would expect the Astro SDK to make a tmp_astro dataset per region, and ensure that it is using/refencing the appropriate tmp schema based on the source table in the query.

Additional context
The BigQueryHook supports a location parameter, allowing you to specify the location/region for your query. I don't believe this can be set via a Connection parameter.

@sunank200 sunank200 added this to the 1.3.1 milestone Dec 6, 2022
@sunank200 sunank200 self-assigned this Dec 6, 2022
@phanikumv phanikumv added the priority/high High priority label Dec 7, 2022
@phanikumv phanikumv assigned utkarsharma2 and unassigned sunank200 Dec 8, 2022
utkarsharma2 added a commit that referenced this issue Dec 13, 2022
# Description
## What is the current behavior?
Currently, we are assuming all the bigquery datasets to be in the
default region - `US`
If the user tries to access a preexisting table in a different region
say, `europe-west6` all the Temp tables will try to be created in `US`
the region, which will result in the error.


closes: #1369

## What is the new behavior?
1. The solution is to create `tmp_astro` region specific. But we cannot
do that since `<project_id>.<dataset_id>` is unique. So we are creating
a unique dataset id like - `<dataset_id>__<region_id>`, this region_id.
2.  Added a default setting to select the Bigquery default region.
3.  Added `location` attribute to Metadata class. 

## Does this introduce a breaking change?
Nope

Co-authored-by: Phani Kumar <94376113+phanikumv@users.noreply.github.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
kaxil pushed a commit that referenced this issue Dec 13, 2022
Currently, we are assuming all the bigquery datasets to be in the
default region - `US`
If the user tries to access a preexisting table in a different region
say, `europe-west6` all the Temp tables will try to be created in `US`
the region, which will result in the error.

closes: #1369

1. The solution is to create `tmp_astro` region specific. But we cannot
do that since `<project_id>.<dataset_id>` is unique. So we are creating
a unique dataset id like - `<dataset_id>__<region_id>`, this region_id.
2.  Added a default setting to select the Bigquery default region.
3.  Added `location` attribute to Metadata class.

Nope

Co-authored-by: Phani Kumar <94376113+phanikumv@users.noreply.github.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
(cherry picked from commit 730d273)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/high High priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants