Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform op - database is not picked from metadata #1034

Closed
utkarsharma2 opened this issue Oct 10, 2022 · 2 comments · Fixed by #1117
Closed

Transform op - database is not picked from metadata #1034

utkarsharma2 opened this issue Oct 10, 2022 · 2 comments · Fixed by #1117
Assignees
Labels
bug Something isn't working priority/critical Critical priority product/python-sdk Label describing products
Milestone

Comments

@utkarsharma2
Copy link
Collaborator

Describe the bug

with DAG(dag_id="test", schedule=None, start_date=datetime(1970, 1, 1)) as dag:
    @aql.transform
    def select(input_table: Table):
        return "SELECT * FROM my_db.my_schema.my_table LIMIT 4;"  # {{ input_table }} was also failing
    select(
        input_table=Table(metadata={"schema": "my_schema", "database": "db"}, name="my_table", conn_id="my_conn_id"),
        output_table=Table(name="fritz_test_1234")
    )

snowflake.connector.errors.ProgrammingError: 090105 (22000): 01a77af0-0605-96b0-0000-682112f6e8ae: Cannot perform SELECT. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.
ref: https://astronomer.slack.com/archives/C02B8SPT93K/p1665414029917169

Version

  • Astro: 1.1.1

To Reproduce
Try the above code.

Expected behavior
If the DB is not specified in the connection we should be able to pick from metadata.

@utkarsharma2 utkarsharma2 changed the title Database is not picked from metadata Transform op - database is not picked from metadata Oct 10, 2022
@kaxil kaxil added bug Something isn't working product/python-sdk Label describing products labels Oct 11, 2022
@pankajastro pankajastro self-assigned this Oct 18, 2022
@pankajastro
Copy link
Contributor

This looks like a problem when using snowflake. I quickly tested the bigquery worked fine

@phanikumv phanikumv added this to the 1.2.1 milestone Oct 20, 2022
@phanikumv phanikumv added the priority/critical Critical priority label Oct 20, 2022
@pankajastro
Copy link
Contributor

pankajastro commented Oct 20, 2022

For the hook creation in astro-sdk we tightly depends on the airflow connection
for example

 @property
    def hook(self) -> BigQueryHook:
        """Retrieve Airflow hook to interface with the BigQuery database."""
        return BigQueryHook(gcp_conn_id=self.conn_id, use_legacy_sql=False)

Snowflake hook expects the database, schema, etc., so the ideal place to set these values is a connection only. Even in our case, we do

 @property
    def hook(self) -> SnowflakeHook:
        """Retrieve Airflow hook to interface with the Snowflake database."""
        return SnowflakeHook(snowflake_conn_id=self.conn_id)

If a user is setting up the schema, and database in the input and output table in the transform operator which value should we use from input_table or output_table or will we create multiple hooks? it looks a little bit complicated to me in the current design.

For now, I have modified the create_database method to accept conn_id as well as a table and I'm passing the output table from the base database class and setting the schema and database in the snowflake hook if the table metadata has these values.
I'll fix the CI if people agree with the approach. PR #1117

pankajastro added a commit that referenced this issue Nov 4, 2022
# Description
closes: #1034 
Add table param in ```create_database``` function so that can access and
use table metadata while creating a hook

## What is the current behavior?
Currently, we don't use Table metadata param schema and database while
creating so if these values is not set in the airflow connection we are
getting an error in the case of snowflake


## What is the new behavior?
change 

```
def create_database(conn_id: str) -> BaseDatabase:
    ...
```
to
```
def create_database(
    conn_id: str,
    table: object | None = None,
) -> BaseDatabase: 
    ...
```
and use table metadata if available while creating snowflake hook


## Does this introduce a breaking change?
No

### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
utkarsharma2 pushed a commit that referenced this issue Nov 4, 2022
# Description
closes: #1034 
Add table param in ```create_database``` function so that can access and
use table metadata while creating a hook

## What is the current behavior?
Currently, we don't use Table metadata param schema and database while
creating so if these values is not set in the airflow connection we are
getting an error in the case of snowflake


## What is the new behavior?
change 

```
def create_database(conn_id: str) -> BaseDatabase:
    ...
```
to
```
def create_database(
    conn_id: str,
    table: object | None = None,
) -> BaseDatabase: 
    ...
```
and use table metadata if available while creating snowflake hook


## Does this introduce a breaking change?
No

### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority/critical Critical priority product/python-sdk Label describing products
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants