Please describe the feature you'd like to see
As of Astro SDK 1.6.0, the load_file operation checks if the schema exists and if it doesn't, it attempts to create it.
Recently a user reported that the cost of checking if the schema exists is very high:
"I have a task that took 1:36 minutes to run, and it was 1:30 running the information schema query"
This was reported for Snowflake, but the same issue can apply to most of the supported Databases.
Describe the solution you'd like
Users should be able to run load_file with a boolean argument schema_exists. For backwards compatibility, the default value should be False. If this argument is True, the Python SDK does not check whether the schema exists and does not attempt to create it.
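A minimal sketch of the requested behavior, following the semantics where `schema_exists=True` means the caller guarantees the schema is already there. The helper `ensure_schema` and its two callables are hypothetical stand-ins for illustration, not the Astro SDK's actual internals:

```python
from typing import Callable


def ensure_schema(
    check_schema: Callable[[], bool],
    create_schema: Callable[[], None],
    schema_exists: bool = False,
) -> None:
    """Hypothetical helper: run the schema check only when it is not skipped.

    check_schema stands in for the (potentially slow) information schema query,
    create_schema for the statement that creates the schema.
    """
    if schema_exists:
        # The caller asserts the schema is present: no query, no creation.
        return
    if not check_schema():
        create_schema()
```

With the default `schema_exists=False`, the check still runs and the schema is created when missing, which matches the 1.6 behavior described above.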
Are there any alternatives to this feature?
Find a more efficient way to check if the schema exists:
"SELECT SCHEMA_NAME from information_schema.schemata WHERE LOWER(SCHEMA_NAME) = %(schema_name)s;"
Have a more generic way of allowing users to disable "optional" queries run by the Astro SDK.
tatiana changed the title from "Allow users to disable schema check & creation on load_file" to "Add schema_exists argument to load_file to disable schema check & creation" on May 5, 2023
tatiana changed the title from "Add schema_exists argument to load_file to disable schema check & creation" to "Allow users to disable schema check & creation on load_file" on May 5, 2023
Support running `load_file` without checking if the table schema exists
or trying to create it.
Recently a user reported that the cost of checking if the schema exists
is very high for Snowflake:
"I have a (`load_file`) task that took 1:36 minutes to run, and it was
1:30 running the information schema query."
This is likely happening for other databases as well.
Introduce two ways of disabling schema checks:
1. On a per-task basis, by exposing the argument `schema_exists` in
`aql.load_file`
When this argument is `True`, the SDK will not check if the schema
exists or try to create it.
It is `False` by default, in which case the Python SDK behaves as it does in 1.6
(running the schema check and, if needed, trying to create the schema).
2. Globally, by exposing the Airflow configuration
`load_table_schema_exists` in the `[astro-sdk]` section. This can also
be set using the environment variable
`AIRFLOW__ASTRO_SDK__LOAD_TABLE_SCHEMA_EXISTS`. The global configuration
can be overridden per task, using [1].
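
The precedence between the two options can be sketched roughly as follows. This is an illustration only: `resolve_schema_exists` is a hypothetical helper, the use of `None` to mean "not set on the task" is an assumption, and the accepted truthy strings are a guess at how the environment variable might be parsed. Only the environment variable name comes from the description above.

```python
import os
from typing import Mapping, Optional

# Name taken from the description above.
ENV_VAR = "AIRFLOW__ASTRO_SDK__LOAD_TABLE_SCHEMA_EXISTS"


def resolve_schema_exists(
    task_value: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: the per-task value wins over the global setting."""
    if task_value is not None:
        # Option 1: explicit per-task argument overrides everything else.
        return task_value
    env = os.environ if env is None else env
    # Option 2: global setting, falling back to False (the 1.6 behavior).
    raw = env.get(ENV_VAR, "false")
    return raw.strip().lower() in {"1", "true", "t", "yes", "y"}
```

For example, `resolve_schema_exists(task_value=False, env={ENV_VAR: "True"})` yields `False`, because the task-level value takes priority over the global one.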
Closes: #1921
Additional context
Follow up with customer on Slack: https://astronomer.slack.com/archives/C04L0HNK9ME/p1683231202383579?thread_ts=1682346906.404539&cid=C04L0HNK9ME
Acceptance Criteria