
1.0.0b1

@sunank200 released this 27 Jul 10:48
3cd86da

Feature:

  • Improved the performance of aql.load_file by supporting database-specific (native) load methods. This is now the default behaviour. Previously, the Astro SDK Python always used Pandas to load files into SQL databases, which passed the data through the Airflow worker node and slowed down the load. #557, #481

    Introduced new arguments to aql.load_file:

    • use_native_support: use the destination's native data transfer method, if one is available (defaults to use_native_support=True).
    • native_support_kwargs: keyword arguments passed to the method used in the native support flow.
    • enable_native_fallback: fall back to the default (Pandas) transfer if the native transfer fails (defaults to enable_native_fallback=True).

    There are now three modes:

    • Native: the default. Uses a BigQuery Load Job for BigQuery, and Snowflake COPY INTO with an external stage for Snowflake.
    • Pandas: This is how datasets were previously loaded. To enable this mode, use the argument use_native_support=False in aql.load_file.
    • Hybrid: attempts to load the file into the database with the native strategy and, if that fails, falls back to Pandas, logging relevant warnings.
  • Allow users to specify the table schema (column types) into which a file is loaded by setting table.columns. If this attribute is not set, the Astro SDK still tries to infer the schema using Pandas (the previous behaviour). #532

  • Implemented a fallback mechanism: if native support fails, the load falls back to the default option and logs a warning describing the native support problem. #557

  • Added an example DAG for Dynamic Task Mapping with the Astro SDK (Airflow 2.3.0+). #377
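The way use_native_support and enable_native_fallback combine into the Native, Pandas, and Hybrid modes can be summarized as a small decision function. The sketch below is a plain-Python model of that behaviour, not the SDK's implementation: `load_file_sketch`, `native_load`, and `pandas_load` are hypothetical stand-ins for the SDK's internal native transfer (e.g. a BigQuery Load Job or Snowflake COPY INTO) and its Pandas transfer, and passing `native_load=None` is an assumed way of representing a destination without native support.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("load_file_sketch")


def load_file_sketch(
    native_load: Optional[Callable[[], str]],
    pandas_load: Callable[[], str],
    use_native_support: bool = True,
    enable_native_fallback: bool = True,
) -> str:
    """Hypothetical model of aql.load_file's transfer modes.

    Returns whatever the transfer path that actually ran returns.
    native_load is None when the destination has no native load method.
    """
    # Pandas mode: native support disabled, or unavailable for this destination.
    if not use_native_support or native_load is None:
        return pandas_load()
    try:
        # Native mode: database-specific bulk load.
        return native_load()
    except Exception as exc:
        if not enable_native_fallback:
            raise  # no fallback requested: surface the native error unchanged
        # Hybrid behaviour: log a warning, then retry through Pandas.
        logger.warning("Native load failed (%s); falling back to Pandas.", exc)
        return pandas_load()


def failing_native() -> str:
    # Hypothetical failure, e.g. a missing prerequisite for the native path.
    raise RuntimeError("native prerequisite missing")


print(load_file_sketch(lambda: "native", lambda: "pandas"))  # native
print(load_file_sketch(failing_native, lambda: "pandas"))  # pandas (after a warning)
```

Re-raising when enable_native_fallback=False keeps a misconfigured native setup visible instead of silently paying the slower Pandas cost.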

Community:

  • Allow running tests on PRs from forks, triggered by a label. #179

Breaking Change:

  • The aql.dataframe argument identifiers_as_lower (a boolean, default False) was replaced by the argument columns_names_capitalization (a string with possible values "upper", "lower", or "original"; default "lower"). #564
  • Previously, aql.load_file converted all column names to uppercase by default; it now converts them to lowercase by default. To restore the old behaviour, pass columns_names_capitalization="upper". #564
  • aql.load_file now attempts to load files into BigQuery and Snowflake using native methods, which may have prerequisites. To disable this mode, pass use_native_support=False to aql.load_file. #557, #481
  • aql.dataframe raises an exception if the default Airflow XCom backend is used. To resolve this, either use an external XCom backend, such as S3 or GCS, or set AIRFLOW__ASTRO_SDK__DATAFRAME_ALLOW_UNSAFE_STORAGE=True. #444
  • The default Astro SDK temporary schema is now configured with AIRFLOW__ASTRO_SDK__SQL_SCHEMA instead of AIRFLOW__ASTRO__SQL_SCHEMA. #503
  • Renamed aql.truncate to aql.drop_table #554
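To make the capitalization change concrete, the sketch below shows what each value of columns_names_capitalization does to a list of column names. It is a plain-Python illustration only: `apply_capitalization` is a hypothetical helper, not an SDK function, and in the SDK the setting is applied to DataFrame or table column names rather than a bare list.

```python
from typing import List, Literal

# The three values accepted by columns_names_capitalization.
Capitalization = Literal["upper", "lower", "original"]


def apply_capitalization(columns: List[str], mode: Capitalization = "lower") -> List[str]:
    """Hypothetical model of columns_names_capitalization ("lower" is the new default)."""
    if mode == "upper":
        return [name.upper() for name in columns]  # pre-1.0.0b1 load_file default
    if mode == "lower":
        return [name.lower() for name in columns]
    if mode == "original":
        return list(columns)  # keep names exactly as they appear in the source
    raise ValueError(f"unsupported capitalization: {mode!r}")


cols = ["Name", "CITY", "zip_Code"]
print(apply_capitalization(cols))              # ['name', 'city', 'zip_code']
print(apply_capitalization(cols, "upper"))     # ['NAME', 'CITY', 'ZIP_CODE']
print(apply_capitalization(cols, "original"))  # ['Name', 'CITY', 'zip_Code']
```

Downstream SQL that referenced uppercase column names produced by earlier versions either needs updating or the load needs columns_names_capitalization="upper".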

Bug fix:

  • Added Airflow's missing task terminal states to CleanupOperator. #525
  • Allow chaining aql.drop_table (previously aql.truncate) tasks using the TaskFlow API syntax. #554, #515

Enhancement:

  • Improved the performance of aql.load_file for files from AWS S3 to Google BigQuery by up to 94%. #429, #568
  • Improved the performance of aql.load_file for files from Google Cloud Storage to Google BigQuery by up to 93%. #429, #562
  • Improved the performance of aql.load_file for files from AWS S3/Google Cloud Storage to Snowflake by up to 76%. #430, #544
  • Improved the performance of aql.load_file for files from Google Cloud Storage to Postgres on Kubernetes (K8s) by up to 93%. #428, #531
  • Fixed the Sphinx docs sidebar. #472
  • Read configuration through the Airflow configuration manager. #503
  • Added a CI job to check for dead links. #526
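Because configuration is now read through the Airflow configuration manager, the renamed variables above follow Airflow's AIRFLOW__{SECTION}__{KEY} convention: AIRFLOW__ASTRO_SDK__SQL_SCHEMA maps to the sql_schema option in the [astro_sdk] section. The stdlib sketch below models only that naming convention; `env_var_to_config_key` and `read_option` are hypothetical helpers, not Airflow or SDK functions (a real deployment reads values through Airflow's configuration manager, e.g. `airflow.configuration.conf.get`).

```python
from typing import Mapping, Optional, Tuple

PREFIX = "AIRFLOW__"


def env_var_to_config_key(name: str) -> Optional[Tuple[str, str]]:
    """Split an AIRFLOW__{SECTION}__{KEY} variable name into (section, key)."""
    if not name.startswith(PREFIX):
        return None
    # The first "__" after the prefix separates the section from the key.
    section, sep, key = name[len(PREFIX):].partition("__")
    if not sep or not section or not key:
        return None
    return section.lower(), key.lower()


def read_option(env: Mapping[str, str], section: str, key: str) -> Optional[str]:
    """Return the env-var override for [section] key, if one is set."""
    return env.get(f"{PREFIX}{section.upper()}__{key.upper()}")


print(env_var_to_config_key("AIRFLOW__ASTRO_SDK__SQL_SCHEMA"))  # ('astro_sdk', 'sql_schema')
print(env_var_to_config_key("AIRFLOW__ASTRO__SQL_SCHEMA"))      # ('astro', 'sql_schema')

# An old-style variable lands in the [astro] section, so it no longer sets the schema.
old_env = {"AIRFLOW__ASTRO__SQL_SCHEMA": "my_schema"}
print(read_option(old_env, "astro_sdk", "sql_schema"))  # None
```

This is why deployments upgrading to 1.0.0b1 need to rename the variable: the old name is still a valid Airflow override, just for a section the SDK no longer reads.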

@tatiana @kaxil @dimberman @utkarsharma2 @sunank200 @pankajastro @jlaneve @guohui-gao @mikeshwe @vikramkoka