transform operator (astro.sql.operators.transform)

When to use the transform operator

The transform operator allows you to implement the T of an ELT system by running a SQL query. Each step of the transform pipeline creates a new table from the SELECT statement and enables tasks to pass those tables as if they were native Python objects.
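The idea of each step materializing a SELECT as a new table that the next step reads can be sketched with the standard library alone. This is a minimal illustration using `sqlite3`, not the SDK's implementation: `run_transform_step`, the `tmp_*` table names, and the sample rows are all hypothetical.

```python
import sqlite3


def run_transform_step(conn, output_table, select_sql):
    """Materialize the result of a SELECT as a new table and return its name.

    Hypothetical helper: table names here are fixed in code, never user input.
    """
    conn.execute(f'CREATE TABLE "{output_table}" AS {select_sql}')
    return output_table


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imdb_movies (title TEXT, rating REAL)")
# Illustrative rows only; they do not come from any real dataset.
conn.executemany(
    "INSERT INTO imdb_movies VALUES (?, ?)",
    [("A", 8.1), ("B", 6.4), ("C", 9.0)],
)

# Step 1 reads the source table and writes a new table.
step1 = run_transform_step(
    conn,
    "tmp_high_rated",
    "SELECT title, rating FROM imdb_movies WHERE rating > 7",
)

# Step 2 receives the table produced by step 1 and builds on it.
step2 = run_transform_step(
    conn,
    "tmp_best",
    f'SELECT title FROM "{step1}" ORDER BY rating DESC LIMIT 1',
)

best = conn.execute(f'SELECT title FROM "{step2}"').fetchall()
```

Each step returns a handle to its output (here just a name), which is what lets downstream steps treat tables like ordinary Python values.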

The transform operator treats values in double curly braces ({{ }}) as Airflow Jinja templates. For more details, see the documentation on templating.
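To make the placeholder substitution concrete, here is a deliberately simplified stand-in for template rendering. It is not Jinja and not what the SDK runs: `render_sql` is a hypothetical helper that only handles bare `{{name}}` placeholders and replaces each one with a quoted table name.

```python
import re


def render_sql(template, tables):
    """Replace each {{name}} placeholder with the matching quoted table name.

    Hypothetical, simplified stand-in for Jinja rendering.
    """

    def substitute(match):
        name = match.group(1)
        if name not in tables:
            raise KeyError(f"no table bound to placeholder {name!r}")
        return f'"{tables[name]}"'

    # Matches {{name}} with optional whitespace inside the braces.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


sql = render_sql(
    "SELECT title FROM {{ input_table }} LIMIT 5",
    {"input_table": "imdb_movies"},
)
```

The point of the sketch is that the placeholder is resolved from an explicit mapping of known tables, so an unbound name fails loudly instead of being pasted into the query.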

There are two main uses for the transform operator.

Case 1: Passing tables between tasks while completing data transformations.

The following example applies a SQL SELECT statement to an imdb_movies table with templating and saves the result to an astro-sdk temporary table.

Note that the input_table in the double curly braces is treated as an Airflow Jinja template, not as an f-string. Building SQL with f-strings exposes the query to SQL injection. For security, you must explicitly identify tables in the function parameters by typing a value as a Table. Only then will the transform operator treat the value as a table.
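The injection risk of f-string formatting can be reproduced with the standard library. The sketch below uses `sqlite3` and is independent of the SDK: `unsafe_select`, `checked_select`, the `credentials` table, and the sample values are hypothetical, and the identifier check is only a stand-in for the SDK's requirement that tables be typed as Table.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imdb_movies (title TEXT)")
conn.execute("CREATE TABLE credentials (secret TEXT)")
conn.execute("INSERT INTO imdb_movies VALUES ('Up')")
conn.execute("INSERT INTO credentials VALUES ('hunter2')")


def unsafe_select(table_name):
    # The f-string pastes the value into the SQL text without any check.
    return conn.execute(f"SELECT title FROM {table_name}").fetchall()


def checked_select(table_name):
    # Hypothetical guard: accept only a plain identifier, then quote it.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"not a valid table name: {table_name!r}")
    return conn.execute(f'SELECT title FROM "{table_name}"').fetchall()


# A value that is not a table name at all, but extra SQL.
malicious = "imdb_movies UNION SELECT secret FROM credentials"

# The unchecked query now also returns rows from the credentials table.
leaked = unsafe_select(malicious)

# The guarded version refuses the same value.
try:
    checked_select(malicious)
    rejected = False
except ValueError:
    rejected = True
```

Requiring an explicit, validated notion of "this value is a table" is what closes the gap that plain string formatting leaves open.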

../../../../example_dags/example_transform.py

The next example, from the same file, also applies a SQL SELECT statement to an imdb_movies table with templating and saves the result to an astro-sdk temporary table.

../../../../example_dags/example_transform.py

You can easily pass tables between tasks when completing a data transformation.

../../../../example_dags/example_transform.py

Case 2: Passing a Pandas dataframe between tasks while completing data transformations.

The following example shows how you can quickly pass a table and a Pandas dataframe between tasks when completing a data transformation.

../../../../example_dags/example_transform.py

Note that if you want to pass a SQL file to the transform decorator, use transform_file_operator.

Parameters

  • query_modifier - The query_modifier parameter allows you to define statements to run before and after the main transform statement. For instance, to associate a Snowflake query tag, you can use query_modifier=QueryModifier(pre_queries=["ALTER SESSION SET QUERY_TAG=<my-query-tag>"]).
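The before-and-after behaviour described for query_modifier can be sketched without the SDK. The example below is an assumption-laden illustration using `sqlite3`: `SketchQueryModifier` and `run_with_modifier` are hypothetical names, the `post_queries` field is assumed by analogy with `pre_queries`, and the `audit` table exists only to make the ordering visible.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class SketchQueryModifier:
    """Hypothetical stand-in that mirrors the pre/post query idea."""

    pre_queries: list = field(default_factory=list)
    post_queries: list = field(default_factory=list)  # assumed counterpart


def run_with_modifier(conn, main_sql, modifier):
    """Run pre-queries, then the main statement, then post-queries."""
    for statement in modifier.pre_queries:
        conn.execute(statement)
    rows = conn.execute(main_sql).fetchall()
    for statement in modifier.post_queries:
        conn.execute(statement)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (event TEXT)")
conn.execute("CREATE TABLE imdb_movies (title TEXT)")
conn.execute("INSERT INTO imdb_movies VALUES ('Up')")

modifier = SketchQueryModifier(
    pre_queries=["INSERT INTO audit VALUES ('before')"],
    post_queries=["INSERT INTO audit VALUES ('after')"],
)

rows = run_with_modifier(conn, "SELECT title FROM imdb_movies", modifier)
events = [r[0] for r in conn.execute("SELECT event FROM audit ORDER BY rowid")]
```

Session-level statements such as a query tag fit the pre-query slot because they must take effect on the same connection before the main statement runs.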