
dataframe operator

When to use the dataframe operator

The dataframe operator allows you to run Python transformations in Airflow. Behind the scenes, the dataframe function automatically converts the source SQL table into a Pandas dataframe and makes any dataframes returned by the transformation available to downstream astro.sql functions. This means you can move between Python and SQL transformations without writing any conversion code yourself. To use the dataframe operator, provide a Python function that takes a dataframe as one of its inputs, and pass a Table object as the input SQL table. If you want the resulting dataframe converted back into a SQL table, specify an output_table object.

There are three main uses for the dataframe operator.

Case 1: Convert a SQL table into a dataframe.

../../../../example_dags/example_amazon_s3_snowflake_transform.py
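As a minimal sketch of what Case 1 does under the hood, the snippet below uses an in-memory SQLite table and pandas as a stand-in for the operator's machinery; the table name `orders` and its columns are made up for illustration, and the real operator takes a Table object pointing at your database connection instead of a raw connection:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the source SQL table.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]}).to_sql(
    "orders", conn, index=False
)

# Case 1 in miniature: the operator loads the SQL table into a Pandas
# dataframe before calling your decorated function with it.
df = pd.read_sql("SELECT * FROM orders", conn)
print(df["amount"].sum())  # 60.0
```

Your Python function then receives `df` as a normal Pandas dataframe and never has to issue the SQL read itself.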

Case 2: Convert the resulting dataframe into a SQL table. When the output_table parameter is specified, the returned dataframe is written to that table.

../../../../example_dags/example_amazon_s3_snowflake_transform.py
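A rough sketch of the Case 2 behavior, again using pandas and an in-memory SQLite database as assumptions rather than the operator's real internals; the function name `aggregate` and the table name `agg_output` are hypothetical:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# A transformation of the kind you would hand to the dataframe operator.
def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    return df.groupby("category", as_index=False)["amount"].sum()

source = pd.DataFrame({"category": ["a", "a", "b"], "amount": [1.0, 2.0, 3.0]})

# Case 2 in miniature: when output_table is specified, the operator persists
# the returned dataframe as a SQL table, roughly as to_sql does here.
aggregate(source).to_sql("agg_output", conn, index=False)

result = pd.read_sql("SELECT * FROM agg_output ORDER BY category", conn)
print(result)
```

The key point is that the function itself only returns a dataframe; turning that return value into a table is handled for you once `output_table` is set.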

Case 3: Pass the result of a dataframe function as a list or a dictionary

../../../../example_dags/example_dataframe_api.py
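To illustrate Case 3 with plain pandas, the sketch below shows a dataframe function returning a list of records and a dictionary; the helper names `as_records` and `as_lookup` and the sample data are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [4, 22]})

# Case 3 in miniature: a dataframe function may return plain Python
# structures, which downstream tasks can consume directly.
def as_records(df: pd.DataFrame) -> list:
    return df.to_dict(orient="records")

def as_lookup(df: pd.DataFrame) -> dict:
    return dict(zip(df["city"], df["temp"]))

records = as_records(df)
lookup = as_lookup(df)
print(records)  # [{'city': 'Oslo', 'temp': 4}, {'city': 'Lima', 'temp': 22}]
print(lookup)   # {'Oslo': 4, 'Lima': 22}
```

Returning a list or dictionary instead of a dataframe is useful when a downstream task needs a small, easily serialized result rather than a full table.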

Default Datasets

  • Input dataset - No default input dataset.
  • Output dataset - Target table of the operator.