airflow-plugin-glue_presto_apas

An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.

Usage

from datetime import timedelta

import airflow
from airflow.models import DAG

from airflow.operators.glue_add_partition import GlueAddPartitionOperator
from airflow.operators.glue_presto_apas import GluePrestoApasOperator

args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}


dag = DAG(
    dag_id='example-dag',
    schedule_interval='0 0 * * *',
    default_args=args,
)

GluePrestoApasOperator(task_id='example-task-1',
                       db='example_db',
                       table='example_table',
                       sql='example.sql',
                       partition_kv={
                           'table_schema': 'example_db',
                           'table_name': 'example_table'
                       },
                       catalog_region_name='ap-northeast-1',
                       dag=dag,
                       )

GlueAddPartitionOperator(task_id='example-task-2',
                         db='example_db',
                         table='example_table',
                         partition_kv={
                             'table_schema': 'example_db',
                             'table_name': 'example_table'
                         },
                         catalog_region_name='ap-northeast-1',
                         dag=dag,
                         )

if __name__ == "__main__":
    dag.cli()

Configuration

glue_presto_apas.GluePrestoApasOperator

db: database name for parititioning (string, required)
table: table name for parititioning (string, required)
sql: sql file name for selecting data (string, required)
fmt: data format when storing data (string, default = parquet)
additional_properties: additional properties for creating table. (dict[string, string], optional)
location: location for the data (string, default = auto generated by hive repairable way)
partition_kv: key values for partitioning (dict[string, string], required)
save_mode: mode when storing data (string, default = overwrite, available values are skip_if_exists, error_if_exists, ignore, overwrite)
catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
presto_conn_id: connection id for presto (string, default = 'presto_default')
aws_conn_id: connection id for aws (string, default = 'aws_default')

Templates can be used in the options[db, table, sql, location, partition_kv].

glue_add_partition.GlueAddPartitionOperator

db: database name for parititioning (string, required)
table: table name for parititioning (string, required)
location: location for the data (string, default = auto generated by hive repairable way)
partition_kv: key values for partitioning (dict[string, string], required)
mode: mode when storing data (string, default = overwrite, available values are skip_if_exists, error_if_exists, overwrite)
follow_location: Skip to add a partition and drop the partition if the location does not exist. (boolean, default = True)
catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
aws_conn_id: connection id for aws (string, default = 'aws_default')

Templates can be used in the options[db, table, location, partition_kv].

Development

Run Example

PRESTO_HOST=${YOUR PRESTO HOST} PRESTO_PORT=${YOUR PRESTO PORT} ./run-example.sh

Release

poetry publish --build

Name		Name	Last commit message	Last commit date
Latest commit History 73 Commits
example		example
src/airflow		src/airflow
tests/airflow/plugin		tests/airflow/plugin
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE.txt		LICENSE.txt
README.md		README.md
pyproject.toml		pyproject.toml
run-example.sh		run-example.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

airflow-plugin-glue_presto_apas

Usage

Configuration

glue_presto_apas.GluePrestoApasOperator

glue_add_partition.GlueAddPartitionOperator

Development

Run Example

Release

About

Releases 9

Packages

Languages

License

civitaspo/airflow-plugin-glue_presto_apas

Folders and files

Latest commit

History

Repository files navigation

airflow-plugin-glue_presto_apas

Usage

Configuration

glue_presto_apas.GluePrestoApasOperator

glue_add_partition.GlueAddPartitionOperator

Development

Run Example

Release

About

Resources

License

Stars

Watchers

Forks

Releases 9

Packages 0

Languages

Packages