Airflow task fails to submit a query to the Presto backend submitted using get_records/run #14282

@DRavikanth

Apache Airflow version: v2.0.0

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.17

Environment:

What happened:
Using PrestoSQL v341

I have a Presto task in my DAG with the following content:

import logging

from airflow.providers.presto.hooks.presto import PrestoHook


def talk_to_presto():
    ph = PrestoHook(presto_conn_id='presto_dum')
    get_query = "select yyyymmdd, id from my_schema.my_table limit 1;"
    # Fetch data
    results = ph.get_records(get_query)
    # The line below successfully displays the results retrieved from the DB
    logging.info(results)
    # Insert data into the destination table
    ph.insert_rows(table='dummy', rows=results)

The task above runs without errors in the Airflow task logs. However, the rows returned by get_query are not inserted into the dummy table by the insert_rows PrestoHook call. Am I making an obvious mistake here? The only notable line in the Airflow task logs is the following:

[2021-02-16 21:09:35,822] {presto.py:185} INFO - Transactions are not enable in presto connection. Please use the isolation_level property to enable it. Falling back to insert all rows in one transaction.

I also tried the following (note the use of the run method):

def talk_to_presto():
    ph = PrestoHook(presto_conn_id='presto_dum')
    # Insert statement to run against Presto
    insert_query = "insert into scratch.test values(1, 'test');"
    # Execute the statement
    results = ph.run(insert_query, autocommit=True)

Presto Connection String in Airflow: {"protocol": "https"}
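
The log message above points at an isolation_level property. As a minimal sketch, and assuming the hook reads that property from the connection's Extra JSON next to "protocol", the Extra value could be built like this. The helper name build_presto_extra and the READ_COMMITTED value are illustrative assumptions, and whether setting it changes the insert behaviour has not been verified in this report.

```python
import json


def build_presto_extra(isolation_level="READ_COMMITTED"):
    """Return an Extra JSON string for the Airflow Presto connection.

    Hypothetical helper: the "isolation_level" key is taken from the log
    message; the value chosen here is only an example.
    """
    extra = {"protocol": "https", "isolation_level": isolation_level}
    return json.dumps(extra)


if __name__ == "__main__":
    # Paste the printed string into the connection's Extra field
    print(build_presto_extra())
```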

What you expected to happen:
When insert_rows/run is called with the right set of parameters, the records/dataframe should be inserted into Presto successfully.

How to reproduce it:

  1. Create a simple connection to Presto
  2. Create a simple table with one column
  3. Insert into that table via the Airflow PrestoHook using the insert_rows/run functions.

Please note that I am able to work around this issue by passing the insert query to get_records.
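
The workaround can be sketched roughly as follows. This is only an illustration: sql_literal and insert_via_get_records are hypothetical helpers, not part of Airflow. The hook argument is assumed to be a PrestoHook (or any object with a get_records(sql) method), and the literal rendering is deliberately naive. A possible reason the workaround helps is that get_records fetches the query result while run only executes the statement, but that is an assumption and not something this report confirms.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (naive, for illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Quote everything else as a string, doubling embedded single quotes
    return "'" + str(value).replace("'", "''") + "'"


def insert_via_get_records(hook, table, rows):
    """Issue one INSERT per row through hook.get_records instead of run/insert_rows."""
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        sql = f"INSERT INTO {table} VALUES ({values})"
        # get_records executes the statement and fetches its result
        hook.get_records(sql)
```

Inside the task this would be called as insert_via_get_records(PrestoHook(presto_conn_id='presto_dum'), 'scratch.test', [(1, 'test')]). Building SQL by string formatting is only acceptable here because the values come from the DAG itself.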

How often does this problem occur? Once? Every time etc?
Always
