Description
Apache Airflow version: v2.0.0
Kubernetes version (if you are using kubernetes) (use kubectl version): 1.17
Environment:
- Cloud provider or hardware configuration: Self Managed via Argo
- OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
- Kernel (e.g. uname -a): Linux airflow-idev-worker-0 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 GNU/Linux
- Install tools: "apache-airflow[presto]"
What happened:
Using PrestoSQL v341
I have a Presto task in my DAG with the following content:
import logging

from airflow.providers.presto.hooks.presto import PrestoHook


def talk_to_presto():
    ph = PrestoHook(presto_conn_id='presto_dum')
    get_query = "select yyyymmdd, id from my_schema.my_table limit 1;"
    # Fetch data
    results = ph.get_records(get_query)
    # The line below successfully logs the rows retrieved from the DB
    logging.info(results)
    # Insert the data into the destination table
    ph.insert_rows(table='dummy', rows=results)
The task above runs without errors in the Airflow task logs. However, the rows returned by get_query are not inserted into the dummy table by the PrestoHook insert_rows call. Is there an obvious mistake I am making here? The only notable line in the Airflow task logs is:
[2021-02-16 21:09:35,822] {presto.py:185} INFO - Transactions are not enable in presto connection. Please use the isolation_level property to enable it. Falling back to insert all rows in one transaction.
I also tried the following, this time using the run method:
from airflow.providers.presto.hooks.presto import PrestoHook


def talk_to_presto():
    ph = PrestoHook(presto_conn_id='presto_dum')
    # Query PrestoDB with an INSERT statement
    insert_query = "insert into scratch.test values(1, 'test');"
    # Execute the INSERT
    results = ph.run(insert_query, autocommit=True)
Presto Connection String in Airflow: {"protocol": "https"}
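The log message from presto.py:185 refers to an isolation_level property. The sketch below only builds a candidate Extra value for the connection. It assumes, without confirmation from this report, that PrestoHook reads isolation_level from the same Extra JSON that already holds "protocol", and that the accepted names mirror prestodb.transaction.IsolationLevel. The names build_presto_extra and ASSUMED_ISOLATION_LEVELS are hypothetical. Nothing here shows that enabling a transaction makes insert_rows persist the rows.

```python
import json

# Hypothetical set of names, assumed to mirror prestodb.transaction.IsolationLevel;
# check them against the installed presto-python-client before relying on them.
ASSUMED_ISOLATION_LEVELS = {
    "AUTOCOMMIT",
    "READ_UNCOMMITTED",
    "READ_COMMITTED",
    "REPEATABLE_READ",
    "SERIALIZABLE",
}


def build_presto_extra(protocol: str = "https", isolation_level: str = "READ_COMMITTED") -> str:
    """Return JSON for the connection's Extra field (hypothetical helper).

    Assumes PrestoHook reads "isolation_level" from Extra next to "protocol".
    """
    level = isolation_level.upper()
    if level not in ASSUMED_ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {isolation_level!r}")
    return json.dumps({"protocol": protocol, "isolation_level": level})


print(build_presto_extra())
# {"protocol": "https", "isolation_level": "READ_COMMITTED"}
```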
What you expected to happen:
When insert_rows or run is called with the right set of parameters, the records (or DataFrame) should be inserted into Presto successfully.
How to reproduce it:
- Create a simple connection to Presto
- Create a simple table with one column
- Insert into that table via the Airflow PrestoHook using the insert_rows or run method.
Please note that I am able to work around this issue by passing the INSERT query to get_records.
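The get_records workaround points to one possible explanation. This is an assumption, not a confirmed diagnosis: the Presto DB-API client may submit a statement on execute() and only drive it to completion when the result is fetched. Under that assumption, insert_rows (execute, then close, no fetch) could leave the INSERT unfinished, while get_records (execute, then fetchall) completes it. This toy model contains no Airflow or Presto code. FakeCoordinator, LazyCursor and both insert_like_* functions are hypothetical, and commits are left out.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int]


class FakeCoordinator:
    """Hypothetical server side: holds the rows of the target table."""

    def __init__(self) -> None:
        self.table: List[Row] = []


class LazyCursor:
    """Hypothetical DB-API-style cursor: execute() only submits a statement,
    and the statement takes effect only once the client fetches its result."""

    def __init__(self, server: FakeCoordinator) -> None:
        self.server = server
        self._pending: Optional[Row] = None

    def execute(self, sql: str, row: Row) -> None:
        # Submitting only queues the INSERT; an earlier unfetched one is dropped.
        self._pending = row

    def fetchall(self) -> List[Row]:
        # Reading the result drives the queued INSERT to completion.
        if self._pending is not None:
            self.server.table.append(self._pending)
            self._pending = None
        return []

    def close(self) -> None:
        # Closing without fetching abandons whatever is still pending.
        self._pending = None


def insert_like_insert_rows(server: FakeCoordinator, rows: List[Row]) -> None:
    # Roughly the insert_rows pattern: execute per row, then close, no fetch.
    cur = LazyCursor(server)
    for row in rows:
        cur.execute("INSERT INTO dummy VALUES (?, ?)", row)
    cur.close()


def insert_like_get_records(server: FakeCoordinator, row: Row) -> None:
    # Roughly the get_records pattern: execute, then fetchall.
    cur = LazyCursor(server)
    cur.execute("INSERT INTO dummy VALUES (?, ?)", row)
    cur.fetchall()
    cur.close()


server = FakeCoordinator()
insert_like_insert_rows(server, [(20210216, 1)])
print(server.table)  # []
insert_like_get_records(server, (20210216, 1))
print(server.table)  # [(20210216, 1)]
```

If this model matches the real client, draining the cursor after each INSERT would be the corresponding fix. That has not been tested here.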
How often does this problem occur? Once? Every time etc?
Always