
Add redshift native path #700

Merged
merged 50 commits into from
Sep 2, 2022

Conversation

@feluelle (Member) commented Aug 19, 2022

Description

What is the current behavior?

We currently do not have a native path available for transferring data to Redshift.

closes: #613

What is the new behavior?

This PR adds a native path for transferring data to Redshift.

Does this introduce a breaking change?

No

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary
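As a sketch of what the "native path" means here: instead of streaming rows through Python, Redshift is handed a COPY command that loads directly from S3. The helper below is a hypothetical illustration, not the PR's actual API; the function name, parameters, and role ARN are all assumptions.

```python
# Hypothetical sketch of a Redshift "native path": build a COPY statement
# that loads directly from S3, so no rows flow through Python.
# Names, parameters, and the ARN are illustrative, not the PR's real code.

def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         format_clause: str = "CSV") -> str:
    """Return a Redshift COPY command authorized via an IAM role."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"{format_clause}"
    )

sql = build_copy_statement(
    "public.homes",
    "s3://my-bucket/homes.csv",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
```

In the real implementation the statement would be executed through the Redshift connection's cursor; IAM-role authorization is what the rest of this thread converges on.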

pankajkoti and others added 29 commits August 10, 2022 15:57
- pre-commit now runs in pre-commit.ci

Co-authored-by: feluelle <felix.uellendall@astronomer.io>
Locally spin up an Airflow container to test the DAG

closes: #572
Adds a `get_file_list` task that can be used with Dynamic Task Mapping, which was introduced in Airflow 2.3.

closes: #507
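The `get_file_list` plus Dynamic Task Mapping idea, sketched without a running Airflow instance: in a real DAG this would be `load_file.expand(input_file=file_list)`; here plain functions stand in for the operators, and all names and paths are assumptions.

```python
# Conceptual sketch of Dynamic Task Mapping over a file listing.
# In Airflow 2.3+ this would be `load_file.expand(input_file=file_list)`;
# plain functions stand in for the real operators (all names assumed).

def get_file_list(prefix: str) -> list[str]:
    # Stand-in for a task that lists files under an S3/GCS prefix.
    return [f"{prefix}/part-{i}.csv" for i in range(3)]

def load_file(path: str) -> str:
    # Stand-in for the mapped load task; one instance runs per file.
    return f"loaded {path}"

files = get_file_list("s3://my-bucket/data")
results = [load_file(f) for f in files]  # what .expand() fans out at runtime
```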
Minor fix to Drop Table operator docs. Wanted to give it a clear name in the docs, "Drop operator" isn't clear.


Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Looks like we already had a custom exceptions file with 2 custom exceptions. So just making it consistent and moving `DatabaseCustomError` to this file
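The consolidation described above might look roughly like this; the module path and the class names other than `DatabaseCustomError` are placeholders, not the SDK's actual exceptions.

```python
# Hypothetical shape of the consolidated exceptions module
# (e.g. src/astro/exceptions.py); class names other than
# DatabaseCustomError are placeholders for the two existing exceptions.

class NonExistentTableException(Exception):
    """Placeholder: raised when an operation targets a missing table."""

class IllegalLoadException(Exception):
    """Placeholder: raised when a load targets an unsupported destination."""

class DatabaseCustomError(Exception):
    """Moved here so all custom exceptions live in one module."""
```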
# Description
## What is the current behavior?
An example:
tests/integration_test_dag.py::test_full_dag[bigquery]

https://github.com/astronomer/astro-sdk/runs/7386289598?check_suite_focus=true
```
[2022-07-18 09:20:54,978] ***debug_executor.py:87*** ERROR - Failed to execute task: Reason: 403 Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas.
Traceback (most recent call last):
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/pandas_gbq/gbq.py", line 591, in load_data
    chunks = load.load_chunks(
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/pandas_gbq/load.py", line 238, in load_chunks
    load_parquet(
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/pandas_gbq/load.py", line 130, in load_parquet
    client.load_table_from_dataframe(
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py", line 728, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/google/api_core/future/polling.py", line 137, in result
    raise self._exception
google.api_core.exceptions.Forbidden: 403 Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas
```
There are a couple of ways to improve this, including:

- reducing the number of requests we make to BQ during load (e.g. not iterating through all files within the Python Astro SDK, but using the wildcard support in the native implementations)
- increasing the quota
- running BQ tests sequentially (not sure whether this would work)
- adding a retry to BQ DAGs


closes: #553


## What is the new behavior?
Add a retry to BQ DAGs, but only when a specific error is raised, using Tenacity.

## Does this introduce a breaking change?
Nope

### Checklist
- [x] BigQuery DAGs should no longer fail with the earlier 429 and rate-limit errors.
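The retry-only-on-a-specific-error idea can be sketched with a plain stdlib loop; the PR itself uses Tenacity (`retry_if_exception_type`), and `RateLimitError` below is a stand-in for `google.api_core.exceptions.Forbidden`.

```python
# Minimal stdlib sketch of "retry only on a specific error".
# The PR uses Tenacity for this; RateLimitError stands in for the
# 403 "Exceeded rate limits" Forbidden error from BigQuery.
import time

class RateLimitError(Exception):
    pass

def retry_on_rate_limit(fn, attempts=3, delay=0.0):
    """Retry fn only when RateLimitError is raised; other errors propagate."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == attempts:
                raise
            time.sleep(delay)

calls = {"n": 0}

def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("403 Exceeded rate limits")
    return "loaded"

result = retry_on_rate_limit(flaky_load)  # succeeds on the third attempt
```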
updates:
- [github.com/Lucas-C/pre-commit-hooks: v1.2.0 → v1.3.0](Lucas-C/pre-commit-hooks@v1.2.0...v1.3.0)
- [github.com/psf/black: 22.3.0 → 22.6.0](psf/black@22.3.0...22.6.0)
- [github.com/PyCQA/flake8: 4.0.1 → 5.0.4](PyCQA/flake8@4.0.1...5.0.4)
- https://github.com/timothycrosley/isort → https://github.com/PyCQA/isort
- [github.com/pre-commit/mirrors-mypy: v0.961 → v0.971](pre-commit/mirrors-mypy@v0.961...v0.971)
- [github.com/asottile/pyupgrade: v2.34.0 → v2.37.3](asottile/pyupgrade@v2.34.0...v2.37.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update query for bigquery schema from  to specific cols

* Update tests/databases/test_bigquery.py

Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
Description
What is the current behavior?

There is a failing test case in CI - test_bigquery_create_table_with_columns

Can be because of PR - #641 (comment)
Action run failure - https://github.com/astronomer/astro-sdk/runs/7830499547?check_suite_focus=true

Astro: 1.0.0b1

closes: #660
What is the new behavior?

BigQuery added an extra column to INFORMATION_SCHEMA.COLUMNS and the test broke in CI, since we were checking the entire set of columns. Now we only look for specific columns, which should make the test more robust.
Does this introduce a breaking change?

Nope
Checklist

    Created tests that fail without the change (if possible)
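The test-hardening idea above can be sketched as a query builder: instead of comparing against every column INFORMATION_SCHEMA returns, select only the columns the assertion needs. All identifiers here are illustrative, not the repo's actual test code.

```python
# Sketch of the fix: query only the columns the test asserts on, so the
# test survives BigQuery adding new columns to INFORMATION_SCHEMA.COLUMNS.
# Dataset, table, and column names are illustrative assumptions.

EXPECTED_COLS = ["column_name", "data_type", "is_nullable"]

def schema_query(dataset: str, table: str) -> str:
    cols = ", ".join(EXPECTED_COLS)
    return (
        f"SELECT {cols} "
        f"FROM {dataset}.INFORMATION_SCHEMA.COLUMNS "
        f"WHERE table_name = '{table}'"
    )
```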
Add example docs for run_raw_sql
* Fix development environments

- fix local.mk to respect the correct directory
- fix container.mk to use native docker compose from Docker Desktop
- improve development docs

* Change some rst to md

- fix python version prerequisites
@feluelle feluelle changed the title Feature/613 redshift native path Add redshift native path Aug 19, 2022
@pankajkoti (Contributor)

Guide for creating IAM role with needed permissions for COPY command authorization: https://www.dataliftoff.com/iam-roles-for-loading-data-from-s3-into-redshift/


codecov bot commented Aug 30, 2022

Codecov Report

Merging #700 (f4a3305) into main (4585139) will decrease coverage by 0.24%.
The diff coverage is 78.57%.

❗ Current head f4a3305 differs from pull request most recent head 0a65df5. Consider uploading reports for the commit 0a65df5 to get more accurate results

```
@@            Coverage Diff             @@
##             main     #700      +/-   ##
==========================================
- Coverage   93.52%   93.28%   -0.25%
==========================================
  Files          43       43
  Lines        1776     1801      +25
  Branches      216      219       +3
==========================================
+ Hits         1661     1680      +19
- Misses         93       97       +4
- Partials       22       24       +2
```

| Impacted Files | Coverage Δ |
|---|---|
| src/astro/databases/aws/redshift.py | 91.17% <78.57%> (-8.83%) ⬇️ |


@pankajkoti (Contributor)

Based on the offline discussion, we're removing the options to allow using creds. We're only supporting IAM_ROLE option now.

@kaxil (Collaborator) commented Sep 1, 2022

> Based on the offline discussion, we're removing the options to allow using creds. We're only supporting IAM_ROLE option now.

We should cover it in the docs on why we don't support creds though

@pankajkoti (Contributor)

> > Based on the offline discussion, we're removing the options to allow using creds. We're only supporting IAM_ROLE option now.
>
> We should cover it in the docs on why we don't support creds though

We have enhanced the docs now; it appears as below. Thanks @kaxil for directing me to use the Sphinx note directive.
[Screenshot: rendered docs note, 2022-09-01]

cc: @utkarsharma2

```python
# With this option, matching is case-sensitive. Column names in Amazon Redshift tables are always lowercase,
# so when you use the 'auto' option, matching JSON field names must also be lowercase.
# Refer: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json
FileType.JSON: "JSON 'auto'",
```
Collaborator:

@pankajkoti can you add a test case where we have mixed casing in column names for the load_file operator? Use tests/data/homes_upper.csv.

I just want to make sure this works with mixed column casing.

Contributor:

Added a test. Also using `json 'auto ignorecase'` instead of the `json 'auto'` option, to allow non-lowercase columns as mentioned here: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json
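The change discussed here amounts to mapping each file type to a Redshift COPY format clause, with JSON using `'auto ignorecase'` so mixed-case field names still match Redshift's always-lowercase column names. The mapping below is an illustrative sketch, not the PR's actual code.

```python
# Illustrative mapping from file type to a Redshift COPY format clause.
# JSON uses 'auto ignorecase' so mixed-case JSON field names still match
# the always-lowercase Redshift column names (see the AWS COPY docs above).
FORMAT_CLAUSES = {
    "csv": "CSV",
    "parquet": "FORMAT AS PARQUET",
    "json": "JSON 'auto ignorecase'",
}

def format_clause(file_type: str) -> str:
    return FORMAT_CLAUSES[file_type.lower()]
```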

@utkarsharma2 (Collaborator) left a comment

LGTM 👍

@pankajkoti pankajkoti merged commit bd67729 into main Sep 2, 2022
@pankajkoti pankajkoti deleted the feature/613-redshift-native-path branch September 2, 2022 10:40
Development

Successfully merging this pull request may close these issues.

Add support for Redshift DB - Native Path
5 participants