Add redshift native path #700
Conversation
- pre-commit now runs in pre-commit.ci. Co-authored-by: feluelle <felix.uellendall@astronomer.io>
Locally spin up an Airflow container to test the DAG. closes: #572
Adds a `get_file_list` task that can be used with the Dynamic Task Mapping introduced in Airflow 2.3. closes: #507
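A minimal sketch of that pattern, assuming the SDK's `get_file_list` and `LoadFileOperator` import paths at the time of this PR; the bucket, connection IDs, and table name are placeholders:

```python
# Hedged sketch: create one mapped load task per file found under the prefix.
# Import paths may differ between SDK releases (e.g. astro.sql.table vs astro.table).
from astro.files import get_file_list
from astro.sql.operators.load_file import LoadFileOperator
from astro.table import Table

# Inside a DAG definition: get_file_list returns an XCom-backed list of files,
# and .expand() creates one LoadFileOperator task instance per file.
LoadFileOperator.partial(
    task_id="load_file",
    output_table=Table(name="homes", conn_id="redshift_default"),
).expand(input_file=get_file_list(path="s3://my-bucket/data/", conn_id="aws_default"))
```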
Minor fix to the Drop Table operator docs: give it a clear name, since "Drop operator" on its own isn't clear. Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Looks like we already had a custom exceptions file with two custom exceptions, so for consistency `DatabaseCustomError` is moved into that file.
# Description

## What is the current behavior?

An example: tests/integration_test_dag.py::test_full_dag[bigquery]
https://github.com/astronomer/astro-sdk/runs/7386289598?check_suite_focus=true

```
[2022-07-18 09:20:54,978] debug_executor.py:87 ERROR - Failed to execute task: Reason: 403 Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas.
Traceback (most recent call last):
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/pandas_gbq/gbq.py", line 591, in load_data
    chunks = load.load_chunks(
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/pandas_gbq/load.py", line 238, in load_chunks
    load_parquet(
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/pandas_gbq/load.py", line 130, in load_parquet
    client.load_table_from_dataframe(
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py", line 728, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/home/runner/work/astro-sdk/astro-sdk/.nox/test-3-8-airflow-2-2-5/lib/python3.8/site-packages/google/api_core/future/polling.py", line 137, in result
    raise self._exception
google.api_core.exceptions.Forbidden: 403 Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas
```

There are a couple of ways to improve this, including:

- reducing the number of requests we make to BigQuery during load (e.g. not iterating through all files within the Python Astro SDK, but using the wildcard support in the native implementations)
- increasing the quota
- running the BigQuery tests sequentially (not sure if this would work)
- adding a retry to the BigQuery DAGs

closes: #553

## What is the new behavior?

A retry is added to the BigQuery DAGs, using Tenacity, and only when the specific rate-limit error is raised.

## Does this introduce a breaking change?

Nope

### Checklist

- [x] BigQuery DAGs should no longer fail with the 429 / rate-limit errors seen earlier.
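A hedged sketch of a Tenacity retry of that shape; the wrapped helper and the error predicate are illustrative stand-ins, not the PR's actual code:

```python
from google.api_core.exceptions import Forbidden
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential


def _is_rate_limit_error(exc: BaseException) -> bool:
    # Retry only the specific 403 "Exceeded rate limits" error shown above.
    return isinstance(exc, Forbidden) and "Exceeded rate limits" in str(exc)


@retry(
    retry=retry_if_exception(_is_rate_limit_error),
    wait=wait_exponential(multiplier=2, min=4, max=60),  # back off between attempts
    stop=stop_after_attempt(5),
    reraise=True,  # re-raise the original error once attempts are exhausted
)
def load_dataframe(client, dataframe, table_id):
    # Hypothetical helper: push a pandas DataFrame to a BigQuery table.
    client.load_table_from_dataframe(dataframe, table_id).result()
```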
updates:
- [github.com/Lucas-C/pre-commit-hooks: v1.2.0 → v1.3.0](Lucas-C/pre-commit-hooks@v1.2.0...v1.3.0)
- [github.com/psf/black: 22.3.0 → 22.6.0](psf/black@22.3.0...22.6.0)
- [github.com/PyCQA/flake8: 4.0.1 → 5.0.4](PyCQA/flake8@4.0.1...5.0.4)
- https://github.com/timothycrosley/isort → https://github.com/PyCQA/isort
- [github.com/pre-commit/mirrors-mypy: v0.961 → v0.971](pre-commit/mirrors-mypy@v0.961...v0.971)
- [github.com/asottile/pyupgrade: v2.34.0 → v2.37.3](asottile/pyupgrade@v2.34.0...v2.37.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update query for the BigQuery schema to select specific columns
* Update tests/databases/test_bigquery.py

Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>

Description

What is the current behavior?
There is a failing test case in CI - test_bigquery_create_table_with_columns. It may be caused by PR #641 (comment). Action run failure: https://github.com/astronomer/astro-sdk/runs/7830499547?check_suite_focus=true
Astro: 1.0.0b1
closes: #660

What is the new behavior?
BigQuery added an extra column to INFORMATION_SCHEMA.COLUMNS, and the test broke in CI since we checked all of the columns involved. Now we only look for specific columns, which should make the test more robust.

Does this introduce a breaking change?
Nope

Checklist
- Created tests that fail without the change (if possible)
Add example docs for run_raw_sql
* Fix development environments
  - fix local.mk to respect the correct directory
  - fix container.mk to use native docker compose from Docker Desktop
  - improve development docs
* Change some rst to md
  - fix Python version prerequisites
Guide for creating an IAM role with the permissions needed for COPY command authorization: https://www.dataliftoff.com/iam-roles-for-loading-data-from-s3-into-redshift/
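For reference, a sketch of the shape of COPY statement this enables; the schema, table, bucket, and role ARN below are placeholders, and the real statement is assembled by the SDK:

```python
# IAM_ROLE replaces embedded access keys: Redshift assumes the role to read S3.
copy_sql = """
COPY my_schema.my_table
FROM 's3://my-bucket/data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
FORMAT AS CSV
IGNOREHEADER 1;
"""
```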
Codecov Report
@@ Coverage Diff @@
## main #700 +/- ##
==========================================
- Coverage 93.52% 93.28% -0.25%
==========================================
Files 43 43
Lines 1776 1801 +25
Branches 216 219 +3
==========================================
+ Hits 1661 1680 +19
- Misses 93 97 +4
- Partials 22 24 +2
Based on the offline discussion, we're removing the options to allow using credentials.
We should cover in the docs why we don't support creds, though.
We have enhanced the docs now; it appears as below. Thanks @kaxil for directing us to use the Sphinx note directive. cc: @utkarsharma2
# With this option, matching is case-sensitive. Column names in Amazon Redshift tables are always lowercase,
# so when you use the 'auto' option, matching JSON field names must also be lowercase.
# Refer: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json
FileType.JSON: "JSON 'auto'",
@pankajkoti can you add a test case where we have mixed casing in column names for the load_file operator? Use tests/data/homes_upper.csv.
I just want to make sure this works with mixed column casing.
Added a test. Also, using the `json 'auto ignorecase'` option instead of `json 'auto'` to allow non-lowercase columns, as mentioned here: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json
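A sketch of the resulting mapping; `FileType` is the SDK's file-type enum, while the dict name here is hypothetical:

```python
from astro.constants import FileType

# Hypothetical name for the COPY format-clause mapping discussed above.
REDSHIFT_COPY_FORMATS = {
    FileType.CSV: "CSV",
    # 'auto ignorecase' matches JSON keys to columns case-insensitively, so
    # mixed-case field names still load into Redshift's lowercase columns.
    FileType.JSON: "JSON 'auto ignorecase'",
}
```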
LGTM 👍
Description
What is the current behavior?
We currently do not have a native path available for transferring data to Redshift.
closes: #613
What is the new behavior?
This PR adds a native path for transferring data to Redshift.
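A minimal usage sketch, assuming `load_file` selects the native S3 → Redshift COPY path when both sides support it; connection IDs and paths are placeholders:

```python
from astro import sql as aql
from astro.files import File
from astro.table import Table  # astro.sql.table in older releases

# Inside a DAG definition; falls back to the default (pandas) path when the
# native transfer isn't possible.
load_to_redshift = aql.load_file(
    input_file=File("s3://my-bucket/data/homes.csv", conn_id="aws_default"),
    output_table=Table(name="homes", conn_id="redshift_default"),
    use_native_support=True,
)
```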
Does this introduce a breaking change?
No
Checklist