[AIRFLOW-1688] Automatically add load.time_partitioning to bigquery_hook when table name includes $ #2669
albertocalderari wants to merge 1 commit into apache:master from albertocalderari:patch-1
Allow automatic creation of new partitioned tables when the '$' symbol is in the table name by adding the option:
load.timePartitioning: {type: 'DAY'}
For example, my_dataset.my_table$20170101 would be automatically loaded as a new partitioned table without the need to create it manually.
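A minimal sketch of the behaviour described above, written as a standalone function so it can be run without Airflow. The helper name `add_time_partitioning` and the bare `{'load': {}}` starting configuration are assumptions for illustration; in the PR the check sits inline in the BigQuery hook.

```python
def add_time_partitioning(configuration, destination_project_dataset_table):
    """Hypothetical helper: add the day-partitioning load option when the
    destination table name carries a '$' partition decorator."""
    if '$' in destination_project_dataset_table:
        configuration['load']['timePartitioning'] = {'type': 'DAY'}
    return configuration


# A decorated table name gets the option; a plain one is left untouched.
partitioned = add_time_partitioning({'load': {}}, 'my_dataset.my_table$20170101')
plain = add_time_partitioning({'load': {}}, 'my_dataset.my_table')
print(partitioned)
print(plain)
```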
Codecov Report
@@            Coverage Diff            @@
##           master    #2669   +/-   ##
=========================================
+ Coverage   71.69%    71.7%   +<.01%
=========================================
  Files         154      154
  Lines       11807    11807
=========================================
+ Hits         8465     8466     +1
+ Misses       3342     3341     -1
    }

    # if it is a partitioned table ($ is in the table name) add partition load option
    if '$' in destination_project_dataset_table:
I'd consider a regex here rather than just a check on the $ character
I used $ because it is the partition decorator separator in BQ and won't change.
Also, BQ does not allow special characters in the table or dataset name (except the partition separator).
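For comparison, the stricter check suggested in the review might look like the sketch below. The pattern (a `$` followed by exactly eight digits at the end of the name) is an assumption based on the `my_table$20170101` example, not something taken from the PR, and the function name is hypothetical.

```python
import re

# Assumed decorator shape: '$' followed by eight digits (YYYYMMDD) at the very end.
PARTITION_DECORATOR = re.compile(r'\$\d{8}\Z')


def has_partition_decorator(table_name):
    """Hypothetical helper: True only when the name ends in '$' plus eight digits."""
    return PARTITION_DECORATOR.search(table_name) is not None


decorated = has_partition_decorator('my_dataset.my_table$20170101')
undecorated = has_partition_decorator('my_dataset.my_table')
# A bare trailing '$' passes the simple "'$' in name" check but not the regex.
bare_dollar = has_partition_decorator('my_dataset.my_table$')
```

The trade-off is the one raised in the thread: the plain `'$' in name` test is simpler and relies on BQ rejecting other special characters, while the regex also rejects malformed decorators before the request is sent.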
    # if it is a partitioned table ($ is in the table name) add partition load option
    if '$' in destination_project_dataset_table:
        configuration['load']['timePartitioning'] = dict(type='DAY')
Maybe you would want to add support for the other two nested options of timePartitioning: expirationMs and field?
I agree with you on this, will try to add it sometime this week
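One way the two extra options could be threaded through is sketched below. The function name and its `expiration_ms` and `field` parameters are hypothetical and not part of this PR; only the `type`, `expirationMs` and `field` keys come from the discussion above.

```python
def build_time_partitioning(expiration_ms=None, field=None):
    """Hypothetical helper: build the timePartitioning dict, adding the
    optional keys only when the caller supplies them."""
    time_partitioning = {'type': 'DAY'}
    if expiration_ms is not None:
        time_partitioning['expirationMs'] = expiration_ms
    if field is not None:
        time_partitioning['field'] = field
    return time_partitioning


default_options = build_time_partitioning()
full_options = build_time_partitioning(expiration_ms=86400000, field='event_date')
```

Leaving the optional keys out entirely when they are not set keeps the default request identical to the one this PR sends today.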
I would also consider some tests (based on the review comments as well)
@wileeam I changed the review comments. I did perform integration testing on this change.
Dear Airflow maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
The gcs_to_bq operator throws an exception when trying to auto-create a new date-partitioned table.
To allow the table creation, the load configuration needs the API option:
load.timePartitioning: {type: 'DAY'}
I will add a fix that identifies a date-partitioned table from the presence of a $ in the table name and adds the option.
Tests
I can't unit test this change, but I performed multiple integration tests and the code has been running in our prod environment for the last few months without issue.
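As a rough illustration of how the check might be covered without a BigQuery connection, the sketch below assumes the logic were factored out into a pure helper. The PR does not do this, and the names `add_time_partitioning` and `AddTimePartitioningTest` are hypothetical.

```python
import unittest


def add_time_partitioning(configuration, destination_project_dataset_table):
    """Hypothetical pure helper mirroring the inline check in the hook."""
    if '$' in destination_project_dataset_table:
        configuration['load']['timePartitioning'] = {'type': 'DAY'}
    return configuration


class AddTimePartitioningTest(unittest.TestCase):
    def test_partition_decorator_adds_option(self):
        config = add_time_partitioning({'load': {}}, 'my_dataset.my_table$20170101')
        self.assertEqual(config['load']['timePartitioning'], {'type': 'DAY'})

    def test_plain_table_is_left_unchanged(self):
        config = add_time_partitioning({'load': {}}, 'my_dataset.my_table')
        self.assertNotIn('timePartitioning', config['load'])
```

Such a test only covers the configuration that is built; whether BigQuery then creates the partitioned table still needs the integration testing described above.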
Commits