Skip to content

[AIRFLOW-1688] Automatically add load.time_partitioning to bigquery_hook when table name includes $#2669

Closed
albertocalderari wants to merge 1 commit intoapache:masterfrom
albertocalderari:patch-1
Closed

[AIRFLOW-1688] Automatically add load.time_partitioning to bigquery_hook when table name includes $#2669
albertocalderari wants to merge 1 commit intoapache:masterfrom
albertocalderari:patch-1

Conversation

@albertocalderari
Copy link
Contributor

@albertocalderari albertocalderari commented Oct 6, 2017

Allow automatic creation of a new partitioned tables when the '$' symbol is in the table name adding the option:

load.timePartitioning: {type: 'DAY'}

i.e. my_dataset.my_table$20170101 would be automatically loaded as a new partitioned table without the need of creating it manually.

Dear Airflow maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    The gcs_to_bq operator throws and exception when trying to auto-create a new date partitioned table.
    To allow the table creation the load configuration needs the api option:
    load.timePartitioning:
    {type: 'DAY'}
    I will add a fix to identify date partitioned table from the presence of a $ in the table name and add the option.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    I can't unit test this change, but I performed multiple integration tests and the code is now running in our prod environment for the last few months without issue.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Allow automatic creation of a new partitioned tables when the '$' symbol is in the table name adding the option:

load.timePartitioning: {type: 'DAY'}

i.e. my_dataset.my_table$20170101 would be automatically loaded as a new partitioned table without the need of creating it manually.
@albertocalderari albertocalderari changed the title AIRFLOW-1688-allow-automatic-creation-of-partitioned-table [AIRFLOW-1688] allow-automatic-creation-of-partitioned-table Oct 6, 2017
@codecov-io
Copy link

codecov-io commented Oct 6, 2017

Codecov Report

Merging #2669 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2669      +/-   ##
=========================================
+ Coverage   71.69%   71.7%   +<.01%     
=========================================
  Files         154     154              
  Lines       11807   11807              
=========================================
+ Hits         8465    8466       +1     
+ Misses       3342    3341       -1
Impacted Files Coverage Δ
airflow/models.py 87.14% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfc2f73...643af3c. Read the comment docs.

@albertocalderari albertocalderari changed the title [AIRFLOW-1688] allow-automatic-creation-of-partitioned-table [AIRFLOW-1688] Automatically add load.time_partitioning to bigquery_hook when table name includes $ Oct 16, 2017
}

# if it is a partitioned table ($ is in the table name) add partition load option
if '$' in destination_project_dataset_table:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider a regex here rather than just a check on the $ character

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used $ as it is the partition decorator separator in BQ and won't change.
Also BQ doe not allow special characters in the table or dataset name (except the partition separator).


# if it is a partitioned table ($ is in the table name) add partition load option
if '$' in destination_project_dataset_table:
configuration['load']['timePartitioning'] = dict(type='DAY')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you would want to add support for the other two nested options of timePartitioning: expirationMs and field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you on this, will try to add it sometime this week

@wileeam
Copy link
Contributor

wileeam commented Nov 17, 2017

I would also consider some tests (based on the review comments as well)

@albertocalderari
Copy link
Contributor Author

@wileeam I changed the review comments. I did perform integration testing on this change.

@albertocalderari
Copy link
Contributor Author

@wileeam so... I managed to add the extra options and the unit tests. Unfortunately I have also managed to corrupt my fork. I have therefore created a new PR #2820 - #2820

Sorry about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants