Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5682] Allow labels in gcs_to_bq operator #6351

Closed
wants to merge 1 commit into from
Closed

[AIRFLOW-5682] Allow labels in gcs_to_bq operator #6351

wants to merge 1 commit into from

Conversation

lyallcooper
Copy link

@lyallcooper lyallcooper commented Oct 16, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title.

Description

  • Adds support for labels when using the GoogleCloudStorageToBigQuery
    operator.

Tests

  • My PR adds the following unit test: test_bigquery.TestLabelsInRunLoad

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

@mik-laj
Copy link
Member

mik-laj commented Oct 16, 2019

According to the integration guide, you should also add default labels with the Airflow version
https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?ts=5bb72dfd#

@mik-laj
Copy link
Member

mik-laj commented Oct 16, 2019

@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Oct 16, 2019
@lyallcooper
Copy link
Author

lyallcooper commented Oct 16, 2019

@mik-laj sorry, I don't seem to have access to that google doc.

@lyallcooper
Copy link
Author

Also it seems the other BQ-related operators that support labels don't pass in the Airflow version as a default label.

@mik-laj
Copy link
Member

mik-laj commented Oct 16, 2019

Screenshot 2019-10-16 at 21 33 01

Currently, this integration is being analyzed and will be refactored later to make it comply with the guide. We want to ensure similar behavior for all GCP integrations, but BigQuery just isn't done yet.

@TobKed Do you want to add something? I have the impression that you are responsible for this integration.

@TobKed
Copy link
Contributor

TobKed commented Oct 16, 2019

  1. @mik-laj is right, it will be good idea to set default label to something like:
from airflow.version import version as airflow_version
_AIRFLOW_VERSION = 'v' + airflow_version.replace('.', '-').replace('+', '-')
  1. We have plan to refactor BQ hooks and operators.
    Discussion is here: Review an improvement plan for Bigquery operators PolideaInternal/airflow#231
    One of the idea is to pass both the google.cloud.bigquery.job.QueryJobConfig class as well as dict representation. (see also: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJobConfig.html#google.cloud.bigquery.job.QueryJobConfig)

@codecov-io
Copy link

codecov-io commented Oct 16, 2019

Codecov Report

Merging #6351 into master will decrease coverage by 0.37%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6351      +/-   ##
==========================================
- Coverage   80.07%   79.69%   -0.38%     
==========================================
  Files         616      616              
  Lines       35794    35800       +6     
==========================================
- Hits        28662    28532     -130     
- Misses       7132     7268     +136
Impacted Files Coverage Δ
airflow/operators/gcs_to_bq.py 71.01% <100%> (+0.42%) ⬆️
airflow/gcp/hooks/bigquery.py 70.27% <100%> (+0.08%) ⬆️
airflow/operators/mysql_operator.py 0% <0%> (-100%) ⬇️
airflow/operators/mysql_to_hive.py 0% <0%> (-100%) ⬇️
airflow/executors/sequential_executor.py 47.61% <0%> (-52.39%) ⬇️
airflow/utils/log/colored_log.py 72.72% <0%> (-20.46%) ⬇️
airflow/utils/sqlalchemy.py 77.96% <0%> (-15.26%) ⬇️
airflow/executors/__init__.py 63.26% <0%> (-4.09%) ⬇️
airflow/utils/dag_processing.py 56.23% <0%> (-2.67%) ⬇️
airflow/hooks/hive_hooks.py 75.82% <0%> (-1.79%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2b7f5a...9b2565f. Read the comment docs.

@lyallcooper lyallcooper marked this pull request as ready for review October 16, 2019 23:51
@lyallcooper
Copy link
Author

Alright, updated to add in the airflow-version label automatically

Adds support for labels when using the GoogleCloudStorageToBigQuery
operator.

Additionally automatically adds an `airflow-version` label set to the
current airflow version.
config['labels'], {'label1': 'test1', 'label2': 'test2',
'airflow-version': hook._AIRFLOW_VERSION}
)
mocked_rwc.side_effect = run_with_config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use assert_called_once_with instead creating special function? This will make the code clearer.

@mik-laj
Copy link
Member

mik-laj commented Oct 23, 2019

   Summary of failed tests
tests.gcp.hooks.test_bigquery.TestLabelsInRunJob.test_run_query_with_arg Failure:builtins.AssertionError
tests.gcp.hooks.test_bigquery.TestLabelsInRunLoad.test_run_query_with_arg Failure:builtins.AssertionError

Travis is sad. Can you fix it?

@stale
Copy link

stale bot commented Dec 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 7, 2019
@TobKed
Copy link
Contributor

TobKed commented Dec 11, 2019

Hi @lyallcooper could rebase on the newest master please? There are some changes which cause travis to be sad

@stale stale bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 11, 2019
@stale
Copy link

stale bot commented Jan 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 25, 2020
@stale stale bot closed this Feb 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants