Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5350] Fix bug in the num_retires field in BigQueryHook #5955

Merged

Conversation

kaxil
Copy link
Member

@kaxil kaxil commented Aug 29, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    The num_retries extra is no set in old connections that were created before 1.10.4, for those fields it's value is None which causes the below error:

From the StackOverflow Post:

 [2019-08-27 02:49:58,076] {cli.py:516} INFO - Running <TaskInstance: cadastro_remessas_paises2.gcs_to_bq 2019-08-27T02:42:43.970619+00:00 [running]> on host cadastroremessaspaises2gcstobq-78f1ea099c3b4e718ba707cb03ffda1e 
    [2019-08-27 02:49:58,136] {logging_mixin.py:95} INFO - [[34m2019-08-27 02:49:58,136[0m] {[34mdiscovery.py:[0m271} INFO[0m - URL being requested: GET [1mhttps://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest[0m[0m 
    [2019-08-27 02:49:59,259] {logging_mixin.py:95} INFO - [[34m2019-08-27 02:49:59,259[0m] {[34mmy_functions_google.py:[0m2224} INFO[0m - Project not included in [1mdestination_project_dataset_table[0m: [1mcadastro_remessas.paises2[0m; using project "[1mbigdata-staging[0m"[0m 
    [2019-08-27 02:49:59,266] {logging_mixin.py:95} INFO - [[34m2019-08-27 02:49:59,266[0m] {[34mdiscovery.py:[0m867} INFO[0m - URL being requested: POST https://www.googleapis.com/bigquery/v2/projects/bigdata-staging/jobs?alt=json[0m 
    [2019-08-27 02:49:59,266] {taskinstance.py:1047} ERROR - unsupported operand type(s) for +: 'NoneType' and 'int' Traceback (most recent call last):   File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 922, in _run_raw_task
        result = task_copy.execute(context=context)   
        File "/airflow/dags/git/subfolder/my_functions_google.py", line 2502, in execute
        cluster_fields=self.cluster_fields)   
        File "/airflow/dags/git/subfolder/my_functions_google.py", line 1396, in run_load
        return self.run_with_configuration(configuration)   
        File "/airflow/dags/git/subfolder/my_functions_google.py", line 1414, in run_with_configuration
        .execute(num_retries=self.num_retries)   
        File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
        return wrapped(*args, **kwargs)   
        File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 851, in execute
        method=str(self.method), body=self.body, headers=self.headers)   
        File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 153, in _retry_request
        for retry_num in range(num_retries + 1): TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
  • TestBigQueryHookWithNumRetries.test_num_retries_is_not_none_by_default

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@kaxil kaxil requested review from ashb, Fokko and mik-laj August 29, 2019 20:40
@kaxil
Copy link
Member Author

kaxil commented Aug 29, 2019

This should be the final bug-fix (hopefully) for 1.10.5

@@ -56,7 +56,7 @@ def __init__(self,
gcp_conn_id=bigquery_conn_id, delegate_to=delegate_to)
self.use_legacy_sql = use_legacy_sql
self.location = location
self.num_retries = self._get_field('num_retries', 5)
self.num_retries = self._get_field('num_retries', 5) or 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is repeated in many places. Does this not cause problems elsewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I have moved it to GCP Base Hook now, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this, but we should move this to the GoogleDiscoveryApiHook class in the future.
https://github.com/apache/airflow/pull/5335/files

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Aug 29, 2019
@kaxil
Copy link
Member Author

kaxil commented Aug 29, 2019

Travis passed on my fork: https://travis-ci.org/kaxil/airflow/builds/578554554

@kaxil kaxil merged commit 890adde into apache:master Aug 29, 2019
@kaxil kaxil deleted the Fix-bug-in-the-num_retires-field-in-BigQueryHook branch August 29, 2019 22:05
kaxil added a commit that referenced this pull request Aug 30, 2019
kaxil added a commit that referenced this pull request Aug 30, 2019
kaxil added a commit that referenced this pull request Aug 30, 2019
Jerryguo pushed a commit to Jerryguo/airflow that referenced this pull request Sep 2, 2019
rajatsri28 pushed a commit to rajatsri28/airflow that referenced this pull request May 13, 2020
msumit pushed a commit to twitter-forks/airflow that referenced this pull request May 14, 2020
…QueryHook (apache#5955) (#48)

cherry-picked from 890adde

Co-authored-by: Rajat Srivastava <rajats@twitter.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants