
WIP [AIRFLOW-3601] add location support to BigQuery operators #4409

Closed
wants to merge 99 commits

Conversation


@yohei1126 yohei1126 commented Dec 31, 2018

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    • Location support for BigQueryHook was merged in [AIRFLOW-3327] Add support for location in BigQueryHook #4324
    • To support BigQuery datasets in locations other than the US and EU, the following operators need to be updated:
      • bigquery_check_operator.py
      • bigquery_get_data.py
      • bigquery_operator.py
      • bigquery_table_delete_operator.py
      • bigquery_to_bigquery.py
        • BigQueryOperator
        • BigQueryCreateEmptyTableOperator
        • BigQueryCreateExternalTableOperator
        • BigQueryDeleteDatasetOperator
        • BigQueryCreateEmptyDatasetOperator
      • bigquery_to_gcs.py
      • gcs_to_bq.py
      • bigquery_sensor.py
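
To illustrate the goal of this change, here is a minimal, Airflow-free sketch of how a `location` parameter could thread from an operator into the hook's request body. The class and field names are hypothetical stand-ins for illustration, not the merged API:

```python
# Hypothetical sketch (no Airflow dependency): an operator accepts a
# `location` kwarg and passes it through to the hook, which attaches it
# to the job's jobReference so BigQuery runs the job in that region.
class FakeBigQueryHook:
    def __init__(self, location=None):
        self.location = location

    def run_query(self, sql):
        # A real hook would build the full insert request; here we only
        # show where `location` would land in the request body.
        return {
            "configuration": {"query": {"query": sql}},
            "jobReference": {"location": self.location},
        }


class FakeBigQueryOperator:
    def __init__(self, sql, location=None):
        self.sql = sql
        self.location = location

    def execute(self):
        hook = FakeBigQueryHook(location=self.location)
        return hook.run_query(self.sql)


job = FakeBigQueryOperator("SELECT 1", location="asia-northeast1").execute()
print(job["jobReference"]["location"])  # asia-northeast1
```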

Tests

  • My PR adds unit tests to check location support:
    • bigquery_to_bigquery.py
      • BigQueryOperator
      • BigQueryCreateEmptyTableOperator
        • test_bigquery_hook.py: TestBigQueryBaseCursor test_create_empty_table_with_location. A table insert request can have location in its request body (doc).
        • test_bigquery_operator.py: BigQueryCreateEmptyTableOperatorTest test_execute
      • BigQueryCreateExternalTableOperator
        • test_bigquery_hook.py: TestBigQueryExternalTableSourceFormat test_table_location. A table insert request can have location in its request body (doc).
        • test_bigquery_operator.py: BigQueryCreateExternalTableOperatorTest test_execute
      • BigQueryDeleteDatasetOperator
        • test_bigquery_hook.py: No test since a delete request does not have location in the request body (doc).
        • test_bigquery_operator.py: BigQueryDeleteDatasetOperatorTest test_execute
      • BigQueryCreateEmptyDatasetOperator
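
The testing approach above can be sketched with `unittest.mock`. The `create_empty_table` signature here is a simplified stand-in for the real hook method, used only to show how a test can assert that `location` reaches the request body:

```python
from unittest import mock

# Simplified stand-in for the hook's create_empty_table; the real method
# builds a richer body, but the location-handling idea is the same.
def create_empty_table(service, project_id, dataset_id, table_id, location=None):
    body = {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
    }
    if location is not None:
        body["location"] = location
    service.tables().insert(
        projectId=project_id, datasetId=dataset_id, body=body
    ).execute()

# Mock the API client service and assert the location reached the body.
service = mock.MagicMock()
create_empty_table(service, "my-project", "my_dataset", "my_table",
                   location="asia-northeast1")
_, kwargs = service.tables().insert.call_args
assert kwargs["body"]["location"] == "asia-northeast1"
```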

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
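
For example, a commit message following these guidelines might look like (an illustration, not an actual commit from this PR):

```text
[AIRFLOW-3601] Add location support to BigQuery operators

BigQuery datasets can live in regions other than US and EU, but the
existing operators ignore the dataset location. Pass location through
to BigQueryHook so that jobs run against the correct region.
```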

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All public functions and classes in the PR contain docstrings that explain what they do

Code Quality

  • Passes flake8

@codecov-io

codecov-io commented Dec 31, 2018

Codecov Report

Merging #4409 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #4409      +/-   ##
=========================================
- Coverage   78.25%   78.2%   -0.05%     
=========================================
  Files         204     204              
  Lines       16433   16445      +12     
=========================================
+ Hits        12859   12861       +2     
- Misses       3574    3584      +10
Impacted Files Coverage Δ
airflow/www/api/experimental/endpoints.py 71.12% <0%> (-19.02%) ⬇️
airflow/api/common/experimental/delete_dag.py 84% <0%> (-4%) ⬇️
airflow/jobs.py 74.88% <0%> (-2.56%) ⬇️
airflow/models/__init__.py 92.55% <0%> (+0.02%) ⬆️
airflow/www/views.py 69.69% <0%> (+0.22%) ⬆️
airflow/www_rbac/views.py 73.71% <0%> (+0.93%) ⬆️
airflow/hooks/samba_hook.py 38.88% <0%> (+38.88%) ⬆️
airflow/operators/hive_to_samba_operator.py 100% <0%> (+100%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b50ffa0...d6e6287. Read the comment docs.

@kaxil
Member

kaxil commented Dec 31, 2018

Thanks, I was planning to do this. @yohei1126 Can you please add tests for all the corresponding operators?

@yohei1126
Author

Yes, working on it.

@yohei1126 yohei1126 changed the title [AIRFLOW-3601] add location support to BigQuery operators WIP [AIRFLOW-3601] add location support to BigQuery operators Jan 1, 2019
@yohei1126
Author

@kaxil Are there any instructions for running a specific test locally? I want to debug the following error.

======================================================================
11) ERROR: test_table_location (tests.contrib.hooks.test_bigquery_hook.TestBigQueryExternalTableSourceFormat)
----------------------------------------------------------------------
   Traceback (most recent call last):
    tests/contrib/hooks/test_bigquery_hook.py line 205 in test_table_location
      cursor.create_external_table(external_project_dataset_table, schema_fields, source_uris)
    airflow/contrib/hooks/bigquery_hook.py line 499 in create_external_table
      self.service.tables().insert(
   AttributeError: 'str' object has no attribute 'tables'

It is also odd that the test failed but the Travis job did not fail:
https://travis-ci.org/apache/incubator-airflow/jobs/474431499
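
A plausible cause of the AttributeError above (an assumption drawn from the traceback, not a confirmed diagnosis) is that the cursor is being constructed with positional arguments in the wrong order, so a project-id string lands where the API service object belongs. A minimal reproduction:

```python
# Minimal reproduction of the error pattern: if the two constructor
# arguments are swapped, `self.service` holds a string, and any call
# like `self.service.tables()` raises the AttributeError seen above.
class Cursor:
    def __init__(self, service, project_id):
        self.service = service
        self.project_id = project_id

    def create_external_table(self):
        return self.service.tables()  # fails when service is really a str


cur = Cursor("my-project", object())  # service and project_id swapped
err = None
try:
    cur.create_external_table()
except AttributeError as e:
    err = e
print(err)  # 'str' object has no attribute 'tables'
```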

@ron819
Contributor

ron819 commented Jan 3, 2019

@yohei1126
Author

@ron819 Thanks!

@yohei1126
Author

$ cd $AIRFLOW_HOME
$ virtualenv env
$ source env/bin/activate
$ pip install -e .[devel]
$ pip install tox
$ tox -e py35-backend_mysql -- tests/contrib/hooks/test_bigquery_hook.py

I got this error even after running `pip install cython`:

    /Users/01087872/Documents/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/setuptools/dist.py:42: DistDeprecationWarning: Do not call this function
      warnings.warn("Do not call this function", DistDeprecationWarning)
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/w1/9cq510fj2719932_y9wpyhynh55kwm/T/pip-wheel-jatqikpp/pymssql/setup.py", line 88, in <module>
        from Cython.Distutils import build_ext as _build_ext
    ImportError: No module named 'Cython'

eladkal and others added 25 commits January 6, 2019 16:16
To help move away from Minikube, we need to remove the dependency on
a local docker registry and move towards a solution that can be used
in any kubernetes cluster. Custom image names allow users to use
systems like docker, artifactory and gcr
* Fix Type Error for BigQueryOperator and support the unicode object.
* Add tests
* Refactor Kubernetes operator with git-sync

Currently the implementation of git-sync is broken because:
- git-sync clones the repository in /tmp and not in the airflow-dags volume
- git-sync adds a link to point to the required revision, but it is not
taken into account in AIRFLOW__CORE__DAGS_FOLDER

Dags/logs hostPath volume has been added (needed if airflow runs in
kubernetes in a local environment)

To avoid false positive in CI `load_examples` is set to `False`
otherwise DAGs from `airflow/example_dags` are always loaded. In this
way it is possible to test `import` in DAGs

Remove `worker_dags_folder` config:
`worker_dags_folder` is redundant and can lead to confusion.
In WorkerConfiguration `self.kube_config.dags_folder` defines the path of
the dags and can be set in the worker using airflow_configmap
Refactor worker_configuration.py
Use a docker container to run setup.py
Compile web assets
Fix codecov application path

* Fix kube_config.dags_in_image
* Remove dagbag from trigger call

* Adding fix to rbac

* empty commit

* Added create_dagrun to DagModel

* Adding testing to /trigger calls

* Make session a class var
…ce Flake8 test was broken (apache#4415)

The flake8 test in the Travis CI was broken since apache#4361
(apache@7a6acbf )

And some Flake8 errors (code style/quality issues, found in 10 files) were introduced since the flake8 test was broken.
fix flake8 errors

fix error
@yohei1126
Author

Sorry, I recreated this as PR #4448.

@yohei1126 yohei1126 closed this Jan 6, 2019
@yohei1126 yohei1126 deleted the AIRFLOW-3601 branch January 6, 2019 08:34
@yohei1126 yohei1126 restored the AIRFLOW-3601 branch January 6, 2019 08:34
@yohei1126 yohei1126 deleted the AIRFLOW-3601 branch January 7, 2019 11:43