[AIRFLOW-3701] Google Cloud Vision Product Search operators #4665

Merged (1 commit) on Feb 22, 2019

Conversation

sprzedwojski
Contributor

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Implemented the following Cloud Vision Product Search operators:

  • CloudVisionProductSetCreateOperator
  • CloudVisionProductSetUpdateOperator
  • CloudVisionProductSetGetOperator
  • CloudVisionProductSetDeleteOperator
  • CloudVisionProductCreateOperator
  • CloudVisionProductUpdateOperator
  • CloudVisionProductGetOperator
  • CloudVisionProductDeleteOperator

Tests

  • My PR adds the following unit tests:

test_gcp_vision_hook.py
test_gcp_vision_operator.py

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
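
Most of the commit message guidelines above can be checked mechanically. The following is a minimal sketch, not part of this PR or of Airflow's tooling: the function name `check_commit_message` and the `[AIRFLOW-NNNN]` prefix pattern are assumptions made for illustration, and the "imperative mood" and "what and why" rules are left to human review.

```python
import re

# Assumed shape of the Jira reference at the start of the subject line.
JIRA_PREFIX = re.compile(r"^\[AIRFLOW-\d+\]\s*")


def check_commit_message(message):
    """Return a list of guideline violations (empty if none were found)."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    # Rule 2: the 50-character limit excludes the Jira issue reference.
    bare_subject = JIRA_PREFIX.sub("", subject)
    if len(bare_subject) > 50:
        problems.append("subject longer than 50 characters")

    # Rule 3: the subject does not end with a period.
    if subject.endswith("."):
        problems.append("subject ends with a period")

    # Rule 1: a blank line separates the subject from the body.
    if len(lines) > 1 and lines[1] != "":
        problems.append("subject not separated from body by a blank line")

    # Rule 5: body lines wrap at 72 characters.
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("body line longer than 72 characters")

    return problems
```

For example, `check_commit_message("Fix bug.\nbody")` reports both the trailing period and the missing blank line.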

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Code Quality

  • Passes flake8

@sprzedwojski
Contributor Author

CC @kaxil, would be great if you could take a look :)

@codecov-io

codecov-io commented Feb 7, 2019

Codecov Report

Merging #4665 into master will increase coverage by 0.06%.
The diff coverage is 80.63%.

@@            Coverage Diff             @@
##           master    #4665      +/-   ##
==========================================
+ Coverage   74.61%   74.68%   +0.06%     
==========================================
  Files         431      434       +3     
  Lines       28044    28359     +315     
==========================================
+ Hits        20925    21179     +254     
- Misses       7119     7180      +61
Impacted Files                                        Coverage Δ
airflow/contrib/example_dags/example_gcp_vision.py    0% <0%> (ø)
airflow/contrib/operators/gcp_vision_operator.py      100% <100%> (ø)
airflow/contrib/hooks/gcp_vision_hook.py              82.85% <82.85%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4745910...fa11006. Read the comment docs.
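
The percentages in the report above follow from the hit and line counts in the diff table. The sketch below reproduces them. It assumes that the figures are truncated (not rounded) to two decimals, which is a guess that happens to match the reported +0.06%. The helper names are made up for this illustration.

```python
import math
from fractions import Fraction


def pct(hits, lines):
    """Exact coverage percentage as a fraction."""
    return Fraction(hits * 100, lines)


def truncate2(value):
    """Truncate a non-negative value to two decimals (assumed display rule)."""
    return math.floor(value * 100) / 100


before = pct(20925, 28044)           # master: Hits / Lines
after = pct(21179, 28359)            # with #4665 merged
diff_cov = pct(254, 315)             # new Hits / new Lines

print(truncate2(before))             # 74.61
print(truncate2(after))              # 74.68
print(truncate2(after - before))     # 0.06
print(truncate2(diff_cov))           # 80.63
```

Note that subtracting the two displayed values (74.68 - 74.61) would give 0.07. The 0.06 only appears when the exact difference is computed first and then truncated.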

@feng-tao
Member

feng-tao commented Feb 8, 2019

I'm not against this PR, but would it be possible to move this PR (and other similar PRs) to a separate plugin repo (e.g. https://github.com/airflow-plugins)? Airflow releases only happen once every few months. If you want to iterate on your code quickly and ship a new release for an operator bug fix, wouldn't it be better to keep it in a separate repo instead of Airflow?

I can't speak for other committers, but I personally have no experience with this kind of operator, so it would be hard for me to review as well.

I would like to hear other committers' ideas on this as well: @ashb @kaxil @bolkedebruin

@mik-laj
Member

mik-laj commented Feb 8, 2019

@feng-tao I've been thinking about it for a long time. Unfortunately, this is not a simple matter.

Short story:
I have been following this topic for a long time, and difficulties have come up that make this change hard.
See: https://lists.apache.org/thread.html/b216d209abff1644fe2cd9e2c144b921b7a84ceead1c8ef6c35103cd@%3Cdev.airflow.apache.org%3E

Long story:
I've been thinking about Airflow's modularity for a long time. When I started working on Airflow, support for plugins was limited. It was not possible to install the plugin via "pip", so the installation was complicated. In my opinion, this excluded the use of plug-in mechanism in public production solutions.

I found an old PR that introduced support for plugins installed via pip. I fell in love with it, so I tried to revive it. To that end, I messaged @kaxil on Slack for support. In response, he wrote a comment on December 3.

In the meantime, an alternative PR appeared: #4412. It was accepted quickly.

A discussion about modularity then started on the dev list. During this discussion, it turned out that splitting Airflow into plugins is not such a simple matter.
https://lists.apache.org/thread.html/b216d209abff1644fe2cd9e2c144b921b7a84ceead1c8ef6c35103cd@%3Cdev.airflow.apache.org%3E
@potiuk (@sprzedwojski and I work at the same company) participated in this discussion. We were waiting for the community's reaction to see what happens next with the acceptance of AIP-8.

If you think it is worth starting changes in this area, I invite you to join the discussion on the mailing list.

I hope that my explanations will be helpful to you.

Thank you very much for your suggestion.

Disclaimer: This is just my personal statement. I described my observations that are public. Other people in my company may think differently. I believe that the presented story allows us to better understand the context.

@feng-tao
Member

feng-tao commented Feb 8, 2019

If you are talking about that AIP, I am definitely +1 on moving community-built hooks, operators, etc. into other plugins/packages; I just haven't had time to comment on the various threads. But I don't agree that the Airflow committers should be the only maintainers of those packages. I know it is not a simple solution (which ones to move out, which ones to keep, who should maintain them, how to handle dependencies, etc.).

IMO, it is hard from a user's standpoint to decide which one to use, which one is actively maintained, and whether it is released in a given Airflow version. In fact, we (Lyft) rely on the plugin system internally to build various operators and hooks, which makes our development iteration much faster.

@sprzedwojski
Contributor Author

Hi @feng-tao, I think the change you're proposing is a major undertaking and looking through previous discussions on the devlist I see that no consensus has been reached so far as to the exact solution.

I think the idea is worth considering but given that we've been contributing GCP operators this way for half a year now I'd suggest that this PR follow the same pattern.
And if the community decides it's worth moving the operators to other packages/projects we can then refactor all of them.

@kaxil, maybe you could take a look at the PR, given your previous involvement in PRs with other GCP operators? Many thanks in advance :)

@ashb
Member

ashb commented Feb 11, 2019

-1000 to operators in plugins. They are just python modules.

I'm not against them being separate python modules, just 100% against making them plugins, there is no need.

pip install google-airflow-operators

and then

from google.airflow.operators import SomeVisionyThingOperator

Nothing plugin-y needed.

@OmerJog
Contributor

OmerJog commented Feb 11, 2019

@feng-tao
It's not that simple to have operators in plugins. Operators interact with hooks (sometimes more than one).
Hooks are in the core and subject to change. Sometimes an operator cannot work without a change that needs to be submitted to the hook. When you make a change in a hook, it could break other operators that rely on that hook. It's very hard to keep track of that if the operators are in another project.

Your idea of frequent releases for plugins can work only if the extended functionality is unrelated to Airflow hooks. This is a big if.
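
The coupling described above can be shown with a small, purely illustrative sketch. None of these classes exist in Airflow: `HookV1`, `HookV2` and `ExternalOperator` are hypothetical names, and the signature change is invented to show how an operator released separately can break when a core hook changes.

```python
class HookV1(object):
    """Hypothetical hook as shipped in one core release."""

    def create_product(self, name):
        return "created {}".format(name)


class HookV2(object):
    """Same hook after a core change: 'location' became required."""

    def create_product(self, name, location):
        return "created {} in {}".format(name, location)


class ExternalOperator(object):
    """Hypothetical operator maintained in a separate repository."""

    def __init__(self, hook):
        self.hook = hook

    def execute(self):
        # Written against the HookV1 signature.
        return self.hook.create_product("shoes")


print(ExternalOperator(HookV1()).execute())  # created shoes

try:
    ExternalOperator(HookV2()).execute()
except TypeError as err:
    # The operator breaks only at run time, after the core upgrade.
    print("broken by hook change:", err)
```

When both classes live in the same repository, a test run in the same PR would surface the break immediately. Across repositories, it shows up only after both sides are released.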

@sprzedwojski force-pushed the cloud-vision-product-search-pr branch 2 times, most recently from a030300 to 43ab1b2 on February 12, 2019 12:37
Member

@kaxil kaxil left a comment

Although I agree with @feng-tao's argument, we have not yet decided on an approach for splitting out contrib packages into separate repos, so let's merge this one. Once an approach has been finalised and work on it has started, we can migrate it into its own repo.

@sprzedwojski One minor comment: can you please resolve it?

PRODUCT_NAME_TEMPLATE = 'projects/{}/locations/{}/products/{}'


# noinspection PyAbstractClass
Member

We should remove any noinspection stuff. Let's aim to fix this on BaseOperator level.

Contributor Author

Thanks @kaxil, I removed it. I've also applied a small refactoring according to @mik-laj's suggestion (PolideaInternal#51 (comment)) and renamed the get_client() method in the hook to get_conn() to be consistent with other hooks.
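
As an illustration of the convention mentioned here, the sketch below shows a hook exposing `get_conn()` together with the `PRODUCT_NAME_TEMPLATE` constant from the diff context. It is not the code from this PR: `SketchVisionHook`, `FakeVisionClient` and `product_name` are hypothetical names, and caching the client on first use is an assumption about how such a hook might behave.

```python
PRODUCT_NAME_TEMPLATE = 'projects/{}/locations/{}/products/{}'


class FakeVisionClient(object):
    """Hypothetical stand-in for the real Product Search client."""


class SketchVisionHook(object):
    """Hypothetical hook; not the class added in this PR."""

    def __init__(self):
        self._client = None

    def get_conn(self):
        # Create the client on first use and reuse it afterwards
        # (assumed behaviour, shown only to illustrate the naming).
        if self._client is None:
            self._client = FakeVisionClient()
        return self._client

    @staticmethod
    def product_name(project_id, location, product_id):
        # Fill the three positional slots of the resource name template.
        return PRODUCT_NAME_TEMPLATE.format(project_id, location, product_id)


hook = SketchVisionHook()
print(hook.get_conn() is hook.get_conn())
# True
print(SketchVisionHook.product_name('my-project', 'europe-west1', 'p1'))
# projects/my-project/locations/europe-west1/products/p1
```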

"""
import os

# [START howto_operator_vision_retry_import]
Member

Why do we have this?? We don't use this in docs.

Contributor Author

Actually it is used in the docs (example: https://github.com/apache/airflow/pull/4665/files#diff-40e5e831dc73c82a2aba583e5f89f8a8R2035, operator.rst file, line 2035).

We wanted to include imports in the "How-to" documentation to allow the user to copy-paste the code from there and use it right away, without the need to refer to the example DAG's source code.
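
To make the role of the `[START ...]` / `[END ...]` markers concrete, here is a rough sketch of how a docs build could pull a named snippet out of the example DAG. This only mimics the idea in plain Python and makes no claim about the actual documentation tooling. `extract_snippet` and `EXAMPLE` are hypothetical names.

```python
def extract_snippet(source, name):
    """Return the lines between '# [START name]' and '# [END name]'."""
    start = "# [START {}]".format(name)
    end = "# [END {}]".format(name)
    lines = source.splitlines()
    stripped = [line.strip() for line in lines]
    start_idx = stripped.index(start)             # ValueError if missing
    end_idx = stripped.index(end, start_idx + 1)  # END must follow START
    return "\n".join(lines[start_idx + 1:end_idx])


EXAMPLE = """import os

# [START howto_operator_vision_retry_import]
from google.api_core.retry import Retry
# [END howto_operator_vision_retry_import]
"""

print(extract_snippet(EXAMPLE, "howto_operator_vision_retry_import"))
# from google.api_core.retry import Retry
```

Because the import sits between its own pair of markers, the how-to page can show it next to the operator snippet, so the example can be copied and run as-is.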

# [START howto_operator_vision_retry_import]
from google.api_core.retry import Retry
# [END howto_operator_vision_retry_import]
# [START howto_operator_vision_productset_import]
Member

Same

Contributor Author

Same as above: #4665 (comment)

@sprzedwojski
Contributor Author

Added minor refactoring changes, mostly to gcp_vision_hook.py.

@sprzedwojski force-pushed the cloud-vision-product-search-pr branch 2 times, most recently from 1828879 to 32bb0e6 on February 19, 2019 17:18
@sprzedwojski
Contributor Author

Added a minor change in test_gcp_vision_operator_system.py: changed the gcp_key used to a CloudVision-specific one.

@sprzedwojski force-pushed the cloud-vision-product-search-pr branch 4 times, most recently from c937c97 to 2cb869e on February 20, 2019 15:59
@sprzedwojski
Contributor Author

Hi @kaxil, would you be able to find a moment so we could merge this one?

@kaxil kaxil merged commit ce499bb into apache:master Feb 22, 2019
@kaxil
Member

kaxil commented Feb 22, 2019

Thanks @sprzedwojski. I am on annual leave, so I have little to no access to my laptop or the internet.

@sprzedwojski
Contributor Author

Thanks @kaxil, I appreciate you taking the time while on vacation. Enjoy your holidays!

antonimaciej pushed a commit to PolideaInternal/airflow that referenced this pull request Feb 26, 2019
ashb pushed a commit to ashb/airflow that referenced this pull request Mar 20, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
7 participants