Skip to content

[AIRFLOW-5099] Implement Google Cloud AutoML operators#5720

Merged
mik-laj merged 8 commits intoapache:masterfrom
PolideaInternal:gcp-automl
Aug 26, 2019
Merged

[AIRFLOW-5099] Implement Google Cloud AutoML operators#5720
mik-laj merged 8 commits intoapache:masterfrom
PolideaInternal:gcp-automl

Conversation

@turbaszek
Copy link
Member

@turbaszek turbaszek commented Aug 5, 2019

Adds Google Cloud AutoML operators.

Make sure you have checked all steps below.

Jira

Description

  • Adds Google Cloud AutoML operators.

Tests

  • My PR adds the following unit tests:
  • tests/contrib/hooks/test_gcp_automl_hook.py
  • tests/contrib/operators/test_gcp_automl_operator.py

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@turbaszek
Copy link
Member Author

@potiuk @mik-laj @lwander

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knowing that most of it automatically generated, it looks good :).

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knowing that most of it automatically generated, it looks good :).

Copy link

@lwander lwander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looks clean :) Just a few small comments

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need as get_deploy_dag:? It's not referenced in the snippet below unless I've missed something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a common to use as name for DAGs. Usually examples contains only one DAG (as dag).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the model_id must be supplied in this payload? Could it help to update the comment to show the name & format of the required field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, model_id is not a part of payload. The payload is described in AutoML docs and it depends on service type. I assume that user will be familiar / will check the official REST API / Python SDK. I think it could be worth to put there a comment with link to payload definition. @lwander WDYT?

@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Aug 5, 2019
@turbaszek turbaszek force-pushed the gcp-automl branch 2 times, most recently from ee0de9e to 25607cc Compare August 6, 2019 13:22
@potiuk
Copy link
Member

potiuk commented Aug 7, 2019

Still some tests fail :(

@turbaszek turbaszek force-pushed the gcp-automl branch 7 times, most recently from e8dbd30 to 25fe17c Compare August 12, 2019 11:50
@mik-laj mik-laj changed the title [AIRFLOW-5099] Implement Google Cloud AutoML operators [AIRFLOW-5099][WIP-DONT-MERGE] Implement Google Cloud AutoML operators Aug 13, 2019
@rasmi
Copy link

rasmi commented Aug 14, 2019

No review comments here, I'm just excited for this to be merged -- thank you all for your work!

@turbaszek turbaszek force-pushed the gcp-automl branch 5 times, most recently from 7cfa113 to 0e69d4e Compare August 22, 2019 14:57
@turbaszek turbaszek changed the title [AIRFLOW-5099][WIP-DONT-MERGE] Implement Google Cloud AutoML operators [AIRFLOW-5099] Implement Google Cloud AutoML operators Aug 22, 2019
)

GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "your-project-id")
# For now only this location is supported
Copy link
Member

@mik-laj mik-laj Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# For now only this location is supported
# For now only this location is supported

This comment is not needed. It is better not to add information that may change in the documentation. This is a similar problem as with the prices of services in the documentation. Have you noticed that prices are only in one place and no official documentation contains calculations of service prices? When a price is needed, a reference to the table or calculator is added.

timeout: float = None,
metadata: Sequence[Tuple[str, str]] = None,
retry: Retry = None,
) -> Operation:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> Operation:
) -> Operation:

I think, hook should wait for the result of the operation. Let's talk about it on Monday.

@mik-laj
Copy link
Member

mik-laj commented Aug 26, 2019

Let's merge. We will not be able to resolve inconsistencies with MessageToDict now. This requires work at the level of all GCP operators. I think that in the near future we will face this problem.

@mik-laj mik-laj merged commit c921812 into apache:master Aug 26, 2019
Jerryguo pushed a commit to Jerryguo/airflow that referenced this pull request Sep 2, 2019
@Ark-kun
Copy link

Ark-kun commented Sep 5, 2019

Knowing that most of it automatically generated, it looks good :).

What generator was used here?

@turbaszek
Copy link
Member Author

@Ark-kun here is our internal repo with generators for GCP stuff:
https://github.com/PolideaInternal/airflow-munchkin

It still requires some final touches and minimal improvements but it gives a really nice base for developing operators. All suggestions are appreciated ;)

@turbaszek turbaszek deleted the gcp-automl branch September 19, 2019 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants