Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Auto ML operators for Vertex AI service #21470

Merged
merged 12 commits into from
Feb 20, 2022

Conversation

MaksYermak
Copy link
Contributor

Create operators for working with Auto ML for Vertex AI service. Includes operators, hooks, example dags, tests and docs.

Co-authored-by: Wojciech Januszek januszek@google.com
Co-authored-by: Lukasz Wyszomirski wyszomirski@google.com
Co-authored-by: Maksim Yermakou maksimy@google.com


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added area:providers kind:documentation provider:google Google (including GCP) related issues labels Feb 9, 2022
@MaksYermak MaksYermak changed the title Auto ml vertex ai operators Create Auto ML operators for Vertex AI service Feb 9, 2022
@MaksYermak MaksYermak force-pushed the auto-ml-vertex-ai-operators branch 2 times, most recently from 4a541c5 to 809af22 Compare February 11, 2022 15:59
Comment on lines 43 to 60
class VertexAIModelLink(BaseOperatorLink):
"""Helper class for constructing Vertex AI Model link"""

name = "Vertex AI Model"

def get_link(self, operator, dttm):
model_conf = XCom.get_one(
key='model_conf', dag_id=operator.dag.dag_id, task_id=operator.task_id, execution_date=dttm
)
return (
VERTEX_AI_MODEL_LINK.format(
region=model_conf["region"],
model_id=model_conf["model_id"],
project_id=model_conf["project_id"],
)
if model_conf
else ""
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in #21267 (comment) , let's try to be consistent

Comment on lines +87 to +57
project_id: str,
region: str,
display_name: str,
labels: Optional[Dict[str, str]] = None,
training_encryption_spec_key_name: Optional[str] = None,
model_encryption_spec_key_name: Optional[str] = None,
# RUN
training_fraction_split: Optional[float] = None,
test_fraction_split: Optional[float] = None,
model_display_name: Optional[str] = None,
model_labels: Optional[Dict[str, str]] = None,
sync: bool = True,
gcp_conn_id: str = "google_cloud_default",
delegate_to: Optional[str] = None,
impersonation_chain: Optional[Union[str, Sequence[str]]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that some time ago Google recommended to avoid such interfaces because there was a lot of issues like "can we add foo_bar argument to operator XYZ". Instead it was suggested that an operator should accept a "body" - an object that is accepted by the underlying API.

CC @potiuk @mik-laj @kosteev

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with both approaches as long as Google want to maintain it ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaksYermak Is this consistent with approach that we have in other google operators?
I think it should be good as long as we consistent everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kosteev Yes it is consistent. We have the same approach in other google operators.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx.

Copy link
Member

@turbaszek turbaszek Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaksYermak @kosteev it's a matter of definition: For example Google AutoML operators accept model argument instead of particular elements of model as we do here.

Some time ago we had a move that resulted in refactoring Dataproc and BigQuery (as well others) operators to follow "single input" approach because it was considered more generic and easier to follow. Also it is more inline with Google change between approach in v1 and v2 clients: https://googleapis.dev/python/dataproc/latest/UPGRADING.html#method-calls

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine to go with it. this looks like a nicely generated piece of code, doesn't it?

@potiuk
Copy link
Member

potiuk commented Feb 15, 2022

Looks like there are some import errors though :(

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Feb 15, 2022
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see this comment, let's try to be consistent 👌

Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see this comment, let's try to be consistent 👌

@MaksYermak
Copy link
Contributor Author

Please see this comment, let's try to be consistent 👌

@potiuk @turbaszek I have updated links for Vertex AI with new approach. One more thing I with @lwyszomi and @wojsamjan have decided to create a separate package for links. @potiuk @turbaszek what do you think about it?

should have a transformation. If an input column has no transformations on it, such a column is
ignored by the training, except for the targetColumn, which should have no transformations
defined on. Only one of column_transformations or column_specs should be passed. Consider using
column_specs as column_transformations will be deprecated eventually.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaksYermak can you please clarify this deprecation eventually?
This is the first time we added this operator why do we warn about deprecation (and if there is something that user needs to be aware of why do we warn only in docstring?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladkal thank you for your comment. It is my mistake I haven´t noticed that this parameter is deprecated in the google-cloud-aiplatform library. I will create PR with deprecation warning for this parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladkal I have added deprecated warning for this parameter in this PR #24467

@potiuk potiuk added this to the Airflow 2.3.1 milestone May 1, 2022
@ephraimbuddy ephraimbuddy removed this from the Airflow 2.3.1 milestone May 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers kind:documentation okay to merge It's ok to merge this PR as it does not require more tests provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants