Deprecate AutoML Tables operators #39752

Merged: 2 commits, May 24, 2024


@e-galan e-galan commented May 22, 2024

Deprecate AutoML Tables operators:

  • AutoMLTablesListTableSpecsOperator
  • AutoMLTablesListColumnSpecsOperator
  • AutoMLTablesUpdateDatasetOperator

and AutoMLDeployModelOperator.

The Cloud AutoML platform has gone through several deprecations in recent months. Some features were moved to Cloud Translation and Vertex AI, and some simply stopped being supported.

The AutoML Tables operators perform operations on AutoML datasets, such as getting the specs of the tables contained within the datasets or updating the datasets. However, after the deprecation, the only functionality still available in AutoML is AutoML Translation, which is integrated into Cloud Translation and does not support tabular datasets. So we can still query the datasets with the operators, but no tables or columns will be found, which makes these operators useless and prone to errors.

Tabular datasets are available in Vertex AI, but unfortunately at this moment only AutoMLTablesUpdateDatasetOperator has a substitute Vertex AI operator with similar functions. AutoMLTablesListTableSpecsOperator and AutoMLTablesListColumnSpecsOperator have no substitutes.

The functionality of AutoMLDeployModelOperator is also no longer supported by Cloud AutoML. Instead, the DeployModelOperator from Vertex AI should be used.

This PR is a continuation of #38673.


@e-galan e-galan force-pushed the deprecate-automl-table-operators branch from f97c154 to 56d27e7 Compare May 23, 2024 12:29
@e-galan e-galan changed the title Deprecate AutoMLTablesListTableSpecsOperator and AutoMLTablesListColumnSpecsOperator Deprecate AutoML Tables operators May 23, 2024
@e-galan e-galan force-pushed the deprecate-automl-table-operators branch 2 times, most recently from e8fe9df to 919e157 Compare May 23, 2024 13:22
@e-galan e-galan requested a review from Taragolis May 23, 2024 13:34
@e-galan e-galan force-pushed the deprecate-automl-table-operators branch from 919e157 to 0e2ddb8 Compare May 23, 2024 13:56
@Taragolis

Taragolis commented May 23, 2024

It is not clear what we are trying to achieve here: deprecate the functionality, or remove it because it is no longer available in the upstream service? There is quite a difference between the two.

If we deprecate something, then we keep it as is, with just a warning, until one of the following happens:

  • It is finally removed in the upstream service / protocol / library
  • We decide that it is a good time to release a major version of the provider to avoid maintenance hell (cc: @eladkal Maybe it is a good time to release a new major version of the Google provider; there are a lot of deprecations dating back to Airflow 1.10)

If it is not available anymore, we could raise an error with a clear reason why it was removed: "This no longer works due to the sunset of service XXX, please use YYY"

@e-galan

e-galan commented May 23, 2024

@Taragolis I think we're trying to remove the operators, but I was led to believe that we can't just remove them without giving the users some time and hints about the alternatives.

The exceptions that I placed in the __init__ method of the operators should do just that: they raise an error informing the users that the operators can no longer be supported, and suggest an alternative if one exists.
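
A minimal sketch of that approach, assuming a stand-in base class and a hypothetical `OperatorRemovedError` exception so that it runs without Airflow installed. The actual code in the PR may differ.

```python
class OperatorRemovedError(Exception):
    """Hypothetical stand-in for the exception type the provider raises."""


class BaseOperatorStub:
    """Stand-in for the real operator base class (assumption for this sketch)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id


class AutoMLDeployModelOperator(BaseOperatorStub):
    def __init__(self, *args, **kwargs) -> None:
        # Raise as soon as the operator is instantiated, before any work is done,
        # and point the user at the replacement named in the PR description.
        raise OperatorRemovedError(
            "AutoMLDeployModelOperator can no longer be supported by Cloud AutoML. "
            "Please use DeployModelOperator from Vertex AI instead."
        )


def instantiate_and_report() -> str:
    """Try to build the operator and return the error text, if any."""
    try:
        AutoMLDeployModelOperator(task_id="deploy_model")
    except OperatorRemovedError as err:
        return str(err)
    return "no error"
```

With this pattern the class can still be imported, and the error appears only when a DAG file actually constructs the operator.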

@Taragolis

Do these operators still work in the current version of the provider?

@e-galan

e-galan commented May 23, 2024

@Taragolis No, because the AutoML platform deprecated the corresponding functionality several months ago. Please take a moment to read the PR description and my previous comments; I tried to explain the situation as best I could.

@Taragolis

Taragolis commented May 23, 2024

When you say "AutoML platform has deprecated", do you mean "AutoML removed / shut down / some service no longer available / discontinued"?

@e-galan

e-galan commented May 23, 2024

Yes, I do mean that. Sorry for the misunderstanding, but it is the word Google themselves used when describing it, as seen here or here, for example.

@Taragolis

Yeah, I've already found the information: https://cloud.google.com/vision/automl/docs/deprecations

  • Service Deprecated from January 23, 2023
  • Service Shutdown from March 31, 2024

We never deprecated these operators before, so we can't deprecate them now. As I mentioned before, deprecation means "you can still use it at your own risk, but it will be removed at some point in the future".

What we could do is raise an error on class initialisation (which you have already done), or on an attempt to import these operators, by utilising PEP 562:

# airflow/providers/google/cloud/operators/automl.py

...

def __getattr__(name: str):
    # PEP 562: module-level __getattr__, called only when normal attribute lookup fails

    if name == "AutoMLTablesUpdateDatasetOperator":
        # Remember to delete the AutoMLTablesUpdateDatasetOperator class itself, so that it is gone everywhere else too
        raise ImportError("Put the message explaining why this operator was removed here")
    # Add an `elif name == ...:` branch here for each of the other removed operators
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
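
To see how the PEP 562 hook behaves, here is a self-contained sketch that builds a throwaway module named `demo_automl` instead of touching the real provider module. The module name, the `_REMOVED` mapping and the message text are all placeholders for illustration.

```python
import sys
import types

# Hypothetical mapping: removed class name -> message shown to the user.
_REMOVED = {
    "AutoMLTablesUpdateDatasetOperator": (
        "AutoMLTablesUpdateDatasetOperator has been removed because the "
        "upstream service is no longer available (placeholder message)."
    ),
}


def _module_getattr(name: str):
    # PEP 562 hook: Python calls this only when `name` is missing from the module.
    if name in _REMOVED:
        raise ImportError(_REMOVED[name])
    raise AttributeError(f"module 'demo_automl' has no attribute {name!r}")


# Build and register a throwaway module so the sketch runs without Airflow.
demo = types.ModuleType("demo_automl")
demo.__getattr__ = _module_getattr
demo.StillSupportedOperator = object  # an ordinary attribute bypasses the hook
sys.modules["demo_automl"] = demo


def try_import(name: str) -> str:
    """Run ``from demo_automl import <name>`` and report what happened."""
    try:
        exec(f"from demo_automl import {name}", {})
    except ImportError as err:
        return str(err)
    return "imported"
```

Here `try_import("AutoMLTablesUpdateDatasetOperator")` returns the custom message, while names that still exist in the module import normally because the hook is never consulted for them.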

Now it is more about the wording and the timeframe in which we could have deprecated the operators. We could have deprecated them before March 31, 2024; it's too late now 😞

So we should change the error message to something like the following. It is necessary to mention that the reason for the removal is that the service is no longer available:

Operator/Hook/Sensor/Class FooBar has been removed due to the shutdown of a legacy version of AutoML Vision on March 31, 2024.
See: https://cloud.google.com/vision/automl/docs/deprecations
A suggestion to switch to another Operator/Hook/Sensor/Class, if one exists, or steps that end users could take instead.
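
One way to keep such messages consistent across several removed classes is a small helper that fills in the template. The function name, its parameters and the `BarBaz` replacement below are illustrative assumptions, not part of the provider.

```python
from typing import Optional


def removal_message(
    name: str, reason: str, docs_url: str, replacement: Optional[str] = None
) -> str:
    """Build a removal notice: what was removed, why, and what to use instead."""
    lines = [
        f"Class {name} has been removed due to {reason}.",
        f"See: {docs_url}",
    ]
    if replacement:
        lines.append(f"Please use {replacement} instead.")
    else:
        lines.append("There is no direct replacement for this class.")
    return "\n".join(lines)


# FooBar and BarBaz are the placeholder names from the template above.
msg = removal_message(
    "FooBar",
    "the shutdown of a legacy version of AutoML Vision on March 31, 2024",
    "https://cloud.google.com/vision/automl/docs/deprecations",
    replacement="BarBaz",
)
```

The resulting string can be passed to whichever exception the operator raises.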

And finally, we should add information about the removals to the provider's CHANGELOG.rst, in the same way as existing notes such as these:

.. note::
  The default value of ``parquet_row_group_size`` in ``BaseSQLToGCSOperator`` has changed from 1 to
  100000, in order to have a default that provides better compression efficiency and performance of
  reading the data in the output Parquet files. In many cases, the previous value of 1 resulted in
  very large files, long task durations and out of memory issues. A default value of 100000 may require
  more memory to execute the operator, in which case users can override the ``parquet_row_group_size``
  parameter in the operator. All operators that are derived from ``BaseSQLToGCSOperator`` are affected
  when ``export_format`` is ``parquet``: ``MySQLToGCSOperator``, ``PrestoToGCSOperator``,
  ``OracleToGCSOperator``, ``TrinoToGCSOperator``, ``MSSQLToGCSOperator`` and ``PostgresToGCSOperator``. Due to the above we treat this change as bug fix.

.. note::
  Due to future discontinue of `files.upload <https://api.slack.com/changelog/2024-04-a-better-way-to-upload-files-is-here-to-stay>`__
  Slack API method the default value of ``SlackAPIFileOperator.method_version`` and ``SqlToSlackApiFileOperator.slack_method_version``
  changed from ``v1`` to ``v2``

  If you previously use ``v1`` you should check that your application has appropriate scopes:

  * **files:write** - for write files.
  * **files:read** - for read files (not required if you use Slack SDK >= 3.23.0).
  * **channels:read** - get list of public channels, for convert Channel Name to Channel ID.
  * **groups:read** - get list of private channels, for convert Channel Name to Channel ID
  * **mpim:read** - additional permission for API method **conversations.list**
  * **im:read** - additional permission for API method **conversations.list**

  If you use ``SlackHook.send_file`` please consider switch to ``SlackHook.send_file_v2``
  or ``SlackHook.send_file_v1_to_v2`` methods.

.. note::
  Note: this version contains a fix to ``get_blobs_list_async`` method in ``WasbHook`` where it returned
  a list of blob names, but advertised (via type hints) that it returns a list of ``BlobProperties`` objects.
  This was a bug in the implementation and it was fixed in this release. However, if you were relying on the
  previous behaviour, you might need to retrieve ``name`` property from the array elements returned by
  this method.

The same applies to the other operators whose functionality has already been removed, but that could be done in a separate PR.

@eladkal

eladkal commented May 24, 2024

If the Google service is not working anymore, we can just remove the operators and it will not be considered a breaking change. We just need to explain this with a changelog entry.

@e-galan e-galan force-pushed the deprecate-automl-table-operators branch 2 times, most recently from e712859 to 3593c2c Compare May 24, 2024 09:59
@e-galan e-galan force-pushed the deprecate-automl-table-operators branch from 3593c2c to 982d092 Compare May 24, 2024 10:03
@e-galan

e-galan commented May 24, 2024

@Taragolis I updated the error messages in accordance with the template that you proposed. Please check.

@eladkal Do I need to add some info about the operators to CHANGELOG.rst or will it be added during the release?

Update: the CI failed one check, but it does not seem related to my PR.

@Taragolis Taragolis dismissed their stale review May 24, 2024 11:08

not relevant anymore

@Taragolis

You have to add it manually. During the release process only the auto-generated entries are added automatically, simply because the release manager can't keep in their head all the significant/breaking changes between provider releases.

@e-galan

e-galan commented May 24, 2024

Updated CHANGELOG.rst

@e-galan e-galan force-pushed the deprecate-automl-table-operators branch 2 times, most recently from 8543ce6 to 8fb3ff5 Compare May 24, 2024 13:15
@e-galan e-galan force-pushed the deprecate-automl-table-operators branch from 8fb3ff5 to 9a13dc1 Compare May 24, 2024 13:58
@e-galan e-galan force-pushed the deprecate-automl-table-operators branch from 9a13dc1 to a4eb18e Compare May 24, 2024 14:00
@eladkal eladkal merged commit 4fe55e5 into apache:main May 24, 2024
69 checks passed
RNHTTR pushed a commit to RNHTTR/airflow that referenced this pull request Jun 1, 2024
* Deprecate AutoML Table operators

* Update providers.google.CHANGELOG.rst with info about AutoML shutdown