Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open search integrations #34693

Closed
wants to merge 6 commits into from
Closed

Conversation

cjames23
Copy link
Contributor

This draft is to reduce the overall amount of code to have to review the first time and to get some eyes on an issue with the providers.yaml pre-commit check error. The intention here is a base for OpenSearch functionality. The Operators allow the use of either a high level or low level version of Open Search clients. The logo that you see is likely going to be replaced as it needs to be resized. Additional Unit Tests will be coming in a second revision.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@cjames23 cjames23 marked this pull request as draft September 30, 2023 03:28
@cjames23 cjames23 changed the title [DRAFT] Open search integrations Open search integrations Sep 30, 2023
Comment on lines +29 to +31
class OpenSearchHook(AwsBaseHook):
"""
This Hook provides a thin wrapper around the OpenSearch client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AwsBaseHook/AwsGenericHook uses only for boto3 / botocore clients. Internals of this hook intend to use with this libraries and may work incorrectly if someone call public methods of AwsBaseHook/AwsGenericHook

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense then to create a base OpenSearch hook class in an OpenSearch providers package and use that as the base class for an AWS based Hook for OpenSearch? Most of the logic in this could be moved to a base hook for Open Search and then have the AWS Specific one just override the auth method used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it how it works in general. There is some exceptions but most (if not all) of them created years ago. there is also exists hooks in Amazon Provider which not depends on AwsBaseHook, example:

AwsBaseHook it wrapper around one of the boto3 (botocore) clients, e.g.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a good reason to have any operators that do things like create domains in a DAG? With the above that seems like where an open search hook that is a sub class of AwsBaseHook would come in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think it might be useful then I don't see any obstacles for add wrappers around this boto3 clients. For hook it is pretty simple, just create simple thin wrapper

class SqsHook(AwsBaseHook):
"""
Interact with Amazon Simple Queue Service.
Provide thin wrapper around :external+boto3:py:class:`boto3.client("sqs") <SQS.Client>`.
Additional arguments (such as ``aws_conn_id``) may be specified and
are passed down to the underlying AwsBaseHook.
.. seealso::
- :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
"""
def __init__(self, *args, **kwargs) -> None:
kwargs["client_type"] = "sqs"
super().__init__(*args, **kwargs)

And optionally add useful methods

conn_type = "opensearch"
hook_name = "AWS Open Search Hook"

def __init__(self, *args: Any, open_search_conn_id: str, log_query: bool, **kwargs: Any):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __init__(self, *args: Any, open_search_conn_id: str, log_query: bool, **kwargs: Any):
def __init__(self, *, open_search_conn_id: str, log_query: bool, **kwargs: Any):

I would recommend to accept in hook constructor keyword only arguments, positional arguments prevent to do some changes in the future without breaking changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should create support of OpenSearch (as product) in a separate provider. Otherwise this feature would be exclusively available only for AWS.

And after that we could add support of AWS OpenSearch (as managed service) into Amazon provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working now on creating a providers package for OpenSearch and moving the main Hook logic there with the hook for AWS Open Search taking that as a base class and only doing overrides of the AWS Open Search specific requirements such as the auth and the extras requiring a region_name.

Copy link
Contributor

@eladkal eladkal Sep 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have seperate PR for open search integration and depend PR for AWS related code on top of it.
It also requires to follow protocol of adding new provider
https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers in terms of mailing list thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay let me revert the changes that added the new open search provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are reverted. @eladkal @Taragolis What would be your preference on the approach for the new provider, I can get everything onto a new branch and cut a different PR and then mail the dev list for lazy consensus given open search is an apache licensed product, or do we need a full discussion and vote?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a discussion is needed it seems as it fits the criteria for lazy consensus vote as popular vendor neutral open source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Draft PR #34705

Comment on lines +31 to +34
"mypy-boto3-appflow>=1.24.0",
"mypy-boto3-rds>=1.24.0",
"mypy-boto3-redshift-data>=1.24.0",
"mypy-boto3-s3>=1.24.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy packages moved out of the amazon-provider

Suggested change
"mypy-boto3-appflow>=1.24.0",
"mypy-boto3-rds>=1.24.0",
"mypy-boto3-redshift-data>=1.24.0",
"mypy-boto3-s3>=1.24.0",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is logo of OpenSearch as a product, logo of Amazon OpenSearch Service (ElasticSearch and OpenSearch)

image
you could take it from https://aws.amazon.com/architecture/icons/

cjames23 added a commit to cjames23/airflow that referenced this pull request Oct 2, 2023
@@ -0,0 +1,189 @@
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite related to this comment thread but are these operators specific to the AWS managed service? Should not these operators be in the opensearch provider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is on hold right now until the other PR is ready. Once that is ready I will be updating this PR with pieces in place that handle the separate provider and really what my thought process here is going to be is that the particular Operators for the managed service just need to be sub classes of the Open Search provider ones and replace the hook with the one required for the managed service piece. I am open to other ideas here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like very good to me 👍

from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook


class OpenSearchHook(AwsBaseHook):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this hook needs to be modified after #34705 is merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on that now, everything in the amazon provider will have a prefix of Aws. And this hook will be a child of the open search hook and potentially the AwsBaseHook as well to avoid rewriting the logic for getting a boto3 client.

@cjames23
Copy link
Contributor Author

Closing this PR in favor of actually just adding the auth required in the OpenSearch provider to simplify dependency management and code management. I will open a new PR in the coming days for adding the Auth piece to the OpenSearch provider

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants