Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Enable AWS Elasticsearch IAM connection #965

Merged

Conversation

jacksbox
Copy link
Contributor

@jacksbox jacksbox commented Apr 14, 2021

In order to use haystack in an aws sagemaker environment with an aws elasticsearch domain I needed the possibility to use the aws request signing. (Issue #966)
The ES client already has functionality for this included (see https://elasticsearch-py.readthedocs.io/en/v7.12.0/index.html?highlight=aws#running-on-aws-with-iam) but there is no option yet to funel this from the ElasticsearchDocumentStore to the Elasticsearch class. (at least I did not find an option)

This implementation enables the passing of an aws4auth object (see https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-request-signing.html#es-request-signing-python) to the ElasticsearchDocumentStore which then correctly instanciates the Elasticsearch with the needed request signing options.

This is my first PR in python related code, so there is probably plenty of room for improvement :)

Here is an example of how this new auth method can be used:

import boto3

from requests_aws4auth import AWS4Auth
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from elasticsearch import RequestsHttpConnection

host = '<vpc_host>'
port = 443
region = 'eu-central-1'
service = 'es'
 
credentials = boto3.Session().get_credentials()
aws4auth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
 
document_store = ElasticsearchDocumentStore(host=host,
                                            port=port,
                                            aws4auth=aws4auth,
                                            # can't be used with default es client version used in e.g. aws sagemaker
                                            embedding_field=None,
                                            index="document")

Copy link
Contributor

@Timoeller Timoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jacksbox congrats on your first python PR. Looks rock solid, also the elaborate issue you created next to it.

It makes sense to not add the requests-aws4auth lib to our dependencies like you did. I added two comments so you understand my thought process for the review. Its meant to give you feedback, it is not actually a request for change : )

Great work!

@@ -60,6 +61,7 @@ def __init__(
:param password: password (standard authentication via http_auth)
:param api_key_id: ID of the API key (altenative authentication mode to the above http_auth)
:param api_key: Secret value of the API key (altenative authentication mode to the above http_auth)
:param aws4auth: Authentication for usage with aws elasticsearch (can be generated with the requests-aws4auth package)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that we do not add requests-aws4auth as dependency in our requirements.txt for now.

In your next PR you can directly write that you can install the lib with pip install requests-aws4auth to make it even easier for the users : )

@@ -26,6 +26,7 @@ def __init__(
password: str = "",
api_key_id: Optional[str] = None,
api_key: Optional[str] = None,
aws4auth = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we chose to not import the lib lets stay without type annotation for now.

@Timoeller Timoeller merged commit 84f90e8 into deepset-ai:master Apr 14, 2021
@jacksbox
Copy link
Contributor Author

jacksbox commented Apr 15, 2021

@Timoeller cool, thanks for the quick review and merge :) happy about my first contribution.
I would have two follow up questions if you don't mind:

  1. About the type annotation - how would one annotate this in python (without the class code present in the project)?
  2. I would add some info about the usage of the new auth method to the docs - I just need a small hint where there right place to put it would be ;)

@Timoeller
Copy link
Contributor

Happy to have you on board : )

About the type annotation - how would one annotate this in python (without the class code present in the project)?

Its not really straightforward and mypy type annotations can be rather tricky. We used quatation marks in this PR.

the new auth method to the docs - I just need a small hint where there right place to put it would be ;)

@PiffPaffM could you take over here, please?

@PiffPaffM
Copy link
Contributor

Hi @jacksbox, congrats on your first PR :)
In my opinion, it makes sense to add this information to the docs. I would suggest adding it here. There is a box called ElasticSearch. Here you could add a hint on how to use your implementation. Does this make sense to you?

You will find the markdown file for this page here: https://github.com/deepset-ai/haystack/blob/master/docs/_src/usage/usage/document_store.md. The boxes are defined by HTML divs. If you have any questions regarding this, please let me know. I am also open to other suggestions.

I hope that helps. Happy to review your PR for this and discuss it in more detail :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants