Support AWS Elasticsearch Auth #60
AWS Elasticsearch implements a custom signature method to authenticate users.
It would be nice to be able to use this connector to move data into AWS Elasticsearch clusters that require authentication.
This would be really useful. I'm wondering how to implement this without adding dependencies on a bunch of AWS libraries that most people will not need or want on their classpath.
Perhaps with a new property that specifies the class name of a request interceptor? This property could then be populated with the class name of an AWS request interceptor (like this one: https://github.com/inreachventures/aws-signing-request-interceptor), which adds the required AWS authentication to the ES requests. If you are using AWS's ES, you can then drop the required jars into your classpath and specify the request interceptor config in your ES connector config. It's a little cumbersome, so I am open to other ideas.
I have forked this repo in order to add AWS request signing, but I would like to contribute a solution upstream so I don't need to maintain a separate fork just for AWS's auth stuff.
@thomasdziedzic @zzbennett Definitely seems like a good idea -- I think this will be a matter of exposing a few more configs that are specific to AWS and then wiring up the auth pieces. There's an example of how to do the auth steps in this Jest issue and #77 is working on adding basic authentication support. If anyone is interested in taking a stab, I'd be happy to guide development and review a PR!
Okay, so I'm back to working on the ES connector. I've been mulling this over and although the modifications involved for supporting the AWS authentication are simple, implementing them in a "pluggable" way is somewhat trickier.
Inspired by the pluggable partitioners and formatters in the S3/HDFS connector, this is a possible solution:
Abstract the ES client logic. Currently the connector depends directly on the JestClient and the JestClientFactory. Rather than depending directly on the JestClient for executing ES requests, we could add an ESClient interface and a default implementation that will use the current JestClient logic. A config would be added containing the classname of the ESClient implementation, which would get instantiated using reflection. Most people would use the default for this config, but for people needing the AWS auth (or any kind of special logic around querying ES), they could plop an implementation of the ESClient on their classpath that provides the AWS authentication and change the ESClient classname config. The downsides are it requires a new config that most people won't need to touch, and handling pluggability this way can get a bit unwieldy. It does give users complete control over how the connector queries ES, which could be useful, like if they are doing something fancy like routing to different ES clusters.
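As a rough illustration of the proposal above, here is a minimal sketch of a pluggable client loaded by class name. Everything in it is an assumption made for illustration: the `ESClient` interface, its methods, the `es.client.class` config key, and the `DefaultESClient` stand-in are hypothetical and do not exist in the connector. A real default implementation would wrap the current JestClient logic instead of returning a string.

```java
import java.util.Map;

/** Hypothetical abstraction over the ES client; not part of the connector today. */
interface ESClient {
    void configure(Map<String, String> props);

    String execute(String method, String path, String body) throws Exception;
}

/** Stand-in for a default implementation that would delegate to the existing Jest logic. */
class DefaultESClient implements ESClient {
    private String url;

    @Override
    public void configure(Map<String, String> props) {
        // "connection.url" is used here purely as an illustrative key.
        url = props.getOrDefault("connection.url", "http://localhost:9200");
    }

    @Override
    public String execute(String method, String path, String body) {
        // A real implementation would send the request; this one only describes it.
        return method + " " + url + path;
    }
}

final class ESClientLoader {
    /** Hypothetical config key holding the implementation class name. */
    static final String CLIENT_CLASS_CONFIG = "es.client.class";

    static ESClient load(Map<String, String> props) throws ReflectiveOperationException {
        // Fall back to the default client when the config is not set.
        String className =
            props.getOrDefault(CLIENT_CLASS_CONFIG, DefaultESClient.class.getName());
        // asSubclass fails fast if the configured class does not implement ESClient.
        Class<? extends ESClient> cls = Class.forName(className).asSubclass(ESClient.class);
        ESClient client = cls.getDeclaredConstructor().newInstance();
        client.configure(props);
        return client;
    }
}
```

With this shape, someone needing AWS auth would put their own `ESClient` implementation (plus the AWS jars) on the classpath and set the class name config; everyone else would never touch it.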
Honestly though, for this particular issue it might make more sense to stand up a reverse proxy that will handle the authentication. AWS's ES can do IP based access control, so you could just set up a vanilla nginx reverse proxy and whitelist its IP. Or you could set the proxy up with this.
I guess it boils down to whether it is worth it to abstract the ESClient or not. If the ESClient abstraction makes sense for purposes besides AWS authentication, then handling authentication that way could be easier, otherwise, the reverse proxy is probably the way to go.
Anyone working on a PR for this? Planning to do so myself if not...
We have a company policy that requires signing as per https://docs.aws.amazon.com/general/latest/gr/signature-v4-examples.html
Perhaps a fork specific to AWS Elasticsearch, to avoid adding AWS dependencies generally to this connector? Seems a bit heavyweight either way.
The important bit, as @joncourt already pointed out, is that you must issue Signature Version 4 signed requests, which basically means wrapping all your interaction with the search engine. This operation is of no benefit for any other Elasticsearch installation.
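To give a sense of what the signing involves, here is a minimal sketch of one step of Signature Version 4: deriving the signing key with the chained HMAC-SHA256 calls described in the AWS documentation linked above. It uses only the JDK. The class and method names are made up for this example, and it does not build the canonical request, the string to sign, or the `Authorization` header, all of which a real interceptor would also need.

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative helper (hypothetical name): derives a SigV4 signing key. */
final class SigV4SigningKey {

    static byte[] hmacSha256(byte[] key, String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * @param secretKey the AWS secret access key
     * @param dateStamp the request date as yyyyMMdd, e.g. "20120215"
     * @param region    the AWS region, e.g. "us-east-1"
     * @param service   the service name; "es" for AWS Elasticsearch
     */
    static byte[] derive(String secretKey, String dateStamp, String region, String service)
            throws Exception {
        // Each step keys the next HMAC with the previous result.
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, dateStamp);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }
}
```

The derived key is scoped to a single date, region, and service, which is part of why this logic has no use against a non-AWS Elasticsearch cluster.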
Access control is done with IAM policies, which basically allow or deny HTTP verbs against resources. These policies let you authorise based on identity as well as on source, etc. This is where the signature and the policies together do the work of authorisation, at least to my understanding.
From their blog:
I would recommend doing it in a way where people not using AWS do not have to carry the heavy weight of the AWS deps, for example using a fork.
We should also not forget that Elasticsearch supports the X-Pack security features, which is another way of adding security on top of it. Beyond that, a smaller number of people use https://search-guard.com/ as their security solution for Elasticsearch.
To me, all of this calls for a solution that is portable and lets people plug in their own module for security and auth.
I hope it makes sense.