java.lang.SecurityException: "putProviderProperty.SaslPlainServer" and "insertProvider.SaslPlainServer" for Plugin Repository HDFS #26868
This relates to #26513 in that the work required is probably very similar.
My first hunch from looking through the code previously is that it relates to: It looks like the permissions are added to the policy file (and being properly picked up) but not being passed in the context. I don't have a test case for this right now but can look into it later this week.
Hi again @risdenk. Thanks for opening this issue. Since HDFS takes a lot of liberties with regard to what permissions it has during operation (forking processes, reflective calls, access to user principals), we try to limit the permissions it has during regular operations to a subset of all the permissions in the policy file (which includes all permissions it needs during the client's first set up). While we do have them included in the policy file by default, the
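For context on how restricting permissions to a subset works under the security manager: plugins typically wrap the sensitive call in `AccessController.doPrivileged`, so only the plugin class's own protection domain (which the policy file grants the permissions to) is checked, rather than every frame on the stack. A minimal, self-contained sketch — `SaslPlainServerProvider` here is a hypothetical stand-in, not Hadoop's actual class:

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.security.Provider;
import java.security.Security;

public class SaslProviderRegistrar {

    // Hypothetical stand-in for Hadoop's SASL/PLAIN provider; registering a
    // java.security.Provider is what triggers the "insertProvider" and
    // "putProviderProperty" SecurityPermission checks under a security manager.
    static class SaslPlainServerProvider extends Provider {
        SaslPlainServerProvider() {
            super("SaslPlainServer", 1.0, "stand-in SASL/PLAIN server provider");
        }
    }

    public static void register() {
        // Run the registration as a privileged action so the permission check
        // stops at this class's protection domain instead of walking through
        // the unprivileged HDFS-client frames above it.
        AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
            if (Security.getProvider("SaslPlainServer") == null) {
                Security.addProvider(new SaslPlainServerProvider());
            }
            return null;
        });
    }
}
```

If the registration happens lazily on a code path that was never exercised during repository verification, the policy grant alone is not enough — the privileged wrapper has to be around the call site that actually runs.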
Yea, I'll try to get to reproducing this cleanly, but it likely won't be until next week. I will most likely focus on 6.0.0 to reproduce, since that environment has no one on it (and is also closest to master, which makes PRs easier). My hunch, in case this gives any ideas: from a quick glance at the logs, it looks like the error is with multiple repositories. The first repository is initialized just fine. The second (and third) are the ones that have the initialization issue. I haven't looked closer as to why this could be. From the last time I looked at the tests, they were all single repository though. Maybe the second repository initialization takes a different path.
So I have a new hunch from reviewing the stack trace in more detail. I think this has to do with HDFS NameNode high availability. The code being executed is to create a failover proxy between two NameNodes. I don't think this is tested in the test framework since only a single NameNode is used. I am going to try switching the active NameNode in the cluster and see if that makes a difference as well. The following part of the stacktrace is the same in both cases. It didn't appear in #26513.
I haven't gotten this to reproduce cleanly yet outside of the environment though.
@risdenk That would make sense - After perusing the failover code, I found that when creating an HA-enabled client, the HDFS client code does not perform any operations or create any connections. Instead, all the initialization that would normally occur in the non-HA case is deferred and done lazily when the user code initiates the first operation. Quick question: Are you using this HA repository as a read-only repository like in #26513? I'm wondering if it being read only, and the resulting lack of a verification step, is what leaves the HDFS client uninitialized.
@jbaiera - Yes, all of our HDFS repositories for 5.x and 6.x are readonly right now for testing (to prevent someone from making a snapshot). We use 2.x to write to the HDFS repositories right now. We will switch from readonly to read/write for 5.x when we finish deployments.
I was able to reproduce this with a nice portable Docker implementation with NameNode HA. It reproduces with 5.6.2 and 6.0.0-rc1. It doesn't look like this has anything to do with readonly; the failure occurs with a read/write repository. The NameNode HA state is important: the error occurs when the first NameNode contacted is in Standby mode and the client must fail over to the active NameNode. This is done lazily, after the repository is set up. I haven't tested this, but I would be curious to see if the additional permissions are also needed later, if the NameNode HA state were to switch after the initial client setup.

Here is the repo with everything needed to reproduce: https://github.com/risdenk/elasticsearch_hdfs_kerberos_testing

The logs from the Travis test run, if you don't want to set this up locally, are here: https://travis-ci.org/risdenk/elasticsearch_hdfs_kerberos_testing/branches

I haven't looked into how much effort it would be to make the Vagrant test framework for the HDFS secure tests support NameNode HA. It was a pain to get Docker and Kerberos to play nicely with hostnames.
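For anyone trying to reproduce this, a client-side NameNode HA configuration in `hdfs-site.xml` looks roughly like the following sketch. The nameservice (`ha-hdfs`) and hostnames are made up; the failover proxy provider class is Hadoop's standard one:

```xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ha-hdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ha-hdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ha-hdfs.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ha-hdfs.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ha-hdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

With a configuration like this, the repository URI points at the logical nameservice (e.g. `hdfs://ha-hdfs/`) rather than a specific host, and the client resolves the active NameNode lazily — which matches the failure timing described above.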
Looks like adding the following enables NameNode HA in the test fixture:

```java
MiniDFSNNTopology miniDFSNNTopology = MiniDFSNNTopology.simpleHATopology();
builder.nnTopology(miniDFSNNTopology);
```
Thanks for the footwork toward getting a reproduction. I think you might have hit the nail on the head - if the repository already exists and the leading namenode steps down, allowing a different one to take its place as leader, the ensuing connection failover would require a whole new client to be initialized underneath the failover manager. I'm working on getting this all to play nice in the test framework. I don't like the idea of throwing away our current permission rendering in all cases. If a user isn't using secured HDFS, there's no reason to permit any kerberos operations on the client at that time. In the event that a user IS using secured HDFS, we want to make sure that only the logged in user can initiate kerberos connections, and so on. Most likely what we'll end up doing is sniff the client configuration for HA settings, and if we find any, open up the rendered permissions only in the case of a client failover, to allow for the connection to be re-established.
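The "sniff the client configuration for HA settings" idea can be sketched in plain Java. This uses a `Map` as a stand-in for Hadoop's `Configuration`, and the class and method names are hypothetical; the property prefix, however, is the real client-side key that configures a failover proxy provider per nameservice:

```java
import java.util.Map;

public class HaSettingsSniffer {

    // Real HDFS client property prefix: a failover proxy provider configured
    // for a nameservice implies the client may fail over between NameNodes.
    static final String FAILOVER_PROXY_PREFIX = "dfs.client.failover.proxy.provider.";

    // Hypothetical helper: decide whether the repository's HDFS settings are
    // HA-enabled, so the wider permissions are only rendered when a lazy
    // failover (and thus a late client re-initialization) is possible.
    public static boolean isHaEnabled(Map<String, String> hdfsSettings) {
        return hdfsSettings.keySet().stream()
                .anyMatch(key -> key.startsWith(FAILOVER_PROXY_PREFIX));
    }
}
```

The design choice here is to keep the default permission set tight and only widen it when the configuration proves a failover path exists, rather than granting the SASL provider permissions unconditionally.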
@jbaiera - Sounds good. Thanks for digging in further. I think I have the minimum configs necessary to make HDFS HA work (in the docker-compose.yml) and for an Elasticsearch HDFS repository to use HDFS HA in
@risdenk Wanted to check in here again to say that I have the HDFS fixture spinning up a topology of HA Namenodes in the integration test environment, and was able to get a very very crude reproduction of the error. The biggest challenge with this bug is finding a way to get the Namenode to failover automatically during the integration test run - as right now I'm pulling all the levers by hand. Once all of that is in place, I can start getting a PR together that fixes the issue. Thanks again for all the work you've put into this. HDFS permission issues always prove to be tricky. |
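For reference, "pulling all the levers by hand" against a running HA cluster is typically done with the `hdfs haadmin` CLI. These are standard Hadoop commands; the NameNode IDs (`nn1`, `nn2`) are made up and must match the IDs in `dfs.ha.namenodes.<nameservice>`:

```shell
# Check which NameNode is currently active
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Force a failover from nn1 to nn2 to exercise the client's failover path
hdfs haadmin -failover nn1 nn2
```

Automating this inside an integration test run is the hard part, since the test framework has to trigger the transition at the right moment relative to the repository operations.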
@jbaiera thanks for the update. If there is a branch/PR that you want me to look at let me know. I've been around the Hadoop stuff quite a bit. |
@jbaiera - Any update here? |
Hi @risdenk, automating the HA integration tests has been fairly difficult in terms of keeping them compliant with Elasticsearch's testing conventions. The good news is that I'm about 95% complete with the testing changes; I just need to pass a few more build-level conventions tests around naming and API usage. I'm hoping to have a preliminary PR up for review this week - but the changes to the testing infra are pretty verbose. We'll be testing HA HDFS (secured and unsecured) alongside the regular HDFS tests going forward. I will link back to this issue when the PR is available.
Thanks for the update @jbaiera |
This is fixed with #27196 |
Thanks @jbaiera! |
**Elasticsearch version** (`bin/elasticsearch --version`):

```
Version: 5.6.2, Build: 57e20f3/2017-09-23T13:16:45.703Z, JVM: 1.8.0_121
```

and

```
Version: 6.0.0-rc1, Build: b9c0df2/2017-09-25T19:11:45.815Z, JVM: 1.8.0_121
```

**Plugins installed**:

**JVM version** (`java -version`):

```
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-tdc1-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
```

**OS version** (`uname -a` if on a Unix-like system):

```
Linux HOSTNAME 3.0.101-0.113.TDC.1.R.0-default #1 SMP Fri Dec 9 04:51:20 PST 2016 (ca32437) x86_64 x86_64 x86_64 GNU/Linux
```

**Description of the problem including expected versus actual behavior**:

The Elasticsearch HDFS repository plugin fails to create repositories, with the following two errors from the JVM security manager:

```
java.security.AccessControlException: access denied ("java.security.SecurityPermission" "putProviderProperty.SaslPlainServer")
```

and

```
java.security.AccessControlException: access denied ("java.security.SecurityPermission" "insertProvider.SaslPlainServer")
```

I worked around each in turn by adding to a `java.policy` file and passing it to Elasticsearch on startup. The second permission error was only found after adding an exception for the first one.

**Steps to reproduce**:
**Provide logs (if relevant)**:

The stacktraces below are from 5.6.2. I can grab them from 6.0.0-rc1 if necessary.

Stacktrace from missing security policy permission `putProviderProperty.SaslPlainServer`

Stacktrace from missing security policy permission `insertProvider.SaslPlainServer`
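The `java.policy` workaround described above amounts to granting the two permissions named in the stack traces. A sketch of such a policy file (scope the grant as narrowly as your setup allows; a blanket `grant` block like this applies to all code):

```
grant {
    permission java.security.SecurityPermission "putProviderProperty.SaslPlainServer";
    permission java.security.SecurityPermission "insertProvider.SaslPlainServer";
};
```

A supplemental policy file like this can be passed to the JVM with `-Djava.security.policy=/path/to/java.policy` (the `==` form replaces the default policy instead of appending to it).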