[SPARK-10473][YARN] Log in again in the driver to avoid losing events. #8942
Why would #8867 not be sufficient? It looks like that should be enough.
@harishreedharan The event log will still be stopped by the …
Hmm, could this be due to the cached token being missed? It looks like the token was replaced correctly, but the file apparently could not be written with the new token. @tgravescs might know more about this. I am not sure why this would cause an issue; it seems the new token cannot be used to write to an existing file?
Test build #43114 has finished for PR 8942 at commit
Yeah, I'm confused about why #8867 didn't work. The only time you need a new token is when the connection goes down and needs to be re-established; an existing connection stays up based on the old token. From the exception, it looks like the connection to the NameNode was being dropped and a new one was needed. Based on the exception, I'm assuming the new token isn't being added properly or propagated to where it needs to be (for instance, if someone did a doAs and addCredentials isn't updating that UGI). Can you tell from the log (HDFS_DELEGATION_TOKEN token 2339 for spark) whether 2339 was the original token or the new one? Can you tell whether a new token was properly added and is valid? What is your token timeout set to? Hopefully it's not so low that you are hitting a race with the code that waits a minute to get the new token. I'm actually fine with doing it either way (token or keytab), but if we do it from the keytab I would rather see it as a conditional, where the code doesn't add tokens to the current user's UGI if a keytab was supplied. That way it should be in "KERBEROS" mode and just log in from the keytab for you. It would also make it more obvious in the future what is going on, and be less prone to breaking if the order of operations changes. Are you running in yarn client mode?
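The point that a new token only matters when a connection is re-established can be sketched with a toy model. Everything here (`ToyToken`, `ToyConnection`, the millisecond timestamps) is hypothetical and is not Hadoop's IPC client; it only assumes, as the comment describes, that a token is checked when a connection is set up and not on an already-open connection.

```java
// Toy model (hypothetical, not Hadoop's IPC client): a delegation token is
// validated only when a connection is established, never on an open one.
final class ToyToken {
    final int id;
    final long expiresAtMillis;

    ToyToken(int id, long expiresAtMillis) {
        this.id = id;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isValidAt(long nowMillis) {
        return nowMillis < expiresAtMillis;
    }
}

final class ToyConnection {
    private final ToyToken token; // token presented when the connection was set up
    private boolean open = true;

    private ToyConnection(ToyToken token) {
        this.token = token;
    }

    // Authentication happens here, and only here.
    static ToyConnection connect(ToyToken token, long nowMillis) {
        if (!token.isValidAt(nowMillis)) {
            throw new IllegalStateException("token " + token.id + " is expired");
        }
        return new ToyConnection(token);
    }

    // Writes on an open connection do not re-check the token's expiry.
    String write(String data) {
        if (!open) {
            throw new IllegalStateException("connection is closed");
        }
        return "wrote " + data + " with token " + token.id;
    }

    // Simulates the connection to the NameNode going down.
    void drop() {
        open = false;
    }
}
```

In this model an open connection never looks at its token again, but once it is dropped, reconnecting with the old token fails after expiry and only a valid new token succeeds. That is why it matters whether the token named in the log is the original one or the replacement.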
Yeah @tgravescs, I'm running in yarn client mode. I'm sure that …
retest this please |
Test build #43383 has finished for PR 8942 at commit
Hmm, I think the real issue is that the event logging does not doAs. I think in … @SaintBacchus, let me open a PR that does the doAs and combines it with your previous one (#8867); can you test it and see if it works? Or you can do it yourself: just add a …
I'm not very clear about how to use …
Actually, that is not right. I posted an explanation on your other PR.
Here it is: OK, I think I know the issue. The reason is probably that the credentials are cached in the FileSystem instance used for the write. Since we replace the credentials but not the FileSystem instance itself, this might not work, which is why #8942 works. We can either take that approach or replace the FileSystem instance, which would require closing and reopening the file.
We had considered reopening the file. That approach would need more code to handle synchronization between the event log producer and consumer.
Agreed: synchronization is painful and we could end up missing events.
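To make the synchronization concern concrete, here is a minimal sketch of what reopening the event log safely would involve, assuming one lock shared by the event writers and the reopen path. `ReopenableEventLog` and its in-memory segments are hypothetical stand-ins for the real log file; this is not Spark's `EventLoggingListener`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: an event log whose underlying "file" can be closed and
// reopened (e.g. with a fresh FileSystem instance) without dropping events.
// Each segment is an in-memory list standing in for one opened file.
final class ReopenableEventLog {
    private final List<List<String>> closedSegments = new ArrayList<>();
    private List<String> current = new ArrayList<>();

    // Producers call this; the shared lock guarantees no event is written
    // while the segment is being swapped out.
    synchronized void log(String event) {
        current.add(event);
    }

    // Called when credentials change: close the current segment, open a new one.
    synchronized void reopen() {
        closedSegments.add(Collections.unmodifiableList(current));
        current = new ArrayList<>();
    }

    synchronized int totalEvents() {
        int total = current.size();
        for (List<String> segment : closedSegments) {
            total += segment.size();
        }
        return total;
    }

    synchronized int segmentCount() {
        return closedSegments.size() + 1;
    }
}
```

Because `log` and `reopen` take the same monitor, every event lands in exactly one segment. The price is locking on every event write plus the code to manage the swap, which is the extra complexity the comments want to avoid.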
@harishreedharan Can you clarify what is going on? You should not have to replace the FileSystem. The EventLoggingListener is created in SparkContext and just calls FileSystem.get(), which should return the cached FileSystem or create a new one if none exists. updateCredentialsIfRequired just adds the new credentials to the current user.
So this is my theory (I don't really have anything to back it up). My assumption is based on the fact that if we don't set …
Hmm, looking at the FileSystem cache code, it creates a new key that stores: … So if the current user changes between the time the first FileSystem is created and the time we run, we would hit this problem. On the client side I don't think we make any specific runAsUser calls, but perhaps it's an issue with the keytab login and the user who is already logged in. So it may make sense to do something like this PR, but I would rather see an if/else where the code either logs in from the keytab or grabs the tokens. That way we don't end up with two different login methods, and we don't accidentally break things if the order of operations changes.
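The cache scenario can be illustrated with a toy model, assuming (as a simplification) that cached clients are keyed by the user object and that each client keeps a reference to the user that was current when it was created. `ToyUser`, `ToyFileSystem`, and `ToyFileSystemCache` are hypothetical and do not reproduce the fields of Hadoop's actual `FileSystem` cache key.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical simplification, not Hadoop's FileSystem cache):
// clients are cached per user object and per URI, and each client keeps a
// reference to the user that was current when it was created.
final class ToyUser {
    final String name;
    final Set<String> tokens = new HashSet<>();

    ToyUser(String name) {
        this.name = name;
    }
}

final class ToyFileSystem {
    private final ToyUser owner; // bound once, at creation time

    ToyFileSystem(ToyUser owner) {
        this.owner = owner;
    }

    // The client can only present tokens held by the user it is bound to.
    boolean canAuthenticateWith(String token) {
        return owner.tokens.contains(token);
    }
}

final class ToyFileSystemCache {
    // Users are compared by identity: two login objects for the same
    // principal name still get separate cache entries.
    private final Map<ToyUser, Map<String, ToyFileSystem>> cache = new IdentityHashMap<>();

    ToyFileSystem get(String uri, ToyUser currentUser) {
        return cache
            .computeIfAbsent(currentUser, u -> new HashMap<>())
            .computeIfAbsent(uri, k -> new ToyFileSystem(currentUser));
    }
}
```

If new tokens are added to the same user object the cached instance is bound to, that instance sees them, which is the case where no problem should arise. If they go to a different user object that later became current, the cached instance keeps using the stale user.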
If the current user's UGI is what the FileSystem cache uses, this should not really be an issue, no? We do actually update the current user's credentials. I am OK with doing something like this, but I'd rather know why before adding it. Why would new tokens not work? That seems like an HDFS issue, no? Let me test this out with @SaintBacchus's other PR. In client mode, this would really mean that tokens are used by the executors and the keytab is used by the AM and the driver. I have half a mind to suggest not supporting long-running apps in client mode on secure HDFS.
As discussed with @tgravescs and @harishreedharan in #8867, if the SaslRpcClient's authentication method is `TOKEN`, it will throw a `token expired` exception. But if the authentication method is `KERBEROS`, it will renew the token automatically.
This change switches the authentication method from `TOKEN` to `KERBEROS`.
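The if/else design suggested in the discussion (log in from the keytab when one is supplied, otherwise rely only on delegation tokens, never both) can be sketched as follows. `AuthMethod` and `DriverLogin` are hypothetical names, not Spark or Hadoop APIs, and the expiry check is a simplification of the `TOKEN` versus `KERBEROS` behavior described above; in real code the keytab branch would go through Hadoop's `UserGroupInformation` (for example `loginUserFromKeytab`).

```java
import java.util.Optional;

// Hypothetical sketch of "keytab or tokens, never both"; these are not Spark
// or Hadoop APIs.
enum AuthMethod { KERBEROS, TOKEN }

final class DriverLogin {
    final AuthMethod method;
    private final String keytabPath;          // only set in KERBEROS mode
    private final long tokenExpiresAtMillis;  // only meaningful in TOKEN mode

    private DriverLogin(AuthMethod method, String keytabPath, long tokenExpiresAtMillis) {
        this.method = method;
        this.keytabPath = keytabPath;
        this.tokenExpiresAtMillis = tokenExpiresAtMillis;
    }

    // Pick exactly one login path: a supplied keytab wins, and in that case no
    // delegation tokens are added to the current user.
    static DriverLogin choose(Optional<String> keytab, long tokenExpiresAtMillis) {
        if (keytab.isPresent()) {
            return new DriverLogin(AuthMethod.KERBEROS, keytab.get(), Long.MAX_VALUE);
        }
        return new DriverLogin(AuthMethod.TOKEN, null, tokenExpiresAtMillis);
    }

    // Models authenticating a new connection at time nowMillis.
    String authenticate(long nowMillis) {
        switch (method) {
            case KERBEROS:
                // With a keytab the client can log in again for fresh credentials.
                return "kerberos login from " + keytabPath;
            case TOKEN:
                // A token cannot be refreshed here; once expired, new connections fail.
                if (nowMillis >= tokenExpiresAtMillis) {
                    throw new IllegalStateException("token expired");
                }
                return "token auth";
            default:
                throw new AssertionError(method);
        }
    }
}
```

Keeping the two paths mutually exclusive also means the outcome does not depend on the order in which credentials are set up, which addresses the concern about breaking things if that order changes.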