NIFI-997: Periodically Renew Kerberos Tickets #97
rickysaltzer wants to merge 3 commits into apache:master
- Renew ticket every 4 hours to avoid inactive Kerberos tickets.
|
I've started digging into this. Testing is the major challenge. |
|
The good news: it seemed to work when I tested it.

One comment on style: I really had to think about what this was doing, and whether the first getValue could return null. This is very evident here, where you created a method to read the usergroupinfo and then replicated the work of that method on the line below.

Other comments: I was trying to reason about your error handling if ugi.checkTGTAndReloginFromKeytab threw an IOException. If this were greenfield code, I'd say the behavior I'd expect is that getFileSystem throws an IOException, but that would change the method signature and all the processors that use it (a breaking API change). Swallowing the exception, then exponentially backing off on a retry, almost seems more logical, but I think it is worth discussing.

Is there a reason you divide the millisecond times by 1000 for comparisons with seconds instead of multiplying the seconds by 1000? You're losing precision on the timestamp.

Did you think about making the retry interval a property instead of hardcoding it? |
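For discussion, the "swallow the exception, then back off exponentially" idea could look roughly like the sketch below. This is not code from the patch: the class name `ReloginBackoff`, its fields, and the base and maximum delays are all hypothetical, and it only computes how long to wait before the next re-login attempt.

```java
/**
 * Hypothetical helper (not part of the patch): tracks consecutive re-login
 * failures and computes an exponentially growing, capped retry delay.
 * Assumes maxDelayMillis is at most Long.MAX_VALUE / 2 so doubling cannot overflow.
 */
class ReloginBackoff {
    private final long baseDelayMillis;
    private final long maxDelayMillis;
    private int consecutiveFailures = 0;

    ReloginBackoff(long baseDelayMillis, long maxDelayMillis) {
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Call when a re-login attempt threw and the exception was swallowed. */
    void recordFailure() {
        consecutiveFailures++;
    }

    /** Call when a re-login attempt succeeded; resets the backoff. */
    void recordSuccess() {
        consecutiveFailures = 0;
    }

    /** 0 with no failures; otherwise base * 2^(failures - 1), capped at the maximum. */
    long nextDelayMillis() {
        if (consecutiveFailures == 0) {
            return 0L;
        }
        long delay = baseDelayMillis;
        for (int i = 1; i < consecutiveFailures && delay < maxDelayMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMillis);
    }
}
```

With a 1 second base and a 60 second cap, successive failures would wait 1 s, 2 s, 4 s and so on, and getFileSystem could keep its current signature.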
|
I agree with you on the tuple within a tuple, it is pretty confusing. I'll work on getting these moved into its own class. You're right, error handling is a bit tricky in this area of the code in order to avoid breaking API changes. I did the division by 1000 since the threshold is in seconds, but I could change that. Precision isn't a big deal here since the renewal isn't millisecond (or even second) time sensitive. |
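To make the precision point concrete, here is a small sketch comparing the two ways of checking the threshold. The class and method names are hypothetical, and the strict `>` comparison is an assumption, since the patch's exact comparison isn't quoted in this thread.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of the two threshold comparisons discussed above. */
class ThresholdComparison {

    /** Converts elapsed milliseconds down to seconds; integer division drops the remainder. */
    static boolean exceededByDividing(long elapsedMillis, long thresholdSeconds) {
        return elapsedMillis / 1000 > thresholdSeconds;
    }

    /** Converts the threshold up to milliseconds, so no precision is lost. */
    static boolean exceededByMultiplying(long elapsedMillis, long thresholdSeconds) {
        return elapsedMillis > TimeUnit.SECONDS.toMillis(thresholdSeconds);
    }
}
```

For an elapsed time of 14,400,999 ms against a 14,400 s (4 hour) threshold, the dividing version reports "not exceeded" while the multiplying version reports "exceeded". The difference is always under one second, which matches the point that it doesn't matter much for a renewal measured in hours.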
|
@trkurc would you mind testing this latest commit, as it makes the renewal period configurable now. |
|
I will squash these changes before we commit. Keeping a revision history in this pull request for clarity. |
|
@rickysaltzer - I don't have my kerberos hadoop running atm, but here are some comments: This variable should be camelCase, rather than ALL_CAPS. I think Will send more when I have a chance to build and run. |
|
Good catch, I'll fix that. |
|
I don't think we would have NPE'd because of the order that getFileSystem() ends up being called, but it's good to take care of anyway just in case. |
|
My kerberos hadoop vm has gone south. I've got to rebuild it. |
|
well, think I discovered an (unrelated) bug in the hdfs PutFile processor that will pass a flow file on to success even if the write fails due to SASL problems. |
|
@rickysaltzer one thing I missed is that the HDFSResources class should be static |
|
It looks like, due to a bug in the hadoop 2.6.0 client code (on a java 8 JVM only?), your code doesn't try to reload the ticket. Here is how isKeytab is set in 2.6.0. This returns false for me, always! Here is what 2.6.1 is doing. I'm going to try a java 7 vm. |
|
And it totally works in java 7. |
|
Also of note, hadoop 2.6 really doesn't like when (max_life < 10m) in your kdc.conf (looks like 2.7 may be better). |
This closes apache#97. Signed-off-by: Aldrin Piri <aldrin@apache.org>
Adding a patch to renew the ticket every 4 hours to avoid inactive Kerberos tickets. This was an issue found when running Kerberos-enabled Hadoop processors for a long period of time. This technically should have been handled by the Hadoop library, but due to unknown issues, the renewal thread inside of Hadoop doesn't seem to be doing that.
This patch is fairly simplistic, and applies to all Hadoop processors as it's implemented on the AbstractHadoopProcessor. The Kerberos ticket age is checked against a threshold (4 hours is a safe bet) when getFileSystem() is called. If the age exceeds the threshold, we re-login using the UserGroupInformation class before passing back the filesystem.
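The check described above can be sketched as follows. This is a simplified illustration rather than the code in the patch: `TicketRenewalSketch`, `Relogin`, and the injected clock are hypothetical names, and `Relogin` stands in for the real call (`UserGroupInformation.checkTGTAndReloginFromKeytab()`, which throws `IOException`) so the sketch runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for ugi.checkTGTAndReloginFromKeytab(). */
interface Relogin {
    void relogin() throws IOException;
}

/** Hypothetical sketch of the age check that would run inside getFileSystem(). */
class TicketRenewalSketch {
    private final long thresholdMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private final Relogin relogin;
    private long lastLoginMillis;

    TicketRenewalSketch(long thresholdSeconds, LongSupplier clock, Relogin relogin) {
        this.thresholdMillis = thresholdSeconds * 1000L;
        this.clock = clock;
        this.relogin = relogin;
        this.lastLoginMillis = clock.getAsLong();
    }

    /** Re-logs in if the ticket age has reached the threshold; returns true on a successful re-login. */
    synchronized boolean renewIfStale() {
        long now = clock.getAsLong();
        if (now - lastLoginMillis < thresholdMillis) {
            return false; // ticket is still young enough
        }
        try {
            relogin.relogin();
            lastLoginMillis = now;
            return true;
        } catch (IOException e) {
            // Swallowed so the caller's signature stays unchanged (an assumption of this
            // sketch); lastLoginMillis is untouched, so the next call tries again.
            return false;
        }
    }
}
```

A processor would call `renewIfStale()` at the top of getFileSystem() and then return the filesystem as before, with a threshold of `4 * 60 * 60` seconds matching the 4 hour default mentioned here.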