This repository has been archived by the owner on May 12, 2021. It is now read-only.
METRON-1266 Profiler - SASL Authentication Failed #809
When running the Profiler on a multi-node cluster secured by Kerberos, we observed that the HBaseBolt was unable to write to HBase. The Storm worker running the HBaseBolt logged the following exception. The failure is intermittent and does not occur in every environment.
Changes
To fix this, the topology.auto-credentials property needs to be set on the Profiler topology when running in a Kerberized environment. This matches how the other topologies, like Enrichment, are already configured.

After finding this, the big mystery for me was why this bug did not cause the failure in every Kerberized environment, all the time. Surely this miss should break the Profiler in any Kerberized environment, and so it should have been caught sooner.
The immediate problem is that no ticket can be found to authenticate with when the Profiler attempts to flush profile measurements to HBase. Because of this configuration miss, the Profiler topology itself is not able to obtain Kerberos tickets for authentication. However, if the ticket cache on the worker node already holds a valid ticket, the issue does not occur. The ticket cache can be populated when another process obtains a ticket or when a user manually runs kinit on the same node.
This explains why the problem occurs sporadically and only in some environments. The issue is less likely to occur in an environment like Full Dev, which has fewer, more active nodes; there, some other process or user has most likely already populated the ticket cache. In a larger, multi-node cluster, the ticket cache on any given worker node is less likely to be populated.
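The working theory reduces to a simple condition: a worker can authenticate to HBase if Storm pushes credentials to it, or if the node's ticket cache already happens to hold a valid ticket. The sketch below is a hypothetical model of that reasoning only, not Metron or Storm code; the class name, method names, and the example node layouts are all made up for illustration.

```java
import java.util.List;

/**
 * Hypothetical model of the working theory (not Metron or Storm code).
 * A worker can authenticate if credentials are pushed to it via
 * topology.auto-credentials, or if the node's local ticket cache already
 * holds a valid ticket from another process or a manual kinit.
 */
public class TicketCacheTheory {

    /** Whether a worker on a node can obtain a Kerberos ticket under the theory. */
    static boolean canAuthenticate(boolean autoCredentialsSet, boolean ticketCachePopulated) {
        return autoCredentialsSet || ticketCachePopulated;
    }

    /** Counts the nodes whose workers would hit the SASL authentication failure. */
    static long failingNodes(boolean autoCredentialsSet, List<Boolean> cachePopulatedPerNode) {
        return cachePopulatedPerNode.stream()
                .filter(populated -> !canAuthenticate(autoCredentialsSet, populated))
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical small, busy cluster where someone has already run kinit.
        List<Boolean> smallCluster = List.of(true);
        // Hypothetical larger cluster where only some node caches happen to be populated.
        List<Boolean> largeCluster = List.of(true, false, false, false, true, false);

        System.out.println(failingNodes(false, smallCluster)); // 0
        System.out.println(failingNodes(false, largeCluster)); // 4
        System.out.println(failingNodes(true, largeCluster));  // 0
    }
}
```

Without auto-credentials, the outcome depends entirely on which nodes the workers land on, which is consistent with the sporadic behavior described above; with auto-credentials set, no node depends on a pre-populated cache.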
That's my working theory at least. Feel free to refute.
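For reference, the shape of the fix is one extra entry in the Profiler topology's Storm configuration when Kerberos is enabled. The sketch below is a minimal, dependency-free illustration and not the actual Metron change. It assumes the value is Storm's AutoTGT credentials plugin (org.apache.storm.security.auth.kerberos.AutoTGT), as commonly used for Kerberized Storm topologies; the class and the withSecurity helper are hypothetical. A plain map stands in for Storm's Config, which is itself a HashMap<String, Object>.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: when Kerberos is enabled, ask Storm to push
 * credentials to the topology's workers via topology.auto-credentials.
 */
public class ProfilerSecurityConfig {

    /** Storm's configuration key (Config.TOPOLOGY_AUTO_CREDENTIALS). */
    static final String AUTO_CREDENTIALS = "topology.auto-credentials";

    /** Assumed plugin: Storm's AutoTGT, which pushes the submitter's TGT to workers. */
    static final String AUTO_TGT = "org.apache.storm.security.auth.kerberos.AutoTGT";

    /** Hypothetical helper: returns a copy of the config with auto-credentials added when Kerberized. */
    static Map<String, Object> withSecurity(Map<String, Object> topologyConf, boolean kerberized) {
        Map<String, Object> conf = new HashMap<>(topologyConf);
        if (kerberized) {
            conf.put(AUTO_CREDENTIALS, List.of(AUTO_TGT));
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, Object> base = new HashMap<>();
        base.put("topology.workers", 1);

        System.out.println(withSecurity(base, false).containsKey(AUTO_CREDENTIALS)); // false
        System.out.println(withSecurity(base, true).get(AUTO_CREDENTIALS));
    }
}
```

Gating the entry on the Kerberos setting keeps an unsecured deployment unchanged, which matches the intent of setting the property only when running in a Kerberized environment.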
Testing
I tested this by applying the fix in a 12-node Metron cluster. The fix resolved the problem and allowed the Profiler to write to HBase. I also tested this in Full Dev, both in plain vanilla mode and after Kerberization.
To test this change, follow these steps.

PROFILE_GET to retrieve the data from HBase. Ensure that data can be retrieved.
/opt/sensor-stubs/bin/start-bro-stub and include --security-protocol=SASL_PLAINTEXT as an argument to the kafka-console-producer.sh command. Then run the script in a terminal so that data is flowing through Metron. Also see this for more information.
PROFILE_GET to retrieve the data from HBase. Ensure that new data is being written after Kerberization.

Pull Request Checklist