New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CASSANDRA-19366 trunk - Expose auth mode in system_views.clients, clientstats, metrics #3085
Conversation
Adds 'mode' and 'metadata' fields to AuthenticatedUser to add context about how the user was authenticated and updates system_views.clients, nodetool client stats to include this information. Also adds new metrics to ClientMetrics to help operators identify which authentication modes are being used. for CASSANDRA-19366
src/java/org/apache/cassandra/auth/MutualTlsWithPasswordFallbackAuthenticator.java
Outdated
Show resolved
Hide resolved
int countConnectedClients(Predicate<ServerConnection> predicate) | ||
{ | ||
int count = 0; | ||
for (Channel c : allChannels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when connection is closed while you iterate over the channels? what if channels are closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that even if the Channel is closed, the attributes will still be present. If the ServerConnection
does happen to be null, the instanceof check will fail. The ClientState
on the connection
(which the predicates evaluate) will never be null, so it should generally be safe.
Also, if an exception is thrown here, the calling thread will be a JMX rmi connection thread, which will simply bubble the exception up to the client, so I think it will behave ok even in exceptional states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have some basic test around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 let me think about how I can test this. The challenge will be that I think to reproduce it the channel will need to be closed. This approach was borrowed from other methods on the class (countConnectedClientsByUser
) so maybe there is some existing test precedence I can pull from.
Alternatively I could probably inject an unexpected failure using byteman..will keep thinking on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tolbertam you could also probably mock that when c.attr(Connection.attributeKey).get()
is called that get() returns null or throws.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added ConnectionTrackerTest
that includes:
- Validate that
Channel
lacking aConnection.attributeKey
get skipped over. - Validate that
Channel
having aConnection.attributeKey
that is not aServerConnection
get skipped. - Validate that if
c.attr(Connection.attributeKey).get()
throws an unhandled exception it gets propagated up to the caller (ultimately preventing the calculation from working, which I think is the behavior we want, but let me know if you think that should be different). - Also added tests that validate the expected basic functionality of
countConnectedClients(predicate)
andcountConnectedClientsByUser
(which I didn't change in this PR but figured would be nice to have at a unit level).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great! that is enough I guess ...
* Introduce 'AuthenticationMode' enum that encapsulates the supported authentication modes in the included IAuthenticator implementations in Cassandra. * Rename AuthenticatedUser.getMode() -> getAuthenticatedMode() * AuthenticatedUser.getMetadata() is now a Map<String, Object> * Update MutualTlsWithPasswordFallbackAuthenticator.getSupportedAuthenticationModes() to be a union of its two authenticator's modes. * ClientsTable: Rename 'mode' and 'metadata' columns to 'authentication_mode' and 'authentication_metadata' respectively. * ClientMetrics: pass Map in markAuthMeter instead of determining based on provided string. Simplify ConnectedNativeClients gauge init. * ClientStats: Remove --metadata in favor of --verbose, which includes all available fields. Mode -> Auth-Mode, Metadata -> Auth-Metadata * ClientsTableTest: Update to use mTLS authenticated user and verify more fields.
* markAuthMeter was simplified enough that it doesn't need to be a utility method anymore. * public static on AuthenticatorMode not needed as it is an interface member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, I've left some minor comments
/** | ||
* User authenticated using a trusted identity in their client certificate. | ||
*/ | ||
public static final AuthenticationMode MTLS = new AuthenticationMode("Mtls") {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we spell out the label?
public static final AuthenticationMode MTLS = new AuthenticationMode("Mtls") {}; | |
public static final AuthenticationMode MTLS = new AuthenticationMode("Mutual TLS") {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would break JMX as I am not sure how a metric with a space in its name would behave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably check that the display name does not contain space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'm not sure what's the behavior for jmx. But if spaces break JMX, we should add some guards around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll test whether spaces work w/ JMX, I suspect it does, but isn't desirable as might require clients to escape; If it doesn't work, I suggest we throw a RuntimeException in the AuthenticationMode
constructor.
As far as the name, regardless of whether spaces are allowed, how about MutualTls
(for consistency with how the authenticator is named)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it does work:
beans
...
org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Encrypted,type=Client
org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Mutual Tls,type=Client
org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Password,type=Client
org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Unencrypted,type=Client
org.apache.cassandra.metrics:name=ConnectedNativeClients,type=Client
But it isn't easy to use, and I imagine will give some tooling trouble. For it to work with jmxterm for example I had to double \
escape the space:
$>bean org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Mutual Tls,type=Client
#IllegalArgumentException: Please specify domain using either -d option or domain command
$>bean org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Mutual\\ Tls,type=Client
#bean is set to org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Mutual Tls,type=Client
$>get *
#mbean = org.apache.cassandra.metrics:name=ConnectedNativeClients,scope=Mutual Tls,type=Client:
Value = 5;
I think we should document that spaces in the mode name aren't preferred as it may not work well with some tooling. I will do that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 8f082a1 and renamed Mtls
-> MutualTls
. I also realized was still using String
in ClientMetrics
instead of AuthenticationMode
so I updated that as well, hopefully that looks good!
Optional.ofNullable(mode) | ||
.flatMap((m) -> Optional.ofNullable(authSuccessByMode.get(m))) | ||
.ifPresent(Meter::mark); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit hard to read, but it is the equivalent of:
Meter meterByMode;
if (mode != null && (meterByMode = authSuccessByMode.get(mode)) != null) {
meterByMode.mark();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 avoids creating some additional Optional objects as well, will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tolbertam you dont need the braces :) just super nit but when I see it I just say it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated 👍
/** | ||
* @deprecated by {@link #markAuthFailure(String)} | ||
*/ | ||
@Deprecated(since="5.1", forRemoval = true) | ||
public void markAuthFailure() | ||
{ | ||
authFailure.mark(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this call markAuthFailure(null)
just like we do for the success case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, this is definitely wrong, will fix it 👍
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
Co-authored-by: Francisco Guerrero <frank.guerrero@gmail.com>
* Use AuthenticationMode instead of string in ClientMetrics * Fix some imports from suggestion changes * null check instead of Optional in mark methods * markAuthFailure should call markAuthFailure(null)
Previously when providing --all --client-options to nodetool clientstats it'd print two tables, which was probably not intended. Adjusts the behavior that establishes the following precedence: --verbose --client-options --all If any combination of these three are present, the one with the most precedence will be used.
Anything additive should be put on the end going forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I have no further comments. Looks good to me
To prevent test flakiness, we should block on cassandra user existence in tests. Add AuthTestUtils.waitForExistingRoles() which is adapted from the Auth package which relies on dtest code.
Don't change non-touched tests, we should change this in a separate PR
Adds 'mode' and 'metadata' fields to AuthenticatedUser to add context about how the user was authenticated and updates system_views.clients, nodetool client stats to include this information.
Also adds new metrics to ClientMetrics to help operators identify which authentication modes are being used.
for CASSANDRA-19366