Add visibility to grep searches #4468
I think the description of the GrepCommand needs to be updated to reflect the change. See accumulo/shell/src/main/java/org/apache/accumulo/shell/commands/GrepCommand.java, lines 125 to 129 in 714fdb7.
This is a change in behavior to an existing user-facing class that will change the data returned. This was likely not done initially because visibilities are typically used as access control, not as searchable user data. While there may be some applications for treating the content of visibilities as searchable user data, I don't think it's a typical use case. This will very likely break existing applications. I think this needs to be reverted as-is. To protect existing use cases, an extra option (off by default) can be added to include visibilities in the matching.
@ctubbsii Thanks for the feedback. I have to say, though, that I disagree. While it may have been done that way because visibilities aren't typically "searchable user data", I would argue you could also call this a bug, as the grep command (and iterator) search every part of the key that is a byte array EXCEPT the visibility. Can you point me to documentation which specifies that the visibility is a special part of the key that should be treated differently in some cases? The data model page makes no such note. On top of that, this PR came out of observing an Accumulo user frustrated that the grep they were running was not matching when they expected it would. It seems to me that the logical behavior is for grep to search everything, not to exclude some parts.
Each piece of the key is inherently unique and special in the data model, but the model is also flexible, and you can certainly treat visibility like the other fields, as in your use case. My point wasn't that visibility must be treated specially, but rather that treating it specially is also a valid use case. The basis for my objection wasn't to preserve the existing use case, but to show that this is not necessarily a bug, because both use cases are valid. If we treat both use cases as valid, then this change will break some valid existing workflows. The severity of that breakage could range anywhere from "harmless surprise" to "security compromise", depending on the additional data returned. I think we can satisfy both use cases while avoiding a breaking change.
Is the main frustration the use of grep in the shell, or is it the GrepIterator itself? I'm okay with changing the behavior in the shell, which is primarily intended to support user-interactive ad-hoc workflows anyway. My concern is changing the default behavior of the GrepIterator when it is used directly in table configuration or as a scan-time iterator. A few things we could do to avoid breaking changes to the GrepIterator:
For the shell, where we don't care about changing behavior as much:
Changing the behavior of the shell's grep command to make it more convenient can be done without changing the behavior of GrepIterator for existing workflows.
Changing the grep iterator to use an option for whether to match against the visibility is fine with me. I think the shell should default that option to true.
Update the previously merged feature, added in PR #4468, that matches on column visibilities using the GrepIterator. Now, GrepIterator supports a number of fine-grained options to choose exactly which fields to match on. By default, the GrepIterator preserves the behavior it had prior to the changes in PR #4468, while the GrepCommand for the shell retains the change by enabling the new option to match on column visibilities.
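To make the two defaults concrete, here is a minimal, self-contained sketch of per-field matching. It does not use the Accumulo API: the class `GrepFieldSketch`, the `Field` enum, the `Entry` type, and both default sets are hypothetical names invented for illustration, and the assumption that the pre-#4468 default covers row, family, qualifier, and value is mine rather than something stated in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only: none of these names come from Accumulo.
class GrepFieldSketch {

  enum Field { ROW, FAMILY, QUALIFIER, VISIBILITY, VALUE }

  // Assumed iterator default: every field except the column visibility.
  static final Set<Field> ITERATOR_DEFAULT =
      Collections.unmodifiableSet(EnumSet.complementOf(EnumSet.of(Field.VISIBILITY)));

  // Assumed shell default: visibility matching switched on as well.
  static final Set<Field> SHELL_DEFAULT =
      Collections.unmodifiableSet(EnumSet.allOf(Field.class));

  // Simplified stand-in for a key/value pair, with each part held as bytes.
  static final class Entry {
    final byte[] row, family, qualifier, visibility, value;

    Entry(String row, String family, String qualifier, String visibility, String value) {
      this.row = bytes(row);
      this.family = bytes(family);
      this.qualifier = bytes(qualifier);
      this.visibility = bytes(visibility);
      this.value = bytes(value);
    }

    byte[] get(Field f) {
      switch (f) {
        case ROW: return row;
        case FAMILY: return family;
        case QUALIFIER: return qualifier;
        case VISIBILITY: return visibility;
        case VALUE: return value;
        default: throw new IllegalArgumentException("unknown field " + f);
      }
    }
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Naive byte-wise substring search; an empty needle matches anything.
  static boolean contains(byte[] haystack, byte[] needle) {
    outer:
    for (int i = 0; i + needle.length <= haystack.length; i++) {
      for (int j = 0; j < needle.length; j++) {
        if (haystack[i + j] != needle[j]) {
          continue outer;
        }
      }
      return true;
    }
    return false;
  }

  // True if the term occurs in at least one of the enabled fields.
  static boolean matches(Entry e, String term, Set<Field> fields) {
    byte[] needle = bytes(term);
    for (Field f : fields) {
      if (contains(e.get(f), needle)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Entry e = new Entry("row1", "fam", "qual", "secret&admin", "payload");
    System.out.println(matches(e, "admin", ITERATOR_DEFAULT)); // false
    System.out.println(matches(e, "admin", SHELL_DEFAULT));    // true
  }
}
```

With the term `admin` appearing only in the visibility, the entry is skipped under the iterator-style default and returned under the shell-style default, which is the difference in returned data that the discussion above is concerned with.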
I pushed the changes for this in commit bdb8ce9 to add options, keeping the new behavior of GrepCommand but defaulting to the previous behavior for anyone currently using GrepIterator directly.
Add visibility as one of the criteria that the grep searches