-
Notifications
You must be signed in to change notification settings - Fork 474
ACCUMULO-4501 (work in progress) Very simple implementation of a visib… #168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
One option to consider instead of modifying RFile is to make it a decorator like BloomFilterLayer. BloomFilterLayer stores its information in RFile metadata. I'm think will be problems with this approach, but I would not know what they are w/o actually trying it. Are you considering making this a generic histogram functionality, where the user can configure a function that emits counts for a given Key Value? |
| public long increment(Key k) { | ||
| final Text t = buffer.get(); | ||
| Objects.requireNonNull(k.getColumnVisibility(t)); | ||
| AtomicLong count = histogram.get(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is an AtomicLong needed for concurrency reasons? If not, then could create a simple class like MutableLong in MapCounter to avoid volatile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be needed. Using something faster would probably be better.
Sam applies to using an unbounded HashMap for storage (e.g. handling tables with 100's to 1000's of visibilities per file).
| public void append(Key key, Value value) throws IOException { | ||
| Text _text = buffer.get(); | ||
| key.getColumnVisibility(_text); | ||
| AtomicLong count = histogram.get(_text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could use the ByteSequence returned by getColumnVisibilityData() to do the lookup in the map (would require keying map on ByteSequence). This would avoid the copy for lookup. Only copy when inserting into the map.
|
|
||
| public void append(Key key, Value value) throws IOException { | ||
| Text _text = buffer.get(); | ||
| key.getColumnFamily(_text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get the family? Is this duplicate code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol, bad copy-paste :)
This is my first foray into the RFile codebase, so I am very happy to be redirected into a different implementation :). My liberal use of increasing visibility on classes ought to be apparent haha.
If we can abstract this specific feature into something more generic without it blowing up, I'm ok with that. I just don't have a big picture view right now. |
|
@joshelser I have not forgotten about this. We had discussed writing up a design doc on IRC. I have just been busy. I plan to take a crack at that Thur. |
|
@keith-turner nbd. This has fallen by the way-side for me too :) |
|
I just posted #180 for review. |
…ility histogram