@joshelser
Member

…ility histogram

@keith-turner
Contributor

One option to consider instead of modifying RFile is to make it a decorator like BloomFilterLayer. BloomFilterLayer stores its information in RFile metadata. I think there will be problems with this approach, but I would not know what they are w/o actually trying it.

Are you considering making this a generic histogram functionality, where the user can configure a function that emits counts for a given Key Value?
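For illustration, the generic variant asked about above could look roughly like the following. This is a minimal sketch, not code from the pull request: `GenericHistogram`, `SimpleKey`, and the bucket function are hypothetical names, `SimpleKey` is only a stand-in for an Accumulo Key, and the sketch simplifies "emits counts" to "picks one bucket per entry and adds one to it".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical stand-in for an Accumulo Key; only the fields used below are modelled. */
final class SimpleKey {
  final String family;
  final String visibility;

  SimpleKey(String family, String visibility) {
    this.family = family;
    this.visibility = visibility;
  }
}

/** Hypothetical generic histogram: a user-supplied function picks the bucket for each entry. */
final class GenericHistogram<K, V, B> {
  private final BiFunction<K, V, B> bucketFn;
  private final Map<B, Long> counts = new HashMap<>();

  GenericHistogram(BiFunction<K, V, B> bucketFn) {
    this.bucketFn = bucketFn;
  }

  /** Counts one entry; entries whose bucket is null are ignored. */
  void add(K key, V value) {
    B bucket = bucketFn.apply(key, value);
    if (bucket != null) {
      counts.merge(bucket, 1L, Long::sum);
    }
  }

  long count(B bucket) {
    return counts.getOrDefault(bucket, 0L);
  }

  public static void main(String[] args) {
    // A visibility histogram is then just one configuration of the generic class.
    GenericHistogram<SimpleKey, String, String> byVisibility =
        new GenericHistogram<>((k, v) -> k.visibility);
    byVisibility.add(new SimpleKey("cf", "A&B"), "v1");
    byVisibility.add(new SimpleKey("cf", "A&B"), "v2");
    System.out.println(byVisibility.count("A&B"));
  }
}
```

Swapping the lambda for `(k, v) -> k.family` would give a per-family histogram from the same class, which is the kind of reuse the question points at.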

public long increment(Key k) {
final Text t = buffer.get();
Objects.requireNonNull(k.getColumnVisibility(t));
AtomicLong count = histogram.get(t);

Contributor

Is an AtomicLong needed for concurrency reasons? If not, we could create a simple class like MutableLong in MapCounter to avoid volatile.

Member Author

I don't think it would be needed. Using something faster would probably be better.

Same applies to using an unbounded HashMap for storage (e.g. handling tables with 100's to 1000's of visibilities per file).
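A minimal sketch of the non-atomic counter idea, assuming a single writer thread. The `MutableLong` here is a hypothetical re-creation of the concept, not the actual class from MapCounter, and the map is keyed on String purely to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain mutable counter: no volatile field and no atomic operations, so not thread-safe. */
final class MutableLong {
  long value;
}

/** Hypothetical per-visibility counter that assumes it is only touched by one thread. */
final class VisibilityCounter {
  private final Map<String, MutableLong> histogram = new HashMap<>();

  /** Adds one to the count for the given visibility and returns the new count. */
  long increment(String visibility) {
    MutableLong count = histogram.computeIfAbsent(visibility, v -> new MutableLong());
    return ++count.value;
  }

  long get(String visibility) {
    MutableLong count = histogram.get(visibility);
    return count == null ? 0L : count.value;
  }

  /** Number of distinct visibilities seen; the map in this sketch is still unbounded. */
  int distinct() {
    return histogram.size();
  }

  public static void main(String[] args) {
    VisibilityCounter counter = new VisibilityCounter();
    counter.increment("A&B");
    counter.increment("A&B");
    System.out.println(counter.get("A&B") + " entries, " + counter.distinct() + " visibilities");
  }
}
```

This only addresses the AtomicLong point. The unbounded-map concern would still need something like a cap on `distinct()`, which is not sketched here.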

public void append(Key key, Value value) throws IOException {
Text _text = buffer.get();
key.getColumnVisibility(_text);
AtomicLong count = histogram.get(_text);

Contributor

I think this could use the ByteSequence returned by getColumnVisibilityData() to do the lookup in the map (would require keying map on ByteSequence). This would avoid the copy for lookup. Only copy when inserting into the map.
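The copy-on-insert pattern suggested above might be sketched as follows. `ByteView` is a hypothetical stand-in for Accumulo's ByteSequence (a view over part of a byte array with content-based equals and hashCode), and `ViewKeyedHistogram` is an illustrative name; neither is taken from the pull request.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a ByteSequence: a view over a slice of a byte array. */
final class ByteView {
  final byte[] data;
  final int offset;
  final int length;

  ByteView(byte[] data, int offset, int length) {
    this.data = data;
    this.offset = offset;
    this.length = length;
  }

  /** Returns a view backed by a private copy of the bytes. */
  ByteView copy() {
    return new ByteView(Arrays.copyOfRange(data, offset, offset + length), 0, length);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ByteView)) return false;
    ByteView other = (ByteView) o;
    if (length != other.length) return false;
    for (int i = 0; i < length; i++) {
      if (data[offset + i] != other.data[other.offset + i]) return false;
    }
    return true;
  }

  @Override
  public int hashCode() {
    int h = 1;
    for (int i = 0; i < length; i++) {
      h = 31 * h + data[offset + i];
    }
    return h;
  }
}

/** Histogram keyed on byte views: lookups use the caller's view, inserts store a copy. */
final class ViewKeyedHistogram {
  private final Map<ByteView, long[]> counts = new HashMap<>();
  int copies = 0; // number of key copies made, tracked only for demonstration

  long increment(ByteView view) {
    long[] slot = counts.get(view); // lookup reads the caller's bytes in place, no copy
    if (slot == null) {
      slot = new long[1];
      counts.put(view.copy(), slot); // copy only when a new key enters the map
      copies++;
    }
    return ++slot[0];
  }
}
```

With this shape, a file holding millions of entries but only a handful of distinct visibilities would copy the visibility bytes a handful of times rather than once per appended key.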


public void append(Key key, Value value) throws IOException {
Text _text = buffer.get();
key.getColumnFamily(_text);

Contributor

get the family? Is this duplicate code?

Member Author

Lol, bad copy-paste :)

@joshelser
Member Author

> One option to consider instead of modifying RFile is to make it a decorator like BloomFilterLayer. BloomFilterLayer stores its information in RFile metadata. I think there will be problems with this approach, but I would not know what they are w/o actually trying it.

This is my first foray into the RFile codebase, so I am very happy to be redirected into a different implementation :). My liberal use of increasing visibility on classes ought to be apparent haha.

> Are you considering making this a generic histogram functionality, where the user can configure a function that emits counts for a given Key Value?

If we can abstract this specific feature into something more generic without it blowing up, I'm ok with that. I just don't have a big picture view right now.

@keith-turner
Contributor

@joshelser I have not forgotten about this. We had discussed writing up a design doc on IRC. I have just been busy. I plan to take a crack at that Thur.

@joshelser
Member Author

@keith-turner nbd. This has fallen by the wayside for me too :)

@keith-turner
Contributor

I just posted #180 for review.

@asfgit asfgit closed this in 94cdcc4 Mar 20, 2017