HLL++ DataTypes - approximates/estimates number of distinct elements … [JIRA: RIAK-2534] #1380

Closed
wants to merge 52 commits into from

@zeeshanlakhani zeeshanlakhani commented Apr 4, 2016

…in a set

  • type cleanup and new hll type(s)
  • riak_dt_hll.erl module -> riak_dt behavior, but fits best in riak_kv for now
  • necessary updates for new DT to riak_kv_crdt + riak_kv_crdt_json + riak_object (handle precision props around merge)
  • adds typical DT stats, but also special byte(s) stats for HLL(sets)
  • validation of datatype buckets around precision property and a bit of validation refactoring around dt's
  • able to reduce precision if within acceptable criteria
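
For reviewers unfamiliar with why precision matters at merge time, here is a minimal Python sketch of a HyperLogLog-style counter with precision folding. It is a generic illustration of the technique, not the code in this branch or in the hyper library: the `HLL` class, its method names, the SHA-1 based 64-bit hash, and the 4..16 precision range are all assumptions made for the example, and it uses the plain HyperLogLog estimator with linear counting rather than the HLL++ bias correction.

```python
import hashlib
import math


class HLL:
    """Toy HyperLogLog sketch (illustrative only, names are hypothetical)."""

    def __init__(self, precision=14):
        if not 4 <= precision <= 16:
            raise ValueError("precision must be in 4..16")
        self.p = precision
        self.registers = [0] * (1 << precision)

    def add(self, item):
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width
        rest = h & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1  # 1-based position of first 1 bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def reduce_precision(self, new_p):
        """Return a new sketch folded down to new_p (new_p <= self.p)."""
        if not 4 <= new_p <= self.p:
            raise ValueError("can only reduce precision, never raise it")
        shift = self.p - new_p
        out = HLL(new_p)
        for idx, rank in enumerate(self.registers):
            if rank == 0:
                continue  # empty register contributes nothing
            # The dropped index bits become the leading bits of the new rank.
            low = idx & ((1 << shift) - 1)
            new_rank = rank + shift if low == 0 else shift - low.bit_length() + 1
            j = idx >> shift
            out.registers[j] = max(out.registers[j], new_rank)
        return out

    def merge(self, other):
        """Merge two sketches, folding both to the lower precision first."""
        p = min(self.p, other.p)
        a, b = self.reduce_precision(p), other.reduce_precision(p)
        out = HLL(p)
        out.registers = [max(x, y) for x, y in zip(a.registers, b.registers)]
        return out

    def estimate(self):
        m = len(self.registers)
        alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(m, 0.7213 / (1 + 1.079 / m))
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small sets
        return raw
```

In this sketch, folding a precision-14 counter down to 10 yields the same registers as building at precision 10 from the start, which is why a merge between counters of different precision can only ever settle on the lower one.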

Still Open Qs:

  • uses branch of riak_api that would eventually be a tag probably? tagging in general really!
  • make sure pipe/yokozuna doesn't break b/c HLL DT. Value should be in use for pipe... YZ use-case may only be storing field to see if it's in a certain range? Thoughts here @JeetKunDoug and @fadushin?

Build Tips

  • I'm setting riak_kv to this branch in riak_ee to run this stuff... no official ee branch yet.
  • I have not created a branch for repl/repl_pb_api that uses the riak_pb feature-zl-hll_datatypes branch in the rebar.config for the api lib. I've just been updating it manually for now.

Related PRs and/or branches to review and/or figure out the right dependencies

zeeshanlakhani and others added 30 commits October 30, 2015 16:25
…ime, meta for build_finished and persist flushed trees

* add open/close (check) meta calls from hashtree lib
* handle closing trees securely, and set next_rebuild to full if not a normal/shutdown reason of the vnode
* Added explicit riak_kv_index_hashtree close/sync_stop call to KV vnode.
  Without it, the node can exit before the hashtree has a chance
  to close because riak_kv_index_hashtree is not supervised inside
  riak_kv, only monitored.
* set_rebuild on close depending on vnode shutdown/normal reason
* mark empty {1,0} within clear_trees and true/false in init, dependent on the vnode being empty or not; have do_new_tree mark correctly based on empty|open
* determine update or flush by next_rebuild of tree on close
* cast to explicitly set incr rebuild after update_perform
…ta-counter+flush

Bugfix/zl/hashtree cleanup for meta counter+flush

Reviewed-by: jonmeredith
Need to extend the vnode manager to clean up non-core stats when
it gets a vnode down message.
Add unit tests to confirm fix to leak in Timer table.
…-clean-exit

Unregister per-vnode stats when cleanly shutting down.

Reviewed-by: JeetKunDoug
Fix memory leak in TimeRef ETS table in memory_backend [RIAK-1576]

Reviewed-by: macintux
- Fixes memory leak in memory backend - #1331
- Fixes stats leak on vnode shutdown  - #1282
Merge/2.0 to 2.2 for leak fixes and hashtree build issue

Reviewed-by: mrallen1
Merging 2.1 -> develop-2.2 for enhanced kv679 logging.

Reviewed-by: javajolt
update branch for riak_dt develop-2.2

Reviewed-by: JeetKunDoug
…deps

Update to point to develop-2.2 branch of riak_pipe/api and bump bitcask to 1.7.4

Reviewed-by: nickelization
bump Bitcask to 1.7.2p2

Reviewed-by: nickelization
Adding a new actual_put/7 function with a MaxCheckFlag so we can bypass
object limits on read repair. This new function could be used by new
features which also want to ignore limits. Old actual_put/6 function
still exists so existing functionality should remain unchanged.
Allow readrepair to ignore object size and sibling limits

Reviewed-by: mrallen1
This merges the 2.0 branch forward into develop-2.2 (note we skipped
merging through 2.1 since that branch is basically end-of-lifed now).

The only changes this pulls in are the ones from PR#1363 for allowing
read repair to ignore object size and sibling limits.

Conflicts:
	rebar.config
	src/riak_kv_vnode.erl
Forward merge 2.0 to develop-2.2, pulling in read repair object size fix

Reviewed-by: bsparrow435
…d after the hashtree has been snapshotted, but before the hash tree update is initiated. This callback is currently being used by the batching feature of Yokozuna, to minimize differences between the YZ and KV AAE hash trees.
Added a version of update that takes a Callback

Reviewed-by: JeetKunDoug
This fixes a race condition that was causing stats corruption if two
processes tried to update the data for the same index at the same time.

As detailed in the code comments, this may not have the best possible
performance characteristics, but it seems to be good enough and is a
very easy workaround for the problems we found.

For future reference, we originally found and reproduced this bug using
yz_aae_test. It didn't always trigger the race, but adding a short
timer:sleep call in between the lookup and insert calls made it very
easy to catch.
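
The shape of that fix, sketched in Python rather than Erlang: the lookup and the insert are made one locked step so that two writers cannot interleave between them. Everything here is a stand-in chosen for illustration. The `index_info` dict, the lock, and the `run_workers` helper are hypothetical, and a process-local `threading.Lock` only approximates what `global:trans` provides across nodes.

```python
import threading

# Hypothetical stand-ins: a dict for the shared stats table and a
# process-local lock where the commit uses Erlang's global:trans.
index_info = {}
index_info_lock = threading.Lock()


def update_index_info(index, update_fn):
    """Run the lookup and the insert for one index as a single locked step."""
    with index_info_lock:
        current = index_info.get(index, 0)      # lookup
        index_info[index] = update_fn(current)  # insert


def run_workers(index, n_threads=8, n_updates=1000):
    """Hammer one index from several threads and return the final value."""
    def work():
        for _ in range(n_updates):
            update_index_info(index, lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return index_info[index]
```

As the commit message notes, serializing every update this way trades some throughput for simplicity.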
…info

Wrap the call to update_index_info in global:trans

Reviewed-by: JeetKunDoug
Pick up cuttlefish changes required a new tag of bitcask. Bumping the patch version in rebar.config
Bump bitcask version to 1.7.2p3

Reviewed-by: bsparrow435
zeeshanlakhani and others added 2 commits May 3, 2016 09:59
Feature zl add expanded is crdt [JIRA: RIAK-2531]

Reviewed-by: bsparrow435
zeeshanlakhani and others added 2 commits May 10, 2016 10:09
  additional values/sibs are involved
- add more exports for diff. use-cases in external applications
…or_calls_from_other_deps

better is_crdt checks

Reviewed-by: JeetKunDoug
@Basho-JIRA

[~pbrewer] not sure who to assign this to. The work is in need of a review and plan for release.

_[posted via JIRA by Zeeshan Lakhani]_

@Basho-JIRA

[~pbrewer] not sure who to assign this to. The work is in need of a review and plan for release and rollout to clients.

_[posted via JIRA by Zeeshan Lakhani]_

@Basho-JIRA

[~zlakhani] This is not assigned to a release yet, but needs to be. I added the 2.next label for now and will change this to "In Review". Doug, Seema and I need to have conversations, soon, about roadmap and where this lands. Thanks for the heads up!

_[posted via JIRA by Patricia Brewer]_

@Basho-JIRA

Need to figure out the release train and then get someone assigned to review

_[posted via JIRA by Patricia Brewer]_

Doug Rohrer and others added 10 commits May 23, 2016 18:31
Fix deadlock in get_lock and expire_trees by:
- Making riak_kv_entropy_manager:get_lock have a defined timeout
- Making riak_kv_index_hashtree:expire a cast as that's how it's used.

Reviewed-by: zeeshanlakhani
Fix broken handle_cast

Reviewed-by: fadushin
Dependency updates

Reviewed-by: jvoegele
…in a set

- type cleanup and new hll types
- riak_dt_hll.erl module -> riak_dt behavior, but fits best in riak_kv for now
- necessary updates for new DT to riak_kv_crdt + riak_kv_crdt_json
- adds typical DT stats, but also special byte(s) stats for HLL(sets)
- validation of datatype buckets around precision property and a bit of
  validation refactoring around dt's
- better handling of bucket/bucketprops for
  precision reduction and creating new HLL(set) datatypes
- additional eunit testing for precision and documentation

Still WIP:
- uses branch of riak_api that would eventually be a tag probably?
- move hyper fork to basho-bin or to its own repo
- make sure pipe/yokozuna don't break b/c HLL DT, add ability
  to use search for hll cards within a range?
- one more r_t to finish for branch - feature-zl-hll_datatypes
10 participants