Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for sunion and sinter #147

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Added support for sunion and sinter #147

wants to merge 10 commits into from

Conversation

balajivenki
Copy link
Contributor

We needed sinter and sunion and so added implementation for the same. For sinter assigning the first set from s_members for initializing and using Guava Sets.intersection for the rest of the sets. For sunion, I am getting all smembers and just simply pushing into a global set. Please review it and approve the same.

@@ -2429,7 +2431,13 @@ public Long msetnx(String... keysvalues) {

@Override
public Set<String> sinter(String... keys) {
throw new UnsupportedOperationException("not yet implemented");
Set<String> allResults = d_smembers(keys[0]).getResult();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assign the global set with the first keys smembers.

Set<String> allResults = d_smembers(keys[0]).getResult();
//loop through keys and intersect to a set, use d_smembers
for(int i = 1; i < keys.length; i++) {
allResults = Sets.intersection(allResults, d_smembers(keys[i]).getResult());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Guava set intersection.

@@ -2448,7 +2456,14 @@ public Long sort(String key, String dstkey) {

@Override
public Set<String> sunion(String... keys) {
throw new UnsupportedOperationException("not yet implemented");
Set<String> allResults = new HashSet<String>();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just get all the smembers and push it to a single set.

@ipapapa
Copy link
Contributor

ipapapa commented Oct 19, 2016

I was under the impression that for both SINTER and SUNION we need to fetch the keys of the sets from all the nodes. So we probably need an implementation similar to dyno_scan. Hence in your implementation we are only going to get the sets from the local rack node. Does it make sense?

@balajivenki
Copy link
Contributor Author

balajivenki commented Oct 19, 2016

@ipapapa Inside sunion and sinter I am using d_smembers which is done using executeWithFailover. So in this case it will also talk to the replication as well. For the operation we just need the keys and we can do union or intersection at run time. I have unit tested this in house and it works fine too. Since smembers(a single set) cannot be across nodes as its a single key then it will work fine. This operation is not similar to keys or scan. Please let me know your thoughts.

@ipapapa
Copy link
Contributor

ipapapa commented Oct 19, 2016

@balajivenki all operations are executeWithFailover but Dyno will not talk across nodes to get the keys. That was the point of having a separate dyno_scan, because SCAN from Dynomite provides only the keys from the local node.

SMEMBERS returns all the keys of a single SET. Whereas SUNION and SINTER are operations across sets, and therefore across keys that represent those sets. The keys may potentially be in different Dynomite nodes.

It will work only if all the sets are in the same Dynomite node and you are not using token awareness but rather leverage the coordinator node.

I think the implementation should probably be similar to dyno_scan. In addition, it would be nice for those commands, which return distributed keys, to include unit testing. There is a unit testing platform for dyno_scan so you can leverage that as well.

@balajivenki
Copy link
Contributor Author

@ipapapa During my test I was able to do union and intersection on 3 different sets that is stored on 3 different nodes on the same rack. You are right I am leveraging the coordinator code done for smembers and doing the intersection and union at run time. Even if we implement the way how SCAN is done, we will end up doing it at run time only as Redis node is not aware of the other sets. I will take a look at how dyno_scan is implemented and I am not sure how that will help here as these two are two different options. SCAN and KEYS are scatter and gather operations but this is purely gather and operate on operations.

@balajivenki
Copy link
Contributor Author

One another technical difficulty with executeWithRing is that it looks for operation success plus the token aware is not on String[]. So I have my code changed with executeWithRing and I do union based on List<OperationResult<Set>>. In this case instead of running SUNION on all three nodes(my rack is 3 nodes and 3 keys across all three nodes) it is running

SUNION key1 key2 key3

three times on node1 instead of executing it once on all three nodes(I captured it by running MONITOR command on those nodes). I think it executes them based on the first key.

When I execute

SUNION key1 key2 key3

then it runs it 3 times on node1 where key1 sits

SUNION key2 key1 key3

then it runs it 3 times on node2 where key2 sits

Now even if we solve it then for SINTER we will have completely wrong results because intersection will result as empty in this case.

We cannot simply run

SINTER key1 key2 key3

on three nodes and combine the data because it will be wrong.

In a nutshell I think the best way to do is to use SMEMBERS and do union and intersection from client side and that is what my PR is.

Hope this make sense.

@balajivenki
Copy link
Contributor Author

@ipapapa Any update on this? Does my explanation make sense? Please clarify.

@ipapapa
Copy link
Contributor

ipapapa commented Oct 24, 2016

I have added @timiblossom and @jcacciatore to get their feedback on this as well. You may be correct that SMEMBERS with a single key in every iteration fetches all the sets across nodes, and then you combine the keys to a union/intersection on the client side.

I would suggest adding some unit tests for these operations so that at least we have some basis on the correctness.

@timiblossom
Copy link
Contributor

@balajivenki For that technical issue, since currently Dynomite does not have any extra code to handle SINTER and SUNION to help out, the client or Dyno in this case has to do extra works to overcome this. Here are some info that might help you:

  1. Need to do scatter and gather like this
    private List<OperationResult<ScanResult<String>>> scatterGatherScan(final CursorBasedResult<String> cursor, final String... pattern) {
  2. Client can still send SINTER or SUNION to servers to eliminate some elements but still need to do the post merge on all keys
  3. Client need to break on SINTER (or SUNION) into multiple SINTER (or SUNION) requests. How to break depends on the key list. Each command associated to a sub key list would be sent to a Dynomite node representer for those keys. The idea is to group keys which are co-located to help out the server side and then merge the results on the client side.

You can't simply run "SINTER key1 key2 key3" on all nodes because all requests will get sent to one Dynomite server which own key1. Hence the result will be incorrect as you already observed.

You don't want to convert and send to all nodes like this:
"SINTER key1 key2 key3" -> "SINTER key1 key2 key3", "SINTER key2 key3 key1", and "SINTER key3 key1 key2" because this will be an O(n^2) operations on the cluster and very expensive.

Another option is that Dynomite team can support this command at the server side to do all of the scatter-gather operations and the client only needs to send one such request to any Dynomite coordinator.

Let me know if I am not clear on any points.

Thanks.

@jcacciatore
Copy link
Contributor

@timiblossom is correct, the right way to do this is to group the keys by token prior to sending requests. The mechanics of this should be straightforward.

The broader question for this PR lies in error handling. There are several operations that fall into this category of key-grouped requests and they should all behave in the same manner in terms of error handling, the semantics of which would only apply to the dyno client. My cursory thinking on that is to have configurable policies with sensible defaults that app clients can use to tweak the behavior. More concretely, say key2 points to a large set and that while scanning the set one of the requests fails. Should it be retried, fallback, fail the entire SINTER request ?

Of course, having this functionality properly supported by dynomite itself is the more desirable solution, since any client could then take advantage of this functionality.

@balajivenki
Copy link
Contributor Author

@timiblossom I completely agree with you and my code changes are exactly what you have expected and explained above. SINTER or SUNION cannot be executed on all nodes and so I am using SMEMBERS and do the union or intersection after collecting the data. @jcacciatore For the error handling I can add some NPE checks but the failover thing isnt executeWithFailOver does it for us?

@balajivenki
Copy link
Contributor Author

@ipapapa and @timiblossom any update on this PR? The actual changes meet with the review comments.

@ipapapa
Copy link
Contributor

ipapapa commented Nov 7, 2016

From my side, we need to add proper unit testing for it because we need to keep the quality high and validate the implementation (We recently got burned from a PR that did not have a proper test coverage). @balajivenki there is a unit test framework and a number of unit tests for you to take a look.
I mentioned that in my previous review comment.

@balajivenki You also need to address the (1), (2) and (3) points @timiblossom mentioned above, unless @timiblossom is OK moving forward and pushing off the implementation to the server.

@balajivenki
Copy link
Contributor Author

Thanks @ipapapa I will add the unit test cases to my changes soon.

Copy link
Contributor

@ipapapa ipapapa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the above comments if you want to push this forward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants