Geode 2469 #404
Conversation
Hi ggreen, would you update the PR to include the missing SetInterpreter class?
Hello,
Sorry about that.
It is there now
geode/SetExecutor.java at GEODE-2469 · ggreen/geode · GitHub
Thanks for the contribution. Could you fix the test failure in …?
I like the fact that we are now dealing with collections on a single machine, rather than with all the elements spread across the distributed system. Would you happen to have any performance metrics for larger collections, e.g. 1 million entries vs. 100? I imagine that as the number of entries in a collection grows, so will the cost of inserts.
ByteArrayWrapper regionName = HashInterpreter.toRegionNameByteArray(key);
Region<?, ?> region = this.getRegion(regionName);
if (region != null) {
  region.remove(HashInterpreter.toEntryKey(key));
Does this need to return from here?
We could return true here, but the next line after the if statement already returns true.
Just to track the following issues:
- I think the intention was to create a region for hashes like "huser:1000", but it is putting the key "1000" into the region "/ReDiS_HASH". See the following logs:
  oRegionNameByteArray keyString huser:1000
  hashinterpreter getRegion huser
  [debug 2017/03/01 14:11:19.675 PST tid=28] PR.virtualPut putting event=EntryEventImpl[op=UPDATE;region=/ReDiS_HASH;key=1000;oldValue=null;newValue={};callbackArg=null;originRemote=false;originMember=hkhamesra-T440p(11928:loner):0:1870ed8b;id=EventID[threadID=2;sequenceID=3]]
- HIncrByFloatExecutor: we are incrementing the field value but not saving it back into the region.
- HExistsExecutor: we are not looking up the value inside the map.
- Need to look at the metaType region logic with the new changes. (Fixed this in HExistsExecutor)
String regionName = key.toString();

// check if type is hash TODO: remove
Do we need to look up the hash key here?
I think we can safely remove this TODO comment.
The special processing for hashing is now handled at the HashExecutor (along with its sub-classes) commands level. It does not need to be handled at the RegionProvider level.
Hello @galen-pivotal ,
You mean why do we need to create a region for an object, correct?
If you mean that, then yes, I guess we could just use the same ReDiS_HASH region for objects and non-objects. Of course, there are pros and cons for each approach.
Prior to this pull request, the Redis adapter created objects for all HASH keys. I originally liked the idea of limiting the creation of regions to just HASH objects. Having a separate region when objects are used can provide additional flexibility. For example, we could pre-define the region with unique definitions based on a given use case.
While testing with the Spring Data Redis example, I created a "companies" region with my desired properties, fully under my control. These regions for objects could be managed separately from the default ReDiS_HASH region. For example, we could use different data policies, disk stores, listeners, etc.
What do you think?
RE: "huser:1000": the intent is to detect the use of objects in the Redis HASH commands. The character ":" will appear in the key if an object is used (note: this is unique to HASH commands). The format of the key will be object:key. So if a HASH command was used with the key "huser:1000", this should have resulted in an "huser" region being created with a key of "1000". If this did not happen, it is a bug. The /ReDiS_HASH region should be used in cases where ":" does not exist in the key (because the key is not necessarily an object). I can fix this along with the indicated issues (2-4).
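The object:key convention described above can be sketched as a small standalone helper. This is a hypothetical illustration of the convention only; the actual logic lives in HashInterpreter (toRegionNameByteArray / toEntryKey) and works on ByteArrayWrapper, not String.

```java
// Hypothetical sketch of the object:key convention (the real code is
// HashInterpreter). Keys containing ':' are treated as object references:
// "huser:1000" -> region "huser", entry key "1000". Keys without ':' fall
// back to the default /ReDiS_HASH region.
public class HashKeyConvention {
  public static String toRegionName(String key) {
    int idx = key.indexOf(':');
    // No ':' present: not an object, use the default hash region.
    return idx < 0 ? "ReDiS_HASH" : key.substring(0, idx);
  }

  public static String toEntryKey(String key) {
    int idx = key.indexOf(':');
    return idx < 0 ? key : key.substring(idx + 1);
  }

  public static void main(String[] args) {
    System.out.println(toRegionName("huser:1000")); // huser
    System.out.println(toEntryKey("huser:1000"));   // 1000
    System.out.println(toRegionName("counter"));    // ReDiS_HASH
  }
}
```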
@ggreen
@ggreen Redis users often use ":" to number users or other objects (the Redis docs mention this use), but Redis itself isn't aware of the "types" of data -- it just stores them as hashes that can be looked up by strings. Putting the key in a hash region may semantically correspond with the usage of "user:1000" as being an object of type user, but according to the API it's just keys to hashes. I think that having two different routes based on the type of key is unneeded complexity and extra code. I think we should do what's fastest, and add an abstraction later if we want to access data with a particular lens.
The idea of having something in Java that matches the common use case of using hashes to store objects in Redis is nice, though. If we stabilize the API on something that is fast enough, it could be a useful abstraction to access what's already stored.
@@ -442,9 +502,10 @@ protected void checkDataType(ByteArrayWrapper key, RedisDataType type) {
  return;
if (currentType == RedisDataType.REDIS_PROTECTED)
  throw new RedisDataTypeMismatchException("The key name \"" + key + "\" is protected");
if (currentType != type)
Why has this been removed?
// save map
saveMap(map, context, key);

// if (map.isEmpty()) {
Would prefer to remove code rather than comment it out.
if (currentType != type)
  throw new RedisDataTypeMismatchException(
      "The key name \"" + key + "\" is already used by a " + currentType.toString());
/*
Would prefer to remove code rather than comment it out.
 * @return the region where the command's data will be processed
 */
@SuppressWarnings("unchecked")
public static Region<ByteArrayWrapper, Map<ByteArrayWrapper, ByteArrayWrapper>> getRegion(
I think the abstraction would be cleaner if the HashInterpreter implemented setHash and getHash (and setHashKey and getHashKey) methods, and callers didn't have to think about regions at all. It would make it clearer where the single point of concern is for where and how hashes get stored.
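The suggested abstraction might look something like the following. The method names come from the review comment; the interface and the in-memory implementation are hypothetical sketches (the real code would be backed by a Geode Region and use ByteArrayWrapper, not String).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the abstraction suggested above (hypothetical; method names come
// from the review comment). Callers deal only in hash-level operations;
// where and how hashes are stored stays behind this one interface.
interface HashStore {
  Map<String, String> getHash(String key);
  void setHash(String key, Map<String, String> hash);
  String getHashKey(String key, String field);
  void setHashKey(String key, String field, String value);
}

// Minimal in-memory implementation standing in for a region-backed one.
class InMemoryHashStore implements HashStore {
  private final Map<String, Map<String, String>> backing = new HashMap<>();

  public Map<String, String> getHash(String key) {
    return backing.computeIfAbsent(key, k -> new HashMap<>());
  }

  public void setHash(String key, Map<String, String> hash) {
    backing.put(key, hash);
  }

  public String getHashKey(String key, String field) {
    return getHash(key).get(field);
  }

  public void setHashKey(String key, String field, String value) {
    getHash(key).put(field, value);
  }
}
```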
@galen-pivotal I will look to implement your recommendations. One clarification question: you are recommending that we do not create a separate region for hash objects, but store everything in /ReDiS_HASH with the region key of object:key, correct?
@ggreen yeah, I'd put everything in one region because I think it's easier to understand, and because it's much cheaper to create new hash objects in a region than it is to create new regions. Though if you want to see my hidden agenda, look ahead:
if I were to redesign data storage in the Redis adapter, I think I would do away with the separate region per type and the metadata region that just stores types, and implement the whole thing as one region that stores collections of all the types we support. During the lookup we could catch a ClassCastException if the key is the wrong type, and then we'd propagate that up as the same error Redis throws when you try to modify a key that is of the wrong type.
Storing collections as Java objects rather than via some translation to a Region means that as the objects get larger, the cost of transferring them between members in the system increases as well. Geode contains a Delta interface that I think we could use to avoid the overhead of transferring the whole object every time. Then the only downside is that we can't scale a single hash/set/list across servers, which I think is fine -- do you really need to store a list in Redis that takes more than however many gigabytes of RAM are in a single Geode instance? Some folks on the user/dev lists seem to disagree with this view, though, so take it with a bit of salt.
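The delta idea can be sketched as follows. To keep the example self-contained, it declares a local interface with the same shape as Geode's org.apache.geode.Delta (hasDelta / toDelta / fromDelta over DataOutput / DataInput); the hash class and its field names are hypothetical. The point is that only changed fields cross the wire, not the whole hash.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the shape of Geode's org.apache.geode.Delta.
interface Delta {
  boolean hasDelta();
  void toDelta(DataOutput out) throws IOException;
  void fromDelta(DataInput in) throws IOException;
}

// Hypothetical hash value that tracks which fields changed since the last
// serialization, so only those fields are shipped between members.
class DeltaHash implements Delta {
  private final Map<String, String> fields = new HashMap<>();
  private final Map<String, String> pending = new HashMap<>();

  public void put(String field, String value) {
    fields.put(field, value);
    pending.put(field, value); // remember what changed since last toDelta
  }

  public String get(String field) { return fields.get(field); }

  public boolean hasDelta() { return !pending.isEmpty(); }

  public void toDelta(DataOutput out) throws IOException {
    out.writeInt(pending.size());
    for (Map.Entry<String, String> e : pending.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeUTF(e.getValue());
    }
    pending.clear(); // delta has been shipped
  }

  public void fromDelta(DataInput in) throws IOException {
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      fields.put(in.readUTF(), in.readUTF());
    }
  }
}
```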
The other thing I'd like to look at to implement this is how we want to store and process data -- the whole ByteArrayWrapper/String thing could be simplified, but I think that's fairly orthogonal to the region and command interpreter implementation.
I don't know that much about the end users of Redis, though, so I'm curious: what are you using the Redis adapter for? What use case are you optimizing for?
@galen-pivotal: I really like the idea of a single region. The fewer the regions, the less the overhead, but for now I just limited the change to using the ReDiS_HASH region. This same region will be used for HASH data types, both objects and non-objects.
This pull request was originally based on an attempt to get some Spring Data Redis example code to work with the GemFire Redis adaptor.
Also, @wwilliams-pivotal and @hiteshk25: I believe I incorporated your observations.
I also tried to improve the integration tests for the hash related commands.
Please review my latest commit, and let me know if any additional changes are needed as a pre-requisite for accepting this pull request.
Looks good! If we get a yes vote on marking the Redis adapter as experimental, I'm all for merging this pending a few changes. I put in a few comments, mostly minor. I'll do a bit of looking into the right way to lock entries.
 * @param key the raw key
 * @return the ByteArray for the key
 */
public static ByteArrayWrapper toEntryKey(ByteArrayWrapper key) {
Can this be private?
Command command = (Command) msg;
executeCommand(ctx, command);
} catch (Exception e) {
  logger.error(e);
If we're going to be logging exceptions as they propagate, I'd like to think about the best place to catch all of them. It would also be nice to add a little error message so we know where we caught the exception.
@@ -235,6 +242,8 @@ private void executeWithoutTransaction(final Executor exec, Command command) thr
  exec.executeCommand(command, this);
  return;
} catch (Exception e) {
  logger.error(e);
see error logging comment above
Region<ByteArrayWrapper, Set<ByteArrayWrapper>> setRegion) {

  this(stringsRegion, hLLRegion, redisMetaRegion, expirationsMap, expirationExecutor,
      defaultShortcut, hashRegion, setRegion, GemFireCacheImpl.getInstance());
👍 for making the cache an argument to the constructor (even if it's optional). Anything that makes things a little more decoupled.
} else if (type == RedisDataType.REDIS_LIST) {
  return this.destroyRegion(key, type);
} else if (type == RedisDataType.REDIS_SET) {
  // remove the set
I think the comment is redundant and the returns aren't necessary (& distract from overall control flow).
A comment saying "sets & hashes get removed from the region they're put in rather than destroying a region" would be fine.
command.setResponse(Coder.getIntegerResponse(context.getByteBufAllocator(), NOT_EXISTS));
return;
}
We should check the data type here, as Redis gives a different error for a key that is of the wrong datatype than it does for a nonexistent key.
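The distinction the comment asks for can be sketched in isolation. The enum and metadata map here are hypothetical stand-ins (the adapter's real check lives in RegionProvider.checkDataType); the error message string matches the WRONGTYPE error Redis itself returns.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the check asked for above (hypothetical types; the adapter's
// actual check is RegionProvider.checkDataType). A nonexistent key and a
// wrong-typed key must take different paths, as they do in Redis itself.
public class TypeCheckDemo {
  enum RedisDataType { REDIS_STRING, REDIS_HASH, REDIS_SET }

  static final Map<String, RedisDataType> metadata = new HashMap<>();

  // Returns null for a nonexistent key (caller replies nil/0); throws for a
  // key that exists with a different type, mirroring Redis's WRONGTYPE error.
  static RedisDataType check(String key, RedisDataType expected) {
    RedisDataType current = metadata.get(key);
    if (current == null) {
      return null; // nonexistent: command replies nil or 0, not an error
    }
    if (current != expected) {
      throw new IllegalStateException(
          "WRONGTYPE Operation against a key holding the wrong kind of value");
    }
    return current;
  }

  public static void main(String[] args) {
    metadata.put("greeting", RedisDataType.REDIS_STRING);
    System.out.println(check("missing", RedisDataType.REDIS_SET)); // null
    try {
      check("greeting", RedisDataType.REDIS_SET);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```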
command.setResponse(Coder.getIntegerResponse(context.getByteBufAllocator(), NOT_MOVED));
return;
}

Object oldVal = sourceRegion.get(mem);
sourceRegion.remove(mem);
sourceSet = new HashSet<ByteArrayWrapper>(sourceSet); // copy to support transactions;
Copying helps with concurrent modifications on the same machine, but we can still lose set adds because of race conditions (A reads, B reads, A adds key "foo" to a copy, B adds key "bar" to a copy, A puts, B puts, and the key added by A is lost). I thought the txn support in ExecutionHandlerContext provided transaction support, but ExecutionHandlerContext looks like it's only for MULTI/EXEC.
I think we can do transactions in Geode by locking a particular key, or two keys if they're on the same server. However, if they aren't, then we'll just have to say "whoops!" and return an error (Redis cluster does the same thing).
Maybe just mark it as a FIXME for now, and carry on?
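The lost-update interleaving described above can be replayed deterministically in plain Java, with no Geode involved. The region here is just a map and the key name is made up; the steps follow the comment's A/B sequence exactly.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Deterministic replay of the race described above: two clients copy the
// same set, each adds a different member to its copy, and both put the copy
// back. The second put overwrites the first, silently losing "foo".
public class LostUpdateDemo {
  public static void main(String[] args) {
    Map<String, Set<String>> region = new HashMap<>();
    region.put("myset", new HashSet<>());

    Set<String> copyA = new HashSet<>(region.get("myset")); // A reads
    Set<String> copyB = new HashSet<>(region.get("myset")); // B reads
    copyA.add("foo");                                       // A adds to its copy
    copyB.add("bar");                                       // B adds to its copy
    region.put("myset", copyA);                             // A puts
    region.put("myset", copyB);                             // B puts, clobbering A

    System.out.println(region.get("myset").contains("foo")); // false: A's add is lost
    System.out.println(region.get("myset").contains("bar")); // true
  }
}
```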
(Region<ByteArrayWrapper, Boolean>) context.getRegionProvider().getRegion(key);
if (keyRegion == null || keyRegion.isEmpty()) {
  // getRegion(key);
  Region<ByteArrayWrapper, Set<ByteArrayWrapper>> region = getRegion(context);
We still need to check the metadata region to determine whether to throw WRONGTYPE or return nil.
@SuppressWarnings("unchecked")
Region<ByteArrayWrapper, Boolean> keyRegion =
    (Region<ByteArrayWrapper, Boolean>) context.getRegionProvider().getRegion(key);
I think we still need to check the metadata region here again.
Perhaps we should make getRegion check the Metadata region and throw a checked TypeException if the type is wrong?
firstSet = new HashSet<ByteArrayWrapper>(region.keySet());
}
ArrayList<Set<ByteArrayWrapper>> setList = new ArrayList<Set<ByteArrayWrapper>>();
// if (!isStorage())
This is gnarly. I'll have to look at this later.
@@ -55,9 +69,11 @@ public void executeCommand(Command command, ExecutionHandlerContext context) {
  Object oldValue;

  if (onlySetOnAbsent())
    oldValue = keyRegion.putIfAbsent(field, new ByteArrayWrapper(value));
    oldValue = map.putIfAbsent(field, new ByteArrayWrapper(value));
This will fail with ConcurrentModificationException if other connections try to access the same hash at the same time. I think we need to clone, modify the clone, and then put, which will give us eventual consistency but not ACID. (Or lock the region, or a single key in the region if that's possible.)
Use DLockService to avoid concurrency errors.
@galen-pivotal the dlock approach seems too heavy-weight for updating a key. Every hash is single-threaded through this lock. I think the granularity we want is concurrent updates across different hashes, but thread-safe updates of a particular hash.
… Redis lock service
@galen-pivotal and @metatype, please take a look at my latest change. It uses a java.util.concurrent.locks.Lock instead of a distributed lock for the hash and set synchronization.
… to correct CI issue
@galen-pivotal and @metatype, are there any suggestions on how to move this pull request forward?
@ggreen, I haven't had a lot of time to look at this lately. It looks like you checked in some compiled classes in the latest commit; we shouldn't have class files checked into the repository. I'm also having trouble reading your commit history; it's hard to tell what the merge commits did, in particular d3b2642. If you have a Review Board account, it may be easier to read the changes as a diffset, and it avoids some of the git merging-develop nastiness. It also looks like there's a concurrency bug in the LockService: getting keys out of the WeakHashMap isn't synchronized. I've got a WIP at my fork, branch …
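A per-key lock registry of the kind under discussion can be sketched as follows. The class and method names are hypothetical (the PR's RedisLockService differs in detail); the point of the comment above is that access to the WeakHashMap must itself be synchronized, since WeakHashMap is not thread-safe.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-key lock registry. Locks live in a WeakHashMap so entries
// for keys no longer referenced anywhere can be garbage-collected; every map
// access is synchronized, addressing the unsynchronized-read bug noted above.
public class KeyLockService {
  private final Map<Object, Lock> locks = new WeakHashMap<>();

  // Synchronized so concurrent lookups cannot race on the WeakHashMap.
  public synchronized Lock getLock(Object key) {
    return locks.computeIfAbsent(key, k -> new ReentrantLock());
  }

  // Convenience wrapper: run an action while holding the key's lock.
  public void runLocked(Object key, Runnable action) {
    Lock lock = getLock(key);
    lock.lock();
    try {
      action.run();
    } finally {
      lock.unlock();
    }
  }
}
```

Note the weak-key trade-off: a key's lock entry survives only while some caller still strongly references an equal key, which is exactly why the commit message later calls out the ConcurrentHashMap variant as leaking memory.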
@ggreen If you can clean up the history from where my PR was merged, this will be much easier to review. In general, even though we've added the …
Hi Team, I want to use the Geode Redis Adapter in Conductor (https://github.com/Netflix/conductor), which enables the use of other backends implementing the database access interfaces (DAO), as discussed in this link (https://netflix.github.io/conductor/extend/). Best regards,
@yhilem @ggreen @bbaynes, there is currently a feature branch https://github.com/apache/geode/tree/feature/GEODE-2444, which started splitting the Redis Adapter into its own module.
What would it take to get this PR merged?
At this point this PR is a bit out of date. To merge, all of the .class, .pdf, .fig, etc. files would need to be removed, the merge fix done, and whatever other cleanup is necessary. It also looks like there may be a concurrency bug, so that should be tested and fixed if necessary. And whatever other fixes are necessary. It's not a trivial amount of work.
I'm going to close this PR due to its age and inactivity. |
- This work optimizes and simplifies storage of hashes by NOT having a separate region for every hash. Authored-by: Jens Deppe <jdeppe@pivotal.io>
GEODE-7828: Convert backing store for Redis Hashes and Sets to single regions (#4745)

This PR builds on the work originally submitted by Greg Green in #404. Also acknowledgements to Galen O'Sullivan for addressing locking issues in that PR.

This commit changes the storage model of Redis hashes and sets from one region per Redis key to a single hash region and a single set region. The Redis key is now also the region key, and the data is stored in a Map and Set respectively in the region. Currently, the backing values do not implement Delta changes; this will be a future optimization.

This also fixes the inability of Redis keys to contain other commonly used characters, such as colons (':').

- Add `RedisLockService`, which manages a lock per key within a single JVM. Locks are held in a WeakHashMap to allow for automatic cleanup (prior PR work, using a pure ConcurrentHashMap, ended up leaking memory since there is no straightforward way to clean up unused keys/locks).
- Add new tests, including concurrency tests for hashes and sets

Co-authored-by: Greg Green <ggreen@pivotal.io>
Co-authored-by: Ray Ingles <ringles@pivotal.io>
Co-authored-by: Sarah Abbey <sabbey@pivotal.io>
Co-authored-by: John Hutchison <jhutchison@pivotal.io>
Co-authored-by: Murtuza Boxwala <mboxwala@pivotal.io>
Co-authored-by: Prasath Durairaj <prasathd@pivotal.io>
Co-authored-by: Jens Deppe <jdeppe@pivotal.io>
The updated Geode Redis Adapter now works with a sample Spring Data Redis Example
GEODE-2469.pdf
These changes are focused on the HASH and SET Redis data types, to support the Spring Data Redis sample code located at the following URL:
https://github.com/Pivotal-Data-Engineering/gemfire9_examples/tree/person_example_sdg_Tracker139498217/redis/spring-data-redis-example/src/test/java/io/pivotal/redis/gemfire/example/repository
The Hash performance changes from this pull request yielded a 99.8% performance improvement, based on the Hashes JUnit tests.
https://github.com/Pivotal-Data-Engineering/gemfire9_examples/blob/person_example_sdg_Tracker139498217/redis/gemfire-streaming-client/src/test/java/io/pivotal/gemfire9/HashesJUnitTest.java
This code executed in 12.549s against GemFire 9.0.1. After the changes, the same test executed in 0.022s with the GEODE-2469 pull request.
Redis Set related command performance had a 99.9% performance improvement.
See https://github.com/Pivotal-Data-Engineering/gemfire9_examples/blob/person_example_sdg_Tracker139498217/redis/gemfire-streaming-client/src/test/java/io/pivotal/gemfire9/SetsJUnitTest.java
The previous Set JUnit tests executed against GemFire 9.0.1 in 31.507 seconds. These same tests executed in 0.036 seconds with the GEODE-2469 pull request changes.
The GemFire 9.0.1 / Geode 1.1.0 version of the Geode Redis adapter created a Geode Region for each key provided in a Redis Hash or Set related command, and each Redis command that removed a key entry previously destroyed the region. The big performance gain comes from using the new ReDiS_HASH and ReDiS_SET regions instead. Note that the change will create or reuse an existing region with a matching name when a Redis HASH command references an object. For Redis HASH objects, the key will have the format object:key.
Please see https://redis.io/topics/data-types-intro HASH section on objects for information on Redis objects.