Commits on Apr 22, 2011
  1. Search more with the KetamaIterator.

    The existing KetamaIterator implementation, with a small number
    of nodes, may hit the same down node several times in a row and
    so fail to find a live node even though one exists.
    
    The original libketama[1] hashes each server to 160 numeric
    values.  These are spread out in a 64-bit value.  The key is
    then hashed to a numeric value within that 64-bit value and
    walked forward until it finds a server.
    
    Previously, this library's ketama implementation would only look
    in the consistent hash for a number of iterations limited by the
    number of servers.  With two servers (similar to flipping a
    coin, you'd get heads twice in a row sometimes) you would have
    a 1 in 4 chance of picking the same dead server twice.
    
    The new implementation will iterate based on the number of
    servers, but attempts to keep the probability of hitting the
    same dead server to less than 1% for a two node configuration.
    This will guarantee less than 1% possibility with two or more
    servers.
    
    Because we iterate by simply prepending the number of tries to
    the key, each try lands at an effectively random point on the
    continuum.  Each selection is rather random, but for a set of
    results already calculated, half of which are alive and half
    of which are dead, we can say that in seven iterations there
    is only a 1/128 [1/(2^7)] chance that we would not select at
    least one live server.  The probability for any given try is
    still 1/2, but we can describe the probability of the whole
    run of iterations.  The key info on this came
    from the "gambler's fallacy"[2].
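
    The arithmetic above can be checked with a small sketch.  The
    class and method names below are hypothetical and are not part
    of this library; the sketch assumes each try is independent and
    hits a dead node with a fixed probability.

```java
// Hypothetical helper for illustration; not part of spymemcached.
final class KetamaRetryMath {

    /** Probability that `tries` independent lookups all land on a dead node. */
    static double missProbability(double deadFraction, int tries) {
        return Math.pow(deadFraction, tries);
    }

    /** Smallest number of tries that keeps the miss probability below `limit`. */
    static int minTries(double deadFraction, double limit) {
        if (deadFraction < 0.0 || deadFraction >= 1.0 || limit <= 0.0) {
            throw new IllegalArgumentException("need 0 <= deadFraction < 1 and limit > 0");
        }
        int tries = 1;
        while (missProbability(deadFraction, tries) >= limit) {
            tries++;
        }
        return tries;
    }

    public static void main(String[] args) {
        // Two nodes, one dead: each try has a 1/2 chance of hitting the dead one.
        System.out.println(minTries(0.5, 0.01));     // prints 7
        System.out.println(missProbability(0.5, 7)); // prints 0.0078125
    }
}
```

    With half of the nodes dead, seven tries is the first count
    that drops the miss probability under 1%.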
    
    1. https://github.com/RJ/ketama/blob/master/libketama/ketama.c
    2. http://en.wikipedia.org/wiki/Gambler's_fallacy
    
    Other references:
    http://answers.google.com/answers/threadview/id/568615.html
    http://en.wikipedia.org/wiki/Combinations
    
    Change-Id: I6fa52c0b02516b68ca8da26e4fd85bb1730b82b2
    Reviewed-on: http://review.membase.org/5207
    Reviewed-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Mar 23, 2011
  2. Separate the KetamaIterator for future dynamic configuration.

    Some future implementations may want to have dynamic changes to
    the nodes list, so the KetamaIterator has been refactored to its
    own class so it can be replaced while a client is instantiated.
    
    Change-Id: I0c8102bf737226c054662b043661ec97907a283b
    Reviewed-on: http://review.membase.org/5206
    Reviewed-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Apr 1, 2011
Commits on Apr 19, 2011
  1. Fixed cancellation issue.

    It was found that an operation which had been canceled would
    block the rest of the queue from being processed.  The canceled
    operation needs to be removed from the queue so the other data
    may flow.
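
    A minimal sketch of the idea, assuming a simple FIFO queue of
    operations.  The Op interface and the nextLive() helper are
    hypothetical stand-ins, not this library's classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; the names here are not spymemcached's own.
final class CancelAwareQueue {

    /** Minimal stand-in for an operation that can be canceled. */
    interface Op {
        boolean isCancelled();
    }

    /** Discards canceled ops at the head and returns the next live op, or null. */
    static <T extends Op> T nextLive(Queue<T> queue) {
        T head = queue.peek();
        while (head != null && head.isCancelled()) {
            queue.remove();      // drop the canceled op so it cannot block the rest
            head = queue.peek();
        }
        return head;
    }

    public static void main(String[] args) {
        Queue<Op> queue = new ArrayDeque<>();
        Op canceled = () -> true;
        Op live = () -> false;
        queue.add(canceled);
        queue.add(live);
        System.out.println(nextLive(queue) == live); // prints true
    }
}
```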
    
    Change-Id: Ibac73fa9816855976b80fd7248b63f36eb2c1b44
    Reviewed-on: http://review.membase.org/5205
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Mar 21, 2011
  2. Fixed minor comment formatting.

    Change-Id: Ie7256580316f08f7bff676525cead3dd872878e1
    Reviewed-on: http://review.membase.org/5204
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Feb 21, 2011
  3. Added ability to see if an op is unsent but timed out.

    Change-Id: If32de603bdb597db993a22b47ffbe3367e566488
    Reviewed-on: http://review.membase.org/5203
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Feb 21, 2011
  4. Fixed small log typo.

    Change-Id: Ie1b2deffd89e778f5ac0ec4762e73fe5b852f66a
    Reviewed-on: http://review.membase.org/5202
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Feb 21, 2011
  5. Warn when redistribute cannot find another node.

    Change-Id: I7f4eece7d52638c92b305b0f2af35c458e57b0d3
    Reviewed-on: http://review.membase.org/5201
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Feb 21, 2011
Commits on Mar 7, 2011
  1. Spring FactoryBean support.

    Added a Spring FactoryBean to simplify
    MemcachedClient creation in Spring applications.
    This is a patch for
    http://code.google.com/p/spymemcached/issues/detail?id=164
    
    Change-Id: Ib4051608631d976487ab8114083f6d32d35258a7
    Reviewed-on: http://review.membase.org/4752
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    eranharel committed with Matt Ingenthron Feb 23, 2011
Commits on Mar 2, 2011
  1. Changed transcoder logging to more appropriate defaults.

    Change-Id: Ia097e245b5be75926165c4e482a86c92a80b5fa0
    Reviewed-on: http://review.membase.org/4612
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    ingenthr committed with Matt Ingenthron Feb 17, 2011
Commits on Jan 12, 2011
  1. Catch RuntimeException instead.

    Timeouts from get() without a time value specified will simply
    throw a RuntimeException, while those from calling get() with
    a time value can throw a TimeoutException.
    
    This also removes some debugging traces that were left in
    unfortunate places which could also cause test failures.
    
    Change-Id: Ie64aa5bedcbe36b4717c17750a63a08a7de1f12e
    Reviewed-on: http://review.membase.org/4248
    Tested-by: Matt Ingenthron <matt@northscale.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Jan 11, 2011
Commits on Jan 11, 2011
  1. Avoid potential NPE as reported by eclipse.

    Change-Id: I2bedcea366bca83597cc9863da9c63a9966eeee9
    Reviewed-on: http://review.membase.org/4232
    Tested-by: Matt Ingenthron <matt@northscale.com>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    committed with Matt Ingenthron Jan 10, 2011
  2. Some import cleanups.

    Change-Id: I54bdc264566684208e5273ce51d56f38d14be852
    Reviewed-on: http://review.membase.org/4231
    Tested-by: Matt Ingenthron <matt@northscale.com>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    committed with Matt Ingenthron Jan 10, 2011
Commits on Jan 10, 2011
  1. Fixes to testSyncGetTimeouts.

    Before timeouts were enforced at the operation level, this test
    would pass.  In fact, Dustin said the test had never failed before.
    
    However, it turns out that the really short default timeouts were
    too short, and without waiting a bit after encountering a timeout
    the test would still see timeouts.
    
    Change-Id: If1fbe77aa02f7cacabca91915927bf7b5e086284
    Reviewed-on: http://review.membase.org/4211
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Dec 31, 2010
  2. Add a TIMEDOUT state to ops and make callbacks correct.

    There would be some situations where the latch would not expire
    due to the callback not having been called.  Callbacks were
    typically called on state transition for the operation, so I
    thought it appropriate to add a TIMEDOUT state.
    
    Change-Id: Ia02b5bf6a91cf987dae3fc9faf02a41751653773
    Reviewed-on: http://review.membase.org/4212
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Jan 3, 2011
  3. Recognize operation may be null at times. e.g.: flush

    Both my timeout changes and some of the continuous timeout
    changes from Boris assumed there would always be an operation.  In
    some cases, like flush, that is not necessarily the case.
    
    Looking at the existing code, there were lots of guards against null
    access already, so I just continued that tradition.
    
    The tradition may need to be broken though in the future.
    
    Change-Id: Ic1344ef2df2ab0ba4c03b4e401a4f98436a39772
    Reviewed-on: http://review.membase.org/4206
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Dec 6, 2010
  4. Fix for stats sizes test.

    Change-Id: I4d9a13f55ec0c15ebb07c924584aa33492a57a12
    Reviewed-on: http://review.membase.org/4209
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Dec 15, 2010
  5. Test fixes after adding new timeout logic.

    Several tests were expecting things to happen within 1ms,
    which is too short.  The new timeout functionality made these
    tests fail, where before they'd pass.
    
    Change-Id: I81473b25cfd4aa73c8c4473c1f337338162a0222
    Reviewed-on: http://review.membase.org/4210
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Dec 30, 2010
  6. Test for timeout from operation epoch.

    Change-Id: I81530461187509026cc18e995b3ceddcc3c76afb
    Reviewed-on: http://review.membase.org/4208
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Dec 8, 2010
  7. Increased default timeout to 2500ms.

    The increase of the timeout to this seemingly high value is due to
    a few findings.
    
    First, by default, garbage collection times may easily go over 1sec.
    Testing with simple toy tests shows this quite clearly, even on
    systems with lots of CPUs and a decent amount of memory.  Of course,
    much of this can be controlled with GC tuning on the JVM.  With the
    hotspot JVM, look to this whitepaper:
    http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf
    
    Testing showed the following to be particularly useful:
    -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=850
    
    There is a CPU time tradeoff for this.
    
    Even with these, testing showed some 1 second timeouts when GC pauses
    neared half a second.  To use this software though, we shouldn't expect
    people to have to tune the GC, so raising the default seems like the
    right thing to do.
    
    Second, many systems use spymemcached on virtualized or cloud environments.
    The processes running there do not have any guarantee of execution
    time.  It'd be really unlikely for a thread to be starved for more than
    a second, but it is possible and shouldn't make things stop.  Raising this
    a bit will help.
    
    Third, and perhaps most importantly, most people run applications on
    networks that do not offer any guarantee around response time.  If
    the network is oversubscribed, or even if minor blips occur, the
    result can be TCP retransmissions.  While many TCP implementations ignore
    it, RFC 2988 specifies rounding up to 1sec when calculating
    TCP retransmit timeouts.  Blips will occur, and rather than force
    this seemingly synchronous get to timeout, it may be better to
    just wait a bit longer by default.
    
    Change-Id: Ie53ca774458466d9a2e6f70e65ea6663699a9f6f
    Reviewed-on: http://review.membase.org/4207
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Dec 7, 2010
  8. Do not write timedout operations to the MemcachedNode.

    This commit and related ones add new methods and a TIMEDOUT
    state to an operation.  The intent is to keep track of when
    an operation is created.  If it either times out due to a
    latch timeout expiration, or is found to be already too old
    when we are about to send the op to the network, we just
    consider it timed out then and there.
    
    Note, object creation time is actually possibly quite a bit
    after when the request is made, depending on how that request
    is made.  Any number of things could have happened in
    between with GC, JIT, scheduling, etc.
    
    Also note that in order to avoid needing to rely on the latch
    which is in a different thread, this allows us to keep track
    of the creation time of the operation and check for whether or
    not it has timed out via the isTimedOut() method on the
    operation.
    
    Doing this correctly and with as little API change as possible
    required getting the default operation timeout down to the
    MemcachedNode level.  That information was not previously known
    to the Operation or the node, but simply relied on a latch.
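
    A rough sketch of the creation-time check described above.
    The class, fields and shouldWrite() helper are hypothetical;
    only the isTimedOut() name comes from this commit message, and
    the clock is passed in explicitly to keep the sketch testable.

```java
// Hypothetical sketch; field and method names are illustrative only.
final class TimedOp {
    private final long createdNanos;
    private final long timeoutNanos;
    private volatile boolean timedOut;

    TimedOp(long createdNanos, long timeoutMillis) {
        this.createdNanos = createdNanos;
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    /** Marks the op as timed out once it is older than its timeout; the state is sticky. */
    boolean isTimedOut(long nowNanos) {
        if (!timedOut && nowNanos - createdNanos > timeoutNanos) {
            timedOut = true;
        }
        return timedOut;
    }

    /** An op that is already too old is never handed to the network layer. */
    boolean shouldWrite(long nowNanos) {
        return !isTimedOut(nowNanos);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        TimedOp op = new TimedOp(start, 2500);
        System.out.println(op.shouldWrite(start + 1_000_000L));     // prints true
        System.out.println(op.shouldWrite(start + 3_000_000_000L)); // prints false
    }
}
```

    Because the check only needs the creation time and the current
    time, it can run in the I/O thread without consulting the latch.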
    
    Change-Id: I60228433bfa121ed031dd81fc05a9d65cae5bf20
    Reviewed-on: http://review.membase.org/4204
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    ingenthr committed with Nov 16, 2010
Commits on Jan 9, 2011
  1. Tiny performance improvement.

    It's OK to have the method return an interface, but use the concrete
    class inside the method body so that calls there do not need to go
    through the interface.
    
    Change-Id: Ibd3638e574f9bd0c0928af5bada53de72a59e9f1
    Reviewed-on: http://review.membase.org/3641
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    blair committed with Matt Ingenthron Nov 12, 2010
Commits on Jan 7, 2011
  1. Minor performance improvement for bulk gets.

    If the size of an ArrayList is known before construction, then pass
    the size to the constructor.  This will either save a tiny bit of
    memory or save reallocations, depending upon the number of elements
    that will be inserted into the ArrayList.
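
    For illustration, a small sketch of the pattern.  The
    prefixAll() helper is hypothetical and unrelated to the actual
    bulk get code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical example; not the library's bulk get code.
final class PresizedList {

    /** Builds one result entry per key, sizing the list up front. */
    static List<String> prefixAll(String prefix, Collection<String> keys) {
        // The final size is known, so the backing array is allocated once.
        List<String> out = new ArrayList<>(keys.size());
        for (String key : keys) {
            out.add(prefix + key);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixAll("k:", Arrays.asList("a", "b"))); // prints [k:a, k:b]
    }
}
```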
    
    Change-Id: If1db3a8578e2d8603e0c6dbbe781ed7258908eee
    Reviewed-on: http://review.membase.org/3640
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    blair committed with Matt Ingenthron Nov 12, 2010
Commits on Jul 18, 2010
  1. support timeout based disconnects for bulk ops

    Refactored some repeating logic into helper methods,
    track continuous timeouts from bulk operations,
    log at warning.
    
    Change-Id: Ic0e3d14c8d7ff7001c3440683fa4274b119e4d31
    Reviewed-on: http://review.northscale.com/999
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
    magictoken committed with Matt Ingenthron Jul 1, 2010
Commits on Jun 30, 2010
  1. Allow per-key transcoders to be used with asyncGetBulk().

    This change allows the transcoder to save state for each key.  An
    example is shown in the unit test that encodes into the byte array
    sent to the memcached server the key along with the value.  Upon
    decoding a value from memcached, the actual and expected keys are
    compared.
    
    Change-Id: Ie4697bc3f9923e7c2ba981ca334b0df9d1ab7315
    Reviewed-on: http://review.northscale.com/936
    Tested-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    blair committed with Jun 28, 2010
  2. Add an iterator that returns a single element forever.

    This iterator will be used to add a version of asyncGetBulk() that
    allows per-key transcoders to be used.  To efficiently allow the
    normal use of multiple keys with a single transcoder, instead of
    creating an array or list of identical transcoders, other
    asyncGetBulk() methods will create a SingleElementInfiniteIterator
    instead.
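
    A sketch of what such an iterator might look like.  This is an
    illustration under the assumption that it only needs to satisfy
    java.util.Iterator; the class name is hypothetical and this is
    not the source of SingleElementInfiniteIterator.

```java
import java.util.Iterator;

// Hypothetical sketch of an iterator that yields one element forever.
final class RepeatingIterator<T> implements Iterator<T> {
    private final T element;

    RepeatingIterator(T element) {
        this.element = element;
    }

    @Override
    public boolean hasNext() {
        return true; // never exhausted
    }

    @Override
    public T next() {
        return element;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("cannot remove from a repeating iterator");
    }

    public static void main(String[] args) {
        // One shared value paired with every key, without building a list of copies.
        Iterator<String> shared = new RepeatingIterator<>("sharedTranscoder");
        for (String key : new String[] {"a", "b", "c"}) {
            System.out.println(key + " -> " + shared.next());
        }
    }
}
```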
    
    Change-Id: Ica58e45f3e0e49a72c7a7a8743bf9180ea9cb7ed
    Reviewed-on: http://review.northscale.com/935
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    blair committed with Jun 12, 2010
Commits on Jun 26, 2010
  1. Support daemon mode for TranscodeService threads.

    Change-Id: I28b58a9e1832abedfd8e4177bb38e5cdac158bcb
    Reviewed-on: http://review.northscale.com/920
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    ddlatham committed with Apr 2, 2010
  2. Some minor fixes to make eclipse happy with the code again.

    - Removed @Override annotations where there are no overrides.
    - Renamed a couple of variables that were shadowing fields.
    
    Change-Id: I7e0f74da72214cbe4c72cd693ee11461138f172b
    Reviewed-on: http://review.northscale.com/919
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    committed Jun 22, 2010
Commits on Jun 17, 2010
  1. plug potential file descriptor leak

    There is a problem in the open/connect sequence that may produce
    file descriptor leaks in some abnormal conditions.
    First, SocketChannel.open() is called;
    then ch.connect(a.getSocketAddress())
    is called, and it may throw an IOException under certain conditions.
    We then catch the exception and re-queue the node.
    This produces a leak since the channel was never closed.
    
    The problem surfaced because of some faulty changes that I made
    and which I fixed. But it's nevertheless a problem.
    
    Sun bug reference: http://bugs.sun.com/view_bug.do?bug_id=6548464
    They fixed it in a helper method that opens/connects in a single call.
    But in our case the client needs to take care of it.
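
    One way the client can take care of it, sketched under the
    assumption that the caller opens the channel itself.  The
    connectOrClose() helper is hypothetical and is not the change
    made in this commit.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical helper; not the library's connection code.
final class SafeConnect {

    /** Connects the channel, closing it if connect() fails so the descriptor is released. */
    static boolean connectOrClose(SocketChannel ch, SocketAddress addr) throws IOException {
        try {
            return ch.connect(addr);
        } catch (IOException | RuntimeException e) {
            try {
                ch.close(); // release the file descriptor before propagating the failure
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        SocketChannel ch = SocketChannel.open();
        try {
            // An unresolved address makes connect() fail without touching the network.
            connectOrClose(ch, InetSocketAddress.createUnresolved("example.invalid", 11211));
        } catch (UnresolvedAddressException e) {
            System.out.println(ch.isOpen()); // prints false
        }
    }
}
```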
    
    One more small unrelated change is to
    catch an unchecked exception and log an error:
    it is a serious case because the node
    will be essentially lost and never re-queued.
    
    Change-Id: I54930eb03f5c07fc6966f8d4d5db42548c63f6bd
    Reviewed-on: http://review.northscale.com:8080/630
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    magictoken committed with Jun 17, 2010
Commits on Jun 16, 2010
  1. return partial data from timed out getBulk

    Augmented the Future<T> object returned from asyncGetBulk
    with an option to return partial data in case the timeout is reached.
    
    Change-Id: I3bac849cee69fd6b57b20139832193b97975f6f6
    Reviewed-on: http://review.northscale.com:8080/563
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    magictoken committed with Jun 15, 2010
Commits on Jun 15, 2010
  1. Fix compilation with JDK 1.5.

    Don't use methods and enums that only exist in JDK 1.6.
    
    The changes successfully compile and the unit tests pass with JDK 1.6,
    but with 1.5 I consistently get a unit test failure here:
    
        [junit] Testcase: testSimpleLoading took 0.265 sec
        [junit] 	      Caused an ERROR
        [junit] java.lang.RuntimeException: blah
        [junit] java.util.concurrent.ExecutionException: java.lang.RuntimeException: blah
        [junit] 					     at net.spy.memcached.internal.ImmediateFuture.<init>(ImmediateFuture.java:25)
        [junit] 					     at net.spy.memcached.util.CacheLoaderTest.testSimpleLoading(CacheLoaderTest.java:48)
        [junit] 					     at org.jmock.core.VerifyingTestCase.runBare(VerifyingTestCase.java:39)
        [junit] Caused by: java.lang.RuntimeException: blah
        [junit]
        [junit] TEST net.spy.memcached.util.CacheLoaderTest FAILED
        [junit] Tests FAILED
    The following tests failed:
    net.spy.memcached.util.CacheLoaderTest
    
    Change-Id: I34921695bec8bea5f4b8b0bace13951a41b3230a
    Reviewed-on: http://review.northscale.com:8080/561
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    blair committed with Jun 15, 2010
Commits on Jun 12, 2010
  1. Delete a duplicate unit test.

    Change-Id: Ibdbbf79455c2e497ae7e121bd88a4f260e24fa54
    Reviewed-on: http://review.northscale.com:8080/416
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    blair committed with Jun 12, 2010
Commits on Jun 8, 2010
  1. Use a faster method to get a MD5 MessageDigest instance.

    Instead of getting a fresh instance with MessageDigest.getInstance(),
    clone an existing MessageDigest that has not been updated with any
    bytes.
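
    The cloning approach can be sketched as follows.  The class
    and method names are hypothetical; the sketch assumes the
    platform's MD5 implementation supports clone() and fails
    loudly if it does not.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; not the library's hashing code.
final class Md5Cloner {
    // Pristine template: never updated, only cloned.
    private static final MessageDigest MD5_TEMPLATE;

    static {
        try {
            MD5_TEMPLATE = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not supported", e);
        }
    }

    /** Hashes data with a fresh clone of the template digest. */
    static byte[] md5(byte[] data) {
        try {
            MessageDigest md = (MessageDigest) MD5_TEMPLATE.clone();
            return md.digest(data);
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException("MD5 digest cannot be cloned", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5(new byte[0]).length); // prints 16
    }
}
```

    Because the template is never fed any bytes, every clone starts
    from the same empty state that getInstance() would provide.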
    
    Change-Id: If72e112e93014631767ed68d758728f372e9a7d8
    Reviewed-on: http://review.northscale.com:8080/292
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    blair committed with Jun 3, 2010
Commits on Jun 3, 2010
  1. No need to call MessageDigest#reset() on a brand new MessageDigest.

    Change-Id: I214f4fe0edb07c821f139b61526a2547d9772324
    Reviewed-on: http://review.northscale.com:8080/290
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
    blair committed with Jun 3, 2010
Commits on Jun 2, 2010
  1. Use a private static final byte array for "\r\n" instead of always

    converting the string into a byte array.
    
    Change-Id: I68fc5e9dea99a25e708808f14c8d0a58bd314336
    Reviewed-on: http://review.northscale.com:8080/269
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Eric Lambert <eric.d.lambert@gmail.com>
    Tested-by: Eric Lambert <eric.d.lambert@gmail.com>
    blair committed with Jun 2, 2010
Commits on May 27, 2010
  1. Move continuous timeout counter to individual connections.

    Change-Id: I0d275a824017865714af23abbb0eb61418d5d116
    Reviewed-on: http://review.northscale.com:8080/25
    Reviewed-by: Eric Lambert <eric.d.lambert@gmail.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: SeongHwa Ahn <neoconnected@gmail.com>
    Andrey Kartashov committed with May 6, 2010