Commits on Apr 22, 2011
  1. Search more with the KetamaIterator.

    ingenthr authored and Matt Ingenthron committed Mar 23, 2011
    The existing KetamaIterator implementation, with a small number
    of nodes, may actually hit the same down node multiple times in
    a row, so it fails to find a node when it should find another
    one.
    
    The original libketama[1] hashes each server to 160 numeric
    values.  These are spread out in a 64-bit value.  The key is
    then hashed to a numeric value within that 64-bit value and
    walked forward until it finds a server.
    
    Previously, this library's ketama implementation would only look
    in the consistent hash for a number of iterations limited by the
    number of servers.  With two servers (similar to flipping a
    coin, you'd get heads twice in a row sometimes) you would have
    a 1 in 4 chance of picking the same dead server twice.
    
    The new implementation will iterate based on the number of
    servers, but attempts to keep the probability of hitting the
    same dead server to less than 1% for a two node configuration.
    This will guarantee less than 1% possibility with two or more
    servers.
    
    Because we iterate by simply prepending the number of tries to
    the key, we'll be quite random about where in the continuum we
    hit.  Each selection is rather random, but for a set of results
    already calculated, half of which are alive and half of which
    are dead, we can say that in seven iterations there is only a
    1/128 [1/(2^7)] chance that we would not select at least one
    live server.  The probability for any given try is still 1/2,
    but we can describe the probability of the whole run of
    iterations.  The key info on this came from the "gambler's
    fallacy"[2].
    
    1. https://github.com/RJ/ketama/blob/master/libketama/ketama.c
    2. http://en.wikipedia.org/wiki/Gambler's_fallacy
    
    Other references:
    http://answers.google.com/answers/threadview/id/568615.html
    http://en.wikipedia.org/wiki/Combinations
    
    Change-Id: I6fa52c0b02516b68ca8da26e4fd85bb1730b82b2
    Reviewed-on: http://review.membase.org/5207
    Reviewed-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
  2. Separate the KetamaIterator for future dynamic configuration.

    ingenthr authored and Matt Ingenthron committed Apr 1, 2011
    Some future implementations may want to have dynamic changes to
    the nodes list, so the KetamaIterator has been refactored to its
    own class so it can be replaced while a client is instantiated.
    
    Change-Id: I0c8102bf737226c054662b043661ec97907a283b
    Reviewed-on: http://review.membase.org/5206
    Reviewed-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
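The probability argument in the "Search more with the KetamaIterator" commit above can be checked with a few lines of arithmetic. This is a minimal sketch, not the library's code: the class and method names are hypothetical, and it assumes each retry independently lands on a dead node with probability 1/2, as in the two-node case the commit describes.

```java
/** Hypothetical sketch of the retry-count reasoning; not spymemcached's code. */
final class KetamaRetrySketch {

    /** Probability that {@code tries} independent picks all hit a dead node. */
    static double allDeadProbability(double deadFraction, int tries) {
        return Math.pow(deadFraction, tries);
    }

    /** Smallest number of tries whose all-dead probability is below the target. */
    static int triesNeeded(double deadFraction, double target) {
        if (deadFraction < 0.0 || deadFraction >= 1.0 || target <= 0.0) {
            throw new IllegalArgumentException("need 0 <= deadFraction < 1 and target > 0");
        }
        int tries = 1;
        while (allDeadProbability(deadFraction, tries) >= target) {
            tries++;
        }
        return tries;
    }

    /** The commit derives each retry's hash input by putting the try count in front of the key. */
    static String retryKey(int numTries, String key) {
        return numTries + key;
    }

    public static void main(String[] args) {
        System.out.println(allDeadProbability(0.5, 2)); // two tries: the old 1-in-4 case
        System.out.println(allDeadProbability(0.5, 7)); // seven tries: 1/128
        System.out.println(triesNeeded(0.5, 0.01));     // tries needed to get under 1%
        System.out.println(retryKey(3, "user:42"));     // input hashed on the fourth attempt
    }
}
```

With half the picks dead, seven tries is the first count that falls under the 1% target, which matches the 1/128 figure quoted in the commit message.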
Commits on Apr 19, 2011
  1. Fixed cancellation issue.

    ingenthr authored and Matt Ingenthron committed Mar 21, 2011
    It was found that an operation which had been canceled would
    block the rest of the queue from being processed.  The canceled
    operation needs to be removed from the queue so the other
    operations may flow.
    
    Change-Id: Ibac73fa9816855976b80fd7248b63f36eb2c1b44
    Reviewed-on: http://review.membase.org/5205
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
  2. Fixed minor comment formatting.

    ingenthr authored and Matt Ingenthron committed Feb 21, 2011
    Change-Id: Ie7256580316f08f7bff676525cead3dd872878e1
    Reviewed-on: http://review.membase.org/5204
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
  3. Added ability to see if op unsent but timedout.

    ingenthr authored and Matt Ingenthron committed Feb 21, 2011
    Change-Id: If32de603bdb597db993a22b47ffbe3367e566488
    Reviewed-on: http://review.membase.org/5203
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
  4. Fixed small log typo.

    ingenthr authored and Matt Ingenthron committed Feb 21, 2011
    Change-Id: Ie1b2deffd89e778f5ac0ec4762e73fe5b852f66a
    Reviewed-on: http://review.membase.org/5202
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
  5. Warn when redistribute cannot find another node.

    ingenthr authored and Matt Ingenthron committed Feb 21, 2011
    Change-Id: I7f4eece7d52638c92b305b0f2af35c458e57b0d3
    Reviewed-on: http://review.membase.org/5201
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
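The "Fixed cancellation issue" commit above describes a canceled operation at the head of a queue blocking everything behind it. One way to picture the fix is the sketch below. It is an assumption-laden illustration only: the `Op` type, the `cancelled` flag, and `nextLiveOp` are hypothetical names, not the client's actual classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of skipping canceled operations; not spymemcached's code. */
final class CancelledOpSketch {

    /** Stand-in for a queued operation. */
    static final class Op {
        final String name;
        volatile boolean cancelled;

        Op(String name) {
            this.name = name;
        }
    }

    /** Returns the first operation that is still live, discarding canceled ones in front of it. */
    static Op nextLiveOp(Queue<Op> queue) {
        Op head = queue.peek();
        while (head != null && head.cancelled) {
            queue.remove();      // drop the canceled op so it cannot block the rest
            head = queue.peek();
        }
        return head;             // null when nothing live is left
    }

    public static void main(String[] args) {
        Queue<Op> queue = new ArrayDeque<Op>();
        Op first = new Op("get a");
        first.cancelled = true;
        queue.add(first);
        queue.add(new Op("get b"));
        System.out.println(nextLiveOp(queue).name);
    }
}
```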
Commits on Mar 7, 2011
  1. Spring FactoryBean support.

    eranharel authored and Matt Ingenthron committed Feb 23, 2011
    Added a Spring FactoryBean for simplifying
    MemcachedClient creation in Spring applications.
    This is a patch for
    http://code.google.com/p/spymemcached/issues/detail?id=164
    
    Change-Id: Ib4051608631d976487ab8114083f6d32d35258a7
    Reviewed-on: http://review.membase.org/4752
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
Commits on Mar 2, 2011
  1. Changed transcoder logging to more appropriate defaults.

    ingenthr authored and Matt Ingenthron committed Feb 17, 2011
    Change-Id: Ia097e245b5be75926165c4e482a86c92a80b5fa0
    Reviewed-on: http://review.membase.org/4612
    Reviewed-by: Blair Zajac <blair@orcaware.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
Commits on Jan 12, 2011
  1. Catch RuntimeException instead.

    ingenthr authored and dustin committed Jan 11, 2011
    Timeouts from the get() without a time value specified will return
    simply a RuntimeException, while those from calling the get() with
    a time value can receive a TimeoutException.
    
    This also removes some debugging traces that were left in
    unfortunate places which could also cause test failures.
    
    Change-Id: Ie64aa5bedcbe36b4717c17750a63a08a7de1f12e
    Reviewed-on: http://review.membase.org/4248
    Tested-by: Matt Ingenthron <matt@northscale.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
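The distinction drawn in the "Catch RuntimeException instead" commit above, between a checked TimeoutException from a timed get and an unchecked exception from an untimed one, can be illustrated with standard-library futures. This is only a sketch under stated assumptions: `syncGet` is a hypothetical wrapper written for this example, not the client's method, and the exception messages are invented.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of the two timeout styles; not spymemcached's code. */
final class TimeoutStyleSketch {

    /** Synchronous-style wrapper: a timeout surfaces as an unchecked RuntimeException. */
    static <T> T syncGet(Future<T> future, long timeoutMillis) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new RuntimeException("Timeout waiting for value", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted waiting for value", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("Operation failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Future<String> never = new CompletableFuture<String>(); // never completed

        try {
            never.get(10, TimeUnit.MILLISECONDS);  // timed get: checked exception
        } catch (TimeoutException e) {
            System.out.println("timed get threw TimeoutException");
        }

        try {
            syncGet(never, 10);                    // wrapper: unchecked exception
        } catch (RuntimeException e) {
            System.out.println("sync get threw RuntimeException");
        }
    }
}
```

A test that exercises both call styles therefore has to catch the broader RuntimeException if it wants to treat either kind of timeout the same way.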
Commits on Jan 11, 2011
  1. Avoid potential NPE as reported by eclipse.

    dustin authored and Matt Ingenthron committed Jan 10, 2011
    Change-Id: I2bedcea366bca83597cc9863da9c63a9966eeee9
    Reviewed-on: http://review.membase.org/4232
    Tested-by: Matt Ingenthron <matt@northscale.com>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
  2. Some import cleanups.

    dustin authored and Matt Ingenthron committed Jan 10, 2011
    Change-Id: I54bdc264566684208e5273ce51d56f38d14be852
    Reviewed-on: http://review.membase.org/4231
    Tested-by: Matt Ingenthron <matt@northscale.com>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
Commits on Jan 10, 2011
  1. Fixes to testSyncGetTimeouts.

    ingenthr authored and dustin committed Dec 31, 2010
    Previous to enforcing the timeouts at an operation level, this test
    would pass.  In fact, Dustin said the test had never failed before.
    
    However, it turns out that the really short default timeouts were
    too short, and a test that did not wait a bit after encountering a
    timeout would still see timeouts.
    
    Change-Id: If1fbe77aa02f7cacabca91915927bf7b5e086284
    Reviewed-on: http://review.membase.org/4211
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
  2. Add a TIMEDOUT state to ops and make callbacks correct.

    ingenthr authored and dustin committed Jan 3, 2011
    There would be some situations where the latch would not expire
    due to the callback not having been called.  Callbacks were
    typically called on state transition for the operation, so I
    thought it appropriate to add a TIMEDOUT state.
    
    Change-Id: Ia02b5bf6a91cf987dae3fc9faf02a41751653773
    Reviewed-on: http://review.membase.org/4212
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
  3. Recognize operation may be null at times. e.g.: flush

    ingenthr authored and dustin committed Dec 6, 2010
    Both the timeout changes from myself and some of the continuous timeout
    changes from Boris assumed there would always be an operation.  In
    some cases, like flush, that is not necessarily the case.
    
    Looking at the existing code, there were lots of guards against null
    access already, so I just continued that tradition.
    
    The tradition may need to be broken though in the future.
    
    Change-Id: Ic1344ef2df2ab0ba4c03b4e401a4f98436a39772
    Reviewed-on: http://review.membase.org/4206
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
  4. Fix for stats sizes test.

    ingenthr authored and dustin committed Dec 15, 2010
    Change-Id: I4d9a13f55ec0c15ebb07c924584aa33492a57a12
    Reviewed-on: http://review.membase.org/4209
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
  5. Test fixes after adding new timeout logic.

    ingenthr authored and dustin committed Dec 30, 2010
    Several tests were expecting things to happen within 1ms,
    which is too short.  The new timeout functionality made these
    tests fail, where before they'd pass.
    
    Change-Id: I81473b25cfd4aa73c8c4473c1f337338162a0222
    Reviewed-on: http://review.membase.org/4210
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
  6. Test for timeout from operation epoch.

    ingenthr authored and dustin committed Dec 8, 2010
    Change-Id: I81530461187509026cc18e995b3ceddcc3c76afb
    Reviewed-on: http://review.membase.org/4208
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
  7. Increased default timeout to 2500ms.

    ingenthr authored and dustin committed Dec 7, 2010
    The increase of the timeout to this seemingly high value is due to
    a few findings.
    
    First, by default, garbage collection times may easily go over 1sec.
    Testing with simple toy tests shows this quite clearly, even on
    systems with lots of CPUs and a decent amount of memory.  Of course,
    much of this can be controlled with GC tuning on the JVM.  With the
    hotspot JVM, look to this whitepaper:
    http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf
    
    Testing showed the following to be particularly useful:
    -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=850
    
    There is a CPU time tradeoff for this.
    
    Even with these, testing showed some 1 second timeouts when GCs neared
    half a second.  To use this software though, we shouldn't expect people
    to have to tune the GC, so raising the default seems like the
    right thing to do.
    
    Second, many systems use spymemcached on virtualized or cloud environments.
    The processes running there do not have any guarantee of execution
    time.  It'd be really unlikely for a thread to be starved for more than
    a second, but it is possible and shouldn't make things stop.  Raising this
    a bit will help.
    
    Third, and perhaps most importantly, most people run applications on
    networks that do not offer any guarantee around response time.  An
    oversubscribed network, or even minor blips on the network, can
    cause TCP retransmissions.  While many TCP implementations ignore
    it, RFC 2988 specifies rounding up to 1sec when calculating
    TCP retransmit timeouts.  Blips will occur, and rather than force
    this seemingly synchronous get to time out, it may be better to
    just wait a bit longer by default.
    
    Change-Id: Ie53ca774458466d9a2e6f70e65ea6663699a9f6f
    Reviewed-on: http://review.membase.org/4207
    Reviewed-by: Trond Norbye <trond.norbye@gmail.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Michael Wiederhold <mike@membase.com>
    Tested-by: Dustin Sallings <dustin@spy.net>
  8. Do not write timedout operations to the MemcachedNode.

    ingenthr authored and dustin committed Nov 16, 2010
    This commit and related ones add support to an operation to
    have new methods and a state of TIMEDOUT.  The intent is to
    keep track of when an operation is created and if it either
    times out due to a latch timeout expiration or it is found
    to be already too old when thinking about sending the op
    to the network, just consider it timed out then and there.
    
    Note, object creation time is actually possibly quite a bit
    after when the request is made, depending on how that request
    is made.  Any number of things could have happened in
    between with GC, JIT, scheduling, etc.
    
    Also note that in order to avoid needing to rely on the latch
    which is in a different thread, this allows us to keep track
    of the creation time of the operation and check for whether or
    not it has timed out via the isTimedOut() method on the
    operation.
    
    Doing this correctly and with as little API change as possible
    required getting the default operation timeout down to the
    MemcachedNode level.  That information was not previously known
    to the Operation or the node, but simply relied on a latch.
    
    Change-Id: I60228433bfa121ed031dd81fc05a9d65cae5bf20
    Reviewed-on: http://review.membase.org/4204
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
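The "Do not write timedout operations to the MemcachedNode" commit above describes tracking an operation's creation time and asking the operation itself whether it has timed out. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the class name, the constructor, and passing the clock reading in as a parameter are all hypothetical, and only the `isTimedOut` name and the 2500 ms default come from the commit messages.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical operation that remembers when it was created; not the library's class. */
final class TimedOperationSketch {
    private final long createdNanos;
    private final long timeoutNanos;
    private boolean timedOut;

    TimedOperationSketch(long createdNanos, long timeoutMillis) {
        this.createdNanos = createdNanos;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    /** True once the operation is older than its timeout; in this sketch the state is sticky. */
    synchronized boolean isTimedOut(long nowNanos) {
        if (!timedOut && nowNanos - createdNanos > timeoutNanos) {
            timedOut = true;
        }
        return timedOut;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        TimedOperationSketch op = new TimedOperationSketch(start, 2500);
        long later = start + TimeUnit.MILLISECONDS.toNanos(3000);

        // A writer would check this before sending the op to the network
        // and skip the write when the op is already too old.
        System.out.println(op.isTimedOut(start));
        System.out.println(op.isTimedOut(later));
    }
}
```

Because the check uses only the operation's own timestamps, the writing thread does not need to consult the latch held by the waiting thread.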
Commits on Jan 9, 2011
  1. Tiny performance improvement.

    blair authored and Matt Ingenthron committed Nov 12, 2010
    It's OK to have the method return an interface, but use the concrete
    class name in the method so it doesn't need to invoke the methods
    through the interface.
    
    Change-Id: Ibd3638e574f9bd0c0928af5bada53de72a59e9f1
    Reviewed-on: http://review.membase.org/3641
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
Commits on Jan 7, 2011
  1. Minor performance improvement for bulk gets.

    blair authored and Matt Ingenthron committed Nov 12, 2010
    If the size of an ArrayList is known before construction, then pass
    the size to the constructor.  This will either save a tiny bit of
    memory or save reallocations, depending upon the number of elements
    that will be inserted into the ArrayList.
    
    Change-Id: If1db3a8578e2d8603e0c6dbbe781ed7258908eee
    Reviewed-on: http://review.membase.org/3640
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
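The presizing idea from the "Minor performance improvement for bulk gets" commit above looks roughly like this in isolation. It is a sketch with hypothetical names (`PresizedListSketch`, `copyKeys`), not the code the commit changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of presizing an ArrayList; not spymemcached's code. */
final class PresizedListSketch {

    /** Copies the keys into a list whose backing array is allocated once, at the final size. */
    static List<String> copyKeys(Collection<String> keys) {
        List<String> out = new ArrayList<String>(keys.size());
        for (String key : keys) {
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(copyKeys(Arrays.asList("a", "b", "c")));
    }
}
```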
Commits on Jul 18, 2010
  1. support timeout based disconnects for bulk ops

    magictoken authored and Matt Ingenthron committed Jul 1, 2010
    Refactored some repeating logic into helper methods,
    track continuous timeouts from bulk operations,
    log at warning.
    
    Change-Id: Ic0e3d14c8d7ff7001c3440683fa4274b119e4d31
    Reviewed-on: http://review.northscale.com/999
    Reviewed-by: Matt Ingenthron <matt@northscale.com>
    Tested-by: Matt Ingenthron <matt@northscale.com>
Commits on Jun 30, 2010
  1. Allow per-key transcoders to be used with asyncGetBulk().

    blair authored and dustin committed Jun 28, 2010
    This change allows the transcoder to save state for each key.  An
    example is shown in the unit test that encodes into the byte array
    sent to the memcached server the key along with the value.  Upon
    decoding a value from memcached, the actual and expected keys are
    compared.
    
    Change-Id: Ie4697bc3f9923e7c2ba981ca334b0df9d1ab7315
    Reviewed-on: http://review.northscale.com/936
    Tested-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
  2. Add an iterator that returns a single element forever.

    blair authored and dustin committed Jun 12, 2010
    This iterator will be used to add a version of asyncGetBulk() that
    allows per-key transcoders to be used.  To efficiently allow the
    normal use of multiple keys with a single transcoder, instead of
    creating an array or list of identical transcoders, other
    asyncGetBulk() methods will create a SingleElementInfiniteIterator
    instead.
    
    Change-Id: Ica58e45f3e0e49a72c7a7a8743bf9180ea9cb7ed
    Reviewed-on: http://review.northscale.com/935
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
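The "iterator that returns a single element forever" described above can be sketched in a few lines. The class below is a hypothetical illustration named `RepeatForeverIterator` to keep it distinct from the library's SingleElementInfiniteIterator, whose actual implementation may differ; the string "sharedTranscoder" is just a placeholder value.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of an iterator that yields one element forever. */
final class RepeatForeverIterator<T> implements Iterator<T> {
    private final T element;

    RepeatForeverIterator(T element) {
        this.element = element;
    }

    @Override
    public boolean hasNext() {
        return true;             // never exhausted
    }

    @Override
    public T next() {
        return element;          // always the same instance
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("cannot remove from an infinite iterator");
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("k1", "k2", "k3");
        Iterator<String> transcoders = new RepeatForeverIterator<String>("sharedTranscoder");
        // Pair every key with the same element without building a list of copies.
        for (String key : keys) {
            System.out.println(key + " -> " + transcoders.next());
        }
    }
}
```

Driving the loop from the key collection rather than from the iterator is what keeps the infinite iterator safe to use.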
Commits on Jun 26, 2010
  1. Support daemon mode for TranscodeService threads.

    ddlatham authored and dustin committed Apr 2, 2010
    Change-Id: I28b58a9e1832abedfd8e4177bb38e5cdac158bcb
    Reviewed-on: http://review.northscale.com/920
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
  2. Some minor fixes to make eclipse happy with the code again.

    dustin committed Jun 22, 2010
    - Removed @Override annotations where there are no overrides.
    - Renamed a couple of variables that were shadowing fields.
    
    Change-Id: I7e0f74da72214cbe4c72cd693ee11461138f172b
    Reviewed-on: http://review.northscale.com/919
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
Commits on Jun 17, 2010
  1. plug potential file descriptor leak

    magictoken authored and dustin committed Jun 17, 2010
    There is a problem in open/connect sequence that may produce
    file descriptor leaks in some abnormal conditions:
    First, SocketChannel.open() is called,
    then ch.connect(a.getSocketAddress())
    is called and it may throw an IOException under certain conditions.
    Then we catch the exception and re-queue the node.
    This will produce a leak since the channel was not closed.
    
    The problem surfaced because of some faulty changes that I made
    and which I fixed. But it's nevertheless a problem.
    
    Sun bug reference: http://bugs.sun.com/view_bug.do?bug_id=6548464
    They fixed it in a helper method that opens/connects in a single call.
    But in our case the client needs to take care of it.
    
    One more small unrelated change is to
    catch an unchecked exception and log an error:
    it is a serious case because the node
    will be essentially lost and never re-queued.
    
    Change-Id: I54930eb03f5c07fc6966f8d4d5db42548c63f6bd
    Reviewed-on: http://review.northscale.com:8080/630
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
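The leak described in the "plug potential file descriptor leak" commit above comes from opening a channel and then failing in connect() without closing it. A minimal sketch of the defensive pattern follows, assuming a hypothetical helper `connectOrClose`; it is not the client's connection code. The unresolved address in `main` is used only because it makes connect() fail without any network access.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;

/** Hypothetical sketch of closing a channel when connect fails; not spymemcached's code. */
final class ConnectOrCloseSketch {

    /** Starts a non-blocking connect and closes the channel if anything is thrown. */
    static SocketChannel connectOrClose(SocketChannel ch, SocketAddress addr) throws IOException {
        boolean started = false;
        try {
            ch.configureBlocking(false);
            ch.connect(addr);        // may throw IOException or an unchecked exception
            started = true;
            return ch;
        } finally {
            if (!started) {
                ch.close();          // release the file descriptor before the caller re-queues
            }
        }
    }

    public static void main(String[] args) throws IOException {
        SocketChannel ch = SocketChannel.open();
        SocketAddress bad = InetSocketAddress.createUnresolved("memcached.invalid", 11211);
        try {
            connectOrClose(ch, bad);
        } catch (RuntimeException e) {
            // connect() rejects an unresolved address with an unchecked exception
            System.out.println("connect failed, channel still open: " + ch.isOpen());
        }
    }
}
```

Using `finally` with a flag covers both the IOException case and the unchecked-exception case the commit message mentions.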
Commits on Jun 16, 2010
  1. return partial data from timed out getBulk

    magictoken authored and dustin committed Jun 15, 2010
    Augmented the Future<T> object returned from asyncGetBulk()
    with an option to return partial data if the timeout is reached.
    
    Change-Id: I3bac849cee69fd6b57b20139832193b97975f6f6
    Reviewed-on: http://review.northscale.com:8080/563
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
Commits on Jun 15, 2010
  1. Fix compilation with JDK 1.5.

    blair authored and dustin committed Jun 15, 2010
    Don't use methods and enums that only exist in JDK 1.6.
    
    The changes successfully compile and the unit tests pass with JDK 1.6,
    but with 1.5 I consistently get a unit test failure here:
    
        [junit] Testcase: testSimpleLoading took 0.265 sec
        [junit]     Caused an ERROR
        [junit] java.lang.RuntimeException: blah
        [junit] java.util.concurrent.ExecutionException: java.lang.RuntimeException: blah
        [junit]     at net.spy.memcached.internal.ImmediateFuture.<init>(ImmediateFuture.java:25)
        [junit]     at net.spy.memcached.util.CacheLoaderTest.testSimpleLoading(CacheLoaderTest.java:48)
        [junit]     at org.jmock.core.VerifyingTestCase.runBare(VerifyingTestCase.java:39)
        [junit] Caused by: java.lang.RuntimeException: blah
        [junit]
        [junit] TEST net.spy.memcached.util.CacheLoaderTest FAILED
        [junit] Tests FAILED
    The following tests failed:
    net.spy.memcached.util.CacheLoaderTest
    
    Change-Id: I34921695bec8bea5f4b8b0bace13951a41b3230a
    Reviewed-on: http://review.northscale.com:8080/561
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
Commits on Jun 12, 2010
  1. Delete a duplicate unit test.

    blair authored and dustin committed Jun 12, 2010
    Change-Id: Ibdbbf79455c2e497ae7e121bd88a4f260e24fa54
    Reviewed-on: http://review.northscale.com:8080/416
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
Commits on Jun 8, 2010
  1. Use a faster method to get a MD5 MessageDigest instance.

    blair authored and dustin committed Jun 3, 2010
    Instead of getting a fresh instance with MessageDigest.getInstance(),
    clone an existing MessageDigest that has not been updated with any
    bytes.
    
    Change-Id: If72e112e93014631767ed68d758728f372e9a7d8
    Reviewed-on: http://review.northscale.com:8080/292
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
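The clone-instead-of-lookup approach from the "Use a faster method to get a MD5 MessageDigest instance" commit above might look like the sketch below. The class and method names are hypothetical, and the fallback to getInstance() when cloning is unsupported is an assumption of this example, not something the commit message states.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of cloning a pristine MD5 digest; not spymemcached's code. */
final class Md5CloneSketch {

    /** Prototype that is never updated with bytes, so every clone starts clean. */
    private static final MessageDigest MD5_PROTOTYPE = lookupMd5();

    private static MessageDigest lookupMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns a fresh digest, preferring a clone over a provider lookup. */
    static MessageDigest newMd5() {
        try {
            return (MessageDigest) MD5_PROTOTYPE.clone();
        } catch (CloneNotSupportedException e) {
            return lookupMd5();  // assumed fallback for providers that cannot clone
        }
    }

    static byte[] md5(String text) {
        return newMd5().digest(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : md5("some-key")) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
    }
}
```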
Commits on Jun 3, 2010
  1. No need to call MessageDigest#reset() on a brand new MessageDigest.

    blair authored and dustin committed Jun 3, 2010
    Change-Id: I214f4fe0edb07c821f139b61526a2547d9772324
    Reviewed-on: http://review.northscale.com:8080/290
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: Dustin Sallings <dustin@spy.net>
Commits on Jun 2, 2010
  1. Use a private static final byte array for "\r\n" instead of always

    blair authored and dustin committed Jun 2, 2010
    converting the string into a byte array.
    
    Change-Id: I68fc5e9dea99a25e708808f14c8d0a58bd314336
    Reviewed-on: http://review.northscale.com:8080/269
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Reviewed-by: Eric Lambert <eric.d.lambert@gmail.com>
    Tested-by: Eric Lambert <eric.d.lambert@gmail.com>
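The constant-byte-array change described above avoids converting "\r\n" to bytes on every write. A small sketch of the idea, using hypothetical names (`CrlfSketch`, `commandLine`) rather than the client's own code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of reusing a constant CRLF byte array; not spymemcached's code. */
final class CrlfSketch {

    /** Encoded once at class load instead of on every command. */
    private static final byte[] CRLF = {'\r', '\n'};

    /** Builds a buffer holding the ASCII command line followed by CRLF, ready for reading. */
    static ByteBuffer commandLine(String line) {
        byte[] body = line.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buffer = ByteBuffer.allocate(body.length + CRLF.length);
        buffer.put(body).put(CRLF);
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(commandLine("get foo").remaining());
    }
}
```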
Commits on May 27, 2010
  1. Move continuous timeout counter to individual connections.

    Andrey Kartashov authored and dustin committed May 6, 2010
    Change-Id: I0d275a824017865714af23abbb0eb61418d5d116
    Reviewed-on: http://review.northscale.com:8080/25
    Reviewed-by: Eric Lambert <eric.d.lambert@gmail.com>
    Reviewed-by: Dustin Sallings <dustin@spy.net>
    Tested-by: SeongHwa Ahn <neoconnected@gmail.com>