<?xml version="1.0" encoding="UTF-8"?>
<commit>
  <added type="array"/>
  <modified type="array">
    <modified>
      <diff>@@ -478,6 +478,9 @@ evicted_time           Seconds since the last access for the most recent item
 outofmemory            Number of times the underlying slab class was unable to
                        store a new item. This means you are running with -M or
                        an eviction failed.
+tailrepairs            Number of times we self-healed a slab with a refcount
+                       leak. If this counter is increasing a lot, please
+                       report your situation to the developers.
 
 Note this will only display information about slabs which exist, so an empty
 cache will return an empty set.</diff>
      <filename>doc/protocol.txt</filename>
    </modified>
    <modified>
      <diff>@@ -29,6 +29,7 @@ typedef struct {
     unsigned int evicted;
     rel_time_t evicted_time;
     unsigned int outofmemory;
+    unsigned int tailrepairs;
 } itemstats_t;
 
 static item *heads[LARGEST_ID];
@@ -169,7 +170,26 @@ item *do_item_alloc(char *key, const size_t nkey, const int flags, const rel_tim
         it = slabs_alloc(ntotal, id);
         if (it == 0) {
             itemstats[id].outofmemory++;
-            return NULL;
+            /* Last ditch effort. There is a very rare bug which causes
+             * refcount leaks. We've fixed most of them, but it still happens,
+             * and it may happen in the future.
+             * We can reasonably assume no item can stay locked for more than
+             * three hours, so if we find one in the tail which is that old,
+             * free it anyway.
+             */
+            tries = 50;
+            for (search = tails[id]; tries &gt; 0 &amp;&amp; search != NULL; tries--, search=search-&gt;prev) {
+                if (search-&gt;refcount != 0 &amp;&amp; search-&gt;time + 10800 &lt; current_time) {
+                    itemstats[id].tailrepairs++;
+                    search-&gt;refcount = 0;
+                    do_item_unlink(search);
+                    break;
+                }
+            }
+            it = slabs_alloc(ntotal, id);
+            if (it == 0) {
+                return NULL;
+            }
         }
     }
 
@@ -402,6 +422,8 @@ char *do_item_stats(uint32_t (*add_stats)(char *buf,
                                 &quot;%u&quot;, itemstats[i].evicted_time);
             APPEND_NUM_FMT_STAT(fmt, i, &quot;outofmemory&quot;,
                                 &quot;%u&quot;, itemstats[i].outofmemory);
+            APPEND_NUM_FMT_STAT(fmt, i, &quot;tailrepairs&quot;,
+                                &quot;%u&quot;, itemstats[i].tailrepairs);
 
             /* check whether binary protocol terminator will fit */
             if (*buflen + hdrsiz &gt; allocated) {</diff>
      <filename>items.c</filename>
    </modified>
  </modified>
  <removed type="array"/>
  <parents type="array">
    <parent>
      <id>7a5a1375cf3220f9c69a9b51ebaf56b6d7f41db4</id>
    </parent>
  </parents>
  <author>
    <name>dormando</name>
    <email>dormando@rydia.net</email>
  </author>
  <url>http://github.com/tmaesaka/memcached/commit/4ad6da605d4708bde44c24b186139c276b4020e1</url>
  <id>4ad6da605d4708bde44c24b186139c276b4020e1</id>
  <committed-date>2009-03-29T10:23:52-07:00</committed-date>
  <authored-date>2009-03-28T00:16:38-07:00</authored-date>
  <message>dumb hack to self-repair stuck slabs

since 1.2.6, most of the refcount leaks have been quashed.
I still get them in production, extremely rarely.
It's possible we'll have refcount leaks on and off even in the future.

This hack acknowledges this and exists since we want to guarantee, as much as
possible, that memcached is a stable service. Having to monitor for and
restart the service on account of &quot;rare bugs&quot; isn't acceptable.</message>
  <tree>74fd08e4de386fc51c00c5ac4b2a9037ebba346e</tree>
  <committer>
    <name>Dustin Sallings</name>
    <email>dustin@spy.net</email>
  </committer>
</commit>
