branch: 12833-fix_unpa…
Commits on Nov 9, 2011
  1. Correctly support old-style update_seq in unpack_seqs

    Robert Newson authored
    The regexp in unpack_seqs permits - anywhere, causing a full
    old-style update_seq to match, rather than just the opaque part.
    
    This patch prevents that and adds unit tests to prove it.
    
    BugzID: 12833
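
A minimal sketch of the problem described in the commit above, assuming an old-style update_seq looks like `100-<opaque>` and a new-style one like `[100,"<opaque>"]`. The patterns and function names here are hypothetical illustrations written in Python; they are not the regexps used in unpack_seqs. Because the URL-safe base64 alphabet contains `-`, a pattern anchored only at the end can run back over the integer prefix:

```python
import re

# Hypothetical patterns for illustration only; not the ones used in fabric.

# Loose: anchored at the end only. "-" is a valid base64url character, so
# the match also swallows the integer prefix of an old-style sequence.
LOOSE = re.compile(r'([A-Za-z0-9_-]+)["\]]*$')

# Strict: one pattern per assumed shape, anchored at both ends.
NEW_STYLE = re.compile(r'^\[[0-9]+,"(?P<opaque>[A-Za-z0-9_-]+)"\]$')
OLD_STYLE = re.compile(r'^"?(?:[0-9]+-)?(?P<opaque>[A-Za-z0-9_-]+?)"?$')


def loose_opaque(packed):
    """Return whatever the loose pattern captures, or None."""
    m = LOOSE.search(packed)
    return m.group(1) if m else None


def unpack_opaque(packed):
    """Return only the opaque part for either assumed sequence shape."""
    for pattern in (NEW_STYLE, OLD_STYLE):
        m = pattern.match(packed)
        if m:
            return m.group("opaque")
    return None
```

With the loose pattern, `loose_opaque("100-g1AAAAeJzL")` captures the whole string, while `unpack_opaque` returns just `g1AAAAeJzL` for the old-style, quoted old-style, and new-style forms.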
Commits on Oct 11, 2011
  1. Adam Kocoloski
  2. Adam Kocoloski

    Support conflicts with include_docs in _changes

    kocolosk authored
    BugzID: 12725
  3. Support conflicts=true with include_docs=true

    Robert Newson authored
    Support conflicts=true when used with include_docs=true for 1.1
    compatibility.
    
    BugzID: 12725
  4. Adam Kocoloski
Commits on Oct 10, 2011
  1. Adam Kocoloski

    Merge pull request #23 from cloudant/12634-update-type-specs

    kocolosk authored
    Update -specs to reflect new realities
Commits on Oct 7, 2011
  1. Adam Kocoloski

    Update -specs to reflect new realities

    kocolosk authored
    BugzID: 12634
  2. Adam Kocoloski

    Handle errors when computing ancestry

    kocolosk authored
    BugzID: 12212
  3. Adam Kocoloski

    Merge pull request #22 from cloudant/stuck-continuous-replications

    kocolosk authored
    Preserve state during timeouts in the _changes feed
Commits on Oct 6, 2011
  1. Paul J. Davis

    Fix the changes feed.

    davisp authored
    The handling of timeout messages in fabric_view_changes:handle_message/3
    was incorrectly discarding updates to the state. This means that we
    could accidentally discard things like the handling of rexi_EXIT
    messages which would then later lead to an infinite wait in
    rexi_utils:recv/6 because we forgot that the node died.
    
    BugzID: 12706
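
The class of bug described above can be sketched as follows. This is a simplified Python illustration, not the code in fabric_view_changes: the message shapes, the `run` function, and the `keep_state_on_timeout` flag are all assumptions made for the example. The point is only that a timeout branch which falls back to an earlier state forgets that a node has exited:

```python
def run(messages, workers, keep_state_on_timeout):
    """Fold a list of (kind, node) messages over the set of live workers.

    Hypothetical model: "exit" removes a worker, "timeout" should leave the
    accumulated state untouched.
    """
    initial = frozenset(workers)
    live = initial
    for kind, node in messages:
        if kind == "exit":
            # Remember that this worker is gone.
            live = live - {node}
        elif kind == "timeout":
            if not keep_state_on_timeout:
                # Buggy variant: the accumulated state is discarded, so the
                # coordinator would keep waiting on a dead worker.
                live = initial
    return live
```

In this model the buggy variant still lists `n2` as live after its exit message, which corresponds to the infinite wait the commit describes.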
Commits on Oct 4, 2011
  1. Adam Kocoloski
Commits on Sep 28, 2011
  1. 1.1.x compatibility

    Robert Newson authored
    In 1.1.x (post 1.1.0) we have removed Style from couch_db:changes_since
    so fabric needs to not call it with that parameter now.
    
    BugzID: 12645
Commits on Sep 23, 2011
  1. Adam Kocoloski
Commits on Sep 22, 2011
  1. Adam Kocoloski

    Merge branch '12220-nodedown-handling-db', close #18

    kocolosk authored
    Conflicts:
    	src/fabric_util.erl
    
    BugzID: 12220
  2. Adam Kocoloski

    Merge pull request #20 from cloudant/12605-continuous-changes-timeouts

    kocolosk authored
    Do not skip heartbeats when DB is updated
  3. Adam Kocoloski

    Allow RPC workers to create database shard files

    kocolosk authored
    An RPC worker should only be invoked on a shard in the partition table,
    so creating the file is kosher.
  4. Adam Kocoloski

    Implement new database creation logic

    kocolosk authored
    The new logic decouples the creation of the shard files on disk from
    the creation of the mapping document in the shard database.  Most
    errors encountered during the creation of files on disk are ignored,
    though we do bubble up 'file_exists' errors to the client.  Conflicts
    encountered in the shard mapping database result in an immediate
    failure.  A matching mem3 topic branch reports those conflicts when the
    body of the revision on disk is different than the one we're trying to
    save.  We report an 'ok' when every node reports that the desired
    document body is saved to disk, an 'accepted' when a majority do so, and
    an {'error', 'internal_server_error'} otherwise.  If any file already
    exists on disk we'll send {'error', 'file_exists'} instead of 'ok' or
    'accepted'.
    
    This patch also incorporates the use of rexi_monitor to prevent database
    creations from hanging when a node is down.
    
    Thanks Bob Dionne for help with the implementation, testing, and review.
  5. Adam Kocoloski

    Implement new database deletion logic

    kocolosk authored
    This patch makes the response to a DB deletion request depend only on
    the content of the shard_db and not on the presence or absence of files
    on disk.  We respond with a 'not_found' if the local node has no entry
    for the database in its partition table cache, or if every worker fails
    to find any live leaf revision (this should be very rare, since the
    workers were generated from an in-memory cache of a live leaf revision),
    an 'ok' if all cluster nodes confirm that all leaf revisions of the
    document are deleted and at least one node actually performed a delete,
    an 'accepted' if a majority confirm the same, and an error tuple
    otherwise.
    
    The patch also uses rexi_monitor to receive information about down
    nodes.  It ensures that database deletions do not block until the
    timeout when a node is down.
    
    Thanks Bob Dionne for various implementations and reviews.
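
The response rules from the database creation commit above can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not the fabric implementation: `create_db_response` and its arguments are hypothetical, "majority" is taken to mean strictly more than half of the nodes, and 'file_exists' is assumed to replace only an 'ok' or 'accepted' result, as the commit message words it.

```python
from typing import Sequence, Tuple, Union

Response = Union[str, Tuple[str, str]]


def create_db_response(doc_saved: Sequence[bool], file_exists: bool) -> Response:
    """Map per-node results to a client response.

    doc_saved[i] is True when node i reports the mapping document saved.
    file_exists is True when any shard file already existed on disk.
    """
    total = len(doc_saved)
    saved = sum(doc_saved)
    if total > 0 and saved == total:
        outcome = "ok"
    elif saved * 2 > total:  # assumed meaning of "a majority"
        outcome = "accepted"
    else:
        return ("error", "internal_server_error")
    # file_exists is reported in place of 'ok' or 'accepted' only.
    if file_exists:
        return ("error", "file_exists")
    return outcome
```

For three nodes, two successful saves yield `"accepted"`, and any pre-existing shard file turns an otherwise successful result into `("error", "file_exists")`.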
Commits on Sep 21, 2011
  1. Adam Kocoloski

    Merge branch '12533-quorum-202', close #11

    kocolosk authored
    Conflicts:
    	src/fabric_doc_update.erl
  2. Adam Kocoloski

    Merge pull request #19 from cloudant/12003-fix-read-repair-tests

    kocolosk authored
    Fix read repair tests
  3. Adam Kocoloski

    Update tests for new repair logic

    kocolosk authored
    The new repair logic is described in 0b59b5.  Quoting it here:
    
    "... in most cases we delay responding to the client until the repair
    has completed. The only case where we still respond quickly and issue
    an asynchronous repair is if we achieved the read quorum and the only
    alternative replies that we received were ancestors of the quorum
    reply."
    
    This patch updates the tests to reflect that new reality.
  4. Adam Kocoloski

    Use meck to mock functions for sync repair

    kocolosk authored
    Read repair is a side effect of reading a document with inconsistent
    revision trees.  The repair has always crashed during testing because it
    expects a running server.  It never used to matter, since repair was
    async.  But now that we run the repair in-process the tests fail unless
    we mock a few functions.
  5. Update unit tests to match code changes

    Bob Dionne authored
    Thanks @kocolosk
    
    BugzID:12533
  6. Adam Kocoloski

    Tag individual doc results as 'error' or 'accepted'

    kocolosk authored Bob Dionne committed
    Check for any successful shard update; if one is found, tag the document
    as 'accepted', otherwise, use an arbitrary unsuccessful response.
    
    BugzID:12533
  7. Adam Kocoloski

    Track per-document quorums, respond with overall health

    kocolosk authored Bob Dionne committed
    We have a bug where documents that did not elicit any response from a
    server are assumed to be executed with the replicated_changes option.
    If that's not the case we ought to be handing out 'error' responses for
    those documents, but instead it seems we just suppress them from the
    response entirely. This patch adds a failing test.
    
    Other than that, the idea here is to have the module respond with
    'ok|accepted|error' as the overall health, and then each document gets a
    similar tag. An overall health of 'accepted' means each document was
    saved at least once, but some documents did not meet the write quorum.
    An overall health of 'error' means at least one document was not saved
    at all.
    
    BugzID:12533
  8. Send 202 if quorum not met but copy is written

    Robert Dionne authored
    Detects timeouts and returns {accepted,... up through to the chttpd
    layer, where it becomes a 202 in the case that some docs were written
    even though the quorum was not met.
    
    BugzID:12533
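
The per-document tags and the overall health described in the commits above can be sketched as follows. This is a hedged Python illustration, not the code in fabric_doc_update: the function names, the `successes_by_doc` mapping, and the write quorum parameter `w` are assumptions introduced for the example.

```python
from typing import Dict, Tuple


def tag_doc(successes: int, w: int) -> str:
    """Tag one document from its count of successful shard writes."""
    if successes >= w:
        return "ok"        # write quorum met
    if successes > 0:
        return "accepted"  # saved at least once, quorum not met
    return "error"         # not saved at all


def overall_health(successes_by_doc: Dict[str, int], w: int) -> Tuple[str, Dict[str, str]]:
    """Return (overall health, per-document tags)."""
    tags = {doc: tag_doc(n, w) for doc, n in successes_by_doc.items()}
    if "error" in tags.values():
        health = "error"     # at least one document was not saved at all
    elif "accepted" in tags.values():
        health = "accepted"  # every document saved, some below quorum
    else:
        health = "ok"
    return health, tags
```

With `w = 2`, a batch where one document was written once and another twice comes back as `"accepted"` overall, which is the case the last commit maps to a 202 at the chttpd layer.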
Commits on Sep 15, 2011
  1. Adam Kocoloski

    Do not skip heartbeats when DB is updated

    kocolosk authored
    This patch ensures that we send a heartbeat line in the case of regular
    writes to a database, each of which fails the filter condition.
    Previously we could go long intervals without sending a heartbeat if the
    wait_db_updated/1 function returned 'updated' and all of the updates
    failed the filter.  We'd never reach the five second timeout required to
    send a heartbeat inside receive_results, and we'd never send a heartbeat
    from the 'timeout' clause of the keep_sending_changes function.  These
    long intervals would cause a replication to crash.
    
    The patch causes a heartbeat to be sent on every invocation of
    wait_db_updated/1.  Big thanks to Paul Davis for deducing the cause of
    the timeouts.
    
    BugzID: 12605
  2. Adam Kocoloski

    Merge pull request #7 from cloudant/12220-improve-nodedown-handling

    kocolosk authored
    Conflicts:
    	src/fabric_doc_open.erl
    	src/fabric_doc_update.erl
  3. Adam Kocoloski

    Handle rexi_DOWN and rexi_EXIT messages

    Bob Dionne authored kocolosk committed
    The controllers now subscribe to rexi_DOWN messages via rexi_monitor.
    When a controller receives a message indicating that the rexi_server on
    a node is down it assumes all the workers that were supposed to be
    spawned by that server are dead.
    
    This patch also adds clauses to handle rexi_EXIT events in a few
    coordinators where they were missing.
    
    The db_create and db_delete handlers are not modified by this patch.
    We'll deal with them separately as we plan to make some larger-scale
    changes to the algorithm there.
    
    BugzID:12220
Commits on Sep 13, 2011
  1. Adam Kocoloski
  2. Benoit Chesneau

    Fix issue #15.

    benoitc authored
    Add support for the internal `[integer(), binary()]` form of the
    sequence in fabric_view_changes:unpack_seqs/2.
    
    Passing the internal form as the since value in fabric:changes/4 caused
    an error for some values. This patch fixes it.
Commits on Aug 29, 2011
  1. Adam Kocoloski
  2. Adam Kocoloski

    Wait for read repair before responding

    kocolosk authored
    Our philosophy up to this point has been to respond to the client as
    soon as possible and execute a repair asynchronously if one was
    warranted. This caused occasional problems where a client would receive
    a 409 Conflict response to an update request based on the Reply that we
    had just sent if the repair was too slow to execute.
    
    This patch changes the behavior so that in most cases we delay
    responding to the client until the repair has completed. The only case
    where we still respond quickly and issue an asynchronous repair is if we
    achieved the read quorum and the only alternative replies that we
    received were ancestors of the quorum reply.
    
    Thanks Bob Dionne for various implementations and review.
    
    BugzID: 12003
Commits on Aug 24, 2011
  1. Adam Kocoloski
  2. Make clustered update_seq sort correctly

    Robert Newson authored
    The CouchDB replicator gets confused when it sees an update sequence
    of "100-" follow "99-" as it's treated as a string (where CouchDB uses
    integers). This commit changes the format to [integer(), binary()],
    which sorts correctly.
    
    The revised regular expression matches both the old and new update_seq
    pattern. It is anchored at the end of the string and ignores any
    trailing quotation marks and right brackets (which are present when
    the replicator passes an update_seq as a string). It then greedily
    matches all consecutive base64 (url-encoded variant) characters.
    
    BugzID: 10986
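
The ordering problem described above is easy to reproduce. The sketch below uses Python's built-in comparison as a stand-in for the replicator's ordering, and the opaque suffix `g1AAAA` is a made-up placeholder. String comparison puts "100-" before "99-" because "1" sorts before "9", while a pair with an integer first element compares numerically:

```python
# Old format: integer and opaque part joined into one string.
old_style = ["99-g1AAAA", "100-g1AAAA"]

# New format: [integer, opaque] pairs, compared element by element.
new_style = [[99, "g1AAAA"], [100, "g1AAAA"]]

sorted_old = sorted(old_style)  # lexicographic: "100-..." comes first
sorted_new = sorted(new_style)  # numeric on the first element

print(sorted_old)
print(sorted_new)
```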