Improved _active_tasks API

Tasks are now free to set any properties they wish (as an Erlang proplist). Different tasks can have different properties and the status string doesn't exist anymore - instead client applications can build it using more granular properties from _active_tasks. Some of these properties are:

1) "progress" (an integer percentage, for all tasks)
2) "database" (for compactions and indexer tasks)
3) "design_document" (for indexer and view compaction tasks)
4) "source" and "target" (for replications)
5) "docs_read", "docs_written", "doc_write_failures", "missing_revs_found", "missing_revs_checked", "source_seq", "checkpointed_source_seq" and "continuous" for replications

BugzID: 14269
Conflicts: apps/couch/src/couch_db_updater.erl apps/couch/src/couch_rep.erl apps/couch/src/couch_task_status.erl apps/couch/src/couch_view_compactor.erl apps/couch/src/couch_view_updater.erl
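Since the status string is gone, a client has to assemble its own. The following is a minimal Python sketch of how that might look, assuming each _active_tasks entry has already been decoded into a dict. The function name `describe_task` and the wording of the output are hypothetical; only the property names come from the list above, and the task kind is guessed from which properties are present.

```python
def describe_task(task):
    """Build a status line from one decoded _active_tasks entry.

    Hypothetical helper: the task kind is inferred from which of the
    properties named in the commit message are present in the entry.
    """
    progress = task.get("progress")
    suffix = "" if progress is None else f" ({progress}%)"

    if "source" in task and "target" in task:
        # Replication entries carry source/target plus document counters.
        text = (
            f"replication {task['source']} -> {task['target']}: "
            f"read {task.get('docs_read', 0)}, "
            f"wrote {task.get('docs_written', 0)}, "
            f"{task.get('doc_write_failures', 0)} write failures"
        )
    elif "design_document" in task:
        # Indexer and view compaction entries name a design document.
        text = f"view task on {task.get('database', '?')} {task['design_document']}"
    elif "database" in task:
        # Database compaction entries only name the database.
        text = f"task on {task['database']}"
    else:
        text = "task"
    return text + suffix
```

Because tasks may set any properties they wish, the sketch reads every optional field with `.get` and falls back to a generic label rather than failing on an unknown entry.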
Previously we didn't check if an os_process was in use by a process before closing it. This ended up generating noproc errors in the couch_view_updaters which would then spider out to the couch_view_group processes causing client errors and resetting compaction. BugzId: 13798
We have observed periods of couchjs processes spiking into the hundreds and thousands for short periods of time since the new couch_proc_manager was released. Today I happened to catch one in the act and poked at couch_proc_manager's ets table. There seemed to be a few more couchjs processes with clients than I would have expected so I skimmed the code looking for a place where we didn't clear the client value (which would prevent the process from being reused so that it would eventually just time out). I found a case where if the Pid that checked out the process dies without the OS process dying, we were forgetting to clear the client in the ets table. This patch refactors the two places we return processes into a single function call which clears the OS process client.
For large numbers of os processes it's possible that we have a slowdown when requesting a new process. The old code matches all possible processes out of the table to find an appropriate candidate. We avoid the issue by using ets:select_reverse to also prefer keeping newer processes and releasing longer lived processes. Length of life is inferred from the implicit ordering of pids, with newer pids sorting larger.
When system load exceeds the ability of os_process_soft_limit to keep up with demand we enter a fork-use-kill (FUK) cycle. The constant spawning and destruction of these processes thrashes system resources and causes general instability. This patch changes the behavior from killing each process as it's returned to letting it idle for a configurable amount of time (default five minutes) which allows it to be reused by other clients. This way we can avoid adding unnecessary load when demand for couchjs processes exceeds os_process_soft_limit. As a happy benefit this should also allow os_process_soft_limit to be set much lower since the number of processes will now more closely follow actual demand (instead of provisioning for the worst case scenario). Conflicts: apps/couch/src/couch_os_process.erl apps/couch/src/couch_proc_manager.erl Conflicts: apps/couch/src/couch_os_process.erl
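The idle-instead-of-kill behavior can be illustrated with a small Python sketch. This is not the couch_proc_manager code: the class `IdlePool`, its methods, and the injected `spawn`/`kill` callbacks are all hypothetical, the soft limit itself is not modeled, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class IdlePool:
    """Toy pool that lets returned processes idle before killing them."""

    def __init__(self, spawn, kill, idle_timeout=300.0):
        self._spawn = spawn                # hypothetical: creates a new process
        self._kill = kill                  # hypothetical: destroys a process
        self._idle_timeout = idle_timeout  # seconds; 300 mirrors the five minute default
        self._idle = []                    # (returned_at, proc) pairs

    def acquire(self):
        # Reuse an idle process when one exists; only fork when none is free.
        if self._idle:
            _, proc = self._idle.pop()
            return proc
        return self._spawn()

    def release(self, proc, now):
        # Instead of killing the process on return, remember when it went idle.
        self._idle.append((now, proc))

    def reap(self, now):
        # Kill only the processes that have idled for the full timeout.
        keep = []
        for returned_at, proc in self._idle:
            if now - returned_at >= self._idle_timeout:
                self._kill(proc)
            else:
                keep.append((returned_at, proc))
        self._idle = keep
```

A process released and then requested again within the timeout is handed back without a fork or a kill, which is the cycle the patch is meant to break.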
Our current implementation for closing an LRU DB involves a full scan of a public ets table. This scan blocks all other activity in couch_server and can become a serious bottleneck when the LRU cache hit rate drops too low. In the worst-case all_dbs_active scenario we end up with O(N**2) algorithmic complexity. This patch adds a new index keyed on LRU for faster access to the least recently used databases. It also moves the ets table to a dict on the couch_server heap. The downside is an increased message rate inbound on the couch_server, as clients are no longer allowed to update the LRU data structures without sending a message. BugzID: 12879 Conflicts: apps/couch/src/couch_server.erl
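To show why an index keyed on recency removes the full scan, here is a rough Python sketch. It swaps in `collections.OrderedDict` for whatever structure the patch actually uses in Erlang, and the class `LruIndex` and its method names are hypothetical.

```python
from collections import OrderedDict


class LruIndex:
    """Toy LRU index: the least recently used name is found without a scan."""

    def __init__(self):
        # Insertion order doubles as recency order, oldest entry first.
        self._order = OrderedDict()

    def touch(self, name):
        # Record a use of `name`, moving it to the most recent position.
        self._order[name] = True
        self._order.move_to_end(name)

    def remove(self, name):
        # Forget a database that was closed for other reasons.
        self._order.pop(name, None)

    def pop_oldest(self):
        # Return and drop the least recently used name, or None if empty.
        if not self._order:
            return None
        name, _ = self._order.popitem(last=False)
        return name
```

In this sketch every `touch` has to go through the single owner of the index, which corresponds to the increased inbound message rate on couch_server described above.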
When a call is made to retrieve a specific revision, latest=true will retrieve any descendant leaves instead. This enables the replicator to better keep up with edits that occur whilst it's retrieving revisions. BugzID: 14241
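As a rough illustration of what "descendant leaves" means, the Python sketch below walks a revision tree represented as a plain parent-to-children dict. The representation, the function name `descendant_leaves`, and the choice to return the requested revision itself when it has no children are assumptions made for the example; they are not taken from the key tree module.

```python
def descendant_leaves(children, rev):
    """Return the leaf revisions at or below `rev`.

    `children` maps a revision id to the list of its child revision ids
    (hypothetical representation chosen for this sketch).
    """
    kids = children.get(rev, [])
    if not kids:
        # Assumption: a revision with no children is its own leaf.
        return [rev]
    leaves = []
    for kid in kids:
        leaves.extend(descendant_leaves(children, kid))
    return leaves
```

With a tree where "1-a" has child "2-b" and "2-b" has two conflicting children, asking for "1-a" yields both conflicting leaves rather than the stale revision.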
We were pulling a list of design documents and then ignoring the result when the #db was a partition of a clustered database. Also, the call to fabric:reset_validation_funs/1 can occasionally cause a stray rexi_EXIT message to arrive in the db_updater mailbox (and subsequently kill the server) if a worker fails. I don't think that's desired behavior, though it's a debatable point. This patch spawns a middleman process to act as a sink for those stray messages. BugzID: 13087
This patch also adds extra tests of the key tree merging logic as well as edoc-formatted documentation for the module and a few of the merge functions. Closes COUCHDB-902. Thanks Paul Davis, Bob Dionne, Klaus Trainer. git-svn-id: https://svn.apache.org/repos/asf/couchdb/trunk@1065471 13f79535-47bb-0310-9956-ffa450edef68