The "group_numrows" is new query option for the CouchDB HTTP View API. It provides a functionality like the "SELECT COUNT(DISTINCT column)" query of SQL.
The "group=true" query option provides a set of distinct keys in the specified view. This is similar to the "SELECT DISTINCT(column)" query.
The "group_numrows" works with the "group=true" query option, and provides the total number of distinct keys.
The number of unique group keys will be calculated like the following;
query with "group=true" [Ruby] temporary object
+-----------------------------------+ +------------------------+
| {"key":"A","value":xxxx} | ---> | results["rows"].length | ---> 12345
| {"key":"B","value":xxxx} | +------------------------+
| ... skip ... |
| {"key":"ZZZZZZZZZZ","value":xxxx} |
+-----------------------------------+
## pseudo code in ruby
json = couch.get("/example/_design/test/_view/test?group=true")
h = JSON.parse(json)
total_numrows = h["rows"].length
In this case, the "results" hash object has to store all results temporarily. It might be a problem if the number of results is quite large.
The "group_numrows=true" query returns only the number of unique keys.
query with "group_numrows=true&group=true"
+--------------------------+
| {"group_numrows":12345} |
+--------------------------+
## pseudo code in ruby
json = couch.get("/example/_design/test/_view/test?group=true&group_numrows=true")
h = JSON.parse(json)
total_numrows = h["group_numrows"]
For instance, there are almost 100K result lines (almost 3MB json string) in my linux box. The group_numrows=true operation is 8.5 times faster than calculating the length of a hash object.
This release tested on following systems.
- CouchDB 1.0.1 with Ubuntu 10.04 LTS x86_64, Debian 5 and 6 (i386, x86_64)
- CouchDB 1.0.2 with Ubuntu 10.04 LTS x86_64, Debian 5 and 6 (i386, x86_64)
- CouchDB 1.2.0 with Ubuntu 12.04 LTS x86_64, Debian 5 and 6 (i386, x86_64)
To recompile beam files are recommended from the source.
$ cd apache-couchdb-1.2.0/src/couchdb/
$ mv couch_db.hrl couch_db.hrl.orig
$ mv couch_httpd_view.erl couch_httpd_view.erl.orig
$ curl -o couch_db.hrl https://raw.github.com/YasuhiroABE/CouchDB-Group_NumRows/master/couch_db.hrl
$ curl -o couch_httpd_view.erl https://raw.github.com/YasuhiroABE/CouchDB-Group_NumRows/master/couch_httpd_view.erl
$ make
$ make install
Otherwise, replacing .beam files with .erl files is a simple way on existing system.
$ rm /usr/local/lib/couchdb/erlang/lib/couch-1.2.0/ebin/couch_db.beam
$ cp couch_db.hrl /usr/local/lib/couchdb/erlang/lib/couch-1.2.0/ebin/
$ cp couch_db.hrl /usr/local/lib/couchdb/erlang/lib/couch-1.2.0/include/
$ rm /usr/local/lib/couchdb/erlang/lib/couch-1.2.0/ebin/couch_httpd_view.beam
$ cp couch_httpd_view.erl /usr/local/lib/couchdb/erlang/lib/couch-1.2.0/ebin/
As an example, the "example" database has four document and defines the "all" view.
$ curl 'http://localhost:5984/example/_design/all/_view/all?group=true'
The result is;
{"rows":[
{"key":["bar","35"],"value":3},
{"key":["foo","25"],"value":3},
{"key":["somebody","20"],"value":8},
{"key":["yasu","32"],"value":4}
]}
To send the same request with the "group_numrows";
$curl 'http://localhost:5984/example/_design/all/_view/all?group=true&group_numrows=true'
{"group_numrows":"4"}
The operation of the "group_numrows" counts up all the number of the 'rows' array in the CouchDB by the erlang thread.
If the "limit=3" is set in the above example, the results of the group_numrows will be the '{"group_numrows":"3"}'.
The results is always same as the number of the array["rows"].length.
The "total_rows" of the _all_docs always returns the total rows, so it is the difference between "total_rows" and "group_numrows."
- English CouchDB: Implementation of SELECT COUNT(DISTINCT field)...
- Japanese CouchDBでSELECT COUNT(DISTINCT ...)をするために必要なこと
The original code is licensed under the Apache License, Version 2.0.
The partial code of the couch_db.hrl and couch_httpd_view.erl are also licensed under the Apache License, Version 2.0.
Copyright (C) 2010-2012 Yasuhiro ABE <yasu@yasundial.org>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
EOF