Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix builtin _sum reduce function #1574

Merged
merged 1 commit into from Aug 23, 2018
Merged

Fix builtin _sum reduce function #1574

merged 1 commit into from Aug 23, 2018

Conversation

@davisp
Copy link
Member

@davisp davisp commented Aug 22, 2018

The builting _sum reduce function has no protection against overflowing
reduce values. Users can emit objects with enough unique keys to cause
the builtin _sum to create objects that are exceedingly large in the
inner nodes of the view B+Tree.

This change adds the same logic that applies to JavaScript reduce
functions to check if a reduce function is properly reducing its input.

Testing recommendations

make check

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;
@davisp
Copy link
Member Author

@davisp davisp commented Aug 22, 2018

Trivially reproduced with this Python script:

I wanted to get the PR out but I'm also going to try and contemplate how to test this a bit more directly.

#!/usr/bin/env python

import base64
import json
import os
import random

import requests

S = requests.session()
S.auth = ("adm", "pwd")
S.headers["Content-Type"] = "application/json"
BASE = "http://127.0.0.1:15986/"
DB = BASE + "foo"
DDOC = DB + "/_design/bar"
VIEW = DDOC + "/_view/bam"
SIG = "1d2f27b7abd18837599f078927c7bc1f"


def gen_key():
    data = os.urandom(50)
    return base64.b64encode(data)


def gen_device():
    return {
        "tx_bytes": random.randint(1, 8192),
        "rx_bytes": random.randint(1, 8192)
    }


def gen_data():
    ret = {}
    for i in range(1024):
        ret[gen_key()] = gen_device()
    return ret


def load_docs():
    for i in range(25):
        print "Generating: %d - %d" % (i*100, (i+1)*100)
        docs = []
        for j in range(100):
            docs.append({
                "_id": "%06d" % ((i*100) + j),
                "data": gen_data()
            })
        body = json.dumps({"docs": docs})
        r = S.post(DB + "/_bulk_docs", data=body)
        r.raise_for_status()


def load_ddoc():
    ddoc = {
        "language": "javascript",
        "views": {
            "bam": {
                "map": """
                    function(doc) {
                        emit(doc._id, doc.data);
                    }""",
                #"reduce": "_sum"
                "reduce": """
                    function(keys, values) {
                        var obj = {};
                        for(var v in values) {
                            for(var k in values[v]) {
                                log(k);
                                obj[k] = values[v][k];
                            }
                        }
                        return obj;
                    }
                """
            }
        }
    }
    body = json.dumps(ddoc)
    r = S.put(DB + "/_design/bar", data=body)
    r.raise_for_status()


def get_size():
    fname = "dev/lib/node1/data/.foo_design/mrview/%s.view" % SIG
    return os.stat(fname).st_size


def get_task():
    r = S.get(BASE + "/_active_tasks")
    for t in r.json():
        return t

def main():
    S.delete(DB)
    r = S.put(DB)
    r.raise_for_status()
    load_docs()
    load_ddoc()

    r = S.get(VIEW, params={"limit": 0})
    r.raise_for_status()

    print "Initial build: %d" % get_size()

    r = S.post(DDOC + "/_compact")
    r.raise_for_status()

    print get_task()


if __name__ == "__main__":
    main()
@davisp
Copy link
Member Author

@davisp davisp commented Aug 22, 2018

@nickva pointed me to the module tests which I'll add tonight or tomorrow before this is merged.

{<<"error">>, <<"builtin_reduce_error">>},
{<<"reason">>, Msg}
]};
"log" when OutSize > 4096 andalso OutSize > 2 * InSize ->

This comment has been minimized.

@nickva

nickva Aug 23, 2018
Contributor

Use Overflowed variable here as well, it also the updated correct logic for overflow.

To double check:

reduce_length > 4096 && ((reduce_length * 2) > input_length)) {

This comment has been minimized.

@davisp

davisp Aug 23, 2018
Author Member

Ah, good catch. I refactored that out intending to have it both places.

@davisp davisp force-pushed the fix-builtin-sum branch 2 times, most recently from c97121d to 84505f0 Aug 23, 2018
@nickva
nickva approved these changes Aug 23, 2018
Copy link
Contributor

@nickva nickva left a comment

With log option on see error log lines but view completes (thought had to reduce iterations to only 2 batches in python script).

With error see:

{
    "rows": [
        {
            "key": null,
            "value": {
                "error": "builtin_reduce_error",
                "reason": "Reduce output must shrink more rapidly: input size: 282351 output size: 282249"
            }
        }
    ]
}

as expected and also see log error lines like:

[error] 2018-08-23T18:46:42.366773Z node1@127.0.0.1 <0.1549.0> -------- Reduce output must shrink more rapidly: input size: 282387 output size: 282285

EUnit tests pass:

======================== EUnit ========================
module 'couch_query_servers_tests'
  Test overflow detection in the _sum reduce function
    couch_query_servers_tests: sum_overflow_test_...[0.021 s] ok
    couch_query_servers_tests: sum_overflow_test_...[0.006 s] ok
    couch_query_servers_tests: sum_overflow_test_...[0.007 s] ok
    [done in 0.069 s]
  [done in 0.206 s]
=======================================================
  All 3 tests passed.
The builting _sum reduce function has no protection against overflowing
reduce values. Users can emit objects with enough unique keys to cause
the builtin _sum to create objects that are exceedingly large in the
inner nodes of the view B+Tree.

This change adds the same logic that applies to JavaScript reduce
functions to check if a reduce function is properly reducing its input.
@davisp davisp force-pushed the fix-builtin-sum branch from 84505f0 to 2d141af Aug 23, 2018
@davisp davisp merged commit d2d2261 into master Aug 23, 2018
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@nickva nickva deleted the fix-builtin-sum branch Dec 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants