Simply treat Bucket as Index

Abandon the idea of having an implicit lookup from bucket to index
when using Yokozuna to drive search input to map-reduce.

This is a very long-winded commit message because the subject matter
is confusing.  Scroll to end for TL;DR.

Riak Search and Yokozuna can both be used as engines to produce
results for a search input to a map-reduce job.  For example, in
pre-Yokozuna days, if a user specified the following HTTP JSON
map-reduce job, Riak Search would be used to run a query against the
"bucket" foo.

"inputs":{"bucket":"foo", "query":"bar"}

Calling it bucket is really a misnomer, because what is being searched
is actually an _index_ of bucket foo.  However, since Riak Search
forced a 1:1 mapping from bucket to index name, there wasn't much of a
difference.  The possibility of an M:1 bucket-to-index mapping in the
future was never considered.

Fast forward to today.  There are two search systems in Riak now.
Yokozuna will eventually replace Riak Search, but for the time being
there will be migrations from one to the other.  Furthermore, Search
APIs have already been exposed that assume the functionality of Riak
Search, which is a small subset of Yokozuna's.  Therefore, Yokozuna
must work around these weird cases.

In this case, if the previous HTTP JSON map-reduce job is passed to
Yokozuna, the user might expect one of two behaviors.

1. A bucket has an associated index.  Get the index from the mapping
and run the query against that.

2. Since the input should have been an index name to begin with, and
the 1:1 mapping was just a coincidence in Riak Search, simply treat
the 'bucket' input as the index name to search against.

The first option might seem like the most obvious, but what if the
user decided to index multiple buckets under the same index?  The
results would then include matches from the other buckets.  An
implicit filter could be added to the query, but now we've started
introducing more implicit (magical) behavior.  Furthermore, under the
covers, it complicates the code.  For security purposes, do we check
for permission on the bucket, the index, or both?  As if that weren't
enough, things get EVEN STRANGER when you consider that you can use 2i
as input to a map-reduce job with the following JSON.

"inputs":{"bucket":"foo", "index":"field1_bin", "key":"bar"}

Hey, I know, we could overload this "inputs" field even more and
convert the following into a search request.

"inputs":{"index":"foo", "query":"bar"}

But if we choose option #1 from above, we have a problem here.  The
Yokozuna code needs to know whether the incoming search request is
using an index name or a bucket name.  If the latter, it needs to look
up the index to query.  But to do this, the Riak KV code needs to
create a new data structure to pass as the "Bucket" parameter so that
the Yokozuna callback knows which is which.  Does this sound very
confusing?  I hope so, because it is!

So after a long talk with Bryan Fink, we decided to stop trying to be
cute and just face the fact that "bucket" was a poor name for the
input structure.  Just pretend it is actually "index" and Yokozuna
will treat it as such.

TL;DR

If you are migrating from Riak Search, make sure to keep a 1:1
bucket-to-index mapping, with each index named after its bucket, if
you don't want to change your map-reduce input.

If you are starting fresh, just use the new JSON format to feed
map-reduce with Yokozuna.

"inputs":{"index":"foo", "query":"field1:bar", "filter":"field2:baz"}

(Note that "filter" is optional.)
rzezeski committed Sep 25, 2013
1 parent 56162e0 commit 1cd7f96
Showing 1 changed file with 2 additions and 15 deletions.
src/riak_kv_mapred_json.erl: 17 changes (2 additions, 15 deletions)
@@ -179,23 +179,10 @@ is_search_input(_) -> false.

 parse_search_input(Inputs) ->
     Bucket = proplists:get_value(<<"bucket">>, Inputs),
-    Index = proplists:get_value(<<"index">>, Inputs),
     Query = proplists:get_value(<<"query">>, Inputs),
     Filter = proplists:get_value(<<"filter">>, Inputs, []),
-    Group = case Index of
-                undefined ->
-                    Bucket;
-                _ ->
-                    %% this is mochijson2 format, which is what the user
-                    %% would pass in if they used the modfun format:
-                    %% {
-                    %%   "module":...,
-                    %%   "function":...,
-                    %%   "arg":[{"index":Index},Query,...filter...]
-                    %% }
-                    {struct,[{<<"index">>,Index}]}
-            end,
-    {ok, {search, Group, Query, Filter}}.
+    Index = proplists:get_value(<<"index">>, Inputs, Bucket),
+    {ok, {search, Index, Query, Filter}}.

 %% Allowed forms:
 %% {"input":{"bucket":BucketName, "index":IndexName, "key":SecondaryKey},
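To make the resulting behavior concrete, here is a minimal stand-alone sketch of the post-commit parsing logic, run over the "inputs" shapes discussed in the commit message. The module name search_input_sketch and the examples/0 helper are hypothetical, and the sketch assumes the decoded "inputs" object arrives as a proplist with binary keys, as the proplists:get_value calls in the diff imply. It is not the real riak_kv_mapred_json module, which contains other input handling (e.g. is_search_input/1) not reproduced here.

```erlang
-module(search_input_sketch).
-export([parse_search_input/1, examples/0]).

%% Hypothetical stand-alone copy of the post-commit logic. Inputs is
%% assumed to be the decoded "inputs" object as a proplist with binary
%% keys, e.g. [{<<"index">>, <<"foo">>}, {<<"query">>, <<"bar">>}].
parse_search_input(Inputs) ->
    Bucket = proplists:get_value(<<"bucket">>, Inputs),
    Query = proplists:get_value(<<"query">>, Inputs),
    Filter = proplists:get_value(<<"filter">>, Inputs, []),
    %% An explicit "index" wins; otherwise the legacy "bucket" value is
    %% used verbatim as the index name. No bucket-to-index lookup happens.
    Index = proplists:get_value(<<"index">>, Inputs, Bucket),
    {ok, {search, Index, Query, Filter}}.

%% Runs the input shapes from the commit message through the parser.
examples() ->
    %% Legacy Riak Search form: "bucket" is treated as the index name.
    Legacy = [{<<"bucket">>, <<"foo">>}, {<<"query">>, <<"bar">>}],
    %% New Yokozuna form with the optional "filter".
    New = [{<<"index">>, <<"foo">>},
           {<<"query">>, <<"field1:bar">>},
           {<<"filter">>, <<"field2:baz">>}],
    %% Hypothetical input carrying both keys, to show the precedence.
    Both = [{<<"bucket">>, <<"foo">>},
            {<<"index">>, <<"other">>},
            {<<"query">>, <<"bar">>}],
    [parse_search_input(I) || I <- [Legacy, New, Both]].
```

Calling search_input_sketch:examples() returns [{ok,{search,<<"foo">>,<<"bar">>,[]}}, {ok,{search,<<"foo">>,<<"field1:bar">>,<<"field2:baz">>}}, {ok,{search,<<"other">>,<<"bar">>,[]}}]. Because the legacy form reaches Yokozuna with the bucket name standing in for the index name, a migrated setup presumably keeps working only if an index with that same name exists, which is what the TL;DR advises.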

