Abandon the idea of having an implicit lookup from bucket to index when using Yokozuna to drive search input to map-reduce.

This is a very long-winded commit message because the subject matter is confusing. Scroll to the end for the TL;DR.

Riak Search and Yokozuna can both be used as engines to produce results for a search input to a map-reduce job. For example, in pre-Yokozuna days, if a user specified the following HTTP JSON map-reduce job, Riak Search would be used to run a query against the "bucket" foo.

    "inputs":{"bucket":"foo", "query":"bar"}

Calling it a bucket is really a misnomer, because what is being searched is actually an _index_ of bucket foo. However, since Riak Search forced a 1:1 mapping from bucket to index name, there wasn't much of a difference. The possibility of an M:1 bucket-to-index mapping in the future was never considered.

Fast forward to today. There are two search systems in Riak now. Yokozuna will eventually replace Riak Search, but for the time being there will be migrations from one to the other. Furthermore, search APIs have already been exposed that assume the functionality of Riak Search, which is a small subset of Yokozuna's. Therefore, Yokozuna must work around these odd cases. In this case, when the previous HTTP JSON map-reduce job is passed to Yokozuna, the user might expect one of two behaviors:

1. A bucket has an associated index. Get the index from the mapping and run the query against that.

2. Since the input should have been an index name to begin with, and the 1:1 mapping was just a coincidence in Riak Search, simply treat the 'bucket' input as the name of the index to search against.

The first option might seem the most obvious, but what if the user decided to index multiple buckets under the same index? The results would then include results from other buckets. An implicit filter could be added to the query, but then we start introducing more implicit (magical) behavior. Furthermore, under the covers, it complicates the code.
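To make the M:1 problem concrete, here is a minimal Python sketch of the two candidate behaviors. It is not Riak or Yokozuna code: `BUCKET_TO_INDEX`, `INDEXES`, `search`, `option1_lookup` and `option2_as_index` are all hypothetical names, and the "index" is just an in-memory list of `(bucket, key, text)` documents. With two buckets sharing one index, option #1 leaks hits from the other bucket unless an implicit bucket filter is bolted on, while option #2 searches exactly the index it is given.

```python
# Hypothetical illustration only; not Riak/Yokozuna code.

# Two buckets indexed under the same index: an M:1 bucket-to-index mapping.
BUCKET_TO_INDEX = {"foo": "shared_idx", "other": "shared_idx"}

# Toy index contents: index name -> list of (bucket, key, text) documents.
INDEXES = {
    "shared_idx": [
        ("foo", "k1", "bar"),
        ("other", "k2", "bar"),
    ],
}


def search(index, term):
    """Return (bucket, key) pairs in the named index whose text contains term."""
    return [(b, k) for (b, k, text) in INDEXES.get(index, []) if term in text]


def option1_lookup(bucket, term, implicit_filter=False):
    """Option #1: resolve the bucket to its index, then query that index."""
    index = BUCKET_TO_INDEX[bucket]
    hits = search(index, term)
    if implicit_filter:
        # The extra "magic" needed to hide documents from other buckets.
        hits = [(b, k) for (b, k) in hits if b == bucket]
    return hits


def option2_as_index(name, term):
    """Option #2 (the behavior adopted here): treat the name as an index."""
    return search(name, term)


print(option1_lookup("foo", "bar"))                        # [('foo', 'k1'), ('other', 'k2')]
print(option1_lookup("foo", "bar", implicit_filter=True))  # [('foo', 'k1')]
print(option2_as_index("shared_idx", "bar"))               # [('foo', 'k1'), ('other', 'k2')]
print(option2_as_index("foo", "bar"))                      # [] (no index named foo)
```

The last call hints at the migration advice: under option #2, a legacy `"bucket":"foo"` input only keeps working if an index actually named foo exists.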
For security purposes, do we check for permission on the bucket, the index, or both?

And if that wasn't enough, things get EVEN STRANGER when you consider that you can use 2i as input to a map-reduce job with the following JSON.

    "inputs":{"bucket":"foo", "index":"field1_bin", "key":"bar"}

Hey, I know, we could overload this "inputs" field even more and convert the following into a search request.

    "inputs":{"index":"foo", "query":"bar"}

But if we chose option #1 from above, we have a problem here. The Yokozuna code needs to know whether the incoming search request is using an index name or a bucket name. If the latter, it needs to look up the index to query. But in order to do this, the Riak KV code needs to create a new data structure to pass as the "Bucket" parameter so that the Yokozuna callback knows which is which. Does this sound very confusing? I hope so, because it is!

So after a long talk with Bryan Fink, we decided to stop trying to be cute and just face the fact that "bucket" was a poor name for the input structure. Just pretend it is actually "index", and Yokozuna will treat it as such.

TL;DR: If you are migrating from Riak Search, make sure to keep a 1:1 mapping from bucket to index if you don't want to change your map-reduce input. If you are starting fresh, just use the new JSON format to feed map-reduce with Yokozuna.

    "inputs":{"index":"foo", "query":"field1:bar", "filter":"field2:baz"}

(Note that "filter" is optional.)
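The rule adopted above can be sketched as a small classifier for the overloaded "inputs" object. This is a hedged Python illustration, not the actual Riak KV parser (which is written in Erlang): `classify_inputs` and its return tuples are hypothetical, and only the JSON shapes come from this commit message. A "query" key marks a search input whose "index" (or legacy "bucket") value is used verbatim as the index name, with no bucket-to-index lookup; "bucket" plus "index" plus "key" marks a 2i input, where "index" names a 2i field rather than a search index. Any other shape is rejected in this sketch.

```python
import json


def classify_inputs(inputs):
    """Classify a map-reduce "inputs" object (hypothetical helper).

    Returns ("search", index, query, filter_or_None) or
    ("2i", bucket, index_field, key). Raises ValueError otherwise.
    """
    if "query" in inputs:
        # Search input: a legacy "bucket" value is simply treated as the
        # index name; no bucket-to-index lookup is performed.
        index = inputs.get("index", inputs.get("bucket"))
        if index is None:
            raise ValueError("search input needs 'index' (or legacy 'bucket')")
        return ("search", index, inputs["query"], inputs.get("filter"))
    if all(k in inputs for k in ("bucket", "index", "key")):
        # 2i input: here "index" names a secondary-index field.
        return ("2i", inputs["bucket"], inputs["index"], inputs["key"])
    raise ValueError("unrecognised inputs: %r" % (inputs,))


legacy = json.loads('{"bucket":"foo", "query":"bar"}')
fresh = json.loads('{"index":"foo", "query":"field1:bar", "filter":"field2:baz"}')
two_i = json.loads('{"bucket":"foo", "index":"field1_bin", "key":"bar"}')

print(classify_inputs(legacy))  # ('search', 'foo', 'bar', None)
print(classify_inputs(fresh))   # ('search', 'foo', 'field1:bar', 'field2:baz')
print(classify_inputs(two_i))   # ('2i', 'foo', 'field1_bin', 'bar')
```

Checking for "query" first is what keeps the two meanings of "index" apart: a 2i input never carries a "query" key, so the search branch can never swallow it.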