Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for KV datatypes (counters, sets, maps) #250

Merged
merged 6 commits into from Dec 18, 2013

Conversation

Projects
None yet
4 participants
@seancribbs
Copy link

commented Dec 11, 2013

  • Extracts top-level types into fields as described in yz_dt_extractor.
  • Defines content types that match those used in riak_kv
  • Adds fields and dynamic fields to the default schema.

TODO:

  • riak_test for indexing CRDTs
@seancribbs

This comment has been minimized.

Copy link
Author

commented Dec 12, 2013

@rzezeski I'd love a little help here, for some reason yokozuna with my branch creates an unusable cluster in riak_test. In the original directory, the dev nodes start up just fine. In my rtdev, solr bombs. Here's a console output: https://gist.github.com/7936269

@@ -32,7 +32,11 @@
{"application/json",yz_json_extractor},
{"application/xml",yz_xml_extractor},
{"text/plain",yz_text_extractor},
{"text/xml",yz_xml_extractor}]).
{"text/xml",yz_xml_extractor},

This comment has been minimized.

Copy link
@rzezeski

rzezeski Dec 16, 2013

Contributor

This list needs to be ordered as the ordsets modules is used to determine extractor.

This comment has been minimized.

Copy link
@seancribbs

seancribbs Dec 16, 2013

Author

Addressed in a0fb872

Fix some bugs.
* Extractor registrations have to be sorted.
* There's a tiny delay (about 1s) in committing to Solr, so we have to
  use rt:wait_until/1 for the queries.
* URL-escape fields and terms in search queries going over HTTP.
* Cleanup whitespace

@ghost ghost assigned coderoshi Dec 17, 2013

%% {<<"name_register">>, <<"Ryan Zezeski">>},
%% {<<"phones_set">>, <<"555-5555">>},
%% {<<"phones_set">>, <<"867-5309">>},
%% {<<"page_views_counter">>, <<"1502">>},

This comment has been minimized.

Copy link
@coderoshi

coderoshi Dec 17, 2013

Contributor

I'm curious why *_flag is boolean, but *_counter is binary?

This comment has been minimized.

Copy link
@seancribbs

seancribbs Dec 17, 2013

Author

I'm following convention of the JSON extractor. Should it be an integer?

This comment has been minimized.

Copy link
@coderoshi

coderoshi Dec 17, 2013

Contributor

Nevermind, this will work fine. I forgot we converted json numbers to binary too.

@lenary

This comment has been minimized.

Copy link

commented Dec 17, 2013

👍 seems all good to me (though I should really run the tests)

@rzezeski

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2013

Yes, please someone make sure they paste the riak_test results output in the PR. All tests but yz_schema_admin should pass.

@seancribbs

This comment has been minimized.

Copy link
Author

commented Dec 17, 2013

@rzezeski Is it that my schema changes broke yz_schema_admin or something else?

@rzezeski

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2013

@seancribbs No, caused by upgrade to solr 4.6 (#245).


-spec field_name(field_path_name(), datatype(), binary()) -> binary().
field_name(undefined, map, _Sep) ->
%% (timeforthat)

This comment has been minimized.

Copy link
@coderoshi

This comment has been minimized.

Copy link
@seancribbs

seancribbs Dec 18, 2013

Author

Got some time for that in 0a31d51

<!-- Riak datatypes embedded fields -->
<dynamicField name="*_flag" type="boolean" indexed="true" stored="true" />
<dynamicField name="*_counter" type="int" indexed="true" stored="true" />
<dynamicField name="*_register" type="string" indexed="true" stored="false" />

This comment has been minimized.

Copy link
@coderoshi

coderoshi Dec 18, 2013

Contributor

I can see why set would be not be stored, but why is register not stored? By default all dynamic fields are stored except text_general because of their size. Is register generally going to be larger than a string type?

This comment has been minimized.

Copy link
@seancribbs

seancribbs Dec 18, 2013

Author

We don't know what people will put in a register (or a set for that matter), so the size could be unbounded. Does Solr have a limit on the size of string fields? Is there an opportunity here for some documentation?

This comment has been minimized.

Copy link
@coderoshi

coderoshi Dec 18, 2013

Contributor

The difference between a string and text is that text is analyzed (stemmed, downcase, punctuation ignored). We don't store text simply because it tends to store larger values, not because of any hard limit. Since it's really about reasonable defaults, it should depend on how people tend to use registers. Are users going to be more commonly storing a few bytes like strings, or kilo/megabytes like text?

This comment has been minimized.

Copy link
@seancribbs

seancribbs Dec 18, 2013

Author

I would hope smallish values (i.e. < 1KB), but there's no hard limit. How about I switch it to stored="true" and then we plan document this particular quirk.

This comment has been minimized.

Copy link
@coderoshi

coderoshi Dec 18, 2013

Contributor

👍

This comment has been minimized.

Copy link
@seancribbs
@coderoshi

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2013

+1

@seancribbs

This comment has been minimized.

Copy link
Author

commented Dec 18, 2013

👯 WOO!

seancribbs added a commit that referenced this pull request Dec 18, 2013

Merge pull request #250 from basho/feature/sdc/crdt-extractors
Add support for KV datatypes (counters, sets, maps)

@seancribbs seancribbs merged commit 38e37ee into develop Dec 18, 2013

@seancribbs seancribbs deleted the feature/sdc/crdt-extractors branch Dec 18, 2013

@rzezeski

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2013

Where's my riak test output?

@seancribbs

This comment has been minimized.

Copy link
Author

commented Dec 18, 2013

@rzezeski Would mine be sufficient? Do you need a full run of verify.sh?

@rzezeski

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2013

@seancribbs I just want to make sure someone ran the full suite. If no one did it's not a huge deal as I will run it on develop soon enough. I know I don't always follow this rule myself but moving forward it would be nice if the result output was a prerequisite for a PR being merged.

@rzezeski

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2013

Dialyzer had two issues with new extractor. I haven't looked at them closely as I'm trying to do something else ATM.

yz_dt_extractor.erl:93: The variable Err can never match since previous clauses completely covered the type #state{}
yz_dt_extractor.erl:99: The call yz_dt_extractor:extract_map(Name::binary()) breaks the contract (field_path_name()) -> fun(({{binary(),module()},term()},state()) -> state())
@seancribbs

This comment has been minimized.

Copy link
Author

commented Dec 18, 2013

@rzezeski Thanks, will fix.

@coderoshi

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2013

@rzezeski Sorry about that. My security failure is unrelated to this

yz_wm_extract_test-bitcask   : pass
yz_stat_test-bitcask         : pass
yz_solr_start_timeout-bitcask: pass
yz_siblings-bitcask          : pass
yz_security-bitcask          : fail
yz_schema_admin-bitcask      : pass
yz_rs_migration_test-bitcask : pass
yz_pb-bitcask                : pass
yz_monitor_solr-bitcask      : pass
yz_mapreduce-bitcask         : pass
yz_languages-bitcask         : pass
yz_index_admin-bitcask       : pass
yz_fallback-bitcask          : pass
yz_errors-bitcask            : pass
yz_dt_test-bitcask           : pass
yokozuna_essential-bitcask   : pass
aae_test-bitcask             : pass
---------------------------------------------
1 Tests Failed
16 Tests Passed
@rzezeski

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2013

@coderoshi No problem, thanks for attaching the output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.