Skip to content

micro optimization #101

Merged
merged 4 commits into from Apr 18, 2012

2 participants

@rzezeski

This PR includes two very small micro optimizations. More importantly it cleans up code around extracting scoring properties and exposes preflist calculation which is handy for debugging/profiling at the console.

rzezeski added some commits Mar 2, 2012
@rzezeski rzezeski Remove trailing space 7a7b6e3
@rzezeski rzezeski Add better scoring property extraction
This avoids calling `lists:flatten` on unnecessairly deeps lists.  It
also has the added bonus on keeping op logic where it belongs -- in
the ops themselves.

Below are the relevant fprof results.  Basically, in the case tested
(a single term query, which is best case for this function), the new
code cut the number of function calls to a third of the original.

    {ok, Ops} = RC:parse_query(<<"tweets">>, <<"created_at:20100113T032128">>).
    fprof:apply(RC, get_scoring_info, [Ops], [start]).

Before
------

%                                               CNT       ACC       OWN
[{ totals,                                      152,    3.082,    3.082}].  %%%

{[{{riak_search_client,get_scoring_info2,2},      1,    2.599,    0.027}],
 { {riak_search_client,get_scoring_props,2},      1,    2.599,    0.027},     %
 [{{lists,flatten,1},                             1,    1.289,    0.008},
  {{riak_search_client,get_scoring_props_1,2},    1,    1.283,    0.005}]}.

After
-----

%                                               CNT       ACC       OWN
[{ totals,                                       48,    0.694,    0.667}].  %%%

{[{{fprof,apply_start_stop,4},                    1,    0.672,    0.038}],
 { {riak_search_client,get_scoring_info,2},       1,    0.672,    0.038},     %
 [{{riak_search_op,extract_scoring_props,1},      1,    0.522,    0.012},
  {{lists,flatten,1},                             1,    0.064,    0.018},
  {{riak_search_client,'-get_scoring_info/2-lc$^0/1-0-',1},   1,    0.025,    0.011},
  {{lists,foldl,3},                               1,    0.015,    0.004},
  {{lists,sum,1},                                 1,    0.007,    0.002},
  {{math,pow,2},                                  1,    0.001,    0.001}]}.
6a8e211
@rzezeski rzezeski Don't call `ensure_loaded`
Search nodes shouldn't see ops they don't understand.  They code with
a mixed cluster but this code won't help.  It's cost, added latency
per request, is not worth what it provides.
6f418c9
@rzezeski rzezeski Centralize and expose preflist calculation
This will help when testing/debugging/benchmarking.  You need this
info if you want to manually target a specific index instance.
7239503
@rzezeski rzezeski was assigned Mar 29, 2012
@russelldb russelldb was assigned Apr 18, 2012
@russelldb russelldb commented on the diff Apr 18, 2012
src/riak_search_op.erl
@@ -42,11 +49,4 @@ chain_op(Op, OutputPid, Ref, SearchState) ->
op_to_module(Op) ->
ModuleString = "riak_search_op_" ++ atom_to_list(element(1, Op)),
@russelldb
Basho Technologies member
russelldb added a note Apr 18, 2012

You've done away with the code to ensure the Mod is loaded, is this 'cos you consider it defensive programming? Doh! Just read the commit comment. Scratch this, I know why you did away with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@russelldb russelldb commented on the diff Apr 18, 2012
src/riak_search_op.erl
@@ -42,11 +49,4 @@ chain_op(Op, OutputPid, Ref, SearchState) ->
op_to_module(Op) ->
ModuleString = "riak_search_op_" ++ atom_to_list(element(1, Op)),
- Module = list_to_atom(ModuleString),
- case code:ensure_loaded(Module) of
- {module, Module} ->
- Module;
- {error, _} ->
- ?PRINT({unknown_op, Op}),
- throw({unknown_op, Op})
- end.
+ list_to_atom(ModuleString).
@russelldb
Basho Technologies member
russelldb added a note Apr 18, 2012

Won't 'list_to_existing_atom/1' do here? If the module is loaded then it will exist as an atom and any crash/error from a bad Op name will be closer to the cause.

@rzezeski
rzezeski added a note Apr 18, 2012

Yea list_to_existing_atom would probably be the right thing. I plan to refactor the query engine to realize this stuff one time up-front so I'll keep this in the back of my head.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@russelldb
Basho Technologies member

I don't understand why.

op_range_sized is a leaf on the operator tree (like op_term) so it must produce some value for scoring props unlike the inner nodes that just recursively call extract_scoring_props/1 for their nested ops.

The reason for {0, 1} is to use the addition/multiplication identities so that the NumDocs and SumOfSquaredWeights are not affected by a range op in the following code: https://github.com/basho/riak_search/blob/master/src/riak_search_client.erl#L426

However, it looks like my change will produce a different value for NumTerms since the {0, 1} will not be filtered out by the lists:flatten whereas the old code just inserted an empty list. But a quick glance at the code shows no use of num_terms anywhere.

Did I answer your question or confuse matters more?

@russelldb
Basho Technologies member

A fair few white space only changes worked in with the functional changes. Built, ran, tests pass +1 to merge.

@rzezeski rzezeski was assigned Apr 18, 2012
@rzezeski rzezeski merged commit 7239503 into basho:master Apr 18, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.