
[mongodb] Improve mongodb quantization #463

Merged: 13 commits into 0.13-dev on Jun 20, 2018

Conversation

@pawelchcki (Contributor) commented Jun 14, 2018

This PR backports Hash Quantization from 0.13-dev, with some necessary changes to make it a bit more compatible with the previous MongoDB quantization.

It does so by adding a "truncate_arrays" option, which ensures that quantization only includes the first item from an array containing nested arrays or objects. It's an important facet of effective MongoDB quantization.

In addition, it normalizes the hash keys used in the resource string, allowing the expected merging to occur and avoiding situations where one field was included twice:

{.., :collection: "name", "collection": "name",...}

assert_equal('{"operation"=>:insert, "database"=>"test", "collection"=>"people", "documents"=>[{:name=>"?", :hobbies=>["?"]}, "?"], "ordered"=>"?"}', span.resource)
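A minimal, self-contained sketch of the two behaviors described above (illustrative only, not the ddtrace implementation; for simplicity it quantizes every scalar to "?", whereas the real integration keeps selected values such as database and collection names visible):

# Illustrative sketch only -- not the actual ddtrace code.
# Keys are normalized to strings so :collection and "collection" merge,
# and arrays containing nested structures are truncated to [first_item, "?"].
def quantize(value, truncate_arrays: true)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      out[key.to_s] = quantize(val, truncate_arrays: truncate_arrays)
    end
  when Array
    if truncate_arrays && value.any? { |v| v.is_a?(Hash) || v.is_a?(Array) }
      [quantize(value.first, truncate_arrays: truncate_arrays), '?']
    else
      ['?']
    end
  else
    '?'
  end
end

quantize({ :collection => 'people', 'collection' => 'people',
           'documents' => [{ name: 'Alice', hobbies: ['chess'] }, { name: 'Bob' }] })
# => {"collection"=>"?", "documents"=>[{"name"=>"?", "hobbies"=>["?"]}, "?"]}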

TODO:

  • docs for quantization in mongo

@pawelchcki pawelchcki added bug Involves a bug integrations Involves tracing integrations feature Involves a product feature labels Jun 14, 2018
@pawelchcki pawelchcki changed the title Improve mongodb quantization [mongodb] Improve mongodb quantization Jun 14, 2018
@pawelchcki pawelchcki requested review from delner and removed request for delner June 14, 2018 15:12
@pawelchcki pawelchcki requested a review from delner June 14, 2018 15:47
@delner (Contributor) commented Jun 14, 2018

Can you explain what truncate_arrays does and why it's necessary? And the other features like indifferent equals and so on?

I'm concerned this is making the quantization more complicated than it should be, and thus reducing performance.

@pawelchcki (Contributor, Author) replied:

Expanding on the PR description:

Array truncation is something that was already used in MongoDB quantization. I reimplemented that in Hash quantization.

The case for it is that MongoDB can have quite a lot of embedded documents in its query, and that number can be variable.

Imagine a query that, depending on conditions, updates 1 to 10000 different documents.
That would mean up to 10000 different resource names created for essentially the same operation, which would convey little information to the end user. So instead of creating a resource like:

{"u"=>[{a: ?}, {a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?}

Array truncation will cut it down to just

{"u"=>[{a:?}, "?"]}

This is something that already exists in the current implementation of MongoDB quantization.
However, it results in the following resource name:

{"u"=>{a:?}}

Which I think hides too much information. So I tweaked it a tiny bit; however, I'm not married to that idea.

As for the performance of the changes: adding one if and a method-call overhead, performed only on arrays, should carry a minimal penalty.

When arrays are truncated, it should however give a quite measurable speedup, since we no longer have to traverse all objects in the array.
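To make that concrete, here is a hypothetical example using the illustrative quantize sketch shown under the PR description (not the actual ddtrace method):

# Hypothetical example: 10,000 update documents; only the first element
# is traversed, the rest collapse into a single "?" placeholder.
docs = Array.new(10_000) { |i| { a: i } }
quantize({ 'u' => docs })
# => {"u"=>[{"a"=>"?"}, "?"]}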

@delner (Contributor) commented Jun 15, 2018

I think in your case, where {"u"=>[{a: ?}, {a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?},{a:?}]} repeats, it's pretty clear how it benefits. But what if there are cases where that {a:?} isn't a simple hash but a meaningful object, such as a document bulk insert? Then that wouldn't be so good, I'd think.

I think we should adopt a simple rule for each data type that makes general sense and live with some of the consequences. Otherwise I fear the quantization strategy will constantly thrash whenever the general strategy is sub-optimal for a particular schema. Hash::Quantization should be that general strategy, and if it's too sub-optimal for a specific schema (say Elasticsearch), then that integration should re-implement its own strategy. In this case, I think the generic strategy is fine for ES.

But getting back to the specifics, and applying what I said in the previous paragraph: I think your point that arrays are repetitive content and should be truncated is probably the best general fit, in which case I would suggest arrays always become ["?"] in Hash::Quantization.

However, I do like your idea of {"u"=>[{a:?}, "?"]}, since it gives a sample of what complex object would otherwise have been truncated entirely. Maybe that's a decent compromise and should be integrated somehow, or at least be an option. And the more I think about this, the more I think we need to refactor Hash::Quantization to use a strategy pattern, where an integration or some other part of the code can define the specifics of the strategy it wants to implement (like your truncation strategy); a rough sketch follows below.

But before we do that, let's do the minimal changes here to fix the bug, then pursue a better quantization strategy.
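For reference, a rough sketch of the strategy-pattern idea mentioned above; every name here is hypothetical and not part of the ddtrace API:

# Hypothetical sketch of a pluggable quantization strategy.
module Quantization
  # A strategy only needs to answer: how do I quantize an Array?
  class TruncateArrays
    def quantize_array(array, &recurse)
      [recurse.call(array.first), '?']
    end
  end

  class CollapseArrays
    def quantize_array(_array, &_recurse)
      ['?']
    end
  end

  def self.format(value, strategy)
    case value
    when Hash  then value.transform_keys(&:to_s)
                         .transform_values { |v| format(v, strategy) }
    when Array then strategy.quantize_array(value) { |v| format(v, strategy) }
    else '?'
    end
  end
end

Quantization.format({ 'u' => [{ a: 1 }, { a: 2 }, { a: 3 }] }, Quantization::TruncateArrays.new)
# => {"u"=>[{"a"=>"?"}, "?"]}
Quantization.format({ 'u' => [{ a: 1 }, { a: 2 }, { a: 3 }] }, Quantization::CollapseArrays.new)
# => {"u"=>["?"]}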

@delner delner added this to the 0.13.0 milestone Jun 15, 2018
…ation_improvements

+ backport tests into spec suite

# Conflicts:
#	lib/ddtrace/quantization/hash.rb
#	spec/ddtrace/quantization/hash_spec.rb
#	test/contrib/mongodb/client_test.rb
@pawelchcki (Contributor, Author) commented Jun 18, 2018

@delner
I've implemented your initial suggestion of using the Hash quantization module to filter out unwanted MongoDB object keys. I think it works well; to remove that I'd have to reimplement quite a bit of code that's already in the Hash quantization module.

As for {'a'=>?}, it works as before, i.e. if ? is an object and not a string, it will be recursed into to perform quantization and reflect the structure of the object. So [{a:{b:{c:1}}}, {a:{b:...}}] will be quantized to [{a:{b:{c:"?"}}}, "?"].

I've rebased this branch on 0.13-dev; once #465 is merged, I'll refresh this PR to reflect only the relevant changes.

I'll make array truncation the default, per your suggestions.

context 'given a Array with nested hashes' do
let(:hash) { [{ foo: { bar: 1 } }, { foo: { bar: 2 } }] }
it { is_expected.to eq([{ foo: { bar: '?' } }, '?']) }
end

@pawelchcki pawelchcki changed the base branch from master to 0.13-dev June 18, 2018 10:11
@pawelchcki pawelchcki changed the base branch from 0.13-dev to merge_master_into_0.13-dev June 18, 2018 10:25
@pawelchcki pawelchcki changed the base branch from merge_master_into_0.13-dev to 0.13-dev June 18, 2018 15:13
@pawelchcki (Contributor, Author):
Added this PR #467 to test the changes on ES

@delner (Contributor) left a comment


👍

@pawelchcki pawelchcki merged commit 07de906 into 0.13-dev Jun 20, 2018
@pawelchcki pawelchcki deleted the bugfix/mongodb_quantization_improvements branch June 21, 2018 10:41