Store ID in doc values for ~40% faster queries #104

Merged on Jul 7, 2020 (14 commits)

alexklibisz (Owner) commented Jul 7, 2020

Addresses #102, as described in this discussion: elastic/elasticsearch#17159 (comment).
This removes the LZ4.decompress() hotspot and reduces the L2 LSH benchmark from ~10 seconds to ~6.5 seconds (on my laptop).
Possibly also resolves #95. I need to revisit that one to see whether the decompress calls were actually decompressing vectors or the document body.
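
For reference, the laptop figures quoted above (~10 seconds before, ~6.5 seconds after) can be expressed either as a reduction in wall-clock time or as a gain in throughput. The sketch below only does that arithmetic; the function names are illustrative and it says nothing about how the "~40%" in the title was computed.

```python
def time_reduction(before_s: float, after_s: float) -> float:
    """Fraction of wall-clock time saved going from before_s to after_s."""
    return (before_s - after_s) / before_s


def throughput_gain(before_s: float, after_s: float) -> float:
    """Fractional increase in work done per second for the same workload."""
    return before_s / after_s - 1.0


# Approximate figures quoted in the description (measured on a laptop).
before_s, after_s = 10.0, 6.5

reduction = time_reduction(before_s, after_s)
gain = throughput_gain(before_s, after_s)
print(f"time reduction: {reduction:.0%}, throughput gain: {gain:.0%}")
```

With these inputs the time reduction is 35% and the throughput gain is roughly 54%, so the title's "~40%" sits between the two readings.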

Specific changes to the Scala client:

  • Stores the document ID in a doc-values field.
  • Retrieves the ID from the doc-values field instead of the usual document ID.
  • Uses a custom response handler to copy the ID value into its regular position in the elastic4s SearchHit case class, so users don't need to look it up in the weakly typed .fields map.
  • Improves comments and makes naming conventions more consistent.
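
The response-handler step can be sketched as follows. This is a minimal illustration in Python over plain dictionaries shaped like an Elasticsearch search response, not the elastic4s handler from this PR. The field name `doc_id` and the function `restore_hit_ids` are hypothetical, chosen only for the example.

```python
from copy import deepcopy

# Hypothetical name of the doc-values field that holds each document's ID.
ID_FIELD = "doc_id"


def restore_hit_ids(response: dict, id_field: str = ID_FIELD) -> dict:
    """Return a copy of a search response with each hit's `_id` filled in
    from the doc-values field, so callers can read the ID in its usual place."""
    fixed = deepcopy(response)
    for hit in fixed.get("hits", {}).get("hits", []):
        values = hit.get("fields", {}).get(id_field)
        # Doc-values fields come back as lists of values; if the field is
        # missing or empty, the hit is left unchanged.
        if values:
            hit["_id"] = str(values[0])
    return fixed


# Example response in which hits carry the ID only in `fields`.
example = {
    "hits": {
        "hits": [
            {"_index": "vectors", "_score": 0.92, "fields": {"doc_id": ["v17"]}},
            {"_index": "vectors", "_score": 0.88, "fields": {"doc_id": ["v3"]}},
        ]
    }
}

ids = [hit["_id"] for hit in restore_hit_ids(example)["hits"]["hits"]]
print(ids)
```

Working on a copy keeps the raw response available for debugging; a handler that mutates in place would avoid the copy at the cost of losing the original.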

github-actions bot commented Jul 7, 2020

Benchmark Results

| dataset | similarity | algorithm | k | recallP10 | durationP10 | recallP50 | durationP50 | recallP90 | durationP90 | mapping | query |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random1000d50K1K | Angular | Exact | 100 | 1.0 | 168.005 | 1.0 | 188.0 | 1.0 | 200.0 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"angular","vec":{},"model":"exact"} |
| Random1000d50K1K | Angular | LSH | 100 | 0.15 | 94.004 | 0.16005 | 108.015 | 0.18 | 113.018 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"angular","dims":1000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"angular","model":"lsh"} |
| Random1000d50K1K | L2 | Exact | 100 | 1.0 | 116.003 | 1.0 | 134.0 | 1.0 | 145.0 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"l2","vec":{},"model":"exact"} |
| Random1000d50K1K | L2 | LSH | 100 | 0.01 | 13.0 | 0.01005 | 15.0 | 0.02 | 16.0 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"l2","dims":1000,"bands":400,"rows":1,"width":3}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"l2","model":"lsh"} |
| Random3000d50K1K | Jaccard | Exact | 100 | 1.0 | 444.011 | 1.0 | 470.005 | 1.0 | 485.045 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"dims":3000}} | {"field":"vec","similarity":"jaccard","vec":{},"model":"exact"} |
| Random3000d50K1K | Jaccard | LSH | 100 | 0.24004 | 42.007 | 0.29005 | 50.005 | 0.30009 | 53.0 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"model":"lsh","similarity":"jaccard","dims":3000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"jaccard","model":"lsh"} |

github-actions bot commented Jul 7, 2020

Benchmark Results

| dataset | similarity | algorithm | k | recallP10 | durationP10 | recallP50 | durationP50 | recallP90 | durationP90 | mapping | query |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random1000d50K1K | Angular | Exact | 100 | 1.0 | 113.038 | 1.0 | 164.005 | 1.0 | 169.045 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"angular","vec":{},"model":"exact"} |
| Random1000d50K1K | Angular | LSH | 100 | 0.15 | 100.004 | 0.16005 | 111.0 | 0.18 | 115.009 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"angular","dims":1000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"angular","model":"lsh"} |
| Random1000d50K1K | L2 | Exact | 100 | 1.0 | 84.022 | 1.0 | 120.01 | 1.0 | 125.036 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"l2","vec":{},"model":"exact"} |
| Random1000d50K1K | L2 | LSH | 100 | 0.01 | 14.0 | 0.01005 | 16.0 | 0.02 | 17.0 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"l2","dims":1000,"bands":400,"rows":1,"width":3}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"l2","model":"lsh"} |
| Random3000d50K1K | Jaccard | Exact | 100 | 1.0 | 383.005 | 1.0 | 392.02 | 1.0 | 402.018 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"dims":3000}} | {"field":"vec","similarity":"jaccard","vec":{},"model":"exact"} |
| Random3000d50K1K | Jaccard | LSH | 100 | 0.24004 | 47.001 | 0.29005 | 50.005 | 0.30009 | 51.0 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"model":"lsh","similarity":"jaccard","dims":3000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"jaccard","model":"lsh"} |

github-actions bot commented Jul 7, 2020

Benchmark Results

| dataset | similarity | algorithm | k | recallP10 | durationP10 | recallP50 | durationP50 | recallP90 | durationP90 | mapping | query |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random1000d50K1K | Angular | Exact | 100 | 1.0 | 108.032 | 1.0 | 142.055 | 1.0 | 155.009 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"angular","vec":{},"model":"exact"} |
| Random1000d50K1K | Angular | LSH | 100 | 0.15 | 99.001 | 0.16005 | 105.0 | 0.18 | 106.0 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"angular","dims":1000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"angular","model":"lsh"} |
| Random1000d50K1K | L2 | Exact | 100 | 1.0 | 100.005 | 1.0 | 118.0 | 1.0 | 119.027 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"l2","vec":{},"model":"exact"} |
| Random1000d50K1K | L2 | LSH | 100 | 0.01 | 12.0 | 0.01005 | 13.0 | 0.02 | 13.0 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"l2","dims":1000,"bands":400,"rows":1,"width":3}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"l2","model":"lsh"} |
| Random3000d50K1K | Jaccard | Exact | 100 | 1.0 | 320.034 | 1.0 | 377.02 | 1.0 | 386.027 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"dims":3000}} | {"field":"vec","similarity":"jaccard","vec":{},"model":"exact"} |
| Random3000d50K1K | Jaccard | LSH | 100 | 0.24004 | 39.0 | 0.29005 | 45.0 | 0.30009 | 46.018 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"model":"lsh","similarity":"jaccard","dims":3000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"jaccard","model":"lsh"} |

github-actions bot commented Jul 7, 2020

Benchmark Results

| dataset | similarity | algorithm | k | recallP10 | durationP10 | recallP50 | durationP50 | recallP90 | durationP90 | mapping | query |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random1000d50K1K | Angular | Exact | 100 | 1.0 | 173.013 | 1.0 | 198.005 | 1.0 | 212.018 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"angular","vec":{},"model":"exact"} |
| Random1000d50K1K | Angular | LSH | 100 | 0.15 | 91.003 | 0.16005 | 111.005 | 0.18 | 117.009 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"angular","dims":1000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"angular","model":"lsh"} |
| Random1000d50K1K | L2 | Exact | 100 | 1.0 | 111.032 | 1.0 | 145.0 | 1.0 | 147.018 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"l2","vec":{},"model":"exact"} |
| Random1000d50K1K | L2 | LSH | 100 | 0.01 | 13.0 | 0.01005 | 15.005 | 0.02 | 16.009 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"l2","dims":1000,"bands":400,"rows":1,"width":3}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"l2","model":"lsh"} |
| Random3000d50K1K | Jaccard | Exact | 100 | 1.0 | 421.01 | 1.0 | 475.015 | 1.0 | 485.009 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"dims":3000}} | {"field":"vec","similarity":"jaccard","vec":{},"model":"exact"} |
| Random3000d50K1K | Jaccard | LSH | 100 | 0.24004 | 42.003 | 0.29005 | 49.005 | 0.30009 | 51.009 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"model":"lsh","similarity":"jaccard","dims":3000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"jaccard","model":"lsh"} |

github-actions bot commented Jul 7, 2020

Benchmark Results

| dataset | similarity | algorithm | k | recallP10 | durationP10 | recallP50 | durationP50 | recallP90 | durationP90 | mapping | query |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random1000d50K1K | Angular | Exact | 100 | 1.0 | 143.016 | 1.0 | 164.01 | 1.0 | 169.009 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"angular","vec":{},"model":"exact"} |
| Random1000d50K1K | Angular | LSH | 100 | 0.15 | 82.018 | 0.16005 | 105.005 | 0.18 | 109.0 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"angular","dims":1000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"angular","model":"lsh"} |
| Random1000d50K1K | L2 | Exact | 100 | 1.0 | 93.006 | 1.0 | 115.005 | 1.0 | 118.018 | {"type":"elastiknn_dense_float_vector","elastiknn":{"dims":1000}} | {"field":"vec","similarity":"l2","vec":{},"model":"exact"} |
| Random1000d50K1K | L2 | LSH | 100 | 0.01 | 13.0 | 0.01005 | 13.005 | 0.02 | 14.009 | {"type":"elastiknn_dense_float_vector","elastiknn":{"model":"lsh","similarity":"l2","dims":1000,"bands":400,"rows":1,"width":3}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"l2","model":"lsh"} |
| Random3000d50K1K | Jaccard | Exact | 100 | 1.0 | 384.02 | 1.0 | 418.01 | 1.0 | 426.009 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"dims":3000}} | {"field":"vec","similarity":"jaccard","vec":{},"model":"exact"} |
| Random3000d50K1K | Jaccard | LSH | 100 | 0.24004 | 43.006 | 0.29005 | 52.0 | 0.30009 | 53.009 | {"type":"elastiknn_sparse_bool_vector","elastiknn":{"model":"lsh","similarity":"jaccard","dims":3000,"bands":400,"rows":1}} | {"field":"vec","candidates":1000,"vec":{},"similarity":"jaccard","model":"lsh"} |

alexklibisz merged commit 82ddc48 into master on Jul 7, 2020
alexklibisz deleted the perf-id-in-doc-values branch on July 7, 2020 02:19

Successfully merging this pull request may close these issues.

BinaryDocValues performance regression in ES 7.7.x and Lucene 8.5.1