Currently, LSH hashes are stored as individual `keyword` fields, i.e.:
```json
{
  "vec_proc": {
    "1,1": "123",
    "1,2": "345",
    ...
  }
}
```
Where "1,1" for the min-hash algorithm corresponds to "table 1, band 1". The reason I stored them this way is to enable boolean matching queries against the individual fields.
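As a sketch of what those boolean matching queries look like (index name and hash values are hypothetical, assuming the hashes live as sub-fields of `vec_proc` per the mapping above), each table/band hash becomes a `term` clause in a `bool` query:

```json
POST /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "vec_proc.1,1": "123" } },
        { "term": { "vec_proc.1,2": "345" } }
      ]
    }
  }
}
```

Each matching `term` clause contributes to the document's score, so documents sharing more hashes with the query vector rank higher.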
It turns out that with sufficiently many fields, Elasticsearch starts complaining that you're exceeding the limit of total fields, with exceptions like this:
```
elasticsearch.helpers.errors.BulkIndexError: ('1 document(s) failed to index.', [{'index': {'_index': 'elastiknn-auto-similarity_jaccard-lsh-27983-1581962133', '_type': '_doc', '_id': 'RWBKVHABgla2WqqUhhQK', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Limit of total fields [1000] in index [elastiknn-auto-similarity_jaccard-lsh-27983-1581962133] has been exceeded'}, 'data': {'dataset_index': 0, 'vec_raw': {'sparseBoolVector': {'trueIndices': [0, 2, 3, 5, 9, 18, 20, 22, 24, 25, 26, 27, 28, 41, 43, 44, 47, 50, 54, ... 15311, 15312], 'totalIndices': 27983}}}}}])
```
After reading the docs a bit more, it turns out I should be able to get the same query semantics using a `text` field with a `boolean` similarity:
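A minimal sketch of that approach (the index name and the `table,band=hash` token encoding are assumptions for illustration, not taken from the issue): pack all hashes into one whitespace-analyzed `text` field, so only a single mapped field is needed regardless of how many tables and bands there are.

```json
PUT /hashes-demo
{
  "mappings": {
    "properties": {
      "vec_proc": {
        "type": "text",
        "similarity": "boolean",
        "analyzer": "whitespace"
      }
    }
  }
}

PUT /hashes-demo/_doc/1
{
  "vec_proc": "1,1=123 1,2=345"
}

POST /hashes-demo/_search
{
  "query": {
    "match": {
      "vec_proc": "1,1=123 1,2=345"
    }
  }
}
```

With the `boolean` similarity, each matching term contributes a constant score (the query boost, 1 by default) instead of a TF/IDF-based score, so a query matching two of a document's hash tokens scores 2, mirroring the per-field `term` clause behavior.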
Both of the searches return a score of 2 for the stored document.