Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field [elastiknn_dense_float_vector] is defined twice. #176

Closed
zeppelinen opened this issue Oct 23, 2020 · 5 comments
Closed

Field [elastiknn_dense_float_vector] is defined twice. #176

zeppelinen opened this issue Oct 23, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@zeppelinen
Copy link

Hi!

First of all thank you for great job.

I fail to create an index with two elastiknn_dense_float_vector fields. I get the following error:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Field [elastiknn_dense_float_vector] is defined twice."}],"type":"illegal_argument_exception","reason":"Field [elastiknn_dense_float_vector] is defined twice."},"status":400}

Index with single elastiknn_dense_float_vector field is created with no problems.

Relevant piece of my index mappings:

      "name_vector" : {
        "type" : "elastiknn_dense_float_vector",
        "elastiknn": {
          "dims": 512,
          "model": "lsh",
          "similarity": "l2",
          "L": 99,
          "k": 1,
          "w": 3
        }
      },
      "summary_vector": {
        "type": "elastiknn_dense_float_vector",
        "elastiknn": {
          "dims": 512,
          "model": "lsh",
          "similarity": "l2",
          "L": 99,
          "k": 1,
          "w": 3
        }
      }
@alexklibisz
Copy link
Owner

@zeppelinen Coincidentally I also ran into a closely related issue when updating to es 7.9.2. I'll keep you posted. I should have some to look into it more this weekend and early next week.

@alexklibisz
Copy link
Owner

@zeppelinen Which version are you using? I was trying to reproduce the issue but was actually able to make an index with the following mapping:

{
  "issue-176" : {
    "mappings" : {
      "properties" : {
        "df0" : {
          "type" : "elastiknn_dense_float_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "dims" : 10
          }
        },
        "df1" : {
          "type" : "elastiknn_dense_float_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "lsh",
            "similarity" : "angular",
            "dims" : 10,
            "L" : 10,
            "k" : 1
          }
        },
        "df2" : {
          "type" : "elastiknn_dense_float_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "lsh",
            "similarity" : "angular",
            "dims" : 10,
            "L" : 10,
            "k" : 1
          }
        },
        "df3" : {
          "type" : "elastiknn_dense_float_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "lsh",
            "similarity" : "l2",
            "dims" : 10,
            "L" : 21,
            "k" : 2,
            "w" : 3
          }
        },
        "df4" : {
          "type" : "elastiknn_dense_float_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "permutation_lsh",
            "dims" : 10,
            "k" : 22,
            "repeating" : false
          }
        },
        "id" : {
          "type" : "keyword",
          "store" : true
        },
        "sb0" : {
          "type" : "elastiknn_sparse_bool_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "dims" : 10
          }
        },
        "sb1" : {
          "type" : "elastiknn_sparse_bool_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "sparse_indexed",
            "dims" : 10
          }
        },
        "sb2" : {
          "type" : "elastiknn_sparse_bool_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "lsh",
            "similarity" : "jaccard",
            "dims" : 10,
            "L" : 10,
            "k" : 2
          }
        },
        "sb3" : {
          "type" : "elastiknn_sparse_bool_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "lsh",
            "similarity" : "jaccard",
            "dims" : 10,
            "L" : 10,
            "k" : 2
          }
        },
        "sb4" : {
          "type" : "elastiknn_sparse_bool_vector",
          "similarity" : "boolean",
          "elastiknn" : {
            "model" : "lsh",
            "similarity" : "hamming",
            "dims" : 10,
            "L" : 10,
            "k" : 3
          }
        }
      }
    }
  }
}

@alexklibisz alexklibisz added the bug Something isn't working label Oct 23, 2020
@zeppelinen
Copy link
Author

zeppelinen commented Oct 24, 2020 via email

alexklibisz added a commit that referenced this issue Oct 24, 2020
… vectors in one doc. (#178)

As far as I could tell, Elastiknn was already able to support creating an index and mapping with multiple vectors in the same doc.
I added a test to make sure this would work, because it was brought up in an issue: #176

For a while the test was failing, which I thought might be due to not being able to access the mapping.
It turned out that I had a typo in the test, so it was trying to query against a field which didn't exist.
But in figuring that out, I figured out how to refactor the VectorMapper so that it uses a less-hacky way to lookup the mapping for a given field.
It used to make a GetFieldMappingRequest. Now it just uses the QueryShardContext#fieldType method to get a MappedFieldType which contains the mapping.
Some more related discussion here: elastic/elasticsearch#64069
@alexklibisz
Copy link
Owner

I added a test to make sure it's working in #178 . The test creates a mapping like the one above, indexes 100 docs with this mapping, runs a count request to count the number of fields with each field, makes sure the result for each request is 100, then runs a query targeting each of the vector fields and makes sure it returns some neighbors. It's possible there's still some edge case that I hadn't thought of.

@zeppelinen
Copy link
Author

zeppelinen commented Oct 27, 2020

I tested same configuration again using plugin from official project releases page and it works just fine.
Looks like my environment is broken. I'd blame adoptopenjdk docker container used for build.
Thank you @alexklibisz for your great work. I continue testing, hope will be back with more useful feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants