
How to use image search #67

Closed
zhenzi0322 opened this issue May 18, 2020 · 21 comments
@zhenzi0322 commented May 18, 2020

When I used image search, all documents in Elasticsearch were returned. How does the elastiknn plugin generate image feature vectors? The language I'm using is Python.

@alexklibisz (Owner)

Generating image feature vectors is up to you. You can do it a few ways:

  • Unroll the image into a vector (e.g. an image with height 28 and width 28 becomes a vector of length 784).
  • Use an algorithm/library like phash to generate a feature vector that is more robust than the raw pixel values.
  • Use a convolutional network to process the image, but extract the values at the next-to-last layer instead of the classification layer.
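A minimal sketch of the first option, unrolling raw pixels into a vector with numpy (the random array here just stands in for a real 28×28 grayscale image):

```python
import numpy as np

# Stand-in for a real 28x28 grayscale image.
image = np.random.rand(28, 28)

# Unroll (flatten) the 2D pixel grid into a 1D vector of length 28 * 28 = 784.
vec = image.flatten()
print(vec.shape)  # (784,)
```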

@alexklibisz (Owner)

I have briefly considered adding some functionality to the plugin to ingest images but there are many other things to solve first. It might be implemented as an ingest processor with a small handful of common algos for mapping images to vectors, e.g. phash, sift, a few convnets, etc..

@zhenzi0322 (Author) commented May 19, 2020


{'_index': 'long', '_type': '_doc', '_id': 'ids-1485050', '_score': 0.0625227}
{'_index': 'long', '_type': '_doc', '_id': 'ids-1485146', '_score': 0.06249257}
{'_index': 'long', '_type': '_doc', '_id': 'ids-1485177', '_score': 0.06245229}

I used VGG16 in Keras to obtain image feature vectors, and saved them in Elasticsearch version 7.4.0. However, when I query the image data, all documents are returned. How can I obtain the similar images I want from a query? How similar does the _score attribute have to be? I'm using the L1 function here.

@alexklibisz (Owner)

I guess you mean they all had roughly the same score? L1 might not be a good similarity function for those vectors. I would try L2.

@alexklibisz (Owner)

Here is an example of using L2 (on raw image pixels, not feature vectors): http://demo.elastiknn.klibisz.com/dataset/cifar-l2

You can see the exact mapping and query for each set of results by clicking on the Mapping and Query tabs.

@zhenzi0322 (Author)

So that means I should save the original image pixels in Elasticsearch instead of the feature vectors, right? How does the elastiknn library create search queries?

{
  "query" : {
    "elastiknn_nearest_neighbors" : {
      "field" : "vec",
      "vec" : {
        "index" : "cifar-l2-lsh-2",
        "id" : "15231",
        "field" : "vec"
      },
      "candidates" : 20,
      "similarity" : "l2",
      "model" : "lsh"
    }
  },
  "size" : 10,
  "_source" : true
}
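The JSON query above can be built from Python as a plain dict; the index, id, and field names here come straight from the example, so adapt them to your own data. With the official elasticsearch-py client, the dict would be passed as the request body, e.g. `es.search(index="cifar-l2-lsh-2", body=query)`:

```python
# elastiknn_nearest_neighbors query, expressed as a Python dict.
# Index/field/id values are copied from the example above.
query = {
    "query": {
        "elastiknn_nearest_neighbors": {
            "field": "vec",
            "vec": {
                "index": "cifar-l2-lsh-2",
                "id": "15231",
                "field": "vec",
            },
            "candidates": 20,
            "similarity": "l2",
            "model": "lsh",
        }
    },
    "size": 10,
    "_source": True,
}
print(query["query"]["elastiknn_nearest_neighbors"]["similarity"])  # l2
```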

@zhenzi0322 (Author)

I created the Elasticsearch index mapping with Python as follows:

from elastiknn.client import ElastiKnnClient
from elastiknn.api import Mapping


# create the index
eknn = ElastiKnnClient()
dim = 512
index = "long"
field = "long"
mapping = Mapping.DenseFloat(dims=dim) 
eknn.es.indices.refresh()
eknn.es.indices.create(index=index)
eknn.es.indices.refresh()
m = eknn.put_mapping(index, field, mapping)

print(m)  # {'acknowledged': True}

@alexklibisz (Owner)

You can use that same mapping with L1, L2, and angular. If you want to use an approximate method you'll have to modify the line mapping = Mapping.DenseFloat to another mapping. Unfortunately it looks like I forgot to add a mapping dataclass for the L2 LSH method. That would go here: https://github.com/alexklibisz/elastiknn/blob/master/client-python/elastiknn/api.py#L64 But you can also just create a dict matching the JSON and submit a PUT request. You can see how the mapping is submitted here: https://github.com/alexklibisz/elastiknn/blob/master/client-python/elastiknn/client.py#L49-L54

You can save either the original pixels or the feature vector. I was just pointing to an example where L2 seems to work well on the original pixels. Most papers I've read also use L2 on feature vectors, or they normalize the feature vectors to unit norm and use angular. I don't think I've seen L1 used for images.

For exact queries, the plugin creates a FunctionScoreQuery that scores every vector in the index against the query vector. So that's obviously not very efficient. For approximate queries it hashes the stored vectors, indexes the hashes (just like words), uses the same hash function to hash the query vector, and runs a boolean match query to lookup stored vectors which share the most hash values with the query vector. There's a lot more info here: http://elastiknn.klibisz.com/api/
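A toy sketch of that approximate-search idea, using random-hyperplane hashing. This is not the plugin's actual hash function, just an illustration of the mechanism described above: reduce each vector to a small set of hash tokens, then rank stored vectors by how many tokens they share with the query vector's tokens:

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.normal(size=(8, 4))  # 8 random hyperplanes for 4-dim vectors

def hashes(v):
    # One token per hyperplane: which side of the plane the vector falls on.
    return {(i, int(np.dot(p, v) > 0)) for i, p in enumerate(planes)}

stored = {
    "a": np.array([1.0, 0.0, 0.0, 0.0]),
    "b": np.array([0.9, 0.1, 0.0, 0.0]),
    "c": np.array([-1.0, 0.0, 0.0, 0.0]),
}
q = np.array([1.0, 0.05, 0.0, 0.0])
qh = hashes(q)

# Rank stored ids by how many hash tokens they share with the query.
ranked = sorted(stored, key=lambda k: -len(qh & hashes(stored[k])))
print(ranked)
```

In the plugin, the tokens are indexed like words, so this ranking step becomes an ordinary boolean match query.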

@zhenzi0322 (Author)

Thank you. I'll try it first

@zhenzi0322 (Author)

I got the following error while creating the Elasticsearch index:

Traceback (most recent call last):
  File "D:/zhenzi/es7.4.0/main_create.py", line 14, in <module>
    m = eknn.put_mapping(index, field, mapping)
  File "D:\zhenzi\es7.4.0\elastiknn\client.py", line 56, in put_mapping
    return self.es.transport.perform_request("PUT", f"/{index}/_mapping", body=body)
  File "F:\py368\Envs\knn\lib\site-packages\elasticsearch\transport.py", line 358, in perform_request
    timeout=timeout,
  File "F:\py368\Envs\knn\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 257, in perform_request
    self._raise_error(response.status, raw_data)
  File "F:\py368\Envs\knn\lib\site-packages\elasticsearch\connection\base.py", line 182, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.TransportError: TransportError(500, '', 'Incompatible type [elastiknn_dense_float_vector], model [Some(lsh)], similarity [None]')

I added the following in the file (https://github.com/alexklibisz/elastiknn/blob/master/client-python/elastiknn/api.py):

@dataclass(frozen=True)
class DenseFloatLong(Base):
    dims: int

    def to_dict(self):
        return {
            "type": "elastiknn_dense_float_vector",
            "elastiknn": {
                "model": "lsh",
                "dims": self.dims,
                "similarity": "12",
                "bands": 100,
                "rows": 1,
                "width": 3
            }
        }

@alexklibisz (Owner)

Try "similarity": "l2", not "similarity": "12".
The error isn't particularly helpful, but similarity [None] means it wasn't able to match "12" to a known similarity.

@zhenzi0322 (Author)

I removed "similarity": "12", and I still get the following error message:

Traceback (most recent call last):
  File "D:/zhenzi/es7.4.0/main_create.py", line 14, in <module>
    m = eknn.put_mapping(index, field, mapping)
  File "D:\zhenzi\es7.4.0\elastiknn\client.py", line 56, in put_mapping
    return self.es.transport.perform_request("PUT", f"/{index}/_mapping", body=body)
  File "F:\py368\Envs\knn\lib\site-packages\elasticsearch\transport.py", line 358, in perform_request
    timeout=timeout,
  File "F:\py368\Envs\knn\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 257, in perform_request
    self._raise_error(response.status, raw_data)
  File "F:\py368\Envs\knn\lib\site-packages\elasticsearch\connection\base.py", line 182, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.TransportError: TransportError(500, '', 'Incompatible type [elastiknn_dense_float_vector], model [Some(lsh)], similarity [None]')

The file that creates the Elasticsearch index is as follows:

from elastiknn.client import ElastiKnnClient
from elastiknn.api import Mapping

# create the index
eknn = ElastiKnnClient()
dim = 3072
index = "test"
field = "test"

mapping = Mapping.DenseFloatLong(dims=dim)
eknn.es.indices.refresh()
eknn.es.indices.create(index=index)
eknn.es.indices.refresh()
m = eknn.put_mapping(index, field, mapping)

print(m)  # {'acknowledged': True}

api.py

@dataclass(frozen=True)
class DenseFloatLong(Base):
    dims: int

    def to_dict(self):
        return {
            "type": "elastiknn_dense_float_vector",
            "elastiknn": {
                "model": "lsh",
                "dims": self.dims,
                # "similarity": 12,
                "bands": 100,
                "rows": 1,
                "width": 3
            }
        }

I'm using Elasticsearch version 7.4.0.

@alexklibisz (Owner)

You need to specify the similarity as l2.

    @dataclass(frozen=True)
    class DenseFloatLong(Base):
        dims: int

        def to_dict(self):
            return {
                "type": "elastiknn_dense_float_vector",
                "elastiknn": {
                    "model": "lsh",
                    "dims": self.dims,
                    "similarity": "l2",
                    "bands": 100,
                    "rows": 1,
                    "width": 3
                }
            }

The similarity field is required when using the lsh model.
It's a very subtle character difference. l2 is the lowercase of L2. 12 is the number twelve.

@zhenzi0322 (Author)

Thank you. I mixed up the number 1 and the letter l. It creates successfully now.

@zhenzi0322 (Author) commented May 19, 2020

Why does the query result differ from what I expected?

The query results are as follows:

{'_index': 'test', '_type': '_doc', '_id': 'ids-1485205', '_score': 1000000.0}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485185', '_score': 1.2629013}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485238', '_score': 1.2498195}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485149', '_score': 1.2451644}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485198', '_score': 1.2327285}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485219', '_score': 1.2177316}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485212', '_score': 1.1902684}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485229', '_score': 1.1901888}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485152', '_score': 1.1610229}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1488300', '_score': 0.0}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485209', '_score': 0.0}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485208', '_score': 0.0}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1488289', '_score': 0.0}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485203', '_score': 0.0}
{'_index': 'test', '_type': '_doc', '_id': 'ids-1485202', '_score': 0.0}

How are the score values computed?

All the pictures in Elasticsearch share some similarities. For example, all my pictures contain the words "children's day".

Here are my three pictures: [images attached to the original issue]

@alexklibisz (Owner)

I'm not sure what you are expecting. :)
You can read some more about the scoring method here: http://elastiknn.klibisz.com/api/#similarity-scoring
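A hedged sketch of one common convention for distance-based scoring (Elasticsearch requires non-negative scores, so distances are typically mapped into (0, 1] as 1 / (1 + distance); identical vectors score 1.0 and distant vectors approach 0). Check the linked docs for the exact formula your plugin version uses:

```python
import math

def l2_score(a, b):
    # Map an L2 (Euclidean) distance into (0, 1]: smaller distance, higher score.
    dist = math.dist(a, b)
    return 1.0 / (1.0 + dist)

print(l2_score([1.0, 2.0], [1.0, 2.0]))  # 1.0 (identical vectors)
print(l2_score([0.0, 0.0], [3.0, 4.0]))  # distance 5.0 -> score 1/6
```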

@alexklibisz (Owner)

@yu258 I added the missing mappings and queries in this PR #68
Docs are here: http://elastiknn.klibisz.com/python-client/

@alexklibisz (Owner)

One thing to consider when doing image search with L2 is that the floating point operations might overflow if your vectors have large values. You might try scaling your vector values so they are between 0 and 1.
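A minimal sketch of that scaling suggestion: min-max scale each feature vector into [0, 1] with numpy before indexing it (the helper name is just illustrative):

```python
import numpy as np

def scale_unit_interval(v):
    # Min-max scale a vector so its values lie in [0, 1].
    v = np.asarray(v, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:  # constant vector: avoid division by zero
        return np.zeros_like(v)
    return (v - lo) / (hi - lo)

print(scale_unit_interval([0.0, 5.0, 10.0]))  # [0.  0.5 1. ]
```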

@zhenzi0322 (Author)

What's a good Python library for generating image feature vectors? Currently my feature vectors are in a format like [0.0, 0.2, ...].

@alexklibisz (Owner)

I've always used the pretrained models from Keras: https://keras.io/api/applications/
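For example, VGG16 with global average pooling produces a 512-dimensional feature vector per image, matching the dim = 512 used in the mapping earlier in this thread. A sketch (weights="imagenet" downloads pretrained weights on first use; the random array stands in for a real RGB image):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# include_top=False drops the classification layers; pooling="avg" global-average-
# pools the last conv block into a single 512-dim vector per image.
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

image = np.random.rand(1, 224, 224, 3) * 255  # stand-in for a real RGB image batch
features = model.predict(preprocess_input(image))
print(features.shape)  # (1, 512)
```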

@alexklibisz (Owner)

Closing this. Let me know if there are any other questions and we can open it again if needed.
