## The ELSER Model

In the previous chapter you have seen how to expand an Elasticsearch index with a dense_vector field that is populated with embeddings generated by a Machine Learning model. The model was installed locally on your computer, and the embeddings were generated from the Python code and added to the documents before they were inserted into the index.

In this chapter you are going to learn about another vector type, the sparse_vector, which is designed to store inferences from the Elastic Learned Sparse EncodeR model (ELSER). Embeddings returned by this model are a collection of tags (more appropriately called features), each with an assigned weight.

In this chapter you will also use a different method for working with Machine Learning models, in which the Elasticsearch service itself runs the model and adds the resulting embeddings to the index through a pipeline.




## The sparse_vector Field

Like the dense_vector field type you used in the previous chapter, the sparse_vector type can store inferences returned by Machine Learning models. While dense vectors hold a fixed-length array of numbers that describe the source text, a sparse vector stores a mapping of features to weights.
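
To make the difference concrete, here is a small plain-Python sketch. The feature names and weights are invented for illustration and are not real ELSER output, and the dot product shown is just one simple way to compare two sparse vectors, not a description of how Elasticsearch scores them internally.

```python
# Dense vector: a fixed-length list of floats; positions have no readable meaning.
dense_embedding = [0.12, -0.48, 0.33, 0.07]

# Sparse vector: a mapping of features to weights (values invented for illustration).
doc_features = {'vacation': 1.5, 'holiday': 1.0, 'policy': 0.5}
query_features = {'holiday': 2.0, 'leave': 1.0}


def sparse_dot(a, b):
    """Sum the products of the weights of features present in both mappings."""
    return sum(weight * b[feature] for feature, weight in a.items() if feature in b)


# Only 'holiday' is shared, so the score is 2.0 * 1.0.
score = sparse_dot(query_features, doc_features)
print(score)
```

Only features that appear in both mappings contribute to the score, which is why sparse vectors can be stored and compared efficiently even when the full vocabulary of features is large.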

Let's add a sparse_vector field to the index. This is a type that needs to be defined explicitly in the index mapping. Below you can see an updated version of the create_index() method with a new field called elser_embedding with this type.

```python
class Search:
    # ...

    def create_index(self):
        self.es.indices.delete(index='my_documents', ignore_unavailable=True)
        self.es.indices.create(index='my_documents', mappings={
            'properties': {
                'embedding': {
                    'type': 'dense_vector',
                },
                'elser_embedding': {
                    'type': 'sparse_vector',
                },
            }
        })
    
    # ...

```



## Deploying the ELSER Model

As mentioned above, in this example Elasticsearch will take ownership of the model and automatically execute it to generate embeddings, both when inserting documents and when searching.

The Elasticsearch client exposes a set of API endpoints to manage Machine Learning models and their pipelines. The following deploy_elser() method in search.py follows a few steps to download and install the ELSER v2 model, and to create a pipeline that uses it to populate the elser_embedding field defined above.

```python
import time


class Search:
    # ...

    def deploy_elser(self):
        # download ELSER v2
        self.es.ml.put_trained_model(model_id='.elser_model_2',
                                     input={'field_names': ['text_field']})
        
        # wait until ready
        while True:
            status = self.es.ml.get_trained_models(model_id='.elser_model_2',
                                                   include='definition_status')
            if status['trained_model_configs'][0]['fully_defined']:
                # model is ready
                break
            time.sleep(1)

        # deploy the model
        self.es.ml.start_trained_model_deployment(model_id='.elser_model_2')

        # define a pipeline
        self.es.ingest.put_pipeline(
            id='elser-ingest-pipeline',
            processors=[
                {
                    'inference': {
                        'model_id': '.elser_model_2',
                        'input_output': [
                            {
                                'input_field': 'summary',
                                'output_field': 'elser_embedding',
                            }
                        ]
                    }
                }
            ]
        )

```

Configuring ELSER for use requires several steps. First, the ml.put_trained_model() method of the Elasticsearch client is used to download ELSER. The model_id argument identifies the model and version to download (ELSER v2 is available for Elasticsearch 8.11 and up). The input argument is the configuration required by this model. Because the download runs in the background, the code then polls ml.get_trained_models() until the model reports that it is fully defined.

Once the model is downloaded it needs to be deployed. For this, the ml.start_trained_model_deployment() method is used, just with the identifier of the model to deploy. Note that this is an asynchronous operation, so the model is going to be available for use after a short amount of time.
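
Because the deployment starts asynchronously, code that depends on the model may need to wait for it. The following is a minimal sketch of a generic polling helper with a timeout. Note that wait_until() is a hypothetical name and not part of the Elasticsearch client, and the commented usage assumes that the ml.get_trained_models_stats() response reports the deployment state under deployment_stats.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Call predicate() until it returns a truthy value or the timeout expires.

    wait_until is a hypothetical helper, not part of the Elasticsearch client.
    Returns True on success and False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Possible usage inside deploy_elser() (the response layout is an assumption):
#
# def deployment_started():
#     stats = self.es.ml.get_trained_models_stats(model_id='.elser_model_2')
#     deployment = stats['trained_model_stats'][0].get('deployment_stats', {})
#     return deployment.get('state') == 'started'
#
# if not wait_until(deployment_started, timeout=120):
#     raise RuntimeError('ELSER deployment did not start in time')
```

Unlike a bare while True loop, a helper like this gives up after the timeout, so a failed download or deployment does not leave the command hanging forever.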

The final step to configure the use of ELSER is to define a pipeline for it. A pipeline tells Elasticsearch how the model has to be used, and consists of an identifier and one or more processing tasks to perform. The pipeline created above is called elser-ingest-pipeline and has a single inference task, which means that each time a document is added, the model is going to run on the input_field, and the output will be added to the document in the output_field. For this example the summary field is used to generate the embeddings, as with the dense vector embeddings in the previous chapter. The resulting embeddings are going to be written to the elser_embedding sparse vector field created in the previous section.
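
One way to check a pipeline before indexing real data is the ingest simulate API, which runs a pipeline against sample documents without storing them. The sketch below assumes that the client's ingest.simulate() method accepts id and docs arguments and that the response lists each processed document under docs[...]['doc']['_source']. The preview_pipeline() name is hypothetical.

```python
def preview_pipeline(client, summary, pipeline_id='elser-ingest-pipeline'):
    """Run a sample document through the pipeline without indexing it.

    preview_pipeline is a hypothetical helper; `client` is expected to be an
    Elasticsearch client instance, such as the self.es attribute of Search.
    """
    response = client.ingest.simulate(
        id=pipeline_id,
        docs=[{'_source': {'summary': summary}}],
    )
    # Assumed response layout: one entry per input document.
    source = response['docs'][0]['doc']['_source']
    return source.get('elser_embedding')
```

If the model is deployed and the pipeline is configured as described, the returned value should be the mapping of features to weights that would be stored in the elser_embedding field.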

To make it easy to invoke this method, add a deploy-elser command to the Flask application in app.py:

```python
@app.cli.command()
def deploy_elser():
    """Deploy the ELSER v2 model to Elasticsearch."""
    try:
        es.deploy_elser()
    except Exception as exc:
        print(f'Error: {exc}')
    else:
        print('ELSER model deployed.')
```


You can now deploy ELSER on your Elasticsearch service by running the flask deploy-elser command.


The last configuration task involves linking the index with the pipeline, so that the model is automatically executed when documents are inserted into this index. This is done in the index configuration with a settings option. Here is one more update to the create_index() method to create this link:


```python
class Search:
    # ...

    def create_index(self):
        self.es.indices.delete(index='my_documents', ignore_unavailable=True)
        self.es.indices.create(
            index='my_documents',
            mappings={
                'properties': {
                    'embedding': {
                        'type': 'dense_vector',
                    },
                    'elser_embedding': {
                        'type': 'sparse_vector',
                    },
                }
            },
            settings={
                'index': {
                    'default_pipeline': 'elser-ingest-pipeline'
                }
            }
        )

```

With this change, you can now regenerate the index with full support for ELSER inferences.




## Semantic Queries

With the index now equipped with ELSER embeddings, the handle_search() function in app.py can be changed to search these embeddings. For now, you'll see how to search only through ELSER. Later, the previous search methods will be incorporated back to create a combined solution.

To use ELSER inferences when searching, the text_expansion query type is used. Below you can see an updated handle_search() function with this query:


```python
@app.post('/')
def handle_search():
    query = request.form.get('query', '')
    filters, parsed_query = extract_filters(query)
    from_ = request.form.get('from_', type=int, default=0)

    results = es.search(
        query={
            'text_expansion': {
                'elser_embedding': {
                    'model_id': '.elser_model_2',
                    'model_text': parsed_query,
                }
            },
        },
        size=5,
        from_=from_,
    )
    return render_template('index.html', results=results['hits']['hits'],
                           query=query, from_=from_,
                           total=results['hits']['total']['value'])

```

The text_expansion query receives a key with the name of the field to be searched. Under this key, model_id configures which model to use in the search, and model_text defines what to search for. Note how in this case there is no need to generate an embedding for the search text, as Elasticsearch manages the model and can take care of that.
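
Since the same query fragment is needed in more than one place in this chapter, it can be convenient to build it in a single helper. This is only a sketch: elser_query() is a hypothetical function name, and its defaults simply mirror the field and model identifiers used above.

```python
def elser_query(text, field='elser_embedding', model_id='.elser_model_2'):
    """Return a text_expansion query for the given search text."""
    return {
        'text_expansion': {
            field: {
                'model_id': model_id,
                'model_text': text,
            }
        }
    }


print(elser_query('work from home'))
```

The returned dictionary can be passed as the query argument of a search, or nested inside a larger query.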

In the above version of handle_search() the filters have been left unused, and the aggregations have been omitted. These can be added back in the same way they were incorporated into the full-text search solution. Below is an updated handle_search() function that moves the text_expansion query inside a bool.must section, with filters included in bool.filter and aggregations added as before.

```python
@app.post('/')
def handle_search():
    query = request.form.get('query', '')
    filters, parsed_query = extract_filters(query)
    from_ = request.form.get('from_', type=int, default=0)

    results = es.search(
        query={
            'bool': {
                'must': [
                    {
                        'text_expansion': {
                            'elser_embedding': {
                                'model_id': '.elser_model_2',
                                'model_text': parsed_query,
                            }
                        },
                    }
                ],
                **filters,
            }
        },
        aggs={
            'category-agg': {
                'terms': {
                    'field': 'category.keyword',
                }
            },
            'year-agg': {
                'date_histogram': {
                    'field': 'updated_at',
                    'calendar_interval': 'year',
                    'format': 'yyyy',
                },
            },
        },
        size=5,
        from_=from_,
    )
    aggs = {
        'Category': {
            bucket['key']: bucket['doc_count']
            for bucket in results['aggregations']['category-agg']['buckets']
        },
        'Year': {
            bucket['key_as_string']: bucket['doc_count']
            for bucket in results['aggregations']['year-agg']['buckets']
            if bucket['doc_count'] > 0
        },
    }
    return render_template('index.html', results=results['hits']['hits'],
                           query=query, from_=from_,
                           total=results['hits']['total']['value'], aggs=aggs)

```

Spend some time experimenting with different searches. You will notice that, as with dense vector embeddings, searches driven by the ELSER model work better than full-text search when the exact words do not appear in the indexed documents.



## Q) I want to know the difference between vector search and semantic search. In my opinion, both methods are used to search for documents based on similarity.

Answer: In practice, both methods aim for similarity-based search, but they approach it differently. Vector search is more about overall similarity in a continuous space, while semantic search (especially with ELSER) is about matching on specific, contextually important concepts.


## Hybrid Search: Combined Full-Text and ELSER Results

As with vector search in the previous section, in this section you will learn how to combine the best search results from full-text and semantic queries using the Reciprocal Rank Fusion algorithm.
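
As a refresher, Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every result list in which it appears, where rank is the 1-based position in that list. The sketch below illustrates the idea in plain Python. It is not Elasticsearch's implementation, the document identifiers are made up, and k=60 is used here as a typical value for the rank constant.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Combine ranked lists of document ids into a single ranked list."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)


full_text = ['doc-a', 'doc-b', 'doc-c']  # made-up ids
semantic = ['doc-b', 'doc-d', 'doc-a']
print(reciprocal_rank_fusion([full_text, semantic]))
```

Documents that rank well in both lists, such as doc-b and doc-a here, rise above documents that appear in only one of them, regardless of the raw scores that each search method assigned.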




## Introduction to Sub-Searches

The solution to implementing a hybrid full-text and dense vector search was to send a search request that included the query and knn arguments to request the two searches, and the rrf argument to combine them into a single results list.

The complication when trying to combine full-text and sparse vector searches in the same way is that both use the query argument. To provide the two queries that the RRF algorithm needs to combine, the request has to include two query arguments, and the way to do that is with Sub-Searches.

Sub-searches is a feature that is currently in technical preview. For this reason the Python Elasticsearch client does not natively support it. To work around this limitation, the search() method of the Search class can be changed to send the search request using the body argument. Below you can see a new, yet similar implementation that uses the body argument of the client to send a search request:

```python
import json


class Search:
    # ...

    def search(self, **query_args):
        # sub_searches is not currently supported in the client, so we send
        # search requests using the body argument
        if 'from_' in query_args:
            query_args['from'] = query_args['from_']
            del query_args['from_']
        return self.es.search(
            index='my_documents',
            body=json.dumps(query_args),
        )

```

This implementation does not require any changes to the application, as it is functionally equivalent. The only difference is that the client's search() method validates its named arguments before sending the request, whereas the contents of body are sent as-is. The server always validates requests regardless of how the client sends them.
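
To see what ends up in the request, the argument handling can be isolated in a small function. This is a sketch with a hypothetical build_body() helper that mirrors the renaming logic of Search.search() and returns the JSON string that would be passed as body.

```python
import json


def build_body(**query_args):
    """Convert keyword arguments into the JSON body of a search request."""
    # 'from' is a reserved word in Python, so callers pass 'from_' instead.
    if 'from_' in query_args:
        query_args['from'] = query_args.pop('from_')
    return json.dumps(query_args)


body = build_body(query={'match_all': {}}, size=5, from_=10)
print(body)
```

Any keyword argument is serialized unchanged, which is what allows options the client does not know about, such as sub_searches, to reach the server.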

With this version, the sub_searches argument can be used in Search.search() to send multiple search queries as follows:


```python
results = es.search(
    sub_searches=[
        {
            'query': { ... },  # full-text search
        },
        {
            'query': { ... },  # semantic search
        },
    ],
    rank={
        'rrf': {},  # combine sub-search results
    },
    aggs={ ... },
    size=5,
    from_=from_,
)

```

## Hybrid Search Implementation

To complete this section, let's bring back the full-text logic and combine it with the semantic search query presented earlier in this chapter.

Below you can see the updated handle_search() endpoint:

```python
@app.post('/')
def handle_search():
    query = request.form.get('query', '')
    filters, parsed_query = extract_filters(query)
    from_ = request.form.get('from_', type=int, default=0)

    if parsed_query:
        search_query = {
            'sub_searches': [
                {
                    'query': {
                        'bool': {
                            'must': {
                                'multi_match': {
                                    'query': parsed_query,
                                    'fields': ['name', 'summary', 'content'],
                                }
                            },
                            **filters
                        }
                    }
                },
                {
                    'query': {
                        'bool': {
                            'must': [
                                {
                                    'text_expansion': {
                                        'elser_embedding': {
                                            'model_id': '.elser_model_2',
                                            'model_text': parsed_query,
                                        }
                                    },
                                }
                            ],
                            **filters,
                        }
                    },
                },
            ],
            'rank': {
                'rrf': {}
            },
        }
    else:
        search_query = {
            'query': {
                'bool': {
                    'must': {
                        'match_all': {}
                    },
                    **filters
                }
            }
        }

    results = es.search(
        **search_query,
        aggs={
            'category-agg': {
                'terms': {
                    'field': 'category.keyword',
                }
            },
            'year-agg': {
                'date_histogram': {
                    'field': 'updated_at',
                    'calendar_interval': 'year',
                    'format': 'yyyy',
                },
            },
        },
        size=5,
        from_=from_,
    )
    aggs = {
        'Category': {
            bucket['key']: bucket['doc_count']
            for bucket in results['aggregations']['category-agg']['buckets']
        },
        'Year': {
            bucket['key_as_string']: bucket['doc_count']
            for bucket in results['aggregations']['year-agg']['buckets']
            if bucket['doc_count'] > 0
        },
    }
    return render_template('index.html', results=results['hits']['hits'],
                           query=query, from_=from_,
                           total=results['hits']['total']['value'], aggs=aggs)

```

As you recall, the extract_filters() function looks for category filters entered by the user on the search prompt, and returns the leftover portion as parsed_query. If parsed_query is empty, it means that the user only entered a category filter, and in that case the query should be a simple match_all with the selected category as a filter. This is implemented in the else portion of the big conditional.
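
As a reminder of how the two branches get selected, here is a simplified stand-in for extract_filters(). It is not the application's actual implementation: the category:<name> syntax, the regular expression, the example category name, and the shape of the returned filter are all assumptions made for illustration.

```python
import re


def extract_filters_sketch(query):
    """Split a hypothetical 'category:<name>' filter from the search text."""
    filters = []
    match = re.search(r'category:(\S+)\s*', query)
    if match:
        filters.append({'term': {'category.keyword': {'value': match.group(1)}}})
        # Remove the filter expression and keep the rest as the search text.
        query = (query[:match.start()] + query[match.end():]).strip()
    return {'filter': filters}, query


# Filter only: the leftover text is empty, so match_all would be used.
print(extract_filters_sketch('category:sharepoint'))
# Filter plus text: the leftover text drives the sub-searches.
print(extract_filters_sketch('work from home category:sharepoint'))
```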

When there is a search query, the sub_searches option is used as shown in the previous section to include the multi_match and text_expansion queries, with the rank option requesting that the results from the two sub-searches be combined into a single list of ranked results. To complete the query, the size and from_ arguments are provided to maintain support for pagination.


