# Upgrade an index to use ELSER with the Reindex API

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/re-indexing.ipynb)

The Elasticsearch [Reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html) lets you copy data from one index to another, change the mappings of an index, or modify its documents along the way.

In this notebook we will see how to migrate an index to use the ELSER model using the [Reindex API with an ingest pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#reindex-with-an-ingest-pipeline).

The two specific scenarios covered in this notebook are:

1. Migrating an index that has no generated [`text_expansion`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-text-expansion-query.html) field to the ELSER model `.elser_model_2`
2. Upgrading an existing index that uses `.elser_model_1` to use `.elser_model_2`
 


# 🧰 Requirements

For this example, you will need:

- An Elastic deployment with a machine learning node of at least **4GB**
   - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))

   

# Create Elastic Cloud deployment

If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.

- Go to the [Create deployment](https://cloud.elastic.co/deployments/create) page
   - Under **Advanced settings**, go to **Machine Learning instances**
   - You'll need at least **4GB** RAM per zone for this tutorial
   - Select **Create deployment**

# Setup ELSER

ELSER is a model trained by Elastic that lets you perform semantic search on your data, retrieving results based on contextual meaning rather than exact keyword matches.

To use ELSER, you must have the appropriate subscription level or the trial period activated.

Elasticsearch versions before 8.11 support `.elser_model_1`. From 8.11 onward, Elasticsearch supports `.elser_model_2`, which offers improved retrieval accuracy and faster indexing.
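
The version rule above can be expressed as a small helper. This is only a sketch: `elser_model_for_version` is a hypothetical name, and it mirrors the statement in this notebook without checking whether ELSER is available on the deployment at all.

```python
def elser_model_for_version(version: str) -> str:
    """Return the ELSER model ID for a version string such as '8.11.1'."""
    # Keep only the numeric major and minor parts; drop suffixes like '-SNAPSHOT'.
    major, minor = (int(part) for part in version.split("-")[0].split(".")[:2])
    # Compare as integer tuples so that 8.9 sorts before 8.11.
    return ".elser_model_2" if (major, minor) >= (8, 11) else ".elser_model_1"
```

Once the client is connected (see the next section), it could be called as `elser_model_for_version(client.info()["version"]["number"])`.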


# Install packages and connect with Elasticsearch Client

To get started, we will need to connect to our Elastic deployment using the Python client. As we are using an Elastic Cloud deployment, we will use the **Cloud ID** to identify it. To find your **Cloud ID**, go to https://cloud.elastic.co/deployments and select your deployment.

Next, we will install the `elasticsearch` package using `pip`.

In [27]:
!pip install elasticsearch -qU

Next, we will import all the modules that we need. 

In [28]:
from elasticsearch import Elasticsearch, helpers
from urllib.request import urlopen
import getpass
import json

Now we will instantiate the Python Elasticsearch client. When prompted, provide your `Cloud ID` and `password`; these credentials let us create the `Elasticsearch` client instance.


In [39]:
# Found in the 'Manage Deployment' page
CLOUD_ID = getpass.getpass('Enter Elastic Cloud ID:  ')

# Password for the 'elastic' user generated by Elasticsearch
ELASTIC_PASSWORD = getpass.getpass('Enter Elastic password:  ')

# Create the client instance
client = Elasticsearch(
    cloud_id=CLOUD_ID,
    basic_auth=("elastic", ELASTIC_PASSWORD)
)

# Case 1: Migrate an index with no `text_expansion` field

In this example we will take an index that is configured with a simple [ingestion pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html) and upgrade it to use the ELSER model `.elser_model_2`.

# Create Ingestion pipeline 

We will create a simple pipeline that converts the values of the `title` field to lowercase, and use it as the ingestion pipeline for our index.

In [None]:

client.ingest.put_pipeline(
    id="ingest-pipeline-lowercase", 
    description="Ingest pipeline to change title to lowercase",
    processors=[
    {
      "lowercase": {
        "field": "title"
      }
    }
  ]
)
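
Before attaching the pipeline to an index, it can be previewed with the simulate pipeline API. The sketch below assumes the `client` created earlier; `preview_pipeline` is a hypothetical helper name, and the sample title in the usage note is made up for illustration.

```python
def preview_pipeline(client, pipeline_id, sample_source):
    """Run one sample document through an ingest pipeline without indexing it."""
    response = client.ingest.simulate(
        id=pipeline_id,
        docs=[{"_source": sample_source}],
    )
    # Each simulated document is returned under docs[i].doc._source.
    return response["docs"][0]["doc"]["_source"]
```

For example, `preview_pipeline(client, "ingest-pipeline-lowercase", {"title": "Fight Club"})` should return the document with a lowercased `title` if the pipeline behaves as intended.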

# Create index with mappings

Next, we will create an index `movies` that uses the pipeline `ingest-pipeline-lowercase` from the previous step as its default pipeline.

In [None]:
client.indices.create(
  index="movies",
  settings={
      "index": {
          "number_of_shards": 1,
          "number_of_replicas": 1,
          "default_pipeline": "ingest-pipeline-lowercase"
      }
  },
  mappings={
    "properties": {
      "plot": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
    }
  }
)

# Insert Documents
We are now ready to insert a sample dataset of 12 movies into our `movies` index.

In [None]:
url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/movies.json"
response = urlopen(url)

# Load the response data into a JSON object
data_json = json.loads(response.read())

# Prepare the documents to be indexed
documents = []
for doc in data_json:
    documents.append({
        "_index": "movies",
        "_source": doc,
    })

# Use helpers.bulk to index
helpers.bulk(client, documents)

print("Done indexing documents into `movies` index!")

# Upgrade index `movies` to use ELSER model

**`Note:`** Before you begin upgrading the index, make sure your Elastic Cloud deployment is running version 8.11 or later. You can follow these [instructions](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-ELSER.html#trained-model) to download and deploy the trained model in the Kibana UI or using the Dev Tools **Console**.
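
As an alternative to the Kibana UI, the model can also be downloaded and started from the Python client. The following is a minimal sketch, assuming the `client` created earlier and that `.elser_model_2` has not been downloaded yet (the `put_trained_model` call fails if the model already exists). `deploy_elser`, `poll_seconds`, and `max_polls` are names introduced here for illustration.

```python
import time


def deploy_elser(client, model_id=".elser_model_2", poll_seconds=5, max_polls=120):
    """Download an ELSER model and request a deployment with one allocation."""
    # Register the model; Elasticsearch downloads its definition in the background.
    client.ml.put_trained_model(model_id=model_id, input={"field_names": ["text_field"]})

    # Poll until the model definition has been fully downloaded.
    for _ in range(max_polls):
        status = client.ml.get_trained_models(model_id=model_id, include="definition_status")
        if status["trained_model_configs"][0]["fully_defined"]:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"{model_id} was not fully downloaded in time")

    # Start the deployment; returns once the deployment is starting.
    client.ml.start_trained_model_deployment(
        model_id=model_id,
        number_of_allocations=1,
        wait_for="starting",
    )
```

Calling `deploy_elser(client)` would then take the place of the manual steps in the linked instructions.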

We are now ready to reindex `movies` into a new index that uses the ELSER model `.elser_model_2`. As a first step, we have to create a new ingestion pipeline and a new index that use the ELSER model.

# Create a new pipeline with ELSER 
Let's create a new ingestion pipeline that uses the ELSER model `.elser_model_2`.

In [None]:
client.ingest.put_pipeline(
    id="elser-ingest-pipeline", 
    description="Ingest pipeline for ELSER",
    processors=[
    {
      "inference": {
        "model_id": ".elser_model_2",
        "target_field": "ml",
        "field_map": {
          "plot": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
)

# Create an index with mappings

Next, create an index whose mapping supports the [`text_expansion`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-text-expansion-query.html) query used with the ELSER model. The `ml.tokens` field is mapped as [`rank_features`](https://www.elastic.co/guide/en/elasticsearch/reference/current/rank-features.html) so that it can store the weighted tokens that ELSER generates.



In [None]:
client.indices.create(
  index="elser-movies",
  mappings={
    "properties": {
      "plot": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "ml.tokens": {
        "type": "rank_features"
      },
    }
  }
)

# Reindex with updated pipeline 

With the help of the [Reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html), we can copy data from the old index `movies` to the new index `elser-movies` with the ingestion pipeline set to `elser-ingest-pipeline`. On success, each document in `elser-movies` contains the tokens that ELSER inferred from the `plot` field, which is the field we targeted for inference.

Copy and paste the following command into your Dev Tools Console.

```
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "movies"
  },
  "dest": {
    "index": "elser-movies",
    "pipeline":  "elser-ingest-pipeline"
  }
}


```

When you execute the above command with `wait_for_completion=false`, you will see a response like this:

```
{
  "task": "pT5PY9_sQvyqDtaIBiHbYg:218353"
}

```

Use the task ID from the response to check the status of your reindex operation with the [Task management API](https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html) in the Dev Tools Console:

```

GET _tasks/pT5PY9_sQvyqDtaIBiHbYg:218353

```
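
The same two steps can also be run from the notebook with the Python client instead of the Console. This is a sketch under the assumption that the `client` created earlier is available; `start_reindex` and `wait_for_task` are hypothetical helper names, and the polling interval is arbitrary.

```python
import time


def start_reindex(client, source_index, dest_index, pipeline):
    """Start an asynchronous reindex through an ingest pipeline and return the task ID."""
    response = client.reindex(
        source={"index": source_index},
        dest={"index": dest_index, "pipeline": pipeline},
        wait_for_completion=False,
    )
    return response["task"]


def wait_for_task(client, task_id, poll_seconds=5, max_polls=120):
    """Poll the task management API until the task reports that it has completed."""
    for _ in range(max_polls):
        task = client.tasks.get(task_id=task_id)
        if task["completed"]:
            return task
        time.sleep(poll_seconds)
    raise TimeoutError(f"Task {task_id} did not complete in time")
```

A typical call would be `task_id = start_reindex(client, "movies", "elser-movies", "elser-ingest-pipeline")` followed by `wait_for_task(client, task_id)`.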

For more information on troubleshooting the Reindex API, check out this [blog](https://www.elastic.co/blog/3-best-practices-for-using-and-troubleshooting-the-reindex-api) post.

Once the reindex is complete, inspect a document in the index `elser-movies` and notice that it now has an additional field `"ml": {"tokens":...}` containing the terms that our `text_expansion` query will search against.
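
One way to inspect that field from Python is to list the highest-weighted tokens of a document. The helper below is a sketch; `top_tokens` is a hypothetical name, and it assumes the `ml.tokens` layout produced by the pipeline defined above.

```python
def top_tokens(source, n=5):
    """Return the n highest-weighted ELSER tokens from a document's _source."""
    # Documents that were not processed by the ELSER pipeline have no ml.tokens.
    tokens = source.get("ml", {}).get("tokens", {})
    return sorted(tokens.items(), key=lambda item: item[1], reverse=True)[:n]
```

For instance, fetch one hit with `hit = client.search(index="elser-movies", size=1)["hits"]["hits"][0]` and pass `hit["_source"]` to `top_tokens`. An empty list suggests the document did not go through the ELSER pipeline.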

Also note that you can now delete the old index `movies` if you no longer need it.
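
A cautious way to do this is to compare document counts before deleting anything. This is a sketch, not a complete safety check: `delete_if_fully_copied` is a hypothetical helper, and equal counts do not prove that every document was enriched correctly.

```python
def delete_if_fully_copied(client, old_index, new_index):
    """Delete old_index only if new_index holds the same number of documents."""
    # Refresh so that recently reindexed documents are visible to the count API.
    client.indices.refresh(index=new_index)
    old_count = client.count(index=old_index)["count"]
    new_count = client.count(index=new_index)["count"]
    if old_count != new_count:
        return False
    client.indices.delete(index=old_index)
    return True
```

For this notebook the call would be `delete_if_fully_copied(client, "movies", "elser-movies")`.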

# Querying documents with ELSER 

Let's try a semantic search on our index with the ELSER model `.elser_model_2`.

In [38]:
response = client.search(
    index='elser-movies', 
    size=3,
    query={
        "text_expansion": {
            "ml.tokens": {
                "model_id":".elser_model_2",
                "model_text":"child toy"
            }
        }
    }
)

for hit in response['hits']['hits']:
    doc_id = hit['_id']
    score = hit['_score']
    title = hit['_source']['title']
    plot = hit['_source']['plot']
    print(f"Score: {score}\nTitle: {title}\nPlot: {plot}\n")

Score: 3.3168378
Title: fight club
Plot: An insomniac office worker and a devil-may-care soapmaker form an underground fight club that evolves into something much, much more.

Score: 1.5777297
Title: the godfather
Plot: An organized crime dynasty's aging patriarch transfers control of his clandestine empire to his reluctant son.

Score: 1.1162646
Title: the matrix
Plot: A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers.



# Case 2: Upgrade ELSER model

If you already have an index that uses the ELSER model `.elser_model_1` and would like to upgrade to `.elser_model_2`, you can use the Reindex API with an ingestion pipeline that uses `.elser_model_2`.

**`Note:`** Before we begin, ensure that you are on Elasticsearch 8.11 or later and that the ELSER model `.elser_model_2` is deployed. You can follow these [instructions](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-ELSER.html#trained-model) to download and deploy the trained model in the Kibana UI or using the Dev Tools **Console**.


# Create a new ingestion pipeline

We will create a pipeline that uses `.elser_model_2` for the reindex operation.

In [None]:
client.ingest.put_pipeline(
    id="elser-pipeline-upgrade-demo", 
    description="Ingest pipeline for ELSER upgrade demo",
    processors=[
    {
      "inference": {
        "model_id": ".elser_model_2",
        "target_field": "ml",
        "field_map": {
          "plot": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
)

# Create a new index with mappings
We will create a new index with the mappings required to support ELSER.

In [None]:
client.indices.create(
  index="elser-upgrade-index-demo",
  mappings={
    "properties": {
      "plot": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "ml.tokens": {
        "type": "rank_features"
      },
    }
  }
)

# Use Reindex API
We will use the [Reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html) to move data from the old index to the new index `elser-upgrade-index-demo`. Copy and run the code below in your **Dev Tools Console**.

Note that we are excluding the target field `ml` from the old index, so that the tokens generated by the old model are not copied to the new index.

```
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "my-old-elser-model-index",
    "_source": {
      "excludes": ["ml"]
    }
  },
  "dest": {
    "index": "elser-upgrade-index-demo",
    "pipeline":  "elser-pipeline-upgrade-demo"
  }
}


```
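
The request body above can also be built in Python and passed to the client. The helper below is a sketch; `build_reindex_request` and its `exclude_fields` parameter are hypothetical names introduced here, and `my-old-elser-model-index` is the placeholder index name from the Console example.

```python
def build_reindex_request(source_index, dest_index, pipeline, exclude_fields=()):
    """Build the source and dest arguments for a reindex through an ingest pipeline."""
    source = {"index": source_index}
    if exclude_fields:
        # Leave these fields out of the copied documents (here: the old ELSER output).
        source["_source"] = {"excludes": list(exclude_fields)}
    return {
        "source": source,
        "dest": {"index": dest_index, "pipeline": pipeline},
    }
```

The result can be unpacked into the client call, for example `client.reindex(**build_reindex_request("my-old-elser-model-index", "elser-upgrade-index-demo", "elser-pipeline-upgrade-demo", exclude_fields=["ml"]), wait_for_completion=False)`.
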
Once the reindexing is complete, you are ready to query your data and perform semantic search.

# Querying your data

In [56]:
response = client.search(
    index='elser-upgrade-index-demo', 
    size=3,
    query={
        "text_expansion": {
            "ml.tokens": {
                "model_id":".elser_model_2",
                "model_text":"child toy"
            }
        }
    }
)

for hit in response['hits']['hits']:
    doc_id = hit['_id']
    score = hit['_score']
    title = hit['_source']['title']
    plot = hit['_source']['plot']
    print(f"Score: {score}\nTitle: {title}\nPlot: {plot}\n")


Score: 3.3168378
Title: Fight Club
Plot: An insomniac office worker and a devil-may-care soapmaker form an underground fight club that evolves into something much, much more.

Score: 1.5777297
Title: The Godfather
Plot: An organized crime dynasty's aging patriarch transfers control of his clandestine empire to his reluctant son.

Score: 1.1162646
Title: The Matrix
Plot: A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers.

