# Tutorial: Using Cohere with the Elastic open inference API


This tutorial shows you how to compute embeddings with
Cohere through the Elastic open inference API and store them in Elasticsearch
for efficient vector or hybrid search. All operations are performed with the
Python Elasticsearch client.

You'll learn how to:
* create an inference endpoint for text embedding using the Cohere service,
* create the necessary index mapping for the Elasticsearch index using semantic search,
* rerank with retrievers using Cohere's rerank model,
* design a RAG system with Cohere's Chat API.

The tutorial uses the [gqd/fictional-characters](https://huggingface.co/datasets/gqd/fictional-characters/blob/main/heros.json) data set.

Refer to [Cohere's tutorial](https://docs.cohere.com/docs/elasticsearch-and-cohere) for an example using a different data set.

## 🧰 Requirements

For this example, you will need:

- An Elastic deployment:
   - We'll be using [Elastic serverless](https://www.elastic.co/docs/current/serverless) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))
   
- A paid [Cohere account](https://cohere.com/) is required to use the Open Inference API with 
the Cohere service as the Cohere free trial API usage is limited.

- Python 3.7+.

- The Python Elasticsearch client, version 8.15 or later.

## Install and import required packages

Install the Elasticsearch Python client:

In [1]:
!pip install elasticsearch==8.15


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.2[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Import the required packages:

In [2]:
from elasticsearch import Elasticsearch, helpers
import json
import requests
from getpass import getpass

## Create an Elasticsearch client

Now you can instantiate the Python Elasticsearch client.

First provide your Elastic serverless endpoint and API key.
Then create a `client` object, which is an instance of the `Elasticsearch` class.

In [18]:
ELASTIC_SERVERLESS_ENDPOINT = getpass("Elastic serverless endpoint: ")
ELASTIC_API_KEY = getpass("Elastic API key: ")

# Create the client instance
client = Elasticsearch(
    # For local development
    # hosts=["http://localhost:9200"]
    hosts=[ELASTIC_SERVERLESS_ENDPOINT],
    api_key=ELASTIC_API_KEY,
    request_timeout=120,
    max_retries=10,
    retry_on_timeout=True,
)

# Confirm the client has connected
print(client.info())

{'name': 'serverless', 'cluster_name': 'b84f5a8ed15b4766b3f34666e7e06313', 'cluster_uuid': 'yecqXsgJTzuBufJH9q7zag', 'version': {'number': '8.11.0', 'build_flavor': 'serverless', 'build_type': 'docker', 'build_hash': '00000000', 'build_date': '2023-10-31', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '8.11.0', 'minimum_index_compatibility_version': '8.11.0'}, 'tagline': 'You Know, for Search'}


## Create the inference endpoints

Create the inference endpoint first. In this example, the inference endpoint 
uses Cohere's `embed-english-v3.0` model and the `embedding_type` is set to
`byte`.

In [11]:
COHERE_API_KEY = getpass("Cohere API key: ")

In [20]:
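# ignore_status=404 lets the delete pass on a first run, when the endpoint does not exist yet
client.options(ignore_status=404).inference.delete(
    inference_id="cohere_embeddings", force=True
)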
client.inference.put(
    task_type="text_embedding",
    inference_id="cohere_embeddings",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "embed-english-v3.0",
            "embedding_type": "byte",
        },
    },
)

ObjectApiResponse({'model_id': 'cohere_embeddings', 'inference_id': 'cohere_embeddings', 'task_type': 'text_embedding', 'service': 'cohere', 'service_settings': {'similarity': 'dot_product', 'dimensions': 1024, 'model_id': 'embed-english-v3.0', 'rate_limit': {'requests_per_minute': 10000}, 'embedding_type': 'byte'}, 'task_settings': {}})

You can find your API keys in your Cohere dashboard under the
[API keys section](https://dashboard.cohere.com/api-keys).
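
To give a feel for what an `embedding_type` of `byte` implies, the sketch below quantizes a small float vector to signed 8-bit integers and compares dot products before and after. It is only an illustration under assumed conventions (values clipped to [-1, 1], a scale factor of 127). It is not Cohere's actual quantization procedure, and the function names `quantize_to_int8` and `dot_product` are hypothetical.

```python
def quantize_to_int8(vector, scale=127):
    """Map floats in [-1.0, 1.0] to integers in [-127, 127] (illustrative only)."""
    quantized = []
    for value in vector:
        clipped = max(-1.0, min(1.0, value))  # keep the value inside the assumed range
        quantized.append(int(round(clipped * scale)))
    return quantized


def dot_product(a, b):
    """Plain dot product of two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional vectors; real embed-english-v3.0 vectors have 1024 dimensions.
doc = [0.12, -0.50, 0.33, 0.80]
query = [0.10, -0.40, 0.30, 0.90]

doc_q = quantize_to_int8(doc)
query_q = quantize_to_int8(query)

float_score = dot_product(doc, query)
# Divide by scale**2 to bring the integer score back to the float range.
approx_score = dot_product(doc_q, query_q) / 127**2

print("float score:", float_score)
print("approximate score from int8 vectors:", approx_score)
```

The integer vectors take one byte per dimension instead of four, at the cost of a small approximation error in the similarity score.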

## Create the index mapping

Create the index mapping for the index that will contain the embeddings.

In [21]:
client.indices.create(
    index="characters-index",
    mappings={
        "properties": {
            "infer_field": {
                "type": "semantic_text",
                "inference_id": "cohere_embeddings",
            },
            "title": {"type": "text", "copy_to": "infer_field"},
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'characters-index'})
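
The mapping above copies the `title` text into `infer_field` through `copy_to`, and `infer_field` is a `semantic_text` field bound to the `cohere_embeddings` endpoint. As a minimal sketch, the same structure can be produced by a small helper. The name `build_semantic_mappings` and its parameters are hypothetical and not part of the Elasticsearch client.

```python
def build_semantic_mappings(
    text_fields,
    semantic_field="infer_field",
    inference_id="cohere_embeddings",
):
    """Build a mappings dict where each text field is copied into one semantic_text field."""
    properties = {
        semantic_field: {"type": "semantic_text", "inference_id": inference_id}
    }
    for name in text_fields:
        if name == semantic_field:
            raise ValueError("a text field cannot share the semantic field's name")
        properties[name] = {"type": "text", "copy_to": semantic_field}
    return {"properties": properties}


mappings = build_semantic_mappings(["title"])
print(mappings)
```

The resulting dictionary has the same shape as the `mappings` argument passed to `client.indices.create` above.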

## Prepare data and insert documents

This example uses the [gqd/fictional-characters](https://huggingface.co/datasets/gqd/fictional-characters/blob/main/heros.json) data
set that you can find on HuggingFace.

In [22]:
url = "https://huggingface.co/datasets/gqd/fictional-characters/resolve/main/heros.json"

# Fetch the JSON data from the URL
response = requests.get(url)
response.raise_for_status()  # Ensure we notice bad responses

# Parse the whole response body as a single JSON array of records
data = json.loads(response.text)

wanted_fields = ["title", "origin"]
print(data[0])

# Prepare the documents to be indexed
documents = []
for i, line in enumerate(data):
    if i % 100 == 0:
        print(i, "of", len(data))
    data_dict = dict((k, line[k]) for k in wanted_fields if k in line)
    documents.append(
        {
            "_index": "characters-index",
            "_source": data_dict,
        }
    )

# Use the bulk endpoint to index
print(helpers.bulk(client, documents))

print("Data ingestion completed, text embeddings generated!")

{'title': '"Eat" Owner', 'parsed': {'image': 'Vlcsnap-2016-09-06-05h31m34s50.png', 'origin': "''Wander Over Yonder''", 'fullname': 'Unknown', 'alias': '"Eat" Owner', 'occupation': 'Restaurant owner', 'skills': 'Management skills', 'goals': 'Manage his restaurant.<br>\nHelp the rebels defeat Lord Dominator.<br>', 'hobby': '', 'family': 'Unknown', 'friends': 'Michelle', 'enemies': 'Lord Dominator', 'type of hero': 'Alien'}, 'origin': "''Wander Over Yonder''"}
0 of 8058
100 of 8058
200 of 8058
300 of 8058
400 of 8058
500 of 8058
600 of 8058
700 of 8058
800 of 8058
900 of 8058
1000 of 8058
1100 of 8058
1200 of 8058
1300 of 8058
1400 of 8058
1500 of 8058
1600 of 8058
1700 of 8058
1800 of 8058
1900 of 8058
2000 of 8058
2100 of 8058
2200 of 8058
2300 of 8058
2400 of 8058
2500 of 8058
2600 of 8058
2700 of 8058
2800 of 8058
2900 of 8058
3000 of 8058
3100 of 8058
3200 of 8058
3300 of 8058
3400 of 8058
3500 of 8058
3600 of 8058
3700 of 8058
3800 of 8058
3900 of 8058
4000 of 8058
4100 of 8058
4200

Your index is populated with the fictional characters data and text embeddings
for the `infer_field` field.
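
The document preparation step can also be written as a generator, so that the bulk actions are produced lazily rather than collected in a list first. The sketch below is one possible variant, not the notebook's own code. The helper name `to_bulk_actions` is hypothetical, the sample records only imitate the shape of the data set, and, unlike the cell above, records without any wanted field are skipped.

```python
def to_bulk_actions(records, index, wanted_fields):
    """Yield one bulk action per record, keeping only the wanted fields."""
    for record in records:
        source = {key: record[key] for key in wanted_fields if key in record}
        if source:  # assumption: skip records that have none of the wanted fields
            yield {"_index": index, "_source": source}


# Small sample shaped like the data set's records.
sample_records = [
    {
        "title": '"Eat" Owner',
        "origin": "''Wander Over Yonder''",
        "parsed": {"occupation": "Restaurant owner"},
    },
    {"title": "Rand al'Thor"},
    {"parsed": {}},
]

actions = list(to_bulk_actions(sample_records, "characters-index", ["title", "origin"]))
for action in actions:
    print(action)
```

Because `helpers.bulk` accepts any iterable of actions, a generator like this could be passed to it directly in place of the `documents` list.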

## Add the rerank inference endpoint

To order the results more precisely, use
[Cohere's Rerank v3](https://docs.cohere.com/docs/rerank-2) model through the
inference API to semantically rerank them.

Create an inference endpoint with your Cohere API key and the name of the model
as the `model_id` (`rerank-english-v3.0` in this example).

In [42]:
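# ignore_status=404 lets the delete pass on a first run, when the endpoint does not exist yet
client.options(ignore_status=404).inference.delete(inference_id="cohere_rerank")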
client.inference.put(
    task_type="rerank",
    inference_id="cohere_rerank",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "rerank-english-v3.0",
        },
        "task_settings": {"top_n": 100, "return_documents": True},
    },
)

ObjectApiResponse({'model_id': 'cohere_rerank', 'inference_id': 'cohere_rerank', 'task_type': 'rerank', 'service': 'cohere', 'service_settings': {'model_id': 'rerank-english-v3.0', 'rate_limit': {'requests_per_minute': 10000}}, 'task_settings': {'top_n': 100, 'return_documents': True}})

## Semantic search with reranking

Let's start querying the index!

In [82]:
def semantic_search_with_reranking(query):
    return client.search(
        index="characters-index",
        retriever={
            "text_similarity_reranker": {
                "retriever": {
                    "standard": {
                        "query": {"semantic": {"field": "infer_field", "query": query}}
                    }
                },
                "field": "title",
                "inference_id": "cohere_rerank",
                "inference_text": query,
                "rank_window_size": 100,
            }
        },
    )


def pretty_print_characters(docs):
    hits = docs["hits"]["hits"]
    for hit in hits:
        source = hit["_source"]
        print("Title:", source["title"] + " (Origin:", source["origin"] + ")")

In [83]:
sci_fi_characters = semantic_search_with_reranking("Characters from sci-fi movies")
pretty_print_characters(sci_fi_characters)
print()
fantasy_characters = semantic_search_with_reranking("Characters from fantasy movies")
pretty_print_characters(fantasy_characters)

Title: Sarah Connor (Terminator Genisys) (Origin: {{W|Terminator Genisys|Terminator Genisys)
Title: Malachi (Babylon 5) (Origin: ''Babylon 5'' episode ''The Coming of Shadows'')
Title: Michael (E.T.) (Origin: ''E.T. the Extra-Terrestrial'')
Title: Jack (Mass Effect) (Origin: ''{{w|Mass Effect 2)
Title: Jane (Babylon 5) (Origin: ''Babylon 5'')
Title: Jean-Luc Picard (Origin: ''Star Trek: The Next Generation'')
Title: Data (Star Trek) (Origin: ''Star Trek: The Next Generation'')
Title: Walter (Alien Covenant) (Origin: ''Alien: Covenant'')
Title: Adam Warlock (Marvel's Guardians of the Galaxy) (Origin: ''{{w|Marvel's Guardians of the Galaxy)
Title: Cooper (Jurassic Park) (Origin: ''Jurassic Park III'')

Title: Abraham Van Helsing (Hotel Transylvania) (Origin: ''Hotel Transylvania 3: Summer Vacation'')
Title: Rand al'Thor (Origin: ''The Wheel of Time'')
Title: Draco Malfoy (Origin: ''Harry Potter'' series)
Title: Arthur Pendragon (Shrek) (Origin: ''{{W|Shrek the Third)
Title: Robin Hood (P
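
The role of `rank_window_size` in the retriever can be pictured as a two-stage process: a first-stage retriever returns candidates, and only the top `rank_window_size` of them are rescored by the reranker. The following is a conceptual sketch in plain Python, not how Elasticsearch or Cohere implement it. The toy keyword-overlap scorer stands in for the rerank model, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_overlap_scorer(query):
    """Return a toy scoring function: number of query terms found in the title."""
    query_terms = tokenize(query)

    def score(hit):
        return len(query_terms & tokenize(hit["title"]))

    return score


def rerank_top_window(first_stage_hits, score, rank_window_size=100, size=10):
    """Rescore only the first rank_window_size hits, then return the best `size`."""
    window = first_stage_hits[:rank_window_size]
    reranked = sorted(window, key=score, reverse=True)  # stable for equal scores
    return reranked[:size]


# First-stage order, as if returned by the inner retriever.
first_stage_hits = [
    {"title": "Jean-Luc Picard"},
    {"title": "Walter (Alien Covenant)"},
    {"title": "Data (Star Trek)"},
]
score = keyword_overlap_scorer("star trek")

wide = rerank_top_window(first_stage_hits, score, rank_window_size=3)
narrow = rerank_top_window(first_stage_hits, score, rank_window_size=2)

print([hit["title"] for hit in wide])
print([hit["title"] for hit in narrow])
```

With the narrower window, the third candidate never reaches the reranker, so it cannot appear in the final results however well it would have scored.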

## Retrieval Augmented Generation (RAG) with Cohere and Elasticsearch

RAG is a method for generating text using additional information fetched from an
external data source. With the ranked results, you can build a RAG system on
top of what you created with 
[Cohere's Chat API](https://docs.cohere.com/docs/chat-api).

Pass in the retrieved documents and the query to receive a grounded response
from Cohere's generative model 
[Command R+](https://docs.cohere.com/docs/command-r-plus). The cells below
create a `completion` inference endpoint for that model, send it the query
together with the documents, and print out the response.

In [96]:
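# ignore_status=404 lets the delete pass on a first run, when the endpoint does not exist yet
client.options(ignore_status=404).inference.delete(inference_id="cohere_completion")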
client.inference.put(
    task_type="completion",
    inference_id="cohere_completion",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "command-r-plus",
        },
    },
)

ObjectApiResponse({'model_id': 'cohere_completion', 'inference_id': 'cohere_completion', 'task_type': 'completion', 'service': 'cohere', 'service_settings': {'model_id': 'command-r-plus', 'rate_limit': {'requests_per_minute': 10000}}, 'task_settings': {}})

In [107]:
characters = []
for hit in sci_fi_characters["hits"]["hits"]:
    characters.append(hit["_source"]["title"])
for hit in fantasy_characters["hits"]["hits"]:
    characters.append(hit["_source"]["title"])


characters.sort()
print(characters)

query = "What genres of movies are these characters from?"
input = [
    query,
    "characters",
    characters,
]

['Abraham Van Helsing (Hotel Transylvania)', "Adam Warlock (Marvel's Guardians of the Galaxy)", 'Arthur Pendragon (Shrek)', 'Christopher Robin (Disney)', 'Cooper (Jurassic Park)', 'Data (Star Trek)', 'Draco Malfoy', 'Helga Hufflepuff', 'Jack (Mass Effect)', 'Jane (Babylon 5)', 'Jean-Luc Picard', 'Malachi (Babylon 5)', 'Michael (E.T.)', "Rand al'Thor", 'Remus Lupin', 'Robin Hood (Prince of Thieves)', 'Sarah Connor (Terminator Genisys)', 'Wade (Hotel Transylvania)', 'Walter (Alien Covenant)', 'Wilbur (Hotel Transylvania)']
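
The cell above hands the model a Python list that is later converted with `str()`. One alternative is to format the question and the retrieved titles into an explicit prompt string. The sketch below assumes a simple numbered-list layout. The helper name `build_rag_prompt` and the prompt wording are hypothetical, and whether this layout yields better answers is not something this notebook measures.

```python
def build_rag_prompt(question, documents):
    """Format a question and its supporting documents as one prompt string."""
    if not documents:
        raise ValueError("at least one document is needed to ground the answer")
    lines = [f"Question: {question}", "", "Documents:"]
    # Number the documents so the model can refer to them individually.
    lines.extend(f"{n}. {doc}" for n, doc in enumerate(documents, start=1))
    lines.extend(["", "Answer using only the documents listed above."])
    return "\n".join(lines)


prompt = build_rag_prompt(
    "What genres of movies are these characters from?",
    ["Data (Star Trek)", "Draco Malfoy"],
)
print(prompt)
```

A string built this way could be passed as the `input` argument of `client.inference.inference` in place of `str(input)`.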


In [108]:
response = client.inference.inference(
    inference_id="cohere_completion", input=str(input)
)

In [109]:
print(f"Query: {query}")
completion_response = response["completion"][0]["result"]
print(completion_response)

Query: What genres of movies are these characters from?
Here is a list of the movie genres that the characters in your list are from:

- Abraham Van Helsing, Wade, and Wilbur (Hotel Transylvania): Animated comedy, Fantasy, Horror
- Adam Warlock (Marvel's Guardians of the Galaxy): Sci-fi, Action, Superhero
- Arthur Pendragon (Shrek): Animated comedy, Fantasy
- Christopher Robin (Disney): Live-action adaptation of animated characters, Fantasy
- Cooper (Jurassic Park): Sci-fi, Adventure, Thriller
- Data and Jean-Luc Picard (Star Trek): Sci-fi, Adventure
- Draco Malfoy, Remus Lupin (Harry Potter series): Fantasy, Magical realism
- Helga Hufflepuff (Harry Potter series): Fantasy, Magical realism
- Jack (Mass Effect): Sci-fi, Action, Military fiction
- Jane and Malachi (Babylon 5): Sci-fi, Drama
- Michael (E.T.): Sci-fi, Fantasy, Family drama
- Rand al'Thor (The Wheel of Time): Fantasy
- Robin Hood (Prince of Thieves): Adventure, Historical fiction
- Sarah Connor (Terminator Genisys): Sci-f