# Cohere's Multilingual Models on AWS

In today's globalized world, the ability to understand and process multiple languages is becoming increasingly important. [Cohere](https://cohere.com/) has developed a suite of multilingual models designed to tackle this challenge. In this notebook, we'll explore three of Cohere's multilingual models and their potential applications.

## Why Multilingual Models Matter

Language is a fundamental aspect of human communication and cultural expression. As businesses and organizations expand their reach across borders, the need for effective multilingual solutions becomes paramount. Multilingual models enable seamless communication, bridging language barriers and facilitating cross-cultural understanding.

[Cohere's multilingual models](https://aws.amazon.com/marketplace/seller-profile?id=87af0c85-6cf9-4ed8-bee0-b40ce65167e0) offer several benefits:

1. **Broad Language Coverage**: Cohere's models support a wide range of languages, allowing you to process and generate content in multiple languages simultaneously.

2. **Improved Accuracy**: By training on diverse language data, these models can better capture nuances, idioms, and cultural contexts, resulting in more accurate translations and language processing.

3. **Scalability**: With a single multilingual model, you can address language needs across multiple regions, reducing the need for maintaining separate models for each language.

4. **Cost-Efficiency**: Deploying a single multilingual model can be more cost-effective than maintaining multiple monolingual models, especially for organizations with global operations.

## The Models

In this notebook, we'll be working with three of Cohere's multilingual models:

1. **Cohere Command R+**: A powerful Large Language Model (LLM) capable of understanding and generating text in multiple languages.

2. **Cohere Embed Multilingual V3**: An embedding model designed to encode text from various languages into dense vector representations, enabling efficient similarity comparisons and semantic search.

3. **Cohere Rerank Multilingual V3**: A reranker model that scores retrieved passages by their relevance to a query. Used within a Retrieval-Augmented Generation (RAG) pipeline, it helps ensure that the text passed to the generator is the most relevant information from the knowledge base.

Throughout this notebook, we'll explore practical examples and use cases for these models, showcasing their capabilities in areas such as multilingual content generation and information retrieval.

## Pre-requisites:

1. Use one of the following kernels: `conda_python3`, `conda_pytorch_p310`, or `conda_tensorflow2_p310`.
2. Install the required packages.
3. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
4. To deploy a reranker model from Cohere, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to [cohere-rerank-multilingual](https://aws.amazon.com/marketplace/pp/prodview-ydysc72qticsw).

## Contents:
1. Install dependencies
2. Get some data
3. Vector indexing
4. RAG: retrieve relevant chunks from the vector database
5. RAG: rerank the chunks retrieved from the vector database

## Step 1: Install Dependencies

Here, we will install all the required dependencies to run this notebook.

In [None]:
!pip install wikipedia==1.4.0 --quiet
!pip install cohere-aws==0.8.16 --quiet
!pip install faiss-cpu==1.8.0 --quiet
!pip install langchain-text-splitters==0.2.2 --quiet
!pip install numpy==1.26.4 --quiet

#### Now let's import the required modules to run the notebook

In [1]:
import boto3
from cohere_aws import Client
import faiss
import json
from langchain_text_splitters import RecursiveCharacterTextSplitter
import numpy as np
import re
import wikipedia

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


## Step 2 - Getting some data

To get started, we'll use data from Wikipedia to answer questions about culinary traditions across different countries and languages. To showcase Cohere's multilingual capabilities, we fetch the cuisine articles for the United States, Mexico, and Germany from the English, Spanish, and German editions of Wikipedia.

By retrieving and analyzing data from these multilingual sources, we can gain insights into the unique flavors, ingredients, and cultural influences that shape the food heritage of each nation. This approach not only allows us to explore the richness of global cuisine but also demonstrates the power of multilingual language models in accessing and processing information from diverse linguistic sources.

### 1. USA - English

In [2]:
# English
wikipedia.set_lang('en')

In [3]:
# we'll get some wikipedia data
article = wikipedia.page('American Cuisine')
en_text = article.content
print(f"The text has roughly {len(en_text.split())} words.")

The text has roughly 19051 words.


In [4]:
print(en_text[0:1000])

American cuisine consists of the cooking style and traditional dishes prepared in the United States. It has been significantly influenced by Europeans, Indigenous Americans, Africans, Latin Americans, Asians, Pacific Islanders, and many other cultures and traditions. Principal influences on American cuisine are European, Native American, soul food, regional heritages including Cajun, Louisiana Creole, Pennsylvania Dutch, Mormon foodways, Texan, Tex-Mex, New Mexican, and Tlingit, and the cuisines of immigrant groups such as Chinese American, Italian American, Jewish American, Greek American and Mexican American. The large size of America and its long history of immigration have created an especially diverse cuisine that varies by region. 
American cooking dates back to the traditions of the Native Americans, whose diet included a mix of farmed and hunted food, and varied widely across the continent. The Colonial period created a mix of new world and Old World cookery, and brought with i

### 2. Mexico - Spanish

In [5]:
# Spanish
wikipedia.set_lang('es')

In [6]:
# we'll get some wikipedia data
article = wikipedia.page('Gastronomia de Mexico')
es_text = article.content
print(f"The text has roughly {len(es_text.split())} words.")

The text has roughly 15592 words.


In [7]:
print(es_text[0:1000])

La gastronomía mexicana es el conjunto de platillos y técnicas culinarias de México que forman parte de las tradiciones y vida común de sus habitantes, enriquecida por las aportaciones de las distintas regiones del país, que deriva de la experiencia del México prehispánico con la cocina española, entre otras. El 16 de noviembre de 2010, la gastronomía mexicana fue reconocida como Patrimonio cultural inmaterial de la Humanidad por la Unesco.[1]​[2]​
La cocina mexicana ha sido influida y ha influido a su vez a cocinas de otras culturas, como la estadounidense, española, francesa, italiana, africana, del Oriente Medio y asiática. Es testimonio de la cultura histórica del país: muchos platillos se originaron en el México prehispánico y otros momentos importantes de su historia. Existe en ella una amplia gama de sabores, colores, olores, texturas e influencias que la convierten en un gran atractivo para nacionales y extranjeros: México es famoso por su gastronomía.
La base de la cocina mexi

### 3. Germany - German

In [8]:
# German
wikipedia.set_lang('de') # Deutsch

In [9]:
# we'll get some wikipedia data
article = wikipedia.page('Deutsches Essen')
de_text = article.content
print(f"The text has roughly {len(de_text.split())} words.")

The text has roughly 3067 words.


In [10]:
print(de_text[0:1000])

Als deutsche Küche bezeichnet man allgemein die Küche Deutschlands. Sie ist gekennzeichnet durch starke regionale Unterschiede und an den Außengrenzen historisch bedingt geprägt durch die Einflüsse der Küche der Nachbarländer (z. B. Frankreich, Österreich, Polen). Starke Regionalküchen gibt es auf den Gebieten ehemals eigenständiger deutscher Staaten (Bayern, Baden) und Großregionen (Franken, Schwaben, Rheinland).

Typisch ist eine sättigende Mahlzeit am Morgen, das Frühstück. Traditionell war die Hauptmahlzeit des Tages das Mittagessen, das etwa zwischen 12 und 14 Uhr eingenommen wird. Am Nachmittag wird häufig eine Zwischenmahlzeit verzehrt, die beispielsweise aus Kaffee und Gebäck bestehen kann. Das Abendessen war früher eine meist kleinere Mahlzeit, die oftmals nur aus ein paar belegten Brotscheiben bestand. Heute ist das Abendessen oft die Hauptmahlzeit des Tages.

Der Fleischkonsum liegt in Deutschland bei etwa 60 kg im Jahr pro Person; 59 % entfallen dabei auf Schweinefleisch, d

## Step 3 - Vector Indexing

To store and retrieve relevant information from our multilingual data, we use [FAISS](https://ai.meta.com/tools/faiss/) (Facebook AI Similarity Search), an open-source library for efficient similarity search and clustering of dense vectors. FAISS lets us index and search large collections of embeddings, so we can quickly retrieve the most relevant passages based on their vector representations.

The process has three steps: chunk the retrieved documents into smaller segments, create a dense vector embedding for each chunk with Cohere's Embed Multilingual V3, and add these embeddings to a FAISS index.

With FAISS's similarity search, we can identify the chunks that are semantically closest to a given query. This supports applications such as semantic search, question answering, and knowledge base construction from multilingual sources. This notebook uses a flat index (`IndexFlatL2`), which performs an exact, exhaustive search; that is fast enough for the few hundred chunks indexed here.

In [11]:
# For chunking let's use langchain to help us split the text
def get_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=512, # Maximum chunk size in characters (length_function=len counts characters, not tokens)
        chunk_overlap=50,
        length_function=len,
        is_separator_regex=False,
    )

    # Split the text into chunks with some overlap
    chunks_ = text_splitter.create_documents([text])
    chunks = [c.page_content for c in chunks_]
    
    return chunks
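
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses, which also prefers to break on separators such as paragraphs and spaces. The `sliding_chunks` helper name is hypothetical.

```python
def sliding_chunks(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only: unlike
    RecursiveCharacterTextSplitter, it ignores separators and may cut words.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

Each chunk repeats the last three characters of the previous one, which is how the overlap keeps context that would otherwise be cut at a chunk boundary.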

### 1. USA - English

In [12]:
en_chunks = get_chunks(en_text)
print(f"The text has been broken down in {len(en_chunks)} chunks.")

The text has been broken down in 362 chunks.


### 2. Mexico - Spanish

In [13]:
es_chunks = get_chunks(es_text)
print(f"The text has been broken down in {len(es_chunks)} chunks.")

The text has been broken down in 289 chunks.


### 3. Germany - German

In [14]:
de_chunks = get_chunks(de_text)
print(f"The text has been broken down in {len(de_chunks)} chunks.")

The text has been broken down in 71 chunks.


### Create embeddings for every text chunk

In [15]:
def generate_embeddings(
    bedrock_client,
    model_id,
    texts,
    batch_size=50,
    input_type="search_document"
):
    """
    Convert text into embeddings.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        texts (list) : The texts to send to the embedding model.
        batch_size (int): Batch size to limit the number of chunks sent to the embedding model.
        input_type (str): How the texts will be used. Cohere Embed accepts "search_document"
            for texts to be indexed and "search_query" for search queries.

    Returns:
        embeddings (list): One embedding (a list of floats) per input text, in input order.
    """
    
    embeddings = []
    # Send the texts in batches to stay within the per-request limit
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]
        body = {
            "texts": batch_texts,
            "input_type": input_type
        }
        # Send the message.
        response = bedrock_client.invoke_model(modelId=model_id, body=json.dumps(body))

        response_body = json.loads(response.get('body').read())
        
        embeddings.extend(response_body["embeddings"])

    return embeddings
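
The batching logic can be checked without calling Bedrock. The sketch below only builds the JSON request bodies that the loop above would send; `build_embed_requests` is a hypothetical helper, and the placeholder texts stand in for real chunks.

```python
import json


def build_embed_requests(texts, batch_size=50, input_type="search_document"):
    """Return the JSON request bodies, one per batch of texts (no network call)."""
    bodies = []
    for i in range(0, len(texts), batch_size):
        body = {"texts": texts[i:i + batch_size], "input_type": input_type}
        bodies.append(json.dumps(body))
    return bodies


# 362 placeholder chunks, matching the English article's chunk count above
placeholder_chunks = [f"chunk {n}" for n in range(362)]
request_bodies = build_embed_requests(placeholder_chunks)

print(len(request_bodies))                           # number of invoke_model calls
print(len(json.loads(request_bodies[-1])["texts"]))  # size of the final, partial batch
```

With a batch size of 50, the 362 English chunks are sent in eight requests, the last of which carries the remaining 12 chunks.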

In [16]:
# Define bedrock client
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name="us-east-1")

### 1. USA - English

In [17]:
en_embeddings = generate_embeddings(bedrock_client, "cohere.embed-multilingual-v3", en_chunks)
print(f"We just computed {len(en_embeddings)} embeddings.")

We just computed 362 embeddings.


### 2. Mexico - Spanish

In [18]:
es_embeddings = generate_embeddings(bedrock_client, "cohere.embed-multilingual-v3", es_chunks)
print(f"We just computed {len(es_embeddings)} embeddings.")

We just computed 289 embeddings.


### 3. Germany - German

In [19]:
de_embeddings = generate_embeddings(bedrock_client, "cohere.embed-multilingual-v3", de_chunks)
print(f"We just computed {len(de_embeddings)} embeddings.")

We just computed 71 embeddings.


### Store embeddings into FAISS

### 1. USA - English

In [20]:
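# Create a flat (exact, exhaustive) L2 index; the embeddings have 1024 dimensions
en_vectorstore = faiss.IndexFlatL2(1024)

# Add data to the index (FAISS works with float32 arrays)
en_vectorstore.add(np.array(en_embeddings, dtype="float32"))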

### 2. Mexico - Spanish

In [21]:
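# Create a flat (exact, exhaustive) L2 index; the embeddings have 1024 dimensions
es_vectorstore = faiss.IndexFlatL2(1024)

# Add data to the index (FAISS works with float32 arrays)
es_vectorstore.add(np.array(es_embeddings, dtype="float32"))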

### 3. Germany - German

In [22]:
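# Create a flat (exact, exhaustive) L2 index; the embeddings have 1024 dimensions
de_vectorstore = faiss.IndexFlatL2(1024)

# Add data to the index (FAISS works with float32 arrays)
de_vectorstore.add(np.array(de_embeddings, dtype="float32"))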
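
To show what a flat L2 index computes when it is searched, here is a pure-Python sketch of an exhaustive nearest-neighbour search using squared Euclidean distance, which is the distance `IndexFlatL2` reports. The function names and toy vectors are hypothetical. FAISS itself works on batches of queries and returns 2-D arrays, and its handling of ties may differ from this sketch.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Exhaustive nearest-neighbour search over a list of vectors.

    Returns (distances, indices) for the k closest vectors, nearest first.
    """
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_l2_search(toy_vectors, query=[1.0, 1.0], k=2)
print(distances, indices)
```

Every stored vector is compared with the query, so the result is exact; the cost grows linearly with the number of vectors.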

## Step 4: RAG: Retrieve relevant chunks from the vector database

Retrieval-Augmented Generation (RAG) is a powerful approach that combines the strengths of retrieval and generation models for natural language processing tasks. In this approach, we first utilize an embedding model to identify relevant information from FAISS based on the input query. The retrieved information is then incorporated into a large language model, like Cohere's Command R+, which uses this additional context to generate more informed and accurate outputs. By leveraging the vast knowledge contained in multilingual sources, RAG enables us to produce high-quality, factual responses that go beyond the model's initial training data. This approach is particularly valuable for tasks like question answering, where relevant external knowledge can significantly improve the quality and completeness of the generated responses.

### Define utility function for RAG

In [23]:
def rag(vectorstore, question, chunks, k=4):
    # Embed the user question
    query = generate_embeddings(bedrock_client, "cohere.embed-multilingual-v3", [question])
    
    # Retrieve the indices of the top K nearest chunks (the distances are not needed)
    _, top_indices = vectorstore.search(np.array(query, dtype="float32"), k)
    
    # Sort the indices so the chunks are returned in document order rather than by similarity
    top_indices.sort()
    
    # Look up the top K chunks, stripping tabs and newlines
    top_chunks_after_retrieval = [re.sub(r"\t+|\n+", "", chunks[i]) for i in top_indices[0]]
    
    return top_chunks_after_retrieval
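
The `top_indices.sort()` call in `rag` changes the order in which chunks reach the prompt: the search returns indices nearest-first, and sorting them restores the order in which the chunks appear in the article. The sketch below contrasts the two orderings on toy data; `select_chunks` and the sample indices are hypothetical.

```python
def select_chunks(ranked_indices, chunks, keep_document_order=True):
    """Pick chunks by index, optionally restoring their original document order.

    ranked_indices is assumed to be ordered nearest-first, as returned by a
    similarity search.
    """
    indices = sorted(ranked_indices) if keep_document_order else list(ranked_indices)
    return [chunks[i] for i in indices]


toy_chunks = ["intro", "history", "regional dishes", "desserts", "drinks"]
ranked = [2, 0, 3]  # hypothetical search result, most similar first

document_order = select_chunks(ranked, toy_chunks)
relevance_order = select_chunks(ranked, toy_chunks, keep_document_order=False)
print(document_order)
print(relevance_order)
```

Document order keeps neighbouring passages readable in sequence; relevance order puts the best match first. Which works better for a given model and prompt is an empirical question.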

### Define utility function for conversation with Bedrock converse API

In [24]:
def generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history=None,
    temperature=0.5,
    max_tokens=400,
    top_p=0.95
):
    """
    Sends messages to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        system_prompt (str) : The system prompt for the model to use.
        prompt (str) : The message/question to send to the model.
        chat_history (list): The chat history from user and assistant. Defaults to a new, empty history.
        temperature (float): Sampling temperature.
        max_tokens (int): Maximum number of tokens to generate.
        top_p (float): Nucleus sampling threshold.

    Returns:
        response (str): The text generated output from the model.
        chat_history (list): The full conversation between user and assistant, including the new reply.

    """

    # Avoid a mutable default argument, which would be shared across calls
    if chat_history is None:
        chat_history = []

    system_prompts = [
        {
            "text": system_prompt
        }
    ]

    messages = [
        {
            "role": "user",
            "content": [{"text": prompt}]
        }
    ]

    chat_history.extend(messages)

    # Base inference parameters.
    inference_config = {
        "temperature": temperature,
        "maxTokens": max_tokens,
        "topP": top_p,
    }

    # Additional inference parameters to use.
    additional_model_fields = {}

    # Send the whole conversation so far, not only the latest message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=chat_history,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )

    chat_history.append(response["output"]["message"])

    return response["output"]["message"]["content"][0]["text"], chat_history
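
For multi-turn use, the chat history is a list of messages in the same shape as the `messages` list above, with user and assistant turns alternating. The sketch below builds such a history locally, without calling Bedrock; `add_turn` is a hypothetical helper, and the alternation check reflects our understanding of what the Converse API expects rather than its exact validation rules.

```python
def add_turn(history, role, text):
    """Append one message in the Converse message shape and return the history."""
    if role not in ("user", "assistant"):
        raise ValueError("role must be 'user' or 'assistant'")
    if not history and role != "user":
        raise ValueError("a conversation is assumed to start with a user turn")
    if history and history[-1]["role"] == role:
        raise ValueError("roles are assumed to alternate between user and assistant")
    history.append({"role": role, "content": [{"text": text}]})
    return history


conversation = []
add_turn(conversation, "user", "What is gumbo?")
add_turn(conversation, "assistant", "A stew associated with Louisiana.")
add_turn(conversation, "user", "Which region is it from?")

roles = [m["role"] for m in conversation]
print(roles)
```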

### Define the system prompt and guardrails

In [25]:
system_prompt = """You are an AI assistant with expertise in American cuisine. Your knowledge is based solely on the information provided between the <documents> and </documents> tags.

Before answering any questions, first check if the user has provided information between the <documents> and </documents> tags. If no information is provided, respond with the following JSON:

{
    "answer": "I do not have enough information to answer that question."
}

If documents are provided, your task is to answer questions accurately and concisely, using only the details from the given documents. Do not use your own knowledge or any external sources to answer the questions, even if you know the answer.

If a question cannot be fully answered using the provided documents, respond with the following JSON:

{
    "answer": "I do not have enough information to answer that question."
}

All responses must be in valid JSON format, with the 'answer' key containing the actual response text.

To provide transparency, include your reasoning process with the 'thinking' key as the following format:

{
    "answer": "Your response here",
    "thinking": "Your reasoning process here"
}

Be concise and objective in your responses, without any personal opinions or subjective statements.
"""
prompt_template = "<documents>\n{documents}\n</documents>\n\nQuestion: {question}\nThink step-by-step."
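
Note that `str.format` inserts whatever object is passed as `documents`; a Python list ends up in the prompt as its printed form, with brackets and quotes. As an optional alternative, the chunks can be joined into plain text first. The sketch below shows one way to do that; `format_documents` and the sample chunks are hypothetical, and whether this changes answer quality has not been measured here.

```python
def format_documents(chunks):
    """Join chunks into numbered plain-text entries, one per line."""
    return "\n".join(f"[{n}] {chunk}" for n, chunk in enumerate(chunks, start=1))


# Same template string as the one defined in this notebook
template = "<documents>\n{documents}\n</documents>\n\nQuestion: {question}\nThink step-by-step."

example_prompt = template.format(
    documents=format_documents(["Gumbo is a stew.", "Cheesesteak is a sandwich."]),
    question="What is gumbo?",
)
print(example_prompt)
```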

### Define model ID parameter

In [26]:
model_id = "cohere.command-r-plus-v1:0"

### Test guardrails

In [27]:
chat_history = []
question = "What are some popular regional dishes in American cuisine?"

In [28]:
prompt = prompt_template.format(documents="", question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

{
    "answer": "I do not have enough information to answer that question."
}


### Retrieve the top K most relevant chunks from FAISS

In [29]:
# Setting number of nearest neighbours
k = 4

### 1. USA - English

In [30]:
top_chunks_after_retrieval = rag(en_vectorstore, question, en_chunks, k)

print(f"Here are the top {k} chunks after retrieval: ")
for t in top_chunks_after_retrieval:
    print("== " + t)

Here are the top 4 chunks after retrieval: 
== American cuisine consists of the cooking style and traditional dishes prepared in the United States. It has been significantly influenced by Europeans, Indigenous Americans, Africans, Latin Americans, Asians, Pacific Islanders, and many other cultures and traditions. Principal influences on American cuisine are European, Native American, soul food, regional heritages including Cajun, Louisiana Creole, Pennsylvania Dutch, Mormon foodways, Texan, Tex-Mex, New Mexican, and Tlingit, and the cuisines of
== Highlights of American cuisine include milkshakes, barbecue, and a wide range of fried foods. Many quintessential American dishes are unique takes on food originally from other culinary traditions, including pizza, hot dogs, and Tex-Mex. Regional highlights include a range of fish dishes in the coastal states, gumbo, and cheesesteak. American cuisine has specific foods that are eaten on holidays, such as a turkey at Thanksgiving dinner or Chr

In [31]:
chat_history = []
prompt = prompt_template.format(documents=top_chunks_after_retrieval, question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

```json
{
    "answer": "Some popular regional dishes in American cuisine include gumbo, cheesesteak, and various fish dishes from the coastal states. Barbecue and fried foods are also highlights, and each region may have its own unique variations.",
    "thinking": "The text mentions gumbo and cheesesteak as regional highlights, as well as fish dishes from coastal states. It also mentions barbecue and fried foods as popular American cuisine, which likely includes regional variations."
}
```
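
The response above arrives wrapped in a Markdown `json` fence, while other responses in this notebook are bare JSON. If the answer needs to be consumed programmatically, a small parser can tolerate both forms. This is a minimal sketch; `parse_model_json` is a hypothetical helper, and it assumes the response contains a single JSON object and nothing else.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker


def parse_model_json(response_text):
    """Parse a JSON object from model output, tolerating a Markdown code fence."""
    text = response_text.strip()
    pattern = r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$"
    match = re.match(pattern, text, flags=re.DOTALL)
    if match:
        text = match.group(1)  # keep only what is inside the fence
    return json.loads(text)


fenced = FENCE + 'json\n{\n    "answer": "gumbo",\n    "thinking": "stated in the documents"\n}\n' + FENCE
plain = '{"answer": "I do not have enough information to answer that question."}'

fenced_answer = parse_model_json(fenced)["answer"]
plain_answer = parse_model_json(plain)["answer"]
print(fenced_answer)
print(plain_answer)
```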


### 2. Mexico - Spanish

In [32]:
chat_history = []
question = "¿Cuáles son algunos platos regionales populares en la cocina mexicana?"

In [33]:
top_chunks_after_retrieval = rag(es_vectorstore, question, es_chunks, k)

print(f"Here are the top {k} chunks after retrieval: ")
for t in top_chunks_after_retrieval:
    print("== " + t)

Here are the top 4 chunks after retrieval: 
== nacionales y extranjeros: México es famoso por su gastronomía.
== La diversidad es la característica esencial de la cocina mexicana, y es la comida regional uno de sus aspectos fundamentales. Cada estado mexicano y región poseen sus propias recetas y tradiciones culinarias.[4]​ Ejemplos de comidas regionales son platillos como el caldillo duranguense (Durango), cochinita pibil (yucateca), el mole oaxaqueño, el mole poblano y el chile en nogada (Puebla), los múltiples tipos de pozole, el cabrito (coahuilense y neoleonense), el pan de cazón campechano, el churipo y las
== país y otras zonas tropicales, se da una amplia diversidad de sabores con una cantidad hasta ahora desconocida de platillos y recetarios locales.
== == Variantes regionales ==Si bien existen platos nacionales (como el mole, los chiles en nogada o la cochinita pibil), muchos autores han definido la gastronomía mexicana más como un conjunto de cocinas regionales, que no es má

In [34]:
prompt = prompt_template.format(documents=top_chunks_after_retrieval, question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

{
    "answer": "Algunos platos regionales populares en México incluyen el caldillo duranguense, la cochinita pibil, el mole oaxaqueño y poblano, el chile en nogada, el pozole, el cabrito, el pan de cazón y el churipo.",
    "thinking": "La pregunta se refiere específicamente a los platos regionales mexicanos, por lo que me concentro en las secciones del texto que mencionan la diversidad culinaria regional y los ejemplos de platos característicos de cada región."
}


### 3. Germany - German

In [35]:
chat_history = []
question = "Was sind einige beliebte regionale Gerichte der deutschen Küche?"

In [36]:
top_chunks_after_retrieval = rag(de_vectorstore, question, de_chunks, k)

print(f"Here are the top {k} chunks after retrieval: ")
for t in top_chunks_after_retrieval:
    print("== " + t)

Here are the top 4 chunks after retrieval: 
== Als deutsche Küche bezeichnet man allgemein die Küche Deutschlands. Sie ist gekennzeichnet durch starke regionale Unterschiede und an den Außengrenzen historisch bedingt geprägt durch die Einflüsse der Küche der Nachbarländer (z. B. Frankreich, Österreich, Polen). Starke Regionalküchen gibt es auf den Gebieten ehemals eigenständiger deutscher Staaten (Bayern, Baden) und Großregionen (Franken, Schwaben, Rheinland).
== Basilikum und Oregano sowie typische mediterrane Gerichte übernommen (siehe auch französische Küche, italienische Küche, griechische Küche, türkische Küche und spanische Küche).
== mit Zwiebeln). Genauso geschätzt werden Miesmuschelgerichte, die berühmte Bergische Kaffeetafel des Bergischen Landes, Schnippelbohnensuppe, Pfannkuchen, Reibekuchen und Himmel un Ääd.
== Bürgerliche KücheHausmannskostKüche der Deutschen Demokratischen RepublikDeutsche Küche in den Vereinigten StaatenPeter Peter: Kulturgeschichte der deutschen Küche

In [37]:
prompt = prompt_template.format(documents=top_chunks_after_retrieval, question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

{
    "answer": "Zu den beliebten regionalen Gerichten der deutschen Küche gehören unter anderem: Bergische Kaffeetafel, Schnippelbohnensuppe, Pfannkuchen, Reibekuchen und Himmel un Ääd (ein Kartoffelgericht mit Zwiebeln). Auch Miesmuschelgerichte sind beliebt.",
    "thinking": "Die Frage kann direkt mit Informationen aus den bereitgestellten Dokumenten beantwortet werden."
}


## Step 5 - RAG: Rerank the chunks retrieved from the vector database

Building upon the power of Retrieval-Augmented Generation (RAG), we can further enhance the relevance and accuracy of our generated outputs by incorporating a reranker model. In this approach, we first retrieve a set of potentially relevant information from our knowledge base using a retrieval model like Cohere's Embed Multilingual V3. These retrieved pieces of information are then reranked using a specialized reranker model, such as Cohere's Rerank Multilingual V3, which scores each retrieved item based on its relevance to the input query or context. The top-ranked items are then passed to the generative language model, Cohere's Command R+, which generates the final output while considering this highly relevant contextual information. By incorporating a reranker, we can effectively filter out irrelevant or tangential information, ensuring that the generated responses are focused, coherent, and directly address the specific query or context at hand.

In [38]:
# Create cohere client
co = Client(region_name="us-east-1")

In [39]:
endpoint_name = "cohere-rerank-multilingual-v3-0"
is_endpoint_created = True

In [40]:
# Create endpoint, SKIP THIS STEP IF THE ENDPOINT WAS CREATED ON THE AWS CONSOLE
if not is_endpoint_created:
    model_package_arn = "arn:aws:sagemaker:us-east-1:865070037744:model-package/cohere-rerank-multilingual-v3--13dba038aab73b11b3f0b17fbdb48ea0"
    co.create_endpoint(arn=model_package_arn, endpoint_name=endpoint_name, instance_type="ml.g5.xlarge", n_instances=1)

In [41]:
# If the endpoint is already created, you just need to connect to it
co.connect_to_endpoint(endpoint_name=endpoint_name)

### Define utility function for RAG with Reranker

In [42]:
def rag_with_reranker(vectorstore, question, chunks, k=4, top_n=3):
    # Embed the user question
    query = generate_embeddings(bedrock_client, "cohere.embed-multilingual-v3", [question])
    
    # Retrieve the indices of the top K nearest chunks (the distances are not needed)
    _, top_indices = vectorstore.search(np.array(query, dtype="float32"), k)
    
    # Sort the indices so the candidates are in document order; the reranker decides the final order
    top_indices.sort()
    
    # Look up the top K chunks, stripping tabs and newlines
    top_chunks_after_retrieval = [re.sub(r"\t+|\n+", "", chunks[i]) for i in top_indices[0]]
    
    # Rerank the K candidates against the question and keep the top_n results
    response = co.rerank(
        query=question,
        documents=top_chunks_after_retrieval,
        top_n=top_n
    )

    # Extract the text of each reranked document
    top_chunks_after_rerank = [result.document['text'] for result in response]
    return top_chunks_after_rerank
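
The retrieve-then-rerank pattern can be illustrated without an endpoint: take a larger candidate set, score each candidate against the query, and keep only the best `top_n`. In the sketch below a simple word-overlap score stands in for the reranker model; it is a deliberately crude, hypothetical substitute and says nothing about how Rerank Multilingual V3 scores documents.

```python
def overlap_score(query, document):
    """Toy relevance score: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def toy_rerank(query, documents, top_n=3, score_fn=overlap_score):
    """Score every candidate against the query and keep the top_n, best first."""
    ranked = sorted(documents, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


candidates = [
    "history of immigration",
    "regional dishes include gumbo",
    "popular regional dishes such as cheesesteak",
    "see also",
]
top_candidates = toy_rerank("popular regional dishes", candidates, top_n=2)
print(top_candidates)
```

Retrieving more candidates than are finally needed (for example `k=10` with `top_n=3`) gives the reranker room to discard passages that are close in embedding space but do not address the question.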

### Test reranker

In [43]:
chat_history = []
question = "What are some popular regional dishes in American cuisine?"

In [44]:
top_chunks_after_rerank = rag_with_reranker(en_vectorstore, question, en_chunks, k=10, top_n=3)

print("Here are the top 3 chunks after rerank: ")
for t in top_chunks_after_rerank:
    print("== " + t)

Here are the top 3 chunks after rerank: 
== Highlights of American cuisine include milkshakes, barbecue, and a wide range of fried foods. Many quintessential American dishes are unique takes on food originally from other culinary traditions, including pizza, hot dogs, and Tex-Mex. Regional highlights include a range of fish dishes in the coastal states, gumbo, and cheesesteak. American cuisine has specific foods that are eaten on holidays, such as a turkey at Thanksgiving dinner or Christmas dinner. Modern American cuisine includes a focus on fast
== American cuisine consists of the cooking style and traditional dishes prepared in the United States. It has been significantly influenced by Europeans, Indigenous Americans, Africans, Latin Americans, Asians, Pacific Islanders, and many other cultures and traditions. Principal influences on American cuisine are European, Native American, soul food, regional heritages including Cajun, Louisiana Creole, Pennsylvania Dutch, Mormon foodways, T

In [45]:
prompt = prompt_template.format(documents=top_chunks_after_rerank, question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

```json
{
    "answer": "Popular regional dishes in American cuisine include fish specialties from coastal states, gumbo, cheesesteak, Tex-Mex, and Cajun and Creole dishes from Louisiana.",
    "thinking": "The documents mention several regional specialties, including dishes influenced by specific cultural and regional heritages."
}
```


# Conclusion

In this notebook, we explored three of Cohere's multilingual models on AWS: Command R+ for generation, Embed Multilingual V3 for retrieval, and Rerank Multilingual V3 for reranking retrieved information. Combined in a RAG pipeline, they ground the generated answers in passages retrieved from English, Spanish, and German sources.

The reranking step helps by passing only the most relevant retrieved passages from the knowledge base to the generative model. This keeps responses focused on the user's query and tied to the source documents, with support for multiple languages.