# Cohere's Multilingual Models on AWS

In today's globalized world, the ability to understand and process multiple languages is becoming increasingly important. [Cohere](https://cohere.com/) has developed a suite of multilingual models designed to tackle this challenge. In this notebook, we'll explore Cohere's multilingual models and their potential applications.

## Why Multilingual Models Matter

Language is a fundamental aspect of human communication and cultural expression. As businesses and organizations expand their reach across borders, the need for effective multilingual solutions becomes paramount. Multilingual models enable seamless communication, bridging language barriers and facilitating cross-cultural understanding.

[Cohere's multilingual models](https://aws.amazon.com/marketplace/seller-profile?id=87af0c85-6cf9-4ed8-bee0-b40ce65167e0) offer several benefits:

1. **Broad Language Coverage**: Cohere's models support a wide range of languages, allowing you to process and generate content in multiple languages simultaneously.

2. **Improved Accuracy**: By training on diverse language data, these models can better capture nuances, idioms, and cultural contexts, resulting in more accurate translations and language processing.

3. **Scalability**: With a single multilingual model, you can address language needs across multiple regions, reducing the need for maintaining separate models for each language.

4. **Cost-Efficiency**: Deploying a single multilingual model can be more cost-effective than maintaining multiple monolingual models, especially for organizations with global operations.

## The Models

In this notebook, we'll work with the following Cohere multilingual models:

1. **Cohere Command R+**: A powerful Large Language Model (LLM) capable of understanding and generating text in multiple languages.

2. **Cohere Embed Multilingual V3**: An embedding model designed to encode text from various languages into dense vector representations, enabling efficient similarity comparisons and semantic search.

3. **Cohere Rerank V3.5**: A reranking model that reorders a list of candidate documents by their relevance to a query, refining the results of an initial retrieval step.

Throughout this notebook, we'll explore practical examples and use cases for these models, showcasing their capabilities in areas such as multilingual content generation and information retrieval.


## Architecture

![MultilingualRAGArchitecture](images/MultilingualRAGArchitectureRerank.png)

1. Create embeddings for the dataset and store them in a vector database.
2. Create an embedding for the query.
3. Retrieve search results based on embedding similarity.
4. Refine the search results with Rerank, then send the query and the reranked results to the LLM to generate the final output.

## Prerequisites

Use one of the following kernels: `conda_python3`, `conda_pytorch_p310`, or `conda_tensorflow2_p310`.

## Step 1: Install Dependencies

Here, we will install all the required dependencies to run this notebook.

In [1]:
!pip install wikipedia==1.4.0 --quiet
!pip install --upgrade setuptools==69.5.1 wheel --quiet
!pip install cohere-aws==0.8.16 
!pip install faiss-cpu==1.8.0 --quiet
!pip install langchain-text-splitters==0.2.2 --quiet
!pip install numpy==1.26.4 --quiet
# Upgrade the remaining packages; cohere-aws and numpy are left out so the pinned versions above stay in place
!pip install --upgrade cohere hnswlib pandas markdown boto3 -q

Collecting cohere-aws==0.8.16
  Using cached cohere_aws-0.8.16-cp310-cp310-linux_x86_64.whl
Installing collected packages: cohere-aws
  Attempting uninstall: cohere-aws
    Found existing installation: cohere-aws 0.8.18
    Uninstalling cohere-aws-0.8.18:
      Successfully uninstalled cohere-aws-0.8.18
Successfully installed cohere-aws-0.8.16
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.36.4 requires botocore==1.35.63, but you have botocore 1.36.26 which is incompatible.
awscli 1.36.4 requires s3transfer<0.11.0,>=0.10.0, but you have s3transfer 0.11.2 which is incompatible.

In [2]:
import boto3
import cohere_aws
#from cohere_aws import Client
import faiss
import json
from langchain_text_splitters import RecursiveCharacterTextSplitter
import numpy as np
import re
import wikipedia

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


## Step 2: Downloading dataset

To get started, we'll use data from Wikipedia to answer questions about diverse culinary traditions. In this notebook, we focus on the English language.

### USA - English

In [3]:
# English
wikipedia.set_lang('en')

In [4]:
# we'll get some wikipedia data
article = wikipedia.page('American Cuisine')
en_text = article.content
print(f"The text has roughly {len(en_text.split())} words.")

The text has roughly 19081 words.


In [5]:
print(en_text[0:1000])

American cuisine consists of the cooking style and traditional dishes prepared in the United States. It has been significantly influenced by Europeans, Indigenous Americans, Africans, Latin Americans, Asians, Pacific Islanders, and many other cultures and traditions. Principal influences on American cuisine are European, Native American, soul food, regional heritages including Cajun, Louisiana Creole, Pennsylvania Dutch, Mormon foodways, Texan, Tex-Mex, New Mexican, and Tlingit, and the cuisines of immigrant groups such as Chinese American, German American, Italian American, Greek American, Jewish American, and Mexican American. The large size of America and its long history of immigration have created an especially diverse cuisine that varies by region. 
American cooking dates back to the traditions of the Native Americans, whose diet included a mix of farmed and hunted food, and varied widely across the continent. The Colonial period created a mix of new world and Old World cookery, 

## Step 3: Vector Indexing

We index the document in [FAISS](https://ai.meta.com/tools/faiss/) (Facebook AI Similarity Search), an open-source library for efficient similarity search and clustering of dense vectors. FAISS lets us index and search large collections of embeddings, so we can quickly retrieve the documents or passages most relevant to a query based on their vector representations.

Indexing involves three steps: chunking the document into smaller, more manageable segments, creating a dense vector embedding for each chunk with Cohere's Embed Multilingual V3, and adding these embeddings to a FAISS index.

At query time, FAISS's similarity search identifies the chunks whose embeddings are closest to the embedding of the query. This supports applications such as semantic search, question answering, and knowledge base construction from multilingual sources, and it scales to large volumes of embeddings.
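
To make the similarity search concrete, here is a minimal NumPy sketch of the brute-force search that a flat L2 index such as `faiss.IndexFlatL2` (used later in this notebook) performs. The function name `l2_search` and the toy 2-dimensional vectors are assumptions for illustration only; they are not part of FAISS or of the notebook's data.

```python
import numpy as np

def l2_search(index_vectors, query_vectors, k):
    """Brute-force nearest-neighbor search by squared L2 distance (illustrative only)."""
    index_vectors = np.asarray(index_vectors, dtype="float32")
    query_vectors = np.asarray(query_vectors, dtype="float32")
    # Pairwise squared distances, shape (n_queries, n_index)
    diffs = query_vectors[:, None, :] - index_vectors[None, :, :]
    distances = (diffs ** 2).sum(axis=-1)
    # Indices of the k smallest distances per query, closest first
    top_indices = np.argsort(distances, axis=1)[:, :k]
    top_distances = np.take_along_axis(distances, top_indices, axis=1)
    return top_distances, top_indices

# Toy 2-dimensional "embeddings"; the real vectors in this notebook have 1024 dimensions
toy_index = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_query = [[0.9, 0.1]]
toy_distances, toy_neighbors = l2_search(toy_index, toy_query, k=2)
print(toy_neighbors)  # [[1 0]]
```

Like the FAISS `search` call used later, the sketch returns a pair of arrays (distances, then indices) with one row per query.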

In [6]:
# For chunking let's use langchain to help us split the text
def get_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=512, # Limiting chunk size for embedding model
        chunk_overlap=50,
        length_function=len,
        is_separator_regex=False,
    )

    # Split the text into chunks with some overlap
    chunks_ = text_splitter.create_documents([text])
    chunks = [c.page_content for c in chunks_]
    
    return chunks

In [7]:
en_chunks = get_chunks(en_text)
print(f"The text has been broken down in {len(en_chunks)} chunks.")

The text has been broken down in 362 chunks.
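
The role of `chunk_size` and `chunk_overlap` can be illustrated with a simplified sliding-window splitter. This is only a sketch of the overlap idea, under the assumption of fixed character windows: `RecursiveCharacterTextSplitter` prefers to split at separators such as paragraph and line breaks, so its actual chunk boundaries differ. The helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    # Each window starts (chunk_size - chunk_overlap) characters after the previous one
    step = chunk_size - chunk_overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

sample_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(sample_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a sentence that straddles a boundary is still partly visible in both chunks.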


### Create embeddings for every text chunk

In [8]:
def generate_embeddings(
    bedrock_client,
    model_id,
    texts,
    batch_size=50,
    input_type="search_document"
):
    """
    Convert text into embeddings.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        texts (list) : The texts to send to the embedding model.
        batch_size (int): Batch size to limit the number of chunks sent to the embedding model.
        input_type (str): The Cohere input type, e.g. "search_document" for text that will be indexed.

    Returns:
        embeddings (list): One embedding (a list of floats) per input text.
    """
    
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]
        body = {
            "texts": batch_texts,
            "input_type": input_type
        }
        # Send the batch to the embedding model.
        response = bedrock_client.invoke_model(modelId=model_id, body=json.dumps(body))

        response_body = json.loads(response.get('body').read())
        
        embeddings.extend(response_body["embeddings"])

    return embeddings

In [9]:
# Define bedrock client
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name="us-west-2")

In [10]:
en_embeddings = generate_embeddings(bedrock_client, "cohere.embed-multilingual-v3", en_chunks)
print(f"We just computed {len(en_embeddings)} embeddings.")

We just computed 362 embeddings.


### Store embeddings into FAISS

In [11]:
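# Create a flat (exact) L2 index; the embeddings have 1024 dimensions
en_vectorstore = faiss.IndexFlatL2(1024)

# Add data to the index (FAISS works with float32 arrays)
en_vectorstore.add(np.array(en_embeddings, dtype="float32"))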

## Step 4: RAG: Retrieve relevant chunks from the vector database

Retrieval-Augmented Generation (RAG) is a powerful approach that combines the strengths of retrieval and generation models for natural language processing tasks. In this approach, we first utilize an embedding model to identify relevant information from FAISS based on the input query. The retrieved information is then incorporated into a large language model, like Cohere's Command R+, which uses this additional context to generate more informed and accurate outputs. By leveraging the vast knowledge contained in multilingual sources, RAG enables us to produce high-quality, factual responses that go beyond the model's initial training data. This approach is particularly valuable for tasks like question answering, where relevant external knowledge can significantly improve the quality and completeness of the generated responses.

### Define utility function for RAG

In [12]:
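def rag(vectorstore, question, chunks, k=4):
    # Embed the user question. Note: generate_embeddings uses input_type="search_document";
    # Cohere's Embed v3 also defines "search_query", which is intended for query text.
    query = generate_embeddings(bedrock_client, "cohere.embed-multilingual-v3", [question])
    
    # Retrieve the indices of the top K nearest chunks from the vector database
    D, top_indices = vectorstore.search(np.array(query, dtype="float32"), k)
    
    # Sort the indices in place so the chunks come back in document order
    # rather than in order of similarity
    top_indices.sort()
    
    # Look up the retrieved chunks and strip tabs and newlines
    top_chunks_after_retrieval = [re.sub(r"\t+|\n+", "", chunks[i]) for i in top_indices[0]]
    
    return top_chunks_after_retrieval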
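
The effect of `top_indices.sort()` in the function above is easy to miss, so here is a small standalone sketch. The index values are made up for illustration; they are not actual search output.

```python
import numpy as np

# Hypothetical search output for one query: chunk indices, most similar first
similarity_order = np.array([[7, 2, 5, 0]])

# ndarray.sort() sorts in place along the last axis, which yields document order
document_order = similarity_order.copy()
document_order.sort()

print(similarity_order[0].tolist())  # [7, 2, 5, 0]
print(document_order[0].tolist())    # [0, 2, 5, 7]
```

Returning chunks in document order keeps neighboring passages together for the LLM, at the cost of discarding the similarity ranking. The printed list therefore should not be read as ordered by relevance.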

### Define utility function for conversation with the Bedrock Converse API

In [13]:
def generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history=None,
    temperature=0.3,
    max_tokens=400,
    top_p=0.95
):
    """
    Sends messages to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        system_prompt (str) : The system prompt for the model to use.
        prompt (str) : The message/question to send to the model.
        chat_history (list): Prior user and assistant messages. Defaults to a new empty list.
        temperature (float): Sampling temperature.
        max_tokens (int): Maximum number of tokens to generate.
        top_p (float): Nucleus sampling probability mass.

    Returns:
        response (str): The text output generated by the model.
        chat_history (list): The full conversation, including the new user message and the model's reply.

    """

    # Avoid a mutable default argument, which would be shared across calls
    if chat_history is None:
        chat_history = []

    system_prompts = [
        {
            "text": system_prompt
        }
    ]

    messages = [
        {
            "role": "user",
            "content": [{"text": prompt}]
        }
    ]

    chat_history.extend(messages)

    # Base inference parameters.
    inference_config = {
        "temperature": temperature,
        "maxTokens": max_tokens,
        "topP": top_p,
    }

    # Additional inference parameters to use.
    additional_model_fields = {}

    # Send the whole conversation so the model sees earlier turns.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=chat_history,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )

    chat_history.append(response["output"]["message"])

    return response["output"]["message"]["content"][0]["text"], chat_history

### Define the system prompt and guardrails

In [14]:
system_prompt = """You are an AI assistant with expertise in American cuisine. Your knowledge is based solely on the information provided between the <documents> and </documents> tags.

Before answering any questions, first check if the user has provided information between the <documents> and </documents> tags. If no information is provided, respond with the following JSON:

{
    "answer": "I do not have enough information to answer that question."
}

If documents are provided, your task is to answer questions accurately and concisely, using only the details from the given documents. Do not use your own knowledge or any external sources to answer the questions, even if you know the answer.

If a question cannot be fully answered using the provided documents, respond with the following JSON:

{
    "answer": "I do not have enough information to answer that question."
}

All responses must be in valid JSON format, with the 'answer' key containing the actual response text.

To provide transparency, include your reasoning process with the 'thinking' key as the following format:

{
    "answer": "Your response here",
    "thinking": "Your reasoning process here"
}

Be concise and objective in your responses, without any personal opinions or subjective statements.
"""
prompt_template = "<documents>\n{documents}\n</documents>\n\nQuestion: {question}\nThink step-by-step."
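
When this template is filled with a Python list of chunks, `str.format` inserts the list's printed representation, brackets and quotes included. One alternative is to join the chunks into plain text first, as in the sketch below. The helper `build_prompt`, the numbering scheme, and the sample chunks are assumptions for illustration; the template string is copied here only so the example runs on its own.

```python
# Copy of the notebook's template so this sketch is self-contained
PROMPT_TEMPLATE = "<documents>\n{documents}\n</documents>\n\nQuestion: {question}\nThink step-by-step."

def build_prompt(chunks, question):
    """Hypothetical helper: number each chunk and join them into one documents block."""
    documents = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(documents=documents, question=question)

example_prompt = build_prompt(
    ["Gumbo is a stew from Louisiana.", "The cheesesteak comes from Philadelphia."],
    "Name two regional dishes.",
)
print(example_prompt)
```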

### Define model ID parameter

In [15]:
model_id = "cohere.command-r-plus-v1:0"

### Test guardrails

In [16]:
chat_history = []
question = "What are some popular regional dishes in American cuisine?"

In [17]:
prompt = prompt_template.format(documents="", question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

{
    "answer": "I do not have enough information to answer that question."
}


### Retrieve the top K most relevant chunks from FAISS

In [18]:
# Setting number of nearest neighbours
k = 4

In [19]:
top_chunks_after_retrieval = rag(en_vectorstore, question, en_chunks, k)

print(f"Here are the top {k} chunks after retrieval: ")
for t in top_chunks_after_retrieval:
    print("== " + t)

Here are the top 4 chunks after retrieval: 
== American cuisine consists of the cooking style and traditional dishes prepared in the United States. It has been significantly influenced by Europeans, Indigenous Americans, Africans, Latin Americans, Asians, Pacific Islanders, and many other cultures and traditions. Principal influences on American cuisine are European, Native American, soul food, regional heritages including Cajun, Louisiana Creole, Pennsylvania Dutch, Mormon foodways, Texan, Tex-Mex, New Mexican, and Tlingit, and the cuisines of
== Highlights of American cuisine include milkshakes, barbecue, and a wide range of fried foods. Many quintessential American dishes are unique takes on food originally from other culinary traditions, including pizza, hot dogs, and Tex-Mex. Regional highlights include a range of fish dishes in the coastal states, gumbo, and cheesesteak. American cuisine has specific foods that are eaten on holidays, such as a turkey at Thanksgiving dinner or Chr

In [20]:
chat_history = []
prompt = prompt_template.format(documents=top_chunks_after_retrieval, question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

{
    "answer": "Some popular regional dishes in American cuisine include:
- Various fish dishes in coastal states
- Gumbo
- Cheesesteak
- New England specialties
- Cajun and Louisiana Creole cuisine
- Tex-Mex
- Barbecue",
    "thinking": "The documents mention several regional dishes and cuisines within the United States. These include fish dishes from coastal states, gumbo, cheesesteak, New England specialties, and influences from Cajun, Louisiana Creole, and Tex-Mex cultures."
}


### Add Rerank to the RAG pipeline

In [21]:
# Setting number of nearest neighbours
k = 15

<b>Top 15 search results using embeddings only</b>

In [22]:
top_chunks_after_retrieval = rag(en_vectorstore, question, en_chunks, k)

print(f"Here are the top {k} chunks after retrieval: ")
for t in top_chunks_after_retrieval:
    print("## \n " + t)

Here are the top 15 chunks after retrieval: 
## 
 American cuisine consists of the cooking style and traditional dishes prepared in the United States. It has been significantly influenced by Europeans, Indigenous Americans, Africans, Latin Americans, Asians, Pacific Islanders, and many other cultures and traditions. Principal influences on American cuisine are European, Native American, soul food, regional heritages including Cajun, Louisiana Creole, Pennsylvania Dutch, Mormon foodways, Texan, Tex-Mex, New Mexican, and Tlingit, and the cuisines of
## 
 New Mexican, and Tlingit, and the cuisines of immigrant groups such as Chinese American, German American, Italian American, Greek American, Jewish American, and Mexican American. The large size of America and its long history of immigration have created an especially diverse cuisine that varies by region.
## 
 Highlights of American cuisine include milkshakes, barbecue, and a wide range of fried foods. Many quintessential American dishes

<b>Define the Bedrock agent runtime client for Cohere Rerank</b>

In [23]:
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime',region_name='us-west-2')

rerank_modelId = "cohere.rerank-v3-5:0"
rerank_package_arn = f"arn:aws:bedrock:us-west-2::foundation-model/{rerank_modelId}"

In [24]:
def rerank_text(text_query, text_sources, num_results, model_package_arn):
    response = bedrock_agent_runtime.rerank(
        queries=[
            {
                "type": "TEXT",
                "textQuery": {
                    "text": text_query
                }
            }
        ],
        sources=text_sources,
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": num_results,
                "modelConfiguration": {
                    "modelArn": model_package_arn,
                }
            }
        }
    )
    return response['results']

In [25]:
text_sources = []
for text in top_chunks_after_retrieval:
    text_sources.append({
        "type": "INLINE",
        "inlineDocumentSource": {
            "type": "TEXT",
            "textDocument": {
                "text": text,
            }
        }
    })

<b>Top 5 results after applying Rerank as a second retrieval step to the 15 candidates</b>

In [26]:
response_rerank = rerank_text(question, text_sources, 5, rerank_package_arn)
top_chunks_after_rerank = []
for i in response_rerank:
    top_chunks_after_rerank.append(top_chunks_after_retrieval[i['index']])
    print("\tIndex:{} Relevance:{}\t{}".format(i['index'],i['relevanceScore'],top_chunks_after_retrieval[i['index']]).replace("\n", " "))
    print ("\n")

	Index:2 Relevance:0.8364614248275757	Highlights of American cuisine include milkshakes, barbecue, and a wide range of fried foods. Many quintessential American dishes are unique takes on food originally from other culinary traditions, including pizza, hot dogs, and Tex-Mex. Regional highlights include a range of fish dishes in the coastal states, gumbo, and cheesesteak. American cuisine has specific foods that are eaten on holidays, such as a turkey at Thanksgiving dinner or Christmas dinner. Modern American cuisine includes a focus on fast


	Index:0 Relevance:0.7409440875053406	American cuisine consists of the cooking style and traditional dishes prepared in the United States. It has been significantly influenced by Europeans, Indigenous Americans, Africans, Latin Americans, Asians, Pacific Islanders, and many other cultures and traditions. Principal influences on American cuisine are European, Native American, soul food, regional heritages including Cajun, Louisiana Creole, Pennsyl
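
Because each rerank result carries a `relevanceScore`, the chunks can also be filtered by score instead of keeping a fixed number. The following is a minimal sketch of that idea. The helper `select_reranked_chunks`, the 0.5 threshold, and the toy scores are assumptions for illustration; only the `index` and `relevanceScore` keys come from the results shown above.

```python
def select_reranked_chunks(rerank_results, chunks, min_score=0.5):
    """Hypothetical helper: keep chunks scoring at least min_score, best first."""
    ordered = sorted(rerank_results, key=lambda r: r["relevanceScore"], reverse=True)
    return [chunks[r["index"]] for r in ordered if r["relevanceScore"] >= min_score]

# Toy data standing in for the retrieved chunks and the rerank results
toy_chunks = ["chunk A", "chunk B", "chunk C"]
toy_results = [
    {"index": 2, "relevanceScore": 0.84},
    {"index": 0, "relevanceScore": 0.74},
    {"index": 1, "relevanceScore": 0.12},
]
kept_chunks = select_reranked_chunks(toy_results, toy_chunks)
print(kept_chunks)  # ['chunk C', 'chunk A']
```

A suitable threshold depends on the data and the model, so it is worth tuning on a few representative queries.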

In [27]:
chat_history = []
prompt = prompt_template.format(documents=top_chunks_after_rerank, question=question)

response, chat_history = generate_conversation(
    bedrock_client,
    model_id,
    system_prompt,
    prompt,
    chat_history
)
print(response)

```json
{
    "answer": "Some popular regional dishes in American cuisine include gumbo, cheesesteak, and various fish dishes from the coastal states. Each region of the United States has its own distinct culinary offerings, with celebrity chefs like Peter Merriman showcasing Hawaii's regional cuisine and Roy Choi known for Korean American dishes.",
    "thinking": "The documents mention regional dishes and highlight gumbo and cheesesteak as examples. It also mentions fish dishes in coastal states, and the regional nature of modern American cuisine, with celebrity chefs known for specific types of regional cuisine."
}
```
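
The outputs above show two things that can trip up a JSON parser: the model may wrap its answer in a Markdown code fence, and string values may contain raw line breaks. The sketch below handles both cases using only the standard library. The helper name `parse_model_json` and the sample text are assumptions for illustration.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def parse_model_json(raw_text):
    """Hypothetical helper: parse a model reply as JSON, or return None on failure."""
    # Drop an optional Markdown fence around the JSON payload
    pattern = r"^" + FENCE + r"(?:json)?\s*|\s*" + FENCE + r"$"
    cleaned = re.sub(pattern, "", raw_text.strip())
    try:
        # strict=False accepts control characters such as raw newlines inside strings
        return json.loads(cleaned, strict=False)
    except json.JSONDecodeError:
        return None

sample_output = FENCE + 'json\n{"answer": "Gumbo\nCheesesteak"}\n' + FENCE
parsed = parse_model_json(sample_output)
print(parsed["answer"])
```

Returning `None` on failure lets the caller decide whether to retry the request or fall back to showing the raw text.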


## Conclusion
In this notebook, we walked through how to use Cohere's Embed Multilingual V3 model and Rerank in a RAG workflow to achieve more precise search results. Passing the reranked search results to the LLM helps it produce more accurate answers with fewer hallucinations.