# Wikipedia Semantic Search with Cohere Embedding Archives

---
## Introduction
In this notebook, we demonstrate how to use the [Amazon Bedrock InvokeModel API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) to run a simple [semantic search](https://txt.cohere.ai/what-is-semantic-search/) over the [Wikipedia embeddings archives](https://cohere.com/blog/embedding-archives-wikipedia) published by Cohere. These archives contain precomputed embeddings of Wikipedia in multiple languages. In this example, we use the November 2023 version of [Wikipedia Simple English](https://huggingface.co/datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3-int8-binary) and its binary embeddings.

### Semantic Search and Text Embeddings
Semantic search uses text embeddings and a similarity measure to find results based on meaning, not just keywords. Text embeddings represent pieces of text as numeric vectors that encode semantic meaning, so the meanings of words and sentences can be compared mathematically, for example with a dot product or cosine similarity. Multilingual embeddings map text in different languages into the same vector space, enabling semantic search across languages.
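As a minimal illustration of comparing meaning numerically, the sketch below ranks a few documents against a query by cosine similarity. The titles and three-dimensional vectors are made up for the example (real embeddings from models like Cohere Embed have far more dimensions), and cosine similarity is just one common choice; the search later in this notebook uses a dot product.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; a real model would produce them from the text.
toy_docs = {
    "Alan Turing biography": [0.9, 0.1, 0.0],
    "Enigma codebreaking": [0.7, 0.3, 0.1],
    "Pea plant genetics": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.0]  # hypothetical embedding of "Who was Alan Turing?"

# Sort document titles from most to least similar to the query.
ranked = sorted(
    toy_docs,
    key=lambda title: cosine_similarity(toy_query, toy_docs[title]),
    reverse=True,
)
print(ranked)
```

The two documents whose vectors point in nearly the same direction as the query rank first, even though no keywords are compared.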

### Int8/byte and binary encoded embeddings
Semantic search over large datasets can require a lot of memory, because most vector databases keep embeddings and vector indices in memory. Reducing the number of dimensions to save memory and cost tends to hurt search quality ([Cohere research](https://arxiv.org/abs/2205.11498?ref=cohere-ai.ghost.io)).

A better approach is to keep every dimension but store each one with fewer bits. Cohere Embed is a text embedding model that translates text into vectors encoding semantic meaning; Cohere reports leading performance in 100+ languages and describes it as the first embedding model to natively support int8/byte and binary embeddings.
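To make "fewer bits per dimension" concrete, here is a rough sketch of two generic quantization schemes applied to a toy float vector: symmetric scalar quantization to int8, and sign-based binarization that keeps 1 bit per dimension. The vector and the `max_abs` calibration value are hypothetical, and this is not necessarily how Cohere Embed produces its int8 and binary outputs.

```python
def quantize_int8(vec: list[float], max_abs: float) -> list[int]:
    """Symmetric scalar quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    scale = 127 / max_abs
    # Round to the nearest integer and clamp values outside the calibrated range.
    return [max(-127, min(127, round(x * scale))) for x in vec]


def quantize_binary(vec: list[float]) -> list[int]:
    """Keep only the sign of each dimension: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


toy_vec = [0.42, -0.13, 0.0, -0.9, 0.05]
print(quantize_int8(toy_vec, max_abs=1.0))  # 8 bits per dimension
print(quantize_binary(toy_vec))             # 1 bit per dimension
```

The int8 version keeps a coarse magnitude for every dimension, while the binary version keeps only the direction of each coordinate, which is why it compresses further.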

Binary embeddings cut memory by 32x, and Cohere reports that they can be searched about 40x faster. Embeddings are typically stored as float32, so an embedding with 1024 dimensions requires 1024 x 4 bytes = 4096 bytes. Using 1 bit per dimension requires 1024 bits = 128 bytes, a 32x reduction (4096 / 128 = 32). See [Cohere int8 & binary embeddings](https://cohere.com/blog/int8-binary-embeddings).
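The arithmetic above can be checked with a short standard-library sketch. It stores a hypothetical random 1024-dimensional embedding as float32, then keeps only the sign of each dimension and packs 8 bits per byte, most significant bit first (the same default order as `numpy.packbits`).

```python
import random
import struct

DIMS = 1024

# Hypothetical embedding with random values in [-1, 1].
random.seed(0)
float_emb = [random.uniform(-1, 1) for _ in range(DIMS)]

# float32 storage: 4 bytes per dimension ("<" = standard sizes, little-endian).
float_bytes = struct.pack(f"<{DIMS}f", *float_emb)

# Binary storage: 1 bit per dimension, packed 8 bits per byte (MSB first).
bits = [1 if x > 0 else 0 for x in float_emb]
packed = bytes(
    int("".join(str(b) for b in bits[i:i + 8]), 2)
    for i in range(0, DIMS, 8)
)

print(len(float_bytes), len(packed), len(float_bytes) // len(packed))
```

This prints `4096 128 32`: 4096 bytes for float32, 128 bytes for the packed bits, and a 32x ratio.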

---

## Getting Started

### Step 0: Install dependencies

In [51]:
# Install Hugging Face datasets, PyTorch, and boto3 (the AWS SDK for Python)
%pip install datasets --quiet
%pip install torch --quiet
%pip install boto3==1.34.120 --quiet

Note: you may need to restart the kernel to use updated packages.


### Step 1: Load the Wikipedia embeddings archive published by Cohere

Let's now stream the Simple English Wikipedia embeddings archive so we can search its first 1,000 records afterwards.

In [55]:
from datasets import load_dataset
# Import torch, the open-source machine learning library
import torch

# Keep at most 1,000 documents and embeddings (applied in Step 2)
max_docs = 1000
# Use the Simple English Wikipedia subset
lang = "simple"
# streaming=True reads records lazily instead of downloading the whole archive first
docs_stream = load_dataset("Cohere/wikipedia-2023-11-embed-multilingual-v3-int8-binary", lang, split="train", streaming=True)

# To verify we have loaded the data, print docs_stream
print(docs_stream)

IterableDataset({
    features: ['_id', 'url', 'title', 'text', 'emb_int8', 'emb_ubinary'],
    n_shards: 7
})


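`docs_stream` is a streaming `IterableDataset`: records are fetched lazily as you iterate over it rather than downloaded up front, and the archive is split into 7 shards. `features` lists the columns available for each record. We take the first 1,000 records in the next step.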

For each passage (a chunk of a Wikipedia article), `emb_int8` is the int8-encoded embedding, while `emb_ubinary` is the binary embedding packed into unsigned 8-bit integers, so each value holds 8 dimensions.
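To see what the packed format looks like, the sketch below expands a ubinary vector back into individual bits. It assumes most-significant-bit-first packing (the `numpy.packbits` default), which this notebook does not confirm for the archive, and the example bytes are hypothetical.

```python
def unpack_ubinary(packed: list[int]) -> list[int]:
    """Expand each byte (0-255) into 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in packed for shift in range(7, -1, -1)]


toy_packed = [18, 255, 0]  # hypothetical 3-byte ubinary vector = 24 dimensions
print(unpack_ubinary(toy_packed))
```

Applied to a 128-value `emb_ubinary` list, the same function would return 1024 bits, one per embedding dimension.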

### Step 2: Create a tensor of binary embeddings for semantic search

In [60]:
# Let's create lists of documents and binary embeddings
docs = []
doc_embeddings = []

for doc in docs_stream:
    docs.append(doc)
    doc_embeddings.append(doc['emb_ubinary'])
    if len(docs) >= max_docs:
        break

# Convert doc_embeddings into a PyTorch tensor
doc_embeddings = torch.tensor(doc_embeddings)

Now, `doc_embeddings` holds the binary embeddings of the first 1,000 documents in the dataset. Each document is represented as an [embeddings vector](https://cohere.com/blog/sentence-word-embeddings) of 128 values, where each value is a byte (0 to 255) packing 8 of the embedding's 1024 binary dimensions.

In [61]:
# Return the tensor shape
doc_embeddings.shape

torch.Size([1000, 128])
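The storage type matters for the memory savings. Because `torch.tensor` infers `torch.int64` from Python integers, `doc_embeddings` most likely stores each packed byte in 8 bytes (check with `doc_embeddings.dtype`). The standard-library sketch below compares the two element sizes for a hypothetical 1,000 x 128 matrix of byte values, using the `array` module as a stand-in for tensor dtypes.

```python
from array import array

# Hypothetical packed binary embeddings: 1,000 documents x 128 bytes each.
num_docs, bytes_per_doc = 1000, 128
values = [i % 256 for i in range(num_docs * bytes_per_doc)]

as_int64 = array("q", values)  # 8 bytes per element, like torch.int64
as_uint8 = array("B", values)  # 1 byte per element, like torch.uint8

print(len(as_int64) * as_int64.itemsize, len(as_uint8) * as_uint8.itemsize)
```

This prints `1024000 128000`. In PyTorch, passing `dtype=torch.uint8` to `torch.tensor` would keep 1 byte per value; cast to a wider integer type before multiplying, because uint8 arithmetic wraps around at 256.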

### Step 3: Embed the query and compute dot products with the document embeddings
We can now search these vectors with any query. For this toy example, we ask about Alan Turing, because we know the Wikipedia page for Alan Turing is included in this subset of the archive.

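To search, we embed the query with the same model (using `input_type="search_query"`), then find its nearest neighbors: the documents whose embeddings have the highest dot product with the query embedding.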

The search returns the `k` passages with the highest scores (`k = 7` in the code below); change `k` to retrieve more or fewer results.
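For binary embeddings, a common similarity measure is Hamming distance: the number of bits that differ between two vectors, computed with XOR and a bit count. The sketch below swaps in that technique in place of the dot product used in this notebook; the function names and the 16-dimension embeddings are hypothetical.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def search_hamming(query: bytes, corpus: list[bytes], k: int) -> list[int]:
    """Indices of the k corpus embeddings closest to the query (smallest distance)."""
    order = sorted(range(len(corpus)), key=lambda i: hamming_distance(query, corpus[i]))
    return order[:k]


# Hypothetical 2-byte (16-dimension) packed embeddings.
toy_corpus = [
    bytes([0b11110000, 0b00001111]),
    bytes([0b11110001, 0b00001111]),
    bytes([0b00001111, 0b11110000]),
]
toy_query = bytes([0b11110000, 0b00001110])
print(search_hamming(toy_query, toy_corpus, k=2))
```

Here the distances are 1, 2, and 15 bits, so the two nearest neighbors are documents 0 and 1. Unlike a dot product over packed byte values, every dimension counts equally.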

In [91]:
# Import the AWS SDK and helpers needed to call Cohere models on Bedrock
import json
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Set up the Bedrock runtime client
bedrock_rt = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

# Create request parameters for Bedrock
model_id = 'cohere.embed-multilingual-v3'
accept = '*/*'
content_type = 'application/json'
embedding_types = ["ubinary"]
# Use the query-side input type when embedding search queries
input_type = "search_query"

# Create the text used for semantic search
query = "Tell me about Alan Turing"

# Set the number of nearest neighbors
k = 7

body = json.dumps({
    "texts": [query],
    "input_type": input_type,
    "embedding_types": embedding_types,
})

# Call the Bedrock invoke_model API
response = bedrock_rt.invoke_model(
    body=body,
    modelId=model_id,
    accept=accept,
    contentType=content_type
)

# Load the response into response_body
response_body = json.loads(response.get('body').read())

# Extract the ubinary query embedding (one vector of 128 packed uint8 values)
query_emb_int8 = response_body['embeddings']['ubinary']
print("Query embedding:", query_emb_int8, "\n")

# Convert query into a PyTorch tensor
query_emb_int8 = torch.tensor(query_emb_int8)

Query embedding: [[18, 63, 75, 232, 59, 67, 51, 160, 255, 68, 251, 186, 114, 165, 136, 58, 82, 15, 211, 232, 128, 37, 107, 204, 75, 163, 74, 251, 32, 233, 200, 154, 106, 241, 127, 125, 74, 31, 123, 209, 82, 220, 228, 15, 254, 151, 220, 43, 199, 230, 143, 73, 67, 229, 149, 61, 34, 86, 69, 56, 215, 178, 131, 49, 108, 251, 76, 187, 134, 2, 155, 169, 129, 130, 229, 103, 12, 113, 145, 9, 32, 139, 212, 3, 224, 64, 27, 151, 175, 217, 139, 30, 132, 192, 111, 60, 221, 162, 108, 120, 153, 219, 214, 165, 164, 133, 78, 232, 203, 63, 149, 53, 135, 117, 100, 213, 75, 46, 114, 159, 22, 216, 255, 233, 98, 26, 252, 22]] 
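The request and response handling above can be wrapped in two small functions, which makes it easier to reuse, for example to embed documents with `input_type="search_document"`. This is a sketch: the function names are hypothetical, the response shape `{"embeddings": {"ubinary": [...]}}` is the one the cell above reads, and the check below runs on a made-up response body without calling AWS.

```python
import json


def build_embed_body(texts: list[str], input_type: str, embedding_types: list[str]) -> str:
    """JSON request body with the same fields the InvokeModel call above sends."""
    return json.dumps({
        "texts": texts,
        "input_type": input_type,
        "embedding_types": embedding_types,
    })


def parse_embeddings(raw_body: bytes, embedding_type: str) -> list[list[int]]:
    """Pull one embedding type out of a body shaped like {"embeddings": {type: [...]}}."""
    return json.loads(raw_body)["embeddings"][embedding_type]


# Offline check with a hypothetical response body (no AWS call is made).
fake_response = json.dumps({"embeddings": {"ubinary": [[18, 63, 75]]}}).encode()
toy_body = build_embed_body(["Tell me about Alan Turing"], "search_query", ["ubinary"])
print(json.loads(toy_body)["input_type"], parse_embeddings(fake_response, "ubinary"))
```

With a live client, `parse_embeddings(response.get('body').read(), 'ubinary')` would replace the manual parsing in the cell above.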



In [92]:
# Compute dot-product scores between the query embedding and every document embedding
# (over the packed byte values, not the individual bits)
dot_scores = torch.mm(query_emb_int8, doc_embeddings.transpose(0, 1))
top_k = torch.topk(dot_scores, k)

# Print results
print("Query:", query)
for doc_id in top_k.indices[0].tolist():
    print(docs[doc_id]['title'])
    print(docs[doc_id]['text'])
    print(docs[doc_id]['url'], "\n")

Query: Tell me about Alan Turing
Alan Turing
Turing was one of the people who worked on the first computers. He created the theoretical  Turing machine in 1936. The machine was imaginary, but it included the idea of a computer program.
https://simple.wikipedia.org/wiki/Alan%20Turing 

Alan Turing
In 2013, almost 60 years later, Turing received a posthumous Royal Pardon from Queen Elizabeth II. Today, the “Turing law” grants an automatic pardon to men who died before the law came into force, making it possible for living convicted gay men to seek pardons for offences now no longer on the statute book.
https://simple.wikipedia.org/wiki/Alan%20Turing 

Botany
Gregor Mendel (1822–1884), Augustinian priest and scientist, and is often called the father of genetics for his study of the inheritance of traits in pea plants.
https://simple.wikipedia.org/wiki/Botany 

Creativity
Creativity is the ability of a person or group to make something new and useful or valuable, or the process of making s
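The scores above come from multiplying packed byte values, so dimensions do not count equally: a shared most significant bit contributes far more than a shared least significant bit. The sketch below compares that with a dot product over individual bits (a count of dimensions where both bits are 1), using hypothetical one-byte embeddings and function names.

```python
def packed_dot(a: list[int], b: list[int]) -> int:
    """Dot product over packed byte values, as in the search cell above."""
    return sum(x * y for x, y in zip(a, b))


def bit_dot(a: list[int], b: list[int]) -> int:
    """Dot product over individual bits: dimensions where both bits are 1."""
    return sum(bin(x & y).count("1") for x, y in zip(a, b))


toy_query = [0b10000001]
doc_high = [0b10000000]  # shares only the most significant bit with the query
doc_low = [0b00000001]   # shares only the least significant bit with the query

print(packed_dot(toy_query, doc_high), packed_dot(toy_query, doc_low))
print(bit_dot(toy_query, doc_high), bit_dot(toy_query, doc_low))
```

Both documents share exactly one dimension with the query, yet the packed dot product scores them 16512 and 129, while the bit-level dot product scores both 1.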

In [95]:
# Optionally, send the same question to the Command R+ model using the Bedrock Converse API and compare the output
chat_model_id = 'cohere.command-r-plus-v1:0'
user_message = "Tell me about Alan Turing."
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = bedrock_rt.converse(
        modelId=chat_model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 200, "temperature": 0.5, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except ClientError as e:
    # Report the failure without shutting down the notebook kernel
    print(f"ERROR: Can't invoke '{chat_model_id}'. Reason: {e}")


Alan Turing was a British mathematician, computer scientist, and cryptanalyst who made significant contributions to several fields during his lifetime. He is widely regarded as one of the most influential figures in the development of theoretical computer science and is often credited as being the father of modern computing and artificial intelligence.

Turing was born in London, England, in 1912 and showed a talent for science and mathematics from an early age. He attended the University of Cambridge, where he studied mathematics and gained a first-class degree in 1934. He then went on to do postgraduate work at Princeton University, where he received his Ph.D. in mathematics in 1938.

During World War II, Turing played a crucial role in breaking German military codes, particularly those generated by the Enigma machine. His work at Bletchley Park, Britain's code-breaking center, was instrumental in helping the Allies gain a crucial advantage over the Germans and is believed to have
