# Cohere Web Search with LlamaIndex

This example shows how to use the Python [LlamaIndex](https://docs.llamaindex.ai/en/stable/) library to run a text-generation request against [Cohere's](https://cohere.com/) API, then augment that request using the results from a Google web search.

**Requirements:**
- You will need a Cohere API key, which you can sign up for at https://dashboard.cohere.com/welcome/login. A free trial account will suffice, but it is limited to a small number of requests.
- After obtaining the key, store it in plain text in your home directory, in the `~/.cohere.key` file.

## Set up the RAG workflow environment

In [1]:
from bs4 import BeautifulSoup
from getpass import getpass
from googlesearch import search
import os
from pathlib import Path
import requests

from llama_index import VectorStoreIndex, ServiceContext
from llama_index.embeddings.cohereai import CohereEmbedding
from llama_index.llms import Cohere
from llama_index.readers.string_iterable import StringIterableReader
from llama_index.postprocessor.cohere_rerank import CohereRerank

Set up some helper functions:

In [2]:
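def pretty_print_docs(docs):
    # Print each retrieved node's text, separated by a horizontal rule.
    # LlamaIndex nodes expose their text through get_content(), not page_content.
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.get_content() for i, d in enumerate(docs)]
        )
    )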

Make sure other necessary items are in place:

In [3]:
try:
    # Read the key from ~/.cohere.key and expose it under both variable names
    os.environ["COHERE_API_KEY"] = (Path.home() / ".cohere.key").read_text().strip()
    os.environ["CO_API_KEY"] = os.environ["COHERE_API_KEY"]
except OSError:
    print("ERROR: You must have a Cohere API key available in your home directory at ~/.cohere.key")
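
The cell above only prints a message when the key file is missing, so later cells would fail with a less obvious error. A minimal sketch of a stricter loader follows. The function name `load_cohere_key` and the environment-variable fallback are assumptions for illustration, not part of the original notebook.

```python
import os
from pathlib import Path


def load_cohere_key(key_file=None, env_var="COHERE_API_KEY"):
    """Return the API key from `key_file`, falling back to `env_var`.

    Hypothetical helper: the name and the fallback order are assumptions.
    """
    path = Path(key_file) if key_file else Path.home() / ".cohere.key"
    if path.is_file():
        key = path.read_text().strip()
        if key:
            return key
    # No usable key file: try the environment instead
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    raise RuntimeError(f"No Cohere API key found in {path} or ${env_var}")
```

Raising instead of printing stops the notebook at the cell that actually has the problem.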

## Start with a basic generation request without RAG augmentation

Let's start by asking the Cohere LLM a question about recent events that it doesn't know about, something that happened after it finished training. At the time I'm writing this notebook in January 2024, Cohere doesn't know who won the last World Series of baseball.

**The correct answer is that the Texas Rangers won, in November 2023.**

"*Who won the 2023 World Series of baseball?*"

In [4]:
query = "Who won the 2023 World Series of baseball?"

## Now send the query to Cohere

In [5]:
llm = Cohere(api_key=os.environ["COHERE_API_KEY"])
result = llm.complete(query)
print(f"Result: \n\n{result}")

Result: 

 The 2023 World Series has not yet been played and therefore there is no winner. 

The 2022 World Series was won by the Houston Astros who defeated the Philadelphia Phillies 4 games to 1.


At best, the Cohere LLM admits that it doesn't know. At worst, it answers incorrectly and says the Houston Astros won (they won the year before, in 2022).

Let's see how we can use RAG to augment our question with a Google web search and get the correct answer.

## Ingestion: Do a Google web search with the question

Parse through all the websites returned by a Google search, break them up into smaller digestible chunks, then encode them as vector embeddings.

In [6]:
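# Do a Google web search and store the text of each result page in a list
web_documents = []
for result_url in search(query, tld="com", num=10, stop=10, pause=2):
    try:
        # A timeout keeps one slow site from hanging the whole cell
        response = requests.get(result_url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Skipping {result_url}: {exc}")
        continue
    soup = BeautifulSoup(response.content, "html.parser")
    web_documents.append(soup.get_text())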

print("Setting up the embeddings model...\n")
embed_model = CohereEmbedding(
    model_name="embed-english-v3.0",
    input_type="search_query"
)
service_context = ServiceContext.from_defaults(
    embed_model=embed_model,
    llm=llm,
)
print("Done")

Setting up the embeddings model...



[nltk_data] Downloading package punkt to
[nltk_data]     /tmp/fh64fh4kPCnFQdxs/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Done
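
The ingestion step mentions breaking pages into smaller chunks. LlamaIndex does this for us when the index is built, so the following is only a simplified sketch of the idea: fixed-size chunks with a small overlap. It splits on whitespace rather than on sentences or tokens, and `chunk_words`, `chunk_size`, and `overlap` are illustrative names, not LlamaIndex's API.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split `text` into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Tiny example: 7 words, chunks of 3 words, 1 word of overlap
print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
```

The overlap trades some duplicated text for a lower chance that an answer is split across two chunks and matched by neither.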


## Storage: Store the document chunks in a vector DB

Build a vector index over the document chunks so that the retriever can identify the ones that most closely match our original query. (This takes about 1-2 minutes.)

In [7]:
documents = StringIterableReader().load_data(texts=web_documents)
index = VectorStoreIndex.from_documents(documents, service_context=service_context, show_progress=True)

Parsing nodes:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/77 [00:00<?, ?it/s]
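
Matching chunks to a query comes down to comparing embedding vectors. The sketch below ranks a few hand-made 3-dimensional vectors by cosine similarity. Real Cohere embeddings have many more dimensions, and the vectors, chunk labels, and the `top_k` helper here are invented for illustration. This shows the general idea only and is not LlamaIndex's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the k (label, score) pairs most similar to query_vec, best first."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for embeddings (invented values)
chunks = {
    "rangers win title": [0.9, 0.1, 0.0],
    "astros 2022 recap": [0.6, 0.6, 0.1],
    "cookie banner text": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
for label, score in top_k(query_vec, chunks):
    print(f"{score:.3f}  {label}")
```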

Let's see what results it found. Note that the retriever returns its results in ranked order, with what it considers the best match first.

## Retrieval: Retrieve the chunks that most closely match the original query

In [8]:
search_query_retriever = index.as_retriever(service_context=service_context)
search_query_retrieved_nodes = search_query_retriever.retrieve(query)
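
The retrieved nodes are stored above but never displayed. One way to inspect them is sketched below. It assumes each retrieved item exposes a `score` attribute and a `get_content()` method, as `NodeWithScore` does in the LlamaIndex release this notebook appears to target; `summarize_nodes` is a hypothetical helper name. The stand-in class exists only so that the snippet runs without an index.

```python
def summarize_nodes(nodes, max_chars=80):
    """Return one line per node: rank, similarity score, and a text preview."""
    lines = []
    for rank, node in enumerate(nodes, start=1):
        # Collapse runs of whitespace so the preview fits on one line
        preview = " ".join(node.get_content().split())[:max_chars]
        score = "n/a" if node.score is None else f"{node.score:.3f}"
        lines.append(f"{rank}. score={score}  {preview}")
    return lines


class FakeNode:
    """Stand-in for a retrieved node, for demonstration only."""

    def __init__(self, text, score):
        self._text, self.score = text, score

    def get_content(self):
        return self._text


demo = [FakeNode("Texas Rangers\n win  the series", 0.8123), FakeNode("Other page", None)]
print("\n".join(summarize_nodes(demo)))
```

In the notebook, the corresponding call would be `summarize_nodes(search_query_retrieved_nodes)`.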

## Reranking: Improve the ordering of the document chunks

In [9]:
cohere_rerank = CohereRerank(top_n=3)
query_engine = index.as_query_engine(
    node_postprocessors = [cohere_rerank],
    service_context = service_context
)
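
Reranking is a second pass: the retriever returns a candidate list, a more precise scorer rates each candidate against the query, and only the best `top_n` are kept. The sketch below shows that shape with a deliberately crude scorer (the fraction of query words found in the chunk). Cohere's reranker is a trained model, not word overlap, so `overlap_score` and `rerank` are purely illustrative names and logic.

```python
def overlap_score(query, text):
    """Fraction of distinct query words that also occur in `text` (toy relevance score)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query, candidates, top_n=3, scorer=overlap_score):
    """Re-score candidates against the query and keep the top_n, best first."""
    scored = [(scorer(query, text), text) for text in candidates]
    # The sort is stable, so ties keep their original retrieval order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


candidates = [
    "subscribe to our newsletter for baseball news",
    "the texas rangers won the 2023 world series",
    "the astros won the world series in 2022",
]
print(rerank("who won the 2023 world series", candidates, top_n=2))
```

Keeping only a few top-ranked chunks also keeps the prompt sent to the LLM short, which is the role `top_n=3` plays in the cell above.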

## Lastly, let's run our LLM query a final time with the reranked results

In [10]:
result = query_engine.query(query)
print(result)

The Texas Rangers won the 2023 World Series Championship.
