
Cohere Chat Engine with citations and documents - Usage Example
========================================
    
This notebook demonstrates how to use the Cohere chat engine to generate responses to user input, together with citations and the context documents those responses are based on (Cohere's documents mode; see the [documentation](https://docs.cohere.com/docs/retrieval-augmented-generation-rag)). The chat engine can generate responses in four ways: Chat, Async Chat, Stream Chat, and Async Stream Chat. We will use OpenSearch (expected at `http://localhost:9200`) as the vector index from which the context documents are retrieved, along with the following sample [data](https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt), an essay by Paul Graham.



Download Data
========================================

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-02-21 13:47:16--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-02-21 13:47:17 (1.29 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



Install requirements
========================================


In [None]:
%pip install llama-index
%pip install cohere
%pip install opensearch-py


Note: you may need to restart the kernel to use updated packages.


Let's start by creating a new index and adding the context documents to it.
========================================

### Imports

In [None]:
import asyncio
import os
from llama_index.embeddings.cohereai import CohereEmbedding
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    ServiceContext,
)
from llama_index.vector_stores import (
    OpensearchVectorStore,
    OpensearchVectorClient,
)

from llama_index.llms import Cohere, ChatMessage

### Constants

In [None]:
os.environ["COHERE_API_KEY"] = "COHERE_API_KEY_HERE"
COHERE_API_KEY = os.environ["COHERE_API_KEY"]
OPENSEARCH_URL = "http://localhost:9200"
OPENSEARCH_INDEX_NAME = "idx_paul_graham_essay"
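The cell above hard-codes a placeholder key, so a forgotten placeholder only shows up later as an error from a Cohere call. A minimal sketch of checking the key up front instead; `require_api_key` and the placeholder comparison are hypothetical helpers, not part of llama_index or the Cohere SDK.

```python
import os

# Hypothetical: the placeholder string used in the constants cell above
PLACEHOLDER = "COHERE_API_KEY_HERE"


def require_api_key(var_name: str = "COHERE_API_KEY") -> str:
    """Return the key from the environment, or raise if it is missing or still the placeholder."""
    value = os.environ.get(var_name, "").strip()
    if not value or value == PLACEHOLDER:
        raise RuntimeError(
            f"Set {var_name} to a real Cohere API key before running the notebook."
        )
    return value
```

With this in place, `COHERE_API_KEY = require_api_key()` would replace the direct `os.environ[...]` lookup.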

### Create a new Cohere LLM instance

In [None]:
LLM = Cohere(
    "command",
    api_key=COHERE_API_KEY,
    temperature=0.5,
    additional_kwargs={"prompt_truncation": "AUTO"},
)

### Create a new Cohere embedding

In [None]:
def get_cohere_embed_model(
    input_type="search_document", model_name="embed-english-v3.0"
):
    cohere_embed_model = CohereEmbedding(
        cohere_api_key=COHERE_API_KEY,
        model_name=model_name,
        input_type=input_type,
    )

    return cohere_embed_model

### Create a new service context

In [None]:
def get_cohere_service_context(input_type="search_document"):
    cohere_service_context = ServiceContext.from_defaults(
        llm=LLM, embed_model=get_cohere_embed_model(input_type=input_type)
    )
    return cohere_service_context

### Create a new OpenSearch vector client

In [None]:
def get_opensearch_store_client(
    host=OPENSEARCH_URL,
    size=1024,  # vector dimension: embed-english-v3.0 returns 1024-dimensional embeddings
    embedding_field="passage_embedding",
    text_field="passage_text",
):
    opensearch_store_client = OpensearchVectorClient(
        host,
        OPENSEARCH_INDEX_NAME,
        size,
        embedding_field=embedding_field,
        text_field=text_field,
    )

    return opensearch_store_client
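The `size=1024` argument above is the vector dimension, which must match the embedding model (`embed-english-v3.0` produces 1024-dimensional vectors). For orientation, OpenSearch's k-NN plugin stores such vectors in a `knn_vector` field. The sketch below builds a minimal index body of that kind as a plain dict; `build_knn_mapping` is hypothetical, and the mapping llama_index actually creates (method, engine, space type) may differ.

```python
from typing import Any, Dict


def build_knn_mapping(embedding_field: str, text_field: str, dim: int) -> Dict[str, Any]:
    """Minimal OpenSearch index body with one k-NN vector field and one text field."""
    return {
        # Enables the k-NN plugin for this index
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Dense vector field; its dimension must equal the embedding size
                embedding_field: {"type": "knn_vector", "dimension": dim},
                # Raw chunk text stored alongside each vector
                text_field: {"type": "text"},
            }
        },
    }


body = build_knn_mapping("passage_embedding", "passage_text", 1024)
```

With opensearch-py, such a body could be passed to `client.indices.create(index=..., body=body)`; here `OpensearchVectorClient` creates the index itself, so the dict is only useful for understanding or inspecting the mapping.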

### Read the sample documents from the file

In [None]:
def get_sample_documents(path="./data/paul_graham/"):
    sample_documents = SimpleDirectoryReader(path).load_data()
    return sample_documents
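`VectorStoreIndex.from_documents` does not embed the essay as a single piece: it first splits each document into smaller chunks (nodes) and embeds each chunk, so every chunk becomes its own vector in OpenSearch. As a rough illustration of the idea only, here is a character-based sliding window with overlap; `chunk_text` and its sizes are hypothetical, and llama_index's own node parser splits on sentences and tokens with its own defaults.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of chunk_size characters; neighbours share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```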

### Create a new OpenSearch vector store

In [None]:
def get_opensearch_vector_store():
    opensearch_vector_store = OpensearchVectorStore(
        get_opensearch_store_client()
    )

    return opensearch_vector_store

### Create a new storage context

In [None]:
def get_opensearch_storage_context():
    opensearch_storage_context = StorageContext.from_defaults(
        vector_store=get_opensearch_vector_store()
    )
    return opensearch_storage_context

### Create a new vector store index and fill it with the sample documents

In [None]:
def fill_opensearch_documents_index():
    index = VectorStoreIndex.from_documents(
        documents=get_sample_documents(),
        service_context=get_cohere_service_context(),
        storage_context=get_opensearch_storage_context(),
    )
    return index
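Each call to `fill_opensearch_documents_index()` embeds the essay and writes its chunks into the index again, which is why the run cell below asks you to comment it out after the first run. An alternative is to check the index first. A sketch assuming an opensearch-py client, where `indices.exists(index=...)` returns a boolean and `count(index=...)` returns a dict with a `"count"` key; `index_needs_filling` is a hypothetical helper. It also checks the document count, because the index can exist while still being empty.

```python
from typing import Any


def index_needs_filling(client: Any, index_name: str) -> bool:
    """Return True when the index is missing or holds no documents yet."""
    # opensearch-py: client.indices.exists(index=...) returns True/False
    if not client.indices.exists(index=index_name):
        return True
    # opensearch-py: client.count(index=...) returns e.g. {"count": 42, ...}
    return client.count(index=index_name)["count"] == 0


# Usage against a live cluster (requires opensearch-py and a running OpenSearch):
# from opensearchpy import OpenSearch
# client = OpenSearch(hosts=[OPENSEARCH_URL])
# if index_needs_filling(client, OPENSEARCH_INDEX_NAME):
#     fill_opensearch_documents_index()
```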

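### Create a query index with input type "search_query"

For the chat engine, the Cohere embedding model should use the input type `search_query`: Cohere's v3 embedding models distinguish between documents being indexed (`search_document`) and the queries run against them (`search_query`). See the [documentation](https://docs.cohere.com/reference/embed) for more details. We therefore create a second index object over the same OpenSearch vector store, this time with a service context whose embedding model uses `input_type="search_query"`.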

In [None]:
def get_opensearch_query_index():
    opensearch_store = get_opensearch_vector_store()
    opensearch_context = get_cohere_service_context(input_type="search_query")
    index = VectorStoreIndex.from_vector_store(
        opensearch_store, service_context=opensearch_context
    )
    return index
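Conceptually, the query index embeds the question (with `input_type="search_query"`) and asks OpenSearch for the stored chunk vectors closest to it; this only works because both embeddings come from the same model and therefore share one vector space and dimension. The pure-Python sketch below shows that ranking step on toy 3-dimensional vectors. It uses cosine similarity for illustration, whereas the metric OpenSearch applies depends on how its k-NN field is configured; `top_k` and the vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; both vectors must come from the same embedding space."""
    if len(a) != len(b):
        raise ValueError("query and document vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], docs: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored chunk vectors by similarity to the query and keep the best k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


docs = {"chunk-a": [1.0, 0.0, 0.0], "chunk-b": [0.7, 0.7, 0.0], "chunk-c": [0.0, 0.0, 1.0]}
result = top_k([1.0, 0.1, 0.0], docs)
```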

### Finally, create a new chat engine using the Cohere Context mode

In [None]:
def get_chat_engine_by_index_and_mode(index, mode="cohere_context"):
    service_context = get_cohere_service_context(input_type="search_query")
    chat_engine = index.as_chat_engine(
        service_context=service_context, chat_mode=mode
    )
    return chat_engine
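Judging from the Documents output below, in the `cohere_context` mode the retrieved chunks reach Cohere as a list of documents, each with an `id` and a `text`, which the model can then cite. A sketch of that conversion under this assumption; `RetrievedChunk` and `to_cohere_documents` are hypothetical stand-ins, not the engine's internal code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RetrievedChunk:
    """Hypothetical stand-in for a retrieved node: an id plus its text."""
    node_id: str
    text: str


def to_cohere_documents(chunks: List[RetrievedChunk]) -> List[Dict[str, str]]:
    # One dict of string fields per chunk, mirroring the {'id': ..., 'text': ...} shape printed below
    return [{"id": chunk.node_id, "text": chunk.text} for chunk in chunks]


documents = to_cohere_documents(
    [RetrievedChunk("doc-1", "I couldn't have put this into words when I was 18.")]
)
# With the Cohere Python SDK, such a list could be sent as, e.g.:
# co.chat(message=question, documents=documents)
```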

## Let's run it and see the results
### Chat

In [None]:
# Create the index and fill it with the sample documents. Run this only once: comment it out after the first run so the documents are not indexed again.
fill_opensearch_documents_index()

# Create a new chat engine
chat_engine = get_chat_engine_by_index_and_mode(get_opensearch_query_index())

# Ask a question using the chat engine
question = "What did Paul Graham do with SHRDLU?"
llm_response = chat_engine.chat(question)
# Print the response
print(llm_response)
# Print the citations
print(f"Citations: {llm_response.citations}")
# Print the context documents
print(f"Documents: {llm_response.documents}")

Paul Graham reverse-engineered SHRDLU for his undergraduate thesis. SHRDLU was a PBS documentary that showed Terry Winograd using the SHRDLU program. Paul Graham was drawn to working on AI after seeing the demo of SHRDLU in the documentary and eventually focused on the Lisp language after deciding AI, as practiced then, was a hoax.
Citations: [{'start': 12, 'end': 37, 'text': 'reverse-engineered SHRDLU', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}, {'start': 46, 'end': 67, 'text': 'undergraduate thesis.', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}, {'start': 81, 'end': 96, 'text': 'PBS documentary', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}, {'start': 109, 'end': 123, 'text': 'Terry Winograd', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}, {'start': 270, 'end': 283, 'text': 'Lisp language', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}, {'start': 299, 'end': 333, 'text': 'AI, as practiced then, was a hoax.', 'document
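Each citation is a dict with `start`/`end` character offsets into the response, the cited `text`, and the `document_ids` it is grounded in. In the first citation above, `response[12:37]` is exactly `'reverse-engineered SHRDLU'`, so `end` is exclusive. A minimal sketch, assuming that shape, that inserts footnote-style markers into the response; `annotate_with_citations` is a hypothetical helper that you would call with `str(llm_response)` and `llm_response.citations`.

```python
from typing import Dict, List, Tuple


def annotate_with_citations(text: str, citations: List[Dict]) -> Tuple[str, Dict[str, int]]:
    """Insert [n] markers after each cited span; documents are numbered by first citation."""
    numbers: Dict[str, int] = {}
    for cit in sorted(citations, key=lambda c: c["start"]):
        for doc_id in cit["document_ids"]:
            numbers.setdefault(doc_id, len(numbers) + 1)

    annotated = text
    # Insert from the end of the text backwards so earlier offsets stay valid
    for cit in sorted(citations, key=lambda c: c["end"], reverse=True):
        if text[cit["start"]:cit["end"]] != cit["text"]:
            raise ValueError(f"citation offsets do not match the text: {cit['text']!r}")
        marker = "".join(f"[{numbers[d]}]" for d in cit["document_ids"])
        annotated = annotated[: cit["end"]] + marker + annotated[cit["end"]:]
    return annotated, numbers


response = "Paul Graham reverse-engineered SHRDLU for his undergraduate thesis."
doc_id = "68841b92-0964-4488-a4b9-883dd26288ce"
citations = [
    {"start": 12, "end": 37, "text": "reverse-engineered SHRDLU", "document_ids": [doc_id]},
    {"start": 46, "end": 67, "text": "undergraduate thesis.", "document_ids": [doc_id]},
]
annotated, numbers = annotate_with_citations(response, citations)
print(annotated)
# Paul Graham reverse-engineered SHRDLU[1] for his undergraduate thesis.[1]
```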

### Async Chat

In [None]:
llm_response = await chat_engine.achat(question)
print(llm_response)
print(f"Citations: {llm_response.citations}")
print(f"Documents: {llm_response.documents}")

For his undergraduate thesis, Paul Graham reverse-engineered Shrdlu. He later wrote about his experience with Shrdlu and his motivations for working with Lisp in an essay titled "The Acceleration of Addict Formation", where he discusses the increasing incentive for creating new hacks instead of focusing on the original goal of the project.
Citations: [{'start': 8, 'end': 28, 'text': 'undergraduate thesis', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}, {'start': 42, 'end': 68, 'text': 'reverse-engineered Shrdlu.', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}]
Documents: [{'id': '68841b92-0964-4488-a4b9-883dd26288ce', 'text': 'I couldn\'t have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to switch to AI.\n\nAI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, 
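Citations refer to documents only by id. A sketch that resolves each citation's `document_ids` against the Documents list (entries with `id` and `text`, as printed above), so each cited phrase can be shown next to its source text; `resolve_citations` is a hypothetical helper.

```python
from typing import Dict, List


def resolve_citations(citations: List[Dict], documents: List[Dict]) -> List[Dict]:
    """Pair each cited phrase with the texts of the documents it points to."""
    by_id = {doc["id"]: doc for doc in documents}
    resolved = []
    for cit in citations:
        resolved.append({
            "text": cit["text"],
            # Ids that are missing from the documents list are skipped
            "sources": [by_id[d]["text"] for d in cit["document_ids"] if d in by_id],
        })
    return resolved


doc_id = "68841b92-0964-4488-a4b9-883dd26288ce"
citations = [{"start": 8, "end": 28, "text": "undergraduate thesis", "document_ids": [doc_id]}]
documents = [{"id": doc_id, "text": "I couldn't have put this into words when I was 18."}]
resolved = resolve_citations(citations, documents)
```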

### Stream Chat 

In [None]:
llm_response = chat_engine.stream_chat(question)
llm_response.print_response_stream()

print("Citations: ")
llm_response.print_citations_stream()

print("Documents: ")
llm_response.print_documents_stream()

For his undergraduate thesis, Paul Graham reverse-engineered SHRDLU. He later wrote about his experience with SHRDLU and his motivations for working with Lisp in an essay titled "The Acceleration of Addict Formation", where he discusses the increasing incentive for creating new hacks instead of focusing on the original goal of the project.Citations: 
[{'start': 8, 'end': 28, 'text': 'undergraduate thesis', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 42, 'end': 68, 'text': 'reverse-engineered SHRDLU.', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 94, 'end': 116, 'text': 'experience with SHRDLU', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 125, 'end': 158, 'text': 'motivations for working with Lisp', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 178, 'end': 216, 'text': '"The Acceleration of Addict Formation"', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 266, 'end': 2

### Async Stream Chat

In [None]:
llm_response = await chat_engine.astream_chat(question)
await llm_response.aprint_response_stream()
print("Citations: ")
await llm_response.aprint_citations_stream()
print("Documents: ")
await llm_response.aprint_documents_stream()

For his undergraduate thesis, Paul Graham reverse-engineered SHRDLU. He later wrote about his experience with SHRDLU and his motivations for working with Lisp in an essay titled "The Acceleration of Addict Formation", where he discusses the increasing incentive for creating new hacks instead of focusing on the original goal of the project.Citations: 
[{'start': 8, 'end': 28, 'text': 'undergraduate thesis', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 42, 'end': 68, 'text': 'reverse-engineered SHRDLU.', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 94, 'end': 116, 'text': 'experience with SHRDLU', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 125, 'end': 158, 'text': 'motivations for working with Lisp', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 178, 'end': 216, 'text': '"The Acceleration of Addict Formation"', 'document_ids': ['68841b92-0964-4488-a4b9-883dd26288ce']}][{'start': 266, 'end': 2
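The `await` calls in the async cells rely on Jupyter/IPython allowing top-level `await`. In a plain Python script, the coroutine has to be driven by an event loop, typically with `asyncio.run`. The sketch below shows the pattern with a hypothetical `fake_token_stream` async generator standing in for the streamed response; with the real engine, `main` would contain the `await chat_engine.astream_chat(question)` calls instead.

```python
import asyncio
from typing import AsyncIterator


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streamed LLM response: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop, as real network I/O would
        yield word


async def main() -> str:
    tokens = []
    async for token in fake_token_stream("For his undergraduate thesis"):
        tokens.append(token)
    return " ".join(tokens)


# Script only: inside Jupyter, use `await main()` instead, since a loop is already running
result = asyncio.run(main())
```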