# RAG Using Cohere Command and Cohere Reranking

## Importing Dependencies

In [1]:
from langchain_cohere import ChatCohere, CohereEmbeddings, CohereRerank
from langchain_community.vectorstores import DeepLake
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.retrievers import ContextualCompressionRetriever



## Constants

In [2]:
import os

# Read the Cohere API key from the environment; never commit a literal key.
API_KEY = os.environ["COHERE_API_KEY"]
TEMPERATURE = 0.3

In [3]:
embeddings = CohereEmbeddings(
    model="embed-english-v3.0",
    cohere_api_key=API_KEY,
)

llm = ChatCohere(
    model="command",
    cohere_api_key=API_KEY,
    temperature=TEMPERATURE,
)

## Data Ingestion

In [4]:
docstore = DeepLake(
    dataset_path="deeplake_vstore",
    embedding=embeddings,
    verbose=False,
    num_workers=4,
)



In [5]:
file = "data/Apple-10K-2023.pdf"
loader = PyPDFLoader(file_path=file)
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
)
pages = loader.load()
chunks = text_splitter.split_documents(pages)
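
To make the `separator`, `chunk_size`, and `chunk_overlap` settings concrete, here is a minimal pure-Python sketch of separator-based chunking with overlap. It is a simplified illustration, not LangChain's actual implementation; `split_with_overlap` and the sample text are hypothetical names made up for this example.

```python
def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Greedily merge separator-delimited pieces into overlapping chunks."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The running chunk is full: emit it.
            chunks.append(separator.join(current))
            # Keep trailing pieces as overlap, dropping from the front until the
            # carry-over fits the overlap budget and leaves room for the new piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        # A single piece longer than chunk_size is kept whole in this sketch.
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


# Ten short lines, small limits so the overlap is easy to see.
sample = "\n".join(f"line-{i:02d}" for i in range(10))
demo_chunks = split_with_overlap(sample, chunk_size=31, chunk_overlap=8)
for chunk in demo_chunks:
    print(repr(chunk))
```

With these toy limits, each chunk repeats the last line of the previous one, which is the role `chunk_overlap=200` plays for the 10-K text above.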

In [6]:
_ = docstore.add_documents(documents=chunks)

Creating 358 embeddings in 1 batches of size 358:: 100%|██████████| 1/1 [00:05<00:00,  5.80s/it]


## QnA

In [7]:
docstore = DeepLake(
    dataset_path="deeplake_vstore",
    embedding=embeddings,
    verbose=False,
    read_only=True,
    num_workers=4,
)

retriever = docstore.as_retriever(
    search_type="similarity",
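    search_kwargs={
        # fetch_k is the candidate pool size for MMR search; it is likely
        # unused when search_type="similarity".
        "fetch_k": 20,
        "k": 10,  # number of chunks passed to the chain
    }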
)

Deep Lake Dataset in deeplake_vstore already exists, loading from the storage
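
As a rough picture of what a similarity retriever with `k` does, the sketch below ranks toy vectors by cosine similarity and keeps the top `k`. Cosine similarity is an assumption here; the vector store's actual distance metric and search code may differ. `cosine`, `top_k`, and the two-dimensional vectors are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k vectors most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings standing in for real Cohere embedding vectors.
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = top_k([1.0, 0.1], toy_vectors, k=2)
print(nearest)
```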


In [8]:
prompt_template = """
You are an intelligent chatbot that can answer user's queries. You will be provided with Relevant 
context based on the user's queries. Your task is to analyze user's query and generate response for 
the query from the context. Make sure to suggest one similar follow-up question based on the context 
for the user to ask.

NEVER generate response to queries for which there is no or irrelevant context.

Context: {context}

Question: {question}

Answer:
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT, "verbose": False}

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    verbose=False,
    chain_type_kwargs=chain_type_kwargs,
)
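
The `"stuff"` chain type places all retrieved chunks into the single `{context}` slot of the prompt. A minimal sketch of that idea, assuming chunks are joined with a blank line (the separator and the `stuff_prompt` helper are assumptions for illustration, not the chain's real code):

```python
SHORT_TEMPLATE = "Context: {context}\n\nQuestion: {question}\n\nAnswer:"


def stuff_prompt(template, chunk_texts, question, separator="\n\n"):
    """Join every retrieved chunk into one context string and fill the template."""
    context = separator.join(chunk_texts)
    return template.format(context=context, question=question)


# Placeholder chunk texts; real chunks would come from the retriever.
filled = stuff_prompt(SHORT_TEMPLATE, ["first chunk", "second chunk"], "What changed?")
print(filled)
```

Because every chunk lands in one prompt, `k` directly controls how much text the model has to read, which is one reason to rerank and trim the candidates before this step.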

In [9]:
query = "What are the fiscal year highlights for second half of 2023?"

response = qa.invoke({"query": query})
result = response["result"]

In [10]:
print(result)

The Company’s total net sales were $383.3 billion and net income was $97.0 billion during 2023. 
The Company’s total net sales decreased by 3% or $11.0 billion during 2023 compared to 2022. 
The weakness in foreign currencies relative to the U.S. dollar accounted for more than the entire year-over-year decrease in total net sales, which consisted primarily of lower net sales of Mac and iPhone, partially offset by higher net sales of Services. 

Based on the above context of Apple's financial performance, the second half of the fiscal year 2023 focused on the third and fourth quarters. These quarterly highlights include new product offerings and updates to the company's operating systems. 

The third quarter announced the following new products:

•MacBook Air 15”
•Mac Studio
•Mac Pro
•Apple Vision Pro™, the Company’s first spatial computer featuring its new visionOS™, expected to be available in early calendar year 2024
•iOS 17, macOS Sonoma, iPadOS 17, tvOS 17 and watchOS 10, updates 

## QnA Using Cohere Reranker

In [11]:
cohere_rerank = CohereRerank(
    cohere_api_key=API_KEY,
    model="rerank-english-v3.0"
)

In [12]:
docstore = DeepLake(
    dataset_path="deeplake_vstore",
    embedding=embeddings,
    verbose=False,
    read_only=True,
    num_workers=4,
)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=cohere_rerank,
    base_retriever=docstore.as_retriever(
        search_type="similarity",
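        search_kwargs={
            # fetch_k is the candidate pool size for MMR search; it is likely
            # unused when search_type="similarity".
            "fetch_k": 20,
            "k": 15,  # candidates handed to the reranker, which keeps only the best few
        },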
    )
)

Deep Lake Dataset in deeplake_vstore already exists, loading from the storage
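
The compression retriever is a two-stage pipeline: the base retriever fetches a wide candidate set, then the reranker rescores each candidate against the query and keeps only the best few. The sketch below shows that shape with a toy word-overlap score standing in for the Cohere rerank model; `overlap_score`, `rerank`, and the candidate strings are made up for illustration.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates, top_n=3, score_fn=overlap_score):
    """Rescore the first-stage candidates and keep the top_n best."""
    ranked = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ranked[:top_n]


# Placeholder first-stage results, in the order a vector search might return them.
candidates = [
    "net sales decreased in 2023",
    "the weather was mild",
    "product announcements in fiscal 2023",
    "net sales and product announcements in 2023",
]
best = rerank("net sales 2023", candidates, top_n=2)
print(best)
```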


In [13]:
prompt_template = """
You are an intelligent chatbot that can answer user's queries. You will be provided with Relevant 
context based on the user's queries. Your task is to analyze user's query and generate response for 
the query from the context. Make sure to suggest one similar follow-up question based on the context 
for the user to ask.

NEVER generate response to queries for which there is no or irrelevant context.

Context: {context}

Question: {question}

Answer:
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT, "verbose": False}

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    verbose=False,
    chain_type_kwargs=chain_type_kwargs,
)

In [14]:
query = "What are the fiscal year highlights for second half of 2023?"

response = qa.invoke({"query": query})
result = response["result"]

In [15]:
print(result)

The Company’s total net sales were $383.3 billion and net income was $97.0 billion during 2023. 
The Company’s total net sales decreased 3% or $11.0 billion during 2023 compared to 2022. 
Significant announcements during the second half of fiscal year 2023, included:
1. MacBook Pro 14”, MacBook Pro 16” and Mac mini;
2. Second-generation HomePod;
3. MacBook Air 15”, Mac Studio and Mac Pro;
4. Apple Vision Pro™, the Company’s first spatial computer featuring its new visionOS™, expected to be available in early calendar year 2024; and
5. iOS 17, macOS Sonoma, iPadOS 17, tvOS 17 and watchOS 10, updates to the Company’s operating systems. 

Based on the provided context, you can also ask the chatbot questions such as: 

1. How much did net sales decrease in the second half of 2023?
2. What type of product announcements were made in the second half of 2023?
3. When would you expect an announcement for Apple Vision Pro?
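
Since both chains are built with `return_source_documents=True`, the response also carries the chunks that were fed to the model, which makes it possible to compare what each retriever selected. A small helper for that could look like the sketch below; `summarize_sources` is a hypothetical name, and the stand-in objects only mimic the `page_content` and `metadata` attributes of LangChain documents.

```python
from types import SimpleNamespace


def summarize_sources(source_documents, preview_chars=80):
    """Return one line per source chunk: rank, page number, and a short preview."""
    lines = []
    for rank, doc in enumerate(source_documents, start=1):
        page = doc.metadata.get("page", "?")
        # Collapse whitespace so multi-line chunks fit on one line.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{rank}. page {page}: {preview}")
    return lines


# Stand-in documents with placeholder text, used only to exercise the helper.
fake_docs = [
    SimpleNamespace(page_content="first chunk\ntext", metadata={"page": 0}),
    SimpleNamespace(page_content="second chunk", metadata={}),
]
for line in summarize_sources(fake_docs):
    print(line)
```

In the notebook itself, the same helper would be called as `summarize_sources(response["source_documents"])` after each `qa.invoke(...)`.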


In [1]:
from src.ingestion import Ingestion

ingestion = Ingestion()



In [2]:
file = "data/Apple-10K-2023.pdf"

ingestion.create_and_add_embeddings(file_path=file)
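
The `Ingestion` class is not shown in this notebook. As a guess at its shape, the sketch below bundles the load, split, and store steps from the Data Ingestion section behind one method. Everything here is an assumption: `IngestionSketch`, its constructor arguments, and the returned chunk count are hypothetical and may not match `src.ingestion`.

```python
class IngestionSketch:
    """Hypothetical stand-in for src.ingestion.Ingestion (the real class is not shown)."""

    def __init__(self, load_pages, split_documents, add_documents):
        # Each argument is a plain callable, so real LangChain objects
        # or simple fakes can be plugged in.
        self.load_pages = load_pages
        self.split_documents = split_documents
        self.add_documents = add_documents

    def create_and_add_embeddings(self, file_path):
        """Load a file, split it into chunks, store the chunks, and return their count."""
        pages = self.load_pages(file_path)
        chunks = self.split_documents(pages)
        self.add_documents(chunks)
        return len(chunks)


# Fake callables so the sketch runs without any external service.
stored = []
sketch = IngestionSketch(
    load_pages=lambda path: [f"{path}: page 1", f"{path}: page 2"],
    split_documents=lambda pages: [part for page in pages for part in (page[:5], page[5:])],
    add_documents=stored.extend,
)
count = sketch.create_and_add_embeddings("example.pdf")
print(count, len(stored))
```

With the objects defined earlier, the three callables could be `lambda p: PyPDFLoader(file_path=p).load()`, `text_splitter.split_documents`, and `docstore.add_documents`.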