In [None]:
%%capture
!pip install llama-index==0.10.37 cohere==5.5.0 openai==1.30.1 llama-index-embeddings-openai==0.1.9 llama-index-embeddings-cohere llama-index-llms-cohere==0.2.0 qdrant-client==1.9.1 llama-index-vector-stores-qdrant==0.2.8 python-dotenv

In [2]:
import os

from getpass import getpass
import nest_asyncio

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv()

True

In [3]:
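# .get() returns None for a missing variable instead of raising KeyError, so the getpass fallback can run
CO_API_KEY = os.environ.get('CO_API_KEY') or getpass("Enter your Cohere API key: ")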

In [4]:
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY') or getpass("Enter your OpenAI API key: ")

In [5]:
QDRANT_URL = os.environ.get('QDRANT_URL') or getpass("Enter your Qdrant URL: ")

In [6]:
QDRANT_API_KEY = os.environ.get('QDRANT_API_KEY') or getpass("Enter your Qdrant API Key: ")

# Querying

- 📊 Now that you've loaded your data and built an index, it's time to focus on the core of an LLM application: querying.

- 🤖 At its simplest, querying is a single prompt call to an LLM: asking a question, requesting a summary, or giving more complex instructions.

- 🔗 For more advanced uses, querying can include repeated or chained prompt calls to an LLM, or even a reasoning loop across multiple components.

Let's first instantiate the Qdrant vector store.

In [11]:
import qdrant_client
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext

# embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY, model_name="text-embedding-3-small")

# JC code
from llama_index.embeddings.cohere import CohereEmbedding
embed_model = CohereEmbedding(api_key=CO_API_KEY, model_name="embed-english-light-v3.0")

# initialize qdrant client
client = qdrant_client.QdrantClient(
    url=QDRANT_URL, 
    api_key=QDRANT_API_KEY,
)

vector_store = QdrantVectorStore(
    client=client, 
    collection_name="it_can_be_done",
    embed_model=embed_model,
)

# assign qdrant vector store to storage context
storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    )

# load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, 
    embed_model=embed_model,
    storage_context=storage_context
)

# 🧐 The `QueryEngine`

A `QueryEngine` is a higher-level construct that uses an `Index` (and, by extension, a `Retriever`) to answer queries.

It uses the `Retriever` to fetch the relevant data, then applies additional logic to turn that data into a response to the query.

Here's what happens under the hood:

- 📚 **Retrieval**: Find and return the most relevant documents from the `Index` using strategies like "top-k" semantic retrieval.

- 🔧 **Postprocessing**: Optionally rerank, transform, or filter retrieved Nodes, often based on specific metadata like keywords.

- 🔄 **Response Synthesis**: Combine the query, relevant data, and prompt to generate a response from your LLM.
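
The three stages above can be sketched in plain Python. This is a toy illustration, not LlamaIndex's implementation: the corpus, the 2-D "embeddings", and the function names `retrieve`, `postprocess`, and `synthesize` are all made up for this example, and `synthesize` only builds the prompt that would be sent to an LLM rather than calling one.

```python
import math

# Hypothetical corpus of (text, embedding) pairs; the vectors are invented.
CORPUS = [
    ("Poem about courage", [1.0, 0.0]),
    ("Poem about rest", [0.0, 1.0]),
    ("Essay on perseverance", [0.8, 0.6]),
]


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(query_vec, top_k=2):
    """Retrieval: score every document and keep the top_k most similar."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in CORPUS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def postprocess(nodes, cutoff=0.5):
    """Postprocessing: drop nodes whose score is below the cutoff."""
    return [(text, score) for text, score in nodes if score >= cutoff]


def synthesize(query, nodes):
    """Response synthesis: combine query and context into one prompt."""
    context = "\n".join(text for text, _ in nodes)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


nodes = postprocess(retrieve([1.0, 0.0], top_k=3), cutoff=0.5)
print(synthesize("What does the book say about courage?", nodes))
```

In a real query engine the final prompt is passed to the LLM, and the retriever works over the vector store rather than an in-memory list.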

Note that there are [a wide variety of Query Engines](https://github.com/run-llama/llama_index/tree/main/llama-index-core/llama_index/core/query_engine) available in LlamaIndex. We won't touch on all of them in this course, but I encourage you to explore what's available and think about how you might use them.


In [39]:
from llama_index.llms.cohere import Cohere

# llm = Cohere(model="command-r-plus")

# JC code
llm = Cohere(model="command-r-plus-08-2024", api_key=CO_API_KEY, max_tokens=512)

query_engine = index.as_query_engine(llm=llm, streaming=True)

response = query_engine.query(
    "What do the Abced Joshua Chen Daniel believe?"
)

response.print_response_stream()

I'm sorry, I couldn't find any information about the Abced Joshua Chen Daniel in the provided text.

In [40]:
response.source_nodes
response.response_txt

"I'm sorry, I couldn't find any information about the Abced Joshua Chen Daniel in the provided text."

In [41]:
response.source_nodes[0].get_text()

'Likely feeling pretty blue,\r\n  Being human, same as you,\r\n  But he was brave amid despair,\r\n  And Washington crossed the Delaware!\r\n\r\n  So when you\'re with trouble beset,\r\n  And your spirits are soaking wet,\r\n  When all the sky with clouds is black,\r\n  Don\'t lie down upon your back\r\n  And look at _them_. Just do the thing;\r\n  Though you are choked, still try to sing.\r\n  If times are dark, believe them fair,\r\n  And you will cross the Delaware!\r\n\r\n\r\n_Joseph Morris._\r\n\r\n\r\n\r\n\r\nRABBI BEN EZRA\r\n\r\n(SELECTED VERSES)\r\n\r\n\r\nTo some people success is everything, and the easier it is gained the\r\nbetter. To Browning success is nothing unless it is won by painful\r\neffort. What Browning values is struggle. Throes, rebuffs, even failure\r\nto achieve what we wish, are to be welcomed, for the effects of vigorous\r\nendeavor inweave themselves into our characters; moreover through\r\nstruggle we lift ourselves from the degradation into which the in

In [13]:
response.source_nodes[1].get_text()

'When God at first made Man,\r\n  Having a glass of blessings standing by;\r\n  Let us (said He) pour on him all we can:\r\n  Let the world\'s riches, which disperséd lie,\r\n    Contract into a span.\r\n\r\n    So strength first made a way;\r\n  Then beauty flow\'d, then wisdom, honor, pleasure\r\n  When almost all was out, God made a stay,\r\n  Perceiving that alone, of all His treasure,\r\n    Rest in the bottom lay.\r\n\r\n    For if I should (said He)\r\n  Bestow this jewel also on My creature,\r\n  He would adore My gifts instead of Me,\r\n  And rest in Nature, not the God of Nature.\r\n    So both should losers be.\r\n\r\n    Yet let him keep the rest,\r\n  But keep them with repining restlessness:\r\n  Let him be rich and weary, that at least,\r\n  If goodness lead him not, yet weariness\r\n    May toss him to My breast.\r\n\r\n\r\n_George Herbert._\r\n\r\n\r\n\r\n\r\nA PHILOSOPHER\r\n\r\n\r\n"The web of our life is of mingled yarn, good and ill together," says\r\nShakespeare. 

### Streaming response

In [15]:
response = query_engine.query(
    "What poems by Rudyard Kipling are in this book?"
)

response.print_response_stream()

The poems by Rudyard Kipling included in this book are "If" and "When Earth's Last Picture Is Painted."

### 💬 Chat Engine

In [19]:
chat_engine = index.as_chat_engine(llm=llm)

chat_engine.streaming_chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

Assistant: This book is a collection of poems and prose pieces that explore various themes and ideas. The title, "The Scrap Book," suggests that it is a compilation of diverse writings, perhaps collected by the author or editor over time.

The poems cover a range of topics, including life's struggles, perseverance, courage, and the pursuit of truth. For example, "The Fighter" is a poem about facing daily battles against fear and doubt, and how fighting keeps one's spirit strong. "The Game" encourages readers to play the game of life with fairness and integrity, valuing the process more than the outcome. "Courage" emphasizes the importance of maintaining a brave and dignified attitude in the face of life's challenges.

The prose pieces also offer insights and anecdotes. One piece titled "A Good Name" highlights the value of a good reputation and integrity, using a story about Robert E. Lee to illustrate the point. Another piece, "Swel

### Chat modes

#### Simple

Chat with the LLM directly, without making use of a knowledge base. To use this mode set `chat_mode="simple"`.

Corresponds to [`SimpleChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/simple.py). 

#### Condense question

Generate a standalone question from the conversation context and the last message. Then, ask the query engine for a response. To use this mode set `chat_mode="condense_question"`.

Corresponds to [`CondenseQuestionChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/condense_question.py).

#### Context 

Retrieve text from the index based on the user's message. Utilize this context to formulate a response. To use this mode set `chat_mode="context"`.

Corresponds to [`ContextChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/context.py).

#### Condense plus context

Condense the conversation and the latest user message into a standalone question. Then build a context for the standalone question from a retriever. Finally, pass the context, along with the prompt and user message, to the LLM to generate a response. To use this mode set `chat_mode="condense_plus_context"`.

Corresponds to [`CondensePlusContextChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/condense_plus_context.py).

#### ReAct

Use a ReAct agent loop with query engine tools. To use this mode set `chat_mode="react"`.

Corresponds to [`ReActAgent`](https://github.com/run-llama/llama_index/blob/37c95965426bddae82cec1ad49d3aa82e8bfe819/llama-index-core/llama_index/core/agent/react/base.py#L36).

#### Best

Select the best chat engine based on the current LLM. To use this mode set `chat_mode="best"`.

Corresponds to `OpenAIAgent` if you are using an OpenAI model that supports the function calling API; otherwise, corresponds to `ReActAgent`.
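
To make the difference between the modes more concrete, here is a minimal sketch of one "condense plus context" turn. It is not the `CondensePlusContextChatEngine` source: `condense_plus_context_turn`, `stub_llm`, and `stub_retrieve` are hypothetical names, the prompts are invented, and skipping the condense step on the first turn is a choice made in this sketch.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, message) pairs


def condense_plus_context_turn(
    history: History,
    user_message: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Run one chat turn: condense, retrieve context, then answer."""
    # 1. Condense history + latest message into a standalone question.
    if history:
        transcript = "\n".join(f"{role}: {msg}" for role, msg in history)
        standalone = llm(
            "Rewrite the follow-up as a standalone question.\n"
            f"{transcript}\nFollow-up: {user_message}\nStandalone question:"
        )
    else:
        standalone = user_message  # nothing to condense yet

    # 2. Build the context from the retriever using the standalone question.
    context = "\n".join(retrieve(standalone))

    # 3. Answer using the context together with the user message.
    answer = llm(f"Context:\n{context}\n\nUser: {user_message}\nAssistant:")

    # Record the turn so the next call can condense against it.
    history.append(("user", user_message))
    history.append(("assistant", answer))
    return answer


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; only reports the prompt length."""
    return f"<reply to {len(prompt)} chars>"


def stub_retrieve(question: str) -> List[str]:
    """Stand-in for a retriever; returns one fake passage."""
    return [f"passage matching '{question}'"]


history: History = []
print(condense_plus_context_turn(history, "Who wrote 'If'?", stub_llm, stub_retrieve))
print(condense_plus_context_turn(history, "What else did he write?", stub_llm, stub_retrieve))
```

Dropping step 2 gives something close to the "condense question" idea, and dropping step 1 gives something close to the "context" idea, which is one way to remember how the modes relate.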

In [20]:
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    llm=llm,
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about a book of poems called 'It Can Be Done'."
    ),
)

chat_engine.streaming_chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

Assistant: 'It Can Be Done' is a collection of poems that explores various themes related to human experiences, emotions, and the human spirit. The poems in this book seem to be written by different authors, each offering their unique perspectives and insights.

The title itself, 'It Can Be Done', suggests a theme of determination, perseverance, and the power of the human will. Many of the poems appear to be motivational and inspirational, encouraging readers to face challenges, overcome obstacles, and achieve their goals.

Some of the poems, like "The Fighter" and "Courage," focus on the struggles and battles we face in life, emphasizing the importance of resilience and a fighting spirit. Others, such as "To Youth After Pain," offer solace and hope in the face of grief and suffering, reminding readers that pain is a part of life and can lead to growth and empathy.

The book also seems to touch on themes of self-improvement, personal

# Customizing Querying

- 🔧 **Customizing Retrieval**: Use LlamaIndex's low-level composition API to adjust the retriever's `similarity_top_k` value for more granular control over how many nodes are retrieved.

- 📈 **Adding Post-Processing**: Implement a step to ensure only nodes meeting a minimum similarity score are included, balancing between data richness and relevance.

- 🎚️ **SimilarityPostprocessor**: Set a similarity score threshold so that only nodes scoring at or above it are kept. It is compatible only with embedding-based retrievers.
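
It may help to see how a top-k limit and a similarity cutoff interact. The snippet below is a toy sketch with invented scores, not the `SimilarityPostprocessor` code; `top_k_then_cutoff` is a hypothetical helper that applies the two filters in the order the query engine does (retrieve first, then postprocess).

```python
def top_k_then_cutoff(scores, top_k, cutoff):
    """Keep the top_k highest scores, then drop any score below cutoff."""
    top = sorted(scores, reverse=True)[:top_k]
    return [score for score in top if score >= cutoff]


# Hypothetical similarity scores for five retrieved nodes.
scores = [0.41, 0.35, 0.33, 0.30, 0.12]

print(top_k_then_cutoff(scores, top_k=10, cutoff=0.32))  # [0.41, 0.35, 0.33]
print(top_k_then_cutoff(scores, top_k=2, cutoff=0.32))   # [0.41, 0.35]
print(top_k_then_cutoff(scores, top_k=10, cutoff=0.50))  # []
```

The last call shows the trade-off: a cutoff that is too strict can leave the response synthesizer with no context at all, so the threshold is worth tuning against the scores your embedding model actually produces.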

In [25]:
from llama_index.core import get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# configure a retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

# configure a post processor
similarity_processor = SimilarityPostprocessor(similarity_cutoff=0.32)

# configure a response synthesizer
response_synthesizer = get_response_synthesizer(llm=llm)

# create a query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[similarity_processor],
)

In [26]:
query_engine.query("Compare the portrayal of internal versus external battles in the narratives and poems")

Response(response='The provided text contains a collection of poems and narratives that explore various themes, including the power of the human spirit, the importance of perseverance, and the value of inner strength. While some pieces focus on external battles, such as the poem "Battle Cry," which speaks of the courage and determination required in physical combat, many delve into the internal struggles and triumphs of the human experience.\n\nOne recurring theme is the idea that true strength lies within the individual, and that external circumstances should not define one\'s character. For instance, "The Quitter" encourages resilience in the face of adversity, emphasizing that it\'s not the end result but the act of fighting that matters. Similarly, "The Call of the Unbeaten" highlights the importance of "stick-to-it-iveness" and determination in overcoming obstacles.\n\nThe poem "Know Thyself" takes this idea further, suggesting that the greatest battles are fought within the self.

In [16]:
client.close()

# JC Takeaway

Re-implement the whole thing:

1. Load env variables from .env;
2. Build a RAG chatbot using Cohere LLM and the "it_can_be_done" collection that we stored in Qdrant vector database.


In [49]:
import os
import nest_asyncio
import dotenv
import getpass

import qdrant_client as qdrant
import llama_index.core as li
import llama_index.vector_stores.qdrant as vs_qdrant  
import llama_index.embeddings.cohere as emb_cohere
import llama_index.llms.cohere as llm_cohere

nest_asyncio.apply()
dotenv.load_dotenv()

CO_API_KEY = os.getenv('CO_API_KEY') or getpass.getpass("Enter your Cohere API key: ")
QDRANT_URL = os.getenv('QDRANT_URL') or getpass.getpass("Enter your Qdrant URL:")
QDRANT_API_KEY = os.getenv('QDRANT_API_KEY') or getpass.getpass("Enter your Qdrant API Key: ")

embed_model = emb_cohere.CohereEmbedding(
    api_key=CO_API_KEY, 
    model_name="embed-english-light-v3.0"
)
llm = llm_cohere.Cohere(
    model="command-r-plus-08-2024", 
    api_key=CO_API_KEY, 
    max_tokens=512
)
qdrant_client = qdrant.QdrantClient(
    api_key=QDRANT_API_KEY,
    url=QDRANT_URL, 
)
vector_store = vs_qdrant.QdrantVectorStore(
    client=qdrant_client, 
    collection_name="it_can_be_done",
    # no embed_model here; it is passed to the index below
)
# no explicit StorageContext: from_vector_store() builds one from the vector store
index = li.VectorStoreIndex.from_vector_store(
    vector_store=vector_store, 
    embed_model=embed_model
)

query_engine = index.as_query_engine(llm=llm, streaming=True)
response = query_engine.query(
    "What's the shortest poem in this book?"
)
response.print_response_stream()

# index.as_chat_engine(llm=llm).streaming_chat_repl()

The shortest poem in this collection is "Keep A-Goin'!"