In [1]:
%%capture 
# !pip install llama-index==0.10.37 llama-index-llms-cohere==0.2.0 
!pip install -U llama-index llama-index-llms-cohere cohere


In [2]:
import os

from getpass import getpass
import nest_asyncio

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv()

True

In [3]:
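# .get() returns None when the variable is unset, so the getpass fallback can run
# (indexing os.environ directly would raise KeyError instead).
CO_API_KEY = os.environ.get("CO_API_KEY") or getpass("Enter your Cohere API key: ")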

In [4]:
# Confirm the key is loaded without printing the secret itself.
print("CO_API_KEY loaded:", bool(CO_API_KEY))

CO_API_KEY loaded: True


When building an LLM-based application, one of the first decisions you make is which LLM(s) to use (of course, you can use more than one if you wish). 

The LLM will be used at various stages of your pipeline, including:

- During indexing:
  - 👩🏽‍⚖️ Judging data relevance (whether to index it or not).
  - 📖 Summarizing data and indexing those summaries.

- During querying:
  - 🔎 Retrieval: Fetching data from your index, choosing the best data source from several options, or even using tools to fetch data.
  
  - 💡 Response Synthesis: Turning the retrieved data into an answer, merging answers, or converting data (like text to JSON).

LlamaIndex gives you a single interface to various LLMs. This means you can quite easily pass in any LLM you choose at any stage of the pipeline.

In this course we'll primarily use OpenAI, although the examples in this notebook use Cohere. You can see a full list of LLM integrations [here](https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules.html) and use your LLM provider of choice. 

# Basic Usage

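You can call `complete` with a prompt. The cells below do the equivalent with the Cohere SDK directly, sending the prompt as a single user message to `co.chat`.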

In [15]:
import cohere

# Reuse the key loaded earlier instead of hardcoding the secret in the notebook.
co = cohere.ClientV2(api_key=CO_API_KEY)

res = co.chat(
    model="command-a-03-2025",
    messages=[
        {
            "role": "user",
            "content": "Alexander the Great was a",
        }
    ],
)

print(res.message.content[0].text)

Alexander the Great, also known as Alexander III of Macedon, was a king of the ancient Greek kingdom of Macedon. He is one of the most famous and successful military commanders in history, known for his unprecedented campaign of conquests that stretched from Greece to northwestern India. Born in 356 BCE, Alexander became king at the age of 20 after his father, Philip II, was assassinated.

Under Alexander's leadership, he:

1. **Conquered the Persian Empire**: He defeated the mighty Persian Empire, led by Darius III, in a series of decisive battles, including the Battle of Issus (333 BCE) and the Battle of Gaugamela (331 BCE).

2. **Spread Hellenistic Culture**: His conquests led to the spread of Greek culture, language, and ideas across a vast area, known as the Hellenistic period.

3. **Founded Numerous Cities**: Alexander founded over 20 cities, the most famous being Alexandria in Egypt, which became a major center of learning and culture.

4. **Reached the Indus Valley**: His campa

In [16]:
import cohere

co = cohere.ClientV2(api_key=CO_API_KEY)

res = co.chat(
    model="command-a-03-2025",
    messages=[
        {
            "role": "user",
            "content": "Write a title for a blog post about API design. Only output the title text.",
        }
    ],
)

print(res.message.content[0].text)
# "The Ultimate Guide to API Design: Best Practices for Building Robust and Scalable APIs"


"Mastering API Design: Best Practices for Scalability, Security, and Developer Experience"


In [None]:
from llama_index.llms.cohere import Cohere

llm = Cohere(model="command-r-plus", temperature=0.2)

# `predict` expects a prompt template, not a plain string. Passing a str raises
# a pydantic ValidationError for LLMPredictStartEvent ("Input should be a valid
# dictionary or instance of BasePromptTemplate"). For a raw prompt string,
# use `complete` instead.
response = llm.complete("Alexander the Great was a")

print(response)

# Prompt templates

- ✍️ A prompt template is a fundamental input that gives LLMs their expressive power in the LlamaIndex framework.

- 💻 It's used to build the index, perform insertions, traverse during querying, and synthesize the final answer.

- 🦙 LlamaIndex has several built-in prompt templates.

- 🛠️ Below is how you can create one from scratch.
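
To make the idea concrete, here is a minimal sketch of what a prompt template does, written in plain Python. `SimplePromptTemplate` is a hypothetical stand-in invented for illustration, not LlamaIndex's `PromptTemplate`; it only shows the two core jobs of a template: knowing which variables it needs and filling them in.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template (not LlamaIndex's class)."""

    def __init__(self, template: str) -> None:
        self.template = template
        # Collect the placeholder names, e.g. {"thing", "style"}.
        self.variables = {
            name for _, name, _, _ in Formatter().parse(template) if name
        }

    def format(self, **kwargs: str) -> str:
        # Fail early with a clear message if a placeholder was not supplied.
        missing = self.variables - set(kwargs)
        if missing:
            raise KeyError(f"Missing template variables: {sorted(missing)}")
        return self.template.format(**kwargs)


template = SimplePromptTemplate("Write a song about {thing} in the style of {style}.")
prompt = template.format(thing="a broken xylophone", style="parody rap")
print(prompt)
```

Checking for missing variables up front is a design choice in this sketch: it surfaces a readable error before any request is sent to the model.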


In [None]:
import cohere
from llama_index.core import PromptTemplate

# Wrap the template string in a PromptTemplate so the import is actually used.
template = PromptTemplate("Write a song about {thing} in the style of {style}.")

# format() fills in the placeholders and returns a plain string.
prompt = template.format(thing="a broken xylophone", style="parody rap")

co = cohere.ClientV2(api_key=CO_API_KEY)

res = co.chat(
    model="command-a-03-2025",
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
)

print(res.message.content[0].text)

**"Broken Xylophone Blues"**  
*(Parody Rap in the style of Eminem’s "Lose Yourself")*  

*[Beat drops, xylophone clangs awkwardly in the background]*  

**Verse 1:**  
Yo, it’s the story of a xylophone, once the life of the party,  
Now it’s sittin’ in the corner, soundin’ like a fartin’ Harley.  
Bar 1’s cracked, bar 3’s missin’, bar 5’s got a chip,  
Tryna play a melody, but it’s soundin’ like a slip.  
Used to be the star of the orchestra, now it’s just a joke,  
Every time I hit it, it’s like, “Yo, this thing’s broke!”  
Teacher’s givin’ me the side-eye, like, “What’s your excuse?”  
I’m like, “Chill, it’s not my fault, it’s got the broken xylophone blues!”  

**Chorus:**  
Broken xylophone, can’t hit the right note,  
Soundin’ like a dying goose, man, this thing’s a joke.  
Broken xylophone, sittin’ in the band room,  
Used to be a legend, now it’s just a bum tune.  

**Verse 2:**  
Tried to fix it with some tape, bu

# 💭 Chat Messages

In [5]:
from llama_index.core.llms import ChatMessage
from llama_index.llms.cohere import Cohere

llm = Cohere(model="command-a-03-2025")

messages = [
    ChatMessage(role="system", content="You're a travel advisor who can suggest places to visit in India in winter."),
    ChatMessage(role="user", content="Hey, travel advisor, can you help me plan a trip to India?"),
]

response = llm.chat(messages)

print(response)

assistant: Absolutely! India is a diverse and vibrant country with a plethora of destinations that are perfect for a winter trip. Whether you're looking for snowy mountains, sunny beaches, cultural experiences, or historical sites, India has something for everyone. Here are some fantastic winter destinations to consider:

### **1. Kashmir – The Paradise on Earth**
- **Why Visit:** Known as "Paradise on Earth," Kashmir is breathtaking in winter with snow-covered landscapes, frozen Dal Lake, and opportunities for skiing in Gulmarg.
- **Highlights:** Dal Lake, Gulmarg, Pahalgam, Sonamarg, and local Kashmiri cuisine.
- **Best Time:** December to February.

### **2. Manali and Shimla – Himalayan Getaways**
- **Why Visit:** These hill stations in Himachal Pradesh offer snow-clad mountains, adventure activities, and cozy vibes.
- **Highlights:** Rohtang Pass, Solang Valley (Manali), Mall Road, Kufri (Shimla).
- **Best Time:** December to February.

### **3. Goa – Beach Paradise**
- **Wh

# Chat Prompt Templates 

In [26]:
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core import ChatPromptTemplate
import cohere

chat_template = [
    ChatMessage(role=MessageRole.SYSTEM, content="You always answer questions with as much detail as possible."),
    ChatMessage(role=MessageRole.USER, content="{question}"),
]

chat_prompt = ChatPromptTemplate(chat_template)

co = cohere.ClientV2(api_key=CO_API_KEY)

# format() flattens the whole template (system and user messages) into one
# string, which is sent here as a single user message.
res = co.chat(
    model="command-a-03-2025",
    messages=[
        {
            "role": "user",
            "content": chat_prompt.format(question="How far did Alexander the Great go in his conquests?"),
        }
    ],
)

print(res.message.content[0].text)

Alexander the Great, one of history's most renowned military commanders and conquerors, embarked on an extraordinary campaign of conquest that spanned over a decade and covered an immense geographical area. His empire stretched from Greece in the west to northwestern India in the east, encompassing a vast array of territories and cultures. Here’s a detailed breakdown of the extent of his conquests:

### **1. Early Campaigns (336–334 BCE):**
- **Greece and the Balkans:** After consolidating power in Macedonia following his father Philip II's assassination, Alexander secured his position in Greece by quelling rebellions in Thebes and Athens.
- **Asia Minor (Modern Turkey):** Alexander crossed the Hellespont (Dardanelles) in 334 BCE, defeating the Persian forces at the **Battle of the Granicus River**. He then liberated Greek cities along the coast, including Sardis, Ephesus, and Miletus.

### **2. Conquest of the Persian Empire (333–330 BCE):**
- **Syria and Egypt:** Alexander defe
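
The cell above sends the formatted template as one user message. An alternative is to keep the system and user roles separate and pass one entry per message. The sketch below illustrates that idea with plain Python only: `Message` and `format_messages` are hypothetical stand-ins written for this example (they are not LlamaIndex or Cohere APIs), and the resulting list has the same role/content dict shape that the `messages=` argument takes in the cells above.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (role + content)."""

    role: str
    content: str


def format_messages(template: list[Message], **kwargs: str) -> list[dict[str, str]]:
    """Fill each message's placeholders and return role/content dicts."""
    return [
        {"role": m.role, "content": m.content.format(**kwargs)}
        for m in template
    ]


chat_template = [
    Message(role="system", content="You always answer questions with as much detail as possible."),
    Message(role="user", content="{question}"),
]

messages = format_messages(
    chat_template,
    question="How far did Alexander the Great go in his conquests?",
)
print(messages)
```

Keeping the roles separate lets the model treat the system instruction as an instruction rather than as part of the user's question.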

# Streaming Output

In [43]:
# In Cohere, "chat with streaming" means calling the chat endpoint in streaming mode
# (co.chat_stream() instead of co.chat()). The model sends partial outputs (tokens or
# text chunks) to your application in real time as it generates them, instead of
# waiting for the full response to be ready.

In [5]:
import cohere

co = cohere.ClientV2(api_key=CO_API_KEY)

response = co.chat_stream(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Explain how transformers work in simple terms"}]
)


In [6]:
response 

<generator object V2Client.chat_stream at 0x11d5e4c80>

In [7]:
for event in response:
    # print(event)
    if event.type == "content-delta":
        print(event.delta.message.content.text, end="")

Sure! Transformers are a type of neural network architecture that has revolutionized natural language processing (NLP) and other areas of machine learning. Here’s a simple breakdown of how they work:

### 1. **Input as Words (Tokens)**
   - Transformers take text as input, but instead of processing it as raw characters, they break it down into smaller units called **tokens** (usually words or subwords).
   - Each token is converted into a numerical representation called an **embedding**, which captures its meaning in a way the computer can understand.

### 2. **Attention Mechanism**
   - The key innovation of transformers is the **attention mechanism**. Instead of processing words one by one, the model looks at all words in a sentence simultaneously and figures out which words are most important for understanding each other.
   - For example, in the sentence "The cat sat on the mat," the word "cat" might pay more attention to "sat" and "mat" because they are closely related.
   - Thi

In [23]:
import cohere

messages = [
    {"role": "system", "content": "You're a great historian bot."},
    {"role": "user", "content": "When did Alexander the Great arrive in China?"},
]

co = cohere.ClientV2(api_key=CO_API_KEY)

# chat_stream() returns a generator; nothing is printed until it is iterated.
response = co.chat_stream(
    model="command-a-03-2025",
    messages=messages,
)

In [24]:
response

<generator object V2Client.chat_stream at 0x165ca6030>

In [25]:
for event in response:
    # print(event)
    if event.type == "content-delta":
        print(event.delta.message.content.text, end="")

Alexander the Great never actually arrived in China. His empire, at its peak, stretched from Greece in the west to India in the east, but it did not extend as far as China. Alexander's easternmost campaigns took him to the Indus River valley in what is now Pakistan and parts of northwestern India. He died in 323 BCE in Babylon (modern-day Iraq), and his empire was subsequently divided among his generals, known as the Diadochi.

China, during Alexander's time, was in the Warring States period (475–221 BCE), and there is no historical record of any direct contact between Alexander's forces and the Chinese states. The first recorded contact between the Greco-Roman world and China came much later, during the Han Dynasty (206 BCE–220 CE), when the Romans and the Chinese established indirect trade and diplomatic relations along the Silk Road.
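
The loops above print each chunk as it arrives but do not keep the text. If you also want the full response afterwards, you can accumulate the chunks while printing them. The sketch below runs offline: `fake_chat_stream` is a hypothetical stand-in that mimics the event shape used in the cells above (`event.type` and `event.delta.message.content.text`), so no API call is made.

```python
from types import SimpleNamespace


def fake_chat_stream(chunks):
    """Hypothetical stand-in for co.chat_stream(): yields events lazily."""
    yield SimpleNamespace(type="message-start")
    for text in chunks:
        content = SimpleNamespace(text=text)
        message = SimpleNamespace(content=content)
        yield SimpleNamespace(type="content-delta", delta=SimpleNamespace(message=message))
    yield SimpleNamespace(type="message-end")


def collect_text(events) -> str:
    """Print each text chunk as it arrives and return the full response."""
    parts = []
    for event in events:
        # Only content-delta events carry text; other event types are skipped.
        if event.type == "content-delta":
            chunk = event.delta.message.content.text
            print(chunk, end="", flush=True)
            parts.append(chunk)
    return "".join(parts)


stream = fake_chat_stream(["Transformers ", "use ", "attention."])
full_text = collect_text(stream)
leftover = list(stream)  # a generator can only be consumed once, so this is empty
```

Because the stream is a generator, iterating it a second time yields nothing; keep the accumulated string if you need the text again.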

# 💬 Chat Engine


In [27]:
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.llms.cohere import Cohere
llm = Cohere(model="command-a-03-2025")

chat_engine = SimpleChatEngine.from_defaults(llm=llm)

chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

Assistant: Learning AI (Artificial Intelligence) is an exciting and rewarding journey! Here’s a step-by-step guide to help you get started:

### 1. **Understand the Basics**
   - **What is AI?** Familiarize yourself with the core concepts of AI, including machine learning, deep learning, neural networks, and natural language processing.
   - **Mathematics Foundation:** Brush up on essential math topics like linear algebra, calculus, probability, and statistics. These are crucial for understanding AI algorithms.
   - **Programming Skills:** Learn a programming language commonly used in AI, such as Python. Python is highly recommended due to its simplicity and extensive libraries like TensorFlow, PyTorch, and scikit-learn.

### 2. **Online Courses and Resources**
   - **Coursera, edX, Udacity:** Platforms like these offer courses from top universities and companies. Look for courses like:
     - *Machine Learning* by Andrew Ng (Cours
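
Conceptually, a chat engine pairs an LLM with conversation history so that each new turn is answered in the context of earlier ones. The sketch below illustrates that idea only; `TinyChatEngine` and `fake_llm` are hypothetical names made up for this example and say nothing about how `SimpleChatEngine` is implemented internally.

```python
from typing import Callable

Message = dict[str, str]


class TinyChatEngine:
    """Hypothetical sketch of a chat engine: an LLM callable plus history."""

    def __init__(self, llm: Callable[[list[Message]], str]) -> None:
        self.llm = llm
        self.history: list[Message] = []

    def chat(self, user_text: str) -> str:
        # Record the user turn, then let the model see the whole conversation.
        self.history.append({"role": "user", "content": user_text})
        reply = self.llm(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply


def fake_llm(messages: list[Message]) -> str:
    """Stand-in model: reports how many user turns it has seen."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"I have seen {user_turns} user message(s)."


engine = TinyChatEngine(fake_llm)
first = engine.chat("How do I learn AI?")
second = engine.chat("Where should I start?")
print(first)
print(second)
```

Swapping `fake_llm` for a function that calls a real model would turn this into a working multi-turn chat, since the full history is passed on every call.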