# Basic RAG (Retrieval Augmented Generation)

In [1]:
# ! pip install faiss-cpu "mistralai>=1.0.0"

### Load API key

In [11]:
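import os
dlai_endpoint = "https://api.mistral.ai/v1"
# Read the key from the environment (MISTRAL_API_KEY is an assumed variable name) instead of hardcoding it
api_key = os.environ["MISTRAL_API_KEY"]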

### Get data

- You can go to https://www.deeplearning.ai/the-batch/
- Search for any article and copy its URL.

### Parse the article with BeautifulSoup 

In [4]:
import requests
from bs4 import BeautifulSoup
import re

response = requests.get(
    "https://www.deeplearning.ai/the-batch/a-roadmap-explores-how-ai-can-detect-and-mitigate-greenhouse-gases/"
)
html_doc = response.text
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find("div", re.compile("^prose--styled"))
text = tag.text
print(text)

How can AI help to fight climate change? A new report evaluates progress so far and explores options for the future.What’s new: The Innovation for Cool Earth Forum, a conference of climate researchers hosted by Japan, published a roadmap for the use of data science, computer vision, and AI-driven simulation to reduce greenhouse gas emissions. The roadmap evaluates existing approaches and suggests ways to scale them up.How it works: The roadmap identifies 6 “high-potential opportunities”: activities in which AI systems can make a significant difference based on the size of the opportunity, real-world results, and validated research. The authors emphasize the need for data, technical and scientific talent, computing power, funding, and leadership to take advantage of these opportunities.Monitoring emissions. AI systems analyze data from satellites, drones, and ground sensors to measure greenhouse gas emissions. The European Union uses them to measure methane emissions, environmental orga

### Optionally, save the text into a text file
- You can upload the text file into a chat interface in the next lesson.
- To download this file to your own machine, click on the "Jupyter" logo to view the file directory.  

In [5]:
file_name = "AI_greenhouse_gas.txt"
with open(file_name, 'w', encoding='utf-8') as file:
    file.write(text)

### Chunking

In [6]:
chunk_size = 512
chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

In [7]:
len(chunks)

8
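
The slicing above cuts the text at fixed character offsets, so chunks can split words or sentences and the last chunk may be shorter than `chunk_size`. A minimal sketch on a toy string shows the behavior; `chunk_text` is a hypothetical helper name, not part of the notebook.

```python
def chunk_text(text, chunk_size):
    """Split text into consecutive, non-overlapping character chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


toy_chunks = chunk_text("abcdefghij", 4)
print(toy_chunks)  # ['abcd', 'efgh', 'ij']
```

Joining the chunks back together reproduces the original text, which is a quick sanity check that nothing was dropped.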

### Get embeddings of the chunks

In [18]:
import numpy as np
from mistralai import Mistral

# Reuse the api_key loaded earlier and set the embedding model
model = "mistral-embed"

client = Mistral(api_key=api_key)

# Return the embedding vector for a single piece of text
def get_text_embedding(txt):
    response = client.embeddings.create(model=model, inputs=[txt])
    return response.data[0].embedding
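
`get_text_embedding` sends one request per chunk. Because `inputs` takes a list, several chunks could in principle share one request. Below is a minimal sketch of the batching logic only, assuming a caller-supplied `embed_batch` callable (hypothetical; in this notebook it might wrap `client.embeddings.create(model=model, inputs=batch)`). The stand-in used here makes no network call.

```python
def embed_in_batches(texts, embed_batch, batch_size=16):
    """Embed texts in fixed-size batches, preserving input order.

    embed_batch is a hypothetical callable: list[str] -> list[list[float]].
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch):
    # Stand-in for a real embedding call: one 2-d vector per text.
    return [[float(len(t)), float(t.count(" "))] for t in batch]


demo_vectors = embed_in_batches(["a b", "cd", "e f g"], fake_embed_batch, batch_size=2)
print(demo_vectors)  # [[3.0, 1.0], [2.0, 0.0], [5.0, 2.0]]
```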

In [19]:
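# Placeholder chunks for a quick test. Note: this overwrites the article chunks
# created in the Chunking step, so retrieval below searches only these two strings.
chunks = ["This is the first chunk.", "Second text chunk goes here."]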

text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])
print(text_embeddings)

[[ 0.00232506  0.02667236  0.05477905 ... -0.00394821 -0.00709915
  -0.00364113]
 [-0.01286316  0.04782104  0.04190063 ... -0.02243042  0.00362968
  -0.02378845]]


In [20]:
text_embeddings

array([[ 0.00232506,  0.02667236,  0.05477905, ..., -0.00394821,
        -0.00709915, -0.00364113],
       [-0.01286316,  0.04782104,  0.04190063, ..., -0.02243042,
         0.00362968, -0.02378845]])

In [21]:
len(text_embeddings[0])

1024

### Store in a vector database
- In this classroom, you'll use [Faiss](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/)

In [22]:
import faiss

d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

### Embed the user query

In [23]:
question = "What are the ways that AI can reduce emissions in Agriculture?"
question_embeddings = np.array([get_text_embedding(question)])

In [24]:
question_embeddings

array([[-0.00059366,  0.04180908,  0.04327393, ..., -0.02471924,
         0.01049805,  0.00936127]])

### Search for chunks that are similar to the query

In [25]:
D, I = index.search(question_embeddings, k=2)
print(I)

[[1 0]]
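
`IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances, with the closest vectors first. The sketch below reproduces that idea in plain Python on toy 2-d vectors; `l2_search` is a hypothetical helper for illustration and says nothing about how Faiss is implemented internally.

```python
def l2_search(query, vectors, k):
    """Return (distances, indices) of the k vectors closest to query.

    Distances are squared Euclidean, sorted in ascending order.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, idx))
    scored.sort()
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, indices = l2_search([0.5, 0.0], toy_vectors, k=2)
print(distances, indices)  # [0.25, 1.25] [0, 1]
```

The returned indices play the same role as `I` above: they are positions in the list of chunks, which is why the chunk list and the index must be built from the same data.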


In [26]:
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
print(retrieved_chunk)

['Second text chunk goes here.', 'This is the first chunk.']


In [27]:
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""
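
Interpolating `retrieved_chunk` directly places the Python list representation (brackets and quotes included) into the prompt. One possible alternative joins the chunks into plain text first. `build_prompt` is a hypothetical helper, and whether this changes answer quality is not something this notebook measures.

```python
def build_prompt(retrieved_chunks, question):
    """Format retrieved chunks and a question into a single RAG prompt."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:"
    )


demo_prompt = build_prompt(["First chunk. ", "Second chunk."], "What is in the chunks?")
print(demo_prompt)
```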

In [36]:
from mistralai import Mistral

model = "mistral-small-latest"

client = Mistral(api_key=api_key)

def mistral(user_message, model=model, is_json=False):
    messages = [{"role": "user", "content": user_message}]
    # Request JSON mode only when asked; the v1 client expects a dict here
    kwargs = {"response_format": {"type": "json_object"}} if is_json else {}
    chat_response = client.chat.complete(model=model, messages=messages, **kwargs)
    return chat_response.choices[0].message.content


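# Example usage. Note: this reassigns `prompt`, replacing the RAG prompt built above,
# so the answer below does not use the retrieved context.
prompt = "What is the capital of France?"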
response = mistral(prompt)
print(response)


The capital of France is **Paris**. Paris is known for its rich history, iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral, as well as its cultural and artistic contributions. It is also a major global center for art, fashion, gastronomy, and culture.


In [37]:
response = mistral(prompt)
print(response)

The capital of France is **Paris**. Paris is known for its rich history, iconic landmarks such as the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and its cultural influence on art, fashion, and cuisine.


## RAG + Function calling

In [38]:
def qa_with_context(text, question, chunk_size=512):
    # split document into chunks
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    # load into a vector database
    text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])
    d = text_embeddings.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(text_embeddings)
    # create embeddings for a question
    question_embeddings = np.array([get_text_embedding(question)])
    # retrieve similar chunks from the vector database
    D, I = index.search(question_embeddings, k=2)
    retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
    # generate a response based on the retrieved text chunks

    prompt = f"""
    Context information is below.
    ---------------------
    {retrieved_chunk}
    ---------------------
    Given the context information and not prior knowledge, answer the query.
    Query: {question}
    Answer:
    """
    response = mistral(prompt)
    return response

In [39]:
I.tolist()

[[1, 0]]

In [40]:
I.tolist()[0]

[1, 0]

In [41]:
import functools

names_to_functions = {"qa_with_context": functools.partial(qa_with_context, text=text)}

In [49]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "qa_with_context",
            "description": "Answer user question by retrieving relevant context",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "user question",
                    }
                },
                "required": ["question"],
            },
        },
    },
]
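
The property names in the tool schema should match parameters that the Python function actually accepts, otherwise calling it with the model's arguments fails with a `TypeError`. A minimal sketch of such a check follows, using a stand-in function with the same parameters as `qa_with_context`; `missing_parameters`, `qa_stub`, and `demo_tool` are hypothetical names.

```python
import inspect


def missing_parameters(tool, func):
    """Return schema property names that func does not accept as parameters."""
    schema_names = tool["function"]["parameters"]["properties"].keys()
    accepted = inspect.signature(func).parameters
    return sorted(name for name in schema_names if name not in accepted)


def qa_stub(text, question, chunk_size=512):
    # Stand-in with the same parameters as qa_with_context; does no retrieval.
    return f"{question} ({len(text)} chars of context)"


demo_tool = {
    "type": "function",
    "function": {
        "name": "qa_with_context",
        "parameters": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    },
}

print(missing_parameters(demo_tool, qa_stub))  # []
```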

In [45]:
from mistralai import Mistral

model = "mistral-large-latest"

client = Mistral(api_key=api_key)

question = "What are the ways AI can mitigate climate change in transportation?"

messages = [{"role": "user", "content": question}]

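# Note: `tools` is not passed in this request, so the model answers from its own
# knowledge and returns no tool calls. Passing tools=tools and tool_choice="any"
# would ask the model to call qa_with_context instead.
response = client.chat.complete(
    model=model,
    messages=messages,
)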

print(response.choices[0].message.content)


AI can significantly contribute to mitigating climate change in the transportation sector through various ways. Here are some key areas:

1. **Efficient Traffic Management:**
   - **Real-Time Traffic Prediction:** AI can analyze historical and real-time traffic data to predict congestion and provide alternative routes, reducing idle time and fuel consumption.
   - **Smart Traffic Signals:** AI can optimize traffic signal timing, reducing wait times and decreasing emissions from idling vehicles.

2. **Route Optimization:**
   - **Freight and Logistics:** AI can optimize routes for cargo transportation, reducing the distance traveled and fuel used.
   - **Public Transportation:** AI can improve bus routing and scheduling, making public transportation more efficient and attractive to users.

3. **Predictive Maintenance:**
   - AI can predict when vehicle maintenance is required, preventing breakdowns and extending vehicle lifespan. This can reduce the environmental impact of maintenance a

In [50]:
tool_calls = response.choices[0].message.tool_calls

if tool_calls:
    tool_function = tool_calls[0].function
    print(tool_function)
else:
    print("No tool calls found in the response.")


No tool calls found in the response.


In [None]:
tool_function.name

NameError: name 'tool_function' is not defined

In [None]:
import json

args = json.loads(tool_function.arguments)
args

In [None]:
function_result = names_to_functions[tool_function.name](**args)
function_result
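
The request above was sent without `tools`, so the model answered directly, `tool_calls` was empty, and `tool_function` was never defined. A guarded dispatch avoids the `NameError` in that case. The sketch below uses `SimpleNamespace` stand-ins instead of a real API response and assumes, as the cells above do, that the arguments arrive as a JSON string; `dispatch_tool_call` and `fake_qa` are hypothetical names.

```python
import functools
import json
from types import SimpleNamespace


def dispatch_tool_call(tool_calls, registry):
    """Run the first requested tool, or return None if no tool was requested."""
    if not tool_calls:
        return None
    call = tool_calls[0].function
    if call.name not in registry:
        raise KeyError(f"unknown tool: {call.name}")
    args = json.loads(call.arguments)  # assumed to be a JSON string
    return registry[call.name](**args)


def fake_qa(text, question):
    # Stand-in for qa_with_context: no embeddings, no network calls.
    return f"Q: {question} | context length: {len(text)}"


registry = {"qa_with_context": functools.partial(fake_qa, text="some article text")}
fake_call = SimpleNamespace(
    function=SimpleNamespace(
        name="qa_with_context",
        arguments=json.dumps({"question": "How can AI cut emissions?"}),
    )
)

print(dispatch_tool_call([fake_call], registry))
print(dispatch_tool_call(None, registry))  # None
```

With a real response, `response.choices[0].message.tool_calls` and `names_to_functions` would take the place of the stand-ins.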

## More about RAG
To learn about more advanced chunking and retrieval methods, you can check out:
- [Advanced Retrieval for AI with Chroma](https://learn.deeplearning.ai/courses/advanced-retrieval-for-ai/lesson/1/introduction)
  - Sentence window retrieval
  - Auto-merge retrieval
- [Building and Evaluating Advanced RAG Applications](https://learn.deeplearning.ai/courses/building-evaluating-advanced-rag)
  - Query Expansion
  - Cross-encoder reranking
  - Training and utilizing Embedding Adapters
