<a href="https://colab.research.google.com/github/ManjunathAdi/LLMs/blob/main/LLM_RAG_ReAct_CoT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
#!pip install faiss-cpu transformers sentence-transformers

In [6]:
import faiss
from sentence_transformers import SentenceTransformer
import numpy as np
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch


# Model Import
We will use "microsoft/Phi-3-mini-4k-instruct" from Hugging Face for the generation task.

In [5]:

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")



In [18]:
# Function to generate text using the model
def generate_text(query, context=""):
    """Generate a completion for a plain Query/Context prompt."""
    # Concatenate query and context into a single prompt
    input_text = f"Query: {query}\nContext: {context}"
    inputs = tokenizer.encode(input_text, return_tensors="pt")
    # Move inputs to the same device as the model
    inputs = inputs.to(model.device)

    # max_length counts prompt tokens plus generated tokens
    outputs = model.generate(inputs, max_length=960, num_return_sequences=1)
    # The decoded string starts with the prompt, followed by the continuation
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
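
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated tokens, which is why the decoded text repeats the prompt. The sketch below separates the two parts and shows how much room a `max_length` cap leaves for new tokens. It uses plain Python lists as stand-ins for token IDs, and the helper names `split_generation` and `new_token_budget` are hypothetical, not part of transformers.

```python
def split_generation(output_ids, prompt_len):
    """Split a generated sequence into (prompt_ids, new_ids)."""
    return output_ids[:prompt_len], output_ids[prompt_len:]


def new_token_budget(prompt_len, max_length=960):
    """Tokens left for generation when max_length covers prompt + output."""
    return max(0, max_length - prompt_len)


# Toy token IDs standing in for real tokenizer output (assumption)
prompt_ids = [101, 7, 8, 9]
output_ids = prompt_ids + [42, 43, 44]

_, new_ids = split_generation(output_ids, len(prompt_ids))
print(new_ids)                             # only the continuation
print(new_token_budget(len(prompt_ids)))   # room left under max_length=960
```

With real tensors the equivalent slice would be `outputs[0][inputs.shape[1]:]`, and `max_new_tokens` is the `generate` argument that bounds only the continuation.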


# Retriever Using FAISS
We will create a FAISS retriever that uses embeddings from a sentence transformer to retrieve the top-k documents.

In [7]:
# Load a sentence-transformer model for embedding
embedder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Sample documents corpus
documents = [
    "Generative AI enables machines to generate human-like text and creative content.",
    "Reinforcement learning involves agents learning by interacting with environments to maximize rewards.",
    "Supervised learning involves training a model on labeled data, where the outcome is already known.",
    "In reinforcement learning, agents receive feedback from the environment in the form of rewards or penalties.",
    "GPT models are large-scale generative models that excel in generating human-like text."
]

# Embed the documents using sentence-transformers
doc_embeddings = embedder.encode(documents)

# Create FAISS index for document retrieval
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(doc_embeddings)

# Function to retrieve top-k documents
def retrieve_docs(query, k=2):
    query_embedding = embedder.encode([query])
    _, indices = index.search(query_embedding, k)
    return [documents[i] for i in indices[0]]
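
`faiss.IndexFlatL2` performs an exact, exhaustive search and reports squared Euclidean distances. The pure-Python sketch below does the same computation on toy 2-D vectors, to make explicit what `index.search` returns. The function names are hypothetical, and FAISS itself is vectorized and far faster.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k=2):
    """Return (distances, indices) of the k nearest vectors, closest first."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-D "embeddings"; the real ones in this notebook are 384-dimensional
vectors = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]]
distances, indices = flat_l2_search(vectors, [2.0, 0.0], k=2)
print(indices, distances)  # [1, 0] [1.0, 4.0]
```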



In [8]:
doc_embeddings.shape

(5, 384)
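
The index ranks by L2 distance, so it helps to see how that relates to cosine similarity. For unit-length vectors, squared L2 distance equals `2 - 2 * cosine`, so both metrics give the same ranking. The sketch below checks this identity on toy vectors. It assumes normalized embeddings, which has not been verified for this model here; passing `normalize_embeddings=True` to `embedder.encode` is one way to enforce it.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([3.0, 4.0])
b = normalize([1.0, 0.0])

lhs = squared_l2(a, b)       # squared L2 distance between unit vectors
rhs = 2 - 2 * dot(a, b)      # 2 - 2 * cosine similarity
print(math.isclose(lhs, rhs))
```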

# Implementing RAG
RAG retrieves relevant documents and generates an answer using the combined query and retrieved context.

In [11]:
# RAG function that retrieves documents and generates text
def rag_generate(query):
    # Step 1: Retrieve top-k documents
    retrieved_docs = retrieve_docs(query, k=2)
    context = " ".join(retrieved_docs)

    # Step 2: Generate response using microsoft/Phi-3-mini-4k-instruct
    generated_output = generate_text(query, context)
    return generated_output

# Example query
query = "Explain how reinforcement learning works."

# Generate output using RAG
output = rag_generate(query)
print("Final Output:", output)




Final Output: Query: Explain how reinforcement learning works.
Context: Reinforcement learning involves agents learning by interacting with environments to maximize rewards. In reinforcement learning, agents receive feedback from the environment in the form of rewards or penalties. The goal of the agent is to learn a policy that maximizes the cumulative reward over time.


## Response:Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent takes actions in the environment and receives feedback in the form of rewards or penalties. The goal of the agent is to learn a policy that maximizes the cumulative reward over time.
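
In the output above, the answer follows a `## Response:` marker that the model produced itself, after the echoed query and context. The sketch below keeps only the text after that marker. The helper name `extract_response` is hypothetical, and the marker is an observation from this one run rather than a guaranteed format, so the function returns the whole string when the marker is absent.

```python
def extract_response(generated, marker="## Response:"):
    """Return the text after the last marker, or the whole string if absent."""
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()


sample = (
    "Query: Explain how reinforcement learning works.\n"
    "Context: Reinforcement learning involves agents learning by "
    "interacting with environments to maximize rewards.\n\n"
    "## Response:Reinforcement learning is a type of machine learning."
)
print(extract_response(sample))
```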



# Implementing ReAct + RAG
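ReAct interleaves reasoning with actions. Here the only action is retrieval: at each step the function retrieves documents for the current query, generates a reasoning step from the query and the retrieved context, and uses that output as the query for the next step.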

### Input-Output Example for ReAct + RAG:

* Query: "What is the role of rewards in reinforcement learning?"
* Step 1:
  * Retrieved Documents:
    * "Reinforcement learning involves agents learning by interacting with environments to maximize rewards."
    * "In reinforcement learning, agents receive feedback from the environment in the form of rewards or penalties."
  * Reasoning: "In reinforcement learning, rewards help agents learn by providing positive feedback for good actions and penalties for bad actions."
* Step 2:
  * Updated Query: "In reinforcement learning, rewards help agents learn by providing positive feedback for good actions and penalties for bad actions."
  * Reasoning: "Rewards allow the agent to adjust its behavior over time, helping it maximize the total reward received over multiple interactions."

In [14]:
# ReAct + RAG function that alternates reasoning and action (retrieval)
def react_rag(query, max_steps=2):
    reasoning_steps = []
    for step in range(max_steps):
        # Step 1: Retrieve relevant documents
        retrieved_docs = retrieve_docs(query, k=2)
        context = " ".join(retrieved_docs)

        # Step 2: Generate reasoning based on query and context
        reasoning = generate_text(query, context)
        reasoning_steps.append(reasoning)

        # Update the query with the current reasoning for the next step
        query = reasoning
    return reasoning_steps

# Example query for ReAct + RAG
query_react = "What is the role of rewards in reinforcement learning?"

# Output for ReAct + RAG
reasoning_steps = react_rag(query_react)
for i, step in enumerate(reasoning_steps):
    print(f"Step {i + 1}: {step}")


Step 1: Query: What is the role of rewards in reinforcement learning?
Context: In reinforcement learning, agents receive feedback from the environment in the form of rewards or penalties. Reinforcement learning involves agents learning by interacting with environments to maximize rewards. The agent's goal is to learn a policy that maps states to actions in a way that maximizes the cumulative reward over time. Rewards are used to guide the learning process and help the agent make better decisions.


## Response:The role of rewards in reinforcement learning is to provide feedback to the agent about the quality of its actions. Rewards are used to guide the learning process and help the agent make better decisions. The agent's goal is to learn a policy that maps states to actions in a way that maximizes the cumulative reward over time.

In reinforcement learning, the agent receives feedback from the environment in the form of rewards or penalties. The agent receives a positive reward when 
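
The function above runs a fixed number of retrieve-then-generate steps. In the ReAct pattern as commonly described, the model instead emits a thought and an action at each step, the program runs the action, and the resulting observation is appended to the prompt until the model chooses to finish. The sketch below shows that control flow. The `Search[...]` / `Finish[...]` action format, the function names, and the scripted stand-in model are all assumptions for illustration, not the notebook's implementation.

```python
import re

# Matches lines such as "Action: Search[some query]" or "Action: Finish[answer]"
ACTION_RE = re.compile(r"Action:\s*(Search|Finish)\[(.*?)\]", re.DOTALL)


def react_loop(question, llm, search, max_steps=3):
    """Run a Thought/Action/Observation loop until Finish or max_steps."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            break  # the model did not follow the expected format
        action, argument = match.groups()
        if action == "Finish":
            return argument, transcript
        # Action is Search: run retrieval and record the observation
        observation = " ".join(search(argument))
        transcript += f"Observation: {observation}\n"
    return None, transcript


# Hypothetical stand-ins so the sketch runs without a model or an index
def scripted_llm(transcript):
    if "Observation:" not in transcript:
        return ("Thought: I should look up rewards.\n"
                "Action: Search[rewards in reinforcement learning]")
    return ("Thought: The documents answer this.\n"
            "Action: Finish[Rewards are the feedback signal the agent maximizes.]")


def toy_search(query):
    return ["In reinforcement learning, agents receive feedback from the "
            "environment in the form of rewards or penalties."]


answer, transcript = react_loop(
    "What is the role of rewards in reinforcement learning?",
    scripted_llm, toy_search)
print(answer)
```

In this notebook, `search` could be `retrieve_docs` and `llm` a wrapper around `generate_text`, provided the prompt instructs the model to use the action format.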

# Implementing CoT + RAG
In CoT + RAG, the reasoning is spread over several steps: each step retrieves documents for the current query and reasons over the query plus the retrieved context, and the resulting reasoning becomes the query for the next step.

### Input-Output Example for CoT + RAG:
* Query: "Explain the difference between supervised and reinforcement learning."
* Step 1:
  * Retrieved Documents:
    * "Supervised learning involves training a model on labeled data, where the outcome is already known."
    * "Reinforcement learning involves agents learning by interacting with environments to maximize rewards."
  * Reasoning: "Supervised learning uses labeled data, while reinforcement learning relies on interacting with the environment to learn from rewards and penalties."
* Step 2:
  * Updated Query: "Supervised learning uses labeled data, while reinforcement learning relies on interacting with the environment to learn from rewards and penalties."
  * Reasoning: "In supervised learning, the outcome is already known, while in reinforcement learning, the agent must discover the best actions through exploration."

In [19]:
# Chain of Thought (CoT) + RAG for multi-step reasoning
def cot_rag(query, max_steps=3):
    reasoning_steps = []
    for step in range(max_steps):
        # Retrieve relevant documents at each step
        retrieved_docs = retrieve_docs(query, k=2)
        context = " ".join(retrieved_docs)

        # Generate reasoning step based on query and context
        reasoning = generate_text(query, context)
        reasoning_steps.append(reasoning)

        # Update the query with the current reasoning for the next step
        query = reasoning
    return reasoning_steps

# Example query for CoT + RAG
query_cot_rag = "Explain the difference between supervised and reinforcement learning."

# Output reasoning steps
reasoning_steps = cot_rag(query_cot_rag)
for i, step in enumerate(reasoning_steps):
    print(f"Step {i + 1}: {step}")



Step 1: Query: Explain the difference between supervised and reinforcement learning.
Context: Supervised learning involves training a model on labeled data, where the outcome is already known. Reinforcement learning involves agents learning by interacting with environments to maximize rewards.


## Response:Supervised learning and reinforcement learning are two different approaches to machine learning. In supervised learning, a model is trained on a labeled dataset, where the desired output is already known. The model learns to map input data to the corresponding output by minimizing the difference between its predictions and the actual labels. This approach is commonly used for tasks such as image classification, speech recognition, and spam detection.

On the other hand, reinforcement learning involves training an agent to learn how to interact with an environment in order to maximize a reward signal. The agent learns by trial and error, receiving feedback in the form of rewards or p
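
Chain-of-thought prompting is often done inside a single generation, by asking the model to reason step by step before answering. The sketch below builds such a prompt over numbered retrieved documents. The function name `build_cot_prompt` and the exact prompt wording are assumptions; this is an alternative to the multi-call loop above, not a description of it.

```python
def build_cot_prompt(question, docs):
    """Assemble a prompt asking for step-by-step reasoning over numbered documents."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Let's think step by step, citing document numbers, "
        "then give the final answer on a line starting with 'Answer:'."
    )


docs = [
    "Supervised learning involves training a model on labeled data, "
    "where the outcome is already known.",
    "Reinforcement learning involves agents learning by interacting "
    "with environments to maximize rewards.",
]
prompt = build_cot_prompt(
    "Explain the difference between supervised and reinforcement learning.",
    docs)
print(prompt)
```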

# Implementing ReAct + CoT + RAG
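This section combines ReAct, CoT, and RAG: reasoning alternates with retrieval (ReAct) and is spread over several steps (CoT). The function below uses the same retrieve-then-generate loop as the two previous sections, with `max_steps=3`.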

### Input-Output Example for ReAct + CoT + RAG:
* Query: "What is the purpose of feedback in reinforcement learning?"
* Step 1:
  * Retrieved Documents:
    * "Reinforcement learning involves agents learning by interacting with environments to maximize rewards."
    * "In reinforcement learning, agents receive feedback from the environment in the form of rewards or penalties."
  * Reasoning: "Feedback in reinforcement learning helps the agent learn which actions lead to rewards and which lead to penalties."
* Step 2:
  * Updated Query: "Feedback in reinforcement learning helps the agent learn which actions lead to rewards and which lead to penalties."
  * Reasoning: "This feedback allows the agent to adjust its actions over time to maximize the total reward."
* Step 3:
  * Updated Query: "Feedback allows the agent to adjust its actions over time to maximize the total reward."
  * Reasoning: "Without feedback, the agent would not be able to improve its performance over multiple interactions."

In [20]:
# ReAct + CoT + RAG function
def react_cot_rag(query, max_steps=3):
    reasoning_steps = []
    for step in range(max_steps):
        # Retrieve relevant documents
        retrieved_docs = retrieve_docs(query, k=2)
        context = " ".join(retrieved_docs)

        # Generate reasoning based on query and context
        reasoning = generate_text(query, context)
        reasoning_steps.append(reasoning)

        # Update the query with the current reasoning for the next step
        query = reasoning
    return reasoning_steps

# Example query for ReAct + CoT + RAG
query_react_cot_rag = "What is the purpose of feedback in reinforcement learning?"

# Run ReAct + CoT + RAG
reasoning_steps = react_cot_rag(query_react_cot_rag)
for i, step in enumerate(reasoning_steps):
    print(f"Step {i + 1}: {step}")


Step 1: Query: What is the purpose of feedback in reinforcement learning?
Context: In reinforcement learning, agents receive feedback from the environment in the form of rewards or penalties. Reinforcement learning involves agents learning by interacting with environments to maximize rewards. Feedback is crucial for the agent to understand the consequences of its actions and to learn the optimal policy.


## Response:The purpose of feedback in reinforcement learning is to provide the agent with information about the consequences of its actions. This feedback, in the form of rewards or penalties, allows the agent to learn from its experiences and adjust its behavior to maximize the cumulative reward over time. By receiving feedback, the agent can update its internal model of the environment and improve its decision-making process. Ultimately, the goal of reinforcement learning is to find the optimal policy that maximizes the expected cumulative reward. Feedback is essential for the agen
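
In the loop above, `query = reasoning` hands the next step the full decoded string, which in the printed output includes the echoed query and context as well as the answer. The sketch below is a variant that carries forward only the extracted answer and accumulates retrieved documents without duplicates. The retriever, generator, and extractor are passed in as arguments, and the toy versions here are hypothetical stand-ins so the sketch runs without a model.

```python
def multi_step_rag(query, retrieve, generate, extract, max_steps=3):
    """Retrieve-then-generate loop that forwards only the extracted answer."""
    seen_docs, steps = [], []
    for _ in range(max_steps):
        for doc in retrieve(query):
            if doc not in seen_docs:
                seen_docs.append(doc)  # keep first-seen order, skip duplicates
        raw = generate(query, " ".join(seen_docs))
        answer = extract(raw)
        steps.append(answer)
        query = answer  # the next step sees the answer, not the echoed prompt
    return steps, seen_docs


# Hypothetical stand-ins for retrieve_docs, generate_text, and an extractor
def toy_retrieve(query):
    return ["doc about feedback", "doc about rewards"]


def toy_generate(query, context):
    return (f"Query: {query}\nContext: {context}\n\n"
            "## Response:feedback guides the agent")


def toy_extract(raw):
    return raw.rpartition("## Response:")[2].strip()


steps, docs = multi_step_rag(
    "What is the purpose of feedback in reinforcement learning?",
    toy_retrieve, toy_generate, toy_extract, max_steps=2)
print(steps)
print(docs)
```

Keeping the forwarded query short also keeps the prompt well under the `max_length` cap used by `generate_text`, which counts prompt tokens as well as generated ones.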

# Summary
* RAG: Retrieves documents relevant to the query and passes them to the generator as context.
* ReAct + RAG: Alternates between reasoning and retrieval, feeding each reasoning step back in as the next query.
* CoT + RAG: Spreads reasoning over multiple steps, retrieving relevant documents at each step.
* ReAct + CoT + RAG: Combines the approaches above, using reasoning, action (retrieval), and multi-step reasoning (CoT) together, with the aim of grounding each step in retrieved text.