<a href="https://colab.research.google.com/github/athlour/Gen-AI/blob/main/RAG_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **RAG Implementation Using Mistral 7B or Phi-2 in Google Colab**

This guide implements a Retrieval-Augmented Generation (RAG) system in Google Colab. It covers the following steps:

1. **Data Preparation:** Extract documents from Wikipedia using wikipedia-api.

2. **Document Chunking:** Split documents into smaller, manageable chunks.

3. **Retriever:** Perform semantic search using sentence-transformers and ChromaDB.

4. **Re-Ranker:** Re-score the retrieved chunks with a Cross-Encoder to improve ranking accuracy.

5. **Generator:** Use Mistral 7B or Phi-2 for answer generation.

**Prerequisites**

Ensure your Colab runtime is set to GPU:

* **Go to Runtime → Change runtime type → GPU.**
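
To confirm that the runtime change took effect, a quick check along these lines may help. This is a minimal sketch: `describe_runtime` is a hypothetical helper name, and it assumes PyTorch is importable (it is normally preinstalled on Colab).

```python
def describe_runtime():
    """Return a short description of the accelerator visible to PyTorch."""
    try:
        import torch  # assumed to be available; normally preinstalled on Colab
    except ImportError:
        return "PyTorch is not installed yet; run the install step first."
    if torch.cuda.is_available():
        return f"GPU detected: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; switch the runtime type to GPU."

print(describe_runtime())
```

If the message reports no GPU, the quantized model load later in the notebook is unlikely to work, because it places the model on `cuda`.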











# **Step 1: Install Required Libraries**


In [2]:
!pip install wikipedia-api
!pip install chromadb
!pip install faiss-cpu
!pip install sentence-transformers
!pip install transformers torch
!pip install bitsandbytes accelerate
!pip install huggingface_hub
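
After installation, it can be useful to verify that every package is importable before running the later cells. The sketch below uses only the standard library. `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list holds import names (for example `wikipediaapi`), which differ from some pip names.

```python
import importlib.util

# Import names (not pip names) for the packages used in this notebook.
REQUIRED_MODULES = [
    "wikipediaapi", "nltk", "chromadb", "sentence_transformers",
    "transformers", "torch", "bitsandbytes", "accelerate", "huggingface_hub",
]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing:", missing_modules(REQUIRED_MODULES) or "none")
```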



# **Step 2: Extract Wikipedia Pages**

In [16]:
import wikipediaapi
import logging

logging.basicConfig(level=logging.INFO)

def get_wikipedia_page(title):
    try:
        # Provide a proper User-Agent with contact info
        user_agent = "MyWikipediaClient/1.0 (Contact: your-email@example.com)"
        wiki_wiki = wikipediaapi.Wikipedia(user_agent=user_agent, language='en')
        page = wiki_wiki.page(title)
        if not page.exists():
            logging.warning(f"Page '{title}' not found.")
            return None
        return page.text
    except Exception as e:
        logging.error(f"Error fetching page '{title}': {e}")
        return None

# Example: Fetch Wikipedia pages
titles = ["List of Tamil films of 2025"]
# Alternative set; the Stock market and AI outputs shown in later steps come from these pages
#titles = ["Stock market", "Day trading","Artificial intelligence", "Machine learning"]

wiki_pages = {title: get_wikipedia_page(title) for title in titles}

# Print the first 500 characters for verification
for title, content in wiki_pages.items():
    if content:
        print(f"\n--- {title} ---\n{content[:500]}...\n")



--- List of Tamil films of 2025 ---
This is a list of Tamil language films produced in the Tamil cinema in India that are to be released/scheduled in 2025.

Box office collection
The following is the list of highest-grossing Tamil cinema films released in 2025. The rank of the films in the following table depends on the estimate of worldwide collections as reported by organizations classified as green by Wikipedia. There is no official tracking of domestic box office figures within India.

January–March
April–June
Upcoming release...



# **Step 3: Perform Document Chunking**
Use NLTK to split long text into sentence-aligned chunks of at most 512 word tokens.

In [1]:
!pip uninstall -y nltk
!pip install nltk


Found existing installation: nltk 3.9.1
Uninstalling nltk-3.9.1:
  Successfully uninstalled nltk-3.9.1
Collecting nltk
  Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
Installing collected packages: nltk
Successfully installed nltk-3.9.1


In [4]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
nltk.download('punkt')
nltk.download('punkt_tab')  # required by sent_tokenize in recent NLTK releases


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [5]:
nltk.data.path.append('/root/nltk_data')

In [6]:
import logging
logging.basicConfig(level=logging.INFO)

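def chunk_text(text, max_tokens=512):
    """Group sentences into chunks of at most max_tokens word tokens."""
    sentences = sent_tokenize(text)
    current_chunk = []
    current_length = 0
    chunks = []

    for sentence in sentences:
        # Tokenize once and reuse the result for both counting and splitting
        words = word_tokenize(sentence)
        sentence_length = len(words)

        # Handle sentences longer than max_tokens
        if sentence_length > max_tokens:
            logging.warning(f"A single sentence exceeds {max_tokens} tokens. Splitting sentence.")
            # Flush the pending chunk first so chunks stay in document order
            if current_chunk:
                chunks.append(" ".join(current_chunk))
                current_chunk = []
                current_length = 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue

        # Start a new chunk when adding this sentence would exceed the limit
        if current_chunk and current_length + sentence_length > max_tokens:
            chunks.append(" ".join(current_chunk))
            current_chunk = []
            current_length = 0

        current_chunk.append(sentence)
        current_length += sentence_length

    # Emit whatever is left after the last sentence
    if current_chunk:
        chunks.append(" ".join(current_chunk))

    return chunks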

# Chunk all Wikipedia data
chunked_data = {}
for title, content in wiki_pages.items():
    if content:
        chunks = chunk_text(content)
        chunked_data[title] = chunks
        logging.info(f"{title}: {len(chunks)} chunks created.")
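
A quick way to sanity-check the chunking result is to summarise chunk sizes. The sketch below is an assumption-laden helper (`chunk_stats` is a hypothetical name) that counts whitespace-separated words instead of NLTK tokens, so its numbers are a lower bound on the token counts that `chunk_text` works with.

```python
def chunk_stats(chunks):
    """Summarise chunk sizes using whitespace-separated word counts."""
    sizes = [len(chunk.split()) for chunk in chunks]
    if not sizes:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": sum(sizes) / len(sizes),
    }

# Hypothetical sample; in the notebook, pass chunked_data[title] instead.
sample_chunks = ["one two three", "four five", "six"]
print(chunk_stats(sample_chunks))
```

A maximum far above 512 would suggest the limit is not being enforced, while many very small chunks would suggest the source text contains short, isolated fragments.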


In [7]:
chunked_data

{'Stock market': ['A stock market, equity market, or share market is the aggregation of buyers and sellers of stocks (also called shares), which represent ownership claims on businesses; these may include securities listed on a public stock exchange as well as stock that is only traded privately, such as shares of private companies that are sold to investors through equity crowdfunding platforms. Investments are usually made with an investment strategy in mind. Size of the market\nThe total market capitalization of all publicly traded stocks worldwide rose from US$2.5 trillion in 1980 to US$111 trillion by the end of 2023. As of 2016, there are 60 stock exchanges in the world. Of these, there are 16 exchanges with a market capitalization of $1 trillion or more, and they account for 87% of global market capitalization. Apart from the Australian Securities Exchange, these 16 exchanges are all in North America, Europe, or Asia. By country, the largest stock markets as of January 2022 are 

# **Step 4: Store Data in ChromaDB**
Generate embeddings using SentenceTransformer and store chunks in ChromaDB:

In [8]:
import chromadb
from sentence_transformers import SentenceTransformer

# Initialize ChromaDB (in-memory client)
client = chromadb.Client()
# get_or_create_collection avoids an error if the collection already exists
collection = client.get_or_create_collection("wiki_rag")

# Load SentenceTransformer for embeddings
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Store chunks in ChromaDB, encoding each page's chunks in one batch
for title, chunks in chunked_data.items():
    if not chunks:
        continue
    embeddings = embedding_model.encode(chunks)
    doc_ids = [f"{title}_{i}" for i in range(len(chunks))]
    collection.add(ids=doc_ids, embeddings=embeddings.tolist(), documents=chunks)

print("Data stored in ChromaDB.")




Data stored in ChromaDB.


# **Step 5: Perform Semantic Search**

In [9]:
def retrieve_context(query, n_results=5):
    query_embedding = embedding_model.encode([query])
    results = collection.query(
        query_embeddings=query_embedding.tolist(),
        n_results=n_results
    )
    return results['documents'][0]

query = "How is AI used?"
retrieved_docs = retrieve_context(query)
print("Retrieved Documents:\n", "\n".join(retrieved_docs))


Retrieved Documents:
 Artificial intelligence (AI) refers to the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs. High-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtere
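
Conceptually, the retriever ranks stored chunks by how close their embedding vectors are to the query embedding. The sketch below illustrates that idea with cosine similarity in plain Python. It is not ChromaDB's implementation, and the distance metric a collection uses by default may differ from cosine. The function names and toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs, n_results=5):
    """Return (index, similarity) pairs for the n_results closest vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_results]

# Toy 3-dimensional vectors standing in for real 384-dimensional embeddings.
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [1.0, 0.1, 0.0]
print(rank_by_similarity(toy_query, toy_docs, n_results=2))
```

Here the first and third toy vectors point in roughly the same direction as the query, so they are returned ahead of the second one.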

# **Step 6: Re-Rank Results Using Cross-Encoder**

In [10]:
from sentence_transformers import CrossEncoder

re_ranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

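def re_rank(query, documents):
    # Score each (query, document) pair jointly with the cross-encoder
    pairs = [[query, doc] for doc in documents]
    scores = re_ranker.predict(pairs)
    # Sort by score only, highest first, so tied scores never compare document text
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked]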

re_ranked_docs = re_rank(query, retrieved_docs)
print("Top Re-ranked Document:", re_ranked_docs[0][:500])


Top Re-ranked Document: A few examples are energy storage, medical diagnosis, military logistics, applications that predict the result of judicial decisions, foreign policy, or supply chain management. AI applications for evacuation and disaster management are growing. AI has been used to investigate if and how people evacuated in large scale and small scale evacuations using historical data from GPS, videos or social media. Further, AI can provide real time information on the real time evacuation conditions. In agricu
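
In practice, only the best few re-ranked chunks are usually passed to the generator, and keeping the scores next to the text makes the ranking easier to inspect. The sketch below shows one way to do that. `top_k_by_score` is a hypothetical helper and the scores are made-up values, not real cross-encoder output.

```python
def top_k_by_score(documents, scores, k=3):
    """Return the k (score, document) pairs with the highest scores."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

# Hypothetical scores; in the notebook they would come from re_ranker.predict(pairs).
docs = ["chunk about stocks", "chunk about AI uses", "chunk about film releases"]
scores = [0.2, 4.1, -3.5]
for score, doc in top_k_by_score(docs, scores, k=2):
    print(f"{score:+.2f}  {doc}")
```

The scores are best read as relative values for one query; comparing them across different queries is not necessarily meaningful.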


In [11]:
from huggingface_hub import notebook_login
notebook_login()



In [12]:
!huggingface-cli whoami

athlour


# **Load Mistral 7B with Quantization (Memory Efficient)**

In [13]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Configure 4-bit quantization for memory-efficient loading
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16  # compute in fp16; weights stay stored in 4-bit
)

# Load the Model and Tokenizer
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda", quantization_config=quant_config)

print("Mistral 7B Loaded Successfully with 4-bit Quantization.")


Mistral 7B Loaded Successfully with 4-bit Quantization.
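
The reason 4-bit loading matters on a Colab GPU can be seen with rough arithmetic on weight storage. The sketch below assumes a round figure of 7 billion parameters and counts weights only. It ignores activations, the KV cache, quantization metadata and any layers kept in higher precision, so real usage will be higher.

```python
def estimate_weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 7e9  # assumed round figure for a "7B" model

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{estimate_weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from roughly 14 GB at 16-bit precision to roughly 3.5 GB at 4-bit.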


# **Generate Responses with RAG**

In [14]:
import torch

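def generate_response(query, context):
    """Generate an answer to `query` grounded in the given `context` string."""
    prompt = f"Based on the following context:\n{context}\nAnswer the question: {query}"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    with torch.no_grad():
        # Nucleus sampling (top_p=0.9), so the output varies between runs
        outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.9)

    # outputs[0] holds the prompt tokens followed by the new tokens,
    # so the decoded text starts with the prompt itself
    return tokenizer.decode(outputs[0], skip_special_tokens=True)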

# Example with a hand-written context (not retrieved from ChromaDB)
query = "Explain how AI is used in trading."
context = "AI in trading is used for algorithmic trading, market sentiment analysis, and predictive modeling using historical data."
response = generate_response(query, context)

print("Response:", response)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Response: Based on the following context:
AI in trading is used for algorithmic trading, market sentiment analysis, and predictive modeling using historical data.
Answer the question: Explain how AI is used in trading.

#Explain #AI #used #trading

The best answers of Explain how AI is used in trading.

AI in trading is used for algorithmic trading, market sentiment analysis, and predictive modeling using historical data.

In algorithmic trading, AI can analyze market data and make buy and sell decisions based on the analysis. This can be done using various algorithms such as machine learning, neural networks, and reinforcement learning.

In market sentiment analysis, AI can analyze social media, news articles, and other data sources to understand the mood of the market and make trading decisions based on that information.

In predictive modeling, AI can use historical data to predict future market trends and make trading decisions based on those predictions.

This way AI is used to he
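
The example above uses a hand-written context and its response repeats the prompt. The sketch below suggests one way to chain the earlier steps and to keep only the newly generated tokens. `rag_answer` and `new_tokens_only` are hypothetical helpers, and the stub functions exist only so the sketch runs without a model.

```python
def rag_answer(query, retrieve, re_rank, generate, top_k=3):
    """Run retrieve -> re-rank -> generate using caller-supplied functions."""
    candidates = retrieve(query)
    best = re_rank(query, candidates)[:top_k]
    context = "\n\n".join(best)
    return generate(query, context)

def new_tokens_only(output_ids, prompt_length):
    """Drop the echoed prompt tokens from a causal LM's output sequence."""
    return output_ids[prompt_length:]

# Stub functions so the sketch runs without a model; in the notebook, pass
# retrieve_context, re_rank and generate_response instead.
def fake_retrieve(query):
    return ["doc b", "doc a", "doc c"]

def fake_re_rank(query, documents):
    return sorted(documents)

def fake_generate(query, context):
    return f"[answer to {query!r} using {len(context)} context chars]"

print(rag_answer("How is AI used?", fake_retrieve, fake_re_rank, fake_generate, top_k=2))
print(new_tokens_only([11, 12, 13, 14, 15], prompt_length=3))
```

In the notebook, the call would look like `rag_answer(query, retrieve_context, re_rank, generate_response)`. To drop the echoed prompt, `generate_response` could decode `outputs[0][inputs["input_ids"].shape[1]:]` instead of `outputs[0]`.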