<a href="https://colab.research.google.com/github/MohamedAhmedGalal/Whatsapp_RAG_Bot/blob/main/olama_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🔧 Install & Launch Ollama + LangChain Environment**

This cell sets up everything needed to run a local RAG chatbot inside Colab using Ollama and LangChain:

1. **Install Ollama**

   * Uses the Linux install script from ollama.com to download and install Ollama directly in the Colab runtime.

2. **Start the Ollama server**

   * Runs `ollama serve` in the background using a Python thread.
   * Keeps the model API available so that LangChain can connect and send prompts.

3. **Install Python dependencies**

   * `langchain`, `langchain-community`, and `langchain-ollama` for building the chatbot pipeline.
   * `chromadb` for storing and retrieving document embeddings.
   * `sentence-transformers` for creating text embeddings.
   * `pypdf` (optional) for parsing PDF files.
   * `tiktoken` (optional) for token counting; the character-based text splitter used later does not require it.

After running this cell, the Ollama service will be ready to accept requests from LangChain, so you can build a private, local AI chatbot without relying on external APIs.
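The setup cell runs `ollama serve` inside a thread with `capture_output=True`, which keeps the server alive but also discards its logs. A minimal alternative sketch, assuming you would rather have a log file you can inspect when something goes wrong: start the process with `subprocess.Popen` and redirect its output. `start_background` and the log path in the usage comment are illustrative names, not part of Ollama.

```python
import subprocess


def start_background(cmd, log_path):
    """Start `cmd` as a background process with stdout/stderr appended to `log_path`.

    Popen returns immediately while the process keeps running, so no thread is
    needed. The returned handle supports poll(), wait() and terminate().
    """
    # The child inherits the file descriptor, so the parent can close its own copy
    with open(log_path, "ab") as log_file:
        proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    return proc


# Hypothetical usage in Colab (requires the Ollama binary installed above):
# server = start_background(["ollama", "serve"], "/content/ollama.log")
# !tail /content/ollama.log
```

Keeping the `Popen` handle also lets you stop the server cleanly with `server.terminate()`, which a daemon thread wrapping `subprocess.run` does not offer.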

In [None]:
# Install Ollama, LangChain, and helpers
!curl -fsSL https://ollama.com/install.sh | sh  # Install Ollama (Linux-based in Colab)

# Start Ollama service in background
import subprocess
import threading
import time

def run_ollama():
    subprocess.run(["ollama", "serve"], capture_output=True)

ollama_thread = threading.Thread(target=run_ollama)
ollama_thread.daemon = True
ollama_thread.start()

time.sleep(10)  # Wait for server to start

# Install Python deps
!pip install langchain langchain-community langchain-ollama chromadb sentence-transformers pypdf  # pypdf if needed, but TXT is fine
!pip install tiktoken  # For tokenization

print("Setup complete! Ollama server running.")

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
Collecting langchain-community
  Downloading langchain_community-0.3.29-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain-ollama
  Downloading langchain_ollama-0.3.8-py3-none-any.whl.metadata (2.1 kB)
Collecting chromadb
  Downloading chromadb-1.1.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting pypdf
  Downloading pypdf-6.0.0-py3-none-any.whl.metadata (7.1 kB)
Collecting requests<3,>=2 (from langchain)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7,>=0.6.7 (from langchain-community)



---

### 🌍 Download & Verify the ALLaM:7b Model

This cell prepares the **Arabic-tuned Large Language Model (ALLaM:7b)** for use with Ollama:

1. **Pull the model**

   * `!ollama pull iKhalid/ALLaM:7b` downloads the ALLaM:7b weights (\~4 GB).
   * This model is optimized for **Arabic language understanding and generation**, making it well-suited for chatting in Arabic or bilingual contexts.

2. **Verify the installation**

   * `!ollama list` displays all models currently available in your Colab environment, confirming that ALLaM:7b is ready to be loaded and used in the chatbot pipeline.
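The same check can be done from Python instead of reading the `ollama list` table. A sketch, assuming Ollama's REST endpoint `GET /api/tags` on the default port, which returns JSON of the form `{"models": [{"name": ...}, ...]}`. The helper names are hypothetical, and names are compared case-insensitively with the implicit `:latest` tag filled in, to be safe.

```python
import json
import urllib.request


def fetch_local_models(base_url="http://localhost:11434"):
    """Return the parsed JSON body of Ollama's GET /api/tags (locally available models)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


def _normalize(name):
    # A model name without a tag refers to ":latest"
    base = name.split("/")[-1]
    full = name if ":" in base else f"{name}:latest"
    return full.lower()


def model_available(tags, model_name):
    """True if `model_name` appears in a /api/tags response dict."""
    wanted = _normalize(model_name)
    return any(_normalize(m.get("name", "")) == wanted for m in tags.get("models", []))


# Usage once the server is running:
# print(model_available(fetch_local_models(), "iKhalid/ALLaM:7b"))
```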


In [None]:
# Pull ALLaM:7b (Arabic-tuned, ~4GB download)
!ollama pull iKhalid/ALLaM:7b

# Verify
!ollama list

[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A


---

### 💬 Upload & Extract Your WhatsApp Chat

This cell uploads and processes a **WhatsApp chat export** to capture **only your own messages**.
The goal is to build a dataset that reflects **your unique writing style, tone, and vocabulary**, so the chatbot can mimic how **you** actually chat.

**Steps performed:**

1. **File Upload**

   * `files.upload()` lets you select your exported `whatsapp_chat.txt` file.
   * The export should be in plain text format (from WhatsApp’s *Export Chat* feature, without media).

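2. **Message Parsing**

   * Reads the file line by line and keeps only the messages sent by **you**, identified by the sender name in the export (adjust the names checked in the code to match yours).
   * Handles multi-line messages by appending each continuation line to the message it belongs to.

3. **Output**

   * Combines all of your messages into a single text block (`chat_text`) ready for chunking and embedding.
   * Because only your messages reach the vector store, the examples retrieved for each reply reflect **your lingo, expressions, and style** instead of generic responses. The model itself is not trained or fine-tuned; it is only prompted with these examples.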
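Even after filtering by sender, the extracted messages can carry export noise that later shows up as retrieved context: media placeholders, attachment notices, and links. A possible cleaning pass, sketched under the assumption that these patterns match your export; `clean_messages` is a hypothetical helper, and which patterns to drop is a judgment call (extend `NOISE_LINES` with any other placeholders your export contains).

```python
import re

# Whole messages that carry no writing style (taken from a WhatsApp text export)
NOISE_LINES = {"<Media omitted>"}
URL_RE = re.compile(r"https?://\S+")


def clean_messages(messages):
    """Drop placeholder/attachment messages, strip links, and skip messages left empty."""
    cleaned = []
    for msg in messages:
        if msg.strip() in NOISE_LINES or "(file attached)" in msg:
            continue
        msg = URL_RE.sub("", msg)      # remove links such as shared map URLs
        msg = " ".join(msg.split())    # collapse whitespace left behind
        if msg:
            cleaned.append(msg)
    return cleaned
```

Applying it to the list of your messages before joining them with `"\n\n"` keeps the vector store focused on text you actually typed.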


In [None]:
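from google.colab import files
import re

# Upload the exported chat (WhatsApp "Export Chat" -> "Without media")
uploaded = files.upload()  # Select your WhatsApp .txt export
chat_file = list(uploaded.keys())[0]  # Assumes a single file was uploaded

# Sender names that identify YOUR messages; adjust to whatever your export uses
MY_NAMES = {"You", "M. Galal"}

# A new message starts with a timestamp, in either export layout:
#   Android: "27/08/2016, 3:34 pm - M. Galal: text"
#   iOS:     "[27/08/2016, 15:34:12] M. Galal: text"
MSG_START = re.compile(
    r"^\[?\d{1,2}/\d{1,2}/\d{2,4},\s+\d{1,2}:\d{2}(?::\d{2})?\s?(?:[ap]\.?m\.?)?\]?\s*-?\s*(.*)$",
    re.IGNORECASE,
)

# Load and parse chat (keep only YOUR messages for style mimicry)
def load_whatsapp_chat(file_path, my_names=MY_NAMES):
    your_messages = []
    current_msg = None  # None = the current message belongs to someone else
    with open(file_path, "r", encoding="utf-8") as f:
        for raw_line in f:
            line = raw_line.strip().replace("\u200e", "")  # drop invisible LTR marks
            match = MSG_START.match(line)
            if match:
                # A new message begins: store the previous one if it was yours
                if current_msg:
                    your_messages.append(current_msg.strip())
                current_msg = None
                # Split "Sender: text"; system notices have no "Sender: " prefix
                sender, sep, text = match.group(1).partition(": ")
                if sep and sender.strip() in my_names and text != "<Media omitted>":
                    current_msg = text
            elif current_msg is not None:
                # Continuation line of a multi-line message you wrote
                current_msg += " " + line

    if current_msg:
        your_messages.append(current_msg.strip())

    # Combine into one text block for chunking and embedding
    return "\n\n".join(your_messages)

chat_text = load_whatsapp_chat(chat_file)
print(f"Loaded {len(chat_text)} chars of YOUR messages.")
print(chat_text[:500] + "...")  # Preview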

Saving WhatsApp Chat with ) osama khalil.txt to WhatsApp Chat with ) osama khalil.txt
Loaded 267920 chars of YOUR messages.
27/08/2016, 3:34 pm - Messages and calls are end-to-end encrypted. Only people in this chat can read, listen to, or share them. Learn more. 27/08/2016, 3:34 pm - M. Galal: في مشكله عندك لو عزمت سراج !!!!! علي ما افتكر من زمان كدة كان في مشاكل بينكم .... او رامي علام ...او هيثم .... 27/08/2016, 3:53 pm - :) osama khalil: خالص 27/08/2016, 3:53 pm - :) osama khalil: ولا اي حاجه 27/08/2016, 3:53 pm - :) osama khalil: حبيبي 27/08/2016, 4:58 pm - :) osama khalil: <Media omitted> 12/09/2016, 7:15 am - ...




### 🔗 Build the RAG (Retrieval-Augmented Generation) Chatbot

This cell creates the **core pipeline** that powers the WhatsApp-style chatbot.
It combines **semantic search** with a **local large language model (LLM)** to generate responses that sound like **you**.

**Key Steps:**

1. **Text Chunking**

   * Splits your exported WhatsApp messages (`chat_text`) into overlapping chunks of up to 500 characters, with 100 characters of overlap between neighboring chunks.
   * Short chunks keep each embedding focused on a few messages, and the overlap carries some context across chunk boundaries.

2. **Semantic Embeddings**

   * Uses `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`, a multilingual model that covers Arabic, to convert each chunk into a 384-dimensional vector.
   * Embeddings capture the **meaning** of your words, not just exact matches.

3. **Vector Store (Chroma)**

   * Stores all the embeddings in a local Chroma database (`./chroma_db`).
   * Enables **fast, similarity-based search** that retrieves the 3 chunks most similar to each query (`k=3`).

4. **Local LLM (Ollama)**

   * Loads the `iKhalid/ALLaM:7b` model, a 7B-parameter Arabic-friendly language model served by Ollama.
   * Generates context-aware answers and is prompted to imitate your **style, phrasing, and tone**.

5. **Custom Prompt**

   * Guides the LLM to respond **as if it were you**, using the retrieved WhatsApp messages as context.
   * Example instruction:

     > “You are responding as M. Galal, based on their past thoughts and style from WhatsApp chats.”

6. **QA Chain**

   * Connects all components:

     * **Retriever** → finds the most relevant messages.
     * **Prompt** → injects them into a carefully designed instruction.
     * **LLM** → generates a reply that feels like a real WhatsApp response from you.
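To make the chunking arithmetic concrete: with a chunk size of 500 and an overlap of 100, consecutive windows start 400 characters apart. The sketch below is a simplified fixed-window chunker, not `RecursiveCharacterTextSplitter` itself; the real splitter first tries to break on separators such as blank lines, newlines, and spaces, so its chunks can be shorter and unevenly spaced. `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text, size=500, overlap=100):
    """Split `text` into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # 400 with the notebook's settings
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
    return chunks


# A 1000-character text yields windows [0:500], [400:900], [800:1000]
print([len(c) for c in sliding_chunks("x" * 1000)])
```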

This pipeline allows anyone to **chat with your digital persona**, with replies grounded in the **actual messages you wrote**.
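Conceptually, the retriever embeds the query with the same model and returns the `k` stored chunks whose vectors are closest to it. A toy sketch with hand-made 2-D vectors standing in for real sentence embeddings; it ranks by cosine similarity for illustration, whereas the distance Chroma actually uses depends on how its collection is configured. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Indices of the k document vectors most similar to the query, best first."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Toy "embeddings": chunk 2 points almost the same way as the query
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(top_k([1.0, 0.0], docs, k=2))
```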


In [None]:
from operator import itemgetter

import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaLLM
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda

# 0. Split YOUR messages (chat_text from load_whatsapp_chat) into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_text(chat_text)

# 1. Embeddings (use the GPU if the runtime has one, otherwise the CPU)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
)

# 2. Vector store
# Note: re-running this cell adds the chunks to ./chroma_db again;
# delete that folder first if you want a fresh index without duplicates.
vectorstore = Chroma.from_texts(
    texts=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 3. LLM
llm = OllamaLLM(model="iKhalid/ALLaM:7b", temperature=0.7)

# 4. Custom Prompt
prompt_template = """
You are responding as {user_name}, based on their past thoughts and style from WhatsApp chats.
Use casual, personal language like in these examples from {user_name}'s messages:

{context}

Now, respond to this query in {user_name}'s voice: thoughtful, direct, and true to their sense.
Query: {question}
Response:
"""
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question", "user_name"]
)

# 5. Map the parallel step's output onto the prompt's variables
def format_inputs(inputs):
    # Join the retrieved Documents into one context string
    context = "\n\n".join(doc.page_content for doc in inputs["context"])
    return {
        "context": context,
        "question": inputs["query"],
        "user_name": inputs["user_name"]
    }

# Chain: the retriever gets only the query string, and each other field is
# picked by key (a bare RunnablePassthrough() would forward the whole input dict)
qa_chain = (
    {
        "context": itemgetter("query") | retriever,
        "query": itemgetter("query"),
        "user_name": itemgetter("user_name")
    }
    | RunnableLambda(format_inputs)
    | PROMPT
    | llm
)

# 6. Set your name (passed to the chain at invoke time)
user_name = "M. Galal"
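The chain's input mapping can be hard to follow, so here is a LangChain-free sketch of the same data flow: the retriever sees only the query string, the retrieved texts are joined into one context block, and each prompt field is filled by key. `TEMPLATE`, `toy_retrieve`, and `build_prompt` are hypothetical simplifications; the toy retriever ranks stored messages by shared words rather than embeddings. A bare `RunnablePassthrough()` in the input mapping would instead forward the whole input dict, and its printed form would end up in the prompt.

```python
TEMPLATE = (
    "You are responding as {user_name}.\n"
    "Examples:\n{context}\n"
    "Query: {question}\n"
    "Response:"
)


def toy_retrieve(query, k=2):
    """Hypothetical retriever: rank stored messages by words shared with the query."""
    store = ["the trip was great", "let's plan another trip", "send me the file"]
    words = set(query.lower().split())
    return sorted(store, key=lambda m: len(words & set(m.split())), reverse=True)[:k]


def build_prompt(inputs, retrieve, template=TEMPLATE):
    """Mimic the chain: retrieve with the query only, then fill each field by key."""
    query = inputs["query"]          # a plain string, not the whole input dict
    context = "\n\n".join(retrieve(query))
    return template.format(context=context, question=query, user_name=inputs["user_name"])


print(build_prompt({"query": "how was the trip", "user_name": "M. Galal"}, toy_retrieve))
```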



### 🔧 Installing and Running **Ollama** inside this Colab session

This cell installs **Ollama**, starts its local server, and checks that it’s running:

1. **Install Ollama**
   Uses `curl -fsSL https://ollama.com/install.sh | sh` to download and install the Ollama binary in the current Colab runtime.

2. **Launch the Ollama server**
   Starts `ollama serve` in a **background Python thread**, allowing the notebook to continue while the server runs.

3. **Startup delay**
   Waits \~10 seconds to give the server time to initialize.

4. **Verify server status**
   Sends a `curl` request to `http://localhost:11434` to confirm that the Ollama API is alive.
   If successful, you’ll see: **“Ollama server is running!”**
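A fixed 10-second sleep can be too short on a slow runtime and needlessly long on a fast one. A minimal sketch of the alternative, assuming Ollama's default address `http://localhost:11434`, whose root path answers with HTTP 200 once the server is up: poll until it responds or a timeout expires. `wait_for_server` is a hypothetical helper that uses only the standard library.

```python
import http.client
import time
import urllib.request

# Local server: ignore any HTTP(S)_PROXY settings in the environment
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url="http://localhost:11434", timeout=30.0, interval=0.5, request_timeout=2.0):
    """Poll `url` until it answers with a 2xx status or `timeout` seconds pass.

    Returns True if the server responded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # HTTPError (raised for 4xx/5xx) is a subclass of OSError
            with _OPENER.open(url, timeout=request_timeout):
                return True
        except (OSError, http.client.HTTPException):
            pass  # refused, reset, timed out, or not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage in the notebook, in place of time.sleep(10):
# print("Ollama server is running!" if wait_for_server() else "Ollama server failed to start.")
```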




In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server in the background
import subprocess
import threading
import time

def run_ollama():
    subprocess.run(["ollama", "serve"], capture_output=True)

# Start Ollama in a separate thread
ollama_thread = threading.Thread(target=run_ollama)
ollama_thread.daemon = True
ollama_thread.start()

# Wait for server to start
time.sleep(10)

# Verify Ollama is running
try:
    output = subprocess.check_output(["curl", "http://localhost:11434"])
    print("Ollama server is running!")
except subprocess.CalledProcessError:
    print("Ollama server failed to start. Check logs or restart.")

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
Ollama server is running!




### 💬 Testing the WhatsApp-Style Chatbot

This cell sends a **sample query** to the QA chain and shows both the **generated response** and the **retrieved context** from your exported WhatsApp messages.

**Steps performed:**

1. **Create a test query** – For example:

   > *"What do you think about our last trip? Respond like you would on WhatsApp, in Egyptian slang."*
   > (You can also test in Arabic if preferred.)
2. **Invoke the QA chain** – Passes the query and your `user_name` to the chatbot pipeline so it can generate a reply that matches **your personal style and lingo**.
3. **Display the output** –

   * **Generated Response**: The chatbot’s WhatsApp-like reply.
   * **Retrieved Contexts**: The 3 chunks of your chat history most similar to the query (`k=3`), fetched with a separate `similarity_search` call for display.

Comparing the two shows whether the reply actually **mimics your WhatsApp tone** and whether the retrieved pieces of your chat history are relevant enough to ground a natural, context-aware answer.
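To try several prompts in one go (for example an English and an Arabic version of the same question), a small loop around `invoke` is enough. A sketch, assuming any object with an `invoke(dict)` method such as `qa_chain`; `run_smoke_tests` is a hypothetical helper, and LangChain runnables also provide `.batch()` for running a list of inputs.

```python
def run_smoke_tests(chain, queries, user_name):
    """Invoke `chain` once per query and collect (query, reply) pairs."""
    results = []
    for q in queries:
        reply = chain.invoke({"query": q, "user_name": user_name})
        results.append((q, str(reply).strip()))
    return results


# Usage with the chain built earlier:
# for q, reply in run_smoke_tests(qa_chain, [query, "رد زي ما بتنرد في الواتس"], user_name):
#     print(f"Q: {q}\nA: {reply}\n")
```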


In [None]:
# Test query
query = "What do you think about our last trip? Respond like you would on WhatsApp, in Egyptian slang."
# Or in Arabic ("What do you think of the last trip? Reply the way you would on WhatsApp"):
# query = "شو رأيك بالرحلة الأخيرة؟ رد زي ما بتنرد في الواتس"

result = qa_chain.invoke({
    "query": query,
    "user_name": user_name
})

print("Generated Response:")
print(result)
print("\nRetrieved Contexts (your past messages):")
docs = vectorstore.similarity_search(query, k=3)  # Manually fetch for display
for doc in docs:
    print(f"- {doc.page_content[:200]}...")

Generated Response:
 بصراحة كانت رحلة حلوة وممتعة جداً يا أسامة! 😊 لكن بصراحة كنت أتمنى نقضي وقت أطول هناك ونستكشف أكتر. عموماً، كانت تجربة جميلة وأتمنالها تتكرر تاني قريباً! 

Retrieved Contexts (your past messages):
- Nasr City, Cairo Governorate https://maps.app.goo.gl/yhrLT3czGycacaGU8 11/12/2021, 9:36 am - :) osama khalil: احمد كشمير ميكانيكي.vcf (file attached) 11/12/2021, 9:37 am - :) osama khalil: صورة البر...
- 15/02/2021, 10:47 am - M. Galal: https://go-bus.com/search?arrivalDate=24/02/2021&arrivalStation=40&departureDate=20/02/2021&departureStation=8&passengersNo=1&transportationType=bus&tripType=round 15/...
- omitted> 14/08/2024, 1:48 pm - :) osama khalil: <Media omitted> 14/08/2024, 1:48 pm - :) osama khalil: <Media omitted> 14/08/2024, 1:48 pm - :) osama khalil: <Media omitted> 14/08/2024, 1:48 pm - :) o...
