## TopologicPy ChatBot with RAG using DeepSeek and Ollama
### A local implementation that replaces OpenAI with DeepSeek running through Ollama

This notebook provides a chatbot interface for topological Python coding, using a Retrieval Augmented Generation (RAG) system that enhances responses with information from your uploaded PDF documentation. Unlike the original version, this implementation runs 100% locally without requiring OpenAI API keys.


be carefulk as the local chatbot interface requires that you install the tcl/tk python library. on a mac this can be done with 
>brew install python-tk
on a linux machine(depending on distribution)
>sudo apt-get install python3-tk

this almost works we need to fix the embeddings with hugginface -
perhaps try it in a clean python installation.

In [24]:
# Install required libraries (run this cell if packages are not already installed)
# You can comment this out if you've already installed these packages
import sys
import subprocess

def install_packages(packages):
    for package in packages:
        try:
            __import__(package.split("==")[0].replace("-", "_"))
            print(f"{package} is already installed.")
        except ImportError:
            print(f"Installing {package}...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])
            print(f"Successfully installed {package}")

required_packages = [
    "langchain", 
    "langchain-community", 
    "tiktoken", 
    "pdfplumber", 
    "ipywidgets", 
    "faiss-cpu", 
    "sentence-transformers",
    "requests"
]

install_packages(required_packages)

langchain is already installed.
langchain-community is already installed.
tiktoken is already installed.
pdfplumber is already installed.
ipywidgets is already installed.
Installing faiss-cpu...
Collecting faiss-cpu
  Using cached faiss_cpu-1.11.0-cp311-cp311-macosx_14_0_arm64.whl.metadata (4.8 kB)
Collecting numpy<3.0,>=1.25.0 (from faiss-cpu)
  Using cached numpy-2.2.5-cp311-cp311-macosx_14_0_arm64.whl.metadata (62 kB)
Using cached faiss_cpu-1.11.0-cp311-cp311-macosx_14_0_arm64.whl (3.3 MB)
Using cached numpy-2.2.5-cp311-cp311-macosx_14_0_arm64.whl (5.4 MB)
Installing collected packages: numpy, faiss-cpu
  Attempting uninstall: numpy
    Found existing installation: numpy 1.23.5
    Uninstalling numpy-1.23.5:
      Successfully uninstalled numpy-1.23.5


[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sentence-transformers 4.1.0 requires huggingface-hub>=0.20.0, but you have huggingface-hub 0.17.3 which is incompatible.
sentence-transformers 4.1.0 requires transformers<5.0.0,>=4.41.0, but you have transformers 4.34.0 which is incompatible.[0m[31m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Successfully installed faiss-cpu-1.11.0 numpy-2.2.5
Successfully installed faiss-cpu


RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
cannot import name 'is_torch_tpu_available' from 'transformers.utils' (/Users/td3003/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/utils/__init__.py)

In [25]:
# Import necessary libraries
import os
import pdfplumber #for data extraction from the PDf
import tiktoken #for tokenising text
import time
import textwrap #for interface
import ipywidgets as widgets #for interface
import IPython                                                          #Interactive Python Shell
from IPython.display import display, Markdown
import requests #for API calls to Ollama

In [26]:
# LangChain imports
from langchain_community.embeddings import HuggingFaceEmbeddings         #text to numerical vectors - embeddings
from langchain.vectorstores import FAISS                                 #similarity search for vectors
from langchain.text_splitter import RecursiveCharacterTextSplitter       #splits large text chunks into smaller
from langchain_community.llms import Ollama                              #imports Ollama integration
from langchain.chains import RetrievalQA                                 #pre-built chain for document retrieval and question answering
from langchain.prompts import PromptTemplate                             #PromptTemplates for LangChain - Persona, Task, Communication
from langchain.memory import ConversationBufferMemory                    #Memory Handling for LangChain
from langchain.schema.runnable import RunnableMap, RunnableSequence      #Schema mapping and sequence for LangChain
from langchain.schema.output_parser import StrOutputParser               #Output parser for LangChain
from langchain_community.chat_models import ChatOllama                   #Chat model for Ollama

In [27]:
# Ollama API setup
OLLAMA_BASE_URL = "http://localhost:11434" # Default Ollama URL
# If using a remote Ollama server, change the URL above

In [29]:
# Verify Ollama is running and has DeepSeek model pulled
def check_ollama():
    try:
        # Check if Ollama is running
        response = requests.get(f"{OLLAMA_BASE_URL}/api/tags")
        if response.status_code != 200:
            print("Error: Ollama server is not running. Please start Ollama.")
            return False
            
        # Check if DeepSeek model is available
        models = response.json().get("models", [])
        deepseek_models = [model for model in models if "deepseek" in model["name"].lower()]
        
        if not deepseek_models:
            print("DeepSeek model not found. Pulling the model (this may take some time)...")
            # Pull the model
            pull_response = requests.post(
                f"{OLLAMA_BASE_URL}/api/pull",
                json={"name": "deepseek-coder:6.7b"}
            )
            if pull_response.status_code != 200:
                print(f"Error pulling model: {pull_response.text}")
                return False
            print("DeepSeek model pulled successfully.")
        else:
            print(f"DeepSeek model(s) available: {[model['name'] for model in deepseek_models]}")
        return True
    except requests.exceptions.ConnectionError:
        print("Error: Cannot connect to Ollama server. Please make sure Ollama is running.")
        return False

# Run the check
ollama_ready = check_ollama()

DeepSeek model(s) available: ['deepseek-r1:latest', 'deepseek-coder-v2:latest', 'deepseek-coder:latest']


In [30]:
import warnings
warnings.filterwarnings('ignore')

In [31]:
# Function to extract text from PDF
def extract_text_from_pdf(pdf_path):
    """Extract text content from a PDF file."""
    text = ""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            extracted = page.extract_text()
            if extracted:  # Avoid NoneType errors
                text += extracted + "\n"
    return text

In [32]:
# Upload PDF file using local file dialog
import tkinter as tk
from tkinter import filedialog

# Create and hide the root window
root = tk.Tk()
root.withdraw()

print("Please select your PDF document:")
pdf_path = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])

if pdf_path:
    pdf_filename = os.path.basename(pdf_path)
    print(f"Selected file: {pdf_filename}")
else:
    print("No file selected. Please run this cell again.")

Please select your PDF document:


2025-05-05 13:47:32.831 python[21440:35175973] +[IMKInputSession subclass]: chose IMKInputSession_Modern


Selected file: topologicpy-readthedocs-io-en-latest (1).pdf


In [33]:
# Extract text from the uploaded PDF
try:
    pdf_text = extract_text_from_pdf(pdf_path)
    print(f"Successfully extracted text from {pdf_filename}")
except Exception as e:
    print(f"Error extracting text from PDF: {e}")

CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, defaulting to MediaBox
CropBox missing from /Page, def

Successfully extracted text from topologicpy-readthedocs-io-en-latest (1).pdf


In [38]:
# Split the document into chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
documents = text_splitter.create_documents([pdf_text])

# Create vector embeddings and store in FAISS
# Using Hugging Face embeddings instead of OpenAI embeddings
embeddings = HuggingFaceEmbeddings() #model_name="sentence-transformers/all-MiniLM-L6-v2"
vectorstore = FAISS.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever()

# Initialize the chat model using Ollama with DeepSeek
llm = ChatOllama(
    model="deepseek-coder:6.7b",  # DeepSeek coder model
    temperature=0.7,
    base_url=OLLAMA_BASE_URL,
    streaming=False  # Set to True if you want streaming responses
)

RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
cannot import name 'is_torch_tpu_available' from 'transformers.utils' (/Users/td3003/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/utils/__init__.py)

In [35]:
# Create the RAG (Retrieval Augmented Generation) chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

NameError: name 'llm' is not defined

In [36]:
# Define the prompt template for our Topologic ChatBot
#customise the Persona, Task, Communication, Context as needed. test whether deep/specific details work better.                                              

prompt_template = PromptTemplate.from_template("""
<Persona>
You are a very technical python coder with expertise in geometry and topology, and in particular topologicPy and Industrial Foundation Classes strategies.
</Persona>

<Task>
The conversation is about helping junior python coders to develop code using Topologicpy API
Please use few-shot strategy to benchmark your responses when the questions are difficult.
Use the retrieved context when it's relevant to answer the user's question.
Communicate sources for your answers when needed.
</Task>

<Communication>
Respond in detailed python code and with deatialed comments and explanations.
Keep the dialogue on track.
Never reveal you are an AI or LLM.
If questioned further provide an explanation of at least one paragraph with ten sentences as an explanation of your thinking.
Ensure your answers are data-driven when possible, drawing from the context provided.
</Communication>

<Context>
{context}
</Context>

Conversation history:
{history}

User: {user_input}
ChatBot:
""")

The next part develops the interface. The chat returns input after hitting return on the keyboard. Be patient with it as it might take time since we're running DeepSeek locally.

In [37]:
# Initialize memory for conversation history
memory = ConversationBufferMemory(return_messages=True, max_token_limit=500)

# Create a function to process user input using both RAG and the conversational prompt
def process_user_input(user_input, history):
    # First, use RAG to retrieve relevant context
    rag_response = rag_chain({"query": user_input})
    relevant_context = rag_response.get("result", "")

    # Format the conversation history
    formatted_history = "\n".join([f"User: {h['user_input']}\nChatBot: {h['assistant']}" for h in history])

    # Use the prompt template with the retrieved context
    response = llm.invoke(prompt_template.format(
        context=relevant_context,
        history=formatted_history,
        user_input=user_input
    ))

    return response.content

# UI Elements
chat_output = widgets.Output()
user_input_box = widgets.Textarea(
    placeholder="Enter your message here...",
    description="User:",
    style={'description_width': 'initial'},
    layout=widgets.Layout(width="80%", height="50px")
)
end_chat_button = widgets.Button(description="End Chat Session", button_style="danger")

# Display UI Elements
display(chat_output, user_input_box, end_chat_button)

# Initialize conversation history
history = []

# Open log file in append mode
log_filename = "rag_chat_history.txt"
log_file = open(log_filename, "a", encoding="utf-8")

# Show initial message
with chat_output:
    print("Welcome! I'm your TopologicPy assistant. How can I help you today?")

def handle_input():
    """Handles user input when Enter is pressed."""
    user_input = user_input_box.value.strip()

    if not user_input:
        return  # Ignore empty input

    if user_input.lower() in ["exit", "quit"]:
        stop_chat()
        return

    # Display user input
    with chat_output:
        print(f"User: {user_input}")

    # Process response using combined approach
    try:
        response_text = process_user_input(user_input, history)
        wrapped_response = textwrap.fill(response_text, width=120)

        with chat_output:
            print(f"ChatBot: {wrapped_response}")

        # Update conversation history
        history.append({"user_input": user_input, "assistant": response_text})

        # Log conversation to file
        log_file.write(f"User: {user_input}\n")
        log_file.write(f"ChatBot: {response_text}\n\n")
        log_file.flush()  # Ensure data is written immediately

    except Exception as e:
        with chat_output:
            print(f"Error: {e}")

    # Clear input box for next message
    user_input_box.value = ""

def handle_keypress(change):
    """Detect Enter and submit input."""
    if change["name"] == "value" and change["new"].endswith("\n"):  # Detect newlines
        handle_input()

def stop_chat(_=None):
    """Ends the chat, saves dialogue history and closes the log file"""
    global log_file
    log_file.close()  # Close file properly

    with chat_output:
        print("\nGoodbye! Chat history saved to 'rag_chat_history.txt'.")

    disable_input()

def disable_input():
    """Disables input box and chat button after chat ends."""
    user_input_box.close()
    end_chat_button.disabled = True

# Bind buttons and input events
end_chat_button.on_click(stop_chat)

# Attach event listener for Enter
user_input_box.observe(handle_keypress, names="value")

# Function to analyze token usage (for debugging/optimization)
def count_tokens(text, encoding_name="cl100k_base"):
    try:
        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except:
        # Fallback method if tiktoken doesn't support the encoding
        return len(text.split())

Output()

Textarea(value='', description='User:', layout=Layout(height='50px', width='80%'), placeholder='Enter your mes…

Button(button_style='danger', description='End Chat Session', style=ButtonStyle())

## Setting Up Your Local Environment

### Prerequisites

1. **Python Environment:** Make sure you have Python 3.8+ installed
2. **Jupyter Notebook:** Should be installed and running locally
3. **Ollama Setup:**

### Installing and Setting Up Ollama

Before running this notebook, you need to have Ollama installed and running on your system:

1. **Install Ollama**:
   - Download and install from [https://ollama.ai/](https://ollama.ai/)
   - Available for Windows, macOS, and Linux
   
2. **Start Ollama**:
   - Windows: Run Ollama from the Start menu
   - macOS: Open Ollama from Applications
   - Linux: Run `ollama serve` in terminal
   
3. **Pull the DeepSeek model**:
   - Open a terminal or command prompt
   - Run: `ollama pull deepseek-coder:6.7b`
   - This will download the model (approximately 4-5GB)

### Using Different DeepSeek Models

Ollama supports various DeepSeek models. You can change the model by modifying the `model` parameter in the `ChatOllama` initialization:

- `deepseek-coder:6.7b` - A smaller coding-focused model (recommended)
- `deepseek-coder:33b` - A larger coding-focused model (requires 16GB+ RAM)
- `deepseek-llm:7b` - A general-purpose model

### Hardware Requirements

- **Minimum**: 8GB RAM, decent CPU
- **Recommended**: 16GB RAM, modern CPU, NVIDIA GPU with 6GB+ VRAM for faster inference

### Troubleshooting

- **Connection Error**: Make sure Ollama is running (default port 11434)
- **Memory Issues**: Try using a smaller model variant or close other applications
- **Model Not Loading**: Check Ollama logs for detailed error messages

### LangChain Documentation

For more information on using LangChain with Ollama, see:
https://python.langchain.com/docs/integrations/llms/ollama


## Implementation Details

### Components Used
- **LLM**: DeepSeek model with Ollama for local inference
- **Embeddings**: Hugging Face Sentence Transformers for local vector embeddings
- **Vector Store**: FAISS for efficient similarity search
- **Framework**: LangChain for RAG workflow management
- **PDF Processing**: pdfplumber for document extraction

### Technical Notes
- All processing happens locally - your data never leaves your machine
- The text is split into chunks of 500 characters with 50 character overlap
- FAISS creates a searchable index of document vectors
- DeepSeek's coder model is specialized for programming questions
- The conversation history is limited to prevent context overflow

### Performance Tips
- First response may be slow while the model loads into memory
- Subsequent responses will be faster
- If responses are too slow, try reducing the temperature parameter
- For better performance, ensure Ollama can access your GPU if available
