## Ollama Integration

This notebook integrates a local LLM, served through Ollama, with our memory system.



## Step 1: Install Ollama

Before we proceed, you need to install Ollama on your system.

### Installation Command:
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

For Linux/WSL (using `wget` instead of `curl`):
```bash
wget -qO- https://ollama.ai/install.sh | sh
```

Now verify the installation:
```bash
ollama --version
```

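The same check can be scripted from Python, which is handy inside a notebook. This is a minimal sketch using only the standard library; the helper name `ollama_cli_version` is hypothetical, and it assumes the `ollama` binary is on your `PATH`.

```python
import shutil
import subprocess


def ollama_cli_version():
    """Return the output of `ollama --version`, or None if the CLI is not found."""
    # shutil.which looks the executable up on PATH, like the shell does
    if shutil.which("ollama") is None:
        return None
    try:
        result = subprocess.run(
            ["ollama", "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Fall back to stderr in case nothing was written to stdout
    return (result.stdout or result.stderr).strip() or None


version = ollama_cli_version()
print(version if version else "Ollama CLI not found on PATH")
```
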
## Step 2: Install Ollama Python Client

Now we will install the Python client library to communicate with Ollama from our code.

### Installation Command:
```bash
pip install ollama
```

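To confirm the client landed in the environment your notebook kernel actually uses, you can query the package metadata. A small sketch with the standard library only; `client_version` is a hypothetical helper name, and it assumes the package is distributed under the same name it is imported as (`ollama`).

```python
from importlib import metadata, util


def client_version(package="ollama"):
    """Return the installed version of `package`, or None if it is not importable."""
    # find_spec checks importability without actually importing the module
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found under this name
        return "unknown"


installed = client_version()
if installed:
    print(f"ollama client version: {installed}")
else:
    print("ollama client is not installed in this environment")
```
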
## Step 3: Download LLM Model

Now let's download a language model to use with our memory system.

### Recommended Model (Lightweight):
```bash
ollama pull llama3.2:3b
```

### Verify Model Installation

```bash
ollama list
```

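The same listing is available from Python through Ollama's local REST API. The sketch below assumes the server is running on its default address, `http://localhost:11434`, and that `GET /api/tags` returns a JSON object with a `models` list whose entries carry a `name` field. The helper names `fetch_tags` and `model_names` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address of the local server


def fetch_tags(base_url=OLLAMA_URL, timeout=3):
    """Return the parsed /api/tags payload, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers bad JSON
        return None


def model_names(payload):
    """Extract model names from a /api/tags payload."""
    models = (payload or {}).get("models") or []
    return [m["name"] for m in models if "name" in m]


payload = fetch_tags()
if payload is None:
    print("Ollama server not reachable; is it running?")
else:
    names = model_names(payload)
    print("llama3.2:3b installed:", "llama3.2:3b" in names)
```
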
### Test the Model
```bash
ollama run llama3.2:3b
# Type a test message, then enter /bye to exit.
```

In [5]:
# Quick test of the model from Python

import ollama

print("Testing Llama 3.2 model...")

try:
    # Send a single user message to confirm the model responds
    response = ollama.chat(
        model='llama3.2:3b',
        messages=[
            {
                'role': 'user',
                'content': 'Hello! who are you?'
            }
        ]
    )
    
    print("Model is working!")
    print("="*200)
    print(f"Response: {response['message']['content']}")
    
except Exception as e:
    print(f"Error: {e}")

Testing Llama 3.2 model...
Model is working!
Response: Hello! I'm an artificial intelligence model, which means I'm a computer program designed to simulate conversations and answer questions to the best of my knowledge. I don't have a personal identity or physical presence, but I'm here to help and provide information on a wide range of topics.

I can assist with tasks such as:

* Answering questions
* Generating text
* Translating languages
* Summarizing content
* And more!

Feel free to ask me anything, and I'll do my best to help. How can I assist you today?


## Step 4: Connect Ollama with Memory Database

Now we'll integrate our LLM (Ollama) with the memory system we built in the setup_vectordb notebook.

### What we're going to do:
1. **Import ChromaDB** - Same as setup_vectordb notebook to access our memory
2. **Connect to existing collection** - Get our stored conversations
3. **Create memory-aware function** - Combine memory search + LLM response
4. **Test the integration** - See how LLM uses past conversations

### Code for the steps above (most of it repeats what we did in the setup_vectordb notebook)

In [15]:
# Step 4A: Import required libraries
import chromadb
import ollama
from datetime import datetime
import json

print("Connecting Ollama with Memory Database...")

# Step 4B: Connect to existing persistent ChromaDB from Phase 1
client = chromadb.PersistentClient(path="./chroma_db")

# Get the existing collection created in Phase 1
collection = client.get_collection(name="conversation_memory")

print("✅ Connected to existing memory collection!")
print(f" Total memories available: {collection.count()}")
print(f" Collection name: {collection.name}")

print("\n" + "="*100)
print(" Memory-LLM Integration Setup Complete!")


Connecting Ollama with Memory Database...
✅ Connected to existing memory collection!
 Total memories available: 10
 Collection name: conversation_memory

 Memory-LLM Integration Setup Complete!

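One pitfall to watch for: to our knowledge, `chromadb.PersistentClient` creates a new, empty store when the given path does not exist, so running this notebook from the wrong working directory would only surface when `get_collection` fails. A minimal pre-flight sketch, assuming the store lives at `./chroma_db` as above; `db_path_ready` is a hypothetical helper, not part of ChromaDB.

```python
from pathlib import Path


def db_path_ready(path="./chroma_db"):
    """Return True if `path` is an existing, non-empty directory."""
    p = Path(path)
    # any() stops at the first entry, so this stays cheap for large stores
    return p.is_dir() and any(p.iterdir())


if db_path_ready():
    print("Found an existing store at ./chroma_db")
else:
    print("No store at ./chroma_db; run the setup_vectordb notebook first")
```

Run this before creating the client, since afterwards the directory may already exist.
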

## Create Memory-Aware Chat Function
### What we will build now:

1. **Search function** - Find relevant past conversations
2. **Context builder** - Format memory for LLM prompt
3. **Memory-aware chat** - LLM that uses conversation history
4. **Test it** - See how memory improves responses

In [18]:
def search_memory(query, top_k=3):
    """
    Search for relevant past conversations based on the query
    """
    print(f" Searching memory for: '{query}'")
    
    # Search for similar conversations
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )
    
    print(f" Found {len(results['documents'][0])} relevant memories")
    return results

def format_memory_context(search_results):
    """
    Format search results into context for the LLM
    """
    if not search_results['documents'][0]:
        return "No relevant conversation history found."
    
    context = "=== RELEVANT CONVERSATION HISTORY ===\n\n"
    
    for i, (doc, metadata) in enumerate(zip(
        search_results['documents'][0], 
        search_results['metadatas'][0]
    )):
        metadata = metadata or {}  # guard against entries stored without metadata
        context += f"Memory {i+1}:\n"
        context += f"Content: {doc}\n"
        context += f"Topic: {metadata.get('topic', 'Unknown')}\n"
        context += f"User Skill Level: {metadata.get('user_level', 'Unknown')}\n"
        context += f"Technology: {metadata.get('tech_stack', 'Unknown')}\n"
        context += f"Timestamp: {metadata.get('timestamp', 'Unknown')}\n"
        context += "-" * 50 + "\n\n"
    
    return context

def memory_chat(user_message, use_memory=True):
    """
    Chat with LLM using conversation memory for context
    """
    print("="*100)
    print(f" User: {user_message}")
    print("="*100)
    
    # Build a detailed, context-aware prompt
    if use_memory:
        # Search for relevant memories
        memory_results = search_memory(user_message)
        memory_context = format_memory_context(memory_results)
        
        # System prompt enriched with the retrieved memory context
        system_prompt = f"""You are an advanced AI assistant with persistent memory capabilities. Your primary objective is to provide highly personalized, contextually-aware responses based on the user's conversation history and demonstrated preferences.

{memory_context}

CORE INSTRUCTIONS:
1. MEMORY UTILIZATION: Always reference and build upon the provided conversation history to demonstrate continuity and understanding of the user's background, interests, and expertise levels.

2. PERSONALIZATION: Adapt your communication style, technical depth, and examples to match the user's demonstrated skill level and interests from previous conversations.

3. CONTEXTUAL AWARENESS: Connect current questions to past discussions when relevant, showing how topics relate to the user's ongoing projects or learning journey.

4. CONSISTENCY: Maintain awareness of the user's preferences, previously discussed technologies, and established context across all interactions.

5. PROGRESSIVE LEARNING: Build upon previous conversations to offer increasingly sophisticated insights and recommendations that align with the user's growing expertise.

6. RELEVANCE FILTERING: Only reference historical context that is directly relevant to the current query - avoid overwhelming responses with unnecessary background information.

7. ACKNOWLEDGE LIMITATIONS: If the conversation history doesn't contain relevant information for the current query, explicitly state this and provide the best possible response based on available context.

RESPONSE REQUIREMENTS:
- Be specific and actionable
- Reference relevant past discussions naturally  
- Maintain professional yet personable tone
- Provide depth appropriate to user's demonstrated skill level
- Offer practical next steps when applicable
- Use examples that align with user's known interests and experience level

TECHNICAL CONTEXT AWARENESS:
- Consider the user's demonstrated expertise with specific technologies
- Build upon their known project context and learning goals
- Suggest resources appropriate to their skill progression
- Connect new concepts to their established knowledge base

Respond to the user's current message while seamlessly integrating insights from their conversation history to provide the most helpful and contextually appropriate response possible."""
    else:
        system_prompt = """You are a helpful AI assistant. Provide clear, accurate, and contextually appropriate responses to user queries. Be professional, concise, and ensure your answers are actionable and well-structured."""
    
    # Chat with Ollama
    try:
        response = ollama.chat(
            model='llama3.2:3b',
            messages=[
                {'role': 'system', 'content': system_prompt},
                {'role': 'user', 'content': user_message}
            ]
        )
        
        ai_response = response['message']['content']
        print(f" AI Response:\n{ai_response}")
        
        return ai_response
        
    except Exception as e:
        print(f" Error: {e}")
        return None

print("✅ Memory-aware chat function created!")


✅ Memory-aware chat function created!
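
Because `collection.query` returns up to `top_k` nearest entries regardless of how close they are, weakly related memories can end up in the prompt. One possible refinement, sketched below, drops results whose distance exceeds a cutoff before formatting. It assumes the result dictionary includes a `distances` list alongside `documents` and `metadatas` (ChromaDB's default, to our knowledge). The helper name `filter_by_distance`, the cutoff of `1.0`, and the sample data are all made up for illustration.

```python
def filter_by_distance(results, max_distance=1.0):
    """Keep only hits whose distance is at most `max_distance`.

    `results` follows the nested-list layout used by `search_memory`:
    one inner list per query text.
    """
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    kept = [(d, m, s) for d, m, s in zip(docs, metas, dists) if s <= max_distance]
    return {
        "documents": [[d for d, _, _ in kept]],
        "metadatas": [[m for _, m, _ in kept]],
        "distances": [[s for _, _, s in kept]],
    }


# Hand-written sample in the same shape, not real query output
sample = {
    "documents": [["User asked about FastAPI routing", "User likes hiking"]],
    "metadatas": [[{"topic": "web"}, {"topic": "hobbies"}]],
    "distances": [[0.42, 1.73]],
}

filtered = filter_by_distance(sample, max_distance=1.0)
print(filtered["documents"][0])  # ['User asked about FastAPI routing']
```

The filtered dictionary can be passed to `format_memory_context` unchanged, since it keeps the `documents` and `metadatas` keys that function reads. A suitable cutoff depends on the embedding model and distance metric, so it needs tuning on your own data.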
