# Python Foundations: Functions vs Classes

## Why Learn This?

Before building production RAG systems (Part 2), you need to understand:
- **Functions**: Reusable code blocks
- **Classes**: Blueprints that bundle data + functions together

This notebook uses **AI/ML examples** so it connects to your RAG learning!

---

# Part 1: Functions - The Basics

## What is a Function?

A **function** is a reusable block of code that:
1. Takes **inputs** (parameters)
2. Does **something** (logic)
3. Returns **outputs** (results)

**Think of it like a machine:** Input → Process → Output

## Example 1: Simple Function

In [1]:
def greet(name):
    """A simple function that greets someone."""
    message = f"Hello, {name}!"
    return message

# Use the function
result = greet("Alice")
print(result)

result2 = greet("Bob")
print(result2)

Hello, Alice!
Hello, Bob!


**How it works:**
1. `def greet(name):` - Define function called "greet" that takes "name" as input
2. `message = ...` - Do something with the input
3. `return message` - Send back the result
4. `greet("Alice")` - Call the function with "Alice"
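The `return` step matters: a function that has no `return` statement hands back `None`. Here is a small sketch to make that visible. `greet_no_return` is a hypothetical name used only for this illustration.

```python
def greet_no_return(name):
    """Builds a greeting but never returns it (hypothetical example)."""
    message = f"Hello, {name}!"
    # No return statement, so Python returns None implicitly


def greet(name):
    """Same logic, but the result is sent back to the caller."""
    message = f"Hello, {name}!"
    return message


print(greet_no_return("Alice"))  # None
print(greet("Alice"))            # Hello, Alice!
```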

## Example 2: Function with Multiple Parameters (AI Example)

In [9]:
def calculate_similarity(score1, score2):
    """
    Calculate average similarity between two scores.
    
    Args:
        score1: First similarity score
        score2: Second similarity score
        
    Returns:
        Average of the two scores
    """
    average = (score1 + score2) / 2
    return average

# Use it
result = calculate_similarity(0.85, 0.92)
print(f"Average similarity: {result}")

result2 = calculate_similarity(0.45, 0.67)
print(f"Average similarity: {result2}")

Average similarity: 0.885
Average similarity: 0.56

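Arguments can be passed by position (as above) or by parameter name. This sketch reuses the same averaging logic to show that keyword arguments are matched by name, so their order does not matter.

```python
def calculate_similarity(score1, score2):
    """Average of two similarity scores."""
    return (score1 + score2) / 2


# Positional: values are matched to parameters by order
by_position = calculate_similarity(0.85, 0.92)

# Keyword: values are matched by parameter name, so order does not matter
by_name = calculate_similarity(score2=0.92, score1=0.85)

print(round(by_position, 3))   # 0.885
print(by_position == by_name)  # True
```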

## Example 3: Function Returning Multiple Values

In [10]:
def analyze_query(query):
    """
    Analyze a query and return multiple pieces of information.
    
    Returns:
        tuple: (word_count, character_count, is_question)
    """
    word_count = len(query.split())
    char_count = len(query)
    is_question = query.endswith('?')
    
    return word_count, char_count, is_question

# Use it
query = "What is machine learning?"
words, chars, question = analyze_query(query)

print(f"Query: {query}")
print(f"Words: {words}")
print(f"Characters: {chars}")
print(f"Is question: {question}")

Query: What is machine learning?
Words: 4
Characters: 25
Is question: True

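As the docstring says, the "multiple values" are really one tuple. This sketch (a condensed version of `analyze_query`) shows that you can keep the tuple as a single variable, index into it, or unpack it.

```python
def analyze_query(query):
    """Return (word_count, character_count, is_question) as one tuple."""
    return len(query.split()), len(query), query.endswith("?")


info = analyze_query("What is machine learning?")
print(type(info).__name__)  # tuple
print(info)                 # (4, 25, True)
print(info[0])              # 4  (index 0 is word_count)

# Unpacking needs exactly as many names as there are returned values
words, chars, question = info
```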

## Example 4: Functions Calling Other Functions (Like RAG!)

In [11]:
def embed_text(text):
    """Simulate embedding text (fake numbers for demo)."""
    # A real system would call an embedding model here (e.g. SentenceTransformer)
    embedding = [len(text), text.count('a'), text.count('e')]
    return embedding

def calculate_similarity_score(emb1, emb2):
    """Calculate simple similarity between two embeddings."""
    # Simple difference (in real life: cosine similarity)
    difference = sum(abs(a - b) for a, b in zip(emb1, emb2))
    similarity = 1.0 / (1.0 + difference)  # Convert to 0-1 score
    return similarity

def search_documents(query, documents):
    """
    Search documents for the most similar one.
    This function CALLS the other two functions!
    """
    # Embed the query
    query_emb = embed_text(query)
    
    # Find best match
    best_score = 0
    best_doc = None
    
    for doc in documents:
        doc_emb = embed_text(doc)
        score = calculate_similarity_score(query_emb, doc_emb)
        
        if score > best_score:
            best_score = score
            best_doc = doc
    
    return best_doc, best_score

# Use it (mini RAG!)
docs = [
    "Machine learning uses data",
    "Deep learning uses neural networks",
    "Python is a programming language"
]

query = "What is deep learning?"
best_match, score = search_documents(query, docs)

print(f"Query: {query}")
print(f"Best match: {best_match}")
print(f"Similarity: {score:.3f}")

Query: What is deep learning?
Best match: Machine learning uses data
Similarity: 0.143


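**Note:** The fake `embed_text` only measures text length and counts the letters `a` and `e`, so the "best match" above is not the semantically closest document. A real embedding model is what makes retrieval meaningful.

**Key Point:** Functions can call other functions, just like your RAG pipeline!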
```
retrieve_relevant_docs() → calls → create_embeddings()
generate_answer() → uses results from → retrieve_relevant_docs()
```
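The comment in `calculate_similarity_score` mentions cosine similarity as the real-world choice. Below is a minimal pure-Python sketch of that formula, dot(a, b) / (|a| * |b|). `cosine_similarity` is a name chosen here, not a library function; in practice you would typically compute this with NumPy or your embedding library. Unlike the toy 0-1 score, cosine similarity ranges from -1 to 1.

```python
import math


def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(vec1) != len(vec2):
        raise ValueError("Vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # Treat zero vectors as having no similarity
    return dot / (norm1 * norm2)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # 0.0 (perpendicular)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```

Because cosine similarity only looks at direction, a long and a short text with the same "shape" of embedding score as similar, which is usually what you want for retrieval.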

---

# Part 2: The Problem with Just Functions

## Scenario: Building a Simple RAG System

In [12]:
# Our knowledge base
knowledge_base = [
    "Machine learning uses algorithms",
    "Deep learning uses neural networks",
    "AI is transforming the world"
]

# We embed them once
kb_embeddings = [embed_text(doc) for doc in knowledge_base]

print("Knowledge base ready!")
print(f"Documents: {len(knowledge_base)}")
print(f"Embeddings: {len(kb_embeddings)}")

Knowledge base ready!
Documents: 3
Embeddings: 3


In [14]:
# Now let's query it multiple times
# Notice the problem: we have to pass kb and kb_embeddings EVERY TIME!

def query_rag(question, knowledge_base, kb_embeddings):
    """Query the RAG system."""
    query_emb = embed_text(question)
    
    best_score = 0
    best_doc = None
    
    for doc, doc_emb in zip(knowledge_base, kb_embeddings):
        score = calculate_similarity_score(query_emb, doc_emb)
        if score > best_score:
            best_score = score
            best_doc = doc
    
    return best_doc, best_score

# Query 1
answer1, score1 = query_rag("What is ML?", knowledge_base, kb_embeddings)
print(f"Q1: {answer1} (score: {score1:.3f})")

# Query 2 - MUST pass the same parameters again!
answer2, score2 = query_rag("What is DL?", knowledge_base, kb_embeddings)
print(f"Q2: {answer2} (score: {score2:.3f})")

# Query 3 - Again! So repetitive!
answer3, score3 = query_rag("Tell me about AI", knowledge_base, kb_embeddings)
print(f"Q3: {answer3} (score: {score3:.3f})")

Q1: AI is transforming the world (score: 0.053)
Q2: AI is transforming the world (score: 0.053)
Q3: AI is transforming the world (score: 0.071)


## The Problems:

1. ❌ **Repetitive** - Always passing `knowledge_base` and `kb_embeddings`
2. ❌ **Error-prone** - Easy to pass the wrong data
3. ❌ **Hard to manage** - If we add more data (model configs, settings), we need even more parameters!
4. ❌ **Messy** - Data and functions are separate

**Solution:** Use a CLASS! 🎯

---

# Part 3: Classes - The Solution

## What is a Class?

A **class** is a blueprint that bundles:
- **Data** (variables) - called "attributes"
- **Functions** (methods) - things you can do with that data

**Think of it like:**
- A class is a **cookie cutter** 🍪
- An object (instance) is the **actual cookie**

## Simple Example: A Person Class

In [11]:
class Person:
    """A simple class representing a person."""
    
    def __init__(self, name, age):
        """Initialize a person with name and age."""
        # 'self' refers to THIS specific person
        self.name = name  # Store name
        self.age = age    # Store age
    
    def greet(self):
        """Make the person introduce themselves."""
        return f"Hi, I'm {self.name} and I'm {self.age} years old!"
    
    def have_birthday(self):
        """Increase age by 1."""
        self.age += 1
        return f"Happy birthday! {self.name} is now {self.age}!"

# Create two different people (objects/instances)
alice = Person("Alice", 25)
bob = Person("Bob", 30)

# Each person has their own data
print(alice.greet())
print(bob.greet())

# Each person can have actions
print(alice.have_birthday())
print(alice.greet())  # Alice is now 26!
print(bob.greet())    # Bob is still 30!

Hi, I'm Alice and I'm 25 years old!
Hi, I'm Bob and I'm 30 years old!
Happy birthday! Alice is now 26!
Hi, I'm Alice and I'm 26 years old!
Hi, I'm Bob and I'm 30 years old!


In [4]:
class Hockey:
    """A hockey team with a name and a home city."""

    def __init__(self, team_name, city):
        self.team_name = team_name
        self.city = city

    def greet_team(self):
        return f"Welcome to the {self.team_name}, hope you have a great season!"

    def tactics(self):
        return f"The tactic we use is basic defense and counter-attack; this is the approach of the {self.team_name} based in {self.city}."

# Create two hockey team instances
virajpet = Hockey("Virajpet Tigers", "Kodagu")
madikeri = Hockey("Madikeri Lions", "Kodagu")

print(virajpet.greet_team())
print(virajpet.tactics())

print(madikeri.greet_team())
print(madikeri.tactics())


Welcome to the Virajpet Tigers, hope you have a great season!
The tactic we use is basic defense and counter-attack; this is the approach of the Virajpet Tigers based in Kodagu.
Welcome to the Madikeri Lions, hope you have a great season!
The tactic we use is basic defense and counter-attack; this is the approach of the Madikeri Lions based in Kodagu.


## How it Works:

1. **`class Person:`** - Define the blueprint
2. **`def __init__(self, ...):`** - Constructor (runs when you create a person)
3. **`self.name = name`** - Store data inside the object
4. **`def greet(self):`** - Method (function that belongs to the class)
5. **`alice = Person(...)`** - Create an actual person (instance)
6. **`alice.greet()`** - Call the method on that specific person

**Key Point:** `alice` and `bob` are **separate** - they have their own data!
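Two things are happening behind the scenes: calling `alice.greet()` passes `alice` in as `self`, and each instance stores its own attributes. This sketch uses a trimmed copy of `Person` so it runs on its own.

```python
class Person:
    """Trimmed copy of the Person class above, for a self-contained demo."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        return f"Hi, I'm {self.name} and I'm {self.age} years old!"


alice = Person("Alice", 25)
bob = Person("Bob", 30)

# Calling a method on an object passes that object in as `self`,
# so these two lines do the same thing
print(alice.greet())
print(Person.greet(alice))

# Each instance keeps its own attributes
alice.age += 1
print(alice.age, bob.age)  # 26 30
print(alice is bob)        # False (two separate objects)
```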

---

# Part 4: Classes for AI/ML

## Example: A Simple Embedding System

In [15]:
class EmbeddingSystem:
    """A class to manage text embeddings."""
    
    def __init__(self, model_name="simple"):
        """Initialize the embedding system."""
        self.model_name = model_name
        self.documents = []        # Store documents
        self.embeddings = []       # Store embeddings
        print(f"✅ EmbeddingSystem created with model: {model_name}")
    
    def add_document(self, text):
        """Add a document and create its embedding."""
        self.documents.append(text)
        embedding = embed_text(text)  # Use our function from before
        self.embeddings.append(embedding)
        print(f"✅ Added document: {text[:50]}...")
    
    def search(self, query):
        """Search for most similar document."""
        query_emb = embed_text(query)
        
        best_score = 0
        best_doc = None
        
        for doc, doc_emb in zip(self.documents, self.embeddings):
            score = calculate_similarity_score(query_emb, doc_emb)
            if score > best_score:
                best_score = score
                best_doc = doc
        
        return best_doc, best_score
    
    def show_stats(self):
        """Show statistics about the system."""
        return f"Model: {self.model_name}, Documents: {len(self.documents)}"

# Use it!
system = EmbeddingSystem(model_name="demo-v1")

# Add documents
system.add_document("Machine learning uses algorithms")
system.add_document("Deep learning uses neural networks")
system.add_document("Python is great for AI")

# Show stats
print(f"\n{system.show_stats()}")

# Search (notice: no need to pass documents/embeddings!)
result, score = system.search("What is deep learning?")
print(f"\nSearch result: {result}")
print(f"Score: {score:.3f}")

✅ EmbeddingSystem created with model: demo-v1
✅ Added document: Machine learning uses algorithms...
✅ Added document: Deep learning uses neural networks...
✅ Added document: Python is great for AI...

Model: demo-v1, Documents: 3

Search result: Python is great for AI
Score: 0.250

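`EmbeddingSystem` creates `self.documents = []` inside `__init__`, so every system gets its own list. A common mistake is to define the list at class level instead, where all instances share one list. This sketch contrasts the two; `SharedStore` and `SeparateStore` are hypothetical names for illustration.

```python
class SharedStore:
    documents = []  # Class attribute: ONE list shared by all instances

    def add_document(self, text):
        self.documents.append(text)  # Mutates the shared class-level list


class SeparateStore:
    def __init__(self):
        self.documents = []  # Instance attribute: a new list per object

    def add_document(self, text):
        self.documents.append(text)


a, b = SharedStore(), SharedStore()
a.add_document("only meant for a")
print(b.documents)  # ['only meant for a']  (leaked into b!)

c, d = SeparateStore(), SeparateStore()
c.add_document("only meant for c")
print(d.documents)  # []
```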

In [17]:
class Embedding:
    def __init__(self, model_name="simple"):
        self.model_name = model_name
        self.documents = []
        self.embeddings = []

    def add_document(self, text):
        self.documents.append(text)
        embedding = embed_text(text)  # One embedding per document
        self.embeddings.append(embedding)

    def search(self, query):
        embed_query = embed_text(query)
        best_score = 0
        best_doc = None

        for doc, doc_emb in zip(self.documents, self.embeddings):
            score = calculate_similarity_score(embed_query, doc_emb)

            if score > best_score:
                best_score = score
                best_doc = doc
        return best_doc, best_score
    
    def show_stats(self):
        return f"Model: {self.model_name}, Documents: {len(self.documents)}"
    
system = Embedding(model_name="demo-v1")

# Add documents
system.add_document("Machine learning uses algorithms")
system.add_document("Deep learning uses neural networks")
system.add_document("Python is great for AI")

# Show stats
print(f"\n{system.show_stats()}")

# Search (notice: no need to pass documents/embeddings!)
result, score = system.search("What is deep learning?")
print(f"\nSearch result: {result}")
print(f"Score: {score:.3f}")


Model: demo-v1, Documents: 3

Search result: Python is great for AI
Score: 0.250


## Why This is Better:

✅ **Clean API** - `system.add_document()`, `system.search()` - simple!  
✅ **No repetition** - Don't pass documents/embeddings every time  
✅ **Data + Logic together** - Everything related to embeddings is in one place  
✅ **Easy to extend** - Can add more methods easily  

Compare:
```python
# With functions (messy): hypothetical add_document/search helpers
# that need the lists passed in on every call
docs = []
embs = []
add_document("text", docs, embs)
result = search("query", docs, embs)

# With class (clean)
system = EmbeddingSystem()
system.add_document("text")
result = system.search("query")
```

---

# Part 5: Building a Simple RAG Class

Let's recreate the RAG system using a class!

In [14]:
class SimpleRAG:
    """A simple RAG system using classes."""
    
    def __init__(self, model_name="simple-rag"):
        """Initialize the RAG system."""
        self.model_name = model_name
        self.knowledge_base = []   # Store documents
        self.kb_embeddings = []    # Store embeddings
        print(f"✅ RAG system initialized: {model_name}")
    
    def add_documents(self, documents):
        """Add multiple documents to the knowledge base."""
        # Extend (rather than replace) so repeated calls keep earlier documents
        self.knowledge_base.extend(documents)
        self.kb_embeddings.extend(embed_text(doc) for doc in documents)
        print(f"✅ Added {len(documents)} documents")
    
    def retrieve(self, query, top_k=1):
        """Retrieve most relevant documents."""
        query_emb = embed_text(query)
        
        # Calculate all scores
        scores = []
        for doc, doc_emb in zip(self.knowledge_base, self.kb_embeddings):
            score = calculate_similarity_score(query_emb, doc_emb)
            scores.append((doc, score))
        
        # Sort by score and get top_k
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:top_k]
    
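    def query(self, question):
        """Main method: query the RAG system."""
        # Retrieve
        results = self.retrieve(question, top_k=1)
        if not results:  # Empty knowledge base: nothing to retrieve
            return {"question": question, "answer": "No documents available.",
                    "score": 0.0, "source": None}
        best_doc, score = results[0]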
        
        # Generate (simplified - just return the document)
        answer = f"Based on context: {best_doc}"
        
        return {
            "question": question,
            "answer": answer,
            "score": score,
            "source": best_doc
        }

# Use it!
rag = SimpleRAG(model_name="demo-rag-v1")

# Add knowledge
docs = [
    "Machine learning is a subset of AI that learns from data",
    "Deep learning uses neural networks with multiple layers",
    "Python is the most popular language for AI development"
]
rag.add_documents(docs)

# Query it (notice how clean this is!)
result = rag.query("What is deep learning?")

print(f"\n{'='*60}")
print(f"Question: {result['question']}")
print(f"Answer: {result['answer']}")
print(f"Score: {result['score']:.3f}")
print('='*60)

✅ RAG system initialized: demo-rag-v1
✅ Added 3 documents

Question: What is deep learning?
Answer: Based on context: Python is the most popular language for AI development
Score: 0.028

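`retrieve` scores every document, sorts the whole list, and slices the first `top_k`. That is clear and perfectly fine for a small knowledge base. As an alternative, the standard library's `heapq.nlargest` returns the same top results without a full sort. The `(doc, score)` pairs below are made-up values shaped like the list built inside `retrieve`.

```python
import heapq

# Hypothetical (document, score) pairs, shaped like the list built in retrieve()
scores = [
    ("doc A", 0.024),
    ("doc B", 0.025),
    ("doc C", 0.028),
]
top_k = 2

# Approach used in SimpleRAG.retrieve: sort everything, then slice
by_sort = sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

# Alternative: keep only the k largest items by score
by_heap = heapq.nlargest(top_k, scores, key=lambda pair: pair[1])

print(by_sort)             # [('doc C', 0.028), ('doc B', 0.025)]
print(by_sort == by_heap)  # True
```

For a handful of documents the difference is negligible; `nlargest` mainly helps when the list is large and `top_k` is small.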

## Multiple Queries - See How Clean It Is!

In [15]:
questions = [
    "What is machine learning?",
    "Tell me about Python",
    "What are neural networks?"
]

for q in questions:
    result = rag.query(q)  # SO SIMPLE!
    print(f"\nQ: {q}")
    print(f"A: {result['answer']}")
    print(f"Score: {result['score']:.3f}")


Q: What is machine learning?
A: Based on context: Python is the most popular language for AI development
Score: 0.030

Q: Tell me about Python
A: Based on context: Python is the most popular language for AI development
Score: 0.025

Q: What are neural networks?
A: Based on context: Python is the most popular language for AI development
Score: 0.031


**Compare this to the function approach:**

```python
# Functions (repetitive)
for q in questions:
    result = query_rag(q, knowledge_base, kb_embeddings)  # Pass everything!

# Class (clean)
for q in questions:
    result = rag.query(q)  # Just the question!
```

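The class **remembers** the knowledge base and embeddings! 🎯 (Every answer above is the Python document because the toy `embed_text` only counts characters; the point here is the clean API, not answer quality.)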

---

# Part 6: Key Concepts Summary

## Functions

✅ **Use when:**
- Simple, one-off operations
- Pure logic (input → output)
- No need to remember state

**Example:**
```python
def calculate_average(numbers):
    return sum(numbers) / len(numbers)
```
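`calculate_average` is a pure function: the same input always gives the same output, and nothing is remembered between calls. One edge case worth handling is an empty list, where `len(numbers)` is 0 and the division raises `ZeroDivisionError`. A sketch of one way to guard it; returning `None` is a design choice made here, not the only option (raising a clearer error is another).

```python
def calculate_average(numbers):
    """Return the mean of `numbers`, or None when the list is empty."""
    if not numbers:
        return None  # Avoid ZeroDivisionError on an empty list
    return sum(numbers) / len(numbers)


print(calculate_average([2, 4, 6]))  # 4.0
print(calculate_average([]))         # None
```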

## Classes

✅ **Use when:**
- Need to store data (state)
- Multiple related operations
- Want clean, reusable APIs

**Example:**
```python
class Calculator:
    def __init__(self):
        self.history = []  # Remember past calculations
    
    def add(self, a, b):
        result = a + b
        self.history.append(result)
        return result
```
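Here is the `Calculator` in use: the history grows with each call, and a second calculator starts with its own empty history because `self.history` is created in `__init__`.

```python
class Calculator:
    def __init__(self):
        self.history = []  # Remember past calculations

    def add(self, a, b):
        result = a + b
        self.history.append(result)
        return result


calc = Calculator()
calc.add(2, 3)
calc.add(10, 5)
print(calc.history)  # [5, 15]

# A second calculator starts with its own, empty history
other = Calculator()
print(other.history)  # []
```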

## Important Terms

| Term | Meaning |
|------|----------|
| **Class** | Blueprint/template for creating objects |
| **Object/Instance** | Actual thing created from a class |
| **`__init__`** | Constructor - runs when object is created |
| **`self`** | Refers to "this specific object" |
| **Attribute** | Data stored in object (`self.name`) |
| **Method** | Function that belongs to a class |

## The Pattern in RAG

```python
# Step 1: Create the system (blueprint → object)
rag = SimpleRAG()

# Step 2: Set it up (store data)
rag.add_documents(docs)

# Step 3: Use it (clean API)
result = rag.query("What is ML?")
```

**This is exactly what you'll see in Part 2 of the RAG notebook!**

---

# Practice Exercises

## Exercise 1: Create a Document Manager Class

Build a class that:
- Stores documents
- Counts total words
- Finds longest document

In [16]:
class DocumentManager:
    def __init__(self):
        # TODO: Initialize empty list of documents
        self.documents = []
        print("DocumentManager initialized.")
    
    def add_document(self, text):
        # TODO: Add document to list
        self.documents.append(text)
        print(f"Added document: {text}")
    
    def total_words(self):
        # TODO: Count total words across all documents
        total = sum(len(doc.split()) for doc in self.documents)
        return total
        
    
    def longest_document(self):
        # TODO: Return the longest document
        return max(self.documents, key=len) if self.documents else None
        

# Test it
manager = DocumentManager()
manager.add_document("Machine learning is great")
manager.add_document("AI")
print(manager.total_words())  # Should print 5
print(manager.longest_document())  # Should print "Machine learning is great"

DocumentManager initialized.
Added document: Machine learning is great
Added document: AI
5
Machine learning is great


## Solution (Don't peek until you try!)

In [None]:
# SOLUTION
class DocumentManager:
    def __init__(self):
        self.documents = []
    
    def add_document(self, text):
        self.documents.append(text)
        print(f"✅ Added: {text}")
    
    def total_words(self):
        total = sum(len(doc.split()) for doc in self.documents)
        return total
    
    def longest_document(self):
        return max(self.documents, key=len) if self.documents else None

# Test
manager = DocumentManager()
manager.add_document("Machine learning is great")
manager.add_document("AI")
print(f"\nTotal words: {manager.total_words()}")
print(f"Longest doc: {manager.longest_document()}")

---

# Summary: You're Ready for Part 2!

## What You Learned:

✅ **Functions** - Reusable code blocks  
✅ **Classes** - Bundle data + functions  
✅ **`__init__`** - Constructor that runs on creation  
✅ **`self`** - Refers to the specific object  
✅ **Why classes are better** - For systems that need state  

## Now You Can Understand:

```python
# Part 2 of RAG notebook will make PERFECT sense!
class SimpleRAG:
    def __init__(self, ...):  # ← You know this!
        self.documents = []    # ← You know this!
    
    def add_documents(self, docs):  # ← You know this!
        self.documents = docs
    
    def query(self, question):  # ← You know this!
        # Retrieval + Generation
        pass
```

**Go to Part 2 now - it will click immediately!** 🚀