# Lesson 1: Vector Fundamentals for Machine Learning

**Duration**: 4-5 hours  
**Prerequisites**: Basic Python, high school mathematics  
**Learning Objectives**:
- Understand what vectors are and why they're crucial in ML
- Master vector creation and manipulation with NumPy
- Implement essential vector operations
- Apply vector concepts to real ML scenarios

---

## Part 1: Understanding Vectors (Theory)

### What is a Vector?

A **vector** is an ordered list of numbers that, viewed geometrically, has both magnitude (size) and direction. In machine learning:
- **Data points** are represented as vectors (e.g., [height, weight, age])
- **Features** of a dataset form vector components
- **Model parameters** are stored as vectors
- **Predictions** are often vector outputs

### Why Vectors Matter in ML

1. **Data Representation**: Every data sample is a vector
2. **Computational Efficiency**: Vectorized NumPy operations run in compiled code, so they are typically much faster than equivalent Python loops
3. **Mathematical Foundation**: Most ML algorithms use vector math
4. **Dimensionality**: Vectors help us work in high-dimensional spaces
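
To make the efficiency point concrete, here is a minimal sketch that computes the same weighted sum twice, once with an explicit Python loop and once with a single vectorized call. The `features` and `weights` values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample: four feature values and four weights
features = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.5, 0.25, 0.125, 0.0625])

# Explicit Python loop: one multiply-add per iteration
loop_result = 0.0
for f, w in zip(features, weights):
    loop_result += f * w

# Vectorized: the same arithmetic in a single NumPy call
vectorized_result = float(np.dot(features, weights))

print(f"Loop result:       {loop_result}")
print(f"Vectorized result: {vectorized_result}")
```

Both approaches give the same number. On vectors with thousands or millions of components, the vectorized form is usually the faster of the two.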

### Types of Vectors

- **Row Vector**: [1, 2, 3] - horizontal arrangement
- **Column Vector**: [[1], [2], [3]] - vertical arrangement
- **Zero Vector**: [0, 0, 0] - all components are zero
- **Unit Vector**: [1, 0, 0] - magnitude equals 1

## Part 2: Vector Creation with NumPy

In [3]:
import numpy as np
import matplotlib.pyplot as plt
from typing import Union, List

# Let's start with basic vector creation
print("=== Vector Creation Methods ===")

=== Vector Creation Methods ===


In [5]:
# Method 1: From Python lists
list_data = [1, 2, 3, 4, 5]
vector_from_list = np.array(list_data)
print(f"From list: {vector_from_list}")
print(f"Shape: {vector_from_list.shape}")
print(f"Data type: {vector_from_list.dtype}")
print()

From list: [1 2 3 4 5]
Shape: (5,)
Data type: int64



In [None]:
# Method 2: Using arange (like Python's range)
vector_arange = np.arange(1, 10, 2)  # start, stop, step
print(f"Using arange(1, 10, 2): {vector_arange}")

# Method 3: Using linspace (linearly spaced)
vector_linspace = np.linspace(0, 10, 5)  # start, stop, num_points
print(f"Using linspace(0, 10, 5): {vector_linspace}")
print()
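
One detail that is easy to miss in the cell above: `np.arange` excludes its stop value, while `np.linspace` includes it by default. A small sketch, with the illustrative names `stepped` and `spaced`:

```python
import numpy as np

stepped = np.arange(1, 10, 2)    # stop value 10 is excluded
spaced = np.linspace(0, 10, 5)   # stop value 10 is included

print(stepped.tolist())  # [1, 3, 5, 7, 9]
print(spaced.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
```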

In [None]:
# Method 4: Special vectors
zeros_vector = np.zeros(5)
ones_vector = np.ones(5)
random_vector = np.random.random(5)

print(f"Zeros vector: {zeros_vector}")
print(f"Ones vector: {ones_vector}")
print(f"Random vector: {random_vector}")
print()
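
The random vector above changes on every run. When you need reproducible results, one option is a seeded generator from `np.random.default_rng`. This is a sketch, and the seed value 42 is arbitrary:

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = rng_a.random(5)    # 5 floats drawn uniformly from [0, 1)
second = rng_b.random(5)

print(f"First draw:  {first}")
print(f"Second draw: {second}")
print(f"Identical: {np.array_equal(first, second)}")
```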

In [None]:
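# Row vs Column vectors
# Note: a plain 1-D array has shape (3,); NumPy treats it as neither row nor column.
# An explicit row vector would have shape (1, 3), e.g. np.array([[1, 2, 3]]).
row_vector = np.array([1, 2, 3])  # Shape: (3,)
column_vector = np.array([[1], [2], [3]])  # Shape: (3, 1)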

print(f"Row vector: {row_vector}")
print(f"Row vector shape: {row_vector.shape}")
print()
print(f"Column vector:\n{column_vector}")
print(f"Column vector shape: {column_vector.shape}")
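
The difference between a 1-D array and an explicit 2-D row or column shows up when you transpose or multiply. The following sketch uses `reshape` to build both 2-D forms from the same data; the names `flat`, `row`, and `col` are illustrative only.

```python
import numpy as np

flat = np.array([1, 2, 3])   # 1-D array, shape (3,)
row = flat.reshape(1, -1)    # explicit row vector, shape (1, 3)
col = flat.reshape(-1, 1)    # explicit column vector, shape (3, 1)

print(flat.T.shape)        # (3,): transposing a 1-D array changes nothing
print(row.T.shape)         # (3, 1): a row becomes a column
print((row @ col).shape)   # (1, 1): inner product as a 1x1 matrix
print((col @ row).shape)   # (3, 3): outer product
```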

## Part 3: Basic Vector Operations

In [None]:
# Create two vectors for operations
vector_a = np.array([1, 2, 3, 4])
vector_b = np.array([5, 6, 7, 8])

print(f"Vector A: {vector_a}")
print(f"Vector B: {vector_b}")
print()

In [None]:
# Element-wise operations
addition = vector_a + vector_b
subtraction = vector_a - vector_b
multiplication = vector_a * vector_b  # Element-wise, not dot product!
division = vector_a / vector_b

print("=== Element-wise Operations ===")
print(f"Addition: {addition}")
print(f"Subtraction: {subtraction}")
print(f"Multiplication: {multiplication}")
print(f"Division: {division}")
print()

In [None]:
# Scalar operations
scalar = 3
scalar_mult = vector_a * scalar
scalar_add = vector_a + scalar

print("=== Scalar Operations ===")
print(f"Vector A * {scalar}: {scalar_mult}")
print(f"Vector A + {scalar}: {scalar_add}")
print()

In [None]:
# Vector properties
magnitude = np.linalg.norm(vector_a)  # ||v|| = sqrt(sum of squares)
sum_elements = np.sum(vector_a)
mean_value = np.mean(vector_a)
max_value = np.max(vector_a)
min_value = np.min(vector_a)

print("=== Vector Properties ===")
print(f"Magnitude (L2 norm): {magnitude:.3f}")
print(f"Sum of elements: {sum_elements}")
print(f"Mean value: {mean_value}")
print(f"Max value: {max_value}")
print(f"Min value: {min_value}")
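
As a quick check of the magnitude formula, this sketch computes the L2 norm by hand and compares it with `np.linalg.norm`. It also shows the L1 norm (sum of absolute values), which reappears later as the Manhattan distance. The variable names are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3, 4])

# L2 norm by hand: sqrt(1 + 4 + 9 + 16) = sqrt(30)
manual_l2 = np.sqrt(np.sum(v ** 2))
# L1 norm by hand: 1 + 2 + 3 + 4 = 10
manual_l1 = np.sum(np.abs(v))

print(f"Manual L2: {manual_l2:.3f}, NumPy L2: {np.linalg.norm(v):.3f}")
print(f"Manual L1: {manual_l1}, NumPy L1: {np.linalg.norm(v, ord=1)}")
```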

## Part 4: Advanced Vector Operations

In [None]:
# Dot Product - VERY IMPORTANT in ML!
dot_product = np.dot(vector_a, vector_b)
# Alternative syntax
dot_product_alt = vector_a @ vector_b

print("=== Dot Product ===")
print(f"Dot product (np.dot): {dot_product}")
print(f"Dot product (@): {dot_product_alt}")
print(f"Manual calculation: {vector_a[0]*vector_b[0] + vector_a[1]*vector_b[1] + vector_a[2]*vector_b[2] + vector_a[3]*vector_b[3]}")
print()
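
One reason the dot product matters so much: a linear model's prediction is the dot product of a feature vector with a weight vector, plus a bias. The sketch below uses made-up `features`, `weights`, and `bias` values rather than a trained model.

```python
import numpy as np

# Hypothetical model parameters and a single sample
features = np.array([2.0, 3.0, 1.0])
weights = np.array([0.5, -1.0, 2.0])
bias = 0.5

# Single prediction: 2*0.5 + 3*(-1) + 1*2 + 0.5 = 0.5
prediction = features @ weights + bias
print(f"Single prediction: {prediction}")

# The same expression scores a whole batch (one sample per row)
batch = np.array([[2.0, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])
batch_predictions = batch @ weights + bias
print(f"Batch predictions: {batch_predictions}")
```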

In [None]:
# Vector normalization (creating unit vectors)
def normalize_vector(v):
    """Convert vector to unit vector (magnitude = 1)"""
    magnitude = np.linalg.norm(v)
    if magnitude == 0:
        return v
    return v / magnitude

unit_vector_a = normalize_vector(vector_a)
print("=== Vector Normalization ===")
print(f"Original vector A: {vector_a}")
print(f"Magnitude: {np.linalg.norm(vector_a):.3f}")
print(f"Unit vector A: {unit_vector_a}")
print(f"Unit vector magnitude: {np.linalg.norm(unit_vector_a):.3f}")
print()

In [None]:
# Distance calculations - crucial for ML algorithms
def euclidean_distance(v1, v2):
    """Calculate Euclidean distance between two vectors"""
    return np.linalg.norm(v1 - v2)

def manhattan_distance(v1, v2):
    """Calculate Manhattan distance between two vectors"""
    return np.sum(np.abs(v1 - v2))

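def cosine_similarity(v1, v2):
    """Calculate cosine similarity between two vectors"""
    dot_product = np.dot(v1, v2)
    magnitude_product = np.linalg.norm(v1) * np.linalg.norm(v2)
    if magnitude_product == 0:
        return 0.0  # Undefined for zero vectors; returning 0.0 is a convention chosen here
    return dot_product / magnitude_product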

print("=== Distance and Similarity Measures ===")
print(f"Euclidean distance: {euclidean_distance(vector_a, vector_b):.3f}")
print(f"Manhattan distance: {manhattan_distance(vector_a, vector_b):.3f}")
print(f"Cosine similarity: {cosine_similarity(vector_a, vector_b):.3f}")
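
Cosine similarity and Euclidean distance answer different questions. The sketch below compares a vector with a scaled copy of itself: the direction is identical, so the cosine similarity is 1, yet the Euclidean distance is large. The vectors are illustrative.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 10 * a   # same direction, ten times the magnitude

euclidean = np.linalg.norm(a - b)
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Euclidean distance: {euclidean:.3f}")   # large: magnitudes differ
print(f"Cosine similarity:  {cosine:.3f}")      # 1.000: directions match
```

Use cosine similarity when only the direction (the relative mix of components) matters, and Euclidean distance when absolute magnitudes matter too.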

## Part 5: Real-World ML Example - Customer Data

In [None]:
# Simulate customer data as vectors
# Features: [age, income, spending_score, loyalty_years]

customer_1 = np.array([25, 50000, 80, 2])    # Young, moderate income, high spending
customer_2 = np.array([45, 80000, 60, 5])    # Middle-aged, high income, moderate spending
customer_3 = np.array([35, 60000, 70, 3])    # Middle-aged, good income, good spending

customers = np.array([customer_1, customer_2, customer_3])

print("=== Customer Data (Vectors) ===")
print("Features: [age, income, spending_score, loyalty_years]")
print(f"Customer 1: {customer_1}")
print(f"Customer 2: {customer_2}")
print(f"Customer 3: {customer_3}")
print()

In [None]:
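# Find most similar customers using cosine similarity
# Note: the features are on very different scales, so income dominates each vector
# and every pairwise similarity comes out close to 1. Scale the features first
# for a meaningful comparison.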
print("=== Customer Similarity Analysis ===")

similarity_1_2 = cosine_similarity(customer_1, customer_2)
similarity_1_3 = cosine_similarity(customer_1, customer_3)
similarity_2_3 = cosine_similarity(customer_2, customer_3)

print(f"Similarity between Customer 1 & 2: {similarity_1_2:.3f}")
print(f"Similarity between Customer 1 & 3: {similarity_1_3:.3f}")
print(f"Similarity between Customer 2 & 3: {similarity_2_3:.3f}")

# Find the most similar pair
similarities = [similarity_1_2, similarity_1_3, similarity_2_3]
pairs = ["1 & 2", "1 & 3", "2 & 3"]
max_similarity_idx = np.argmax(similarities)

print(f"\nMost similar customers: {pairs[max_similarity_idx]} (similarity: {similarities[max_similarity_idx]:.3f})")
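
Because income is orders of magnitude larger than the other features, the raw similarities above are all nearly 1. One possible remedy, sketched here as an assumption rather than the lesson's prescribed method, is to divide each feature by its largest absolute value before comparing. The names `customers_demo`, `cosine`, and `pair_indices` are hypothetical helpers for this illustration.

```python
import numpy as np

# Same customer values as above, stored as floats
customers_demo = np.array([
    [25, 50000, 80, 2],
    [45, 80000, 60, 5],
    [35, 60000, 70, 3],
], dtype=float)

def cosine(u, v):
    """Cosine similarity for two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Max-abs scaling: each column is divided by its largest absolute value
scaled = customers_demo / np.abs(customers_demo).max(axis=0)

pair_indices = [(0, 1), (0, 2), (1, 2)]
raw_sims = [cosine(customers_demo[i], customers_demo[j]) for i, j in pair_indices]
scaled_sims = [cosine(scaled[i], scaled[j]) for i, j in pair_indices]

for (i, j), raw, sc in zip(pair_indices, raw_sims, scaled_sims):
    print(f"Customers {i + 1} & {j + 1}: raw={raw:.6f}, scaled={sc:.3f}")
```

After scaling, every feature contributes on a comparable range, so the similarities spread out and the ranking reflects more than income alone.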

In [None]:
# Calculate average customer profile
average_customer = np.mean(customers, axis=0)
print("\n=== Average Customer Profile ===")
print(f"Average profile: {average_customer}")
print(f"Average age: {average_customer[0]:.1f} years")
print(f"Average income: ${average_customer[1]:,.0f}")
print(f"Average spending score: {average_customer[2]:.1f}")
print(f"Average loyalty: {average_customer[3]:.1f} years")

## Part 6: Visualization

In [None]:
# Visualize 2D vectors
plt.figure(figsize=(12, 4))

# Plot 1: Vector visualization
plt.subplot(1, 3, 1)
vector_2d_a = np.array([3, 4])
vector_2d_b = np.array([1, 2])

plt.quiver(0, 0, vector_2d_a[0], vector_2d_a[1], angles='xy', scale_units='xy', scale=1, color='red', label='Vector A')
plt.quiver(0, 0, vector_2d_b[0], vector_2d_b[1], angles='xy', scale_units='xy', scale=1, color='blue', label='Vector B')
plt.xlim(-1, 5)
plt.ylim(-1, 5)
plt.grid(True)
plt.legend()
plt.title('2D Vectors')
plt.xlabel('X')
plt.ylabel('Y')

# Plot 2: Customer data (first 2 features)
plt.subplot(1, 3, 2)
ages = customers[:, 0]
incomes = customers[:, 1] / 1000  # Convert to thousands
plt.scatter(ages, incomes, c=['red', 'blue', 'green'], s=100)
plt.xlabel('Age')
plt.ylabel('Income (thousands)')
plt.title('Customer Data (Age vs Income)')
for i, (age, income) in enumerate(zip(ages, incomes)):
    plt.annotate(f'C{i+1}', (age, income), xytext=(5, 5), textcoords='offset points')

# Plot 3: Vector operations result
plt.subplot(1, 3, 3)
x = np.arange(len(vector_a))
plt.bar(x - 0.2, vector_a, 0.4, label='Vector A', alpha=0.7)
plt.bar(x + 0.2, vector_b, 0.4, label='Vector B', alpha=0.7)
plt.xlabel('Index')
plt.ylabel('Value')
plt.title('Vector Comparison')
plt.legend()

plt.tight_layout()
plt.show()

## Part 7: Practical Exercises

Complete these exercises to reinforce your understanding:

### Exercise 1: Vector Calculator
Create a comprehensive vector calculator class by implementing each method marked with a TODO.

In [None]:
class VectorCalculator:
    """A comprehensive vector calculator for ML operations"""
    
    @staticmethod
    def add(v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
        """Add two vectors"""
        # TODO: Implement vector addition
        pass
    
    @staticmethod
    def subtract(v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
        """Subtract second vector from first"""
        # TODO: Implement vector subtraction
        pass
    
    @staticmethod
    def dot_product(v1: np.ndarray, v2: np.ndarray) -> float:
        """Calculate dot product"""
        # TODO: Implement dot product
        pass
    
    @staticmethod
    def magnitude(v: np.ndarray) -> float:
        """Calculate vector magnitude"""
        # TODO: Implement magnitude calculation
        pass
    
    @staticmethod
    def normalize(v: np.ndarray) -> np.ndarray:
        """Normalize vector to unit vector"""
        # TODO: Implement normalization
        pass
    
    @staticmethod
    def angle_between(v1: np.ndarray, v2: np.ndarray) -> float:
        """Calculate angle between two vectors in radians"""
        # TODO: Implement angle calculation using cosine formula
        # Hint: cos(θ) = (v1 · v2) / (||v1|| * ||v2||)
        pass

# Test your implementation
test_v1 = np.array([1, 2, 3])
test_v2 = np.array([4, 5, 6])

# Uncomment these lines when you implement the methods
# print(f"Addition: {VectorCalculator.add(test_v1, test_v2)}")
# print(f"Dot product: {VectorCalculator.dot_product(test_v1, test_v2)}")
# print(f"Magnitude of v1: {VectorCalculator.magnitude(test_v1):.3f}")

### Exercise 2: Recommendation System
Build a simple recommendation system using vector similarity

In [None]:
# Movie rating data (user preferences as vectors)
# Features: [Action, Comedy, Drama, Horror, Romance] ratings (1-5 scale)

users = {
    'Alice': np.array([5, 2, 4, 1, 3]),
    'Bob': np.array([2, 5, 2, 1, 4]),
    'Carol': np.array([4, 3, 5, 2, 2]),
    'David': np.array([1, 4, 2, 5, 1]),
    'Eve': np.array([5, 1, 3, 1, 4])
}

def find_similar_users(target_user: str, users_dict: dict, top_k: int = 2):
    """Find most similar users based on cosine similarity"""
    target_vector = users_dict[target_user]
    similarities = []
    
    for user, vector in users_dict.items():
        if user != target_user:
            # Cosine similarity between the target user and this user
            similarity = cosine_similarity(target_vector, vector)
            similarities.append((user, similarity))
    
    # Sort by similarity (highest first)
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    return similarities[:top_k]

# Test the recommendation system
target = 'Alice'
similar_users = find_similar_users(target, users)

print(f"Users most similar to {target}:")
for user, similarity in similar_users:
    print(f"{user}: {similarity:.3f}")

# TODO: Extend this to recommend movies based on similar users' preferences

### Exercise 3: Data Preprocessing
Practice common ML preprocessing tasks with vectors

In [None]:
# Simulate dataset of house features
# Features: [size_sqft, bedrooms, bathrooms, age_years, price]
house_data = np.array([
    [2000, 3, 2, 10, 300000],
    [1500, 2, 1, 20, 250000],
    [3000, 4, 3, 5, 500000],
    [1200, 2, 1, 30, 200000],
    [2500, 3, 2, 15, 400000]
])

print("Original house data:")
print(house_data)
print()

# TODO: Implement these preprocessing functions

def standardize_features(data: np.ndarray) -> np.ndarray:
    """Standardize features to have mean=0 and std=1"""
    # Formula: (x - mean) / std
    # TODO: Implement standardization
    pass

def normalize_features(data: np.ndarray) -> np.ndarray:
    """Normalize features to range [0, 1]"""
    # Formula: (x - min) / (max - min)
    # TODO: Implement min-max normalization
    pass

def calculate_feature_correlations(data: np.ndarray) -> np.ndarray:
    """Calculate correlation matrix between features"""
    # TODO: Use np.corrcoef (pass rowvar=False so that columns are treated as features)
    pass

# Test your implementations
# standardized_data = standardize_features(house_data)
# normalized_data = normalize_features(house_data)
# correlations = calculate_feature_correlations(house_data)

## Part 8: Key Takeaways and Next Steps

### What You've Learned:
1. ✅ **Vector Creation**: Multiple methods using NumPy
2. ✅ **Basic Operations**: Addition, subtraction, scalar multiplication
3. ✅ **Advanced Operations**: Dot products, normalization, distances
4. ✅ **Real Applications**: Customer analysis, recommendations
5. ✅ **Preprocessing**: Data standardization and normalization

### Essential Formulas to Remember:
- **Dot Product**: v₁ · v₂ = Σ(v₁ᵢ × v₂ᵢ)
- **Magnitude**: ||v|| = √(Σvᵢ²)
- **Cosine Similarity**: cos(θ) = (v₁ · v₂) / (||v₁|| × ||v₂||)
- **Euclidean Distance**: d = ||v₁ - v₂||
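
These formulas are connected: the squared Euclidean distance can be written in terms of magnitudes and the dot product, ||v₁ - v₂||² = ||v₁||² + ||v₂||² - 2(v₁ · v₂). A short numerical check of that identity, with illustrative vectors:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Left side: squared Euclidean distance computed directly
lhs = np.sum((a - b) ** 2)

# Right side: squared magnitudes minus twice the dot product
rhs = a @ a + b @ b - 2 * (a @ b)

print(f"Direct:       {lhs}")   # 64
print(f"Via identity: {rhs}")   # 30 + 174 - 140 = 64
```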

### Next Steps:
1. **Complete all exercises** in this notebook
2. **Practice with real datasets** (load CSV files, convert to vectors)
3. **Move to Module 2**: Matrix Operations
4. **Build a project**: Create a simple classifier using vector similarity

### Common ML Applications of Vectors:
- **Feature vectors** for machine learning models
- **Word embeddings** in natural language processing
- **Image pixels** as high-dimensional vectors
- **Recommendation systems** using user preference vectors
- **Clustering algorithms** based on vector distances

## Part 9: Self-Assessment Quiz

Test your understanding with these questions:

### Conceptual Questions:
1. Why is the dot product so important in machine learning?
2. When would you use cosine similarity vs Euclidean distance?
3. What's the difference between element-wise multiplication and dot product?
4. Why do we normalize vectors in ML?

### Practical Questions:
1. Given vectors [3, 4] and [6, 8], calculate their cosine similarity
2. Create a vector of 100 random numbers and normalize it
3. Find the closest customer to [30, 55000, 75, 4] from the customer dataset

### Coding Challenge:
Implement a function that takes a dataset and finds the k most similar rows to a query vector using your choice of similarity measure.