# 06. Performance Optimization

## 📚 Learning Objectives

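By completing this notebook, you will:
- Measure execution time and profile code to find bottlenecks
- Replace row-by-row loops with vectorized pandas operations
- Reduce DataFrame memory usage by choosing appropriate data types
- Practice on a realistic 100,000-row synthetic dataset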

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 05, Unit 5** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---


## The Story

**BEFORE**: You can process data but don't know how to optimize performance for large-scale operations.

**AFTER**: You'll learn to profile code, identify bottlenecks, replace slow loops with vectorized operations, and cut memory usage with better-suited data types.

**Why this matters**: Code that runs instantly on a small sample can become the bottleneck on millions of rows. Knowing where time and memory go is essential for building complete, professional data science solutions.

---


# Unit 5 - Example 06: Performance Optimization

## 🔗 Solving the Problem from Example 04

**Remember the dead end from Example 04?**
- We learned to build production pipelines
- But the pipelines ran slowly on large data
- We had no systematic way to find and fix the slow parts

**This notebook solves that problem.** We'll learn:
- **Profiling** to measure run time and identify bottlenecks
- **Vectorization** to replace slow row-by-row loops
- **Memory optimization** by choosing smaller, better-suited data types

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
import cProfile
import pstats
from io import StringIO

print("✓ Imports ready")


✓ Imports ready


In [2]:
print("=" * 70)
print("Example 06: Performance Optimization")
print("=" * 70)
print("\n📚 Prerequisites: Examples 02-05 completed, performance knowledge")
print("🔗 This is the FOURTH example in Unit 5 - performance optimization")
print("🎯 Goal: Master performance profiling and optimization")
print("Reference: Study 17.pdf before running this code example.\n")


Example 06: Performance Optimization

📚 Prerequisites: Examples 02-05 completed, performance knowledge
🔗 This is the FOURTH example in Unit 5 - performance optimization
🎯 Goal: Master performance profiling and optimization
Reference: Study 17.pdf before running this code example.



# 1. CREATE DATASET FOR OPTIMIZATION DEMO

In [3]:
print("\n1. Creating Dataset")
print("-" * 70)
np.random.seed(42)
n_samples = 100000
data = {
    'id': range(n_samples),
    'value1': np.random.randn(n_samples),
    'value2': np.random.randn(n_samples),
    'category': np.random.choice(['A', 'B', 'C', 'D'], n_samples),
    'score': np.random.randint(0, 100, n_samples),
}
df = pd.DataFrame(data)
print(f"✓ Created dataset with {len(df):,} rows")


1. Creating Dataset
----------------------------------------------------------------------
✓ Created dataset with 100,000 rows


# 2. PERFORMANCE PROFILING


In [4]:
print("\n\n2. Performance Profiling")
print("-" * 70)

def slow_operation(df):
    """Inefficient: builds a Series for every row with iterrows()."""
    result = []
    for _, row in df.iterrows():
        result.append(row['value1'] * row['value2'])
    # Keep the original index so the result aligns with df
    return pd.Series(result, index=df.index)

def fast_operation(df):
    """Optimized: a single vectorized multiplication over whole columns."""
    return df['value1'] * df['value2']

subset = df.head(1000)  # small subset keeps the slow loop quick for the demo

# Time the slow operation (perf_counter is a high-resolution clock meant for timing)
print("\nProfiling slow operation (iterrows)...")
start_time = time.perf_counter()
result_slow = slow_operation(subset)
slow_time = time.perf_counter() - start_time
print(f"Slow operation time: {slow_time:.4f} seconds")

# Time the fast operation
print("\nProfiling fast operation (vectorized)...")
start_time = time.perf_counter()
result_fast = fast_operation(subset)
fast_time = time.perf_counter() - start_time
print(f"Fast operation time: {fast_time:.4f} seconds")

# Both versions must produce the same values before their speeds are compared
assert np.allclose(result_slow, result_fast)
print(f"Speedup: {slow_time/fast_time:.2f}x")



2. Performance Profiling
----------------------------------------------------------------------

Profiling slow operation (iterrows)...
Slow operation time: 0.0073 seconds

Profiling fast operation (vectorized)...
Fast operation time: 0.0001 seconds
Speedup: 58.67x
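The timings above show *how long* each version takes, not *where* the time goes. The first cell imports `cProfile`, `pstats` and `StringIO`; below is a minimal sketch of how they can break the slow function down call by call. It rebuilds a 1,000-row stand-in frame (`frame` is a hypothetical name replacing `df.head(1000)`) so the block runs on its own.

```python
import cProfile
import pstats
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical stand-in for df.head(1000): same value columns, same row count
rng = np.random.default_rng(42)
frame = pd.DataFrame({
    "value1": rng.standard_normal(1000),
    "value2": rng.standard_normal(1000),
})

def slow_operation(df):
    """Row-by-row loop with iterrows(), as in the cell above."""
    result = []
    for _, row in df.iterrows():
        result.append(row["value1"] * row["value2"])
    return pd.Series(result, index=df.index)

# Record every function call made while slow_operation runs
profiler = cProfile.Profile()
profiler.enable()
slow_result = slow_operation(frame)
profiler.disable()

# Sort by cumulative time and write the 10 most expensive entries into a string
buffer = StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```

In the report, most of the cumulative time should sit under `iterrows` and the per-row `Series` construction it triggers rather than in the multiplication itself, which is why the vectorized version removes the bottleneck.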


# 3. MEMORY OPTIMIZATION


In [5]:
print("\n\n3. Memory Optimization")
print("-" * 70)
original_bytes = df.memory_usage(deep=True).sum()  # deep=True counts string contents
print("\nOriginal memory usage:")
print(f"Memory: {original_bytes / 1024**2:.2f} MB")
# Optimize data types
df_optimized = df.copy()
# 'score' holds 0-99, which fits int8 (-128..127): 1 byte per value instead of 4 or 8
df_optimized['score'] = df_optimized['score'].astype('int8')
# Only 4 distinct labels: categorical stores small integer codes instead of repeated strings
df_optimized['category'] = df_optimized['category'].astype('category')
optimized_bytes = df_optimized.memory_usage(deep=True).sum()
print("\nOptimized memory usage:")
print(f"Memory: {optimized_bytes / 1024**2:.2f} MB")
print(f"Reduction: {(1 - optimized_bytes / original_bytes) * 100:.1f}%")



3. Memory Optimization
----------------------------------------------------------------------

Original memory usage:
Memory: 7.82 MB

Optimized memory usage:
Memory: 2.48 MB
Reduction: 68.3%
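The cell above hardcodes `astype('int8')`, which is safe only because `score` stays within 0-99; a value outside -128..127 would not fit. Below is a minimal sketch, on a synthetic stand-in frame (`frame`, `column_mb` and `optimized` are hypothetical names), that lets `pd.to_numeric(..., downcast=...)` choose the smallest type that holds the data and reports memory per column, so you can see which column dominates. Downcasting `value1` to float32 goes beyond what this notebook does and trades precision for space.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in mirroring three of the notebook's columns
rng = np.random.default_rng(42)
n = 100_000
frame = pd.DataFrame({
    "value1": rng.standard_normal(n),
    "category": rng.choice(["A", "B", "C", "D"], n),
    "score": rng.integers(0, 100, n),
})

def column_mb(df):
    """Per-column memory in MB, counting the actual string objects (deep=True)."""
    return df.memory_usage(deep=True, index=False) / 1024**2

before = column_mb(frame)

optimized = frame.copy()
# Pick the smallest signed integer type that holds every value (0-99 -> int8)
optimized["score"] = pd.to_numeric(optimized["score"], downcast="integer")
# float64 -> float32 halves memory but keeps only about 7 significant digits
optimized["value1"] = pd.to_numeric(optimized["value1"], downcast="float")
# Categorical: each row stores a small integer code pointing at 4 unique labels
optimized["category"] = optimized["category"].astype("category")

after = column_mb(optimized)
print(pd.DataFrame({"before_MB": before, "after_MB": after}).round(3))
print(optimized["category"].cat.codes.dtype)  # codes dtype for 4 categories
```

Checking the per-column breakdown first shows that the string column, not the numeric ones, accounts for most of the original memory, so converting it to `category` gives the largest saving.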


# 4. VISUALIZATION


In [6]:
print("\n\n4. Creating Optimization Visualization")
print("-" * 70)
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
fig.suptitle('Performance Optimization')
# Speed comparison
ops = ['iterrows()', 'Vectorized']
times = [slow_time, fast_time]
colors = ['#FF6B6B', '#4ECDC4']
axes[0].bar(ops, times, color=colors, edgecolor='black')
axes[0].set_ylabel('Time (seconds)')
axes[0].set_title('Operation Speed Comparison', fontsize=12, weight='bold')
axes[0].grid(True, alpha=0.3, axis='y')
# Memory comparison
memory_original = df.memory_usage(deep=True).sum() / 1024**2
memory_optimized = df_optimized.memory_usage(deep=True).sum() / 1024**2
axes[1].bar(['Original', 'Optimized'],
            [memory_original, memory_optimized],
            color=['#FF6B6B', '#4ECDC4'], edgecolor='black')
axes[1].set_ylabel('Memory (MB)')
axes[1].set_title('Memory Usage Comparison', fontsize=12, weight='bold')
axes[1].grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('17_optimization.png', dpi=300, bbox_inches='tight')
print("✓ Optimization visualization saved")
plt.close()



4. Creating Optimization Visualization
----------------------------------------------------------------------


✓ Optimization visualization saved
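With a speedup of roughly 59x, the vectorized bar in the speed panel is barely visible on a linear axis. Below is a minimal sketch of the same panel drawn on a logarithmic y-axis. The two timings are copied from the printed output of the profiling cell and will differ on your machine; selecting the off-screen `Agg` backend is an assumption so the block runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Timings copied from the profiling output above; re-measure on your own machine
ops = ["iterrows()", "Vectorized"]
times = [0.0073, 0.0001]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(ops, times, color=["#FF6B6B", "#4ECDC4"], edgecolor="black")
# A log axis keeps both bars readable even when they differ by orders of magnitude
ax.set_yscale("log")
ax.set_ylabel("Time (seconds, log scale)")
ax.set_title("Operation Speed Comparison")
ax.grid(True, which="both", alpha=0.3, axis="y")
fig.tight_layout()
plt.close(fig)
```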


# 5. SUMMARY


In [7]:
print("\n" + "=" * 70)
print("Summary")
print("=" * 70)
print("\nKey Concepts Covered:")
print("1. Performance profiling")
print("2. Identifying bottlenecks")
print("3. Vectorization")
print("4. Memory optimization")
print("\nNext Steps: Continue to Example 07 for Large Dataset Handling")



Summary

Key Concepts Covered:
1. Performance profiling
2. Identifying bottlenecks
3. Vectorization
4. Memory optimization

Next Steps: Continue to Example 07 for Large Dataset Handling
