<a href="https://colab.research.google.com/github/aaniaahh/DataScience-2025/blob/main/Completed/05-Foundations/08_timing_and_performance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ⏱️ 08 - Timing and Performance
In data science, performance matters: two pieces of code that produce the same result can differ greatly in runtime.
Jupyter/IPython provides magic commands that make measuring runtime easy.

In this notebook you will learn:
* `%time` and `%timeit` (line magics) for single statements
* `%%time` and `%%timeit` (cell magics) for entire cells
* Comparing loops, list comprehensions, and NumPy
* Why performance awareness is important

## 1. `%time`

`%time` runs a single statement once and reports the CPU time and the wall-clock time it took.

In [None]:
%time sum(range(1_000_000))

✅ **Your Turn:** Use `%time` to measure how long it takes to sort a list of 1 million random numbers.

In [1]:
import random

# Create a list of 1 million random numbers
nums = [random.random() for _ in range(1_000_000)]

# Measure the time it takes to sort the list
%time sorted_nums = sorted(nums)

CPU times: user 446 ms, sys: 4.99 ms, total: 451 ms
Wall time: 456 ms
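
The two lines that `%time` prints correspond to two different clocks. As a rough sketch outside of IPython, assuming only the standard-library `time` module (the helper name `time_once` is hypothetical and not part of IPython), `time.perf_counter()` tracks wall-clock time and `time.process_time()` tracks the CPU time used by the current process:

```python
import random
import time


def time_once(func, *args):
    """Run func(*args) once; return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock time, includes any waiting
    cpu_start = time.process_time()    # CPU time of this process (user + sys)
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


# A smaller list than in the exercise, to keep the sketch quick
nums = [random.random() for _ in range(100_000)]
sorted_nums, wall, cpu = time_once(sorted, nums)
print(f"Wall time: {wall * 1000:.1f} ms, CPU time: {cpu * 1000:.1f} ms")
```

For CPU-bound work such as sorting, the two numbers are usually close. They diverge when the code waits on disk, network, or `time.sleep()`, because waiting counts toward wall time but not CPU time.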


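## 2. `%timeit`

`%timeit` runs a statement many times and reports the mean and standard deviation per loop, which makes it more reliable than `%time` for short snippets.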

In [None]:
numbers = list(range(1_000))
%timeit [x**2 for x in numbers]

✅ **Your Turn:** Compare `%timeit` results for a list comprehension vs. a `for` loop that builds the same list.

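In [2]:
# List comprehension
%timeit [i*i for i in range(1_000_000)]

42.8 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


`%%timeit` is a cell magic, so it must be the first line of its cell. Placed after other code, it raises a `UsageError` ("Line magic function `%%timeit` not found."), so the `for` loop goes in a cell of its own:

In [None]:
%%timeit
# For loop building the same list (i*i, to match the comprehension)
squares = []
for i in range(1_000_000):
    squares.append(i*i)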
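
Outside a notebook, the same comparison can be sketched with the standard-library `timeit` module, which `%timeit` builds on. The function names and the smaller `N` below are illustrative assumptions. Unlike `%timeit`, which prints a mean and standard deviation, this sketch reports the fastest repeat:

```python
import timeit


def squares_comprehension(n):
    """Build the list of squares with a list comprehension."""
    return [i * i for i in range(n)]


def squares_loop(n):
    """Build the same list with an explicit for loop and append."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return squares


N = 100_000  # smaller than the notebook's 1_000_000 to keep the sketch quick

# 5 repeats of 10 calls each; every entry is the total time for 10 calls
comp_times = timeit.repeat(lambda: squares_comprehension(N), repeat=5, number=10)
loop_times = timeit.repeat(lambda: squares_loop(N), repeat=5, number=10)

# Per-call time from the fastest repeat, the one least disturbed by other load
comp_best = min(comp_times) / 10
loop_best = min(loop_times) / 10
print(f"comprehension: {comp_best * 1000:.2f} ms per call")
print(f"for loop:      {loop_best * 1000:.2f} ms per call")
```

Both functions compute `i * i`, so any difference in the timings comes from how the list is built rather than from the arithmetic.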


## 3. `%%time` for a Whole Cell

In [None]:
%%time
total = 0
for i in range(1_000_000):
    total += i
total

✅ **Your Turn:** Wrap a longer multi-line operation with `%%time` to measure its runtime.

In [3]:
%%time
import random

# Create a large list of random numbers
nums = [random.random() for _ in range(1_000_000)]

# Sort the list
sorted_nums = sorted(nums)

# Sum the first 100,000 numbers
total = sum(sorted_nums[:100_000])

print("Sum of first 100,000 numbers:", total)


Sum of first 100,000 numbers: 5026.8539437511945
CPU times: user 409 ms, sys: 13.1 ms, total: 422 ms
Wall time: 422 ms
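
`%%time` only exists inside IPython. In a plain Python script, a similar effect for a multi-line block can be sketched with a small context manager. The name `timed` is hypothetical, and this version reports wall-clock time only:

```python
import random
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so the timing is always reported
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed * 1000:.1f} ms")


with timed("build, sort, sum"):
    nums = [random.random() for _ in range(100_000)]
    sorted_nums = sorted(nums)
    total = sum(sorted_nums[:10_000])
```

Variables assigned inside the `with` block remain available afterwards, just as variables defined in a `%%time` cell do.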


## 4. Comparing Loops vs. NumPy

In [None]:
import numpy as np

numbers = np.arange(1_000_000)

# List comprehension: iterates over the array element by element in Python
%timeit [x**2 for x in numbers]

# NumPy vectorized
%timeit numbers**2

✅ **Your Turn:** Try squaring numbers with a Python loop, list comprehension, and NumPy array. Compare times.

In [4]:
import numpy as np
import time

# Size of data
N = 500

# 1️⃣ Python for loop
nums = list(range(N))
start = time.time()
squares_loop = []
for x in nums:
    squares_loop.append(x**2)
end = time.time()
print("For loop time:", end - start, "seconds")

# 2️⃣ List comprehension
start = time.time()
squares_lc = [x**2 for x in nums]
end = time.time()
print("List comprehension time:", end - start, "seconds")

# 3️⃣ NumPy array
arr = np.arange(N)
start = time.time()
squares_np = arr**2
end = time.time()
print("NumPy array time:", end - start, "seconds")


For loop time: 0.00020551681518554688 seconds
List comprehension time: 0.00014591217041015625 seconds
NumPy array time: 8.96453857421875e-05 seconds
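
With `N = 500`, each measurement above lasts only a fraction of a millisecond, so a single `time.time()` difference is easily dominated by noise. A sketch of a steadier comparison, assuming the standard-library `timeit` module and a larger input (the `best_of` helper and the value of `N` are illustrative choices, not part of the original exercise):

```python
import timeit

import numpy as np

N = 100_000
nums = list(range(N))
# int64 is requested explicitly so the squares cannot overflow a 32-bit default
arr = np.arange(N, dtype=np.int64)


def square_loop():
    """Square every element with an explicit for loop."""
    out = []
    for x in nums:
        out.append(x ** 2)
    return out


def square_comprehension():
    """Square every element with a list comprehension."""
    return [x ** 2 for x in nums]


def square_numpy():
    """Square every element with one vectorized NumPy operation."""
    return arr ** 2


def best_of(func, repeat=5, number=10):
    """Return the fastest per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


# All three approaches must agree before their timings are worth comparing
assert square_loop() == square_comprehension() == square_numpy().tolist()

for name, func in [("for loop", square_loop),
                   ("list comprehension", square_comprehension),
                   ("NumPy", square_numpy)]:
    print(f"{name:>18}: {best_of(func) * 1e3:.3f} ms per call")
```

Repeating each measurement and keeping the fastest run reduces the influence of one-off delays, and checking that the results match guards against comparing code that does different work.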


## 5. Why This Matters
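* Performance differences that are negligible on small inputs can become large on big datasets.
* Vectorized operations (such as those in NumPy and pandas) are usually faster than Python-level loops because the per-element work runs in compiled code.
* `%timeit` is your friend when deciding how to implement something.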

### Summary
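* `%time` times a single run of a statement; `%timeit` repeats it and reports the mean and standard deviation.
* `%%time` and `%%timeit` do the same for whole cells and must be the first line of the cell.
* For element-wise numeric work, explicit `for` loops are typically slower than list comprehensions, which are in turn typically slower than NumPy vectorized operations.
* Always measure performance before optimizing.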