# Module 02: NumPy Fundamentals

**Estimated Time**: 60 minutes

## Learning Objectives

By the end of this module, you will:
- Understand why NumPy is essential for data science
- Create and manipulate NumPy arrays efficiently
- Perform vectorized operations for fast computations
- Master array indexing, slicing, and reshaping
- Use broadcasting for efficient array operations
- Apply NumPy for statistical analysis

## Prerequisites

- Module 01 completed
- Basic Python knowledge

---

## 1. Introduction to NumPy

**NumPy** (Numerical Python) is the foundation of scientific computing in Python.

### Why NumPy?

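- **Speed**: Often 10-100x faster than equivalent Python loops over lists for numerical work on large arrays, because the looping happens in compiled code
- **Memory efficient**: Stores elements of a single data type in one contiguous block instead of as separate Python objects
- **Vectorization**: Operations on entire arrays without explicit loops
- **Foundation**: Pandas, SciPy, and scikit-learn all build on NumPy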

### Lists vs. NumPy arrays:
- **Lists**: Like a notebook where you write values one by one
- **NumPy arrays**: Like a spreadsheet optimized for calculations

In [None]:
# Import NumPy (standard alias is 'np')
import numpy as np

print(f"NumPy version: {np.__version__}")
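
To make the speed and memory points above concrete, the following sketch times squaring one million integers with a list comprehension and with a NumPy array. The size `n` and the variable names are illustrative choices, and the measured ratio depends on your machine and NumPy version, so no specific timings are listed.

```python
import time

import numpy as np

n = 1_000_000  # illustrative size; the gap usually grows with array size
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit 64-bit ints so squares do not overflow

# Pure Python: the interpreter handles one element at a time
start = time.perf_counter()
list_squares = [x * x for x in py_list]
list_seconds = time.perf_counter() - start

# Vectorized: the loop runs in compiled code inside NumPy
start = time.perf_counter()
arr_squares = np_arr * np_arr
array_seconds = time.perf_counter() - start

print(f"List comprehension: {list_seconds:.4f} s")
print(f"NumPy array:        {array_seconds:.4f} s")
print(f"Array data buffer:  {np_arr.nbytes} bytes ({np_arr.itemsize} bytes per element)")
```

Run the cell a few times; the first run can be slower than later ones.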

## 2. Creating NumPy Arrays

Arrays are the core data structure in NumPy.

In [None]:
# From Python lists
list_1d = [1, 2, 3, 4, 5]
array_1d = np.array(list_1d)

print("Python list:", list_1d)
print("NumPy array:", array_1d)
print("Type:", type(array_1d))

In [None]:
# 2D arrays (matrices)
list_2d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
array_2d = np.array(list_2d)

print("2D Array:")
print(array_2d)
print(f"\nShape: {array_2d.shape}")  # (rows, columns)
print(f"Dimensions: {array_2d.ndim}")
print(f"Size (total elements): {array_2d.size}")
print(f"Data type: {array_2d.dtype}")
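
The `dtype` printed above is inferred from the input values. As a minimal sketch (variable names are illustrative), you can also request a type explicitly when creating an array, or convert an existing array with `astype`:

```python
import numpy as np

ints = np.array([1, 2, 3])  # integer dtype inferred; the exact width depends on the platform
mixed = np.array([1, 2.5, 3])  # one float in the input makes the whole array float64
floats = np.array([1, 2, 3], dtype=np.float64)  # request a dtype explicitly

# astype returns a converted copy; float to int conversion truncates toward zero
truncated = np.array([1.7, 2.2, 3.9]).astype(np.int64)

print("Inferred dtypes:", ints.dtype, mixed.dtype)
print("Requested dtype:", floats.dtype)
print("Truncated:", truncated)
```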

In [None]:
# Array creation functions
zeros = np.zeros((3, 4))  # 3x4 array of zeros
ones = np.ones((2, 3))  # 2x3 array of ones
empty = np.empty((2, 2))  # Uninitialized: holds arbitrary values until you fill it (can be faster than zeros)
full = np.full((3, 3), 7)  # 3x3 array filled with 7

print("Zeros:")
print(zeros)
print("\nFull of 7s:")
print(full)

In [None]:
# Range-based arrays
range_array = np.arange(0, 10, 2)  # Start, stop, step
linspace = np.linspace(0, 1, 5)  # 5 evenly spaced numbers from 0 to 1

print("Range (0 to 10, step 2):", range_array)
print("Linspace (5 points 0 to 1):", linspace)

# Identity matrix
identity = np.eye(3)  # 3x3 identity matrix
print("\nIdentity matrix:")
print(identity)

In [None]:
# Random arrays (crucial for data science)
np.random.seed(42)  # For reproducibility

random_uniform = np.random.rand(3, 3)  # Uniform [0, 1)
random_normal = np.random.randn(3, 3)  # Standard normal distribution
random_int = np.random.randint(0, 100, size=(3, 3))  # Random integers

print("Random uniform [0, 1):")
print(random_uniform)
print("\nRandom integers [0, 100):")
print(random_int)
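
The functions above belong to NumPy's legacy global random interface. NumPy's documentation recommends the `Generator` API for new code. The sketch below shows roughly equivalent calls using `np.random.default_rng`. The generated numbers differ from those produced after `np.random.seed(42)`, so no output is listed.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator that does not touch global state

uniform = rng.random((3, 3))  # uniform on [0, 1)
normal = rng.standard_normal((3, 3))  # standard normal distribution
integers = rng.integers(0, 100, size=(3, 3))  # integers in [0, 100)

print("Uniform [0, 1):")
print(uniform)
print("\nIntegers [0, 100):")
print(integers)
```

Passing the `rng` object into your own functions keeps results reproducible without relying on a shared global seed.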

## 3. Array Operations (Vectorization)

Perform operations on entire arrays without writing loops!

In [None]:
# Arithmetic operations
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print("a:", a)
print("b:", b)
print("\nElement-wise operations:")
print("a + b:", a + b)
print("a * b:", a * b)
print("b / a:", b / a)
print("a ** 2:", a**2)
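
One detail worth noticing in the cell above: `/` performs true division and returns floats even when both arrays hold integers. A short sketch contrasting it with floor division `//` and the remainder operator `%` (the values in `b` are changed slightly here so the difference is visible):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 45])

true_div = b / a  # float result: [10. 10. 10. 11.25]
floor_div = b // a  # integer result, rounded down: [10 10 10 11]
remainder = b % a  # element-wise remainder: [0 0 0 1]

print("b / a: ", true_div, true_div.dtype)
print("b // a:", floor_div, floor_div.dtype)
print("b % a: ", remainder)
```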

In [None]:
# Scalar operations
data = np.array([10, 20, 30, 40, 50])

print("Original:", data)
print("Add 5:", data + 5)
print("Multiply by 2:", data * 2)
print("Square:", data**2)

# Temperature conversion: Celsius to Fahrenheit
celsius = np.array([0, 10, 20, 30, 40])
fahrenheit = celsius * 9 / 5 + 32
print("\nCelsius:", celsius)
print("Fahrenheit:", fahrenheit)

In [None]:
# Universal functions (ufuncs)
angles = np.array([0, 30, 45, 60, 90])
radians = np.deg2rad(angles)

print("Angles:", angles)
print("sin(angles):", np.sin(radians))
print("cos(angles):", np.cos(radians))

# Other useful ufuncs
data = np.array([1, 4, 9, 16, 25])
print("\nData:", data)
print("Square root:", np.sqrt(data))
print("Natural log:", np.log(data))
print("Exponential:", np.exp([1, 2, 3]))

## 4. Indexing and Slicing

Access and modify array elements efficiently.

In [None]:
# 1D array indexing
arr = np.array([10, 20, 30, 40, 50])

print("Array:", arr)
print("First element:", arr[0])
print("Last element:", arr[-1])
print("First three:", arr[:3])
print("Last two:", arr[-2:])
print("Every other:", arr[::2])

In [None]:
# 2D array indexing
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print("Matrix:")
print(matrix)
print("\nElement at row index 1, column index 2:", matrix[1, 2])  # 6 (indices start at 0)
print("First row:", matrix[0, :])  # [1, 2, 3]
print("Last column:", matrix[:, -1])  # [3, 6, 9]
print("2x2 subarray:\n", matrix[:2, :2])  # Top-left corner
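
A common surprise with slicing: a basic slice such as `matrix[:2, :2]` is a view onto the original data, not a copy, so writing to the slice also changes the original array. The sketch below illustrates this and shows `.copy()` as the way to get independent data (variable names are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

corner_view = matrix[:2, :2]  # basic slice: shares memory with matrix
corner_view[0, 0] = 99  # this also changes matrix[0, 0]

corner_copy = matrix[:2, :2].copy()  # independent data
corner_copy[1, 1] = -1  # matrix is not affected

print("Matrix after editing the view and the copy:")
print(matrix)
print("View shares memory:", np.shares_memory(corner_view, matrix))
print("Copy shares memory:", np.shares_memory(corner_copy, matrix))
```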

In [None]:
# Boolean indexing (filtering)
data = np.array([10, 25, 30, 15, 40, 5])

print("Data:", data)
print("Values > 20:", data[data > 20])
print("Values between 10 and 30:", data[(data >= 10) & (data <= 30)])

# Modify based on condition
data[data < 15] = 0
print("After setting <15 to 0:", data)
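
The assignment above modifies `data` in place. When you want to keep the original array, `np.where(condition, value_if_true, value_if_false)` builds a new array instead. The sketch below also shows that summing a boolean mask counts the matching elements:

```python
import numpy as np

data = np.array([10, 25, 30, 15, 40, 5])

# New array: 0 where the condition holds, the original value elsewhere
replaced = np.where(data < 15, 0, data)

# True counts as 1, so summing a mask counts matches
count_above_20 = np.sum(data > 20)

print("Original (unchanged):", data)
print("Replaced copy:", replaced)
print("Values above 20:", count_above_20)
```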

## 5. Array Manipulation

Reshape, transpose, and combine arrays.

In [None]:
# Reshaping
arr = np.arange(12)
print("Original (12 elements):", arr)

# Reshape to 3x4
reshaped = arr.reshape(3, 4)
print("\nReshaped to 3x4:")
print(reshaped)

# Reshape to 2x6
reshaped2 = arr.reshape(2, 6)
print("\nReshaped to 2x6:")
print(reshaped2)

# Flatten back to 1D
flattened = reshaped.flatten()
print("\nFlattened:", flattened)
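
Two related details, sketched here with illustrative variable names: passing `-1` for one dimension lets NumPy infer it from the array size, and `flatten()` always returns a copy while `ravel()` returns a view when it can. A shape that does not match the number of elements raises `ValueError`.

```python
import numpy as np

arr = np.arange(12)

auto_cols = arr.reshape(3, -1)  # NumPy infers 4 columns
auto_rows = arr.reshape(-1, 6)  # NumPy infers 2 rows
print("Inferred shapes:", auto_cols.shape, auto_rows.shape)

flat_copy = auto_cols.flatten()  # always a copy
flat_view = auto_cols.ravel()  # a view here, because the data is contiguous
print("flatten shares memory:", np.shares_memory(flat_copy, arr))
print("ravel shares memory:", np.shares_memory(flat_view, arr))

# 12 elements cannot be split into 5 equal rows
try:
    arr.reshape(5, -1)
except ValueError as err:
    print("Cannot reshape:", err)
```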

In [None]:
# Transpose
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print("Original (2x3):")
print(matrix)
print("\nTransposed (3x2):")
print(matrix.T)

In [None]:
# Stacking arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vertical stack: each 1D array becomes a row, giving shape (2, 3)
vstacked = np.vstack([a, b])
print("Vertical stack:")
print(vstacked)

# Horizontal stack: 1D arrays are joined end to end, giving shape (6,)
hstacked = np.hstack([a, b])
print("\nHorizontal stack:")
print(hstacked)
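
Comparing result shapes is a quick way to see how the stacking functions differ. The sketch below adds `np.column_stack` and `np.concatenate`, which the cell above does not use, to show how to place 1D arrays side by side as columns and how to join 2D arrays along a chosen axis:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("vstack:      ", np.vstack([a, b]).shape)  # (2, 3)
print("hstack:      ", np.hstack([a, b]).shape)  # (6,)
print("column_stack:", np.column_stack([a, b]).shape)  # (3, 2)

# For 2D inputs, concatenate joins along the axis you name
rows = np.vstack([a, b])
print("concatenate axis=0:", np.concatenate([rows, rows], axis=0).shape)  # (4, 3)
print("concatenate axis=1:", np.concatenate([rows, rows], axis=1).shape)  # (2, 6)
```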

## 6. Statistical Operations

NumPy provides built-in functions for common descriptive statistics.

In [None]:
# Basic statistics
data = np.array([23, 45, 67, 34, 89, 12, 56])

print("Data:", data)
print("\nBasic Stats:")
print(f"Mean: {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Std Dev: {np.std(data):.2f}")
print(f"Variance: {np.var(data):.2f}")
print(f"Min: {np.min(data)}")
print(f"Max: {np.max(data)}")
print(f"Sum: {np.sum(data)}")
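
By default, `np.std` and `np.var` use the population formulas (dividing by `n`). If your data is a sample and you want the sample estimate (dividing by `n - 1`), pass `ddof=1`. The sketch below compares both and cross-checks against Python's standard `statistics` module:

```python
import statistics

import numpy as np

data = np.array([23, 45, 67, 34, 89, 12, 56])

population_std = np.std(data)  # ddof=0: divide by n
sample_std = np.std(data, ddof=1)  # ddof=1: divide by n - 1

print(f"Population std (ddof=0): {population_std:.2f}")
print(f"Sample std (ddof=1):     {sample_std:.2f}")

# The standard library uses the same two definitions
print(f"statistics.pstdev: {statistics.pstdev(data.tolist()):.2f}")
print(f"statistics.stdev:  {statistics.stdev(data.tolist()):.2f}")
```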

In [None]:
# Statistics along axes (for 2D arrays)
sales = np.array(
    [
        [100, 150, 200],  # Week 1
        [120, 140, 180],  # Week 2
        [110, 160, 210],  # Week 3
    ]
)

print("Sales (3 weeks x 3 days):")
print(sales)

print("\nDaily averages (column means):", np.mean(sales, axis=0))
print("Weekly totals (row sums):", np.sum(sales, axis=1))
print("Highest single-day sales:", np.max(sales))
print("Overall average:", np.mean(sales))
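
A way to remember the `axis` argument: it names the dimension that gets collapsed. `axis=0` collapses the rows and leaves one value per column; `axis=1` collapses the columns and leaves one value per row. The sketch below also uses `np.argmax` to find positions rather than values, and `keepdims=True` to keep the result two-dimensional so it can be divided into the original array (this relies on broadcasting, covered in Section 7).

```python
import numpy as np

sales = np.array([[100, 150, 200], [120, 140, 180], [110, 160, 210]])

# Column index of the best day within each week (row)
best_day_per_week = np.argmax(sales, axis=1)
print("Best day index per week:", best_day_per_week)

# keepdims=True keeps a (3, 1) column instead of a flat (3,) array
week_totals = np.sum(sales, axis=1, keepdims=True)
print("Week totals shape:", week_totals.shape)

# Each day's share of its own week's total; every row sums to 1
share_of_week = sales / week_totals
print(np.round(share_of_week, 3))
```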

## 7. Broadcasting

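Broadcasting is how NumPy applies element-wise operations to arrays of different but compatible shapes, such as an array and a scalar, without you writing loops or copying data by hand.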

In [None]:
# Broadcasting example
# Normalize data: (value - mean) / std
data = np.array([10, 20, 30, 40, 50])

mean = np.mean(data)
std = np.std(data)

normalized = (data - mean) / std  # The scalars mean and std are broadcast across every element

print("Original:", data)
print(f"Mean: {mean}, Std: {std:.2f}")
print("Normalized:", normalized)
print("New mean:", np.mean(normalized))
print("New std:", np.std(normalized))
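
Broadcasting also works between arrays of different dimensions. NumPy compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below reuses the weekly sales numbers from Section 6 as illustrative data, standardizes each column, centers each row, and shows the error raised for incompatible shapes.

```python
import numpy as np

sales = np.array(
    [[100, 150, 200], [120, 140, 180], [110, 160, 210]], dtype=np.float64
)

# Shape (3, 3) with shape (3,): the 1D arrays are applied to every row
col_means = sales.mean(axis=0)
col_stds = sales.std(axis=0)
standardized = (sales - col_means) / col_stds
print("Column means after standardizing:", np.round(standardized.mean(axis=0), 10))

# Row-wise operations need a (3, 1) column, which keepdims=True provides
row_means = sales.mean(axis=1, keepdims=True)
centered_rows = sales - row_means
print("Row means after centering:", np.round(centered_rows.mean(axis=1), 10))

# Shapes (3, 3) and (2,) are incompatible: 3 != 2 and neither is 1
try:
    sales + np.array([1, 2])
except ValueError as err:
    print("Broadcast error:", err)
```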

## 8. Practical Example: Sales Analysis

Let's apply everything we've learned!

In [None]:
# Monthly sales for 4 products over 6 months
np.random.seed(42)
sales = np.random.randint(100, 500, size=(4, 6))
products = ["Widget", "Gadget", "Gizmo", "Doohickey"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

print("Sales Data (Products x Months):")
print(sales)

# Analysis
print("\n=== ANALYSIS ===")
print(f"Total sales: ${np.sum(sales)}")
print(f"Average sales per product per month: ${np.mean(sales):.2f}")

# Best performing product
product_totals = np.sum(sales, axis=1)
best_product_idx = np.argmax(product_totals)
print(f"\nBest product: {products[best_product_idx]} (${product_totals[best_product_idx]})")

# Best month
monthly_totals = np.sum(sales, axis=0)
best_month_idx = np.argmax(monthly_totals)
print(f"Best month: {months[best_month_idx]} (${monthly_totals[best_month_idx]})")

# Growth analysis
growth = ((monthly_totals[-1] - monthly_totals[0]) / monthly_totals[0]) * 100
print(f"\nGrowth (Jan to Jun): {growth:.1f}%")
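
The growth figure above compares only the first and last months. As one possible extension (not part of the original analysis), `np.diff` gives the change between consecutive months, and `np.argsort` ranks the products by total sales. The data is regenerated with the same seed so the cell runs on its own; the output is not listed because it depends on the random values.

```python
import numpy as np

np.random.seed(42)
sales = np.random.randint(100, 500, size=(4, 6))
products = ["Widget", "Gadget", "Gizmo", "Doohickey"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Month-over-month change: np.diff returns 5 differences for 6 months
monthly_totals = sales.sum(axis=0)
pct_change = np.diff(monthly_totals) / monthly_totals[:-1] * 100
for i, change in enumerate(pct_change):
    print(f"{months[i]} -> {months[i + 1]}: {change:+.1f}%")

# Rank products from highest to lowest total (argsort sorts ascending)
product_totals = sales.sum(axis=1)
ranking = np.argsort(product_totals)[::-1]
print("\nProducts ranked by total sales:")
for rank, idx in enumerate(ranking, start=1):
    print(f"{rank}. {products[idx]} (${product_totals[idx]})")
```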

## 9. Exercises

In [None]:
# Exercise 1: Create a 5x5 array of random integers (0-100)
# TODO: Find max value in each row
# TODO: Find min value in each column
# TODO: Calculate the overall mean

# Your code here

In [None]:
# Exercise 2: Temperature conversion
# TODO: Create array of temperatures: [0, 10, 20, 30, 40, 50] Celsius
# TODO: Convert to Fahrenheit
# TODO: Count how many of the temperatures are above 80°F

# Your code here

## 10. Key Takeaways

Outstanding work! You've mastered NumPy fundamentals:

✓ **Array creation**: zeros, ones, arange, linspace, random  
✓ **Vectorization**: Fast operations without loops  
✓ **Indexing**: Access and modify elements efficiently  
✓ **Reshaping**: Change array dimensions  
✓ **Statistics**: Built-in statistical functions  
✓ **Broadcasting**: Operations on different-shaped arrays  

## Next Steps

**Next Module**: `03_pandas_basics.ipynb`

Pandas builds on NumPy to work with real-world tabular data!

---