# Module 6: NumPy Fundamentals

## Topics Covered
1. Introduction to NumPy and Arrays
2. Creating Arrays (various methods)
3. Array Indexing and Slicing
4. Array Operations and Broadcasting
5. Mathematical Functions
6. Statistical Functions
7. Reshaping and Manipulating Arrays
8. Random Number Generation

## Learning Objectives

By the end of this module, you will be able to:
- Understand why NumPy is essential for data science
- Create and manipulate NumPy arrays efficiently
- Perform vectorized operations without loops
- Apply mathematical and statistical functions to arrays
- Reshape and transform arrays for analysis
- Generate random data for simulations and testing

---
# Section 1: Introduction to NumPy and Arrays
---

## What is NumPy?

NumPy (Numerical Python) is the foundational package for scientific computing in Python. It provides:

- **ndarray**: A powerful N-dimensional array object
- **Vectorized operations**: Fast element-wise computations
- **Mathematical functions**: Linear algebra, statistics, and more
- **Broadcasting**: Operations between arrays of different shapes

### Why This Matters in Data Science

NumPy is the backbone of Python's data science ecosystem:
- **pandas** is built on NumPy arrays
- **scikit-learn** uses NumPy arrays as its primary input and output format
- **matplotlib** plots NumPy arrays
- **TensorFlow/PyTorch** integrate with NumPy

Understanding NumPy is essential for efficient data manipulation and analysis.

In [1]:
# First, let's import NumPy
# The convention is to import it as 'np'

import numpy as np

print(f"NumPy version: {np.__version__}")

NumPy version: 2.4.0


## NumPy Arrays vs Python Lists

While Python lists are flexible, NumPy arrays are:
- **Faster**: Operations are implemented in C
- **Memory efficient**: Homogeneous data types
- **Convenient**: Built-in mathematical operations
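
The memory point above can be made concrete with a minimal sketch that compares a list and an array holding the same integers. The variable names are illustrative, and the exact byte counts depend on the platform and Python version, so the printed numbers are only indicative.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# A list stores pointers to separate Python int objects,
# so count the container plus each int object (an approximation)
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array stores raw 8-byte integers in one contiguous buffer
array_bytes = arr.nbytes

print(f"Bytes per array element: {arr.itemsize}")
print(f"List (container + int objects): {list_bytes} bytes")
print(f"Array data buffer: {array_bytes} bytes")
```

The array buffer is exactly `size * itemsize` bytes, which is why fixing a single dtype for all elements pays off.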

In [2]:
# Example: Creating a NumPy array from a Python list

python_list = [1, 2, 3, 4, 5]
numpy_array = np.array([1, 2, 3, 4, 5])

print(f"Python list: {python_list}")
print(f"Type: {type(python_list)}")
print()
print(f"NumPy array: {numpy_array}")
print(f"Type: {type(numpy_array)}")

Python list: [1, 2, 3, 4, 5]
Type: <class 'list'>

NumPy array: [1 2 3 4 5]
Type: <class 'numpy.ndarray'>


In [3]:
# Example: Speed comparison - multiplying each element by 2

import time

# Create large data
size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size)

# Time Python list operation
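# (time.perf_counter is the high-resolution clock intended for timing)
start = time.perf_counter()
result_list = [x * 2 for x in python_list]
list_time = time.perf_counter() - start

# Time NumPy operation
start = time.perf_counter()
result_array = numpy_array * 2
numpy_time = time.perf_counter() - start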

print(f"Python list: {list_time:.4f} seconds")
print(f"NumPy array: {numpy_time:.4f} seconds")
print(f"NumPy is {list_time/numpy_time:.1f}x faster!")

Python list: 0.1239 seconds
NumPy array: 0.0123 seconds
NumPy is 10.1x faster!


In [4]:
# Example: Array attributes

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(f"Array:\n{arr}")
print()
print(f"Shape: {arr.shape}")      # Dimensions (rows, columns)
print(f"Dimensions: {arr.ndim}")  # Number of dimensions
print(f"Size: {arr.size}")        # Total number of elements
print(f"Data type: {arr.dtype}") # Type of elements

Array:
[[1 2 3]
 [4 5 6]]

Shape: (2, 3)
Dimensions: 2
Size: 6
Data type: int64


## Data Types in NumPy

NumPy arrays have a single data type for all elements. Common types:

| Type | Description |
|------|-------------|
| `int32`, `int64` | Integer types |
| `float32`, `float64` | Floating point |
| `bool` | Boolean |
| `complex` | Complex numbers |
| `object` | Python objects |
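
Because an array holds a single data type, mixed inputs are converted to a common type and explicit conversions can lose information. The sketch below illustrates both effects; the array names are made up for the example.

```python
import numpy as np

# Mixing ints and floats upcasts every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed, mixed.dtype)

# astype returns a converted copy; float -> int drops the fractional part
prices = np.array([1.9, 2.5, -3.7])
as_int = prices.astype(np.int64)
print(as_int, as_int.dtype)

# The original array is unchanged
print(prices, prices.dtype)
```

Note that the conversion truncates toward zero rather than rounding, so apply `np.round` first if rounding is what you want.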

In [5]:
# Example: Data types

int_arr = np.array([1, 2, 3])
float_arr = np.array([1.0, 2.0, 3.0])
bool_arr = np.array([True, False, True])

print(f"Integer array: {int_arr}, dtype: {int_arr.dtype}")
print(f"Float array: {float_arr}, dtype: {float_arr.dtype}")
print(f"Boolean array: {bool_arr}, dtype: {bool_arr.dtype}")

# Specifying dtype explicitly
explicit_float = np.array([1, 2, 3], dtype=np.float64)
print(f"\nExplicit float64: {explicit_float}, dtype: {explicit_float.dtype}")

Integer array: [1 2 3], dtype: int64
Float array: [1. 2. 3.], dtype: float64
Boolean array: [ True False  True], dtype: bool

Explicit float64: [1. 2. 3.], dtype: float64


## Practice Exercise 1.1

**Task:** Create a NumPy array containing the values 10, 20, 30, 40, 50. Print its shape, size, and data type. Then create another array with the same values but as floats.

**Expected Output:**
```
Array: [10 20 30 40 50]
Shape: (5,)
Size: 5
Data type: int64

Float array: [10. 20. 30. 40. 50.]
Data type: float64
```

In [None]:
# Your code here


In [6]:
# Solution 1.1

import numpy as np

# Create integer array
arr = np.array([10, 20, 30, 40, 50])

print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
print(f"Size: {arr.size}")
print(f"Data type: {arr.dtype}")

# Create float array
float_arr = np.array([10, 20, 30, 40, 50], dtype=np.float64)
print(f"\nFloat array: {float_arr}")
print(f"Data type: {float_arr.dtype}")

Array: [10 20 30 40 50]
Shape: (5,)
Size: 5
Data type: int64

Float array: [10. 20. 30. 40. 50.]
Data type: float64


---
# Section 2: Creating Arrays
---

NumPy provides many ways to create arrays. Choosing the right method makes your code cleaner and more efficient.

## Syntax

```python
# From Python sequences
np.array(list_or_tuple)

# Filled with values
np.zeros(shape)       # All zeros
np.ones(shape)        # All ones
np.full(shape, value) # All same value
np.empty(shape)       # Uninitialized

# Sequences
np.arange(start, stop, step)      # Like range()
np.linspace(start, stop, num)     # Evenly spaced

# Special arrays
np.eye(n)             # Identity matrix
np.diag(values)       # Diagonal matrix
```
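
The two sequence functions differ in what you specify and in whether the stop value is included. A minimal sketch of the difference, using illustrative variable names:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
by_step = np.arange(0, 1, 0.25)

# linspace: you choose the number of points; the stop value is included
by_count = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 1, 4, endpoint=False)

print(by_step)
print(by_count)
print(half_open)
```

When the number of points matters more than the exact step, `linspace` is usually the easier function to reason about.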

In [None]:
# Example: Creating arrays with zeros and ones

import numpy as np

# 1D arrays
zeros_1d = np.zeros(5)
ones_1d = np.ones(5)

print("1D Arrays:")
print(f"Zeros: {zeros_1d}")
print(f"Ones: {ones_1d}")

# 2D arrays (pass shape as tuple)
zeros_2d = np.zeros((3, 4))  # 3 rows, 4 columns
ones_2d = np.ones((2, 3))

print("\n2D Arrays:")
print(f"Zeros (3x4):\n{zeros_2d}")
print(f"\nOnes (2x3):\n{ones_2d}")

In [None]:
# Example: Creating arrays with specific values

# Fill with a specific value
fives = np.full((3, 3), 5)
print(f"Array of 5s:\n{fives}")

# Fill with a float value
pi_array = np.full((2, 4), 3.14159)
print(f"\nArray of pi:\n{pi_array}")

In [None]:
# Example: Creating sequences with arange

# Similar to Python's range()
arr1 = np.arange(10)          # 0 to 9
arr2 = np.arange(5, 15)       # 5 to 14
arr3 = np.arange(0, 20, 2)    # Even numbers 0-18
arr4 = np.arange(10, 0, -1)   # Countdown

print(f"0 to 9: {arr1}")
print(f"5 to 14: {arr2}")
print(f"Even 0-18: {arr3}")
print(f"Countdown: {arr4}")

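# Works with floats too, but rounding can make the length hard to predict;
# np.linspace is usually the safer choice for non-integer steps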
float_range = np.arange(0, 1, 0.1)
print(f"\nFloat range: {float_range}")

In [None]:
# Example: Creating evenly spaced values with linspace

# linspace: specify number of points (includes endpoint by default)
arr1 = np.linspace(0, 10, 5)     # 5 points from 0 to 10
arr2 = np.linspace(0, 1, 11)     # 11 points from 0 to 1
arr3 = np.linspace(0, 100, 5)    # 5 points from 0 to 100

print(f"5 points, 0 to 10: {arr1}")
print(f"11 points, 0 to 1: {arr2}")
print(f"5 points, 0 to 100: {arr3}")

# Useful for creating data for plotting
x = np.linspace(0, 2 * np.pi, 100)  # 100 points for a smooth curve
print(f"\nPoints for plotting: {len(x)} points from 0 to {2*np.pi:.4f}")

In [None]:
# Example: Identity and diagonal matrices

# Identity matrix (1s on diagonal, 0s elsewhere)
identity = np.eye(4)
print(f"4x4 Identity matrix:\n{identity}")

# Diagonal matrix from values
diag = np.diag([1, 2, 3, 4])
print(f"\nDiagonal matrix:\n{diag}")

# Extract diagonal from a matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
diagonal_values = np.diag(matrix)
print(f"\nMatrix:\n{matrix}")
print(f"Diagonal: {diagonal_values}")

In [None]:
# Example: Creating arrays like existing arrays

original = np.array([[1, 2, 3], [4, 5, 6]])

# Create arrays with same shape
zeros_like = np.zeros_like(original)
ones_like = np.ones_like(original)
full_like = np.full_like(original, 99)

print(f"Original:\n{original}")
print(f"\nZeros like:\n{zeros_like}")
print(f"\nOnes like:\n{ones_like}")
print(f"\nFull like (99):\n{full_like}")

## Practice Exercise 2.1

**Task:** Create the following arrays:
1. A 4x4 array of zeros
2. An array of odd numbers from 1 to 19
3. An array of 6 evenly spaced values from 0 to 5
4. A 3x3 identity matrix

In [None]:
# Your code here


In [None]:
# Solution 2.1

import numpy as np

# 1. 4x4 zeros
zeros_4x4 = np.zeros((4, 4))
print(f"4x4 zeros:\n{zeros_4x4}")

# 2. Odd numbers 1 to 19
odds = np.arange(1, 20, 2)
print(f"\nOdd numbers 1-19: {odds}")

# 3. 6 evenly spaced values from 0 to 5
spaced = np.linspace(0, 5, 6)
print(f"\n6 values from 0 to 5: {spaced}")

# 4. 3x3 identity matrix
identity = np.eye(3)
print(f"\n3x3 identity:\n{identity}")

---
# Section 3: Array Indexing and Slicing
---

Accessing and selecting data from arrays is fundamental to data analysis. NumPy provides powerful indexing capabilities.

## Syntax

```python
# 1D array indexing
arr[index]           # Single element
arr[start:stop]      # Slice
arr[start:stop:step] # Slice with step

# 2D array indexing
arr[row, col]        # Single element
arr[row]             # Entire row
arr[:, col]          # Entire column
arr[r1:r2, c1:c2]    # Subarray

# Boolean indexing
arr[condition]       # Elements where condition is True

# Fancy indexing
arr[[0, 2, 4]]       # Select specific indices
```
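
One detail worth knowing about the forms above: basic slices return views that share memory with the original array, while fancy and boolean indexing return copies. The following sketch demonstrates the difference with `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view: writing to it changes the original
view = arr[2:5]
view[0] = 99
print(arr)

# Fancy indexing returns a copy: the original is unaffected
picked = arr[[0, 1]]
picked[0] = -1
print(arr[0])

print(np.shares_memory(arr, view))
print(np.shares_memory(arr, picked))

# Use .copy() when you need an independent slice
safe = arr[2:5].copy()
print(np.shares_memory(arr, safe))
```

If a function should not modify its input, copying the slice first avoids surprising side effects.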

In [None]:
# Example: 1D array indexing

import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(f"Array: {arr}")
print()

# Single element access
print(f"First element (index 0): {arr[0]}")
print(f"Last element (index -1): {arr[-1]}")
print(f"Third element (index 2): {arr[2]}")

# Slicing
print(f"\nFirst 3 elements: {arr[:3]}")
print(f"Last 3 elements: {arr[-3:]}")
print(f"Elements 2-5: {arr[2:6]}")
print(f"Every other element: {arr[::2]}")
print(f"Reversed: {arr[::-1]}")

In [None]:
# Example: 2D array indexing

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])
print(f"2D Array:\n{arr}")
print(f"Shape: {arr.shape}")
print()

# Single element access
print(f"Element at row 0, col 0: {arr[0, 0]}")
print(f"Element at row 1, col 2: {arr[1, 2]}")
print(f"Element at row 2, col 3: {arr[2, 3]}")

In [None]:
# Example: Selecting rows and columns

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

# Selecting rows
print(f"Row 0: {arr[0]}")
print(f"Row 1: {arr[1]}")
print(f"Last row: {arr[-1]}")

# Selecting columns (use : for all rows)
print(f"\nColumn 0: {arr[:, 0]}")
print(f"Column 2: {arr[:, 2]}")
print(f"Last column: {arr[:, -1]}")

In [None]:
# Example: Slicing 2D arrays (subarrays)

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])
print(f"Original array:\n{arr}")

# Extract subarrays
top_left = arr[:2, :2]      # First 2 rows, first 2 columns
bottom_right = arr[2:, 2:]  # Last 2 rows, last 2 columns
middle = arr[1:3, 1:3]      # Middle 2x2

print(f"\nTop-left 2x2:\n{top_left}")
print(f"\nBottom-right 2x2:\n{bottom_right}")
print(f"\nMiddle 2x2:\n{middle}")

In [None]:
# Example: Boolean indexing (filtering)

arr = np.array([15, 22, 8, 45, 33, 12, 28, 5, 41, 19])
print(f"Array: {arr}")

# Create boolean mask
mask = arr > 20
print(f"\nMask (arr > 20): {mask}")

# Apply mask to get filtered values
filtered = arr[mask]
print(f"Values > 20: {filtered}")

# Can do it in one step
print(f"\nValues < 15: {arr[arr < 15]}")
print(f"Even values: {arr[arr % 2 == 0]}")

In [None]:
# Example: Combining conditions

arr = np.array([5, 12, 18, 25, 8, 31, 15, 22, 9, 28])
print(f"Array: {arr}")

# Use & for AND, | for OR (must use parentheses!)
between_10_and_25 = arr[(arr >= 10) & (arr <= 25)]
less_than_10_or_greater_than_25 = arr[(arr < 10) | (arr > 25)]

print(f"\nBetween 10 and 25: {between_10_and_25}")
print(f"Less than 10 or greater than 25: {less_than_10_or_greater_than_25}")

In [None]:
# Example: Fancy indexing (selecting specific indices)

arr = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(f"Array: {arr}")

# Select specific indices
indices = [0, 2, 5, 8]
selected = arr[indices]
print(f"\nIndices {indices}: {selected}")

# Can use np.array for indices too
idx_array = np.array([1, 3, 5, 7])
print(f"Indices {list(idx_array)}: {arr[idx_array]}")

In [None]:
# Example: Modifying array elements

arr = np.array([1, 2, 3, 4, 5])
print(f"Original: {arr}")

# Modify single element
arr[0] = 100
print(f"After arr[0] = 100: {arr}")

# Modify slice
arr[1:4] = [200, 300, 400]
print(f"After modifying slice: {arr}")

# Modify with boolean indexing
arr2 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
arr2[arr2 > 5] = 0  # Set all values > 5 to 0
print(f"\nAfter setting >5 to 0: {arr2}")

## Practice Exercise 3.1

**Task:** Given the sales data array below:
1. Extract the first week (first 7 values)
2. Find all days with sales above 150
3. Replace all sales below 100 with 100 (minimum threshold)

```python
daily_sales = np.array([120, 85, 200, 175, 95, 160, 140, 110, 190, 88, 155, 130, 145, 210])
```

In [None]:
# Your code here


In [None]:
# Solution 3.1

import numpy as np

daily_sales = np.array([120, 85, 200, 175, 95, 160, 140, 110, 190, 88, 155, 130, 145, 210])
print(f"Daily sales: {daily_sales}")

# 1. First week
first_week = daily_sales[:7]
print(f"\n1. First week: {first_week}")

# 2. Days with sales above 150
above_150 = daily_sales[daily_sales > 150]
print(f"2. Sales above 150: {above_150}")

# 3. Replace sales below 100 with 100
sales_adjusted = daily_sales.copy()  # Make a copy to preserve original
sales_adjusted[sales_adjusted < 100] = 100
print(f"3. After minimum threshold: {sales_adjusted}")

---
# Section 4: Array Operations and Broadcasting
---

## Vectorized Operations

NumPy's power comes from vectorized operations: a single expression is applied to every element of an array, with no explicit Python loop.

### Why This Matters

- **Faster**: Operations are implemented in optimized C code
- **Cleaner**: No explicit loops needed
- **Readable**: Code expresses intent clearly
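
As a small illustration of these points, the sketch below computes the same result with an explicit loop and with one vectorized expression. The price and quantity values are invented for the example.

```python
import numpy as np

prices = np.array([19.99, 5.49, 102.00, 42.50])
quantities = np.array([3, 10, 1, 2])

# Loop version: one Python-level multiplication per element
totals_loop = []
for p, q in zip(prices, quantities):
    totals_loop.append(p * q)

# Vectorized version: a single expression over the whole arrays
totals_vec = prices * quantities

print(totals_vec)
print(np.allclose(totals_loop, totals_vec))
```

Both versions produce the same totals; the vectorized form simply moves the loop into NumPy's compiled code.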

In [None]:
# Example: Element-wise arithmetic operations

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

print(f"a = {a}")
print(f"b = {b}")
print()

# Element-wise operations
print(f"a + b = {a + b}")
print(f"a - b = {a - b}")
print(f"a * b = {a * b}")
print(f"b / a = {b / a}")
print(f"a ** 2 = {a ** 2}")

In [None]:
# Example: Operations with scalars

arr = np.array([10, 20, 30, 40, 50])
print(f"Original: {arr}")

# Scalar operations apply to all elements
print(f"arr + 5: {arr + 5}")
print(f"arr * 2: {arr * 2}")
print(f"arr / 10: {arr / 10}")
print(f"arr ** 2: {arr ** 2}")
print(f"100 - arr: {100 - arr}")

In [None]:
# Example: Practical use case - unit conversion

# Temperature in Celsius
temps_celsius = np.array([0, 10, 20, 25, 30, 37, 100])

# Convert to Fahrenheit: F = C * 9/5 + 32
temps_fahrenheit = temps_celsius * 9/5 + 32

print("Temperature Conversion:")
print("-" * 30)
for c, f in zip(temps_celsius, temps_fahrenheit):
    print(f"{c:3}°C = {f:6.1f}°F")

In [None]:
# Example: Comparison operations (return boolean arrays)

arr = np.array([1, 5, 10, 15, 20, 25])
print(f"Array: {arr}")

print(f"\narr > 10: {arr > 10}")
print(f"arr == 15: {arr == 15}")
print(f"arr <= 10: {arr <= 10}")

# Comparing two arrays
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
print(f"\na = {a}")
print(f"b = {b}")
print(f"a > b: {a > b}")
print(f"a == b: {a == b}")

## Broadcasting

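Broadcasting lets NumPy operate on arrays of different shapes. Shapes are compared from the last dimension backward, and two dimensions are compatible when they are equal or one of them is 1. NumPy then treats the size-1 (or missing) dimensions as if they were repeated to match the larger array, without actually copying the data.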
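
One way to check whether two shapes are compatible is sketched below. It assumes NumPy 1.20 or later for `np.broadcast_shapes`; the array names are illustrative.

```python
import numpy as np

# Shapes are aligned from the right; dimensions must be equal or 1
print(np.broadcast_shapes((3, 4), (4,)))
print(np.broadcast_shapes((3, 1), (1, 4)))

# (3, 4) and (3,) do not line up: trailing dimensions 4 and 3 differ
matrix = np.ones((3, 4))
per_row = np.array([10, 20, 30])
try:
    matrix * per_row
    compatible = True
except ValueError as err:
    compatible = False
    print(f"Broadcast failed: {err}")

# Reshaping to (3, 1) makes the shapes compatible
result = matrix * per_row.reshape(3, 1)
print(result.shape)
```

A `ValueError` that mentions operands not being broadcast together almost always means one array needs an extra axis of length 1.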

In [None]:
# Example: Broadcasting basics

# Scalar broadcast to array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Array:\n{arr}")
print(f"\nArray + 10:\n{arr + 10}")

In [None]:
# Example: Broadcasting 1D array across 2D array

# 2D array: 3 rows, 4 columns
matrix = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

# 1D array matching columns
column_weights = np.array([1, 2, 3, 4])

print(f"Matrix:\n{matrix}")
print(f"\nWeights: {column_weights}")
print(f"\nMatrix * Weights:\n{matrix * column_weights}")

In [None]:
# Example: Broadcasting a column vector across a 2D array

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# To apply one multiplier per row, use a column vector with shape (3, 1)
row_multipliers = np.array([[10], [20], [30]])  # Shape (3, 1)

print(f"Matrix:\n{matrix}")
print(f"\nRow multipliers:\n{row_multipliers}")
print(f"\nResult:\n{matrix * row_multipliers}")

In [None]:
# Example: Practical broadcasting - normalizing data

# Sales data: rows = products, columns = months
sales = np.array([
    [100, 120, 90, 150],   # Product A
    [200, 180, 220, 190],  # Product B
    [50, 60, 55, 70]       # Product C
])

# Calculate row totals
row_totals = sales.sum(axis=1, keepdims=True)  # Sum each row, keep 2D shape

# Normalize: what percentage of total sales was each month?
normalized = sales / row_totals * 100

print("Sales data:")
print(sales)
print(f"\nRow totals:\n{row_totals}")
print(f"\nPercentage of total:\n{normalized.round(1)}")
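
The `keepdims=True` argument in the example above is what makes the division line up. A minimal sketch of the shapes involved, reusing the same sales figures with illustrative variable names:

```python
import numpy as np

sales = np.array([
    [100, 120, 90, 150],
    [200, 180, 220, 190],
    [50, 60, 55, 70],
])

flat_totals = sales.sum(axis=1)                 # shape (3,)
kept_totals = sales.sum(axis=1, keepdims=True)  # shape (3, 1)
print(flat_totals.shape, kept_totals.shape)

# (3, 4) / (3, 1) spreads each row total across that row's columns
shares = sales / kept_totals
print(shares.sum(axis=1))

# Without keepdims, add the axis back manually for the same result
same = sales / flat_totals[:, np.newaxis]
print(np.allclose(shares, same))
```

Dividing by `flat_totals` directly would fail here, because shapes (3, 4) and (3,) are not broadcast-compatible.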

## Practice Exercise 4.1

**Task:** You have quarterly sales data for 3 products. Calculate:
1. The percentage growth from Q1 to Q4 for each product
2. Normalize each quarter so the sum across products equals 100%

```python
quarterly_sales = np.array([
    [150, 180, 200, 220],  # Product A
    [300, 290, 310, 350],  # Product B
    [50, 60, 55, 80]       # Product C
])
```

In [None]:
# Your code here


In [None]:
# Solution 4.1

import numpy as np

quarterly_sales = np.array([
    [150, 180, 200, 220],  # Product A
    [300, 290, 310, 350],  # Product B
    [50, 60, 55, 80]       # Product C
])

products = ['Product A', 'Product B', 'Product C']

print("Quarterly Sales:")
print(quarterly_sales)

# 1. Percentage growth from Q1 to Q4
q1 = quarterly_sales[:, 0]
q4 = quarterly_sales[:, 3]
growth = ((q4 - q1) / q1) * 100

print("\n1. Growth from Q1 to Q4:")
for product, g in zip(products, growth):
    print(f"   {product}: {g:.1f}%")

# 2. Normalize by quarter (each column sums to 100%)
column_totals = quarterly_sales.sum(axis=0, keepdims=True)
normalized = quarterly_sales / column_totals * 100

print("\n2. Market share by quarter (%):")
print(normalized.round(1))

---
# Section 5: Mathematical Functions
---

NumPy provides a comprehensive library of mathematical functions that operate element-wise on arrays.

## Syntax

```python
# Basic math
np.sqrt(arr)     # Square root
np.exp(arr)      # e^x
np.log(arr)      # Natural log
np.log10(arr)    # Base-10 log
np.abs(arr)      # Absolute value

# Trigonometric
np.sin(arr), np.cos(arr), np.tan(arr)

# Rounding
np.round(arr, decimals)
np.floor(arr)    # Round down
np.ceil(arr)     # Round up

# Aggregations
np.sum(arr), np.prod(arr)
np.min(arr), np.max(arr)
np.cumsum(arr)   # Cumulative sum
```
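
Some of these functions are undefined for part of their input range. The sketch below shows what `np.log` returns for zero and negative values, and one way to handle the result. The input values are chosen only to trigger each case.

```python
import numpy as np

values = np.array([1.0, 0.0, -1.0])

# log(0) is -inf and log of a negative number is nan; NumPy warns by default.
# errstate silences those warnings inside the block only.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(values)

print(logs)

# Keep only the finite results before aggregating
finite = logs[np.isfinite(logs)]
print(finite.sum())
```

Silencing the warnings is only appropriate when the non-finite values are handled explicitly afterward, as done here with `np.isfinite`.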

In [None]:
# Example: Basic mathematical functions

import numpy as np

arr = np.array([1, 4, 9, 16, 25])
print(f"Array: {arr}")

print(f"\nSquare root: {np.sqrt(arr)}")
print(f"Square: {np.square(arr)}")
print(f"Cube root: {np.cbrt(arr)}")
print(f"Power of 2: {np.power(arr, 2)}")

In [None]:
# Example: Exponential and logarithmic functions

arr = np.array([1, 2, 3, 4, 5])
print(f"Array: {arr}")

print(f"\ne^x: {np.exp(arr)}")
print(f"ln(x): {np.log(arr)}")
print(f"log10(x): {np.log10(arr)}")
print(f"log2(x): {np.log2(arr)}")

# Practical: continuously compounded interest
# A = P * e^(rt)
principal = 1000
rate = 0.05
years = np.array([1, 5, 10, 20, 30])
amount = principal * np.exp(rate * years)

print(f"\nCompound Interest (continuous, 5% rate):")
for y, a in zip(years, amount):
    print(f"  {y:2} years: ${a:.2f}")

In [None]:
# Example: Rounding functions

arr = np.array([1.234, 2.567, 3.891, 4.123, 5.999])
print(f"Array: {arr}")

print(f"\nRound to 2 decimals: {np.round(arr, 2)}")
print(f"Round to 1 decimal: {np.round(arr, 1)}")
print(f"Round to integer: {np.round(arr, 0)}")
print(f"Floor (round down): {np.floor(arr)}")
print(f"Ceil (round up): {np.ceil(arr)}")
print(f"Truncate: {np.trunc(arr)}")

In [None]:
# Example: Aggregation functions

arr = np.array([10, 25, 8, 42, 15, 33, 7, 28])
print(f"Array: {arr}")

print(f"\nSum: {np.sum(arr)}")
print(f"Product: {np.prod(arr)}")
print(f"Min: {np.min(arr)}")
print(f"Max: {np.max(arr)}")
print(f"Argmin (index of min): {np.argmin(arr)}")
print(f"Argmax (index of max): {np.argmax(arr)}")

In [None]:
# Example: Cumulative functions

sales = np.array([100, 150, 120, 180, 200])
print(f"Monthly sales: {sales}")

cumulative_sales = np.cumsum(sales)
print(f"Cumulative sales: {cumulative_sales}")

cumulative_product = np.cumprod([1.1, 1.05, 0.95, 1.08, 1.12])  # Growth rates
print(f"\nGrowth rates: [1.1, 1.05, 0.95, 1.08, 1.12]")
print(f"Cumulative growth: {cumulative_product}")

In [None]:
# Example: Aggregations along axes (2D arrays)

# Sales: rows = regions, columns = quarters
sales = np.array([
    [100, 120, 140, 160],  # North
    [200, 180, 220, 240],  # South
    [150, 170, 160, 180]   # East
])

print("Quarterly Sales by Region:")
print(sales)

# axis=0 collapses the rows, giving one total per column (quarter)
quarterly_totals = np.sum(sales, axis=0)
print(f"\nQuarterly totals: {quarterly_totals}")

# axis=1 collapses the columns, giving one total per row (region)
region_totals = np.sum(sales, axis=1)
print(f"Region totals: {region_totals}")

# Total overall
print(f"Grand total: {np.sum(sales)}")

In [None]:
# Example: Min/Max along axes

sales = np.array([
    [100, 120, 140, 160],  # North
    [200, 180, 220, 240],  # South
    [150, 170, 160, 180]   # East
])

# Highest quarterly sales value for each region
best_quarters = np.max(sales, axis=1)
print(f"Highest quarterly sales per region: {best_quarters}")

# Which quarter was best for each region?
best_quarter_idx = np.argmax(sales, axis=1)
print(f"Best quarter index per region: {best_quarter_idx}")

# Highest regional sales value for each quarter
best_regions = np.max(sales, axis=0)
print(f"\nHighest regional sales per quarter: {best_regions}")

## Practice Exercise 5.1

**Task:** Given stock prices for a week:
1. Calculate the daily returns (% change from previous day)
2. Find the cumulative return over the week
3. Find the days with the largest gain and the largest loss

```python
prices = np.array([100.0, 102.5, 101.0, 105.0, 103.5, 107.0, 106.0])
```

**Hint:** Daily return = (today - yesterday) / yesterday * 100

In [None]:
# Your code here


In [None]:
# Solution 5.1

import numpy as np

prices = np.array([100.0, 102.5, 101.0, 105.0, 103.5, 107.0, 106.0])
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

print(f"Stock prices: {prices}")

# 1. Daily returns
daily_returns = (prices[1:] - prices[:-1]) / prices[:-1] * 100
print("\n1. Daily returns:")
for i, ret in enumerate(daily_returns):
    print(f"   {days[i]} -> {days[i+1]}: {ret:+.2f}%")

# 2. Cumulative return
cumulative_return = (prices[-1] - prices[0]) / prices[0] * 100
print(f"\n2. Cumulative return: {cumulative_return:.2f}%")

# 3. Best and worst days
best_day_idx = np.argmax(daily_returns)
worst_day_idx = np.argmin(daily_returns)

print(f"\n3. Highest gain: {days[best_day_idx]} -> {days[best_day_idx+1]} ({daily_returns[best_day_idx]:+.2f}%)")
print(f"   Highest loss: {days[worst_day_idx]} -> {days[worst_day_idx+1]} ({daily_returns[worst_day_idx]:+.2f}%)")

---
# Section 6: Statistical Functions
---

NumPy provides essential statistical functions for data analysis.

## Syntax

```python
# Central tendency
np.mean(arr)      # Average
np.median(arr)    # Middle value

# Dispersion
np.std(arr)       # Standard deviation (population, ddof=0 by default)
np.var(arr)       # Variance (population; pass ddof=1 for the sample version)

# Percentiles and quantiles
np.percentile(arr, q)  # q-th percentile
np.quantile(arr, q)    # q-th quantile (0-1)

# Correlation
np.corrcoef(arr1, arr2)
```
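
By default `np.std` and `np.var` divide by N, which gives the population statistics. Passing `ddof=1` divides by N - 1 instead, giving the sample versions. A minimal sketch with made-up data:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Default ddof=0: divide by N (population standard deviation)
pop_std = np.std(data)

# ddof=1: divide by N - 1 (sample standard deviation)
sample_std = np.std(data, ddof=1)

print(f"Population std: {pop_std:.4f}")
print(f"Sample std:     {sample_std:.4f}")
```

The sample value is always the larger of the two, and the gap shrinks as the number of observations grows.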

In [None]:
# Example: Basic statistics

import numpy as np

# Sample data: test scores
scores = np.array([85, 90, 78, 92, 88, 76, 95, 89, 82, 91, 73, 87, 94, 80, 86])
print(f"Test scores: {scores}")
print(f"Number of students: {len(scores)}")

print(f"\nMean: {np.mean(scores):.2f}")
print(f"Median: {np.median(scores):.2f}")
print(f"Standard deviation: {np.std(scores):.2f}")
print(f"Variance: {np.var(scores):.2f}")
print(f"Min: {np.min(scores)}, Max: {np.max(scores)}")
print(f"Range: {np.ptp(scores)}")  # Peak-to-peak (max - min)

In [None]:
# Example: Percentiles and quartiles

scores = np.array([85, 90, 78, 92, 88, 76, 95, 89, 82, 91, 73, 87, 94, 80, 86])

print(f"Scores (sorted): {np.sort(scores)}")
print()

# Percentiles
p25 = np.percentile(scores, 25)
p50 = np.percentile(scores, 50)  # Same as median
p75 = np.percentile(scores, 75)

print(f"25th percentile (Q1): {p25}")
print(f"50th percentile (Q2/Median): {p50}")
print(f"75th percentile (Q3): {p75}")
print(f"Interquartile range (IQR): {p75 - p25}")

# Multiple percentiles at once
percentiles = np.percentile(scores, [10, 25, 50, 75, 90])
print(f"\nPercentiles [10, 25, 50, 75, 90]: {percentiles}")

In [None]:
# Example: Statistics along axes

# Exam scores: rows = students, columns = subjects
scores = np.array([
    [85, 90, 78],   # Student 1: Math, Science, English
    [92, 88, 95],   # Student 2
    [76, 82, 89],   # Student 3
    [88, 91, 84],   # Student 4
    [79, 85, 92]    # Student 5
])

subjects = ['Math', 'Science', 'English']
students = ['Student 1', 'Student 2', 'Student 3', 'Student 4', 'Student 5']

print("Exam Scores:")
print(scores)

# Average per subject (down rows)
subject_means = np.mean(scores, axis=0)
print("\nAverage by subject:")
for subj, mean in zip(subjects, subject_means):
    print(f"  {subj}: {mean:.2f}")

# Average per student (across columns)
student_means = np.mean(scores, axis=1)
print("\nAverage by student:")
for stud, mean in zip(students, student_means):
    print(f"  {stud}: {mean:.2f}")

In [None]:
# Example: Correlation coefficient

# Study hours vs test scores
study_hours = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
test_scores = np.array([65, 70, 72, 78, 82, 85, 88, 92, 95, 98])

# Calculate correlation
correlation = np.corrcoef(study_hours, test_scores)
print("Correlation matrix:")
print(correlation)
print(f"\nCorrelation coefficient: {correlation[0, 1]:.4f}")

# Interpretation
r = correlation[0, 1]
if r > 0.7:
    strength = "strong positive"
elif r > 0.3:
    strength = "moderate positive"
elif r > -0.3:
    strength = "weak"
elif r > -0.7:
    strength = "moderate negative"
else:
    strength = "strong negative"

print(f"Interpretation: {strength} correlation")

In [None]:
# Example: Detecting outliers using statistics

data = np.array([12, 15, 14, 10, 13, 11, 14, 100, 12, 15, 13, 11])  # 100 is outlier
print(f"Data: {data}")

# Method 1: Using Z-scores
mean = np.mean(data)
std = np.std(data)
z_scores = (data - mean) / std
outliers_z = data[np.abs(z_scores) > 2]

print(f"\nMean: {mean:.2f}, Std: {std:.2f}")
print(f"Outliers (|z| > 2): {outliers_z}")

# Method 2: Using IQR
q1 = np.percentile(data, 25)
q3 = np.percentile(data, 75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers_iqr = data[(data < lower_bound) | (data > upper_bound)]
print(f"\nQ1: {q1}, Q3: {q3}, IQR: {iqr}")
print(f"Bounds: [{lower_bound:.2f}, {upper_bound:.2f}]")
print(f"Outliers (IQR method): {outliers_iqr}")

## Practice Exercise 6.1

**Task:** Analyze the employee salaries data:
1. Calculate mean, median, and standard deviation
2. Find the 25th, 50th, and 75th percentiles
3. Identify any outliers using the IQR method

```python
salaries = np.array([45000, 52000, 48000, 150000, 55000, 51000, 49000, 53000, 47000, 54000, 200000, 50000, 52000, 48000, 51000])
```

In [None]:
# Your code here


In [None]:
# Solution 6.1

import numpy as np

salaries = np.array([45000, 52000, 48000, 150000, 55000, 51000, 49000, 53000, 47000, 54000, 200000, 50000, 52000, 48000, 51000])

print(f"Salaries: {salaries}")
print(f"\n1. Basic Statistics:")
print(f"   Mean: ${np.mean(salaries):,.2f}")
print(f"   Median: ${np.median(salaries):,.2f}")
print(f"   Standard Deviation: ${np.std(salaries):,.2f}")

# Notice: Mean is much higher than median - indicates right skew/outliers

print(f"\n2. Percentiles:")
q1 = np.percentile(salaries, 25)
q2 = np.percentile(salaries, 50)
q3 = np.percentile(salaries, 75)
print(f"   25th percentile (Q1): ${q1:,.2f}")
print(f"   50th percentile (Q2): ${q2:,.2f}")
print(f"   75th percentile (Q3): ${q3:,.2f}")

print(f"\n3. Outlier Detection (IQR method):")
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
print(f"   IQR: ${iqr:,.2f}")
print(f"   Bounds: [${lower_bound:,.2f}, ${upper_bound:,.2f}]")

outliers = salaries[(salaries < lower_bound) | (salaries > upper_bound)]
print(f"   Outliers: {['${:,.0f}'.format(x) for x in outliers]}")

---
# Section 7: Reshaping and Manipulating Arrays
---

Changing array shapes is essential for data manipulation and preparing data for analysis.

## Syntax

```python
# Reshaping
arr.reshape(new_shape)   # Return reshaped array (a view when possible)
arr.flatten()            # Return 1D copy
arr.ravel()              # Return 1D array (a view when possible)

# Transposing
arr.T                    # Transpose
np.transpose(arr)        # Same as .T

# Joining
np.concatenate([a, b])   # Join arrays
np.vstack([a, b])        # Stack vertically
np.hstack([a, b])        # Stack horizontally

# Splitting
np.split(arr, n)         # Split into n parts
np.vsplit(arr, n)        # Split vertically
np.hsplit(arr, n)        # Split horizontally
```
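
Whether a reshaping call returns a view or a copy depends on the memory layout of the array. The sketch below checks this with `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# On a contiguous array, reshape and ravel can return views
reshaped = arr.reshape(3, 2)
raveled = arr.ravel()
print(np.shares_memory(arr, reshaped))
print(np.shares_memory(arr, raveled))

# flatten always copies
flat = arr.flatten()
print(np.shares_memory(arr, flat))

# The transpose is a non-contiguous view, so ravel has to copy here
t_raveled = arr.T.ravel()
print(np.shares_memory(arr, t_raveled))
```

When you intend to modify the result without touching the original, `flatten()` or an explicit `.copy()` is the safe choice.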

In [None]:
# Example: Reshaping arrays

import numpy as np

# Create a 1D array
arr = np.arange(12)
print(f"Original: {arr}")
print(f"Shape: {arr.shape}")

# Reshape to 2D
arr_3x4 = arr.reshape(3, 4)  # 3 rows, 4 columns
print(f"\nReshaped to 3x4:\n{arr_3x4}")

arr_4x3 = arr.reshape(4, 3)  # 4 rows, 3 columns
print(f"\nReshaped to 4x3:\n{arr_4x3}")

arr_2x6 = arr.reshape(2, 6)  # 2 rows, 6 columns
print(f"\nReshaped to 2x6:\n{arr_2x6}")

In [None]:
# Example: Using -1 for automatic dimension calculation

arr = np.arange(24)
print(f"Original: {arr}")

# Use -1 to let NumPy calculate one dimension
arr_6xauto = arr.reshape(6, -1)  # 6 rows, calculate columns
print(f"\n6 rows, auto columns:\n{arr_6xauto}")
print(f"Shape: {arr_6xauto.shape}")

arr_autox4 = arr.reshape(-1, 4)  # Calculate rows, 4 columns
print(f"\nAuto rows, 4 columns:\n{arr_autox4}")
print(f"Shape: {arr_autox4.shape}")

In [None]:
# Example: Flattening arrays

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"Original:\n{arr}")

# flatten() returns a copy
flat = arr.flatten()
print(f"\nFlattened: {flat}")

# ravel() returns a view when possible (avoiding a copy), otherwise a copy
raveled = arr.ravel()
print(f"Raveled: {raveled}")

In [None]:
# Example: Transposing arrays

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(f"Original (2x4):\n{arr}")
print(f"Shape: {arr.shape}")

transposed = arr.T
print(f"\nTransposed (4x2):\n{transposed}")
print(f"Shape: {transposed.shape}")

In [None]:
# Example: Concatenating arrays

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 1D concatenation
combined = np.concatenate([a, b])
print(f"a = {a}")
print(f"b = {b}")
print(f"Concatenated: {combined}")

In [None]:
# Example: Stacking arrays

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vertical stack (creates rows)
vstacked = np.vstack([a, b])
print(f"Vertical stack:\n{vstacked}")
print(f"Shape: {vstacked.shape}")

# Horizontal stack (extends columns)
hstacked = np.hstack([a, b])
print(f"\nHorizontal stack: {hstacked}")
print(f"Shape: {hstacked.shape}")

In [None]:
# Example: Stacking 2D arrays

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

print(f"Array 1:\n{arr1}")
print(f"\nArray 2:\n{arr2}")

# Vertical stack (add rows)
vstacked = np.vstack([arr1, arr2])
print(f"\nVertical stack:\n{vstacked}")

# Horizontal stack (add columns)
hstacked = np.hstack([arr1, arr2])
print(f"\nHorizontal stack:\n{hstacked}")

In [None]:
# Example: Splitting arrays

arr = np.arange(12)
print(f"Original: {arr}")

# Split into 3 equal parts
parts = np.split(arr, 3)
print(f"\nSplit into 3:")
for i, part in enumerate(parts):
    print(f"  Part {i+1}: {part}")

# Split at specific indices
parts = np.split(arr, [3, 7])  # Split at index 3 and 7
print(f"\nSplit at indices [3, 7]:")
for i, part in enumerate(parts):
    print(f"  Part {i+1}: {part}")
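
`np.split` with an integer requires the array to divide evenly. When it does not, `np.array_split` is one alternative that allows uneven parts. A minimal sketch with a 10-element array (`split_failed` and `sizes` are illustrative names):

```python
import numpy as np

arr = np.arange(10)

# 10 elements cannot be split into 3 equal parts
try:
    np.split(arr, 3)
    split_failed = False
except ValueError as err:
    split_failed = True
    print(f"np.split failed: {err}")

# array_split distributes the remainder across the leading parts
parts = np.array_split(arr, 3)
sizes = [len(part) for part in parts]
for i, part in enumerate(parts):
    print(f"  Part {i+1}: {part}")
```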

In [None]:
# Example: Adding and removing dimensions

arr = np.array([1, 2, 3])
print(f"Original: {arr}, shape: {arr.shape}")

# Add dimension (useful for broadcasting)
row = arr[np.newaxis, :]  # Add axis at position 0
print(f"\nAs row: {row}, shape: {row.shape}")

col = arr[:, np.newaxis]  # Add axis at position 1
print(f"\nAs column:\n{col}\nShape: {col.shape}")

# Alternative using reshape
col2 = arr.reshape(-1, 1)
print(f"\nColumn via reshape:\n{col2}")
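
The cell above only adds dimensions. The sketch below shows one way to remove them again with `np.squeeze`, and why the extra axis helps with broadcasting (`restored` and `table` are illustrative names):

```python
import numpy as np

arr = np.array([1, 2, 3])

# expand_dims is equivalent to indexing with np.newaxis
col = np.expand_dims(arr, axis=1)
print(f"Column shape: {col.shape}")

# squeeze removes every axis of length 1
restored = np.squeeze(col)
print(f"Squeezed shape: {restored.shape}")

# A (3, 1) column plus a (1, 3) row broadcasts to a (3, 3) table of sums
table = arr[:, np.newaxis] + arr[np.newaxis, :]
print(f"Pairwise sums:\n{table}")
```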

## Practice Exercise 7.1

**Task:** You have sales data as a 1D array representing 4 quarters for 3 years. Reshape it appropriately and calculate:
1. Total sales per year
2. Average sales per quarter (across all years)

```python
# Q1, Q2, Q3, Q4 for each year
sales = np.array([100, 120, 140, 130,    # Year 1
                  110, 130, 150, 140,    # Year 2
                  120, 140, 160, 150])   # Year 3
```

In [None]:
# Your code here


In [None]:
# Solution 7.1

import numpy as np

sales = np.array([100, 120, 140, 130, 110, 130, 150, 140, 120, 140, 160, 150])
print(f"Original data: {sales}")

# Reshape to 3 years x 4 quarters
sales_reshaped = sales.reshape(3, 4)
print(f"\nReshaped (3 years x 4 quarters):\n{sales_reshaped}")

# 1. Total sales per year (sum across columns)
yearly_totals = np.sum(sales_reshaped, axis=1)
print(f"\n1. Total sales per year:")
for i, total in enumerate(yearly_totals, 1):
    print(f"   Year {i}: {total}")

# 2. Average sales per quarter (mean down rows)
quarterly_avg = np.mean(sales_reshaped, axis=0)
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
print(f"\n2. Average sales per quarter:")
for q, avg in zip(quarters, quarterly_avg):
    print(f"   {q}: {avg:.2f}")

---
# Section 8: Random Number Generation
---

Random numbers are essential for:
- **Simulations**: Monte Carlo methods, bootstrapping
- **Data augmentation**: Creating synthetic data
- **Machine learning**: Weight initialization, sampling
- **Testing**: Generating test data

## Syntax

```python
# Set seed for reproducibility
np.random.seed(42)

# Modern approach (recommended)
rng = np.random.default_rng(42)

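# Random values (legacy interface)
np.random.rand(d0, d1, ...)         # Uniform [0, 1); dimensions as separate arguments
np.random.randn(d0, d1, ...)        # Standard normal; dimensions as separate arguments
np.random.randint(low, high, size)  # Integers in [low, high)
np.random.choice(array, size)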

# Distributions
np.random.normal(mean, std, size)
np.random.uniform(low, high, size)
np.random.exponential(scale, size)
```

In [None]:
# Example: Basic random number generation

import numpy as np

# Set seed for reproducibility (same seed = same random numbers)
np.random.seed(42)

# Random floats between 0 and 1
random_floats = np.random.rand(5)
print(f"Random floats [0,1): {random_floats}")

# Random 2D array
random_2d = np.random.rand(3, 4)
print(f"\nRandom 2D (3x4):\n{random_2d}")

In [None]:
# Example: Random integers

np.random.seed(42)

# Random integers from low (inclusive) to high (exclusive)
random_ints = np.random.randint(1, 100, size=10)
print(f"Random integers 1-99: {random_ints}")

# Simulate dice rolls
dice_rolls = np.random.randint(1, 7, size=20)
print(f"\n20 dice rolls: {dice_rolls}")
print(f"Average roll: {np.mean(dice_rolls):.2f}")
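
The exclusive upper bound is a common source of off-by-one errors. With the modern generator, `integers` accepts `endpoint=True` to make the upper bound inclusive. A short sketch of both ways to roll a six-sided die (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Upper bound exclusive (default): pass 7 to include 6
dice_exclusive = rng.integers(1, 7, size=1000)

# Upper bound inclusive: pass 6 together with endpoint=True
dice_inclusive = rng.integers(1, 6, size=1000, endpoint=True)

print(f"Exclusive form range: {dice_exclusive.min()} to {dice_exclusive.max()}")
print(f"Inclusive form range: {dice_inclusive.min()} to {dice_inclusive.max()}")
```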

In [None]:
# Example: Normal (Gaussian) distribution

np.random.seed(42)

# Standard normal (mean=0, std=1)
standard_normal = np.random.randn(5)
print(f"Standard normal: {standard_normal}")

# Custom normal distribution
# Simulate test scores: mean=75, std=10
test_scores = np.random.normal(loc=75, scale=10, size=100)
print(f"\nSimulated test scores:")
print(f"  Mean: {np.mean(test_scores):.2f}")
print(f"  Std: {np.std(test_scores):.2f}")
print(f"  Min: {np.min(test_scores):.2f}")
print(f"  Max: {np.max(test_scores):.2f}")

In [None]:
# Example: Uniform distribution

np.random.seed(42)

# Uniform over [low, high): low is included, high is excluded
uniform_vals = np.random.uniform(low=10, high=50, size=10)
print(f"Uniform 10-50: {uniform_vals.round(2)}")

# Simulate prices between $20 and $100
prices = np.random.uniform(20, 100, size=5)
print(f"\nRandom prices ($): {prices.round(2)}")

In [None]:
# Example: Random choice and sampling

np.random.seed(42)

products = ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones']

# Random choice with equal probability
sample = np.random.choice(products, size=10)
print(f"Random product picks: {sample}")

# Random choice with weighted probability
weights = [0.1, 0.3, 0.25, 0.15, 0.2]  # Must sum to 1
weighted_sample = np.random.choice(products, size=10, p=weights)
print(f"\nWeighted picks: {weighted_sample}")

# Unique selection (no replacement)
unique_sample = np.random.choice(products, size=3, replace=False)
print(f"\nUnique picks: {unique_sample}")
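
With only 10 picks it is hard to see whether the weights have any effect. One way to check is to draw a large sample and compare observed frequencies with the targets. This sketch uses the modern generator; the sample size of 10,000 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(42)

products = ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones']
weights = [0.1, 0.3, 0.25, 0.15, 0.2]

# Draw many weighted picks so the frequencies settle near the weights
picks = rng.choice(products, size=10_000, p=weights)

# Count how often each product appears
names, counts = np.unique(picks, return_counts=True)
freqs = dict(zip(names, counts / picks.size))

for product, weight in zip(products, weights):
    print(f"{product:<10} target={weight:.2f}  observed={freqs[product]:.3f}")
```

The observed frequencies should land close to the targets, with small deviations that shrink as the sample size grows.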

In [None]:
# Example: Shuffling arrays

np.random.seed(42)

arr = np.arange(10)
print(f"Original: {arr}")

# Shuffle in place
np.random.shuffle(arr)
print(f"Shuffled: {arr}")

# For permutation without modifying original
arr2 = np.arange(10)
permuted = np.random.permutation(arr2)
print(f"\nOriginal preserved: {arr2}")
print(f"Permutation: {permuted}")

In [None]:
# Example: Modern random generator (recommended approach)

# Create a random generator with a seed
rng = np.random.default_rng(42)

# Similar methods, some renamed (rand -> random, randint -> integers),
# with state held in the generator object instead of a global seed
print(f"Random floats: {rng.random(5)}")
print(f"Random integers: {rng.integers(1, 100, size=5)}")
print(f"Normal: {rng.normal(50, 10, size=5).round(2)}")
print(f"Choice: {rng.choice(['A', 'B', 'C'], size=5)}")
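
Because each generator carries its own state, two generators created with the same seed produce the same stream independently of each other. The sketch below checks that, and shows the generator counterparts of a few legacy calls used earlier in this section (the seed values are arbitrary):

```python
import numpy as np

# Two independent generators with the same seed yield identical values
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)
same_stream = np.array_equal(rng_a.random(5), rng_b.random(5))
print(f"Same seed, same values: {same_stream}")

rng = np.random.default_rng(42)

# Counterpart of np.random.randn(5)
normal_vals = rng.standard_normal(5)

# Counterpart of np.random.uniform(10, 50, size=5)
uniform_vals = rng.uniform(10, 50, size=5)

# Counterparts of np.random.permutation and np.random.shuffle
arr = np.arange(10)
permuted = rng.permutation(arr)  # returns a shuffled copy
rng.shuffle(arr)                 # shuffles arr in place

print(f"Permuted copy: {permuted}")
print(f"Shuffled in place: {arr}")
```

Passing a generator into your own functions as an argument keeps results reproducible without relying on global state.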

In [None]:
# Example: Practical - simulating sales data

np.random.seed(42)

# Simulate 30 days of sales
days = 30

# Base sales + random variation
base_sales = 1000
daily_variation = np.random.normal(0, 100, size=days)  # Mean 0, std 100

# Weekend boost (indices 5, 6, 12, 13, etc., treating day 0 as a Monday)
weekend_boost = np.zeros(days)
for i in range(days):
    if i % 7 in [5, 6]:  # Saturday, Sunday
        weekend_boost[i] = np.random.uniform(100, 200)

sales = base_sales + daily_variation + weekend_boost
sales = np.maximum(sales, 0)  # Ensure no negative sales

print(f"Simulated 30-day sales:")
print(f"  Mean: ${np.mean(sales):.2f}")
print(f"  Std: ${np.std(sales):.2f}")
print(f"  Min: ${np.min(sales):.2f}")
print(f"  Max: ${np.max(sales):.2f}")
print(f"  Total: ${np.sum(sales):.2f}")
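
The weekend loop above can also be written without an explicit loop. The sketch below is one vectorized variant using a boolean mask and `np.where`. It uses the modern generator, so the numbers will differ from the cell above, and `is_weekend` is an illustrative name:

```python
import numpy as np

rng = np.random.default_rng(42)

days = 30
base_sales = 1000

# Boolean mask: True where the day index falls on Saturday (5) or Sunday (6)
day_index = np.arange(days)
is_weekend = np.isin(day_index % 7, [5, 6])

daily_variation = rng.normal(0, 100, size=days)

# Draw a boost for every day, then keep it only on weekends
weekend_boost = np.where(is_weekend, rng.uniform(100, 200, size=days), 0.0)

sales = np.maximum(base_sales + daily_variation + weekend_boost, 0)

print(f"Weekend days: {is_weekend.sum()}")
print(f"Mean weekday sales: ${sales[~is_weekend].mean():.2f}")
print(f"Mean weekend sales: ${sales[is_weekend].mean():.2f}")
```

Drawing a boost for all 30 days and discarding the weekday values wastes a few random numbers, but it keeps the code loop-free and easy to read.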

## Practice Exercise 8.1

**Task:** Create a simulation of a simple coin flip experiment:
1. Simulate 1000 coin flips (0 = tails, 1 = heads)
2. Count the number of heads and tails
3. Calculate the percentage of heads
4. Simulate 10 experiments of 1000 flips each and show the variation in head percentages

Use seed 42 for reproducibility.

In [None]:
# Your code here


In [None]:
# Solution 8.1

import numpy as np

np.random.seed(42)

# 1. Simulate 1000 coin flips
flips = np.random.randint(0, 2, size=1000)
print(f"1. First 20 flips: {flips[:20]}")

# 2. Count heads and tails
heads = np.sum(flips == 1)
tails = np.sum(flips == 0)
print(f"\n2. Heads: {heads}, Tails: {tails}")

# 3. Percentage of heads
head_percentage = heads / len(flips) * 100
print(f"\n3. Head percentage: {head_percentage:.2f}%")

# 4. Multiple experiments
# Note: re-seeding restarts the sequence, so experiment 1 repeats the flips from step 1
np.random.seed(42)
experiments = 10
flips_per_experiment = 1000

head_percentages = []
for i in range(experiments):
    flips = np.random.randint(0, 2, size=flips_per_experiment)
    pct = np.sum(flips) / flips_per_experiment * 100
    head_percentages.append(pct)

head_percentages = np.array(head_percentages)
print(f"\n4. Head percentages across 10 experiments:")
print(f"   Values: {head_percentages.round(2)}")
print(f"   Mean: {np.mean(head_percentages):.2f}%")
print(f"   Std: {np.std(head_percentages):.2f}%")
print(f"   Range: {np.min(head_percentages):.2f}% - {np.max(head_percentages):.2f}%")
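
Step 4 can also be done without a loop by generating all experiments at once as a 2D array, one row per experiment. A minimal sketch using the modern generator, so the values will not match the solution above:

```python
import numpy as np

rng = np.random.default_rng(42)

experiments = 10
flips_per_experiment = 1000

# One row per experiment, one column per flip (0 = tails, 1 = heads)
flips = rng.integers(0, 2, size=(experiments, flips_per_experiment))

# The mean of 0/1 values along each row is the fraction of heads
head_percentages = flips.mean(axis=1) * 100

print(f"Values: {head_percentages.round(2)}")
print(f"Mean: {head_percentages.mean():.2f}%")
print(f"Std: {head_percentages.std():.2f}%")
```

This applies the same `axis` idea as Solution 7.1: `axis=1` reduces across the columns of each row.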

---
# Module Summary

## Key Takeaways

1. **NumPy arrays** are faster and more memory-efficient than Python lists for numerical operations
2. **Array creation** can be done via `np.array()`, `np.zeros()`, `np.ones()`, `np.arange()`, `np.linspace()`
3. **Indexing and slicing** work similarly to lists, with powerful additions like boolean and fancy indexing
4. **Vectorized operations** replace explicit Python loops and are typically much faster
5. **Broadcasting** allows operations between arrays of different shapes
6. **Statistical functions** like `mean()`, `std()`, `percentile()` are essential for data analysis
7. **Reshaping** with `reshape()`, `flatten()`, and stacking functions helps organize data
8. **Random generation** is crucial for simulations, testing, and machine learning

## Essential Functions

```python
# Creating arrays
np.array(), np.zeros(), np.ones(), np.arange(), np.linspace()

# Array info
arr.shape, arr.dtype, arr.ndim, arr.size

# Math operations
np.sum(), np.mean(), np.std(), np.min(), np.max()

# Reshaping
arr.reshape(), arr.flatten(), arr.T, np.concatenate()

# Random
np.random.default_rng(), np.random.rand(), np.random.randn(), np.random.randint()
```

## Next Module

In the next module, we'll cover **Pandas for Data Analysis** - the most important library for data manipulation in Python. You'll learn to work with DataFrames, which build on NumPy arrays to provide labeled, tabular data structures.

## Additional Practice

For extra practice, try these challenges:

1. **Portfolio Analysis**: Create a 2D array of stock returns for 5 stocks over 12 months. Calculate monthly and annual returns, correlations between stocks, and identify the best/worst performing stock.

2. **Monte Carlo Simulation**: Simulate 10,000 random walks of stock prices starting at $100, with daily returns following a normal distribution (mean=0.001, std=0.02). Calculate the probability of the stock being above $150 after 252 trading days.

3. **Data Normalization**: Create a function that takes a 2D array and normalizes each column to have mean=0 and std=1 (z-score normalization). Test it on random data.