# Day 1 — NumPy Basics (Foundations, Phase 1)

**Goals:**  
- Understand deeply what NumPy arrays are and why they are important  
- Learn arrays, dtypes, shapes in detail  
- Indexing & slicing (1D/2D) with full explanation  
- Reshape, view vs copy with memory model concepts  
- Broadcasting with formal rules and math interpretation  
- Vectorization vs loops with performance explanation  
- Descriptive statistics & linear algebra basics with math meaning  
- Random numbers with reproducibility and practical use cases  
- Mini task with a CSV file to simulate a dataset  

> Estimated time: ~2–3 hours (reading + coding + exercises).


## 0) Environment & Version

Before starting, always check your library versions. This ensures consistency when sharing code.

- `import numpy as np` : imports NumPy with alias `np`.  
- `np.__version__` : shows which version is installed.  


In [None]:
import numpy as np
print('NumPy version:', np.__version__)

## 1) Arrays, dtypes, and shapes

A **NumPy array** (ndarray) is the core data structure.

### Key properties:
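- `shape`: tuple of dimension sizes. `(5,)` means a vector of 5 elements; `(2,3)` means 2 rows and 3 columns.  
- `ndim`: number of dimensions. 1D = vector, 2D = matrix, 3D+ = tensor.  
- `dtype`: type of the elements (int, float, complex, …). All elements of one array share the same dtype.  
- A newly created array stores its elements in one **contiguous block of memory**, whereas a Python list stores references to separate objects. This compact layout is a main reason array operations are fast.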

### Why is this important?
Machine Learning uses vectors, matrices, and tensors to represent data (features, weights, activations). NumPy arrays map directly to these concepts.
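
As a small illustration of the key properties listed above, the sketch below inspects an array's metadata. The variable names and the explicit `dtype` choices are assumptions made for this example, not part of the original notebook.

```python
import numpy as np

# Explicit dtypes keep the byte counts below platform-independent.
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(x.shape)     # (2, 3): 2 rows, 3 columns
print(x.ndim)      # 2
print(x.dtype)     # int32
print(x.itemsize)  # 4 bytes per element
print(x.nbytes)    # 24 = 6 elements * 4 bytes
print(x.flags['C_CONTIGUOUS'])  # True: rows are laid out one after another

# Mixing ints and floats upcasts the whole array to a single float dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# astype returns a converted copy; the original array is unchanged.
as_float = x.astype(np.float64)
print(as_float.dtype, x.dtype)  # float64 int32
```

Because every element has the same size, NumPy can locate any element by arithmetic on the index instead of following per-object references.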


In [None]:
a = np.array([1,2,3,4,5])
print('a:', a, '| shape:', a.shape, '| ndim:', a.ndim, '| dtype:', a.dtype)

b = np.array([[1,2,3],[4,5,6]])
print('b:\n', b, '\nshape:', b.shape, '| ndim:', b.ndim, '| dtype:', b.dtype)

zeros = np.zeros((2,3))
ones = np.ones((2,3))
rng = np.arange(0,10,2)   # start, stop, step
lin = np.linspace(0,1,5)  # evenly spaced
zeros, ones, rng, lin

## 2) Indexing & Slicing

Indexing extracts single elements; slicing extracts sub-arrays. Basic slices are views of the original data, not copies.  

### Rules:
- 1D arrays: like Python lists (`a[2]`).  
- Slices: `a[start:stop:step]`. Stop is **exclusive**.  
- 2D arrays: `a[row, col]`.  
- `:` means "all elements along that dimension".

**Math interpretation:**  
For a matrix $A$, `A[i, j]` corresponds to element $a_{ij}$. Note that NumPy indices start at 0, while math notation usually starts at 1.
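
The rules above extend to negative indices and to slicing both axes at once. The following is a minimal sketch; the array `grid` is a hypothetical example chosen so the results are easy to check by eye.

```python
import numpy as np

c = np.arange(10)                   # [0 1 2 ... 9]
grid = np.arange(12).reshape(3, 4)
# grid:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(c[-1])          # 9: negative indices count from the end
print(c[::-1])        # a reversed view of c (no data copied)
print(grid[1, 2])     # 6: row 1, column 2 (zero-based)
print(grid[0])        # [0 1 2 3]: a single index selects a whole row
print(grid[:2, 1:3])  # rows 0-1 and columns 1-2: [[1 2], [5 6]]
print(grid[:, -1])    # [ 3  7 11]: the last column
```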


In [None]:
c = np.arange(10)
print('c:', c)
print('c[2]:', c[2])
print('c[2:7]:', c[2:7])
print('c[:5]:', c[:5])
print('c[::2]:', c[::2])

print('b[0,1]:', b[0,1])  # row 0, col 1
print('b[:,1]:', b[:,1])  # all rows, col 1

## 3) Reshape, Views, and Copies

`reshape` changes how data is *viewed*, not the data itself. It returns a view whenever the memory layout allows and falls back to a copy otherwise.

- **View**: shares the same memory → fast, but risky. Changing one changes the other.  
- **Copy**: separate memory. Use `.copy()` when you need independence.

**Memory model:**  
A NumPy array is stored as one contiguous block. `reshape` just reinterprets the same block with new dimensions, so the total number of elements must stay the same.
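
One way to check whether two arrays share a buffer is `np.shares_memory`. The sketch below uses it to show a reshape that is a view and one that is a copy; the transposed array is an illustrative case picked for this example.

```python
import numpy as np

d = np.arange(12)
m = d.reshape(3, 4)

# A reshape of a contiguous array is a view: both names use the same buffer.
print(np.shares_memory(d, m))     # True
print(m.base is d)                # True: m was derived from d

# Transposing is also a view, but it changes the memory traversal order.
t = m.T
print(np.shares_memory(m, t))     # True

# Flattening the transposed array cannot be expressed as a view of the
# original buffer, so reshape returns a copy here.
flat = t.reshape(-1)
print(np.shares_memory(m, flat))  # False

flat[0] = 999
print(d[0])                       # 0: the original data is untouched
```

When independence matters, calling `.copy()` explicitly is safer than relying on whether a given reshape happens to copy.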


In [None]:
d = np.arange(12)
m = d.reshape(3,4)
print('m shape:', m.shape, "\n")
print('m :\n', m, "\n")

m[0,0] = 999
print('d[0] after modifying m:', d[0], "\n")

m_copy = m.copy()
m_copy[0,0] = -1
print('m[0,0]:', m[0,0], '| m_copy[0,0]:', m_copy[0,0], "\n")

print('m :\n', m, "\n")
print('m_copy :\n', m_copy)

In [None]:
m = d.reshape(4,3)
print('m shape:', m.shape, "\n")
print('m :\n', m, "\n")

In [None]:

m = d.reshape(1,12)
print('m shape:', m.shape, "\n")
print('m :\n', m, "\n")

## 4) Broadcasting

Broadcasting = automatic expansion to make shapes compatible.  

### Rules:
1. Align shapes from the rightmost dimension; missing leading dimensions count as 1.  
2. Two dimensions are compatible if they are equal or one of them is 1.  
3. Otherwise → error (`ValueError`).

**Example:**  
Matrix (3×3) + vector (3,) → NumPy treats the vector as shape (1×3), then repeats it along the rows without actually copying it.

**Mathematical view:**  
Equivalent to adding the same vector to each row of the matrix.
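
The rules can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available, and also shows a shape pair that violates the rules.

```python
import numpy as np

X = np.ones((3, 3))
row = np.array([1, 2, 3])   # shape (3,), treated as (1, 3)
col = row.reshape(3, 1)     # shape (3, 1)

# Ask NumPy for the result shape only.
print(np.broadcast_shapes(X.shape, row.shape))    # (3, 3)
print(np.broadcast_shapes(col.shape, row.shape))  # (3, 3)

print(X + row)  # adds [1, 2, 3] to every row
print(X + col)  # adds 1 to row 0, 2 to row 1, 3 to row 2

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails.
try:
    X + np.ones(2)
except ValueError as err:
    print('broadcast error:', err)
```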


In [None]:
X = np.ones((3,3))
v = np.array([1,2,3])

print('X:\n', X, "\n")
print('v:\n', v, "\n")

print('X Shape:\n', X.shape, "\n")
print('v Shape:\n', v.shape, "\n")

print('X + v:\n', X + v, "\n")

**More examples for practice:**  

In [None]:
A = np.ones((3,3))
B = np.array([[1,2,3]])

print('A:\n', A, "\n")
print('B:\n', B, "\n")
print('A Shape:\n', A.shape)
print('B Shape:\n', B.shape, "\n")

A1 = np.ones((3,3))
B1 = np.array([1,2,3]).reshape(3,1)

print('A1:\n', A1, "\n")
print('B1:\n', B1, "\n")
print('A1 Shape:\n', A1.shape)
print('B1 Shape:\n', B1.shape, "\n")

A2 = np.ones((3,3))
B2 = np.array([1,2,3]).reshape(1,3)

print('A2:\n', A2, "\n")
print('B2:\n', B2, "\n")
print('A2 Shape:\n', A2.shape)
print('B2 Shape:\n', B2.shape, "\n")

In [79]:
A = np.arange(6).reshape(2,3)
B = np.arange(2).reshape(2,1)

print('A:\n', A, "\n")
print('B:\n', B, "\n")

print('A Shape:\n', A.shape, "\n")
print('B Shape:\n', B.shape, "\n")

print('A + B:\n', A + B, "\n")

A:
 [[0 1 2]
 [3 4 5]] 

B:
 [[0]
 [1]] 

A Shape:
 (2, 3) 

B Shape:
 (2, 1) 

A + B:
 [[0 1 2]
 [4 5 6]] 



## 5) Vectorization vs Loops

Vectorization = apply an operation to whole arrays at once.  
Loops = process one element at a time.  

**Performance reason:**  
- Vectorized operations run in compiled C loops, which can use SIMD instructions.  
- Python loops are interpreted element by element, which is much slower.

**Math example:**  
Compute $y = x^2 + 2x + 1$ for 100,000 values.
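
To see the difference on your own machine, the two approaches can be timed with the standard-library `timeit` module. This is a rough sketch: the function names are hypothetical, and the measured times depend on hardware and NumPy version, so no specific speedup is claimed.

```python
import timeit
import numpy as np

x = np.arange(100_000, dtype=np.int64)  # int64 so x*x cannot overflow

def poly_vec(arr):
    # One expression over the whole array; the loops run in compiled code.
    return arr * arr + 2 * arr + 1

def poly_loop(arr):
    # Same formula, evaluated one element at a time in Python.
    return np.array([t * t + 2 * t + 1 for t in arr])

# Both versions must agree before timing them is meaningful.
assert np.array_equal(poly_vec(x), poly_loop(x))

t_vec = timeit.timeit(lambda: poly_vec(x), number=5)
t_loop = timeit.timeit(lambda: poly_loop(x), number=5)
print(f'vectorized: {t_vec:.4f} s | loop: {t_loop:.4f} s')
```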


In [81]:
x = np.arange(100_000, dtype=np.int64)  # explicit 64-bit ints so x*x cannot overflow
print('x:', x, "\n")
y_vec = x*x + 2*x + 1

def poly_slow(xx):
    out = []
    for t in xx:
        out.append(t*t + 2*t + 1)
    return np.array(out)

y_slow = poly_slow(x)
print('Equal?', np.allclose(y_vec, y_slow))

x: [    0     1     2 ... 99997 99998 99999] 

Equal? True


## 6) Descriptive Statistics

Statistics summarize data.  

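- **mean**: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$  
- **std**: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)^2}$ (population form, the default of `np.std`)  
- **percentile**: the value below which a given percentage of the data falls.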


**Usage in ML:** normalization, feature scaling, evaluation metrics.
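
As a sketch of how these statistics feed into normalization, the example below compares the population and sample standard deviation and applies z-score scaling. The 2D array `data` is a hypothetical feature matrix invented for this illustration.

```python
import numpy as np

arr = np.array([3, 7, 2, 9, 5, 10, 4], dtype=np.float64)

# ddof=0 (default) divides by n; ddof=1 divides by n-1 (sample estimate).
print(arr.std())        # population standard deviation
print(arr.std(ddof=1))  # sample standard deviation, slightly larger

# Z-score normalization: subtract the mean, divide by the std.
z = (arr - arr.mean()) / arr.std()
print(z.mean(), z.std())  # approximately 0 and 1

# For 2D data, axis=0 gives one statistic per column (feature),
# axis=1 gives one per row (sample).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
print(data.mean(axis=0))  # column means: 2.0 and 20.0
print(data.mean(axis=1))  # row means: 5.5, 11.0 and 16.5
```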


In [92]:
arr = np.array([3,7,2,9,5,10,4])
print('mean:', arr.mean(), '| std:', arr.std(), '| min:', arr.min(), '| max:', arr.max())
print('argmin:', arr.argmin(), '| argmax:', arr.argmax())
print('percentiles:', np.percentile(arr, [25,50,75]))

mean: 5.714285714285714 | std: 2.8139593719417437 | min: 2 | max: 10
argmin: 2 | argmax: 5
percentiles: [3.5 5.  8. ]


## 7) Linear Algebra

Linear Algebra = backbone of ML.  

- **Matrix-vector multiplication**: `M @ v`  
- **Determinant**: factor by which the transformation scales area (or volume)  
- **Inverse**: matrix that undoes the transformation; it exists only when the determinant is nonzero

**ML link:** weights × inputs, solving linear systems, optimization.
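
For the "solving linear systems" case, `np.linalg.solve` is commonly preferred over computing an explicit inverse. The sketch below reuses the same diagonal matrix as this section; the right-hand side `b` is an assumed example value.

```python
import numpy as np

M = np.array([[2.0, 0.0],
              [0.0, 3.0]])
b = np.array([10.0, 3.0])

# Solve M @ x = b directly, without forming inv(M).
x = np.linalg.solve(M, b)
print(x)  # [5. 1.]

# The inverse gives the same answer here, and M @ inv(M) is the identity.
M_inv = np.linalg.inv(M)
print(np.allclose(M_inv @ b, x))          # True
print(np.allclose(M @ M_inv, np.eye(2)))  # True

# det(M) = 2 * 3 = 6 for this diagonal matrix.
print(np.isclose(np.linalg.det(M), 6.0))  # True
```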


In [None]:
M = np.array([[2,0],[0,3]])
v = np.array([5,1])

print('M @ v =', M @ v)
print('det(M) =', np.linalg.det(M))
print('inv(M) =', np.linalg.inv(M))

## 8) Random Numbers & Reproducibility

Randomness is essential in ML (initial weights, shuffling, dropout).  

- `default_rng(seed)`: creates a new random `Generator`.  
- `normal`: samples from a Gaussian (normal) distribution.  
- `integers`: uniform random integers; the upper bound is exclusive by default.  
- **Seed** ensures reproducibility → critical for experiments.
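
The reproducibility claim can be verified directly: two generators created from the same seed yield the same numbers. The sketch below also shows `permutation`, a `Generator` method not used elsewhere in this notebook, as one assumed way to shuffle dataset indices.

```python
import numpy as np

# Two generators built from the same seed produce identical streams.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
same = np.array_equal(rng_a.normal(0, 1, 5), rng_b.normal(0, 1, 5))
print(same)  # True

# integers(low, high) excludes high by default, like range().
rolls = rng_a.integers(1, 7, size=1000)  # simulated six-sided die
print(rolls.min(), rolls.max())

# permutation returns the indices 0..9 in random order, which is useful
# for shuffling a dataset before a train/test split.
idx = rng_a.permutation(10)
print(idx)
```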


In [None]:
rng = np.random.default_rng(seed=42)
print('Normal samples:', rng.normal(0,1,5))
print('Integer samples:', rng.integers(0,10,5))

## 9) Mini Task: CSV

We simulate a dataset (tiny CSV).  

Steps:  
1. Create CSV with (id, value).  
2. Load with `np.genfromtxt`, using the header row as field names (`names=True`).  
3. Compute mean & sum.

This is a small-scale dataset handling task, similar to real ML preprocessing.
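
Real datasets often contain gaps, and `genfromtxt` reads an empty numeric field as `nan`. The sketch below uses a hypothetical in-memory CSV (via `io.StringIO`, so no file is written) to show the structured array that `names=True` produces and how a missing value affects the mean.

```python
import io
import numpy as np

# Hypothetical CSV text; the empty field in the second data row is missing.
text = "id,value\n1,10\n2,\n3,35\n4,50\n"

data = np.genfromtxt(io.StringIO(text), delimiter=',', names=True)

print(data.dtype.names)  # ('id', 'value'): one field per header column
print(data['value'])     # the missing entry is read as nan

vals = data['value']
print(np.mean(vals))     # nan: a single missing value propagates
print(np.nanmean(vals))  # mean of the three values that are present
```

Whether to drop, ignore, or fill missing entries depends on the task; `np.nanmean` is only one of several options.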


In [None]:
import os, csv

os.makedirs('data', exist_ok=True)
csv_path = os.path.join('data','toy.csv')
with open(csv_path,'w',newline='') as f:
    w = csv.writer(f)
    w.writerow(['id','value'])
    w.writerows([[1,10],[2,20],[3,35],[4,50]])

data = np.genfromtxt(csv_path, delimiter=',', names=True)
vals = data['value']
print('values:', vals)
print('mean:', np.mean(vals), '| sum:', np.sum(vals))

# ✅ Summary

- Arrays = vectors/matrices/tensors in math  
- Shapes/dtypes = structure of data  
- Indexing/slicing = precise selection of elements  
- Reshape = flexible views, copy = safe independence  
- Broadcasting = stretching smaller arrays to match larger shapes without copying data  
- Vectorization = speed with array operations  
- Statistics + Linear Algebra = ML math foundation  
- Random numbers = seeded generators make experiments reproducible  
- CSV = step into dataset handling
