# NumPy Foundations: Arrays, Indexing, Reshaping, UFuncs, Broadcasting

This notebook is a **lecture + hands-on tutorial** on core NumPy concepts:

- NumPy arrays  
- Array attributes  
- Built-in array creation functions  
- Indexing & slicing  
- Reshaping arrays  
- Universal functions (ufuncs) + `%%timeit` performance comparisons  
- Broadcasting

> Tip: Run cells top-to-bottom. Try the “Exercises” as you go.


## Setup

In [15]:
import numpy as np
import math

# For reproducibility in examples
rng = np.random.default_rng(42)

# Set print options: 3 decimal places, no scientific notation for small values
np.set_printoptions(precision=3, suppress=True)


## 1) NumPy arrays

A **NumPy array** (`ndarray`) is a fast, usually contiguous block of memory with:

- a single **dtype** (all elements share one type)
- a fixed **shape**
- efficient vectorized operations
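
To see what a single dtype means in practice, the short sketch below builds an array from mixed Python numbers and then writes a float into an integer array. The variable names are illustrative only.

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts everything to one floating-point dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)          # float64

# Assigning a float into an integer array truncates it to fit the dtype
ints = np.array([1, 2, 3])
ints[0] = 9.7
print(ints[0])              # 9
```

The silent truncation in the second half is a common source of bugs, so it is worth checking `dtype` before assigning into an existing array.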


In [16]:
# From a Python list
a = np.array([1, 2, 3, 4])
a


In [17]:
# 2D array
b = np.array([[1, 2, 3],
              [4, 5, 6]])
b


### Why use NumPy arrays instead of Python lists?

- Arithmetic operators on lists are not elementwise: `*` repeats the list and `+` concatenates
- NumPy arrays enable **vectorization** (fast C loops under the hood)


In [18]:
py_list = [1, 2, 3, 4]

# This repeats the list; it does not multiply elementwise
py_list * 2


In [19]:
# NumPy does elementwise multiplication
a * 2


## 2) Array attributes

Common attributes you'll use constantly:

- `ndim`: number of dimensions  
- `shape`: size along each dimension  
- `size`: total number of elements  
- `dtype`: data type  
- `itemsize`: bytes per element  
- `nbytes`: total bytes consumed by the elements
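
These attributes are related to each other: `size` is the product of `shape`, and `nbytes` is `size * itemsize`. A small sketch on a hypothetical 3×4 `float64` array:

```python
import numpy as np

arr = np.zeros((3, 4), dtype=np.float64)

print(arr.ndim)      # 2
print(arr.shape)     # (3, 4)
print(arr.size)      # 12 = 3 * 4
print(arr.itemsize)  # 8 bytes per float64
print(arr.nbytes)    # 96 = size * itemsize
```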


In [20]:
x = rng.integers(0, 10, size=(3, 4))
x


In [21]:
x.ndim, x.shape, x.size


In [22]:
x.dtype, x.itemsize, x.nbytes


### Dtypes (quick taste)

NumPy arrays are homogeneous. You can choose a dtype explicitly:


In [23]:
floats = np.array([1, 2, 3], dtype=np.float64)
ints   = np.array([1, 2, 3], dtype=np.int32)

floats, floats.dtype, ints, ints.dtype
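
You can also convert an existing array with `.astype`, which returns a new array. The sketch below uses made-up values to show that converting floats to integers truncates toward zero and that a smaller float type uses less memory per element.

```python
import numpy as np

values = np.array([1.9, 2.5, -3.7])

as_int = values.astype(np.int32)     # truncates toward zero; returns a new array
as_f32 = values.astype(np.float32)   # 4 bytes per element instead of 8

print(as_int)                            # [ 1  2 -3]
print(values.itemsize, as_f32.itemsize)  # 8 4
```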


## 3) Built-in functions for array creation

NumPy provides many constructors that are more concise and usually faster than building Python lists manually and converting them.

### Common ones
- `np.zeros`, `np.ones`, `np.full`
- `np.arange` (like `range`, but returns an array)
- `np.linspace` (evenly spaced values in an interval)
- `np.eye` (identity matrix)
- Random arrays via a `Generator` such as `rng` (`np.random.default_rng` is recommended over the legacy `np.random.*` functions)
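
Two details of these constructors are easy to miss: `np.linspace` includes the endpoint unless told otherwise, and the `*_like` variants copy the shape and dtype of an existing array. A minimal sketch (the name `template` is illustrative):

```python
import numpy as np

# linspace: you choose the number of points; the endpoint is included by default
with_end = np.linspace(0.0, 1.0, 5)                     # 0, 0.25, 0.5, 0.75, 1
# endpoint=False drops the stop value, giving a half-open range like arange
without_end = np.linspace(0.0, 1.0, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8
print(with_end)
print(without_end)

# *_like constructors reuse the shape and dtype of an existing array
template = np.ones((2, 3), dtype=np.int32)
z = np.zeros_like(template)
print(z.shape, z.dtype)                                 # (2, 3) int32
```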


In [24]:
np.zeros((2, 3))


In [25]:
np.ones((2, 3))


In [26]:
np.full((2, 3), fill_value=7)


In [27]:
np.arange(0, 10, 2)   # start, stop, step


In [28]:
np.linspace(0, 1, 6)  # start, stop, num points (inclusive endpoints)


In [29]:
np.eye(4)            # 4x4 identity


In [30]:
# Random examples
rng.random((2, 3))           # uniform in [0, 1)


In [31]:
rng.normal(loc=0.0, scale=1.0, size=(2, 3))  # Gaussian


### Exercise 1

Create a 1D array of the integers 10 through 29 (inclusive).  
Then create a 3×4 array filled with the value -1.

*(Try it before revealing the solution.)*


In [32]:
# --- Your turn ---
# a1 = ...
# a2 = ...
# a1, a2


In [33]:
# Solution
a1 = np.arange(10, 30)          # stop is exclusive
a2 = np.full((3, 4), -1)
a1, a2


## 4) Array indexing

### 1D indexing
- Python-style 0-based indexing
- supports negative indices
- slicing uses `start:stop:step`
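
A few slice forms beyond the basics, sketched on the same kind of 1D array. Negative steps walk backwards, and out-of-range slice bounds are clipped instead of raising an error.

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50])

print(v[-2:])     # last two elements: [40 50]
print(v[::-1])    # reversed: [50 40 30 20 10]
print(v[3:0:-1])  # indices 3, 2, 1: [40 30 20]
print(v[2:100])   # stop is clipped to the array length: [30 40 50]
```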


In [34]:
v = np.array([10, 20, 30, 40, 50])
v[0], v[-1]


In [35]:
v[1:4]   # elements at indices 1,2,3


In [36]:
v[::2]   # every other element


### 2D indexing

Use `[row, col]`. Slices apply per dimension.


In [37]:
M = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12]])
M


In [38]:
M[0, 2]   # row 0, col 2


In [39]:
M[1, :]   # entire row 1


In [40]:
M[:, 1]   # entire column 1


In [41]:
M[0:2, 1:3]  # rows 0-1 and cols 1-2 (stop is exclusive)


### Boolean indexing

Create a boolean mask and index with it to filter values.


In [42]:
mask = M % 2 == 0   # True where even
mask


In [43]:
M[mask]   # 1D result containing selected elements
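
Masks can be combined and can also be used for assignment. The sketch below redefines `M` so it runs on its own. Note that the elementwise operators are `&`, `|`, and `~`, and each comparison needs its own parentheses.

```python
import numpy as np

M = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12]])

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
even_and_big = M[(M % 2 == 0) & (M > 6)]
print(even_and_big)            # [ 8 10 12]

# Assigning through a mask modifies the array in place
clipped = M.copy()
clipped[clipped > 10] = 10
print(clipped[2])              # [ 9 10 10 10]
```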


### Fancy indexing (index arrays)

Index with arrays/lists of indices (powerful, but it always copies the selected data).


In [44]:
M[[0, 2], [1, 3]]  # picks (0,1) and (2,3)
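
Paired index lists select individual elements, not a rectangular block. If you want every combination of the listed rows and columns, one option is `np.ix_`. A minimal sketch with the same matrix:

```python
import numpy as np

M = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12]])

pairs = M[[0, 2], [1, 3]]          # elements (0,1) and (2,3)
block = M[np.ix_([0, 2], [1, 3])]  # rows 0 and 2 crossed with cols 1 and 3

print(pairs)   # [ 2 12]
print(block)   # 2x2 block: [[ 2  4], [10 12]]
```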


### Views vs copies (important!)

- **Basic slicing** returns a **view** (shares memory with the original)
- **Fancy indexing / boolean indexing** returns a **copy**
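
If you are unsure whether two arrays share memory, `np.shares_memory` can check directly. The sketch below also shows `.copy()` as a way to get an independent slice; the variable names are illustrative.

```python
import numpy as np

A = np.arange(10)

sliced = A[2:6]            # basic slice
fancy = A[[2, 3, 4, 5]]    # fancy indexing
masked = A[A > 5]          # boolean indexing

print(np.shares_memory(A, sliced))  # True: view
print(np.shares_memory(A, fancy))   # False: copy
print(np.shares_memory(A, masked))  # False: copy

# An explicit .copy() gives a slice that is safe to modify independently
safe = A[2:6].copy()
safe[:] = -1
print(A[2:6])                       # still [2 3 4 5]
```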


In [45]:
A = np.arange(10)
slice_view = A[2:6]
slice_view[:] = -99

# See how the original array also changes.
print(A)
print(slice_view)


In [46]:
A = np.arange(10)
fancy_copy = A[[2, 3, 4, 5]]
fancy_copy[:] = -99

# See how the original array stays intact
print(A)
print(fancy_copy)


### Exercise 2

Given `M` (3×4 matrix above):

1. Extract the last column.  
2. Extract the submatrix consisting of rows 1..2 and columns 0..1 (a 2×2 block).  
3. Select all values greater than 6.

*(Try it, then check the solution.)*


In [47]:
# --- Your turn ---
# col_last = ...
# block = ...
# gt6 = ...
# col_last, block, gt6


In [48]:
# Solution
col_last = M[:, -1]
block = M[1:3, 0:2]
gt6 = M[M > 6]
col_last, block, gt6


## 5) Reshaping arrays

Reshaping changes how the same data is interpreted across dimensions.

Key tools:
- `.reshape(newshape)` (returns a view when possible, otherwise a copy)
- `.ravel()` (flatten; returns a view when possible, otherwise a copy)
- `.flatten()` (flatten, always a copy)
- `.transpose()` / `.T` (swap axes)
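
Whether these tools return a view or a copy depends on the memory layout. The sketch below checks each case with `np.shares_memory` on a small example array; the transposed case shows when `ravel` has to copy.

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)

print(np.shares_memory(x2, x2.ravel()))    # True: contiguous, so ravel is a view
print(np.shares_memory(x2, x2.flatten()))  # False: flatten always copies
print(np.shares_memory(x2, x2.T))          # True: transpose is a view

# The transposed array is not C-contiguous, so ravel has to copy here
print(np.shares_memory(x2, x2.T.ravel()))  # False
```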


In [49]:
x = np.arange(12)
x


In [50]:
x2 = x.reshape(3, 4)
x2


In [51]:
# Use -1 to infer one dimension automatically
x3 = x.reshape(2, -1)
x3


In [52]:
x2.ravel()     # a view here, because x2 is contiguous


In [53]:
x2.flatten()   # always a copy


In [54]:
x2.T           # transpose: (3,4) -> (4,3)


### Reshaping rules

- Total number of elements must stay the same.
- `reshape` reads and fills elements in row-major (C-style) order by default (`order='C'`).
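
The effect of the element order is easiest to see side by side. A minimal sketch comparing the default with column-major (`order='F'`) filling:

```python
import numpy as np

x = np.arange(6)

c_order = x.reshape(2, 3)             # fills row by row (default, order='C')
f_order = x.reshape(2, 3, order='F')  # fills column by column

print(c_order)  # rows: [0 1 2] and [3 4 5]
print(f_order)  # rows: [0 2 4] and [1 3 5]
```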


In [55]:
# This will fail because 12 elements can't become a 5x3 (15 elements)
try:
    x.reshape(5, 3)
except ValueError as e:
    print("ValueError:", e)


### Exercise 3

Create a 1D array with numbers 1..24 and reshape it into a 3D array of shape (2, 3, 4).  
Then transpose it to swap the last two axes.

*(Try it, then check the solution.)*


In [56]:
# --- Your turn ---
# t = ...
# t3 = ...
# t_swapped = ...
# t3.shape, t_swapped.shape


In [57]:
# Solution
t = np.arange(1, 25)
t3 = t.reshape(2, 3, 4)
t_swapped = np.transpose(t3, axes=(0, 2, 1))  # swap axes 1 and 2
t3.shape, t_swapped.shape


## 6) Universal functions (ufuncs)

A **ufunc** is a fast, elementwise function implemented in optimized C.  
Examples: `np.add`, `np.subtract`, `np.multiply`, `np.sqrt`, `np.exp`, `np.log`, `np.sin`, …

They:
- operate elementwise
- support broadcasting
- are typically much faster than Python loops
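
Ufuncs are objects with a few extra features beyond being callable. The sketch below shows that arithmetic operators dispatch to ufuncs, that `out=` can reuse an existing buffer, and that binary ufuncs provide `reduce` and `accumulate` methods. The name `buf` is illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

print(isinstance(np.sqrt, np.ufunc))        # True

# Operators dispatch to ufuncs: x + x is np.add(x, x)
print(np.array_equal(x + x, np.add(x, x)))  # True

# out= writes the result into an existing array instead of allocating a new one
buf = np.empty_like(x)
np.multiply(x, 2.0, out=buf)
print(buf)                                  # [2. 4. 6. 8.]

# Binary ufuncs also provide reduce and accumulate
print(np.add.reduce(x))                     # 10.0
print(np.multiply.accumulate(x))            # [ 1.  2.  6. 24.]
```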


In [58]:
x = np.array([1, 4, 9, 16], dtype=np.float64)

np.sqrt(x)


In [59]:
# Build a nontrivial array to compare a NumPy ufunc against a Python loop
# (both are timed with %%timeit in the cells below)
arr = rng.random(1_000_000)


### Timing: ufunc vs Python loop with `%%timeit`

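Run the next two cells. On an array of this size, the vectorized NumPy call is typically one to two orders of magnitude faster than the Python loop; exact numbers depend on your machine.

`%%timeit` is a Jupyter **cell magic**: the `%%` prefix must appear on the first line of the cell, and the magic applies to the whole cell (a single `%` marks a line magic).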
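
Outside Jupyter, a similar comparison can be sketched with the standard-library `timeit` module. The array size and repetition counts below are arbitrary choices to keep the run short, not values from this notebook.

```python
import math
import timeit

import numpy as np

rng = np.random.default_rng(42)
arr = rng.random(100_000)   # smaller than the notebook's array to keep this quick

# Average seconds per call over a few repetitions
t_ufunc = timeit.timeit(lambda: np.sqrt(arr), number=20) / 20
t_loop = timeit.timeit(lambda: [math.sqrt(v) for v in arr], number=5) / 5

print(f"ufunc: {t_ufunc:.6f} s, loop: {t_loop:.6f} s")
print(f"speedup: {t_loop / t_ufunc:.0f}x")
```

Unlike `%%timeit`, this does not choose the number of repetitions automatically, so treat the numbers as rough estimates.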


In [60]:
%%timeit
np.sqrt(arr)


In [61]:
%%timeit
out = [math.sqrt(v) for v in arr]  # Python loop in a list comprehension


### Another common pattern: combining ufuncs

You can build complex expressions without explicit loops.


In [None]:
x = rng.normal(size=10)
y = rng.normal(size=10)

# Example: z = (sin(x) + exp(y)) / (1 + x^2)
z = (np.sin(x) + np.exp(y)) / (1 + x**2)
z


### Exercise 4

Given `x = np.linspace(0, 2*np.pi, 9)`:

1. Compute `sin(x)`  
2. Compute `sin(x)^2 + cos(x)^2` and confirm it's ~1 (numerical tolerance)

*(Try it, then check the solution.)*


In [None]:
# --- Your turn ---
# x = ...
# s = ...
# identity = ...
# s, identity


In [None]:
# Solution
x = np.linspace(0, 2*np.pi, 9)
s = np.sin(x)
identity = np.sin(x)**2 + np.cos(x)**2
# np.allclose compares within a floating-point tolerance
s, identity, np.allclose(identity, 1.0)


## 7) Broadcasting

Broadcasting is NumPy’s rule system for applying elementwise operations on arrays of **different shapes**, without manually copying data.

### Core idea
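Two shapes are compatible when, comparing dimensions from right to left, each pair of dimensions satisfies one of:
- they are equal, or
- one of them is 1

If one shape has fewer dimensions, it is treated as if padded with 1s on the left.

If compatible, each dimension of size 1 is **virtually stretched** to match the other, without copying data.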
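
You can check the rule without building any data. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later) to compute result shapes, and `np.broadcast_to` to show that the stretch is virtual. The example shapes are arbitrary.

```python
import numpy as np

# Shapes are aligned from the right; missing leading dimensions count as 1
print(np.broadcast_shapes((2, 3), (3,)))       # (2, 3)
print(np.broadcast_shapes((2, 1), (1, 3)))     # (2, 3)
print(np.broadcast_shapes((4, 1, 3), (5, 1)))  # (4, 5, 3)

# broadcast_to shows the "virtual" stretch: the result is a read-only view
row = np.array([100, 200, 300])
stretched = np.broadcast_to(row, (2, 3))
print(stretched.strides[0])   # 0: both rows reuse the same memory
```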


### Broadcasting with a scalar

In [None]:
A = np.arange(6).reshape(2, 3)
A


In [None]:
A + 10   # scalar broadcasts to every element


### Broadcasting with a 1D array (row-wise)

In [None]:
row = np.array([100, 200, 300])
A + row


### Broadcasting with a column vector (col-wise)

To make a column vector, use shape `(n, 1)`:


In [None]:
col = np.array([10, 20]).reshape(2, 1)
col, col.shape


In [None]:
A + col


### Broadcasting pitfalls (shape mismatch)

If shapes are not compatible, NumPy raises an error.


In [None]:
try:
    bad = A + np.array([1, 2])  # shape (2,) can't match (2,3) along last dim
except ValueError as e:
    print("ValueError:", e)
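
If the intent was to add one value per row, a common fix is to give the vector an explicit size-1 axis with `np.newaxis` so it has shape `(2, 1)`. A minimal sketch, assuming that is the goal (the name `per_row` is illustrative):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
per_row = np.array([1, 2])            # one value per row of A

# Shape (2,) -> (2, 1): the size-1 axis now stretches across the 3 columns
fixed = A + per_row[:, np.newaxis]
print(fixed)                          # rows: [1 2 3] and [5 6 7]
```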


### Practical example: standardizing features

For a data matrix `X` of shape `(n_samples, n_features)`:
- subtract per-feature mean (shape `(n_features,)`)
- divide by per-feature std  (shape `(n_features,)`)

Broadcasting makes this concise and fast.


In [None]:
X = rng.normal(loc=5, scale=2, size=(5, 3))
X


In [None]:
mu = X.mean(axis=0)      # per-feature mean, shape (3,)
sigma = X.std(axis=0)    # per-feature std (population, ddof=0), shape (3,)

X_std = (X - mu) / sigma
mu, sigma, X_std


In [None]:
# Verify means ~0 and stds ~1 (within numerical tolerance)
X_std.mean(axis=0), X_std.std(axis=0)
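
The same pattern along the other axis needs one extra step: per-row statistics have shape `(n_samples,)`, which does not line up with the last axis of `X`. The sketch below uses `keepdims=True` to keep a size-1 axis, assuming you want to center each row (sample) instead of each feature. The seed and data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5, scale=2, size=(5, 3))

row_mean = X.mean(axis=1, keepdims=True)   # shape (5, 1) instead of (5,)
X_centered = X - row_mean                  # (5, 3) - (5, 1) broadcasts per row

print(row_mean.shape)                             # (5, 1)
print(np.allclose(X_centered.mean(axis=1), 0.0))  # True
```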


### Exercise 5 (Broadcasting)

Let `A` be a 4×3 matrix of random integers 0..9.  
Create a length-3 vector `w` and compute the weighted matrix `A * w` (column-wise scaling).  
Then compute the row sums of the weighted matrix.

*(Try it, then check the solution.)*


In [None]:
# --- Your turn ---
# A = ...
# w = ...
# weighted = ...
# row_sums = ...
# A, w, weighted, row_sums


In [None]:
# Solution
A = rng.integers(0, 10, size=(4, 3))
w = np.array([0.1, 1.0, 10.0])  # scales each column
weighted = A * w
row_sums = weighted.sum(axis=1)

A, w, weighted, row_sums


## Wrap-up

You have now covered:
- creating arrays and inspecting attributes
- constructing arrays with built-ins
- indexing (basic, slicing, boolean, fancy) + views vs copies
- reshaping and transposing
- ufuncs + `%%timeit` performance intuition
- broadcasting rules and common patterns

**Further exploration topics:**
- reductions (`sum`, `mean`, `std`) with `axis=...`
- linear algebra with `np.linalg`
- file I/O (`np.loadtxt`, `np.save`, `np.load`)
