# Intro to NumPy

NumPy is the foundational library for numerical computing in Python. Much of the ML/AI ecosystem either builds directly on it (pandas, scikit-learn) or interoperates closely with its arrays (PyTorch, TensorFlow). This notebook covers what NumPy is, why it's used, and the essentials: **ndarray**, **shape**, **dtype**, and **ndim**.

## What is NumPy and why is it used?

**NumPy** = *Num*erical *Py*thon. It provides:

- **Fixed-type arrays**: Unlike Python lists (which can hold any type and change size), NumPy arrays store one type in a contiguous block of memory.
- **Speed**: Operations are implemented in C and run on the whole array at once (vectorization). No Python loop overhead.
- **Rich array operations**: Element-wise math, linear algebra, broadcasting, indexing—all optimized.
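
The first and third bullets can be seen in a minimal sketch, assuming only that `numpy` is installed. Mixing integers and a float in one array forces a single common `dtype`, and adding a scalar applies to every element through broadcasting. The variable names `mixed` and `shifted` are illustrative.

```python
import numpy as np

# Integers and a float in one array: NumPy picks one dtype that fits them all
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64: every element is stored as a float

# The scalar 10 is broadcast across the whole array, with no explicit loop
shifted = mixed + 10
print(shifted)
```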

**Why it's faster than “traditional” Python:**

- Python lists store references: each element points to a separate Python object. Looping and doing math in Python means type checks and function calls for every element.
- NumPy arrays are homogeneous: one `dtype`, one block of memory. Operations run in compiled code over that block, often 10–100× faster for large arrays, though the exact speedup depends on the operation, the array size, and the hardware.
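
One way to check the speed claim on your own machine is a rough timing sketch like the one below. It uses the standard-library `timeit` module. The array size `n` and the helper names `square_with_loop` and `square_vectorized` are arbitrary choices for illustration, and the measured times will vary between machines, so no specific numbers are shown.

```python
import timeit

import numpy as np

n = 1_000_000
py_values = list(range(n))                # plain Python list of ints
np_values = np.arange(n, dtype=np.int64)  # same values in one int64 block


def square_with_loop():
    # Element-by-element work in the Python interpreter
    return [x * x for x in py_values]


def square_vectorized():
    # One call; the loop over elements runs in compiled code
    return np_values * np_values


loop_time = timeit.timeit(square_with_loop, number=5)
vec_time = timeit.timeit(square_vectorized, number=5)
print(f"list comprehension: {loop_time:.3f} s for 5 runs")
print(f"NumPy vectorized:   {vec_time:.3f} s for 5 runs")
```

Both functions compute the same values; only where the per-element loop runs differs.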

You use NumPy whenever you have numeric data (vectors, matrices, tensors) and want fast, consistent operations. That’s why it’s the base for data science and ML.

In [3]:
import numpy as np

# Quick comparison: Python list vs NumPy array
# List: can hold mixed types; element-wise math needs an explicit loop
py_list = [1, 2, 3, 4, 5]
squared_list = [x**2 for x in py_list]
print(f"Python list squared (loop): {squared_list}")

# NumPy: one dtype, vectorized operations—no loop
arr = np.array([1, 2, 3, 4, 5])
squared_arr = arr**2  # one expression, no loop—vectorized
print(f"NumPy array squared (vectorized): {squared_arr}")

Python list squared (loop): [1, 4, 9, 16, 25]
NumPy array squared (vectorized): [ 1  4  9 16 25]


## The ndarray: NumPy’s core type

In NumPy, **everything numeric lives in an `ndarray`** (n-dimensional array). Whether it’s a 1D vector, 2D matrix, or higher-dimensional tensor, it’s an `ndarray`. Three attributes you must know:

| Attribute | Meaning |
|----------|--------|
| **`shape`** | Tuple of sizes along each dimension, e.g. `(3,)` or `(2, 4)` |
| **`dtype`** | Data type shared by all elements, e.g. `float64`, `int32` |
| **`ndim`** | Number of dimensions (axes), e.g. 1 for a vector, 2 for a matrix |

![Tensors](../../../public/images/tensors.png)
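
The attributes in the table are tied together, which a short sketch can make concrete. It assumes a hypothetical array `a` created with `np.zeros`, and it also uses the `size` attribute (total element count), which is not covered in the table above.

```python
import math

import numpy as np

a = np.zeros((2, 3, 4))   # hypothetical 3D array filled with zeros

print(a.shape)                        # (2, 3, 4)
print(a.ndim == len(a.shape))         # True: ndim is the number of entries in shape
print(a.size == math.prod(a.shape))   # True: 2 * 3 * 4 = 24 elements
print(a.dtype)                        # float64, the default for np.zeros
```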

In [16]:
# 1D array (vector)
v = np.array([1, 2, 3, 4, 5])
print(f"1D array: {v}")
print(f"  shape: {v.shape}  (length along the only axis)")
print(f"  dtype: {v.dtype}")
print(f"  ndim:  {v.ndim}")

1D array: [1 2 3 4 5]
  shape: (5,)  (length along the only axis)
  dtype: int64
  ndim:  1


In [17]:
# 2D array (matrix)
M = np.array([[1, 2, 3], [4, 5, 6]])
print(f"2D array:\n{M}")
print(f"  shape: {M.shape}  (rows, columns)")
print(f"  dtype: {M.dtype}")
print(f"  ndim:  {M.ndim}")

2D array:
[[1 2 3]
 [4 5 6]]
  shape: (2, 3)  (rows, columns)
  dtype: int64
  ndim:  2


In [18]:
# 3D array (e.g. a batch of matrices or an image with channels)
T = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(f"3D array (2 blocks of 2x2):\n{T}")
print(f"  shape: {T.shape}  (blocks, rows, cols)")
print(f"  dtype: {T.dtype}")
print(f"  ndim:  {T.ndim}")

3D array (2 blocks of 2x2):
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
  shape: (2, 2, 2)  (blocks, rows, cols)
  dtype: int64
  ndim:  3
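
To see what the `(blocks, rows, cols)` reading of the shape means in practice, the sketch below indexes the same array one axis at a time. Each integer index removes one axis, so `ndim` drops by one. The names `block`, `row`, and `value` are illustrative.

```python
import numpy as np

T = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

block = T[0]          # first block: a 2x2 matrix
print(block.shape)    # (2, 2)
print(block.ndim)     # 2

row = T[1, 0]         # second block, first row
print(row)            # [5 6]

value = T[1, 0, 1]    # second block, first row, second column
print(value)          # 6
```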


In [19]:
# dtype: you can set it explicitly (important for memory and consistency)
ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([1.0, 2.0, 3.0], dtype=np.float64)
print(f"int32 array:  {ints}, dtype={ints.dtype}")
print(f"float64 array: {floats}, dtype={floats.dtype}")
# Summary: every ndarray has shape, dtype, and ndim—inspect these whenever you get a new array
print(f"\nSummary: shape={ints.shape}, dtype={ints.dtype}, ndim={ints.ndim}")

int32 array:  [1 2 3], dtype=int32
float64 array: [1. 2. 3.], dtype=float64

Summary: shape=(3,), dtype=int32, ndim=1
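
Why `dtype` matters for memory can be sketched with the `itemsize` and `nbytes` attributes and the `astype` method. These are not introduced above, so treat this as a small extension. The array length of 1000 is an arbitrary choice.

```python
import numpy as np

small = np.zeros(1000, dtype=np.int32)
large = np.zeros(1000, dtype=np.float64)

# itemsize is bytes per element; nbytes is itemsize * number of elements
print(small.itemsize, small.nbytes)   # 4 4000
print(large.itemsize, large.nbytes)   # 8 8000

# astype returns a new array with the requested dtype; the original is unchanged
converted = small.astype(np.float64)
print(converted.dtype, small.dtype)   # float64 int32
```

For the same number of elements, the `float64` array takes twice the memory of the `int32` one, which adds up quickly for large datasets.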
