# 20 NumPy

**NumPy** - Numerical Python Library

**NumPy** is the fundamental package for scientific computing in Python. It provides:
- High-performance multidimensional array objects (ndarray)
- Tools for working with arrays and mathematical operations
- Linear algebra, Fourier transform, and random number capabilities

**Key Use Cases:**
1. Scientific Computing: Mathematical operations on large datasets
2. Data Analysis: Foundation for pandas and data manipulation
3. Machine Learning: Base array structure for ML libraries (scikit-learn, TensorFlow)
4. Image Processing: Representing and manipulating image data
5. Signal Processing: Audio and signal analysis
6. Linear Algebra: Matrix operations and transformations
7. Statistical Analysis: Statistical computations and probability distributions
8. Simulation: Monte Carlo simulations and numerical experiments

Performance: NumPy's core is implemented in C, so vectorized operations on whole arrays run significantly faster than equivalent pure-Python loops.

**Why NumPy?** Unlike Python lists, NumPy arrays are homogeneous (same data type), contiguous in memory, and support vectorized operations. This leads to better performance and memory efficiency, especially for large datasets. NumPy's ndarray is the backbone of many scientific libraries, enabling fast computations without explicit loops.
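The memory claim can be checked with a rough comparison. This is a minimal sketch: the names `py_list`, `arr`, and `list_bytes` are illustrative, and the list figure is only an estimate because CPython shares small integer objects.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))                 # illustrative list of Python ints
arr = np.arange(n, dtype=np.int64)       # explicit dtype keeps the size platform-independent

# Every array element occupies exactly itemsize bytes in one buffer.
print("Array data bytes:", arr.nbytes)               # n * 8 for int64
print("C-contiguous:", arr.flags["C_CONTIGUOUS"])

# A list stores pointers to separate int objects; this is a rough estimate.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("Approx. list bytes:", list_bytes)
```

The exact list size depends on the Python build, so only the array figure is fixed.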

## Import NumPy

NumPy is imported using the alias `np` for convenience. This makes all NumPy functions and classes available. Checking the version ensures compatibility and confirms the library is loaded correctly.

**Details:** Importing with `import numpy as np` is a standard convention to avoid typing the full module name. The `__version__` attribute helps verify the installed version, which is crucial for reproducibility in scientific work, as different versions may have slight API changes or bug fixes.

In [2]:
import numpy as np

print("NumPy version:", np.__version__)

NumPy version: 2.3.4


## Creating Arrays

Arrays are the core data structure in NumPy. This section demonstrates creating arrays from Python lists, including 1D and 2D arrays. The `np.array()` function converts lists into efficient NumPy arrays.

**Details:** Arrays differ from lists in that they are fixed-size, homogeneous, and support advanced indexing. Creating arrays from lists is straightforward but note that nested lists become multidimensional arrays. NumPy also provides specialized functions for common array creation patterns to avoid manual list construction.

### Creating Arrays from Lists

NumPy arrays can be created directly from Python lists using `np.array()`. This converts the list into a high-performance NumPy array, preserving the structure (e.g., 1D or 2D).

**Details:** `np.array()` infers the data type from the input (e.g., an integer or float dtype). When the list mixes types, NumPy upcasts to a common type, so a list of ints and floats becomes a float array. `np.array()` copies its input by default; when the input may already be an array, `np.asarray()` returns it without copying if no conversion is needed.

In [3]:
# From list
arr1 = np.array([1, 2, 3, 4, 5])
print("From list:", arr1)

# 2D array
arr2 = np.array([[1,2], [3,4]])
print("2D array:", arr2)

From list: [1 2 3 4 5]
2D array: [[1 2]
 [3 4]]
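Type inference is easiest to see by printing `dtype`. The sketch below uses illustrative names (`ints`, `mixed`, `forced`); the exact integer dtype depends on the platform and NumPy version, so it is not shown as a fixed value.

```python
import numpy as np

ints = np.array([1, 2, 3])                        # all ints -> a platform integer dtype
mixed = np.array([1, 2.5, 3])                     # int + float -> upcast to float64
forced = np.array([1, 2, 3], dtype=np.float32)    # explicit dtype overrides inference

print(ints.dtype, mixed.dtype, forced.dtype)

# np.asarray returns the same object when the input is already a suitable array.
same = np.asarray(ints)
print(same is ints)   # True
```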


### Creating Arrays with Specific Values

`np.zeros()` creates an array filled with zeros, useful for initialization. `np.ones()` fills with ones. `np.empty()` allocates uninitialized memory for speed, but values are unpredictable.

**Details:** These functions take a shape tuple (e.g., (3,3)) and optionally a dtype. `zeros` and `ones` are great for initializing matrices in algorithms like gradient descent. `empty` is faster but risky because the array holds whatever was already in memory; the zeros in the output here are incidental. Use it only when every element will be overwritten immediately.

In [4]:
# Zeros, ones, empty
zeros = np.zeros((3,3))
ones = np.ones((2,4))
empty = np.empty((2,2))
print("Zeros:", zeros)
print("Ones:", ones)
print("Empty:", empty)

Zeros: [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Ones: [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
Empty: [[0. 0.]
 [0. 0.]]
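The optional `dtype` argument and the safe way to use `empty` might look like this. A small sketch with illustrative names (`counts`, `weights`, `buf`):

```python
import numpy as np

counts = np.zeros((2, 3), dtype=np.int32)    # integer zeros instead of the float64 default
weights = np.ones(4, dtype=np.float32)       # a single int is accepted as a 1D shape

buf = np.empty((2, 2))    # contents are arbitrary until assigned
buf[:] = 7.0              # overwrite every element before reading
print(counts.dtype, weights.dtype)
print(buf)
```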


### Creating Arrays with Ranges

`np.arange(start, stop, step)` generates arrays with evenly spaced values, similar to Python's range but as a NumPy array. It's efficient for sequences.

**Details:** Unlike `range()`, `arange` returns an array and accepts non-integer steps. The `stop` value is excluded. With a floating-point step, rounding can make the number of elements differ from what you expect, so prefer `linspace` when you need a fixed number of points or an exact endpoint.

In [6]:
# Range
range_arr = np.arange(10)
print("Arange:", range_arr)

Arange: [0 1 2 3 4 5 6 7 8 9]
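The cell above passes only `stop`. A short sketch of the full `start, stop, step` form follows; the variable names are illustrative.

```python
import numpy as np

evens = np.arange(2, 11, 2)       # start=2, stop=11 (excluded), step=2
halves = np.arange(0, 2, 0.5)     # float step; stop is still excluded
down = np.arange(5, 0, -1)        # negative step counts down

print(evens)    # [ 2  4  6  8 10]
print(halves)   # [0.  0.5 1.  1.5]
print(down)     # [5 4 3 2 1]
```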


### Creating Arrays with Linear Spacing

`np.linspace(start, stop, num)` produces an array of `num` evenly spaced samples between `start` and `stop`, inclusive. Ideal for plotting or sampling.

**Details:** Unlike `arange`, which excludes `stop`, `linspace` includes both endpoints by default (`endpoint=True`) and takes the number of samples rather than a step size. It's perfect for creating x-values for graphs or discretizing continuous ranges in simulations.

In [9]:
# Linspace
lin = np.linspace(0, 1, 5)
print("Linspace:", lin)

Linspace: [0.   0.25 0.5  0.75 1.  ]
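To see what `endpoint` changes, compare the two settings side by side. This sketch also uses `retstep=True`, which returns the spacing; the names are illustrative.

```python
import numpy as np

closed = np.linspace(0, 1, 5)                       # includes 1.0
half_open = np.linspace(0, 1, 5, endpoint=False)    # stops before 1.0
pts, step = np.linspace(0, 1, 5, retstep=True)      # also return the spacing

print(closed)      # [0.   0.25 0.5  0.75 1.  ]
print(half_open)   # [0.  0.2 0.4 0.6 0.8]
print(step)        # 0.25
```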


### Creating Random Arrays

`np.random.rand()` generates arrays of random floats drawn uniformly from [0, 1). Random functions are useful for simulations, testing, or initializing weights in ML.

**Details:** The legacy `np.random.*` functions shown here use a global Mersenne Twister (MT19937) generator. For reproducibility, set a seed with `np.random.seed()` before drawing. Other functions such as `randn()` sample from the standard normal distribution. Seed any experiment whose results must be repeatable.

In [10]:
# Random
rand = np.random.rand(3,3)
print("Random:", rand)

Random: [[0.33522322 0.08712484 0.37272554]
 [0.23128393 0.61738914 0.43339002]
 [0.18215275 0.85921511 0.99161153]]
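The effect of seeding can be demonstrated by drawing twice from the same seed. A minimal sketch; the seed value 0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(3)

np.random.seed(0)              # same seed -> same sequence
second = np.random.rand(3)
print(np.array_equal(first, second))   # True

normal = np.random.randn(1000)         # standard normal samples
print(normal.mean(), normal.std())     # close to 0 and 1, not exact
```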


## Array Attributes and Properties

NumPy arrays have attributes like `shape` (dimensions), `dtype` (data type), `ndim` (number of dimensions), `size` (total elements), and `itemsize` (bytes per element). These help understand and manipulate array structure.

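**Details:** `shape` is a tuple giving the size of each dimension, and `size` is the product of its entries. `dtype` determines memory use and arithmetic behavior (e.g., int32 vs float64). `ndim`, `size`, and `itemsize` are read-only; to change the shape or type, use `reshape()` or `astype()`. These attributes are essential for debugging and optimization.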

In [7]:
arr = np.array([[1,2,3], [4,5,6]])
print("Array:", arr)
print("Shape:", arr.shape)
print("Dtype:", arr.dtype)
print("Ndim:", arr.ndim)
print("Size:", arr.size)
print("Itemsize:", arr.itemsize)

Array: [[1 2 3]
 [4 5 6]]
Shape: (2, 3)
Dtype: int64
Ndim: 2
Size: 6
Itemsize: 8
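How `dtype` affects memory follows directly from these attributes, since total bytes equal `size * itemsize`. A brief sketch with illustrative names:

```python
import numpy as np

arr64 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
arr32 = arr64.astype(np.int32)     # same values, half the bytes per element

print(arr64.size == arr64.shape[0] * arr64.shape[1])   # True
print(arr64.nbytes, arr32.nbytes)                       # 48 24
```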


## Indexing and Slicing

Indexing accesses individual elements, while slicing extracts subarrays. For 2D arrays, use comma-separated indices. Boolean indexing filters elements based on conditions, enabling efficient data selection.

**Details:** Basic slicing returns a view on the original data, so no memory is copied and writes to the slice change the original array. Fancy indexing (arrays of indices) and boolean masks allow complex selections but return copies. This is powerful for data filtering without loops, common in data preprocessing.

In [11]:
arr = np.array([10, 20, 30, 40, 50])
print("Array:", arr)
print("Index 0:", arr[0])
print("Slice 1:3:", arr[1:3])

# 2D
arr2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
print("2D Array:", arr2d)
print("Element [1,2]:", arr2d[1,2])
print("Row 0:", arr2d[0])
print("Column 1:", arr2d[:,1])

# Boolean indexing
print("Boolean indexing:", arr[arr > 25])

Array: [10 20 30 40 50]
Index 0: 10
Slice 1:3: [20 30]
2D Array: [[1 2 3]
 [4 5 6]
 [7 8 9]]
Element [1,2]: 6
Row 0: [1 2 3]
Column 1: [2 5 8]
Boolean indexing: [30 40 50]
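The difference between a view and a copy matters as soon as you assign into the result. The following sketch uses illustrative names (`view`, `picked`, `masked`, `safe`):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]            # basic slice -> view on the same memory
view[0] = 99               # writes through to arr
print(arr)                 # [10 99 30 40 50]

picked = arr[[0, 2, 4]]    # fancy indexing -> independent copy
picked[0] = -1             # does not affect arr
masked = arr[arr > 35]     # boolean mask -> also a copy
print(arr[0], masked)      # 10 [99 40 50]

safe = arr[1:4].copy()     # explicit copy when a slice must not alias
```

`np.shares_memory(a, b)` is a convenient way to check whether two arrays alias each other.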


## Array Operations

NumPy supports element-wise operations (e.g., addition, multiplication) and universal functions (ufuncs) like `np.sin()`. These are vectorized for speed, avoiding Python loops.

**Details:** Ufuncs apply operations element-wise and support broadcasting. They are compiled in C, making them much faster than Python loops. Use them for mathematical computations on arrays.

In [9]:
a = np.array([1,2,3])
b = np.array([4,5,6])
print("a:", a)
print("b:", b)
print("a + b:", a + b)
print("a * b:", a * b)
print("a ** 2:", a ** 2)
print("np.sin(a):", np.sin(a))

a: [1 2 3]
b: [4 5 6]
a + b: [5 7 9]
a * b: [ 4 10 18]
a ** 2: [1 4 9]
np.sin(a): [0.84147098 0.90929743 0.14112001]
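The arithmetic operators are shorthand for ufuncs, which also offer extras such as `reduce` and an `out` argument. A small sketch; the buffer name `out` is illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.array_equal(a + b, np.add(a, b)))   # True: + calls np.add
print(np.maximum(a, 2))                      # [2 2 3], the scalar is broadcast
print(np.add.reduce(a))                      # 6, same as a.sum()

out = np.empty(3)          # preallocated float64 result buffer
np.sqrt(a, out=out)        # write results into out instead of allocating
print(out)
```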


## Broadcasting

Broadcasting allows operations on arrays of different shapes by automatically expanding smaller arrays. Here, a 1D array is broadcast to match the 2D array's shape for addition.

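**Details:** Shapes are compared from the trailing dimension backward; two dimensions are compatible when they are equal or one of them is 1, and missing leading dimensions are treated as 1. Broadcasting avoids explicit loops and does not materialize the expanded array. It is essential for efficient matrix-vector operations in ML.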

In [12]:
a = np.array([[1,2,3], [4,5,6]])
b = np.array([10, 20, 30])
print("a:", a)
print("b:", b)
print("a + b:", a + b)

a: [[1 2 3]
 [4 5 6]]
b: [10 20 30]
a + b: [[11 22 33]
 [14 25 36]]
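The compatibility rule can be tried on a column and a row, and on a pair of shapes that do not fit. A sketch with illustrative names; `np.broadcast_shapes` assumes NumPy 1.20 or newer.

```python
import numpy as np

col = np.array([[0], [10]])     # shape (2, 1)
row = np.array([1, 2, 3])       # shape (3,)

# Shapes align from the right: (2, 1) and (3,) -> (2, 3)
print(np.broadcast_shapes(col.shape, row.shape))   # (2, 3)
print(col + row)
# [[ 1  2  3]
#  [11 12 13]]

# (2, 3) and (2,) are incompatible: trailing sizes 3 and 2 differ and neither is 1.
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as exc:
    print("Broadcast error:", exc)
```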


## Reshaping and Transposing

`reshape()` changes the array's shape without altering its data. `T` transposes dimensions. `flatten()` and `ravel()` both return 1D arrays: `flatten()` always returns a copy, while `ravel()` returns a view when possible.

**Details:** `reshape()` requires the total size to match. Transposing swaps axes. `flatten()` always copies, `ravel()` may not—use `ravel()` for performance when possible.

In [11]:
arr = np.arange(12)
print("1D:", arr)
reshaped = arr.reshape(3,4)
print("Reshaped 3x4:", reshaped)
print("Transpose:", reshaped.T)
print("Flatten:", reshaped.flatten())
print("Ravel:", reshaped.ravel())

1D: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped 3x4: [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Transpose: [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
Flatten: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Ravel: [ 0  1  2  3  4  5  6  7  8  9 10 11]
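Whether `ravel()` actually avoids a copy depends on the memory layout, which `np.shares_memory` can confirm. A minimal sketch with illustrative names:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

flat_copy = grid.flatten()     # always a new array
flat_view = grid.ravel()       # a view here, because grid is C-contiguous
print(np.shares_memory(grid, flat_copy))   # False
print(np.shares_memory(grid, flat_view))   # True

# The transpose is not C-contiguous, so ravel() has to copy.
print(np.shares_memory(grid, grid.T.ravel()))   # False

auto = np.arange(12).reshape(2, -1)   # -1 lets NumPy infer the size
print(auto.shape)                     # (2, 6)
```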


## Concatenation and Splitting

`np.concatenate()` joins arrays along an axis. `vstack()` and `hstack()` stack vertically or horizontally. `np.split()` divides arrays into subarrays.

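**Details:** `np.concatenate()` keeps the number of dimensions, whereas `vstack()` promotes 1D inputs to rows of a 2D result. For 2D arrays, axis 0 joins along rows and axis 1 along columns. Joining allocates a new array and copies the inputs; `np.split()` returns views and requires the array to divide evenly. Splitting is useful for batching data.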

In [12]:
a = np.array([1,2])
b = np.array([3,4])
print("a:", a)
print("b:", b)
print("Concatenate:", np.concatenate((a,b)))
print("Vstack:", np.vstack((a,b)))
print("Hstack:", np.hstack((a,b)))

arr = np.arange(10)
print("Array:", arr)
print("Split into 2:", np.split(arr, 2))

a: [1 2]
b: [3 4]
Concatenate: [1 2 3 4]
Vstack: [[1 2]
 [3 4]]
Hstack: [1 2 3 4]
Array: [0 1 2 3 4 5 6 7 8 9]
Split into 2: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
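The `axis` argument only becomes visible with 2D inputs, and uneven splits need `np.array_split`. A short sketch; `m1`, `m2`, and `parts` are illustrative names.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((m1, m2), axis=0)   # join along rows -> shape (4, 2)
cols = np.concatenate((m1, m2), axis=1)   # join along columns -> shape (2, 4)
print(rows.shape, cols.shape)

# np.split needs equal parts; np.array_split tolerates a remainder.
parts = np.array_split(np.arange(10), 3)
print([p.tolist() for p in parts])   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```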


## Mathematical Functions

NumPy offers aggregation functions like `sum()`, `mean()`, and statistical measures. Trigonometric functions like `sin()` operate element-wise on arrays.

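**Details:** Aggregations accept an `axis` argument to reduce along one dimension instead of the whole array. NaN-aware variants such as `nanmean()` ignore NaN values. `np.std()` computes the population standard deviation by default (`ddof=0`), which is the value shown in the output. These functions are optimized and handle large arrays well.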

In [13]:
arr = np.array([1,2,3,4,5])
print("Array:", arr)
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
print("Std:", np.std(arr))
print("Min:", np.min(arr))
print("Max:", np.max(arr))
print("Sin:", np.sin(arr))

Array: [1 2 3 4 5]
Sum: 15
Mean: 3.0
Std: 1.4142135623730951
Min: 1
Max: 5
Sin: [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
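Per-axis reductions and NaN handling are not covered by the 1D example, so here is a brief sketch of both; the arrays `m`, `data`, and `vals` are illustrative.

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(m.sum(axis=0))    # [5. 7. 9.]  column totals (collapses rows)
print(m.mean(axis=1))   # [2. 5.]     row means (collapses columns)

data = np.array([1.0, np.nan, 3.0])
print(np.mean(data))      # nan, because NaN propagates
print(np.nanmean(data))   # 2.0, NaN is ignored

vals = np.array([1, 2, 3, 4, 5])
print(np.std(vals))           # population std (ddof=0), about 1.4142
print(np.std(vals, ddof=1))   # sample std, about 1.5811
```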


## Linear Algebra Operations

Matrix operations include matrix products (`np.dot()` or the `@` operator, which give the same result for 2D arrays), transposes, inverses (`np.linalg.inv()`), and eigenvalues (`np.linalg.eigvals()`).

**Details:** `np.linalg` provides BLAS/LAPACK-based functions for speed. Matrix multiplication is fundamental in ML. Eigenvalues are used in PCA, stability analysis, etc.

In [14]:
a = np.array([[1,2], [3,4]])
b = np.array([[5,6], [7,8]])
print("a:", a)
print("b:", b)
print("Dot product:", np.dot(a,b))
print("Matrix multiply:", a @ b)
print("Transpose a:", a.T)
print("Inverse a:", np.linalg.inv(a))
print("Eigenvalues:", np.linalg.eigvals(a))

a: [[1 2]
 [3 4]]
b: [[5 6]
 [7 8]]
Dot product: [[19 22]
 [43 50]]
Matrix multiply: [[19 22]
 [43 50]]
Transpose a: [[1 3]
 [2 4]]
Inverse a: [[-2.   1. ]
 [ 1.5 -0.5]]
Eigenvalues: [-0.37228132  5.37228132]
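The results above can be sanity-checked numerically. This sketch also uses `np.linalg.eig` and `np.linalg.solve`, which the cell above does not call; the vector `rhs` is an arbitrary illustrative right-hand side.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

inv = np.linalg.inv(a)
print(np.allclose(a @ inv, np.eye(2)))    # True, up to rounding

# eig returns eigenvalues and the matching eigenvectors as columns.
vals, vecs = np.linalg.eig(a)
for lam, v in zip(vals, vecs.T):
    print(np.allclose(a @ v, lam * v))    # True for each pair

# For a linear system a @ x = rhs, solve() avoids forming the inverse.
rhs = np.array([5, 6])
x = np.linalg.solve(a, rhs)
print(x)    # approximately [-4, 4.5]
```

Comparing with `np.allclose` rather than `==` accounts for floating-point rounding.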


## Random Number Generation

`np.random.seed()` sets the global random state for reproducibility. Functions like `randint()`, `normal()`, `choice()`, and `rand()` draw samples from different distributions.

**Details:** Seeding ensures deterministic outputs. Distributions like normal are parameterized (mean, std). Useful for Monte Carlo methods, bootstrapping, and model initialization.

In [15]:
np.random.seed(42)
print("Random int:", np.random.randint(0,10,5))
print("Normal:", np.random.normal(0,1,5))
print("Choice:", np.random.choice([1,2,3,4,5], 3))
print("Rand:", np.random.rand(3))

Random int: [6 3 7 4 6]
Normal: [-0.91682684 -0.12414718 -2.01096289 -0.49280342  0.39257975]
Choice: [5 2 4]
Rand: [9.38552709e-01 7.78765841e-04 9.92211559e-01]
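NumPy also offers a `Generator` interface through `np.random.default_rng()`, which keeps the random state in an object instead of globally. The sketch below is an alternative to the cell above rather than a replacement for it; the names are illustrative, and its streams differ from those of `np.random.seed()`.

```python
import numpy as np

rng = np.random.default_rng(42)         # independent generator, no global state

ints = rng.integers(0, 10, size=5)      # like randint: upper bound excluded
normal = rng.normal(loc=0.0, scale=1.0, size=5)
picks = rng.choice([1, 2, 3, 4, 5], size=3, replace=False)   # no repeats
unit = rng.random(3)                    # floats in [0, 1)

# A second generator with the same seed reproduces the same stream.
rng2 = np.random.default_rng(42)
print(np.array_equal(ints, rng2.integers(0, 10, size=5)))    # True
```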


## File I/O with Arrays

`np.save()` and `np.load()` handle binary NumPy files (.npy). `np.savetxt()` and `np.loadtxt()` work with text files, useful for data exchange.

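**Details:** Binary `.npy` files are faster to read and write and preserve dtype and shape. Text files are human-readable but slower and larger, and `np.loadtxt()` returns floats by default, as the output shows. For large arrays, `np.savez_compressed()` writes a compressed archive. These functions are essential for saving and loading models or datasets.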

In [16]:
arr = np.array([1,2,3,4,5])
np.save('temp.npy', arr)
loaded = np.load('temp.npy')
print("Saved and loaded:", loaded)

np.savetxt('temp.txt', arr)
loaded_txt = np.loadtxt('temp.txt')
print("Saved txt and loaded:", loaded_txt)

Saved and loaded: [1 2 3 4 5]
Saved txt and loaded: [1. 2. 3. 4. 5.]
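Compression and an integer-preserving text round trip could be sketched as follows. The file names `demo.npz` and `demo.txt` and the key `values` are hypothetical, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

with tempfile.TemporaryDirectory() as tmp:
    # Compressed archive holding one or more named arrays.
    npz_path = os.path.join(tmp, "demo.npz")
    np.savez_compressed(npz_path, values=arr)
    with np.load(npz_path) as archive:
        restored = archive["values"]

    # Text round trip that keeps integers as integers.
    txt_path = os.path.join(tmp, "demo.txt")
    np.savetxt(txt_path, arr, fmt="%d")
    restored_txt = np.loadtxt(txt_path, dtype=int)

print(restored, restored.dtype)
print(restored_txt, restored_txt.dtype)
```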


## Performance Tips

This code compares vectorized operations (fast, using NumPy's C backend) versus Python loops (slow). Vectorization is key for performance on large arrays.

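**Details:** Vectorized operations run in compiled loops that can use SIMD instructions, while a Python loop pays interpreter overhead on every element. Prefer built-in array functions to explicit loops. A single `time.time()` difference is only a rough measurement; `time.perf_counter()` or the `timeit` module gives more reliable timings. For very large data, consider memory-mapped arrays or chunking.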

In [17]:
import time

arr = np.arange(1000000)

# Vectorized
start = time.time()
result = arr * 2
end = time.time()
print("Vectorized time:", end - start)

# Loop
start = time.time()
result2 = []
for x in arr:
    result2.append(x * 2)
end = time.time()
print("Loop time:", end - start)

Vectorized time: 0.0010919570922851562
Loop time: 0.07499289512634277
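For steadier numbers, the same comparison can be run with the standard-library `timeit` module. A sketch under the assumption that a smaller array (100,000 elements) is representative enough; absolute timings vary by machine, so none are shown.

```python
import timeit
import numpy as np

arr = np.arange(100_000)

# timeit runs each statement several times; taking the minimum reduces noise.
vec_time = min(timeit.repeat(lambda: arr * 2, number=10, repeat=3))
loop_time = min(timeit.repeat(lambda: [x * 2 for x in arr], number=10, repeat=3))

print(f"Vectorized: {vec_time:.5f} s for 10 runs")
print(f"Loop:       {loop_time:.5f} s for 10 runs")

# Both approaches compute the same values.
assert np.array_equal(arr * 2, np.array([x * 2 for x in arr]))
```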


## Cleanup

This cell removes temporary files created earlier to keep the workspace clean. It uses `os` module functions for file management.

**Details:** Good practice to clean up after demos. `os.path.exists()` checks before removal to avoid errors. Prevents clutter in shared environments.

In [18]:
import os

for f in ["temp.npy", "temp.txt"]:
    if os.path.exists(f):
        os.remove(f)
        print(f"Removed: {f}")

print("\nCleanup complete!")

Removed: temp.npy
Removed: temp.txt

Cleanup complete!
