# NUMPY

NumPy (Numerical Python) is a fundamental open-source library that provides fast and efficient tools for numerical computing in Python. Its core feature is the powerful N-dimensional array object called ndarray, which allows for compact storage and high-speed manipulation of large datasets. NumPy supports vectorized operations, broadcasting, and a wide range of mathematical, statistical, and linear algebra functions, making it significantly faster and more memory-efficient than native Python lists and loops.

In Data Science, NumPy is essential for tasks like data cleaning, transformation, feature engineering, and mathematical modeling. It serves as the backbone for many popular data science and machine learning libraries, such as Pandas, Scikit-learn, TensorFlow, and PyTorch. Data scientists rely on NumPy for fast numerical computations, efficient manipulation of large data arrays, and as a foundation for building scalable AI and ML pipelines.

## Import

In [None]:
%pip install numpy

import numpy as np

## Creating & Manipulating ndarrays

### Array Creation

In [None]:
a = np.array([1, 2, 3])                # from Python list
b = np.arange(0, 10, 2)                # [0,2,4,6,8]
c = np.linspace(0, 1, num=5)           # 5 equally spaced
zeros = np.zeros((3,4))               # 3×4 of 0.0
ones = np.ones(5, dtype=np.int32)     # length‑5 ints of 1
full = np.full((2,2), fill_value=7)   # all sevens
eye = np.eye(3)                        # 3×3 identity

#### Best Practices

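Pre‑allocate large arrays with np.zeros or np.empty and fill them in place, rather than growing a Python list or calling np.append repeatedly (each np.append copies the whole array).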

Use dtype to control memory/precision (e.g. float32 for GPU models).
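
The two practices above can be sketched as follows. The sizes `n_rows` and `n_cols` are hypothetical values picked for illustration, and the fill loop stands in for whatever produces each row in a real pipeline.

```python
import numpy as np

n_rows, n_cols = 1_000, 4          # hypothetical sizes for illustration

# Allocate once, then write rows in place instead of growing a list
# or calling np.append (which copies the whole array on every call).
out = np.empty((n_rows, n_cols), dtype=np.float32)
for i in range(n_rows):
    out[i] = i                     # scalar is broadcast across the row

# dtype controls memory: float32 needs half the bytes of float64.
as64 = out.astype(np.float64)
print(out.nbytes, as64.nbytes)     # 16000 32000
```

np.empty leaves the buffer uninitialized, so it is only safe when every element is written before being read; np.zeros is the cautious default.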

### Shape Manipulation

In [None]:
x = np.arange(12)              # shape (12,)
y = x.reshape((3,4))           # shape (3,4), view when possible
flat = y.ravel()               # flattened; a view when possible, otherwise a copy
copy_flat = y.flatten()        # copy
squeezed = y.reshape(1,3,4).squeeze()  # remove size‑1 dims
expanded = np.expand_dims(x, axis=0)   # shape (1,12)
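
Whether an operation returns a view or a copy matters because writes to a view change the original array. A minimal sketch using np.shares_memory to check this (variable names are illustrative):

```python
import numpy as np

x = np.arange(12)
y = x.reshape(3, 4)

flat_view = y.ravel()      # y is contiguous, so this is a view
flat_copy = y.flatten()    # always a new array

print(np.shares_memory(y, flat_view))  # True
print(np.shares_memory(y, flat_copy))  # False

flat_view[0] = 99          # writes through to x and y
print(x[0], y[0, 0], flat_copy[0])     # 99 99 0

# On a non-contiguous array (here a transpose) ravel has to copy.
print(np.shares_memory(y.T, y.T.ravel()))  # False
```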


## Dimension Management & Broadcasting

### Broadcasting Mechanics
Broadcasting automatically expands smaller arrays to match shapes when performing arithmetic, following these rules:

1. Align shapes from the right; a missing leading dimension is treated as size 1.
2. Two dimensions are compatible when they are equal or one of them is 1.
3. A dimension of size 1 is “stretched” to match the other (no actual data copy).

In [None]:
a = np.array([1,2,3])          # shape (3,)
b = np.array([[1],[2],[3]])    # shape (3,1)
a + b  # result shape (3,3)
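
The rules can be checked without doing any arithmetic. This sketch assumes NumPy 1.20 or newer, where np.broadcast_shapes is available, and uses np.broadcast_to to show that the stretched axis reuses memory.

```python
import numpy as np

a = np.array([1, 2, 3])            # shape (3,)
b = np.array([[1], [2], [3]])      # shape (3, 1)

# Shapes are compared right to left: (3,) vs (3, 1) gives (3, 3).
print(np.broadcast_shapes(a.shape, b.shape))   # (3, 3)

# The stretched axis has stride 0: the same memory is reused, not copied.
stretched = np.broadcast_to(b, (3, 3))
print(stretched.strides[1])                    # 0

# Incompatible trailing dimensions (3 vs 4) raise an error.
try:
    np.ones((2, 3)) + np.ones(4)
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```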


### Advanced Indexing & Slicing

In [None]:
# Fancy Indexing

arr = np.arange(10)
idx = [2,5,7]
arr[idx]                # array([2,5,7])

# Boolean Masking

mask = arr % 2 == 0
evens = arr[mask]

# Conditional assignment (modifies arr in place)

arr[arr > 5] = -1

# Multi‑dimensional Slicing

mat = np.arange(16).reshape(4,4)
sub = mat[1:3, 2:4]     # rows 1–2, cols 2–3

## Data Types & Memory Optimization

In [None]:
# Dtypes: float32, float64, int8, int32, bool, etc.

a = np.arange(5, dtype=np.int64)
b = a.astype(np.float32)
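
Smaller dtypes save memory but change behavior when values do not fit. A short sketch of two pitfalls worth knowing before downcasting (the sample values are made up):

```python
import numpy as np

# Range of a fixed-width integer type.
info = np.iinfo(np.int8)
print(info.min, info.max)                # -128 127

# Casting floats to integers truncates toward zero.
f = np.array([1.9, -1.9])
truncated = f.astype(np.int32)
print(truncated)                         # [ 1 -1]

# Fixed-width integers wrap around on overflow in array arithmetic.
big = np.array([127], dtype=np.int8)
wrapped = big + big
print(wrapped)                           # [-2]
```

Checking np.iinfo or np.finfo against the data's actual range first is a cheap guard.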


## Mathematical & Statistical Operations

### Vectorized & Element‑wise

In [None]:
a = np.random.rand(1000000)
b = 2 * a + 1           # vectorized, no Python loop
c = np.sin(a) * np.log(a+1)


### Aggregation Functions

In [None]:
m = a.mean()            # scalar
s = a.std(axis=0)       # along axis
total = a.sum()
cum = np.cumsum(a)
perc = np.percentile(a, 90)


Axis Management: for 2D arrays, axis=0 aggregates down the rows (one result per column) and axis=1 aggregates across the columns (one result per row).

Ignoring NaNs: np.nanmean(), np.nansum(), etc.
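
A small 2D sketch of both notes, using a made-up matrix with one missing value:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0]])   # shape (2, 3)

# axis=0 collapses the rows, leaving one value per column.
col_sums = m.sum(axis=0)             # 5, nan, 9
# axis=1 collapses the columns, leaving one value per row.
row_sums = m.sum(axis=1)             # 6, nan
print(col_sums.shape, row_sums.shape)    # (3,) (2,)

# NaN-aware variants skip missing values instead of propagating them.
col_means = np.nanmean(m, axis=0)    # 2.5, 2.0, 4.5
row_totals = np.nansum(m, axis=1)    # 6, 10
```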

## Linear Algebra Essentials

In [None]:
A = np.random.randn(4,4)
B = np.random.randn(4,4)
# Multiplication
C1 = A.dot(B)
C2 = A @ B
# Transpose
At = A.T
# Determinant, inverse, rank, eigendecomp
det = np.linalg.det(A)
inv = np.linalg.inv(A)
rank = np.linalg.matrix_rank(A)
w, v = np.linalg.eig(A)      # eigenvalues w, eigenvectors v
# Solve Ax = b
b = np.random.rand(4)
x = np.linalg.solve(A, b)
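
Results from np.linalg are easiest to trust when checked against their defining equations. The sketch below uses a small fixed, well-conditioned matrix (chosen for illustration) rather than random input, so the checks are deterministic.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

x = np.linalg.solve(A, b)            # preferred: no explicit inverse
x_inv = np.linalg.inv(A) @ b         # same answer, more work and rounding

print(np.allclose(A @ x, b))         # True
print(np.allclose(x, x_inv))         # True

# Eigenpairs satisfy A v = w v, column by column.
w, v = np.linalg.eig(A)
print(np.allclose(A @ v, v * w))     # True
```

For symmetric matrices like this one, np.linalg.eigh is usually the better choice since it guarantees real eigenvalues.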


## Random Number Generation & Simulation

In [None]:
rng = np.random.default_rng(seed=42)      # modern Generator API
u = rng.uniform(0,1, size=1000)
n = rng.normal(loc=0, scale=1, size=(1000,))
b = rng.binomial(n=10, p=0.3, size=500)
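
Seeding matters mainly for reproducibility. A minimal sketch showing that two generators with the same seed produce the same stream, plus a few other Generator methods that come up often in simulation and sampling:

```python
import numpy as np

rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

# Same seed and same call sequence give identical draws.
same = np.array_equal(rng1.normal(size=3), rng2.normal(size=3))
print(same)                                        # True

idx = rng1.integers(low=0, high=10, size=5)        # ints in [0, 10)
pick = rng1.choice(10, size=4, replace=False)      # sample without replacement
perm = rng1.permutation(5)                         # shuffled copy of 0..4
```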


## Fast Computations via Vectorization

In [None]:
# Replace Python loops

# Slow
out = []
for x in a:
    out.append(x**2 + 2*x + 1)
# Fast
out = a**2 + 2*a + 1

# Real‑world example: feature engineering for pairwise distances

# Given pts of shape (N,2), compute the N×N distance matrix
pts = np.random.default_rng(0).random((5, 2))   # example points; replace with your own data
diff = pts[:,None,:] - pts[None,:,:]   # broadcasts to (N,N,2)
dists = np.sqrt((diff**2).sum(axis=2))


## Handling NaNs & Infs

In [None]:
data = np.array([1, np.nan, np.inf, 4])
np.isnan(data)
np.isinf(data)
clean = np.nan_to_num(data, nan=0.0, posinf=1e6)
# Stats ignoring NaN (nanmean does not skip inf, so drop it first)
mean_no_nan = np.nanmean(data[~np.isinf(data)])   # 2.5
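
Because np.nanmean skips NaN but not infinity, one option is to keep only finite values with np.isfinite before aggregating. A minimal sketch on the same sample data:

```python
import numpy as np

data = np.array([1, np.nan, np.inf, 4])

finite = np.isfinite(data)          # False for NaN, +inf and -inf
print(finite)                       # [ True False False  True]

finite_mean = data[finite].mean()
print(finite_mean)                  # 2.5
```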


## Concatenation, Splitting & Repeating

In [None]:
a,b = np.random.rand(3,3), np.random.rand(3,3)
v = np.vstack([a,b])     # shape (6,3)
h = np.hstack([a,b])     # shape (3,6)
c = np.column_stack([a[:,0], b[:,1]])
# Splitting
parts = np.split(v, 2, axis=0)
# Tiling & repeating
tile = np.tile(a, (2,3))     # repeat blocks
rep = np.repeat(a, 2, axis=1)


## Efficient File I/O

In [None]:
# Binary
np.save('arr.npy', a)
b = np.load('arr.npy')
# Multiple arrays
np.savez('multi.npz', a=a, b=b)
# Text
np.savetxt('data.csv', a, delimiter=',', header='c1,c2,c3')   # a is the 3×3 array; header is written as '# c1,c2,c3'
m2 = np.loadtxt('data.csv', delimiter=',', skiprows=1)
m3 = np.genfromtxt('data.csv', delimiter=',', names=True)     # structured array with fields c1, c2, c3
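
Reading a .npz archive back works by key. The sketch below writes to a temporary directory so it leaves no files behind; the array contents and the `multi.npz` name are placeholders.

```python
import os
import tempfile
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 4)

with tempfile.TemporaryDirectory() as tmp:     # temp dir keeps the demo clean
    path = os.path.join(tmp, "multi.npz")
    np.savez(path, a=a, b=b)

    with np.load(path) as archive:             # arrays are read lazily by key
        names = sorted(archive.files)
        a2, b2 = archive["a"], archive["b"]

print(names)                                   # ['a', 'b']
print(np.array_equal(a, a2), np.array_equal(b, b2))  # True True
```

np.savez_compressed has the same interface and trades write speed for smaller files.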


## Integrating with Pandas & ML Pipelines

In [None]:
#Convert back‑and‑forth

import pandas as pd
df = pd.DataFrame(a, columns=['x','y','z'])
arr = df.to_numpy()   # preferred over df.values

# Preprocessing with NumPy

X = arr               # feature matrix, shape (n_samples, n_features)
# Standardization
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# Min‑Max scaling
X_mm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
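
Both formulas divide by a per-column spread, which is zero for a constant column. One common guard, sketched here on a made-up matrix whose third column is constant, is to substitute 1 for zero denominators:

```python
import numpy as np

X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 30.0, 5.0]])    # third column is constant

std = X.std(axis=0)
std_safe = np.where(std == 0, 1.0, std)   # avoid 0/0 on constant columns
X_std = (X - X.mean(axis=0)) / std_safe

print(X_std.mean(axis=0))   # approximately 0 for every column
print(X_std[:, 2])          # constant column maps to all zeros
```

In an ML pipeline the mean and standard deviation should be computed on the training split only and then reused for validation and test data.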

## Performance Benchmarking

In [None]:
# Timing
%timeit np.sum(a)
%timeit sum(a)          # pure Python
# np.vectorize is a convenience wrapper, essentially a Python loop
f = np.vectorize(lambda x: x**3 + 2)
v = f(a)                # still much slower than direct arithmetic: a**3 + 2
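
The %timeit magic only works inside IPython or Jupyter. In a plain script, the standard-library timeit module gives a comparable measurement; the array size and repeat count below are arbitrary choices for illustration.

```python
import timeit
import numpy as np

a = np.random.default_rng(0).random(100_000)

# Total wall-clock time for 20 calls of each version.
t_np = timeit.timeit(lambda: np.sum(a), number=20)
t_py = timeit.timeit(lambda: sum(a), number=20)
print(f"np.sum: {t_np:.4f}s  built-in sum: {t_py:.4f}s")
```

Absolute timings depend on the machine and NumPy build, so compare ratios rather than raw numbers.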
