# NumPy for AI/ML & Data Science

This notebook covers **essential NumPy concepts** needed for Artificial Intelligence (AI), Machine Learning (ML), and Data Science:
- Installation & Import
- Array creation & data types
- Indexing, slicing, and boolean masking
- Shape, dimension, and reshaping
- Broadcasting rules
- Vectorized arithmetic operations
- Aggregate & statistical functions
- Random number generation & reproducibility
- Stacking, splitting, concatenation
- Linear algebra (`dot`, `matmul`, inversion, eigenvalues)
- Advanced (fancy) indexing with integer arrays
- Performance tips & comparison with Python lists

At the end, there are **practice problems** and a **capstone exercise** related to real AI/ML preprocessing tasks.

## 1) Installation & Import
Install NumPy with pip if it is not already available, then import it under the conventional alias `np`.

In [74]:

# Installation (uncomment if running locally and NumPy is not installed)
# !pip install numpy

import numpy as np
print(np.__version__)


2.3.2


## 2) Creating Arrays
Create arrays from Python lists or tuples, or with NumPy's built-in constructors.

In [75]:

# 1D array
a = np.array([1, 2, 3])
# 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])
# Specifying dtype
c = np.array([1, 2, 3], dtype=float)

print(a, a.dtype)
print(b, b.ndim)
print(c)

[1 2 3] int64
[[1 2 3]
 [4 5 6]] 2
[1. 2. 3.]
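
As a small follow-up to the `dtype` example above, the sketch below shows two related behaviours: NumPy picks a common dtype when the input mixes integers and floats, and `astype` converts an existing array. The variable names (`from_tuple`, `mixed`, `as_float32`) are illustrative only.

```python
import numpy as np

# Arrays can be built from tuples as well as lists
from_tuple = np.array((1, 2, 3))

# Mixing ints and floats upcasts every element to a float dtype
mixed = np.array([1, 2.5, 3])

# astype returns a converted copy; from_tuple itself keeps its integer dtype
as_float32 = from_tuple.astype(np.float32)

print(from_tuple.dtype, mixed.dtype, as_float32.dtype)
```

Smaller dtypes such as `float32` are common in ML workloads because they halve memory use relative to `float64`, at the cost of precision.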


## 3) Array Attributes

In [76]:

arr = np.arange(12).reshape(3,4)
print("Array contents:\n", arr)

print("\n\nShape:", arr.shape)
print("Dimensions:", arr.ndim)
print("Data type:", arr.dtype)
print("Size (elements):", arr.size)
print("Memory per element (bytes):", arr.itemsize)


Array contents:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


Shape: (3, 4)
Dimensions: 2
Data type: int64
Size (elements): 12
Memory per element (bytes): 8


## 4) Indexing & Slicing
Access individual elements, rows, columns, or subarrays.

In [77]:

arr = np.arange(1, 13).reshape(3, 4)
print("Array contents:\n", arr)

print("\n\nElement [0,2]:", arr[0, 2])
print("First row:", arr[0, :])
print("Last column:", arr[:, -1])
print("Subarray:", arr[0:2, 1:3])

Array contents:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


Element [0,2]: 3
First row: [1 2 3 4]
Last column: [ 4  8 12]
Subarray: [[2 3]
 [6 7]]
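
One detail worth knowing about the subarray above: basic slicing returns a view onto the same memory, not a copy. The sketch below rebuilds the same array and shows the difference between writing through a slice and writing to an explicit `.copy()`.

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)

# A basic slice is a view: writing through it also changes arr
view = arr[0:2, 1:3]
view[0, 0] = 99
print(arr[0, 1])   # 99

# .copy() allocates independent data, so arr is left alone
sub = arr[0:2, 1:3].copy()
sub[0, 0] = -1
print(arr[0, 1])   # still 99

print(np.shares_memory(arr, view), np.shares_memory(arr, sub))
```

When slicing out a training subset that will be modified in place, taking a copy avoids silently altering the original dataset.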


## 5) Boolean Masking & Filtering
Select elements based on conditions (essential for preprocessing).

In [78]:

arr = np.array([10, 20, 30, 40, 50])
mask = arr > 25
print("Mask:", mask)
print("Filtered:", arr[mask])

Mask: [False False  True  True  True]
Filtered: [30 40 50]
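
Masks can be combined, and arrays can also be indexed with lists of integer positions (fancy indexing). The following is a minimal sketch on the same `arr`; the names `between`, `picked`, `clipped`, and `grid` are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) or | (or); each condition needs parentheses
between = arr[(arr > 15) & (arr < 45)]
print(between)   # [20 30 40]

# Fancy indexing: select by position, in any order, repeats allowed
picked = arr[[4, 0, 0]]
print(picked)    # [50 10 10]

# np.where chooses element by element between two alternatives
clipped = np.where(arr > 25, 25, arr)
print(clipped)   # [10 20 25 25 25]

# On a 2D array, a list of row indices selects whole rows
grid = np.arange(12).reshape(3, 4)
rows = grid[[2, 0]]
print(rows)
```

Unlike basic slices, boolean and fancy indexing return copies, so modifying `between` or `picked` does not change `arr`.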


## 6) Special Array Creation

In [79]:

print("Array of zeros:\n", np.zeros((2,3)))

print("\nArray of ones:\n", np.ones((2,3)))

print("\nArray filled with 7:\n", np.full((2,3), 7))

print("\nIdentity matrix:\n", np.eye(4))  # identity matrix

print("\nArray with arange:\n", np.arange(0, 10, 3))

print("\nArray with linspace:\n", np.linspace(0, 1, 5))


Array of zeros:
 [[0. 0. 0.]
 [0. 0. 0.]]

Array of ones:
 [[1. 1. 1.]
 [1. 1. 1.]]

Array filled with 7:
 [[7 7 7]
 [7 7 7]]

Identity matrix:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Array with arange:
 [0 3 6 9]

Array with linspace:
 [0.   0.25 0.5  0.75 1.  ]


## 7) Random Numbers & Reproducibility
For AI/ML experiments, fix the random seed so that results can be reproduced from run to run.

In [80]:
np.random.seed(42)

print("Random array (uniform [0,1)):\n", np.random.rand(2,3))

print("\nRandom array (standard normal):\n", np.random.randn(2,3))

print("\nRandom array (integers):\n", np.random.randint(1, 10, (2,3)))

Random array (uniform [0,1)):
 [[0.37454012 0.95071431 0.73199394]
 [0.59865848 0.15601864 0.15599452]]

Random array (standard normal):
 [[ 1.57921282  0.76743473 -0.46947439]
 [ 0.54256004 -0.46341769 -0.46572975]]

Random array (integers):
 [[6 9 1]
 [3 7 4]]
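
The cell above uses the legacy global seed. NumPy also provides the `Generator` interface through `np.random.default_rng`, which keeps the random state in an object instead of in global state. The sketch below mirrors the three draws above; note that a `Generator` seeded with 42 produces different numbers from `np.random.seed(42)`, so the values will not match the output shown earlier.

```python
import numpy as np

rng = np.random.default_rng(42)

uniform = rng.random((2, 3))              # uniform on [0, 1)
normal = rng.standard_normal((2, 3))      # mean 0, standard deviation 1
ints = rng.integers(1, 10, size=(2, 3))   # integers in [1, 10), upper bound excluded

# A second generator with the same seed replays the same stream
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random((2, 3))))   # True
```

Passing an `rng` object into functions, rather than reseeding globally, makes it easier to keep separate experiments from interfering with each other.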


## 8) Reshaping & Transpose

In [81]:

arr = np.arange(12)
reshaped = arr.reshape(3,4)
print(reshaped)
print("\nTranspose:\n", reshaped.T)


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Transpose:
 [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
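
Two reshaping idioms come up constantly in ML preprocessing: letting NumPy infer one dimension with `-1`, and adding a length-1 axis. The sketch below assumes a hypothetical batch of eight 28x28 images purely for illustration.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements
two_rows = arr.reshape(2, -1)
print(two_rows.shape)    # (2, 6)

# Flatten each sample of a (hypothetical) image batch into one feature vector
batch = np.zeros((8, 28, 28))
flat = batch.reshape(len(batch), -1)
print(flat.shape)        # (8, 784)

# np.newaxis turns a 1D array into a column
column = arr[:, np.newaxis]
print(column.shape)      # (12, 1)

# Reshaping a contiguous array gives a view on the same data
print(np.shares_memory(arr, two_rows))
```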


## 9) Broadcasting
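Broadcasting lets NumPy combine arrays of different shapes in element-wise operations: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.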

In [82]:

A = np.ones((3,3))
b = np.array([1, 2, 3])

print(A + b)  # b (shape (3,)) is broadcast across every row of A


[[2. 3. 4.]
 [2. 3. 4.]
 [2. 3. 4.]]
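
To make the shape rules concrete, here is a short sketch with a row vector, a column vector, and one deliberately incompatible shape. The names `row` and `col` are illustrative.

```python
import numpy as np

A = np.ones((3, 3))
row = np.array([1, 2, 3])             # shape (3,), treated as (1, 3)
col = np.array([[10], [20], [30]])    # shape (3, 1)

print((A + row).shape)    # (3, 3): row is repeated down the rows
print((A + col).shape)    # (3, 3): col is repeated across the columns
print(row + col)          # (3,) with (3, 1) gives a full (3, 3) grid

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
try:
    A + np.array([1, 2])
except ValueError as err:
    print("Broadcast error:", err)

# broadcast_shapes reports the result shape without doing any arithmetic
print(np.broadcast_shapes((3, 1), (3,)))   # (3, 3)
```

This is the mechanism behind column-wise normalization: a `(n, d)` data matrix minus a `(d,)` vector of column means subtracts each mean from its own column.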


## 10) Vectorized Arithmetic
Fast element-wise operations (no explicit loops).

In [83]:

arr = np.array([1,2,3,4])

print(arr + 10)

print(arr ** 2)

print(np.sqrt(arr))

[11 12 13 14]
[ 1  4  9 16]
[1.         1.41421356 1.73205081 2.        ]
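
As an illustration of replacing a loop with vectorized arithmetic, the sketch below computes a mean squared error both ways. The sample values in `y_true` and `y_pred` are made up for the example.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Explicit Python loop
total = 0.0
for t, p in zip(y_true, y_pred):
    total += (t - p) ** 2
mse_loop = total / len(y_true)

# Vectorized: subtract, square, and average whole arrays at once
mse_vec = np.mean((y_true - y_pred) ** 2)

print(mse_loop, mse_vec)   # 0.375 0.375
```

Both versions give the same number; the vectorized one moves the loop into compiled code, which matters once the arrays are large.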


## 11) Aggregate & Statistical Functions

In [84]:

arr = np.array([[1,2,3],[4,5,6]])
print("Sum:", arr.sum())
print("Mean:", arr.mean())
print("Std Dev:", arr.std())
print("Column-wise sum:", arr.sum(axis=0))


Sum: 21
Mean: 3.5
Std Dev: 1.707825127659933
Column-wise sum: [5 7 9]
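
The `axis` argument decides which dimension is collapsed: `axis=0` reduces down the rows (one result per column), `axis=1` reduces across the columns (one result per row). The sketch below extends the cell above with row-wise reductions, `keepdims`, and the sample standard deviation.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

row_sums = arr.sum(axis=1)
print(row_sums)        # [ 6 15]

# keepdims=True keeps the reduced axis with length 1, so the result broadcasts back
row_means = arr.mean(axis=1, keepdims=True)
centered = arr - row_means
print(centered)

# Index of the largest value in each column
print(arr.argmax(axis=0))   # [1 1 1]

# std() divides by N by default; ddof=1 divides by N - 1 (sample estimate)
sample_std = arr.std(ddof=1)
print(sample_std)
```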


## 12) Stacking & Splitting

In [85]:

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
print("Vertical stack:\n", np.vstack((a,b)))
print("\nHorizontal stack:\n", np.hstack((a,b)))


Vertical stack:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Horizontal stack:
 [[1 2 5 6]
 [3 4 7 8]]
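
Concatenation and splitting are the general forms of the stacking shown above. The sketch below uses the same `a` and `b`; the `train`/`test` names are only an illustration of splitting at a row index.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# concatenate joins along an existing axis (axis=0 matches vstack, axis=1 matches hstack)
rows = np.concatenate((a, b), axis=0)    # shape (4, 2)
cols = np.concatenate((a, b), axis=1)    # shape (2, 4)

# split into equal parts along an axis
top, bottom = np.split(rows, 2, axis=0)
left, right = np.hsplit(cols, 2)

# split at explicit indices: the first 3 rows, then the rest
train, test = np.split(rows, [3])
print(train.shape, test.shape)           # (3, 2) (1, 2)
```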


## 13) Linear Algebra — Essential for AI/ML

In [86]:

from numpy.linalg import inv, eig, svd

A = np.array([[1,2],[3,4]])
b = np.array([5,6])

print("Matrix-Vector product:\n", np.dot(A,b))

print("\nInverse:\n", inv(A))

print("\nEigenvalues & Eigenvectors:\n", eig(A))

U, S, Vt = svd(A)
print("\nSVD \n U:\n", U, "\nS:\n", S, "\nVt:\n", Vt)

Matrix-Vector product:
 [17 39]

Inverse:
 [[-2.   1. ]
 [ 1.5 -0.5]]

Eigenvalues & Eigenvectors:
 EigResult(eigenvalues=array([-0.37228132,  5.37228132]), eigenvectors=array([[-0.82456484, -0.41597356],
       [ 0.56576746, -0.90937671]]))

SVD 
 U:
 [[-0.40455358 -0.9145143 ]
 [-0.9145143   0.40455358]] 
S:
 [5.4649857  0.36596619] 
Vt:
 [[-0.57604844 -0.81741556]
 [ 0.81741556 -0.57604844]]
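
A few companion idioms for the cell above, sketched on the same `A` and `b`: the `@` operator (equivalent to `np.matmul`), solving a linear system with `np.linalg.solve` rather than forming the inverse, and checking the eigen and SVD results numerically.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

# @ is the matrix-multiplication operator
print(A @ b)                      # [17 39]

# Solve A x = b directly; usually preferred over inv(A) @ b for accuracy and speed
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))      # True

# eig returns eigenvalues and a matrix whose columns are the eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True for each pair

# The SVD factors multiply back to the original matrix
U, S, Vt = np.linalg.svd(A)
reconstructed = U @ np.diag(S) @ Vt
print(np.allclose(reconstructed, A))     # True
```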


## 14) Performance vs Python Lists

In [87]:

import time

size = 10_000_000
lst = list(range(size))
arr = np.arange(size)

start = time.perf_counter()  # perf_counter is the clock intended for measuring short intervals
_ = [x*2 for x in lst]
print("List time:", time.perf_counter()-start)

start = time.perf_counter()
_ = arr * 2
print("NumPy time:", time.perf_counter()-start)


List time: 0.3508589267730713
NumPy time: 0.13302946090698242
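
A single timed run can be noisy, since it includes whatever else the machine was doing at that moment. One way to get steadier numbers is the standard-library `timeit` module, taking the best of several repeats. This is a sketch with a smaller array than above so that it finishes quickly; the actual timings depend on the machine.

```python
import timeit
import numpy as np

size = 1_000_000   # smaller than the cell above, chosen only to keep the run short
lst = list(range(size))
arr = np.arange(size)

# Run each statement once per repeat and keep the fastest of 5 repeats
list_time = min(timeit.repeat(lambda: [x * 2 for x in lst], number=1, repeat=5))
numpy_time = min(timeit.repeat(lambda: arr * 2, number=1, repeat=5))

print(f"List:  {list_time:.4f} s")
print(f"NumPy: {numpy_time:.4f} s")
```

In a notebook, the `%timeit` magic does the same kind of repeated measurement with less code.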



## Practice Exercises
1. Create a 5x5 NumPy array with random integers between 1 and 100, then:
   - Replace all even numbers with 0.
   - Extract the middle row.

2. Simulate 1000 coin flips using `np.random.choice(['H','T'], size=1000)`, then count the heads and tails.

3. Create two random matrices (5x5) and compute:
   - Their dot product.
   - Element-wise multiplication.

4. Given a NumPy array of shape (10,3) representing (x,y,z) coordinates, normalize each column to mean=0, std=1.


In [88]:
arr = np.random.randint(1, 101, (5, 5))
print("Original array:\n", arr)

arr[arr % 2 == 0] = 0
print("\nArray after replacing evens with 0:\n", arr)

middle_row = arr[2, :]
print("\nMiddle row:\n", middle_row)

Original array:
 [[64  3 51  7 21]
 [73 39 18  4 89]
 [60 14  9 90 53]
 [ 2 84 92 60 71]
 [44  8 47 35 78]]

Array after replacing evens with 0:
 [[ 0  3 51  7 21]
 [73 39  0  0 89]
 [ 0  0  9  0 53]
 [ 0  0  0  0 71]
 [ 0  0 47 35  0]]

Middle row:
 [ 0  0  9  0 53]


In [89]:
flips = np.random.choice(['H','T'], size=1000)
heads = np.sum(flips == 'H')
tails = np.sum(flips == 'T')
print(f"Heads: {heads}, Tails: {tails}")

Heads: 505, Tails: 495
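
An alternative way to count the outcomes, sketched here with `np.unique` and a seeded `Generator` (so the counts will differ from the output above): `return_counts=True` tallies every distinct value in one call, which scales to more than two categories.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for the example
flips = rng.choice(['H', 'T'], size=1000)

# unique values and how often each occurs
values, counts = np.unique(flips, return_counts=True)
tally = dict(zip(values.tolist(), counts.tolist()))
print(tally)
```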


In [90]:
A = np.random.rand(5, 5)
B = np.random.rand(5, 5)

dot_product = np.dot(A, B)
print("Dot product:\n", dot_product)

elementwise_product = A * B
print("\nElement-wise product:\n", elementwise_product)

Dot product:
 [[1.07697369 0.67720426 0.74614338 0.97918341 0.71556699]
 [1.75465107 1.72178359 1.8451786  1.61944845 1.99873808]
 [1.27875211 1.08647963 1.15494109 1.14409599 1.41186579]
 [2.07359648 1.69953434 1.63909681 2.10426184 1.89736553]
 [1.68937248 1.36117402 1.20861682 1.88170741 1.34404685]]

Element-wise product:
 [[0.54988192 0.0256733  0.12114723 0.02019857 0.14843303]
 [0.249345   0.57999956 0.24513588 0.37500418 0.8151224 ]
 [0.06797865 0.05244683 0.02705612 0.29801527 0.14609977]
 [0.16409829 0.37060577 0.27497364 0.48523002 0.28401486]
 [0.57334137 0.24341786 0.4441661  0.09990413 0.06338309]]


In [91]:
coords = np.random.rand(10, 3)

coords_mean = coords.mean(axis=0)
print("Mean:\n", coords_mean)

coords_std = coords.std(axis=0)
print("\nStd:\n", coords_std)

normalized_coords = (coords - coords_mean) / coords_std
print("\nNormalized coordinates:\n", normalized_coords)

Mean:
 [0.54171391 0.52720293 0.38777387]

Std:
 [0.3281324  0.24104022 0.26754926]

Normalized coordinates:
 [[-0.39634212  1.44006954  0.47640642]
 [ 1.31470237  0.31004148 -0.61269018]
 [ 0.85354793 -0.75555982 -0.15008322]
 [-1.55397402  0.08924809  0.54812199]
 [-0.56599842  1.52262696 -0.96814124]
 [-0.64490551 -0.85305335 -1.10440703]
 [-0.18458559  0.6662033   0.46302929]
 [-1.172503   -0.62195831 -1.4396558 ]
 [ 0.99529093 -1.83656457  0.78304912]
 [ 1.35476743  0.03894668  2.00437066]]
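
The solution above divides by the column standard deviation, which fails if a column is constant (standard deviation 0). Below is a minimal sketch of a reusable helper that guards against that case. The function name `standardize` and the `eps` threshold are hypothetical choices for this example, not part of NumPy.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Shift each column of X to mean 0 and scale it to std 1.

    Columns whose std is below eps are treated as constant and mapped to 0.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Replace near-zero stds with 1 so constant columns do not divide by zero
    safe_std = np.where(std < eps, 1.0, std)
    return (X - mean) / safe_std

rng = np.random.default_rng(0)
coords = rng.random((10, 3))
coords[:, 2] = 5.0               # make the z column constant to exercise the guard

Z = standardize(coords)
print(Z.mean(axis=0).round(6))
print(Z.std(axis=0).round(6))
```

In a real pipeline the mean and standard deviation would normally be computed on the training data only and then reused for validation and test data.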
