<a href="https://colab.research.google.com/github/aadarshsenapati/machine-learning/blob/main/Lab1_AP23110010458.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧑‍💻 NumPy Complete Guided Project
**Instructor / Student Colab Notebook** – covers *all* key concepts from `Numpy‑1` to `Numpy‑5`.

*Generated: 08 Aug 2025*


**Table of Contents**

1. [Setup](#setup)  
2. [Array Creation & Dtypes](#creation)  
3. [Array Attributes & Inspection](#attributes)  
4. [Indexing, Slicing & Fancy Indexing](#indexing)  
5. [Reshaping, Transpose & Copies vs Views](#reshape)  
6. [Joining, Splitting, Set & Sorting Ops](#join)  
7. [Arithmetic Ops & Universal Functions](#arithmetic)  
8. [Broadcasting Rules](#broadcast)  
9. [Statistics & Aggregations](#stats)  
10. [Random Numbers & Reproducibility](#random)  
11. [Structured / Record Arrays](#structured)  
12. [Linear Algebra Essentials](#linalg)  
13. [File I/O (`npy`, `npz`, `txt`)](#io)  
14. [Datetime64 & Timedelta64](#datetime)  
15. [Masked Arrays & NaNs](#mask)  
16. [Mini‑Project: Fitness Data Analysis](#project)  
17. [Conclusion & Further Practice](#conclusion)  


## <a name='setup'></a>1️⃣ Setup

In [None]:
import numpy as np, math, os, pathlib, types, textwrap, random
print('NumPy version:', np.__version__)

NumPy version: 2.0.2


## <a name='creation'></a>2️⃣ Array Creation & Dtypes

Key functions: `np.array`, `np.arange`, `np.linspace`, `np.zeros`, `np.ones`, `np.full`, `np.eye`, `np.identity`, `np.diag`, `np.empty`

In [None]:
# EXAMPLE
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.linspace(0, 1, 6)
arr3 = np.full((2,3), 7.5)
print(arr1, arr2, arr3, sep="\n")
print("dtypes:", arr1.dtype, arr2.dtype)


[1 2 3]
[0.  0.2 0.4 0.6 0.8 1. ]
[[7.5 7.5 7.5]
 [7.5 7.5 7.5]]
dtypes: int32 float64
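The cell above exercises only `np.array`, `np.linspace` and `np.full`. A minimal sketch of the remaining constructors from the list (the shapes and values are arbitrary choices for illustration):

```python
import numpy as np

ar = np.arange(0, 10, 3)          # start, stop (exclusive), step -> [0 3 6 9]
I3 = np.eye(3, k=1)               # 3x3 float array with ones on the first super-diagonal
Id = np.identity(3, dtype=int)    # square identity matrix
D = np.diag([1, 2, 3])            # 1-D input -> 2-D diagonal matrix
dg = np.diag(D)                   # 2-D input -> its diagonal as a 1-D array
E = np.empty((2, 2))              # memory is allocated but NOT initialised: values are arbitrary
z = np.zeros(3, dtype=np.int8)    # dtype can be set on any constructor
o = np.ones((2, 2), dtype=bool)

print(ar, Id, D, dg, sep="\n")
print("empty shape:", E.shape, "| zeros dtype:", z.dtype, "| all ones True:", o.all())
```

Only use `np.empty` when every element will be overwritten before it is read.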


In [None]:
# 🖊️ TODO: create a 10×10 chessboard pattern using zeros & ones
b = np.zeros((10, 10), dtype=int)
b[1::2, ::2] = 1
b[::2, 1::2] = 1
print(b)

[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]]


## <a name='attributes'></a>3️⃣ Array Attributes & Inspection

`shape`, `ndim`, `size`, `dtype`, `itemsize`, `nbytes`

In [None]:
M = np.arange(12).reshape(3,4)
print('shape', M.shape, 'ndim', M.ndim, 'size', M.size, 'itemsize', M.itemsize, 'total bytes', M.nbytes)


shape (3, 4) ndim 2 size 12 itemsize 8 total bytes 96


In [None]:
# 🖊️ TODO: check memory footprint of a 1000×1000 float64 array
f = np.ones((1000, 1000), dtype=float)
f.nbytes

8000000
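`nbytes` is just `size × itemsize`, so the 8,000,000 bytes above are 10⁶ elements × 8 bytes. A short sketch of that arithmetic, plus the saving from a narrower dtype (float32 is an illustrative choice, not part of the original task):

```python
import numpy as np

f64 = np.ones((1000, 1000), dtype=np.float64)
f32 = f64.astype(np.float32)      # same values, 4 bytes per element instead of 8

# nbytes is size (element count) times itemsize (bytes per element)
assert f64.nbytes == f64.size * f64.itemsize

print("float64:", f64.nbytes / 1e6, "MB")
print("float32:", f32.nbytes / 1e6, "MB")
```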

## <a name='indexing'></a>4️⃣ Indexing, Slicing & Fancy Indexing

In [None]:
a = np.arange(1,26).reshape(5,5)
print(a[:, 0])     # first column
print(a[::2, ::2]) # every 2nd row/col
mask = (a % 3 == 0)
print('multiples of 3:', a[mask])


[ 1  6 11 16 21]
[[ 1  3  5]
 [11 13 15]
 [21 23 25]]
multiples of 3: [ 3  6  9 12 15 18 21 24]
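Besides slices and boolean masks, NumPy accepts integer arrays as indices ("fancy" indexing). A small sketch on a fresh 5×5 grid (named `grid` so the `a` used in the next cell is untouched):

```python
import numpy as np

grid = np.arange(1, 26).reshape(5, 5)

rows = grid[[0, 2, 4]]                # whole rows 0, 2 and 4 -> shape (3, 5)
pts = grid[[0, 1, 2], [4, 3, 2]]      # paired indices -> elements (0,4), (1,3), (2,2)
corners = grid[np.ix_([0, 4], [0, 4])]  # np.ix_ builds an open mesh -> the 4 corners as 2x2

print(rows, pts, corners, sep="\n")
```

Note the difference between `grid[[0, 1, 2], [4, 3, 2]]` (picks individual elements pairwise) and `grid[np.ix_(...)]` (picks every row/column combination).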


In [None]:
# 🖊️ TODO: use fancy indexing to swap first and last rows of `a`
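# `a[0], a[-1] = a[-1], a[0]` does not swap NumPy rows: both right-hand items are views,
# so row 0 is overwritten before it is copied into the last row. Fancy indexing copies first.
a[[0, -1]] = a[[-1, 0]]
print(a)

[[21 22 23 24 25]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [ 1  2  3  4  5]]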


## <a name='reshape'></a>5️⃣ Reshaping, Transpose & Copies vs Views

In [None]:
b = np.arange(8)
B = b.reshape(2,4)
B[0,0] = 99
print('b is modified:', b)
C = b.reshape(2,4).copy()
C[0,0] = -1
print('b unchanged with copy:', b)


b is modified: [99  1  2  3  4  5  6  7]
b unchanged with copy: [99  1  2  3  4  5  6  7]
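Whether an indexing operation returns a view or a copy depends on the kind of index. A sketch using `np.shares_memory` to check it directly (the array `base` is illustrative):

```python
import numpy as np

base = np.arange(10)
sl = base[2:6]            # basic slice -> view onto base's memory
fi = base[[2, 3, 4, 5]]   # integer-array (fancy) index -> new copy
bm = base[base > 5]       # boolean mask -> new copy

print("slice shares memory:", np.shares_memory(base, sl))
print("fancy index shares memory:", np.shares_memory(base, fi))
print("mask shares memory:", np.shares_memory(base, bm))
print("slice.base is base:", sl.base is base)
```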


In [None]:
# 🖊️ TODO: Flatten a 3‑D array into 1‑D using both `ravel` and `flatten`; observe copy vs view.
a = np.arange(1, 9).reshape(2, 2, 2)   # a genuine 3-D array
print(a)
f = a.flatten()   # flatten() always returns a new copy
r = a.ravel()     # ravel() returns a view whenever the memory layout allows it
f[0] = 100        # changes only the copy
r[1] = 200        # writes through to `a`
print("flatten (copy):", f)
print("ravel (view):", r)
print("a after edits:", a.ravel())
print("shares memory:", np.shares_memory(a, f), np.shares_memory(a, r))

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
flatten (copy): [100   2   3   4   5   6   7   8]
ravel (view): [  1 200   3   4   5   6   7   8]
a after edits: [  1 200   3   4   5   6   7   8]
shares memory: False True


## <a name='join'></a>6️⃣ Joining, Splitting, Set & Sorting Ops

In [None]:
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print('union', np.union1d(x,y))
print('intersect', np.intersect1d(xy,[1,2,10]))
print('sorted descending', np.sort(xy)[::-1])


union [1 2 3 4 5 6]
intersect [1 2]
sorted descending [6 5 4 3 2 1]
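A sketch of the stacking functions and two more set/sort helpers, on small illustrative arrays `p` and `q`:

```python
import numpy as np

p = np.array([1, 3, 5])
q = np.array([2, 4, 6])

vs = np.vstack([p, q])           # shape (2, 3): inputs become rows
hs = np.hstack([p, q])           # shape (6,): same as concatenate for 1-D input
st = np.stack([p, q], axis=1)    # shape (3, 2): new trailing axis, inputs become columns

diff = np.setdiff1d(hs, [2, 3])  # sorted unique values of hs that are not in [2, 3]
order = np.argsort(-hs)          # indices that sort hs in descending order

print(vs.shape, hs.shape, st.shape)
print(diff, hs[order])
```

`argsort` on the negated array is an alternative to `np.sort(...)[::-1]` when you also need the positions, not just the sorted values.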


In [None]:
# 🖊️ TODO: split `xy` back into two equal halves using `np.array_split`
np.array_split(xy, 2)   # xy has 6 elements, so each half gets 3

[array([1, 3, 5]), array([2, 4, 6])]
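`np.array_split` differs from `np.split` only when the length does not divide evenly. A sketch with 7 elements split into 3 parts:

```python
import numpy as np

seq = np.arange(7)
parts = np.array_split(seq, 3)     # uneven split allowed: sizes 3, 2, 2
print([part.tolist() for part in parts])

try:
    np.split(seq, 3)               # np.split requires an exact division
except ValueError as err:
    print("np.split failed:", err)
```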

## <a name='arithmetic'></a>7️⃣ Arithmetic Ops & Universal Functions

In [None]:
v = np.arange(5)
print('exp', np.exp(v))
print('sin', np.sin(v))
print('vectorised addition', v + 10)
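Binary ufuncs such as `np.add` also expose methods (`reduce`, `accumulate`, `outer`), and every ufunc accepts an `out=` array. A brief sketch on an illustrative array `v5`:

```python
import numpy as np

v5 = np.arange(1, 6)                      # [1 2 3 4 5]
total = np.add.reduce(v5)                 # same as v5.sum()
running = np.add.accumulate(v5)           # cumulative sums, same as np.cumsum(v5)
table = np.multiply.outer(v5, v5)         # 5x5 table of every pairwise product
out = np.empty_like(v5, dtype=float)
np.sqrt(v5, out=out)                      # result written into the pre-allocated array

print(total, running, table[4, 4], out.round(3))
```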


In [None]:
# 🖊️ TODO: given degrees [0,30,45,60,90], compute radians and sin values.
O = np.array([0,30,45,60,90])
s = np.sin(np.deg2rad(O))
for i in range(len(O)):
  print(f"The value of sin({O[i]}) is {s[i]}")

The value of sin(0) is 0.0
The value of sin(30) is 0.49999999999999994
The value of sin(45) is 0.7071067811865475
The value of sin(60) is 0.8660254037844386
The value of sin(90) is 1.0


## <a name='broadcast'></a>8️⃣ Broadcasting Rules

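Rules: compare the two shapes starting from the trailing (rightmost) dimension and moving left. Two dimensions are compatible when they are equal or one of them is 1; a size-1 dimension is stretched to match the other, and a missing leading dimension is treated as 1. Any other mismatch raises a `ValueError`.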
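The rules can be checked without allocating any data via `np.broadcast_shapes` (available since NumPy 1.20). A sketch with a compatible pair, a mutual-stretch pair and an incompatible pair:

```python
import numpy as np

# (4, 3) vs (3,): right-aligned 3 == 3; the missing leading dim counts as 1 -> (4, 3)
print(np.broadcast_shapes((4, 3), (3,)))
# (3, 1) vs (1, 5): each size-1 dim stretches to the other's size -> (3, 5)
print(np.broadcast_shapes((3, 1), (1, 5)))

try:
    np.ones((4, 3)) + np.ones(4)   # trailing dims 3 vs 4: unequal and neither is 1
except ValueError as err:
    print("incompatible:", err)
```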

In [None]:
row = np.arange(5)
col = np.arange(3).reshape(3,1)
matrix = row + col  # broadcast to 3×5
print(matrix)

[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]]


In [None]:
# 🖊️ TODO: use broadcasting to create a 10×10 multiplication table.
r = np.arange(1,11).reshape(1,10)
c = np.arange(1,11).reshape(10,1)
print(r*c)

[[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]


## <a name='stats'></a>9️⃣ Statistics & Aggregations

In [None]:
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
print('data\n', data)
print('row sums', data.sum(axis=1))
print('col means', data.mean(axis=0))

data
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
row sums [227  46 255 281 266]
col means [49.4 53.  55.4 57.2]
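Two related aggregation tools: `argmax` along an axis, and `keepdims=True`, which keeps the reduced axis as size 1 so the result broadcasts back against the original. A sketch that regenerates the same matrix under the new name `scores` so `data` is left alone:

```python
import numpy as np

# Same generator call as the stats cell above, stored under a different name
scores = np.random.default_rng(0).integers(1, 100, size=(5, 4))

col_argmax = scores.argmax(axis=0)                       # row index of each column's maximum
row_share = scores / scores.sum(axis=1, keepdims=True)   # (5, 4) / (5, 1) -> divides each row by its sum
print("rows holding column maxima:", col_argmax)
print("row shares sum to 1:", np.allclose(row_share.sum(axis=1), 1.0))
print("column std:", scores.std(axis=0).round(2))
```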


In [None]:
# 🖊️ TODO: compute `np.percentile` (25th, 50th, 75th) of flattened `data`.
flat = data.ravel()                         # the 20 values from the stats example above
print(np.percentile(flat, [25, 50, 75]))    # default method: linear interpolation

[30.  58.5 75. ]


## <a name='random'></a>🔟 Random Numbers & Reproducibility

In [None]:
rng = np.random.default_rng(42)
rand_floats = rng.random(5)
rand_ints = rng.integers(low=10, high=50, size=5)
print(rand_floats, rand_ints)
rng2 = np.random.default_rng(42)
assert np.allclose(rand_floats, rng2.random(5))


[0.77395605 0.43887844 0.85859792 0.69736803 0.09417735] [31 49 39 40 38]


In [None]:
# 🖊️ TODO: simulate rolling a fair six‑sided die 100 times; estimate proportion of 6s.
r = np.random.randint(1, 7, size=100)   # legacy global RNG, unseeded: the result differs on every run
p = np.mean(r == 6)
print(p)

0.17
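The cell above uses the legacy global RNG without a seed, so its 0.17 will not reproduce. A sketch of the same simulation with a seeded `Generator`, matching the `default_rng` style introduced at the top of this section (`roll_die` and the seeds are illustrative choices):

```python
import numpy as np

def roll_die(n, seed):
    """Roll a fair six-sided die n times with a seeded Generator."""
    rng = np.random.default_rng(seed)
    return rng.integers(1, 7, size=n)       # `high` is exclusive -> faces 1..6

rolls_a = roll_die(100, seed=2025)
rolls_b = roll_die(100, seed=2025)
assert np.array_equal(rolls_a, rolls_b)     # same seed -> identical sequence

print("proportion of 6s (100 rolls):", np.mean(rolls_a == 6))
big = roll_die(1_000_000, seed=1)
print("proportion of 6s (1e6 rolls):", np.mean(big == 6))   # approaches 1/6 as n grows
```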


## <a name='structured'></a>1️⃣1️⃣ Structured / Record Arrays

In [None]:
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
print(people['name'], people['age'].mean())

['Alice' 'Bob'] 27.5
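Structured arrays can also be sorted by a named field and filtered with a mask built from one field, which returns whole records. A sketch with the same dtype as `people`; the third record ('Cara') is hypothetical sample data:

```python
import numpy as np

staff = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Cara', 22, 61.2)],
                 dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

by_age = np.sort(staff, order='age')    # sort whole records by the 'age' field
heavy = staff[staff['weight'] > 60]     # mask on one field selects complete records
print(by_age['name'], heavy['name'])
```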


In [None]:
# 🖊️ TODO: add a new field 'height' to a structured array using `np.lib.recfunctions.append_fields` (ships with NumPy; no install needed).
from numpy.lib.recfunctions import append_fields

# use a new name so the `data` array from the stats section is not overwritten
recs = np.array([(1, 'Alice'), (2, 'Bob')],
                dtype=[('id', 'i4'), ('name', 'U10')])

heights = [165, 180]

recs = append_fields(recs, 'height', heights, usemask=False)
print(recs)

[(1, 'Alice', 165) (2, 'Bob', 180)]


## <a name='linalg'></a>1️⃣2️⃣ Linear Algebra Essentials

In [None]:
A = np.random.random((3,3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))


A·x ≈ b? True


In [None]:
# 🖊️ TODO: compute eigenvalues of `A` using `np.linalg.eig`.
w, v = np.linalg.eig(A)
print(w)
print(v)

[ 2.01791164+0.j         -0.24357717+0.15909881j -0.24357717-0.15909881j]
[[ 0.73993162+0.j         -0.1945278 -0.40881079j -0.1945278 +0.40881079j]
 [ 0.48760637+0.j          0.69835874+0.j          0.69835874-0.j        ]
 [ 0.4634018 +0.j         -0.39793102+0.38597753j -0.39793102-0.38597753j]]
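Each column of the eigenvector matrix pairs with the eigenvalue at the same index, which can be verified as A·vᵢ = λᵢ·vᵢ. A sketch on a small symmetric matrix `M` (chosen for illustration because its eigenvalues, 1 and 3, are known):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eig(M)

# column i of vecs is the eigenvector for vals[i]
for lam, vec in zip(vals, vecs.T):
    assert np.allclose(M @ vec, lam * vec)

print("eigenvalues:", np.sort(vals))
```

For symmetric (or Hermitian) matrices, `np.linalg.eigh` is the usual choice: it returns real eigenvalues in ascending order.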


## <a name='io'></a>1️⃣3️⃣ File I/O (`npy`, `npz`, `txt`)

In [None]:
np.save('array.npy', A)
loaded = np.load('array.npy')
print('loaded equals A?', np.allclose(loaded, A))
np.savez('multi_arrays.npz', A=A, b=b)


loaded equals A? True
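The cell above writes `multi_arrays.npz` but never reads it back. A sketch of the round trip, using a temporary directory so no files are left behind (the file name and arrays are illustrative):

```python
import os
import tempfile
import numpy as np

mat = np.arange(6.0).reshape(2, 3)
vec = np.array([1.0, 2.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bundle.npz")
    np.savez(path, A=mat, b=vec)              # keyword names become the archive keys
    with np.load(path) as npz:                # NpzFile is lazy; close it via the context manager
        keys = sorted(npz.files)
        A_back, b_back = npz["A"], npz["b"]   # indexing reads each array into memory

print(keys, np.array_equal(A_back, mat), np.array_equal(b_back, vec))
```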


In [None]:
# 🖊️ TODO: Use `np.savetxt` to write `data` (from stats section) to CSV then reload with `np.loadtxt`.
a = data
print("Original Data: ", a)
np.savetxt('stats.csv', a, delimiter=',')
b = np.loadtxt('stats.csv', delimiter=',')
print("Loaded data: ")
b

Original Data:  [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
Loaded data: 


array([[85., 64., 51., 27.],
       [31.,  5.,  8.,  2.],
       [18., 81., 65., 91.],
       [50., 61., 97., 73.],
       [63., 54., 56., 93.]])

## <a name='datetime'></a>1️⃣4️⃣ Datetime64 & Timedelta64

In [None]:
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
delta = dates[1:] - dates[:-1]
print(dates[:5], delta[0])


['2023-01-01' '2023-01-02' '2023-01-03' '2023-01-04' '2023-01-05'] 1 days


In [None]:
# 🖊️ TODO: find how many Mondays appear in `dates` array.
# datetime64[D] counts days since 1970-01-01 (a Thursday), so (days + 3) % 7 == 0 marks Mondays
m = np.sum((dates.view('int64') + 3) % 7 == 0)
print(m)

13
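NumPy's business-day functions give an independent cross-check: `np.is_busday` with a `weekmask` of seven Monday-to-Sunday flags treats only the marked weekdays as valid. A sketch for the same Jan to Mar 2023 range:

```python
import numpy as np

days = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
# weekmask runs Mon..Sun; '1000000' marks only Mondays as valid days
mondays = days[np.is_busday(days, weekmask='1000000')]
print(len(mondays), mondays[:3])
```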


## <a name='mask'></a>1️⃣5️⃣ Masked Arrays & NaNs

In [None]:
arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)
print(masked.mean())


2.3333333333333335
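Masked arrays are one option; the `nan*` reductions are another. A sketch contrasting a plain reduction (NaN propagates) with the NaN-aware versions on the same values:

```python
import numpy as np

arr_nan = np.array([1, 2, np.nan, 4, np.nan])

print("plain mean:", np.mean(arr_nan))          # NaN propagates through ordinary reductions
print("nanmean:", np.nanmean(arr_nan))          # ignores NaNs, like the masked mean above
print("nanmax:", np.nanmax(arr_nan), "| NaN count:", np.count_nonzero(np.isnan(arr_nan)))

clean = arr_nan[~np.isnan(arr_nan)]             # or drop the NaNs explicitly
print("mean after dropping NaNs:", clean.mean())
```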


In [None]:
# 🖊️ TODO: replace NaNs with column means in a 2‑D array containing NaNs.
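a = np.array([[1,   2,  np.nan],
              [4, np.nan, 6],
              [7,   8,  9]], dtype=float)
c = np.nanmean(a, axis=0)      # per-column means, ignoring NaNs
i = np.where(np.isnan(a))      # (row indices, column indices) of every NaN
a[i] = np.take(c, i[1])        # fill each NaN with the mean of its column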

print(a)

[[1.  2.  7.5]
 [4.  5.  6. ]
 [7.  8.  9. ]]


## <a name='project'></a>1️⃣6️⃣ Mini‑Project: Fitness Data Analysis

Load `fitness.txt` (tab-separated), then follow the prompts.

In [None]:
fitness = np.genfromtxt('/content/fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)
print('columns:', fitness.dtype.names, 'rows:', len(fitness))


columns: ('date', 'step_count', 'mood', 'calories_burned', 'hours_of_sleep', 'bool_of_active', 'weight_kg') rows: 96


In [None]:
# 🖊️ TODO: Monthly step count, sleep vs mood correlation, weekly summary, etc.
m = np.array([d[3:] for d in fitness['date']])   # 'DD-MM-YYYY' -> 'MM-YYYY'; string sort lists 01-2018 first
u = np.unique(m)

s = [np.sum(fitness['step_count'][m == i]) for i in u]

print("\nMonthly step totals:")
for month, steps in zip(u, s):
    print(f"{month}: {steps}")

sh = fitness['hours_of_sleep']
mood = fitness['mood']
corr = np.corrcoef(sh, mood)[0, 1]
print("\nSleep vs mood correlation:", round(corr, 3))

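# parse the 'DD-MM-YYYY' strings into datetime64[D]
d = np.array([np.datetime64(f"{t[6:]}-{t[3:5]}-{t[:2]}") for t in fitness['date']])
# bucket rows by week (datetime64[W] weeks are epoch-aligned, so each one starts on a Thursday)
w = d.astype('datetime64[W]')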

r = []
for week in np.unique(w):
    mask = w == week
    r.append((
        str(week),
        np.sum(fitness['step_count'][mask]),
        np.mean(fitness['mood'][mask]),
        np.mean(fitness['hours_of_sleep'][mask])
    ))

print("\nWeekly summary (week, total_steps, avg_mood, avg_sleep):")
for i in r:
    print(i)


Monthly step totals:
01-2018: 10163
10-2017: 79051
11-2017: 103071
12-2017: 89565

Sleep vs mood correlation: 0.21

Weekly summary (week, total_steps, avg_mood, avg_sleep):
('2017-10-05', np.int64(28451), np.float64(133.33333333333334), np.float64(5.5))
('2017-10-12', np.int64(19456), np.float64(128.57142857142858), np.float64(6.142857142857143))
('2017-10-19', np.int64(19524), np.float64(142.85714285714286), np.float64(5.714285714285714))
('2017-10-26', np.int64(16055), np.float64(242.85714285714286), np.float64(6.142857142857143))
('2017-11-02', np.int64(24977), np.float64(285.7142857142857), np.float64(3.857142857142857))
('2017-11-09', np.int64(27678), np.float64(300.0), np.float64(5.857142857142857))
('2017-11-16', np.int64(20375), np.float64(285.7142857142857), np.float64(5.428571428571429))
('2017-11-23', np.int64(21998), np.float64(257.14285714285717), np.float64(6.142857142857143))
('2017-11-30', np.int64(20393), np.float64(300.0), np.float64(7.0))
('2017-12-07', np.int64(224
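The monthly totals above loop over each month in Python and are ordered as strings. A vectorised alternative groups rows with `np.unique(..., return_inverse=True)` and sums them with `np.bincount(..., weights=...)`; truncating to `datetime64[M]` also makes the groups sort chronologically. A sketch on hypothetical sample rows in the same `DD-MM-YYYY` format (these are not values from `fitness.txt`):

```python
import numpy as np

# Hypothetical sample rows, NOT taken from fitness.txt
dates_txt = np.array(['06-10-2017', '07-10-2017', '01-11-2017', '02-11-2017', '01-01-2018'])
steps = np.array([5000, 6000, 100, 5500, 6100])

# Parse to datetime64[D], then truncate to whole months
day_vals = np.array([np.datetime64(f"{t[6:]}-{t[3:5]}-{t[:2]}") for t in dates_txt])
months = day_vals.astype('datetime64[M]')

keys, idx = np.unique(months, return_inverse=True)   # idx maps each row to its month's position in keys
totals = np.bincount(idx, weights=steps)              # sum of steps per month (float64)

for k, t in zip(keys, totals):
    print(k, int(t))
```

The same `idx` array can feed further `np.bincount` calls (e.g. sleep hours, divided by `np.bincount(idx)` for a mean) without re-grouping.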

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## <a name='conclusion'></a>1️⃣7️⃣ Conclusion & Further Practice
Congrats on covering **all core NumPy topics** from your five lecture notebooks!

*Keep experimenting, read the official docs, and try converting your NumPy pipelines into Pandas or JAX for more fun.*