# 🧑‍💻 NumPy Complete Guided Project
**Instructor / Student Colab Notebook** – covers *all* key concepts from `Numpy‑1` to `Numpy‑5`.

*Generated: 08 Aug 2025*


**Table of Contents**

1. [Setup](#setup)  
2. [Array Creation & Dtypes](#creation)  
3. [Array Attributes & Inspection](#attributes)  
4. [Indexing, Slicing, Fancy Indexing](#indexing)  
5. [Reshaping, Transpose & Copies vs Views](#reshape)  
6. [Joining, Splitting, Set & Sorting Ops](#join)  
7. [Arithmetic Ops, Universal Functions](#arithmetic)  
8. [Broadcasting (Rules + Examples)](#broadcast)  
9. [Statistics & Aggregations](#stats)  
10. [Random Numbers & Reproducibility](#random)  
11. [Structured / Record Arrays](#structured)  
12. [Linear Algebra Essentials](#linalg)  
13. [File I/O (`npy`, `npz`, `txt`)](#io)  
14. [Datetime64 & Timedelta64](#datetime)  
15. [Masked Arrays & NaNs](#mask)  
16. [Mini‑Project — Fitness Data Analysis](#project)  
17. [Conclusion & Further Practice](#conclusion)  


## <a name='setup'></a>1️⃣ Setup

In [None]:
import numpy as np, math, os, pathlib, types, textwrap, random
print('NumPy version:', np.__version__)

NumPy version: 2.0.2


## <a name='creation'></a>2️⃣ Array Creation & Dtypes

Key functions: `np.array`, `np.arange`, `np.linspace`, `np.zeros`, `np.ones`, `np.full`, `np.eye`, `np.identity`, `np.diag`, `np.empty`

In [None]:
# EXAMPLE
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.linspace(0, 1, 6)
arr3 = np.full((2,3), 7.5)
print(arr1, arr2, arr3, sep="\n")
print("dtypes:", arr1.dtype, arr2.dtype)


[1 2 3]
[0.  0.2 0.4 0.6 0.8 1. ]
[[7.5 7.5 7.5]
 [7.5 7.5 7.5]]
dtypes: int32 float64
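
The example above exercises `np.array`, `np.linspace`, and `np.full`. As a quick sketch of several other functions from the list (the variable names are illustrative only, not part of the original notebook):

```python
import numpy as np

steps = np.arange(0, 10, 2)          # start, stop (exclusive), step: 0, 2, 4, 6, 8
zeros = np.zeros((2, 3))             # float64 unless a dtype is given
ones = np.ones(3, dtype=np.int8)     # explicit dtype
ident = np.eye(3)                    # 3x3 identity matrix
diagonal = np.diag([1, 2, 3])        # 1-D input becomes the diagonal of a 2-D array

print(steps, zeros.dtype, ones.dtype)
print(ident)
print(diagonal)
```

`np.empty` is left out on purpose: it returns uninitialised memory, so its printed contents are arbitrary.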


In [None]:
import numpy as np

chessboard = np.zeros((10,10), dtype=int)
chessboard[1::2,::2]=1
chessboard[::2,1::2]=1
print(chessboard)




[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]]


In [None]:
# 🖊️ TODO: create a 10×10 chessboard pattern using zeros & ones


## <a name='attributes'></a>3️⃣ Array Attributes & Inspection

`shape`, `ndim`, `size`, `dtype`, `itemsize`, `nbytes`

In [None]:
M = np.arange(12).reshape(3,4)
print('shape', M.shape, 'ndim', M.ndim, 'size', M.size, 'itemsize', M.itemsize, 'total bytes', M.nbytes)


shape (3, 4) ndim 2 size 12 itemsize 8 total bytes 96


In [11]:
# 🖊️ TODO: check memory footprint of a 1000×1000 float64 array
import numpy as np
a = np.zeros((1000, 1000), dtype=np.float64)
print(a.nbytes)               # total bytes
print(a.nbytes / 1024 ** 2)   # the same figure in MiB


8000000
7.62939453125
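
The footprint above follows directly from `size * itemsize`. A small sketch, assuming the same 1000×1000 shape, shows how the choice of dtype changes it:

```python
import numpy as np

for dtype in (np.float64, np.float32, np.int8):
    arr = np.zeros((1000, 1000), dtype=dtype)
    # For an ndarray, nbytes is size * itemsize.
    assert arr.nbytes == arr.size * arr.itemsize
    print(np.dtype(dtype).name, arr.itemsize, "bytes per item,",
          round(arr.nbytes / 1024 ** 2, 2), "MiB")
```

Halving the item size halves the memory, which is often the simplest way to shrink a large array when the reduced precision is acceptable.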


## <a name='indexing'></a>4️⃣ Indexing, Slicing & Fancy Indexing

In [None]:
a = np.arange(1,26).reshape(5,5)
print(a[:, 0])
print(a[::2, ::2])
mask = (a % 3 == 0)
print('multiples of 3:', a[mask])


In [None]:
# 🖊️ TODO: use fancy indexing to swap first and last rows of `a`
import numpy as np


a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])


a[[0, -1]] = a[[-1, 0]]

print(a)


[[7 8 9]
 [4 5 6]
 [1 2 3]]
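
The row swap works because fancy indexing on the right-hand side produces a copy before anything is assigned. A minimal sketch of the difference between basic slicing (view) and fancy indexing (copy), using illustrative variable names:

```python
import numpy as np

a = np.arange(1, 26).reshape(5, 5)

row_view = a[0, :]      # basic slicing: a view onto a's data
row_copy = a[[0], :]    # integer-array (fancy) indexing: an independent copy

a[0, 0] = -1
print(row_view[0])      # -1, the view sees the change
print(row_copy[0, 0])   # 1, the copy does not

print(np.shares_memory(a, row_view), np.shares_memory(a, row_copy))
```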


## <a name='reshape'></a>5️⃣ Reshaping, Transpose & Copies vs Views

In [None]:
b = np.arange(8)
B = b.reshape(2,4)
B[0,0] = 99
print('b is modified:', b)
C = b.reshape(2,4).copy()
C[0,0] = -1
print('b unchanged with copy:', b)


In [12]:
# 🖊️ TODO: Flatten a 3‑D array into 1‑D using both `ravel` and `flatten`; observe copy vs view.
import numpy as np


a = np.arange(24).reshape((2, 3, 4))
print("Original 3D array:\n", a)


ravel_arr = a.ravel()
print("\nFlattened with ravel():\n", ravel_arr)


flatten_arr = a.flatten()
print("\nFlattened with flatten():\n", flatten_arr)

a[0, 0, 0] = 999
print("\nAfter modifying a[0,0,0] = 999:")

print("\nravel() result reflects change (view):\n", ravel_arr)
print("\nflatten() result does NOT reflect change (copy):\n", flatten_arr)


Original 3D array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Flattened with ravel():
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

Flattened with flatten():
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

After modifying a[0,0,0] = 999:

ravel() result reflects change (view):
 [999   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  23]

flatten() result does NOT reflect change (copy):
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
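
One caveat to the output above: `ravel` returns a view only when it can. A short sketch using `np.shares_memory` (the small array here is just for illustration) makes the distinction explicit:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

contiguous_ravel = np.shares_memory(a, a.ravel())     # contiguous input: ravel returns a view
flatten_copy = np.shares_memory(a, a.flatten())       # flatten always copies
transposed_ravel = np.shares_memory(a, a.T.ravel())   # a.T is not C-contiguous, so ravel must copy

print(contiguous_ravel, flatten_copy, transposed_ravel)
```

When code relies on the result being independent of the original, `flatten` (or an explicit `.copy()`) is the safer choice.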


## <a name='join'></a>6️⃣ Joining, Splitting, Set & Sorting Ops

In [None]:
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print('union', np.union1d(x,y))
print('intersect', np.intersect1d(xy,[1,2,10]))
print('sorted descending', np.sort(xy)[::-1])


In [None]:
# 🖊️ TODO: split `xy` back into two equal halves using `np.array_split`
import numpy as np


xy = np.array([1, 2, 3, 4, 5, 6])  # or a 2D array

halves = np.array_split(xy, 2)


first_half, second_half = halves

print("First half:", first_half)
print("Second half:", second_half)


First half: [1 2 3]
Second half: [4 5 6]
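
Beyond `concatenate` and `array_split`, a few related calls round out this section. The following is a sketch on the same `x` and `y` values used earlier; the other variable names are illustrative:

```python
import numpy as np

x = np.array([1, 3, 5])
y = np.array([2, 4, 6])

stacked_rows = np.vstack([x, y])         # shape (2, 3): x and y become rows
stacked_cols = np.column_stack([x, y])   # shape (3, 2): x and y become columns

xy = np.concatenate([x, y])              # 1, 3, 5, 2, 4, 6
order = np.argsort(xy)                   # indices that would sort xy
print(order)                             # [0 3 1 4 2 5]
print(xy[order])                         # [1 2 3 4 5 6]
print(np.setdiff1d(xy, [1, 2, 10]))      # values of xy not found in the second array
```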


## <a name='arithmetic'></a>7️⃣ Arithmetic Ops & Universal Functions

In [None]:
v = np.arange(5)
print('exp', np.exp(v))
print('sin', np.sin(v))
print('vectorised addition', v + 10)


In [None]:
# 🖊️ TODO: given degrees [0,30,45,60,90], compute radians and sin values.
import numpy as np


degrees = np.array([0, 30, 45, 60, 90])


radians = np.radians(degrees)
sin_values = np.sin(radians)

print("Radians:", radians)
print("Sine values:", sin_values)



Radians: [0.         0.52359878 0.78539816 1.04719755 1.57079633]
Sine values: [0.         0.5        0.70710678 0.8660254  1.        ]
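
Universal functions are objects with methods of their own, not just element-wise callables. A brief sketch of `reduce`, `accumulate`, and the `out=` argument (variable names are illustrative):

```python
import numpy as np

v = np.arange(1, 6)                            # 1, 2, 3, 4, 5

total = np.add.reduce(v)                       # 15, the same result as v.sum()
running_product = np.multiply.accumulate(v)    # 1, 2, 6, 24, 120
clipped_low = np.maximum(v, 3)                 # element-wise max with a scalar: 3, 3, 3, 4, 5

out = np.empty(5)                              # preallocated float64 buffer
np.sqrt(v, out=out)                            # results are written into `out`

print(total, running_product, clipped_low)
print(out)
```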


## <a name='broadcast'></a>8️⃣ Broadcasting Rules

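Rules: align the shapes at the rightmost dimension and compare them pairwise, moving right → left. Two dimensions are compatible when they are equal or when one of them is 1 (a missing leading dimension counts as 1); size‑1 dimensions are stretched to match. Any other mismatch raises a `ValueError`.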

In [None]:
row = np.arange(5)
col = np.arange(3).reshape(3,1)
matrix = row + col  # broadcast to 3×5
print(matrix)


[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]]


In [None]:
import numpy as np


nums = np.arange(1, 11)


table = nums[:, np.newaxis] * nums

print("10×10 Multiplication Table:\n", table)

10×10 Multiplication Table:
 [[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]
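
The rules can be checked without building any arrays. This sketch uses `np.broadcast_shapes` and then triggers the mismatch error on purpose; the shapes are arbitrary examples:

```python
import numpy as np

# Shapes are aligned from the right; size-1 or missing dimensions stretch.
print(np.broadcast_shapes((3, 1), (5,)))        # (3, 5)
print(np.broadcast_shapes((4, 1, 6), (5, 1)))   # (4, 5, 6)

mismatch_raised = False
try:
    np.arange(3) + np.arange(4)                 # trailing sizes 3 and 4, neither is 1
except ValueError as err:
    mismatch_raised = True
    print("broadcast error:", err)
```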


## <a name='stats'></a>9️⃣ Statistics & Aggregations

In [None]:
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
print('data\n', data)
print('row sums', data.sum(axis=1))
print('col means', data.mean(axis=0))


In [None]:

import numpy as np

data = np.array([[10, 20, 30],
                 [40, 50, 60]])
flattened = data.flatten()
percentiles = np.percentile(flattened, [25, 50, 75])

print("25th percentile:", percentiles[0])
print("50th percentile (median):", percentiles[1])
print("75th percentile:", percentiles[2])



25th percentile: 22.5
50th percentile (median): 35.0
75th percentile: 47.5
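
Aggregations along an axis combine naturally with broadcasting when `keepdims=True` is used. A small sketch on the same 2×3 values (variable names are illustrative):

```python
import numpy as np

data = np.array([[10, 20, 30],
                 [40, 50, 60]], dtype=float)

col_mean = data.mean(axis=0)                   # shape (3,): one mean per column
row_mean = data.mean(axis=1, keepdims=True)    # shape (2, 1): still broadcastable against data

centered = data - row_mean                     # each row minus its own mean
print(col_mean)                                # [25. 35. 45.]
print(centered)                                # both rows become -10, 0, 10

flat_idx = data.argmax()                       # index into the flattened array
row, col = np.unravel_index(flat_idx, data.shape)
print(flat_idx, int(row), int(col))            # 5 1 2
```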


## <a name='random'></a>🔟 Random Numbers & Reproducibility

In [None]:
rng = np.random.default_rng(42)
rand_floats = rng.random(5)
rand_ints = rng.integers(low=10, high=50, size=5)
print(rand_floats, rand_ints)
rng2 = np.random.default_rng(42)
assert np.allclose(rand_floats, rng2.random(5))


In [1]:
# 🖊️ TODO: simulate rolling a fair six‑sided die 100 times; estimate proportion of 6s.
import numpy as np
rolls = np.random.randint(1, 7, size=100)
num_sixes = np.count_nonzero(rolls == 6)
proportion = num_sixes / 100

print("Number of 6s:", num_sixes)
print("Proportion of 6s:", proportion)



Number of 6s: 25
Proportion of 6s: 0.25
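
The cell above calls the legacy `np.random.randint` without a seed, so its counts change on every run. A reproducible variant, sketched with the `Generator` API from the start of this section (the seed value 42 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 7, size=100)      # high is exclusive, so faces are 1..6
proportion = np.mean(rolls == 6)          # mean of a boolean mask is the fraction of True values

print("Proportion of 6s:", proportion, "(expected about", round(1 / 6, 3), ")")

# Re-creating the generator with the same seed reproduces the same rolls.
rolls_again = np.random.default_rng(seed=42).integers(1, 7, size=100)
assert np.array_equal(rolls, rolls_again)
```

With only 100 rolls the estimate can sit noticeably away from 1/6; increasing `size` tightens it.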


## <a name='structured'></a>1️⃣1️⃣ Structured / Record Arrays

In [None]:
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
print(people['name'], people['age'].mean())


In [2]:
# 🖊️ TODO: add a new field 'height' to the structured array using `np.lib.recfunctions.append_fields` (hint: it ships with NumPy, so no extra install is needed).
import numpy as np
from numpy.lib import recfunctions as rfn
data = np.array([(1, 'Alice'), (2, 'Bob')],
                dtype=[('id', 'i4'), ('name', 'U10')])
heights = [160.0, 175.5]
new_data = rfn.append_fields(data, 'height', heights, dtypes='f4', usemask=False)
print(new_data)


[(1, 'Alice', 160. ) (2, 'Bob', 175.5)]
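
Structured arrays also support filtering and sorting by field. The sketch below reuses the `people` layout from this section; the third record ('Cara') is a made-up addition so the sort has something to reorder:

```python
import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Cara', 22, 61.2)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

over_23 = people[people['age'] > 23]           # a boolean mask on one field selects whole records
by_weight = np.sort(people, order='weight')    # sort records by a named field

print(over_23['name'])                         # ['Alice' 'Bob']
print(by_weight['name'])                       # ['Alice' 'Cara' 'Bob']

rec = people.view(np.recarray)                 # a recarray view allows attribute access
print(rec.age)
```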


## <a name='linalg'></a>1️⃣2️⃣ Linear Algebra Essentials

In [None]:
A = np.random.random((3,3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))


A·x ≈ b? True


In [None]:
# 🖊️ TODO: compute eigenvalues of `A` using `np.linalg.eig`.
import numpy as np

A = np.random.random((3,3))
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:\n", A)
print("\nEigenvalues:\n", eigenvalues)
print("\nEigenvectors (columns):\n", eigenvectors)



Matrix A:
 [[0.42110465 0.54670858 0.11460936]
 [0.36456686 0.10849575 0.36574886]
 [0.2078681  0.97005772 0.94404827]]

Eigenvalues:
 [ 1.38489503  0.44081135 -0.35205772]

Eigenvectors (columns):
 [[-0.29932932 -0.76739528  0.45567076]
 [-0.34086223 -0.1579329  -0.74616877]
 [-0.89118735  0.62141908  0.48538266]]
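
A useful habit is to verify the decomposition rather than trust the printout. This sketch uses a seeded generator purely so it is repeatable, which is an assumption and not what the cell above does:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3))
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column v of `eigenvectors` should satisfy A @ v == lambda * v.
# Broadcasting scales column i by eigenvalues[i].
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))

# The eigenvalues sum to the trace and multiply to the determinant.
print(np.isclose(eigenvalues.sum(), np.trace(A)))
print(np.isclose(np.prod(eigenvalues), np.linalg.det(A)))
```

For a general real matrix the eigenvalues may be complex; the checks above still hold in that case.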


## <a name='io'></a>1️⃣3️⃣ File I/O (`npy`, `npz`, `txt`)

In [None]:
np.save('array.npy', A)
loaded = np.load('array.npy')
print('loaded equals A?', np.allclose(loaded, A))
np.savez('multi_arrays.npz', A=A, b=b)


In [3]:
# 🖊️ TODO: Use `np.savetxt` to write `data` (from stats section) to CSV then reload with `np.loadtxt`.
import numpy as np
data = np.random.rand(5, 3)
np.savetxt("data.csv", data, delimiter=",", fmt="%.4f")

loaded_data = np.loadtxt("data.csv", delimiter=",")

print("Original Data:\n", data)
print("\nReloaded Data:\n", loaded_data)



Original Data:
 [[0.7027703  0.64231811 0.93188002]
 [0.30344017 0.3850412  0.94693833]
 [0.5219807  0.7920882  0.00134104]
 [0.70958369 0.85581989 0.20248715]
 [0.16570193 0.54672754 0.50318834]]

Reloaded Data:
 [[0.7028 0.6423 0.9319]
 [0.3034 0.385  0.9469]
 [0.522  0.7921 0.0013]
 [0.7096 0.8558 0.2025]
 [0.1657 0.5467 0.5032]]
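
The reloaded values above are rounded because of `fmt="%.4f"`. The sketch below shows two related points: reading arrays back out of an `.npz` archive, and a text round trip with the default format. The temporary directory and the seeded arrays are assumptions made to keep the example self-contained:

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3))
b = rng.random(3)

with tempfile.TemporaryDirectory() as tmp:       # avoids leaving files behind
    npz_path = os.path.join(tmp, "multi_arrays.npz")
    np.savez(npz_path, A=A, b=b)
    with np.load(npz_path) as archive:           # arrays are looked up by keyword name
        names = sorted(archive.files)
        b_loaded = archive["b"]

    csv_path = os.path.join(tmp, "data.csv")
    np.savetxt(csv_path, A, delimiter=",")       # default fmt "%.18e" keeps full float64 precision
    A_loaded = np.loadtxt(csv_path, delimiter=",")

print(names)                                     # ['A', 'b']
print(np.array_equal(b, b_loaded), np.allclose(A, A_loaded))
```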


## <a name='datetime'></a>1️⃣4️⃣ Datetime64 & Timedelta64

In [13]:
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
delta = dates[1:] - dates[:-1]
print(dates[:5], delta[0])
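# Days since 1970-01-01, which was a Thursday; adding 3 shifts the result so that Monday == 0.
weekdays = (dates.astype('datetime64[D]').astype(int) + 3) % 7
num_mondays = np.count_nonzero(weekdays == 0)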
print("Number of Mondays:", num_mondays)



['2023-01-01' '2023-01-02' '2023-01-03' '2023-01-04' '2023-01-05'] 1 days
Number of Mondays: 13
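
As a cross-check that avoids epoch arithmetic altogether, `np.is_busday` accepts a `weekmask`. The sketch below counts Mondays over the same date range by treating Monday as the only "business day":

```python
import numpy as np

dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')

# With weekmask='Mon', is_busday is True exactly on Mondays.
is_monday = np.is_busday(dates, weekmask='Mon')
num_mondays = np.count_nonzero(is_monday)

print("Number of Mondays:", num_mondays)   # 13
print(dates[is_monday][:3])                # the first three Mondays in the range
```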


In [4]:
# 🖊️ TODO: find how many Mondays appear in `dates` array.
import numpy as np
dates = np.arange('2024-01-01', '2024-12-31', dtype='datetime64[D]')  # stop is exclusive: last date is 2024-12-30

# Days since 1970-01-01, which was a Thursday; adding 3 shifts the result so that Monday == 0.
weekdays = (dates.astype('datetime64[D]').astype(int) + 3) % 7
num_mondays = np.count_nonzero(weekdays == 0)

print("Number of Mondays:", num_mondays)



Number of Mondays: 53


## <a name='mask'></a>1️⃣5️⃣ Masked Arrays & NaNs

In [None]:
arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)
print(masked.mean())


In [6]:
# 🖊️ TODO: replace NaNs with column means in a 2‑D array containing NaNs.
import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
    [7, 8, 9]
], dtype='float')
arr = np.where(np.isnan(arr), np.nanmean(arr, axis=0), arr)

print("After replacing NaNs with column means:\n", arr)



After replacing NaNs with column means:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [7.  8.  9. ]]
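
The same column means can be obtained through a masked array, which ties the two halves of this section together. A minimal sketch on the same 3×3 values:

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [7, 8, 9]], dtype=float)

masked = np.ma.masked_invalid(arr)     # NaN entries become masked
col_means = masked.mean(axis=0)        # masked entries are skipped: 4, 5, 7.5
valid_counts = masked.count(axis=0)    # unmasked entries per column: 3, 2, 2

print(col_means, valid_counts)
print(arr.mean(axis=0))                # the plain mean propagates NaN into columns 1 and 2
```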


## <a name='project'></a>1️⃣6️⃣ Mini‑Project: Fitness Data Analysis

Load `fitness.txt` (tab‑separated) then follow prompts.

In [None]:
fitness = np.genfromtxt('fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)
print('columns:', fitness.dtype.names, 'rows:', len(fitness))
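
Because `fitness.txt` is not bundled with this text, here is a sketch of the monthly grouping on synthetic stand-in data. Every value below is made up, and the `dd-mm-YYYY` date layout is an assumption based on the format string used later in this section:

```python
import numpy as np

# Synthetic stand-in for a few rows of fitness.txt (all values are invented).
date_str = np.array(['30-10-2017', '31-10-2017', '01-11-2017', '02-11-2017'])
step_count = np.array([5000, 6000, 4000, 7000])
hours_of_sleep = np.array([6.0, 7.0, 5.0, 8.0])

# Rearrange "dd-mm-YYYY" into ISO "YYYY-mm-dd" so NumPy can parse it.
dates = np.array([f"{d[6:10]}-{d[3:5]}-{d[0:2]}" for d in date_str],
                 dtype='datetime64[D]')

months = dates.astype('datetime64[M]')            # truncate each date to its month
monthly_totals = {str(m): int(step_count[months == m].sum()) for m in np.unique(months)}
print(monthly_totals)                             # {'2017-10': 11000, '2017-11': 11000}

corr = np.corrcoef(hours_of_sleep, step_count)[0, 1]   # Pearson correlation coefficient
print("sleep vs steps correlation:", corr)
```

Truncating to `datetime64[M]` gives month keys that sort chronologically, which plain string slices of a `dd-mm-YYYY` date do not.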


In [9]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [10]:
# 🖊️ TODO: Monthly step count, sleep vs mood correlation, weekly summary, etc.
import pandas as pd

# Dates are stored as "dd-mm-YYYY" strings, so d[:7] would not isolate the month.
# Build a sortable "YYYY-mm" key per row instead.
months = np.array([f"{d[6:10]}-{d[3:5]}" for d in fitness['date']])
unique_months = np.unique(months)
for m in unique_months:
    total_steps = fitness['step_count'][months == m].sum()
    print(f"Month {m}: Total steps = {total_steps}")

sleep_mood_corr = np.corrcoef(fitness['hours_of_sleep'], fitness['mood'])[0,1]
print("Sleep vs mood correlation:", sleep_mood_corr)

# Weekly summary
dates_pd = pd.to_datetime(fitness['date'], format="%d-%m-%Y")
weeks = dates_pd.values.astype('datetime64[W]')
unique_weeks = np.unique(weeks)
for w in unique_weeks:
    avg_steps = fitness['step_count'][weeks == w].mean()
    print(f"Week {w}: Avg steps = {avg_steps}")


## <a name='conclusion'></a>1️⃣7️⃣ Conclusion & Further Practice
Congrats on covering **all core NumPy topics** from your five lecture notebooks!

*Keep experimenting, read the official docs, and try converting your NumPy pipelines into Pandas or JAX for more fun.*