
# 🧑‍💻 NumPy Complete Guided Project
**Instructor / Student Colab Notebook** – covers *all* key concepts from `Numpy‑1` to `Numpy‑5`.

*Generated: 08 Aug 2025*

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).



**Table of Contents**

1. [Setup](#setup)  
2. [Array Creation & Dtypes](#creation)  
3. [Array Attributes & Inspection](#attributes)  
4. [Indexing, Slicing, Fancy Indexing](#indexing)  
5. [Reshaping, Transpose & Copies vs Views](#reshape)  
6. [Joining, Splitting, Set & Sorting Ops](#join)  
7. [Arithmetic Ops, Universal Functions](#arithmetic)  
8. [Broadcasting (Rules + Examples)](#broadcast)  
9. [Statistics & Aggregations](#stats)  
10. [Random Numbers & Reproducibility](#random)  
11. [Structured / Recarrays](#structured)  
12. [Linear Algebra Essentials](#linalg)  
13. [File I/O (`npy`, `npz`, `txt`)](#io)  
14. [Datetime64 & Timedelta64](#datetime)  
15. [Masked Arrays & NaNs](#mask)  
16. [Mini‑Project — Fitness Data Analysis](#project)  
17. [Conclusion & Next Steps](#conclusion)  


## <a name='setup'></a>1️⃣ Setup

In [None]:
import numpy as np, math, os, pathlib, types, textwrap, random
print('NumPy version:', np.__version__)

NumPy version: 2.0.2


## <a name='creation'></a>2️⃣ Array Creation & Dtypes

Key functions: `np.array`, `np.arange`, `np.linspace`, `zeros`, `ones`, `full`, `eye`, `identity`, `diag`, `empty`

In [None]:
# EXAMPLE
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.linspace(0, 1, 6)
arr3 = np.full((2,3), 7.5)
print(arr1, arr2, arr3, sep="\n")
print("dtypes:", arr1.dtype, arr2.dtype)


[1 2 3]
[0.  0.2 0.4 0.6 0.8 1. ]
[[7.5 7.5 7.5]
 [7.5 7.5 7.5]]
dtypes: int32 float64


In [None]:
print(np.zeros((10,10),dtype=np.int64))
print("----------------------------------------")
print(np.ones((10,10)))
print("----------------------------------------")
print(np.arange(1,101).reshape(10,10))
print("----------------------------------------")
print(np.eye(6,dtype=np.int64))
print("----------------------------------------")
print(np.identity(6))
print("----------------------------------------")
print(np.diag([1,2,3,4,5,6]))
print("----------------------------------------")
print(np.empty((3,3)))

[[0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]
----------------------------------------
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
----------------------------------------
[[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  8

In [None]:
# 🖊️ TODO: create a 10×10 chessboard pattern using zeros & ones
chessboard = np.zeros((10, 10), dtype=np.int64)
chessboard[1::2, ::2] = 1
chessboard[::2, 1::2] = 1
print(chessboard)

[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]]


## <a name='attributes'></a>3️⃣ Array Attributes & Inspection

`shape`, `ndim`, `size`, `dtype`, `itemsize`, `nbytes`
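
These attributes are related: `size` is the product of `shape`, and `nbytes` equals `size * itemsize`. A minimal sketch of that relationship, using a hypothetical array `M32` that is not part of the original notebook:

```python
import numpy as np

# Hypothetical example array; any shape and dtype would do.
M = np.arange(12, dtype=np.int64).reshape(3, 4)

# size is the product of the shape, and nbytes is size * itemsize.
assert M.size == M.shape[0] * M.shape[1]
assert M.nbytes == M.size * M.itemsize

# Changing the dtype changes itemsize (and nbytes) but not shape or size.
M32 = M.astype(np.float32)
print(M.itemsize, M.nbytes)      # 8 96
print(M32.itemsize, M32.nbytes)  # 4 48
```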

In [None]:
M = np.arange(12).reshape(3,4)
print(M)
print('-------------------------------------------')
print('shape', M.shape,'\n', 'ndim', M.ndim,'\n', 'size', M.size,'\n','dtype', M.dtype ,'\n', 'itemsize', M.itemsize,'\n' ,'total bytes', M.nbytes)


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
-------------------------------------------
shape (3, 4) 
 ndim 2 
 size 12 
 dtype int64 
 itemsize 8 
 total bytes 96


In [None]:
# 🖊️ TODO: check memory footprint of a 1000×1000 float64 array
m = np.arange(1, 1000001, dtype=np.float64).reshape(1000, 1000)
print(m.dtype)
print('-------------------------------------------')
print(m.itemsize)   # bytes per element
print('-------------------------------------------')
print(m.nbytes)     # total bytes = size * itemsize

float64
-------------------------------------------
8
-------------------------------------------
8000000


## <a name='indexing'></a>4️⃣ Indexing, Slicing & Fancy Indexing

In [None]:
a = np.arange(1,26).reshape(5,5)
print(a)
print('-------------------------------------------')
print(a[:, 0])     # first column
print('-------------------------------------------')
print(a[::2, ::2]) # every 2nd row/col
print('-------------------------------------------')
mask = (a % 3 == 0)
print('multiples of 3:', a[mask])


[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]
-------------------------------------------
[ 1  6 11 16 21]
-------------------------------------------
[[ 1  3  5]
 [11 13 15]
 [21 23 25]]
-------------------------------------------
multiples of 3: [ 3  6  9 12 15 18 21 24]


In [None]:
# 🖊️ TODO: use fancy indexing to swap first and last rows of a

a = np.arange(1,26).reshape(5,5)
num_rows = a.shape[0]
indices = np.arange(num_rows)
indices[0], indices[num_rows - 1] = indices[num_rows - 1], indices[0]

a_swapped = a[indices]

print("Original array 'a':")
print(a)
print("\nArray 'a' with first and last rows swapped:")
print(a_swapped)

Original array 'a':
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]

Array 'a' with first and last rows swapped:
[[21 22 23 24 25]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [ 1  2  3  4  5]]


## <a name='reshape'></a>5️⃣ Reshaping, Transpose & Copies vs Views

In [None]:
b = np.arange(8)
B = b.reshape(2,4)
B[0,0] = 99
print('b is modified:', b)
C = b.reshape(2,4).copy()
C[0,0] = -1
print('b unchanged with copy:', b)


b is modified: [99  1  2  3  4  5  6  7]
b unchanged with copy: [99  1  2  3  4  5  6  7]


In [34]:
# 🖊️ TODO: Flatten a 3‑D array into 1‑D using both `ravel` and `flatten`; observe copy vs view.
import numpy as np

# Create a 3-D array (2x2x3)
arr = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]]
])

print("Original array:\n", arr)
print("Shape:", arr.shape)

# Flatten using ravel() -> returns a view if possible
ravel_arr = arr.ravel()
print("\nFlattened with ravel():\n", ravel_arr)

# Flatten using flatten() -> always returns a copy
flatten_arr = arr.flatten()
print("\nFlattened with flatten():\n", flatten_arr)

# Modify ravel result
ravel_arr[0] = 999
print("\nAfter modifying ravel() result:")
print("Original array:\n", arr)  # Notice change reflected in original
print("ravel() array:\n", ravel_arr)

# Modify flatten result
flatten_arr[1] = 555
print("\nAfter modifying flatten() result:")
print("Original array:\n", arr)  # No change here
print("flatten() array:\n", flatten_arr)


Original array:
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
Shape: (2, 2, 3)

Flattened with ravel():
 [ 1  2  3  4  5  6  7  8  9 10 11 12]

Flattened with flatten():
 [ 1  2  3  4  5  6  7  8  9 10 11 12]

After modifying ravel() result:
Original array:
 [[[999   2   3]
  [  4   5   6]]

 [[  7   8   9]
  [ 10  11  12]]]
ravel() array:
 [999   2   3   4   5   6   7   8   9  10  11  12]

After modifying flatten() result:
Original array:
 [[[999   2   3]
  [  4   5   6]]

 [[  7   8   9]
  [ 10  11  12]]]
flatten() array:
 [  1 555   3   4   5   6   7   8   9  10  11  12]
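
Mutating the result is one way to tell a view from a copy; `np.shares_memory` answers the same question without changing any data. A small sketch, assuming a fresh array rather than the `arr` modified above:

```python
import numpy as np

arr = np.arange(12).reshape(2, 2, 3)

# Contiguous input: ravel can return a view onto the same buffer.
ravel_shares = np.shares_memory(arr, arr.ravel())
# flatten always allocates a new buffer.
flatten_shares = np.shares_memory(arr, arr.flatten())
# A transposed array is not C-contiguous, so ravel has to copy here too.
transposed_ravel_shares = np.shares_memory(arr, arr.T.ravel())

print(ravel_shares, flatten_shares, transposed_ravel_shares)  # True False False
```

This is why the comment on `ravel()` says "if possible": whether it returns a view depends on the memory layout of the input.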


## <a name='join'></a>6️⃣ Joining, Splitting, Set & Sorting Ops

In [35]:
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print('union', np.union1d(x,y))
print('intersect', np.intersect1d(xy,[1,2,10]))
print('sorted descending', np.sort(xy)[::-1])


union [1 2 3 4 5 6]
intersect [1 2]
sorted descending [6 5 4 3 2 1]


In [36]:
# 🖊️ TODO: split `xy` back into two equal halves using `np.array_split`
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])

split_xy = np.array_split(xy, 2)
print("Original concatenated array:", xy)
print("Split array:", split_xy)

Original concatenated array: [1 3 5 2 4 6]
Split array: [array([1, 3, 5]), array([2, 4, 6])]


## <a name='arithmetic'></a>7️⃣ Arithmetic Ops & Universal Functions

In [37]:
v = np.arange(5)
print('exp', np.exp(v))
print('sin', np.sin(v))
print('vectorised addition', v + 10)


exp [ 1.          2.71828183  7.3890561  20.08553692 54.59815003]
sin [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
vectorised addition [10 11 12 13 14]


In [38]:
# 🖊️ TODO: given degrees [0,30,45,60,90], compute radians and sin values.
degrees = np.array([0, 30, 45, 60, 90])
radians = np.deg2rad(degrees)
sin_values = np.sin(radians)

print("Degrees:", degrees)
print("Radians:", radians)
print("Sin values:", sin_values)

Degrees: [ 0 30 45 60 90]
Radians: [0.         0.52359878 0.78539816 1.04719755 1.57079633]
Sin values: [0.         0.5        0.70710678 0.8660254  1.        ]


## <a name='broadcast'></a>8️⃣ Broadcasting Rules

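Rules: compare the shapes dimension by dimension from right to left; two dimensions are compatible when they are equal or when one of them is 1 (the size-1 dimension is stretched to match); any other mismatch raises a `ValueError`. A missing leading dimension is treated as size 1.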
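
The rules can be checked without building any arrays by using `np.broadcast_shapes` (available in NumPy 1.20 and later). The shapes below are hypothetical examples chosen to illustrate each case:

```python
import numpy as np

# Shapes are aligned on the right; missing leading dimensions count as 1.
shape_a = np.broadcast_shapes((3, 1), (5,))
shape_b = np.broadcast_shapes((4, 1, 6), (5, 1))
print(shape_a)  # (3, 5)
print(shape_b)  # (4, 5, 6)

# A mismatch where neither size is 1 raises ValueError.
raised = False
try:
    np.ones((3, 4)) + np.ones(3)   # trailing dims 4 vs 3 are incompatible
except ValueError as err:
    raised = True
    print("broadcast error:", err)
```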

In [39]:
row = np.arange(5)
col = np.arange(3).reshape(3,1)
matrix = row + col  # broadcast to 3×5
print(matrix)


[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]]


In [40]:
# 🖊️ TODO: use broadcasting to create a 10×10 multiplication table.
a = np.arange(1, 11).reshape(10, 1)
b = np.arange(1, 11)
multiplication_table = a * b
print(multiplication_table)

[[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]


## <a name='stats'></a>9️⃣ Statistics & Aggregations

In [49]:
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
print('data\n', data)
print('row sums', data.sum(axis=1))
print('col means', data.mean(axis=0))


data
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
row sums [227  46 255 281 266]
col means [49.4 53.  55.4 57.2]


In [51]:
# 🖊️ TODO: compute `np.percentile` (25th, 50th, 75th) of flattened `data`.
percentiles = np.percentile(data.flatten(), [25, 50, 75])
print("25th, 50th, and 75th percentiles of flattened data:", percentiles)

25th, 50th, and 75th percentiles of flattened data: [30.  58.5 75. ]


## <a name='random'></a>🔟 Random Numbers & Reproducibility

In [None]:
rng = np.random.default_rng(42)
rand_floats = rng.random(5)
rand_ints = rng.integers(low=10, high=50, size=5)
print(rand_floats, rand_ints)
rng2 = np.random.default_rng(42)
assert np.allclose(rand_floats, rng2.random(5))


[0.77395605 0.43887844 0.85859792 0.69736803 0.09417735] [31 49 39 40 38]


In [52]:
# 🖊️ TODO: simulate rolling a fair six‑sided die 100 times; estimate proportion of 6s.
rng = np.random.default_rng()  # unseeded generator, so the counts below vary between runs
num_rolls = 100
rolls = rng.integers(1, 7, size=num_rolls) # Simulate 100 rolls (integers from 1 to 6 inclusive)

# Count the number of 6s
num_sixes = np.sum(rolls == 6)

# Estimate the proportion of 6s
proportion_sixes = num_sixes / num_rolls

print(f"Number of rolls: {num_rolls}")
print(f"Number of 6s: {num_sixes}")
print(f"Estimated proportion of 6s: {proportion_sixes:.4f}")

Number of rolls: 100
Number of 6s: 19
Estimated proportion of 6s: 0.1900


## <a name='structured'></a>1️⃣1️⃣ Structured / Record Arrays

In [53]:
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
print(people['name'], people['age'].mean())


['Alice' 'Bob'] 27.5


In [55]:
# 🖊️ TODO: add a new field 'height' to the structured array using `np.lib.recfunctions.append_fields` (hint: pip install?).

from numpy.lib.recfunctions import append_fields

# Sample height data (replace with actual data)
height_data = np.array([165.5, 180.0], dtype='f4') # Example heights for Alice and Bob

# Append the 'height' field
people_with_height = append_fields(people, 'height', height_data, usemask=False)

print("Structured array with 'height' field:")
display(people_with_height)

Structured array with 'height' field:


array([('Alice', 25, 55. , 165.5), ('Bob', 30, 85.5, 180. )],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f4'), ('height', '<f4')])

## <a name='linalg'></a>1️⃣2️⃣ Linear Algebra Essentials

In [56]:
A = np.random.random((3,3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))


A·x ≈ b? True


In [None]:
# 🖊️ TODO: compute eigenvalues of `A` using `np.linalg.eig`.
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues of A:")
print(eigenvalues)
# You can also print eigenvectors if needed:
# print("\nEigenvectors of A:")
# print(eigenvectors)

Eigenvalues of A:
[ 1.96546675+0.j         -0.1851633 +0.22605411j -0.1851633 -0.22605411j]
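
A general real matrix can have complex-conjugate eigenvalue pairs, as in the output above. To check what `np.linalg.eig` returns, the sketch below uses a hypothetical symmetric matrix `S` (so the eigenvalues are real) and verifies that each column of the eigenvector matrix satisfies S·v = λ·v:

```python
import numpy as np

# Hypothetical symmetric matrix chosen so the eigenvalues are real.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(S)

# Column i of `eigenvectors` pairs with eigenvalues[i]: S @ v_i == lambda_i * v_i.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(S @ v, eigenvalues[i] * v)

# The eigenvalues of this S are 1 and 3 (order returned by eig is not guaranteed).
print(np.sort(eigenvalues))
```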


## <a name='io'></a>1️⃣3️⃣ File I/O (`npy`, `npz`, `txt`)

In [None]:
np.save('array.npy', A)
loaded = np.load('array.npy')
print('loaded equals A?', np.allclose(loaded, A))
np.savez('multi_arrays.npz', A=A, b=b)


loaded equals A? True
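
The cell above writes an `.npz` archive but does not read it back. A minimal sketch of loading one, using hypothetical stand-in arrays `A_demo` and `b_demo` and a temporary directory so nothing is left on disk:

```python
import os
import tempfile

import numpy as np

# Hypothetical arrays standing in for A and b from the cell above.
A_demo = np.arange(9.0).reshape(3, 3)
b_demo = np.array([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multi_arrays.npz")
    np.savez(path, A=A_demo, b=b_demo)

    # np.load on an .npz returns a lazy, dict-like NpzFile; close it when done.
    with np.load(path) as archive:
        names = sorted(archive.files)
        A_loaded = archive["A"]
        b_loaded = archive["b"]

print(names)  # ['A', 'b']
```

The keyword names passed to `np.savez` become the keys of the archive.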


In [None]:
# 🖊️ TODO: Use `np.savetxt` to write `data` (from stats section) to CSV then reload with `np.loadtxt`.
file_path = 'data.csv'
np.savetxt(file_path, data, delimiter=',')
print(f"Data saved to {file_path}")

# Load data from the CSV file
loaded_data = np.loadtxt(file_path, delimiter=',')
print(f"\nData loaded from {file_path}:")
print(loaded_data)

# Verify that the loaded data is the same as the original data
print("\nLoaded data equals original data?", np.allclose(loaded_data, data))

Data saved to data.csv

Data loaded from data.csv:
[[85. 64. 51. 27.]
 [31.  5.  8.  2.]
 [18. 81. 65. 91.]
 [50. 61. 97. 73.]
 [63. 54. 56. 93.]]

Loaded data equals original data? True


## <a name='datetime'></a>1️⃣4️⃣ Datetime64 & Timedelta64

In [None]:
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
delta = dates[1:] - dates[:-1]
print(dates[:5], delta[0])


['2023-01-01' '2023-01-02' '2023-01-03' '2023-01-04' '2023-01-05'] 1 days


In [None]:
# 🖊️ TODO: find how many Mondays appear in `dates` array.
# The dates array was created in the previous cell:
# dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')

# datetime64[D] values are day counts since 1970-01-01, which was a Thursday.
# Shifting by 3 before taking modulo 7 therefore gives Monday=0 ... Sunday=6.
day_of_week = (dates.astype('datetime64[D]').astype(np.int64) + 3) % 7
mondays_count = np.sum(day_of_week == 0)

print(f"The 'dates' array contains {mondays_count} Mondays.")

The 'dates' array contains 13 Mondays.


## <a name='mask'></a>1️⃣5️⃣ Masked Arrays & NaNs

In [57]:
arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)
print(masked.mean())


2.3333333333333335


In [58]:
# 🖊️ TODO: replace NaNs with column means in a 2‑D array containing NaNs.

# 1. Create a sample array with NaNs
data_with_nan = np.array([[1, 2, np.nan, 4],
                          [5, np.nan, 7, 8],
                          [9, 10, 11, np.nan],
                          [13, 14, 15, 16]])

print("Original array with NaNs:")
print(data_with_nan)

Original array with NaNs:
[[ 1.  2. nan  4.]
 [ 5. nan  7.  8.]
 [ 9. 10. 11. nan]
 [13. 14. 15. 16.]]


In [59]:
# 2. Calculate column means, ignoring NaNs
# Use np.nanmean to compute the mean of each column, ignoring NaN values
column_means = np.nanmean(data_with_nan, axis=0)

print("\nColumn means (ignoring NaNs):")
print(column_means)


Column means (ignoring NaNs):
[ 7.          8.66666667 11.          9.33333333]


In [60]:
# 3. Replace NaNs with column means
# Find where the NaNs are in the array
nan_indices = np.isnan(data_with_nan)

# Replace NaNs with the corresponding column means
# We use boolean indexing to select the NaN values, and then use
# the column_means array indexed by the column index of the NaN.
# The column index for each NaN is given by np.where(nan_indices)[1]
data_filled = data_with_nan.copy() # Create a copy to avoid modifying the original array
data_filled[nan_indices] = column_means[np.where(nan_indices)[1]]

print("\nArray with NaNs replaced by column means:")
print(data_filled)


Array with NaNs replaced by column means:
[[ 1.          2.         11.          4.        ]
 [ 5.          8.66666667  7.          8.        ]
 [ 9.         10.         11.          9.33333333]
 [13.         14.         15.         16.        ]]


4. **Verify the result**: The printed `data_filled` array shows that the NaN values have been replaced with the calculated column means.

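
The verification in step 4 can also be done programmatically instead of by eye. The sketch below rebuilds the same sample array, fills the NaNs with `np.where` (an alternative to the boolean-index assignment used above, relying on broadcasting of the 1-D means across rows), and asserts the expected properties:

```python
import numpy as np

data_with_nan = np.array([[1, 2, np.nan, 4],
                          [5, np.nan, 7, 8],
                          [9, 10, 11, np.nan],
                          [13, 14, 15, 16]])
column_means = np.nanmean(data_with_nan, axis=0)

# np.where broadcasts the 1-D column_means across rows, so each NaN
# picks up the mean of its own column.
data_filled = np.where(np.isnan(data_with_nan), column_means, data_with_nan)

# No NaNs remain after filling.
assert not np.isnan(data_filled).any()
# Filling with the column mean leaves each column mean unchanged.
assert np.allclose(data_filled.mean(axis=0), column_means)
# Spot check: the NaN in row 0, column 2 became that column's mean.
assert data_filled[0, 2] == column_means[2]
```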

## <a name='project'></a>1️⃣6️⃣ Mini‑Project: Fitness Data Analysis

Load `fitness.txt` (tab‑separated) then follow prompts.
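
To see what `np.genfromtxt` produces with `names=True` and `dtype=None` without needing the real file, here is a sketch on a hypothetical two-row, three-column sample (the values are made up for illustration and are not taken from `fitness.txt`):

```python
import io

import numpy as np

# Hypothetical two-row sample; the real fitness.txt has more columns and rows.
sample = "date\tstep_count\tmood\n06-10-2017\t5464\t200\n07-10-2017\t6041\t100\n"

demo = np.genfromtxt(io.StringIO(sample), delimiter="\t",
                     dtype=None, encoding=None, names=True)

print(demo.dtype.names)         # ('date', 'step_count', 'mood')
print(demo["step_count"])       # integer column inferred from the text
print(demo["date"].dtype.kind)  # 'U': dates are read as strings, not datetime64
```

The header row becomes the field names of a structured array, and each column gets its own inferred dtype, which is why the date column still needs an explicit conversion before any time-based grouping.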

In [62]:
fitness = np.genfromtxt('fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)
print('columns:', fitness.dtype.names, 'rows:', len(fitness))


columns: ('date', 'step_count', 'mood', 'calories_burned', 'hours_of_sleep', 'bool_of_active', 'weight_kg') rows: 96


In [None]:
# 🖊️ TODO: Monthly step count, sleep vs mood correlation, weekly summary, etc.


In [66]:
import datetime

# 1. Convert the 'date' strings to datetime64
# Parse each string with datetime.strptime, trying 'MM-DD-YYYY' first.
# Caveat: a date whose day is 12 or less parses under either format, so the
# fallback below cannot detect a wrong guess; confirm the file's real format.
converted_dates = []
date_format = '%m-%d-%Y'
for d_str in fitness['date']:
    try:
        dt_obj = datetime.datetime.strptime(d_str, date_format)
        converted_dates.append(np.datetime64(dt_obj))
    except ValueError:
        # If 'MM-DD-YYYY' fails, switch to 'DD-MM-YYYY'.
        # Note: date_format stays switched for every later row as well.
        date_format = '%d-%m-%Y'
        dt_obj = datetime.datetime.strptime(d_str, date_format)
        converted_dates.append(np.datetime64(dt_obj))


fitness['date'] = np.array(converted_dates, dtype='datetime64[D]')

# 2. Extract the month and 3. Group by month
# Create a new array with month and year for grouping
months = fitness['date'].astype('datetime64[M]')

# Use np.unique to get unique months and their indices
unique_months, month_indices = np.unique(months, return_inverse=True)

# 4. Calculate the sum of 'step_count' for each month
monthly_step_count = np.bincount(month_indices, weights=fitness['step_count'])

# 5. Print the monthly step count analysis results
print("Monthly Step Count:")
for i, month in enumerate(unique_months):
    print(f"{month.astype('datetime64[M]').item().strftime('%Y-%m')}: {monthly_step_count[i]:.0f}")

Monthly Step Count:
2017-06: 5464
2017-07: 6041
2017-08: 25
2017-09: 5461
2017-10: 53175
2017-11: 107616
2017-12: 93905
2018-01: 10163


In [67]:
# 1. Extract the 'hours_of_sleep' and 'mood' columns
hours_of_sleep = fitness['hours_of_sleep']
mood = fitness['mood']

# 2. Calculate the Pearson correlation coefficient
# np.corrcoef returns a 2x2 matrix, the off-diagonal elements are the correlation coefficients
correlation_matrix = np.corrcoef(hours_of_sleep, mood)
correlation_coefficient = correlation_matrix[0, 1]

# 3. Print the calculated correlation coefficient
print(f"Correlation coefficient between hours of sleep and mood: {correlation_coefficient:.4f}")

Correlation coefficient between hours of sleep and mood: 0.2104


In [68]:
# 1. Extract the required columns
dates = fitness['date']
step_counts = fitness['step_count']
calories_burned = fitness['calories_burned']
hours_of_sleep = fitness['hours_of_sleep']

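# 2. Convert the 'date' column to datetime64[W] to group data by week
# (NumPy counts weeks from the 1970-01-01 epoch, a Thursday, so each weekly bin starts on a Thursday)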
weekly_dates = dates.astype('datetime64[W]')

# 3. Use np.unique with return_inverse=True on the weekly dates
unique_weeks, week_indices = np.unique(weekly_dates, return_inverse=True)

# 4. Calculate the sum of 'step_count' and 'calories_burned' for each week
weekly_step_count = np.bincount(week_indices, weights=step_counts)
weekly_calories_burned = np.bincount(week_indices, weights=calories_burned)

# 5. Calculate the average of 'hours_of_sleep' for each week
weekly_sleep_sum = np.bincount(week_indices, weights=hours_of_sleep)
weekly_counts = np.bincount(week_indices) # Count of entries per week
weekly_sleep_avg = weekly_sleep_sum / weekly_counts

# 6. Iterate through the unique weeks and print the weekly summary
print("Weekly Fitness Summary:")
for i, week_start_date in enumerate(unique_weeks):
    # Format the week start date for readability (e.g., YYYY-MM-DD)
    formatted_date = week_start_date.astype(datetime.datetime).strftime('%Y-%m-%d')
    print(f"Week starting {formatted_date}:")
    print(f"  Total Steps: {weekly_step_count[i]:.0f}")
    print(f"  Total Calories Burned: {weekly_calories_burned[i]:.0f}")
    print(f"  Average Hours of Sleep: {weekly_sleep_avg[i]:.2f}\n")


Weekly Fitness Summary:
Week starting 2017-06-08:
  Total Steps: 5464
  Total Calories Burned: 181
  Average Hours of Sleep: 5.00

Week starting 2017-07-06:
  Total Steps: 6041
  Total Calories Burned: 197
  Average Hours of Sleep: 8.00

Week starting 2017-08-10:
  Total Steps: 25
  Total Calories Burned: 0
  Average Hours of Sleep: 5.00

Week starting 2017-09-07:
  Total Steps: 5461
  Total Calories Burned: 174
  Average Hours of Sleep: 4.00

Week starting 2017-10-05:
  Total Steps: 6915
  Total Calories Burned: 223
  Average Hours of Sleep: 5.00

Week starting 2017-10-12:
  Total Steps: 15116
  Total Calories Burned: 482
  Average Hours of Sleep: 6.17

Week starting 2017-10-19:
  Total Steps: 19524
  Total Calories Burned: 621
  Average Hours of Sleep: 5.71

Week starting 2017-10-26:
  Total Steps: 16055
  Total Calories Burned: 517
  Average Hours of Sleep: 6.14

Week starting 2017-11-02:
  Total Steps: 24977
  Total Calories Burned: 811
  Average Hours of Sleep: 3.86

Week starting

## <a name='conclusion'></a>1️⃣7️⃣ Conclusion & Further Practice
Congrats on covering **all core NumPy topics** from your five lecture notebooks!

*Keep experimenting, read the official docs, and try converting your NumPy pipelines into Pandas or JAX for more fun.*