# Week 1 - Day 3 Lab: Data & Matrix Manipulation
In this lab, you'll work with a realistic weather dataset. You'll use **Pandas** to explore and clean the data, and **NumPy** to perform matrix operations.

**Dataset:** `hourly_weather_10_days.csv` (10 days of hourly weather data)

## Step 1: Load the Data
- Use Pandas to load the CSV file
- Display the first few rows
- Check the number of rows and columns

In [None]:
# TODO: Load the data into a DataFrame
import pandas as pd

# Replace the file path if needed
df = pd.read_csv('hourly_weather_10_days.csv')
df.head()

,timestamp,temperature_C,humidity_%,wind_speed_kmph,pressure_hPa,visibility_km
0,2023-03-01 00:00:00,16.6,74.4,5.7,1012.5,9.5
1,2023-03-01 01:00:00,16.2,78.5,5.0,1012.1,10.3
2,2023-03-01 02:00:00,15.3,73.3,4.7,,11.1
3,2023-03-01 03:00:00,15.8,72.4,1.3,1005.0,8.9
4,2023-03-01 04:00:00,20.9,70.6,6.8,1016.3,9.8
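
The third bullet (checking the number of rows and columns) is not covered by the cell above. A minimal sketch of one way to do it, using a tiny made-up frame named `sample` in place of the real CSV so the snippet runs on its own:

```python
import pandas as pd

# Tiny made-up frame standing in for the weather data (illustrative values only)
sample = pd.DataFrame({
    "timestamp": ["2023-03-01 00:00:00", "2023-03-01 01:00:00", "2023-03-01 02:00:00"],
    "temperature_C": [16.6, 16.2, 15.3],
    "pressure_hPa": [1012.5, 1012.1, None],
})

# .shape is an attribute (no parentheses) holding (rows, columns)
n_rows, n_cols = sample.shape
print(f"{n_rows} rows x {n_cols} columns")
```

On the lab data the same pattern would be `df.shape`.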


## Step 2: Basic Exploration
- Check column names and data types
- Display basic statistics using `.describe()`
- Count missing values in each column

In [None]:
# TODO: Explore the DataFrame
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print(df.describe())
print(df.isna().sum())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   timestamp        240 non-null    object 
 1   temperature_C    228 non-null    float64
 2   humidity_%       224 non-null    float64
 3   wind_speed_kmph  226 non-null    float64
 4   pressure_hPa     223 non-null    float64
 5   visibility_km    228 non-null    float64
dtypes: float64(5), object(1)
memory usage: 11.4+ KB
       temperature_C  humidity_%  wind_speed_kmph  pressure_hPa  visibility_km
count     228.000000  224.000000       226.000000    223.000000     228.000000
mean       21.315789   66.795982        10.105310   1011.884753       9.989474
std         3.421237    8.190300         3.940668      5.187080       1.022166
min        11.500000   47.800000         1.300000    998.100000       6.800000
25%        18.700000   61.075000         6.625000   1008.900000       9.275
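
Missing counts are often easier to judge as a share of all rows. The sketch below derives both from the non-null counts reported by `df.info()` above (240 rows in total); the names `non_null` and `summary` are only illustrative.

```python
import pandas as pd

# Non-null counts copied from the df.info() report; the frame has 240 rows
total_rows = 240
non_null = pd.Series({
    "timestamp": 240,
    "temperature_C": 228,
    "humidity_%": 224,
    "wind_speed_kmph": 226,
    "pressure_hPa": 223,
    "visibility_km": 228,
})

missing = total_rows - non_null                       # missing values per column
missing_pct = (missing / total_rows * 100).round(1)   # same, as a percentage of rows

summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
print(summary)
```

On the live DataFrame, `df.isna().sum()` and `df.isna().mean() * 100` should give the same two columns directly.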

## Step 3: Handle Missing Values
- Drop or fill missing values
- Justify your approach (e.g., fill with mean, forward fill, etc.)

In [None]:
# TODO: Fill missing values
# Example: df['column'] = df['column'].fillna(df['column'].mean())

# Fill each numeric column's gaps with that column's mean.
# Filling (rather than dropping rows) keeps all 240 hourly rows, which the
# (10, 24) reshape in Step 5 relies on. The trade-off: a single mean per
# column ignores the time-of-day pattern in the readings.
numeric_cols = ['temperature_C', 'humidity_%', 'wind_speed_kmph', 'pressure_hPa', 'visibility_km']
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

print(df.isna().sum())



timestamp          0
temperature_C      0
humidity_%         0
wind_speed_kmph    0
pressure_hPa       0
visibility_km      0
dtype: int64
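
To help with the "justify your approach" bullet, here is a small side-by-side of three fill strategies on made-up hourly temperatures. It is only a sketch: the values and the names `temps` and `comparison` are illustrative, and which strategy suits the real data is a judgment call.

```python
import numpy as np
import pandas as pd

# Made-up hourly temperatures with two gaps (illustrative values only)
idx = pd.date_range("2023-03-01 00:00", periods=6, freq=pd.Timedelta(hours=1))
temps = pd.Series([16.0, 15.0, np.nan, 13.0, np.nan, 17.0], index=idx)

mean_filled = temps.fillna(temps.mean())         # one constant for every gap
forward_filled = temps.ffill()                   # repeat the last observed value
interpolated = temps.interpolate(method="time")  # straight line between neighbours

comparison = pd.DataFrame({
    "raw": temps,
    "mean": mean_filled,
    "ffill": forward_filled,
    "interp": interpolated,
})
print(comparison)
```

Mean filling leaves the column mean unchanged but flattens local trends; forward fill and interpolation follow the neighbouring readings, which tends to matter more for hourly series.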


## Step 4: Data Analysis
- Calculate daily average temperature
- Find max, min, mean for each metric
- Which hour of the day is the most humid on average?

In [None]:
# TODO: Perform analysis
# Use groupby, aggregation, and filtering functions
# Placeholder example:
# df['timestamp'] = pd.to_datetime(df['timestamp'])
# df['hour'] = df['timestamp'].dt.hour
# avg_humidity_by_hour = df.groupby('hour')['humidity_%'].mean()

df['timestamp'] = pd.to_datetime(df['timestamp'])  # convert to datetime
df['date'] = df['timestamp'].dt.date               # extract date for daily grouping
df['hour'] = df['timestamp'].dt.hour               # extract hour for hourly analysis

daily_avg_temp = df.groupby('date')['temperature_C'].mean()


metrics = ['temperature_C', 'humidity_%', 'wind_speed_kmph', 'pressure_hPa', 'visibility_km']

result_metrics = df[metrics].agg(['max', 'min', 'mean'])


avg_humidity_by_hour = df.groupby('hour')['humidity_%'].mean()
most_humid_hour = avg_humidity_by_hour.idxmax()  # gives hour with highest average humidity
most_humid_value = avg_humidity_by_hour.max()    # value of max humidity



print("Daily Average Temperature:\n", daily_avg_temp)
print("\nFind max, min, mean for each metric:\n", result_metrics)
print(f"\nMost humid hour on average: {most_humid_hour}:00 with {most_humid_value:.2f}% humidity")


Daily Average Temperature:
 date
2023-03-01    21.263158
2023-03-02    21.258991
2023-03-03    21.304825
2023-03-04    21.425658
2023-03-05    21.529825
2023-03-06    21.858333
2023-03-07    21.179825
2023-03-08    20.947807
2023-03-09    20.792325
2023-03-10    21.597149
Name: temperature_C, dtype: float64

Find max, min, mean for each metric:
       temperature_C  humidity_%  wind_speed_kmph  pressure_hPa  visibility_km
max       28.700000   88.100000         17.80000   1027.000000      12.600000
min       11.500000   47.800000          1.30000    998.100000       6.800000
mean      21.315789   66.795982         10.10531   1011.884753       9.989474

Most humid hour on average: 1:00 with 78.42% humidity
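
An alternative to adding a `date` column and grouping on it is to resample on a datetime index. The sketch below uses two made-up days of constant readings in a frame named `demo`; it is one possible approach, not a replacement for the solution above.

```python
import numpy as np
import pandas as pd

# Two made-up days of hourly readings: 20.0 all of day one, 22.0 all of day two
idx = pd.date_range("2023-03-01", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame(
    {"temperature_C": np.r_[np.full(24, 20.0), np.full(24, 22.0)]},
    index=idx,
)

# resample("D") buckets rows by calendar day; agg computes several stats at once
daily = demo["temperature_C"].resample("D").agg(["min", "max", "mean"])
print(daily)
```

Resampling needs the timestamps as the index (for example via `df.set_index('timestamp')` after `pd.to_datetime`), whereas the `groupby('date')` approach works on an ordinary column.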


## Step 5: NumPy Matrix Exercises
Convert relevant DataFrame columns into NumPy arrays and perform matrix operations.

In [None]:
# TODO: Extract temperature and wind_speed as NumPy arrays
import numpy as np

temp = df['temperature_C'].to_numpy()
wind = df['wind_speed_kmph'].to_numpy()

### a) Reshape into matrix form
- Assume each row is a day
- Reshape temperature into a (10, 24) matrix
- Calculate daily min, max, and mean using axis-based operations

In [None]:
# TODO: Reshape and aggregate
# Hint: temp_matrix = temp.reshape((10, 24))
# Write functions to find min, max, mean across rows

temp_matrix = temp.reshape((10, 24)) # 10 days × 24 hours/day.

def daily_min(matrix):
    return np.min(matrix, axis=1)

def daily_max(matrix):
    return np.max(matrix, axis=1)

def daily_mean(matrix):
    return np.mean(matrix, axis=1)


daily_min_temp = daily_min(temp_matrix)
daily_max_temp = daily_max(temp_matrix)
daily_mean_temp = daily_mean(temp_matrix)

print("Daily Min Temperature:\n", daily_min_temp)
print("\nDaily Max Temperature:\n", daily_max_temp)
print("\nDaily Mean Temperature:\n", daily_mean_temp)

Daily Min Temperature:
 [14.7 15.7 13.6 15.9 12.4 15.5 15.3 13.5 14.3 11.5]

Daily Max Temperature:
 [28.2 28.7 25.7 27.1 24.9 26.2 25.9 26.  27.1 28.5]

Daily Mean Temperature:
 [21.26315789 21.25899123 21.30482456 21.42565789 21.52982456 21.85833333
 21.17982456 20.94780702 20.79232456 21.59714912]
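
The reshape only makes sense if the array holds a whole number of days, in chronological order. A hedged sketch of a guard around it follows; `to_daily_matrix` is a hypothetical helper, not part of the lab scaffold.

```python
import numpy as np

HOURS_PER_DAY = 24

def to_daily_matrix(values, hours_per_day=HOURS_PER_DAY):
    """Reshape a 1-D hourly array into (n_days, hours_per_day).

    Hypothetical helper: raises instead of silently misaligning days
    when the length is not a multiple of hours_per_day.
    """
    values = np.asarray(values)
    if values.ndim != 1 or values.size % hours_per_day != 0:
        raise ValueError(
            f"expected a 1-D array whose length is a multiple of {hours_per_day}, "
            f"got shape {values.shape}"
        )
    # -1 lets NumPy infer the number of days from the array length
    return values.reshape(-1, hours_per_day)

demo = np.arange(48, dtype=float)  # two made-up days: 0..23 and 24..47
matrix = to_daily_matrix(demo)
print(matrix.shape)  # (2, 24)
```

The check does not verify ordering; sorting by timestamp before extracting the array is still the caller's job.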


### b) Normalize the temperature matrix
- Subtract the mean and divide by the standard deviation (z-score normalization)
- Do it manually using NumPy functions

In [None]:
# TODO: Normalize temp_matrix
# Placeholder for function: def normalize(matrix):
# return ...

# Apply it to temp_matrix

def normalize(matrix):
    mean = np.mean(matrix)
    std = np.std(matrix)
    return (matrix - mean) / std

normalized_temp_matrix = normalize(temp_matrix)

print("Normalized Matrix:\n", normalized_temp_matrix)
print("\nMean of normalized matrix:", np.mean(normalized_temp_matrix))
print("Std Deviation of normalized matrix:", np.std(normalized_temp_matrix))


Normalized Matrix:
 [[-1.4173072  -1.53752522 -1.80801577 -1.65774325 -0.12496347 -0.15501798
   0.44607213  0.35590862 -0.03479995  2.06901543  0.          1.28759828
   1.22748927  0.62639917  1.88868839  0.74661719  0.20563609  0.71656268
   0.53623565  0.2356906  -0.93643512 -0.57578105 -0.48561753 -1.98834281]
 [-1.62768874 -1.68779775 -1.11676215 -0.39545402  0.20563609 -0.03479995
   0.41601763  0.05536356  0.68650818  2.21928795 -0.24518149  1.13732576
   0.2356906   0.95699873  0.38596312  0.17558158  0.77667169  0.
   0.86683521  0.20563609 -0.42550852 -0.75610808 -1.44736171 -0.99654413]
 [-2.31894237 -1.68779775 -0.8462716  -1.62768874  1.31765279 -0.12496347
  -0.15501798 -0.18507248  0.29579961  1.31765279  1.04716224 -0.03479995
   1.25754378  0.17558158  0.32585411  1.01710774  0.92694422  0.
   0.59634466  0.92694422  0.41601763 -0.12496347 -0.81621709 -1.77796127]
 [-1.3872527  -0.78616259 -1.02659863 -1.59763424  0.05536356  0.2356906
   1.73841587  0.71656268  0.175
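
The `normalize` function above uses a single mean and standard deviation for the whole matrix. If each day should instead be scaled against its own statistics, `axis=1` with `keepdims=True` is one way to do it. This is a sketch on made-up numbers, and `normalize_rows` is a hypothetical name.

```python
import numpy as np

def normalize_rows(matrix):
    """Z-score each row using that row's own mean and standard deviation."""
    # keepdims=True keeps shape (n_rows, 1) so the result broadcasts across columns
    row_mean = matrix.mean(axis=1, keepdims=True)
    row_std = matrix.std(axis=1, keepdims=True)  # population std (ddof=0), NumPy's default
    return (matrix - row_mean) / row_std

# Made-up 2 x 3 matrix (illustrative values only)
demo = np.array([[10.0, 12.0, 14.0],
                 [20.0, 25.0, 30.0]])
z = normalize_rows(demo)

print(z.mean(axis=1))  # each row mean is approximately 0
print(z.std(axis=1))   # each row std is approximately 1
```

A row with zero spread would produce a division by zero, so a real dataset may need a guard for that case. Note also that pandas' `.std()` defaults to the sample formula (`ddof=1`), so its result differs slightly from `np.std`.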

### c) Apply custom mask/filter
- Create a mask for wind speed > 15 km/h
- Use it to extract high-wind readings

In [None]:
# TODO: Create boolean mask and filter wind speeds
# mask = wind > 15
# high_wind = wind[mask]

mask = wind > 15
high_wind = wind[mask]

print("High Wind Speeds (>15 km/h):\n", high_wind)
print("\nTotal High-Wind Readings:", len(high_wind))

High Wind Speeds (>15 km/h):
 [17.6 16.  16.5 16.3 16.7 15.8 17.8 15.1 16.3 15.2 17.  15.9 15.6 15.8
 15.4 15.6 16.3 15.3 16.2 16.9 15.3 15.2 15.5 17.4 17.4 15.4 15.4 16.5
 17.  15.7]

Total High-Wind Readings: 30
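
Filtering with a mask returns the values but drops their position. A small sketch of recovering which day and which reading each high-wind value came from, assuming a flat, chronologically ordered array; the data and the names `wind_demo` and `readings_per_day` are made up for illustration.

```python
import numpy as np

# Made-up wind speeds: 2 days x 4 readings per day (illustrative values only)
wind_demo = np.array([3.0, 16.2, 9.5, 15.0, 17.4, 4.1, 15.1, 8.8])
readings_per_day = 4

mask = wind_demo > 15                # strictly greater, so 15.0 is excluded
count = int(mask.sum())              # True counts as 1
flat_idx = np.flatnonzero(mask)      # positions of the True entries

# Split each flat position into (day index, reading index within the day)
day, slot = np.divmod(flat_idx, readings_per_day)

for d, s, w in zip(day, slot, wind_demo[mask]):
    print(f"day {d + 1}, reading {s}: {w} km/h")
```

With the lab's hourly data, `readings_per_day` would be 24 and `slot` would be the hour of the day.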


## Final Challenge: Write Your Own Function
Write a function `daily_summary(matrix)` that takes a NumPy matrix of shape (10, 24) and returns a list of summary dictionaries, one per day, each holding that day's `min`, `max`, and `mean`.

In [None]:
# TODO: Write and test your function

def daily_summary(matrix):
    # Create an empty list to hold daily summaries
    summaries = []

    # Loop through each day's data (each row in the matrix)
    for row in matrix:
        # Calculate daily min, max, and mean
        day_min = np.min(row)
        day_max = np.max(row)
        day_mean = np.mean(row)

        # Create a dictionary with the summary
        summary = {
            'min': day_min,
            'max': day_max,
            'mean': day_mean
        }

        # Append the dictionary to the list
        summaries.append(summary)

    # Return the list of daily summaries
    return summaries

# Example usage:
summaries = daily_summary(temp_matrix)
for i, summary in enumerate(summaries):
    print(f"Day {i+1} Summary: {summary}")



Day 1 Summary: {'min': np.float64(14.7), 'max': np.float64(28.2), 'mean': np.float64(21.263157894736846)}
Day 2 Summary: {'min': np.float64(15.7), 'max': np.float64(28.7), 'mean': np.float64(21.258991228070176)}
Day 3 Summary: {'min': np.float64(13.6), 'max': np.float64(25.7), 'mean': np.float64(21.304824561403507)}
Day 4 Summary: {'min': np.float64(15.9), 'max': np.float64(27.1), 'mean': np.float64(21.425657894736844)}
Day 5 Summary: {'min': np.float64(12.4), 'max': np.float64(24.9), 'mean': np.float64(21.52982456140351)}
Day 6 Summary: {'min': np.float64(15.5), 'max': np.float64(26.2), 'mean': np.float64(21.858333333333334)}
Day 7 Summary: {'min': np.float64(15.3), 'max': np.float64(25.9), 'mean': np.float64(21.17982456140351)}
Day 8 Summary: {'min': np.float64(13.5), 'max': np.float64(26.0), 'mean': np.float64(20.947807017543862)}
Day 9 Summary: {'min': np.float64(14.3), 'max': np.float64(27.1), 'mean': np.float64(20.792324561403507)}
Day 10 Summary: {'min': np.float64(11.5), 'max':
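
The loop above works row by row and leaves `np.float64(...)` wrappers in the printed dictionaries. A possible variant computes all three statistics with axis-based calls and converts them to plain Python floats. It is a sketch only; `daily_summary_vectorized` is a hypothetical name and the demo matrix is made up.

```python
import numpy as np

def daily_summary_vectorized(matrix):
    """Return one {'min', 'max', 'mean'} dict per row, using plain floats."""
    mins = matrix.min(axis=1)
    maxs = matrix.max(axis=1)
    means = matrix.mean(axis=1)
    # float() strips the NumPy scalar type so the dicts print cleanly
    return [
        {"min": float(lo), "max": float(hi), "mean": float(avg)}
        for lo, hi, avg in zip(mins, maxs, means)
    ]

# Made-up 2 x 3 matrix (illustrative values only)
demo = np.array([[1.0, 2.0, 3.0],
                 [4.0, 6.0, 8.0]])
result = daily_summary_vectorized(demo)

for i, summary in enumerate(result, start=1):
    print(f"Day {i} Summary: {summary}")
```

The output structure matches the loop version, so either can be passed to the same printing code.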

## ✅ Submit your notebook once complete.
- Add comments where necessary