# Weather Dataset Analysis

This notebook analyzes an hourly weather time series. Each record contains:
- Temperature
- Dew Point Temperature
- Relative Humidity
- Wind Speed
- Visibility
- Pressure
- Weather Conditions

The analysis uses Pandas to inspect the dataset's structure, tidy its column names, and answer a series of filtering and aggregation questions.

In [45]:
import pandas as pd

# Load the dataset; report the problem and re-raise so that later cells
# do not run against an undefined weather_df
try:
    weather_df = pd.read_csv("Weather Dataset.csv")
    print("Dataset loaded successfully")
except FileNotFoundError:
    print("Error: File not found. Please check the file path.")
    raise
except Exception as e:
    print(f"An error occurred while loading the dataset: {e}")
    raise

Dataset loaded successfully


## Initial Data Exploration
Let's examine the basic structure and characteristics of our dataset.

In [46]:
print("\n=== First 5 Rows ===")
display(weather_df.head())


=== First 5 Rows ===


,Date/Time,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km,Press_kPa,Weather
0,1/1/2012 0:00,-1.8,-3.9,86,4,8.0,101.24,Fog
1,1/1/2012 1:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2,1/1/2012 2:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
3,1/1/2012 3:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
4,1/1/2012 4:00,-1.5,-3.3,88,7,4.8,101.23,Fog


In [47]:
print("\n=== DataFrame Shape ===")
print(f"Rows: {weather_df.shape[0]}, Columns: {weather_df.shape[1]}")


=== DataFrame Shape ===
Rows: 8784, Columns: 8


In [48]:
print("\n=== Column Information ===")
print(weather_df.columns.tolist())


=== Column Information ===
['Date/Time', 'Temp_C', 'Dew Point Temp_C', 'Rel Hum_%', 'Wind Speed_km/h', 'Visibility_km', 'Press_kPa', 'Weather']


In [49]:
print("\n=== Data Types ===")
print(weather_df.dtypes)


=== Data Types ===
Date/Time            object
Temp_C              float64
Dew Point Temp_C    float64
Rel Hum_%             int64
Wind Speed_km/h       int64
Visibility_km       float64
Press_kPa           float64
Weather              object
dtype: object
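
The dtypes above show `Date/Time` stored as `object` (plain strings), so date-based filtering or resampling would not work directly. A minimal sketch of converting it, assuming every timestamp follows the month/day/year hour:minute layout seen in the first rows; the `sample` frame is a stand-in, not the full dataset:

```python
import pandas as pd

# Stand-in frame that mimics the timestamp layout shown in the first rows.
sample = pd.DataFrame({
    "Date/Time": ["1/1/2012 0:00", "1/1/2012 1:00", "12/30/2012 23:00"],
    "Temp_C": [-1.8, -1.8, -12.1],
})

# Assumption: all timestamps match "%m/%d/%Y %H:%M".
sample["Date/Time"] = pd.to_datetime(sample["Date/Time"], format="%m/%d/%Y %H:%M")

print(sample.dtypes)
# With a datetime column, the .dt accessor exposes calendar fields.
print(sample["Date/Time"].dt.month.tolist())
```

Passing an explicit `format` avoids ambiguity between day-first and month-first dates; if the real file contains other layouts, the call would raise and the format would need adjusting.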


In [50]:
print("\n=== Basic Statistics ===")
display(weather_df.describe(include='all'))


=== Basic Statistics ===


,Date/Time,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km,Press_kPa,Weather
count,8784,8784.0,8784.0,8784.0,8784.0,8784.0,8784.0,8784
unique,8784,,,,,,,50
top,1/1/2012 0:00,,,,,,,Mainly Clear
freq,1,,,,,,,2106
mean,,8.798144,2.555294,67.431694,14.945469,27.664447,101.051623,
std,,11.687883,10.883072,16.918881,8.688696,12.622688,0.844005,
min,,-23.3,-28.5,18.0,0.0,0.2,97.52,
25%,,0.1,-5.9,56.0,9.0,24.1,100.56,
50%,,9.3,3.3,68.0,13.0,25.0,101.07,
75%,,18.8,11.8,81.0,20.0,25.0,101.59,


In [51]:
print("\n=== Missing Values Summary ===")
print(weather_df.isnull().sum())


=== Missing Values Summary ===
Date/Time           0
Temp_C              0
Dew Point Temp_C    0
Rel Hum_%           0
Wind Speed_km/h     0
Visibility_km       0
Press_kPa           0
Weather             0
dtype: int64


## Data Cleaning and Preparation

In [52]:
# Rename columns for consistency (Visibility_km already follows the target style)
weather_df.rename(columns={
    'Weather': 'Weather_Condition',
    'Wind Speed_km/h': 'Wind_Speed_kmh',
    'Press_kPa': 'Pressure_kPa',
    'Rel Hum_%': 'Relative_Humidity_pct'
}, inplace=True)

In [53]:
# Verify changes
print("Updated columns:", weather_df.columns.tolist())

Updated columns: ['Date/Time', 'Temp_C', 'Dew Point Temp_C', 'Relative_Humidity_pct', 'Wind_Speed_kmh', 'Visibility_km', 'Pressure_kPa', 'Weather_Condition']
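
The updated list still contains `Date/Time` and `Dew Point Temp_C`, which keep a slash and spaces. As one possible alternative to a hand-written mapping, the sketch below passes a function to `rename`. The helper `to_snake` is hypothetical and produces different names from the mapping above (for example `Rel_Hum_pct` rather than `Relative_Humidity_pct`):

```python
import re

import pandas as pd


def to_snake(name: str) -> str:
    """Hypothetical helper: make a column name identifier-friendly."""
    name = name.replace("%", "pct").replace("/", "_")
    return re.sub(r"\s+", "_", name.strip())


# Empty frame carrying only the original column names from the dataset.
sample = pd.DataFrame(columns=[
    "Date/Time", "Temp_C", "Dew Point Temp_C", "Rel Hum_%", "Wind Speed_km/h",
])

# DataFrame.rename accepts a callable that is applied to every column label.
renamed = sample.rename(columns=to_snake)
print(renamed.columns.tolist())
```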


## Analysis Questions

In [54]:
# Q1: Unique wind speeds
unique_wind_speeds = weather_df['Wind_Speed_kmh'].unique()
print("\n1. Unique wind speeds:", sorted(unique_wind_speeds))


1. Unique wind speeds: [0, 2, 4, 6, 7, 9, 11, 13, 15, 17, 19, 20, 22, 24, 26, 28, 30, 32, 33, 35, 37, 39, 41, 43, 44, 46, 48, 50, 52, 54, 57, 63, 70, 83]


In [55]:
# Q2: Count of records whose condition is exactly 'Clear' ('Mainly Clear' is not included)
clear_weather_count = weather_df[weather_df['Weather_Condition'] == 'Clear'].shape[0]
print(f"\n2. Clear weather occurrences: {clear_weather_count}")


2. Clear weather occurrences: 1326
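
Counting a single label with a boolean filter works, but `value_counts` gives the count for every label in one pass. A small sketch on made-up labels (not the real data):

```python
import pandas as pd

# Illustrative labels only; the real column has 50 distinct values.
conditions = pd.Series(
    ["Clear", "Fog", "Clear", "Mainly Clear", "Fog", "Clear"],
    name="Weather_Condition",
)

# Frequency of every distinct label, most common first.
counts = conditions.value_counts()
print(counts)

# .get avoids a KeyError if the label never occurs.
clear_count = int(counts.get("Clear", 0))
print("Clear:", clear_count)
```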


In [56]:
# Q3: Wind speed exactly 4 km/h
wind_speed_4_count = weather_df[weather_df['Wind_Speed_kmh'] == 4].shape[0]
print(f"\n3. Wind speed = 4 km/h occurrences: {wind_speed_4_count}")

# Q4: Null values check (see the Missing Values Summary above; no column has nulls)


3. Wind speed = 4 km/h occurrences: 474


In [57]:
# Q6-8: Statistical measures
print("\n6. Mean Visibility:", weather_df['Visibility_km'].mean())
print("7. Pressure Standard Deviation:", weather_df['Pressure_kPa'].std())
print("8. Relative Humidity Variance:", weather_df['Relative_Humidity_pct'].var())


6. Mean Visibility: 27.664446721311478
7. Pressure Standard Deviation: 0.8440047459486474
8. Relative Humidity Variance: 286.2485501984998
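
Pandas computes `std()` and `var()` as sample statistics (`ddof=1`) by default, so the variance is the square of the standard deviation: the humidity standard deviation of 16.918881 reported in the statistics table squares to roughly 286.25, in line with the variance printed above. A short sketch using the five humidity values from the first rows of the dataset:

```python
import pandas as pd

# Relative humidity values taken from the first five rows shown earlier.
humidity = pd.Series([86, 87, 89, 88, 88])

sample_var = humidity.var()            # divides by n - 1 (ddof=1, the default)
sample_std = humidity.std()            # square root of the sample variance
population_var = humidity.var(ddof=0)  # divides by n

print(sample_var, sample_std ** 2, population_var)
```

The two variances differ noticeably on five values; on 8784 rows the difference is negligible.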


In [58]:
# Q9: Records whose condition mentions snow (case-insensitive substring match)
snow_records = weather_df[weather_df['Weather_Condition'].str.contains('Snow', case=False, na=False)]
print(f"\n9. Snow records found: {snow_records.shape[0]}")


9. Snow records found: 583


In [59]:
# Q10: Wind speed above 24 km/h and visibility of exactly 25 km
high_wind_good_vis = weather_df[(weather_df['Wind_Speed_kmh'] > 24) & 
                           (weather_df['Visibility_km'] == 25)]
print("\n10. Records with wind >24 km/h and visibility 25 km:")
display(high_wind_good_vis)


10. Records with wind >24 km/h and visibility 25 km:


,Date/Time,Temp_C,Dew Point Temp_C,Relative_Humidity_pct,Wind_Speed_kmh,Visibility_km,Pressure_kPa,Weather_Condition
23,1/1/2012 23:00,5.3,2.0,79,30,25.0,99.31,Cloudy
24,1/2/2012 0:00,5.2,1.5,77,35,25.0,99.26,Rain Showers
25,1/2/2012 1:00,4.6,0.0,72,39,25.0,99.26,Cloudy
26,1/2/2012 2:00,3.9,-0.9,71,32,25.0,99.26,Mostly Cloudy
27,1/2/2012 3:00,3.7,-1.5,69,33,25.0,99.30,Mostly Cloudy
...,...,...,...,...,...,...,...,...
8705,12/28/2012 17:00,-8.6,-12.0,76,26,25.0,101.34,Mainly Clear
8753,12/30/2012 17:00,-12.1,-15.8,74,28,25.0,101.26,Mainly Clear
8755,12/30/2012 19:00,-13.4,-16.5,77,26,25.0,101.47,Mainly Clear
8759,12/30/2012 23:00,-12.1,-15.1,78,28,25.0,101.52,Mostly Cloudy
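
Because the renamed columns are valid Python identifiers, the same filter can also be written with `DataFrame.query`. The sketch below uses a made-up four-row frame and also shows that `== 25` excludes rows with higher visibility, which `>= 25` would keep:

```python
import pandas as pd

# Made-up rows for illustration.
sample = pd.DataFrame({
    "Wind_Speed_kmh": [30, 4, 26, 35],
    "Visibility_km": [25.0, 8.0, 48.3, 25.0],
})

# Equivalent to the boolean-mask filter: wind above 24 and visibility exactly 25.
exactly_25 = sample.query("Wind_Speed_kmh > 24 and Visibility_km == 25")

# Variant that treats 25 km as a lower bound instead.
at_least_25 = sample.query("Wind_Speed_kmh > 24 and Visibility_km >= 25")

print(exactly_25.index.tolist())
print(at_least_25.index.tolist())
```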


In [60]:
# Q11: Mean by weather condition
print("\n11. Mean values by weather condition:")
display(weather_df.groupby('Weather_Condition').mean(numeric_only=True))


11. Mean values by weather condition:


Weather_Condition,Temp_C,Dew Point Temp_C,Relative_Humidity_pct,Wind_Speed_kmh,Visibility_km,Pressure_kPa
Clear,6.825716,0.089367,64.497738,10.557315,30.153243,101.587443
Cloudy,7.970544,2.37581,69.592593,16.127315,26.625752,100.911441
Drizzle,7.353659,5.504878,88.243902,16.097561,17.931707,100.435366
"Drizzle,Fog",8.0675,7.03375,93.275,11.8625,5.2575,100.786625
"Drizzle,Ice Pellets,Fog",0.4,-0.7,92.0,20.0,4.0,100.79
"Drizzle,Snow",1.05,0.15,93.5,14.0,10.5,100.89
"Drizzle,Snow,Fog",0.693333,0.12,95.866667,15.533333,5.513333,99.281333
Fog,4.303333,3.159333,92.286667,7.946667,6.248,101.184067
Freezing Drizzle,-5.657143,-8.0,83.571429,16.571429,9.2,100.202857
"Freezing Drizzle,Fog",-2.533333,-4.183333,88.5,17.0,5.266667,100.441667


In [61]:
# Q12: Min/Max by weather condition
print("\n12a. Minimum values by weather condition:")
display(weather_df.groupby('Weather_Condition').min(numeric_only=True))
print("\n12b. Maximum values by weather condition:")
display(weather_df.groupby('Weather_Condition').max(numeric_only=True))


12a. Minimum values by weather condition:


Weather_Condition,Temp_C,Dew Point Temp_C,Relative_Humidity_pct,Wind_Speed_kmh,Visibility_km,Pressure_kPa
Clear,-23.3,-28.5,20,0,11.3,99.52
Cloudy,-21.4,-26.8,18,0,11.3,98.39
Drizzle,1.1,-0.2,74,0,6.4,97.84
"Drizzle,Fog",0.0,-1.6,85,0,1.0,98.65
"Drizzle,Ice Pellets,Fog",0.4,-0.7,92,20,4.0,100.79
"Drizzle,Snow",0.9,0.1,92,9,9.7,100.63
"Drizzle,Snow,Fog",0.3,-0.1,92,7,2.4,97.79
Fog,-16.0,-17.2,80,0,0.2,98.31
Freezing Drizzle,-9.0,-12.2,78,6,4.8,98.44
"Freezing Drizzle,Fog",-6.4,-9.0,82,6,3.6,98.74



12b. Maximum values by weather condition:


Weather_Condition,Temp_C,Dew Point Temp_C,Relative_Humidity_pct,Wind_Speed_kmh,Visibility_km,Pressure_kPa
Clear,32.8,20.4,99,33,48.3,103.63
Cloudy,30.5,22.6,99,54,48.3,103.65
Drizzle,18.8,17.7,96,30,25.0,101.56
"Drizzle,Fog",19.9,19.1,100,28,9.7,102.07
"Drizzle,Ice Pellets,Fog",0.4,-0.7,92,20,4.0,100.79
"Drizzle,Snow",1.2,0.2,95,19,11.3,101.15
"Drizzle,Snow,Fog",1.1,0.6,98,32,9.7,100.15
Fog,20.8,19.6,100,22,9.7,103.04
Freezing Drizzle,-2.3,-3.3,93,26,12.9,101.02
"Freezing Drizzle,Fog",-0.3,-2.3,94,33,8.0,101.27
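
The mean, minimum, and maximum tables come from three separate `groupby` calls. A possible alternative is a single `agg` call, which returns all three statistics side by side under a two-level column index. A sketch on a made-up frame whose non-grouping columns are all numeric:

```python
import pandas as pd

# Made-up rows; only numeric columns besides the grouping key.
sample = pd.DataFrame({
    "Weather_Condition": ["Fog", "Fog", "Clear", "Clear"],
    "Temp_C": [-1.8, -1.5, 5.0, 7.0],
    "Visibility_km": [8.0, 4.8, 25.0, 48.3],
})

# One pass over the groups; columns become (measurement, statistic) pairs.
summary = sample.groupby("Weather_Condition").agg(["mean", "min", "max"])
print(summary)

# Individual cells are addressed with a (column, statistic) tuple.
print(summary.loc["Fog", ("Temp_C", "min")])
```

On the full dataset the string `Date/Time` column would need to be dropped or excluded first, since `mean` is not defined for it.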


In [62]:
# Q13: Records whose condition is exactly 'Fog' (combined labels such as 'Drizzle,Fog' are excluded)
fog_records = weather_df[weather_df['Weather_Condition'] == 'Fog']
print(f"\n13. Fog records found: {fog_records.shape[0]}")


13. Fog records found: 150
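
Q9 and Q13 use different matching rules: Q9 matches any label containing "Snow", while Q13 matches only the exact label "Fog". Since the `Weather_Condition` column stores combined labels separated by commas, a third option is to split them and count each phenomenon on its own. A sketch on a handful of labels (the standalone "Snow" entry is an assumed example):

```python
import pandas as pd

conditions = pd.Series([
    "Drizzle,Snow", "Drizzle,Snow,Fog", "Fog",
    "Freezing Drizzle,Fog", "Clear", "Snow",
])

# Substring match (Q9 style) versus exact match (Q13 style).
substring_hits = int(conditions.str.contains("Snow", case=False, na=False).sum())
exact_hits = int((conditions == "Snow").sum())

# Split combined labels into one row per phenomenon, then count.
individual = conditions.str.split(",").explode().str.strip()
per_label = individual.value_counts()

print(substring_hits, exact_hits)
print(per_label)
```

Note that after splitting, "Freezing Drizzle" stays a label of its own and is not counted under "Drizzle".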


In [63]:
# Q14: Clear weather or visibility above 40 km (only the last 50 matches are displayed)
clear_or_good_vis = weather_df[(weather_df['Weather_Condition'] == 'Clear') | 
                          (weather_df['Visibility_km'] > 40)]
print("\n14. Records with clear weather OR visibility >40 km:")
display(clear_or_good_vis.tail(50))


14. Records with clear weather OR visibility >40 km:


,Date/Time,Temp_C,Dew Point Temp_C,Relative_Humidity_pct,Wind_Speed_kmh,Visibility_km,Pressure_kPa,Weather_Condition
8387,12/15/2012 11:00,-9.3,-14.9,64,19,48.3,102.74,Mainly Clear
8388,12/15/2012 12:00,-9.1,-15.1,62,19,48.3,102.71,Mainly Clear
8389,12/15/2012 13:00,-8.4,-14.7,60,19,48.3,102.64,Clear
8390,12/15/2012 14:00,-8.0,-14.2,61,13,48.3,102.59,Mainly Clear
8391,12/15/2012 15:00,-7.8,-13.7,63,15,48.3,102.55,Mainly Clear
8392,12/15/2012 16:00,-8.5,-14.8,60,20,48.3,102.54,Mainly Clear
8394,12/15/2012 18:00,-9.1,-15.1,62,17,25.0,102.54,Clear
8396,12/15/2012 20:00,-8.7,-15.1,60,20,25.0,102.5,Clear
8408,12/16/2012 8:00,-9.5,-14.8,65,32,48.3,101.85,Cloudy
8599,12/24/2012 7:00,-11.1,-13.9,80,15,25.0,101.23,Clear


In [64]:
# Q15: (Clear weather with relative humidity above 50%) or visibility above 40 km
complex_condition = weather_df[((weather_df['Weather_Condition'] == 'Clear') & 
                          (weather_df['Relative_Humidity_pct'] > 50)) | 
                         (weather_df['Visibility_km'] > 40)]
print("\n15. Records meeting complex condition:")
display(complex_condition)


15. Records meeting complex condition:
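
The Q15 filter depends on how the three conditions are grouped. Naming each mask makes the grouping explicit and easy to change. The sketch below, on a made-up four-row frame, contrasts the grouping used in Q15 with the other plausible reading:

```python
import pandas as pd

# Made-up rows for illustration.
sample = pd.DataFrame({
    "Weather_Condition": ["Clear", "Clear", "Cloudy", "Fog"],
    "Relative_Humidity_pct": [60, 40, 70, 90],
    "Visibility_km": [25.0, 25.0, 48.3, 6.0],
})

is_clear = sample["Weather_Condition"] == "Clear"
is_humid = sample["Relative_Humidity_pct"] > 50
far_view = sample["Visibility_km"] > 40

# Grouping used in Q15: (clear AND humid) OR far visibility.
q15_style = sample[(is_clear & is_humid) | far_view]

# Alternative reading: clear AND (humid OR far visibility).
alternative = sample[is_clear & (is_humid | far_view)]

print(q15_style.index.tolist())
print(alternative.index.tolist())
```

The parentheses around each comparison matter as well, because `&` and `|` bind more tightly than `==` and `>` in Python.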
