
# Working on a Real Dataset with Python

# (A part of Big Data Analysis)

--------

# The Weather Dataset

--------

The Weather Dataset is a time series of hourly weather observations at a single location. Each record holds the temperature, dew point temperature, relative humidity, wind speed, visibility, pressure, and a description of the weather conditions.

The data is provided as a CSV file, which we analyze here as a pandas DataFrame.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
data = pd.read_csv('/content/Weather Dataset.csv')

In [None]:
data

,Date/Time,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km,Press_kPa,Weather
0,1/1/2012 0:00,-1.8,-3.9,86,4,8.0,101.24,Fog
1,1/1/2012 1:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2,1/1/2012 2:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
3,1/1/2012 3:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
4,1/1/2012 4:00,-1.5,-3.3,88,7,4.8,101.23,Fog
...,...,...,...,...,...,...,...,...
8779,12/31/2012 19:00,0.1,-2.7,81,30,9.7,100.13,Snow
8780,12/31/2012 20:00,0.2,-2.4,83,24,9.7,100.03,Snow
8781,12/31/2012 21:00,-0.5,-1.5,93,28,4.8,99.95,Snow
8782,12/31/2012 22:00,-0.2,-1.8,89,28,9.7,99.91,Snow


**Date/Time** – The recorded date and time of the weather observation (timestamp of when the measurement was taken).

**Temp_C** – The air temperature measured in degrees Celsius (°C).

**Dew Point Temp_C** – The dew point temperature in °C, i.e., the temperature at which air becomes saturated with moisture and dew begins to form.

**Rel Hum_%** – Relative humidity expressed as a percentage (%), showing how much water vapor is in the air compared to the maximum it could hold at that temperature.

**Wind Speed_km/h** – The speed of the wind measured in kilometers per hour (km/h).

**Visibility_km** – The horizontal distance (in kilometers) one can clearly see, usually affected by fog, rain, snow, or other obstructions.

**Press_kPa** – Atmospheric (air) pressure measured in kilopascals (kPa).

**Weather** – A text description of the weather conditions at that time (e.g., “Clear,” “Rain,” “Snow,” “Fog,” “Cloudy”).

---------

# Analyzing the DataFrame

In [None]:
data.head(100)

,Date/Time,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km,Press_kPa,Weather
0,1/1/2012 0:00,-1.8,-3.9,86,4,8.0,101.24,Fog
1,1/1/2012 1:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2,1/1/2012 2:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
3,1/1/2012 3:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
4,1/1/2012 4:00,-1.5,-3.3,88,7,4.8,101.23,Fog
...,...,...,...,...,...,...,...,...
95,1/4/2012 23:00,-9.6,-12.6,79,6,9.7,100.42,Snow
96,1/5/2012 0:00,-8.8,-11.7,79,4,9.7,100.32,Snow
97,1/5/2012 1:00,-7.5,-10.2,81,0,9.7,100.29,Snow
98,1/5/2012 2:00,-5.4,-8.3,80,9,8.0,100.28,Snow


In [None]:
data.shape

(8784, 8)

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8784 entries, 0 to 8783
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Date/Time         8784 non-null   object 
 1   Temp_C            8784 non-null   float64
 2   Dew Point Temp_C  8784 non-null   float64
 3   Rel Hum_%         8784 non-null   int64  
 4   Wind Speed_km/h   8784 non-null   int64  
 5   Visibility_km     8784 non-null   float64
 6   Press_kPa         8784 non-null   float64
 7   Weather           8784 non-null   object 
dtypes: float64(4), int64(2), object(2)
memory usage: 549.1+ KB
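
The `info()` output lists `Date/Time` with dtype `object`, meaning the timestamps are stored as plain strings. The following is a minimal sketch of converting them to real datetimes. It inlines three rows copied from the table above so that it runs without the CSV file, and it assumes the month/day/year hour:minute layout seen in those rows. The names `sample_csv` and `sample` are illustrative only.

```python
import io

import pandas as pd

# Three rows copied from the table above, inlined so the sketch is self-contained.
sample_csv = io.StringIO(
    "Date/Time,Temp_C,Weather\n"
    "1/1/2012 0:00,-1.8,Fog\n"
    "1/1/2012 1:00,-1.8,Fog\n"
    "12/31/2012 22:00,-0.2,Snow\n"
)
sample = pd.read_csv(sample_csv)

# Assumption: every timestamp follows month/day/year hour:minute.
sample["Date/Time"] = pd.to_datetime(sample["Date/Time"], format="%m/%d/%Y %H:%M")

# With a datetime column, the .dt accessor exposes calendar fields.
months = sample["Date/Time"].dt.month.tolist()
print(months)  # [1, 1, 12]
```

Once the column is a datetime, it can also be sorted, compared, and resampled chronologically rather than as text.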


--------

Q3. Find the number of times the 'Wind Speed' was exactly 4 km/h

In [None]:
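# .count() would return 8784 here: it counts every entry of the boolean Series, True or False.
# .sum() counts only the True entries, i.e. the rows where the wind speed equals 4.
(data['Wind Speed_km/h'] == 4).sum()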
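
The difference between counting entries and counting matches is easy to check on a tiny Series. This sketch uses made-up wind speeds (not values from the dataset) and the illustrative names `wind` and `mask`.

```python
import pandas as pd

# Hypothetical wind speeds, chosen so that three of five values equal 4.
wind = pd.Series([4, 7, 4, 6, 4], name="Wind Speed_km/h")
mask = wind == 4

total_entries = mask.count()  # counts every non-null entry, True or False
matches = mask.sum()          # True counts as 1, False as 0
print(total_entries, matches)  # 5 3

# value_counts() reports the frequency of every distinct speed at once.
frequency_of_four = wind.value_counts().loc[4]
print(frequency_of_four)  # 3
```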

Q4.  Find out all the Null Values in the data

In [None]:
data.isnull().sum()

Date/Time,0
Temp_C,0
Dew Point Temp_C,0
Rel Hum_%,0
Wind Speed_km/h,0
Visibility_km,0
Press_kPa,0
Weather,0
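
Because this dataset has no missing values, every count above is zero. To see what `isnull().sum()` reports when values are missing, here is a small sketch on a hypothetical frame with two gaps. The frame `toy` and its contents are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one missing temperature and one missing description.
toy = pd.DataFrame({
    "Temp_C": [-1.8, np.nan, -1.5],
    "Weather": ["Fog", "Fog", None],
})

per_column = toy.isnull().sum()    # missing values per column
total_missing = int(per_column.sum())
print(int(per_column["Temp_C"]))   # 1
print(int(per_column["Weather"]))  # 1
print(total_missing)               # 2

# dropna() keeps only the rows with no missing value in any column.
complete_rows = toy.dropna()
print(len(complete_rows))          # 1
```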


Q5. Rename the column name 'Weather' of the dataframe to 'Weather Condition'

In [None]:
data.rename(columns={'Weather':'Weather Condition'}, inplace=True)

In [None]:
data.columns

Index(['Date/Time', 'Temp_C', 'Dew Point Temp_C', 'Rel Hum_%',
       'Wind Speed_km/h', 'Visibility_km', 'Press_kPa', 'Weather Condition'],
      dtype='object')
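
As an alternative to `inplace=True`, `rename` can return a new DataFrame and leave the original untouched, which makes a cell safe to re-run. A minimal sketch on a one-row hypothetical frame:

```python
import pandas as pd

# Hypothetical one-row frame with the original column name.
toy = pd.DataFrame({"Temp_C": [-1.8], "Weather": ["Fog"]})

# Without inplace=True, rename returns a renamed copy.
renamed = toy.rename(columns={"Weather": "Weather Condition"})

print(list(toy.columns))      # ['Temp_C', 'Weather']
print(list(renamed.columns))  # ['Temp_C', 'Weather Condition']
```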

Q6. What is the mean 'Visibility'

In [None]:
data["Visibility_km"].mean()

np.float64(27.664446721311478)

Q7. What is the Standard Deviation of 'Pressure' in this data

In [None]:
data['Press_kPa'].std()

0.8440047459486474

Q8. What is the Variance of 'Relative Humidity' in this data

In [None]:
data['Rel Hum_%'].var()

286.2485501984998
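
Note that pandas computes `std()` and `var()` with `ddof=1` by default, i.e. the sample statistic that divides by n - 1. The sketch below checks this on the first five `Rel Hum_%` values shown at the top of the notebook. The variable names are illustrative.

```python
import pandas as pd

# First five Rel Hum_% values from the table at the top of the notebook.
humidity = pd.Series([86, 87, 89, 88, 88])

sample_var = humidity.var()            # default ddof=1: divide by n - 1
population_var = humidity.var(ddof=0)  # divide by n
sample_std = humidity.std()            # square root of the sample variance

print(f"{sample_var:.2f}")      # 1.30
print(f"{population_var:.2f}")  # 1.04
```

With 8784 rows the two conventions differ only marginally, but the distinction matters when comparing against tools that default to the population formula.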

Q9. Find all instances when 'Snow' was recorded


In [None]:
data[data["Weather Condition"].str.contains("Snow", case=False, na=False)]

,Date/Time,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km,Press_kPa,Weather Condition
41,1/2/2012 17:00,-2.1,-9.5,57,22,25.0,99.66,Snow Showers
44,1/2/2012 20:00,-5.6,-13.4,54,24,25.0,100.07,Snow Showers
45,1/2/2012 21:00,-5.8,-12.8,58,26,25.0,100.15,Snow Showers
47,1/2/2012 23:00,-7.4,-14.1,59,17,19.3,100.27,Snow Showers
48,1/3/2012 0:00,-9.0,-16.0,57,28,25.0,100.35,Snow Showers
...,...,...,...,...,...,...,...,...
8779,12/31/2012 19:00,0.1,-2.7,81,30,9.7,100.13,Snow
8780,12/31/2012 20:00,0.2,-2.4,83,24,9.7,100.03,Snow
8781,12/31/2012 21:00,-0.5,-1.5,93,28,4.8,99.95,Snow
8782,12/31/2012 22:00,-0.2,-1.8,89,28,9.7,99.91,Snow
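
The result includes "Snow Showers" because `str.contains` matches substrings, whereas `==` matches only the exact label. A short sketch of the difference, using condition labels that appear in this notebook plus one hypothetical lowercase entry to show the effect of `case=False`:

```python
import pandas as pd

# The last entry is hypothetical; it only demonstrates case-insensitive matching.
conditions = pd.Series(["Snow", "Snow Showers", "Drizzle,Snow", "Fog", "snow"])

exact = conditions == "Snow"
substring = conditions.str.contains("Snow", case=False, na=False)

print(exact.sum())      # 1
print(substring.sum())  # 4
```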


Q10.  Find all instances when 'Wind Speed is above 24' and 'Visibility is 25'.

In [None]:
(data['Wind Speed_km/h'] > 24) & (data['Visibility_km'] == 25)

0,False
1,False
2,False
3,False
4,False
...,...
8779,False
8780,False
8781,False
8782,False
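
The expression above yields a boolean mask with one True/False per row. To display the matching records themselves, the mask can be used to index the DataFrame. A minimal sketch, using wind and visibility pairs borrowed from rows shown earlier in the notebook and re-indexed from 0 (`toy` and `matching_rows` are illustrative names):

```python
import pandas as pd

# Wind/visibility pairs borrowed from rows shown earlier, re-indexed from 0.
toy = pd.DataFrame({
    "Wind Speed_km/h": [22, 26, 28, 30],
    "Visibility_km": [25.0, 25.0, 25.0, 9.7],
})

mask = (toy["Wind Speed_km/h"] > 24) & (toy["Visibility_km"] == 25)

# Indexing with the mask keeps only the rows where it is True.
matching_rows = toy[mask]
print(matching_rows.index.tolist())  # [1, 2]
```

On the full dataset the equivalent would be `data[mask]` with the mask built from `data`.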


Q11. What is the Mean value of each column against each 'Weather Condition'?

In [None]:
data.groupby("Weather Condition").mean(numeric_only=True)

Weather Condition,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km,Press_kPa
Clear,6.825716,0.089367,64.497738,10.557315,30.153243,101.587443
Cloudy,7.970544,2.37581,69.592593,16.127315,26.625752,100.911441
Drizzle,7.353659,5.504878,88.243902,16.097561,17.931707,100.435366
"Drizzle,Fog",8.0675,7.03375,93.275,11.8625,5.2575,100.786625
"Drizzle,Ice Pellets,Fog",0.4,-0.7,92.0,20.0,4.0,100.79
"Drizzle,Snow",1.05,0.15,93.5,14.0,10.5,100.89
"Drizzle,Snow,Fog",0.693333,0.12,95.866667,15.533333,5.513333,99.281333
Fog,4.303333,3.159333,92.286667,7.946667,6.248,101.184067
Freezing Drizzle,-5.657143,-8.0,83.571429,16.571429,9.2,100.202857
"Freezing Drizzle,Fog",-2.533333,-4.183333,88.5,17.0,5.266667,100.441667

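The role of `numeric_only=True` is to skip text columns such as `Date/Time`, which cannot be averaged. A small sketch on four rows taken from the tables above (the frame name `toy` is illustrative):

```python
import pandas as pd

# Four rows taken from the tables shown earlier in the notebook.
toy = pd.DataFrame({
    "Date/Time": ["1/1/2012 0:00", "1/1/2012 4:00",
                  "12/31/2012 19:00", "12/31/2012 20:00"],
    "Temp_C": [-1.8, -1.5, 0.1, 0.2],
    "Weather Condition": ["Fog", "Fog", "Snow", "Snow"],
})

# Only numeric columns are averaged; the string column Date/Time is left out.
means = toy.groupby("Weather Condition").mean(numeric_only=True)

print(list(means.columns))  # ['Temp_C']
print(list(means.index))    # ['Fog', 'Snow']
```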

Q12. What is the Minimum & Maximum value of each column against each 'Weather Condition'?


In [None]:
data.groupby('Weather Condition').agg(['min', 'max'])

,Date/Time,Date/Time,Temp_C,Temp_C,Dew Point Temp_C,Dew Point Temp_C,Rel Hum_%,Rel Hum_%,Wind Speed_km/h,Wind Speed_km/h,Visibility_km,Visibility_km,Press_kPa,Press_kPa
Weather Condition,min,max,min,max,min,max,min,max,min,max,min,max,min,max
Clear,1/11/2012 1:00,9/9/2012 5:00,-23.3,32.8,-28.5,20.4,20,99,0,33,11.3,48.3,99.52,103.63
Cloudy,1/1/2012 17:00,9/9/2012 23:00,-21.4,30.5,-26.8,22.6,18,99,0,54,11.3,48.3,98.39,103.65
Drizzle,1/23/2012 21:00,9/30/2012 3:00,1.1,18.8,-0.2,17.7,74,96,0,30,6.4,25.0,97.84,101.56
"Drizzle,Fog",1/23/2012 20:00,9/30/2012 2:00,0.0,19.9,-1.6,19.1,85,100,0,28,1.0,9.7,98.65,102.07
"Drizzle,Ice Pellets,Fog",12/17/2012 9:00,12/17/2012 9:00,0.4,0.4,-0.7,-0.7,92,92,20,20,4.0,4.0,100.79,100.79
"Drizzle,Snow",12/17/2012 15:00,12/19/2012 18:00,0.9,1.2,0.1,0.2,92,95,9,19,9.7,11.3,100.63,101.15
"Drizzle,Snow,Fog",12/18/2012 21:00,12/22/2012 3:00,0.3,1.1,-0.1,0.6,92,98,7,32,2.4,9.7,97.79,100.15
Fog,1/1/2012 0:00,9/22/2012 0:00,-16.0,20.8,-17.2,19.6,80,100,0,22,0.2,9.7,98.31,103.04
Freezing Drizzle,1/13/2012 10:00,2/1/2012 5:00,-9.0,-2.3,-12.2,-3.3,78,93,6,26,4.8,12.9,98.44,101.02
"Freezing Drizzle,Fog",1/1/2012 2:00,12/10/2012 5:00,-6.4,-0.3,-9.0,-2.3,82,94,6,33,3.6,8.0,98.74,101.27

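One caveat about the `Date/Time` columns in this result: the values are strings, so `min` and `max` compare them character by character rather than chronologically. That is why a September timestamp such as `9/9/2012 5:00` appears as a maximum. The sketch below contrasts the two orderings. The first two timestamps come from the "Clear" row above, the third is a hypothetical December timestamp, and the month/day/year format is an assumption based on the rows shown.

```python
import pandas as pd

# First two values are from the "Clear" row above; the third is hypothetical.
stamps = pd.Series(["1/11/2012 1:00", "9/9/2012 5:00", "12/1/2012 0:00"])

# String comparison: "9..." sorts after "1...", so September wins.
latest_as_text = stamps.max()
print(latest_as_text)  # 9/9/2012 5:00

# Chronological comparison after parsing (assumed month/day/year layout).
parsed = pd.to_datetime(stamps, format="%m/%d/%Y %H:%M")
latest_as_datetime = parsed.max()
print(latest_as_datetime)  # 2012-12-01 00:00:00
```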

Q13. Show all the Records where Weather Condition is Fog

In [None]:
data[data['Weather Condition'] == 'Fog']

,Date/Time,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km,Press_kPa,Weather Condition
0,1/1/2012 0:00,-1.8,-3.9,86,4,8.0,101.24,Fog
1,1/1/2012 1:00,-1.8,-3.7,87,4,8.0,101.24,Fog
4,1/1/2012 4:00,-1.5,-3.3,88,7,4.8,101.23,Fog
5,1/1/2012 5:00,-1.4,-3.3,87,9,6.4,101.27,Fog
6,1/1/2012 6:00,-1.5,-3.1,89,7,6.4,101.29,Fog
...,...,...,...,...,...,...,...,...
8716,12/29/2012 4:00,-16.0,-17.2,90,6,9.7,101.25,Fog
8717,12/29/2012 5:00,-14.8,-15.9,91,4,6.4,101.25,Fog
8718,12/29/2012 6:00,-13.8,-15.3,88,4,9.7,101.25,Fog
8719,12/29/2012 7:00,-14.8,-16.4,88,7,8.0,101.22,Fog


Q14. Find all instances when 'Weather is Clear' or 'Visibility is above 40'

In [None]:
(data['Weather Condition'] == "Clear") | (data['Visibility_km'] > 40)

0,False
1,False
2,False
3,False
4,False
...,...
8779,False
8780,False
8781,False
8782,False
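
As with the `&` case, this expression produces a mask rather than the records. The sketch below selects the rows with `.loc` on a hypothetical four-row frame, and notes why each comparison is wrapped in parentheses. The frame `toy` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows covering each combination of the two conditions.
toy = pd.DataFrame({
    "Weather Condition": ["Clear", "Cloudy", "Fog", "Cloudy"],
    "Visibility_km": [25.0, 48.3, 8.0, 25.0],
})

# Each comparison needs its own parentheses because | binds more tightly
# than == and > in Python.
mask = (toy["Weather Condition"] == "Clear") | (toy["Visibility_km"] > 40)

# Row 0 is Clear, row 1 has visibility above 40; either condition is enough.
selected = toy.loc[mask]
print(selected.index.tolist())  # [0, 1]
```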


Q15. Find all instances when 'Weather is Clear' and 'Relative Humidity is greater than 50'

In [None]:
(data['Weather Condition'] == "Clear") & (data['Rel Hum_%'] > 50)

0,False
1,False
2,False
3,False
4,False
...,...
8779,False
8780,False
8781,False
8782,False


--------