# Weather analysis of a dataset for a particular location

The weather dataset is a time series with hourly readings of the weather conditions at a particular location. Each record holds the temperature, dew point temperature, relative humidity, wind speed, visibility, pressure, and a weather description.

The data is available as a CSV file. Below is a simple analysis of this data using NumPy and pandas.

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv("weather-data.csv")

In [3]:
display(df)

Unnamed: 0,Date/Time,Temperature (in Degree Celsius),Dew Point Temperature (in Degree Celsius),Relative Humidity (in %),Wind Speed (in km/h),Visibility (in km),Pressure (in kilopascal),Weather Description
0,1/1/2012 0:00,-1.8,-3.9,86,4,8.0,101.24,Fog
1,1/1/2012 1:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2,1/1/2012 2:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
3,1/1/2012 3:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
4,1/1/2012 4:00,-1.5,-3.3,88,7,4.8,101.23,Fog
...,...,...,...,...,...,...,...,...
8779,12/31/2012 19:00,0.1,-2.7,81,30,9.7,100.13,Snow
8780,12/31/2012 20:00,0.2,-2.4,83,24,9.7,100.03,Snow
8781,12/31/2012 21:00,-0.5,-1.5,93,28,4.8,99.95,Snow
8782,12/31/2012 22:00,-0.2,-1.8,89,28,9.7,99.91,Snow
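
Because the readings are hourly, it can help to parse the `Date/Time` column into real timestamps before analysing it further. The following is only a sketch under stated assumptions: it reads a three-row inline sample copied from the table above instead of `weather-data.csv`, and it assumes the month/day/year layout that the displayed rows suggest. The names `sample_csv`, `sample_df` and `january_rows` are illustrative.

```python
import io
import pandas as pd

# Three rows copied from the table above; this stands in for weather-data.csv.
sample_csv = io.StringIO(
    "Date/Time,Temperature (in Degree Celsius)\n"
    "1/1/2012 0:00,-1.8\n"
    "1/1/2012 1:00,-1.8\n"
    "12/31/2012 22:00,-0.2\n"
)
sample_df = pd.read_csv(sample_csv)

# Assumed format: month/day/year hour:minute, as the displayed rows suggest.
sample_df["Date/Time"] = pd.to_datetime(
    sample_df["Date/Time"], format="%m/%d/%Y %H:%M"
)

# With a DatetimeIndex, rows can be selected by a partial date string.
sample_df = sample_df.set_index("Date/Time")
january_rows = sample_df.loc["2012-01"]
print(january_rows)
```

Stating the format explicitly avoids any ambiguity between day-first and month-first dates.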


## Finding mean temperature

In [4]:
display(df["Temperature (in Degree Celsius)"])

0      -1.8
1      -1.8
2      -1.8
3      -1.5
4      -1.5
       ... 
8779    0.1
8780    0.2
8781   -0.5
8782   -0.2
8783    0.0
Name: Temperature (in Degree Celsius), Length: 8784, dtype: float64

In [5]:
temperature_arr = df["Temperature (in Degree Celsius)"].to_numpy()

In [6]:
temperature_mean = np.mean(temperature_arr)

In [7]:
print(temperature_mean)

8.798144353369764


## Finding mean dew point temperature

In [8]:
display(df["Dew Point Temperature (in Degree Celsius)"])

0      -3.9
1      -3.7
2      -3.4
3      -3.2
4      -3.3
       ... 
8779   -2.7
8780   -2.4
8781   -1.5
8782   -1.8
8783   -2.1
Name: Dew Point Temperature (in Degree Celsius), Length: 8784, dtype: float64

In [9]:
dew_point_temperature_arr = df["Dew Point Temperature (in Degree Celsius)"].to_numpy()

In [10]:
dew_point_temperature_mean = np.mean(dew_point_temperature_arr)

In [11]:
print(dew_point_temperature_mean)

2.5552937158469944


## Finding mean relative humidity

In [12]:
display(df["Relative Humidity (in %)"])

0       86
1       87
2       89
3       88
4       88
        ..
8779    81
8780    83
8781    93
8782    89
8783    86
Name: Relative Humidity (in %), Length: 8784, dtype: int64

In [13]:
relative_humidity_arr = df["Relative Humidity (in %)"].to_numpy()

In [14]:
relative_humidity_mean = np.mean(relative_humidity_arr)

In [15]:
print(relative_humidity_mean)

67.43169398907104


## Finding mean wind speed

In [16]:
display(df["Wind Speed (in km/h)"])

0        4
1        4
2        7
3        6
4        7
        ..
8779    30
8780    24
8781    28
8782    28
8783    30
Name: Wind Speed (in km/h), Length: 8784, dtype: int64

In [17]:
wind_speed_arr = df["Wind Speed (in km/h)"].to_numpy()

In [18]:
wind_speed_mean = np.mean(wind_speed_arr)

In [19]:
print(wind_speed_mean)

14.94546903460838


## Finding mean visibility

In [20]:
display(df["Visibility (in km)"])

0        8.0
1        8.0
2        4.0
3        4.0
4        4.8
        ... 
8779     9.7
8780     9.7
8781     4.8
8782     9.7
8783    11.3
Name: Visibility (in km), Length: 8784, dtype: float64

In [21]:
visibility_arr = df["Visibility (in km)"].to_numpy()

In [22]:
print(visibility_arr)

[ 8.   8.   4.  ...  4.8  9.7 11.3]


In [23]:
visibility_mean = np.mean(visibility_arr)

In [24]:
print(visibility_mean)

27.664446721311478


## Finding mean pressure

In [25]:
display(df["Pressure (in kilopascal)"])

0       101.24
1       101.24
2       101.26
3       101.27
4       101.23
         ...  
8779    100.13
8780    100.03
8781     99.95
8782     99.91
8783     99.89
Name: Pressure (in kilopascal), Length: 8784, dtype: float64

In [26]:
pressure_arr = df["Pressure (in kilopascal)"].to_numpy()

In [27]:
pressure_mean = np.mean(pressure_arr)

In [28]:
print(pressure_mean)

101.05162340619307
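
The sections above repeat the same steps for each column: select it, convert it to a NumPy array, and call `np.mean`. As a possible shortcut, pandas can compute the mean of every numeric column in one call. The sketch below uses a small hypothetical `sample_df` built from the first five rows shown earlier, not the full dataset.

```python
import pandas as pd

# First five rows from the table above; the notebook itself would use df.
sample_df = pd.DataFrame({
    "Temperature (in Degree Celsius)": [-1.8, -1.8, -1.8, -1.5, -1.5],
    "Wind Speed (in km/h)": [4, 4, 7, 6, 7],
    "Weather Description": [
        "Fog", "Fog", "Freezing Drizzle,Fog", "Freezing Drizzle,Fog", "Fog",
    ],
})

# numeric_only=True skips text columns such as the weather description.
column_means = sample_df.mean(numeric_only=True)
print(column_means)
```

The result is a Series indexed by column name, so a single mean can be read with `column_means["Wind Speed (in km/h)"]`.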


## Handling null values in the dataframe

### Counting the number of individual cells that are null in the entire dataframe

In [29]:
null_count = df.isnull().sum().sum()

In [30]:
print("There are", null_count, "null values in the dataframe.")

There are 0 null values in the dataframe.


### Counting the number of rows that have at least one null value

In [31]:
null_rows_count = df.isnull().any(axis=1).sum()

In [32]:
print("There are", null_rows_count, "rows that contain at least one null value.")

There are 0 rows that contain at least one null value.


### Counting the number of columns that have at least one null value

In [33]:
null_columns_count = df.isnull().any(axis=0).sum()

In [34]:
print("There are", null_columns_count, "columns that contain at least one null value.")

There are 0 columns that contain at least one null value.
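
Since the weather data has no nulls, all three counts above are zero and it is hard to see how they differ. The sketch below applies the same expressions to a hypothetical `toy_df` that does contain gaps; the column names and values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; the weather data itself has no nulls.
toy_df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

null_cells = toy_df.isnull().sum().sum()          # individual null cells
null_rows = toy_df.isnull().any(axis=1).sum()     # rows with at least one null
null_columns = toy_df.isnull().any(axis=0).sum()  # columns with at least one null
nulls_per_column = toy_df.isnull().sum()          # breakdown by column

print(null_cells, null_rows, null_columns)
print(nulls_per_column)
```

Here three cells are null, but they fall in only two rows and two columns, which is why the three counts are worth reporting separately.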


## Displaying all the records where the weather condition is 'Snow'

In [35]:
snow_rows = df[df["Weather Description"] == 'Snow']

In [36]:
display(snow_rows)

Unnamed: 0,Date/Time,Temperature (in Degree Celsius),Dew Point Temperature (in Degree Celsius),Relative Humidity (in %),Wind Speed (in km/h),Visibility (in km),Pressure (in kilopascal),Weather Description
55,1/3/2012 7:00,-14.0,-19.5,63,19,25.0,100.95,Snow
84,1/4/2012 12:00,-13.7,-21.7,51,11,24.1,101.25,Snow
86,1/4/2012 14:00,-11.3,-19.0,53,7,19.3,100.97,Snow
87,1/4/2012 15:00,-10.2,-16.3,61,11,9.7,100.89,Snow
88,1/4/2012 16:00,-9.4,-15.5,61,13,19.3,100.79,Snow
...,...,...,...,...,...,...,...,...
8779,12/31/2012 19:00,0.1,-2.7,81,30,9.7,100.13,Snow
8780,12/31/2012 20:00,0.2,-2.4,83,24,9.7,100.03,Snow
8781,12/31/2012 21:00,-0.5,-1.5,93,28,4.8,99.95,Snow
8782,12/31/2012 22:00,-0.2,-1.8,89,28,9.7,99.91,Snow


In [37]:
print("The number of rows where the weather is 'Snow':", len(snow_rows))

The number of rows where the weather is 'Snow': 390
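
Note that `== 'Snow'` matches only rows whose description is exactly `Snow`. The table at the top shows that descriptions can be combined (for example `Freezing Drizzle,Fog`), so rows with snow plus another condition would be excluded. A minimal sketch of a broader match follows; the label `Snow,Fog` is a hypothetical value used for illustration and is not confirmed to occur in the dataset.

```python
import pandas as pd

# "Snow,Fog" is a hypothetical combined label used only for illustration.
descriptions = pd.Series(["Snow", "Fog", "Snow,Fog", "Freezing Drizzle,Fog"])

# Exact match: only the plain "Snow" entry.
exact_snow = descriptions[descriptions == "Snow"]

# Substring match: any description that mentions "Snow".
any_snow = descriptions[descriptions.str.contains("Snow", regex=False)]

print(len(exact_snow), len(any_snow))
```

Substring matching is deliberately broad: it would also pick up any other label that contains the word, so the right choice depends on the question being asked.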


## Finding all the records where the weather is 'Snow' and the temperature is below -10 degrees Celsius
(Humans cannot survive at -10 degrees Celsius for long without clothing and other protection from the environment.)

In [38]:
snow_rows_less_than_negative_10_temperature = df[(df["Weather Description"] == "Snow") & (df["Temperature (in Degree Celsius)"] < -10)]

In [39]:
display(snow_rows_less_than_negative_10_temperature)

Unnamed: 0,Date/Time,Temperature (in Degree Celsius),Dew Point Temperature (in Degree Celsius),Relative Humidity (in %),Wind Speed (in km/h),Visibility (in km),Pressure (in kilopascal),Weather Description
55,1/3/2012 7:00,-14.0,-19.5,63,19,25.0,100.95,Snow
84,1/4/2012 12:00,-13.7,-21.7,51,11,24.1,101.25,Snow
86,1/4/2012 14:00,-11.3,-19.0,53,7,19.3,100.97,Snow
87,1/4/2012 15:00,-10.2,-16.3,61,11,9.7,100.89,Snow
123,1/6/2012 3:00,-10.6,-16.0,64,0,9.7,100.76,Snow
124,1/6/2012 4:00,-11.3,-16.1,68,15,3.2,100.7,Snow
125,1/6/2012 5:00,-11.8,-16.0,71,19,2.8,100.61,Snow
126,1/6/2012 6:00,-12.0,-16.2,71,22,4.8,100.58,Snow
127,1/6/2012 7:00,-14.4,-16.3,85,22,2.4,100.52,Snow
128,1/6/2012 8:00,-12.3,-16.2,73,24,11.3,100.51,Snow


In [40]:
print("Number of rows with temperature below -10 degrees Celsius and weather description 'Snow':", len(snow_rows_less_than_negative_10_temperature))

Number of rows with temperature below -10 degrees Celsius and weather description 'Snow': 36
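
The combined filter relies on the parentheses around each condition, because `&` binds more tightly than `==` and `<` in Python. One way to make this harder to get wrong is to name each boolean mask first. The sketch below uses a hypothetical four-row `sample_df`; the first three rows are taken from the tables above and the `Fog` row is invented for contrast.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Temperature (in Degree Celsius)": [-14.0, -9.4, 0.1, -12.0],
    # The last row is a made-up non-snow record, added for contrast.
    "Weather Description": ["Snow", "Snow", "Snow", "Fog"],
})

# Each mask is a boolean Series with one entry per row.
is_snow = sample_df["Weather Description"] == "Snow"
is_below_minus_10 = sample_df["Temperature (in Degree Celsius)"] < -10

# & combines the masks element-wise; both conditions must hold.
cold_snow_rows = sample_df[is_snow & is_below_minus_10]
print(cold_snow_rows)
```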


## Finding minimum and maximum values of temperature

In [41]:
display(df["Temperature (in Degree Celsius)"])

0      -1.8
1      -1.8
2      -1.8
3      -1.5
4      -1.5
       ... 
8779    0.1
8780    0.2
8781   -0.5
8782   -0.2
8783    0.0
Name: Temperature (in Degree Celsius), Length: 8784, dtype: float64

In [42]:
print("Minimum temperature:", df["Temperature (in Degree Celsius)"].min())

Minimum temperature: -23.3


In [43]:
print("Maximum temperature:", df["Temperature (in Degree Celsius)"].max())

Maximum temperature: 33.0
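
The minimum and maximum alone do not say when those temperatures occurred. As a possible follow-up, `idxmin` and `idxmax` return the index labels of the extreme values, which can then be used to look up the full records. The sketch uses a hypothetical three-row `sample_df` built from rows shown earlier, so its extremes are not those of the full dataset.

```python
import pandas as pd

# Three rows copied from the tables above; not the full dataset.
sample_df = pd.DataFrame({
    "Date/Time": ["1/3/2012 7:00", "1/4/2012 12:00", "12/31/2012 19:00"],
    "Temperature (in Degree Celsius)": [-14.0, -13.7, 0.1],
})

temperature = sample_df["Temperature (in Degree Celsius)"]

# idxmin/idxmax give the index label of the first extreme value.
coldest_row = sample_df.loc[temperature.idxmin()]
warmest_row = sample_df.loc[temperature.idxmax()]

print(coldest_row["Date/Time"], warmest_row["Date/Time"])
```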


## Finding the variance of relative humidity

In [44]:
variance_relative_humidity = df["Relative Humidity (in %)"].var()

In [45]:
print(variance_relative_humidity)

286.24855019850196


## Finding the standard deviation of pressure

In [46]:
standard_deviation_pressure = df["Pressure (in kilopascal)"].std()

In [47]:
print(round(standard_deviation_pressure, 6))

0.844005
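
The standard deviation is the square root of the variance, and pandas and NumPy use different defaults when computing both. pandas `var()` and `std()` divide by n - 1 (`ddof=1`, the sample estimate), while `np.var` and `np.std` divide by n (`ddof=0`). The sketch below illustrates this on a hypothetical `pressure_sample` made of the first five pressure readings shown earlier.

```python
import numpy as np
import pandas as pd

# First five pressure readings from the table above.
pressure_sample = pd.Series([101.24, 101.24, 101.26, 101.27, 101.23])

sample_variance = pressure_sample.var()   # pandas default: ddof=1
sample_std = pressure_sample.std()        # pandas default: ddof=1

# NumPy defaults to ddof=0, so the result is slightly smaller.
population_std = np.std(pressure_sample.to_numpy())

# Passing ddof=1 makes NumPy agree with pandas.
numpy_sample_std = np.std(pressure_sample.to_numpy(), ddof=1)

print(sample_variance, sample_std, population_std, numpy_sample_std)
```

With 8784 rows the two conventions give nearly the same number, but it is worth knowing which one a reported value uses.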


## Finding the records in which the wind speed is greater than 70 km/h
(By 75 km/h, the wind is strong enough to cause slight structural damage, and by 90 km/h it can uproot entire trees.)

In [48]:
dangerous_wind_speed = df[df["Wind Speed (in km/h)"] > 70]

In [49]:
display(dangerous_wind_speed)

Unnamed: 0,Date/Time,Temperature (in Degree Celsius),Dew Point Temperature (in Degree Celsius),Relative Humidity (in %),Wind Speed (in km/h),Visibility (in km),Pressure (in kilopascal),Weather Description
409,1/18/2012 1:00,3.7,-2.1,66,83,25.0,98.36,Mostly Cloudy


In [50]:
print("Number of records in which wind speed is greater than 70 km/h:", len(dangerous_wind_speed))

Number of records in which wind speed is greater than 70 km/h: 1
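
A fixed threshold returns only one record here. To see how that record compares with the next strongest winds, one option is `DataFrame.nlargest`, which returns the rows with the highest values in a column. The sketch uses a hypothetical three-row `sample_df` built from rows shown earlier, not the full dataset.

```python
import pandas as pd

# Three rows copied from the tables above; not the full dataset.
sample_df = pd.DataFrame({
    "Date/Time": ["12/31/2012 19:00", "1/18/2012 1:00", "12/31/2012 20:00"],
    "Wind Speed (in km/h)": [30, 83, 24],
})

# The two rows with the highest wind speed, strongest first.
windiest = sample_df.nlargest(2, "Wind Speed (in km/h)")
print(windiest)
```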
