## Project Weather Condition Analysis 

The Weather Dataset is a **time-series data set** with hourly information about the weather conditions at a particular location. It records Temperature, Dew Point Temperature, Relative Humidity, Wind Speed, Visibility, Pressure, and Conditions.

The data is available as a CSV file. We will analyze it using a pandas DataFrame.

### Task

- Show basic information about the dataframe: rows, columns, types (if automatically detected)
- Show the first N rows of the data (by default, N=5)
- Find the different weather conditions available
- Find the number of times when the weather was exactly 'Clear'
- Find all the unique Wind Speed values recorded in the dataset
- Find the number of times when the wind speed was exactly 7 km/h
- Get the Weather and Temperature columns from the dataframe
- Get the first 20 rows from the dataframe
- What were the first 5 pressure values recorded on Jan 6?
- Find all instances when wind speed was above 30 and visibility was 25
- Which were the top 10 hottest temp values and their counts?
- What is the mean temperature recorded by month?
- Extract the month name from the date and store it in a new column named Month
- Get the list of months in which it rained with thunderstorm

In [19]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import calendar 
import datetime


In [44]:
url = 'https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/weather_2012.csv'
#Use the first column (Date/Time) as the index and parse it as datetimes
df = pd.read_csv(url, index_col=0, parse_dates=True)
df.head()

Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa),Weather
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24,Fog
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
2012-01-01 04:00:00,-1.5,-3.3,88,7,4.8,101.23,Fog
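
The effect of `index_col=0` and `parse_dates=True` can be checked without downloading anything. The sketch below is only an illustration: it reads a hypothetical three-row sample (typed by hand to mimic the file layout, not taken from the real CSV) from an in-memory string.

```python
import io
import pandas as pd

# Hypothetical three-row sample that mimics the layout of weather_2012.csv.
sample_csv = io.StringIO(
    "Date/Time,Temp (C),Weather\n"
    "2012-01-01 00:00:00,-1.8,Fog\n"
    "2012-01-01 01:00:00,-1.8,Fog\n"
    '2012-01-01 02:00:00,-1.8,"Freezing Drizzle,Fog"\n'
)

# index_col=0 turns the first column into the index;
# parse_dates=True asks pandas to parse that index as datetimes.
sample_df = pd.read_csv(sample_csv, index_col=0, parse_dates=True)

print(type(sample_df.index).__name__)
print(sample_df.dtypes)
```

Because the index is a `DatetimeIndex`, later cells can select rows by date strings and by `df.index.month`.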


In [45]:
#Show basic information about the dataframe: rows, columns, types (if automatically detected)
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 8784 entries, 2012-01-01 00:00:00 to 2012-12-31 23:00:00
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Temp (C)            8784 non-null   float64
 1   Dew Point Temp (C)  8784 non-null   float64
 2   Rel Hum (%)         8784 non-null   int64  
 3   Wind Spd (km/h)     8784 non-null   int64  
 4   Visibility (km)     8784 non-null   float64
 5   Stn Press (kPa)     8784 non-null   float64
 6   Weather             8784 non-null   object 
dtypes: float64(4), int64(2), object(1)
memory usage: 549.0+ KB


In [46]:
#Show the first N rows of the data (by default, N=5)
df.head()

Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa),Weather
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24,Fog
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
2012-01-01 04:00:00,-1.5,-3.3,88,7,4.8,101.23,Fog


In [47]:
#Find out different weather conditions available
df['Weather'].unique()

array(['Fog', 'Freezing Drizzle,Fog', 'Mostly Cloudy', 'Cloudy', 'Rain',
       'Rain Showers', 'Mainly Clear', 'Snow Showers', 'Snow', 'Clear',
       'Freezing Rain,Fog', 'Freezing Rain', 'Freezing Drizzle',
       'Rain,Snow', 'Moderate Snow', 'Freezing Drizzle,Snow',
       'Freezing Rain,Snow Grains', 'Snow,Blowing Snow', 'Freezing Fog',
       'Haze', 'Rain,Fog', 'Drizzle,Fog', 'Drizzle',
       'Freezing Drizzle,Haze', 'Freezing Rain,Haze', 'Snow,Haze',
       'Snow,Fog', 'Snow,Ice Pellets', 'Rain,Haze', 'Thunderstorms,Rain',
       'Thunderstorms,Rain Showers', 'Thunderstorms,Heavy Rain Showers',
       'Thunderstorms,Rain Showers,Fog', 'Thunderstorms',
       'Thunderstorms,Rain,Fog',
       'Thunderstorms,Moderate Rain Showers,Fog', 'Rain Showers,Fog',
       'Rain Showers,Snow Showers', 'Snow Pellets', 'Rain,Snow,Fog',
       'Moderate Rain,Fog', 'Freezing Rain,Ice Pellets,Fog',
       'Drizzle,Ice Pellets,Fog', 'Drizzle,Snow', 'Rain,Ice Pellets',
       'Drizzle,Snow,Fog', 
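
Many of the labels above are comma-separated combinations such as 'Rain,Fog'. As a possible follow-up, the sketch below counts the distinct combined labels with `nunique()` and then splits them into individual conditions. The small Series is hypothetical sample data, not the real column.

```python
import pandas as pd

# Hypothetical sample of the 'Weather' column.
weather = pd.Series(["Fog", "Freezing Drizzle,Fog", "Rain,Fog", "Rain"], name="Weather")

# Number of distinct labels exactly as recorded.
combined_labels = weather.nunique()

# Split each label on commas, put every part on its own row, then deduplicate.
atomic_conditions = sorted(weather.str.split(",").explode().unique())

print(combined_labels)
print(atomic_conditions)
```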

In [48]:
#Find the number of times when the weather was exactly 'Clear'
#Keep only the rows whose label is exactly 'Clear' and count them
len(df[df.Weather=='Clear'])

1326

In [49]:
#Count the occurrences of every weather condition; the 'Clear' row gives the same answer
df['Weather'].value_counts()

Mainly Clear                               2106
Mostly Cloudy                              2069
Cloudy                                     1728
Clear                                      1326
Snow                                        390
Rain                                        306
Rain Showers                                188
Fog                                         150
Rain,Fog                                    116
Drizzle,Fog                                  80
Snow Showers                                 60
Drizzle                                      41
Snow,Fog                                     37
Snow,Blowing Snow                            19
Rain,Snow                                    18
Thunderstorms,Rain Showers                   16
Haze                                         16
Drizzle,Snow,Fog                             15
Freezing Rain                                14
Freezing Drizzle,Snow                        11
Freezing Drizzle                        

In [50]:
#Another way: count the non-null 'Weather' entries among the matching rows
df[(df.Weather=='Clear')]['Weather'].count()

1326
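
The counting approaches above should always agree. A minimal sketch on a hypothetical five-element Series (made-up values, not the real data) shows two of them side by side, using the fact that summing a boolean mask counts its `True` entries.

```python
import pandas as pd

# Hypothetical sample of the 'Weather' column.
weather = pd.Series(["Clear", "Fog", "Clear", "Mainly Clear", "Clear"], name="Weather")

# Exact match: 'Mainly Clear' is not counted.
is_clear = weather == "Clear"

# True counts as 1 and False as 0, so the sum is the number of matches.
count_by_mask = int(is_clear.sum())

# value_counts() is indexed by the labels themselves.
count_by_lookup = int(weather.value_counts()["Clear"])

print(count_by_mask, count_by_lookup)
```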

In [51]:
#Find all the unique Wind Speed values recorded in the dataset
df['Wind Spd (km/h)'].unique()

array([ 4,  7,  6,  9, 15, 13, 20, 22, 19, 24, 30, 35, 39, 32, 33, 26, 44,
       43, 48, 37, 28, 17, 11,  0, 83, 70, 57, 46, 41, 52, 50, 63, 54,  2],
      dtype=int64)

In [52]:
#Find the number of times when the wind speed was exactly 7 km/h
df['Wind Spd (km/h)'].value_counts() #the entry labeled 7 (wind speed 7 km/h) = 677

9     830
11    791
13    735
15    719
7     677
17    666
19    616
6     609
20    496
4     474
22    439
24    374
0     309
26    242
28    205
30    161
32    139
33     85
35     53
37     45
39     24
41     22
44     14
43     13
48     13
46     11
52      7
57      5
50      4
2       2
83      1
70      1
63      1
54      1
Name: Wind Spd (km/h), dtype: int64

In [53]:
#Find the number of times when the wind speed was exactly 7 km/h
#Look up the count by its index label (the wind speed 7), not by its position
df['Wind Spd (km/h)'].value_counts()[7]
#or count the matching rows directly (only this last expression is displayed)
len(df[(df['Wind Spd (km/h)']==7)])


677

In [54]:
df[(df['Wind Spd (km/h)']==7)].count()

Temp (C)              677
Dew Point Temp (C)    677
Rel Hum (%)           677
Wind Spd (km/h)       677
Visibility (km)       677
Stn Press (kPa)       677
Weather               677
dtype: int64

In [55]:
df['Wind Spd (km/h)'].value_counts()[7]

677
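
The index of a `value_counts()` result holds the wind speeds themselves, so `[7]` is a lookup by label rather than by position. The sketch below makes the difference explicit with `.loc` and `.iloc` on a hypothetical six-element Series.

```python
import pandas as pd

# Hypothetical wind speeds: 9 occurs three times, 7 twice, 4 once.
wind = pd.Series([9, 9, 9, 7, 7, 4], name="Wind Spd (km/h)")

# Sorted by frequency, so the index order is 9, 7, 4.
counts = wind.value_counts()

by_label = counts.loc[7]      # count for the wind speed 7
by_position = counts.iloc[0]  # count of the most frequent value (9)
missing = counts.get(5, 0)    # default for a speed that never occurs

print(by_label, by_position, missing)
```

Using `.loc` keeps the intent unambiguous when the index is made of integers.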

In [56]:
#Get the Weather and Temperature columns from the dataframe
df[['Weather','Temp (C)']].head()

Date/Time,Weather,Temp (C)
2012-01-01 00:00:00,Fog,-1.8
2012-01-01 01:00:00,Fog,-1.8
2012-01-01 02:00:00,"Freezing Drizzle,Fog",-1.8
2012-01-01 03:00:00,"Freezing Drizzle,Fog",-1.5
2012-01-01 04:00:00,Fog,-1.5


In [57]:
#Get the first 20 rows from the dataframe
df[0:20]

Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa),Weather
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24,Fog
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
2012-01-01 04:00:00,-1.5,-3.3,88,7,4.8,101.23,Fog
2012-01-01 05:00:00,-1.4,-3.3,87,9,6.4,101.27,Fog
2012-01-01 06:00:00,-1.5,-3.1,89,7,6.4,101.29,Fog
2012-01-01 07:00:00,-1.4,-3.6,85,7,8.0,101.26,Fog
2012-01-01 08:00:00,-1.4,-3.6,85,9,8.0,101.23,Fog
2012-01-01 09:00:00,-1.3,-3.1,88,15,4.0,101.2,Fog


In [58]:
#Get the first 20 rows from the dataframe (the start of the slice defaults to 0)
df[:20]

Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa),Weather
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24,Fog
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
2012-01-01 04:00:00,-1.5,-3.3,88,7,4.8,101.23,Fog
2012-01-01 05:00:00,-1.4,-3.3,87,9,6.4,101.27,Fog
2012-01-01 06:00:00,-1.5,-3.1,89,7,6.4,101.29,Fog
2012-01-01 07:00:00,-1.4,-3.6,85,7,8.0,101.26,Fog
2012-01-01 08:00:00,-1.4,-3.6,85,9,8.0,101.23,Fog
2012-01-01 09:00:00,-1.3,-3.1,88,15,4.0,101.2,Fog


In [59]:
#Get the first 20 rows from the dataframe using head()
df.head(20)

Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa),Weather
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24,Fog
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24,Fog
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog"
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog"
2012-01-01 04:00:00,-1.5,-3.3,88,7,4.8,101.23,Fog
2012-01-01 05:00:00,-1.4,-3.3,87,9,6.4,101.27,Fog
2012-01-01 06:00:00,-1.5,-3.1,89,7,6.4,101.29,Fog
2012-01-01 07:00:00,-1.4,-3.6,85,7,8.0,101.26,Fog
2012-01-01 08:00:00,-1.4,-3.6,85,9,8.0,101.23,Fog
2012-01-01 09:00:00,-1.3,-3.1,88,15,4.0,101.2,Fog


In [63]:
#What were the first 5 pressure values recorded on Jan 6
#Check whether that date exists in the index
#(a bare date parses to midnight, so this tests for the 2012-01-06 00:00:00 entry)
fecha = pd.to_datetime('2012-01-06')

if fecha in df.index:
    print("Date exists in index")
else:
    print("Date does not exist in index")

Date exists in index


In [64]:
#What were the first 5 pressure values recorded on Jan 6
df.loc['2012-01-06', 'Stn Press (kPa)'][:5]

Date/Time
2012-01-06 00:00:00    100.81
2012-01-06 01:00:00    100.81
2012-01-06 02:00:00    100.84
2012-01-06 03:00:00    100.76
2012-01-06 04:00:00    100.70
Name: Stn Press (kPa), dtype: float64
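
Selecting with a date string such as `'2012-01-06'` relies on partial string indexing: on a `DatetimeIndex`, a date without a time matches every timestamp on that day. The sketch below illustrates this on a hypothetical six-row Series (made-up timestamps and values).

```python
import pandas as pd

# Hypothetical hourly readings spanning three calendar days.
index = pd.to_datetime([
    "2012-01-05 22:00", "2012-01-05 23:00",
    "2012-01-06 00:00", "2012-01-06 01:00", "2012-01-06 02:00",
    "2012-01-07 00:00",
])
pressure = pd.Series(
    [100.90, 100.85, 100.81, 100.81, 100.84, 101.00],
    index=index,
    name="Stn Press (kPa)",
)

# A date-only string selects all rows that fall on that day.
jan_6 = pressure.loc["2012-01-06"]
first_two = jan_6.head(2)

# `in` tests for one exact timestamp (here midnight), not for the whole day.
has_midnight = pd.Timestamp("2012-01-06") in pressure.index

# To test whether any reading exists on a given day, compare normalized dates.
has_any_jan_8 = (pressure.index.normalize() == pd.Timestamp("2012-01-08")).any()

print(jan_6)
print(has_midnight, has_any_jan_8)
```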

In [65]:
#Find all instances when wind speed was above 30 and visibility was 25
df[(df['Wind Spd (km/h)'] > 30) & (df['Visibility (km)']==25)]

Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa),Weather
2012-01-02 00:00:00,5.2,1.5,77,35,25.0,99.26,Rain Showers
2012-01-02 01:00:00,4.6,0.0,72,39,25.0,99.26,Cloudy
2012-01-02 02:00:00,3.9,-0.9,71,32,25.0,99.26,Mostly Cloudy
2012-01-02 03:00:00,3.7,-1.5,69,33,25.0,99.30,Mostly Cloudy
2012-01-02 04:00:00,2.9,-2.3,69,32,25.0,99.26,Mostly Cloudy
...,...,...,...,...,...,...,...
2012-12-16 07:00:00,-9.8,-14.9,66,32,25.0,101.89,Cloudy
2012-12-21 18:00:00,1.4,0.3,92,46,25.0,97.56,Rain
2012-12-23 01:00:00,-7.9,-11.8,74,32,25.0,100.27,Snow Showers
2012-12-26 19:00:00,-9.6,-13.1,76,39,25.0,102.03,Cloudy
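
The filter above combines two boolean masks with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `>` and `==`. A minimal sketch on a hypothetical four-row frame, with the masks named for readability:

```python
import pandas as pd

# Hypothetical observations (made-up values).
obs = pd.DataFrame({
    "Wind Spd (km/h)": [35, 20, 32, 46],
    "Visibility (km)": [25.0, 25.0, 9.7, 25.0],
})

# One boolean Series per condition.
windy = obs["Wind Spd (km/h)"] > 30
clear_view = obs["Visibility (km)"] == 25

# Element-wise AND keeps only the rows where both conditions hold.
both = obs[windy & clear_view]

print(both)
```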


In [66]:
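#Which were the top 10 hottest temp values and their counts?
#Note: sorting by count, as done here, returns the 10 most frequent temperature values;
#to rank by temperature instead, use .sort_index(ascending=False) in place of .sort_values(...)
df['Temp (C)'].value_counts().sort_values(ascending=False).head(10)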

16.6    65
1.1     58
0.8     47
1.5     45
19.3    44
21.1    43
2.6     43
0.4     41
1.3     40
14.6    39
Name: Temp (C), dtype: int64

In [67]:
#or, equivalently, take the first 10 entries by position with iloc
new_df = df['Temp (C)'].value_counts().sort_values(ascending=False)
new_df.iloc[:10]

16.6    65
1.1     58
0.8     47
1.5     45
19.3    44
21.1    43
2.6     43
0.4     41
1.3     40
14.6    39
Name: Temp (C), dtype: int64
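
The output above ranks temperatures by how often they occur. If "hottest" is meant literally, the counts can be sorted by their index (the temperature) instead. The sketch below contrasts the two orderings on a hypothetical seven-element Series; it does not reproduce the real result.

```python
import pandas as pd

# Hypothetical temperatures (made-up values).
temps = pd.Series([16.6, 16.6, 16.6, 33.0, 31.2, 31.2, 1.1], name="Temp (C)")

counts = temps.value_counts()

# Most frequent values: value_counts() already sorts by count, descending.
most_frequent = counts.head(2)

# Hottest values: sort by the index (the temperature), descending.
hottest = counts.sort_index(ascending=False).head(2)

print(most_frequent)
print(hottest)
```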

In [69]:
#What is the mean temperature recorded by month?
#first solution
mean_temperatures = {}

for month in range(1,13):
    mean_temperatures[month] = df.loc[df.index.month == month, 'Temp (C)'].mean()
    
pd.Series(mean_temperatures)

1     -7.371505
2     -4.225000
3      3.121237
4      7.009306
5     16.237769
6     20.134028
7     22.790054
8     22.279301
9     16.484444
10    10.954973
11     0.931389
12    -3.306317
dtype: float64

In [70]:
#or using a pivot table (the string 'mean' avoids the deprecation warning newer pandas raises for np.mean)
mean_temperature_df = df.pivot_table(values='Temp (C)', index=df.index.month, aggfunc='mean')
mean_temperature_df

Date/Time,Temp (C)
1,-7.371505
2,-4.225
3,3.121237
4,7.009306
5,16.237769
6,20.134028
7,22.790054
8,22.279301
9,16.484444
10,10.954973


In [None]:
#using groupby (select the numeric column first; 'Weather' holds strings and cannot be averaged)
mean_temperature = df.groupby(df.index.month)['Temp (C)'].mean()
mean_temperature
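
A groupby on the month of the index can be tried on a small frame first. The sketch below uses hypothetical data with a string column alongside the numeric one, and selects 'Temp (C)' before averaging so the string column is never aggregated.

```python
import pandas as pd

# Hypothetical hourly readings from two months (made-up values).
index = pd.to_datetime([
    "2012-01-01 00:00", "2012-01-01 01:00",
    "2012-02-01 00:00", "2012-02-01 01:00",
])
toy = pd.DataFrame(
    {"Temp (C)": [-2.0, -4.0, 1.0, 3.0],
     "Weather": ["Fog", "Snow", "Clear", "Clear"]},
    index=index,
)

# Group rows by month number, then average only the temperature column.
monthly_mean = toy.groupby(toy.index.month)["Temp (C)"].mean()

print(monthly_mean)
```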

In [75]:
#Extract the month name from the date and store it in a new column named Month
df['Month'] = df.index.month
df['Month'] = df['Month'].apply(lambda x: calendar.month_name[x])
df.head()

Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa),Weather,Month
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24,Fog,January
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24,Fog,January
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26,"Freezing Drizzle,Fog",January
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27,"Freezing Drizzle,Fog",January
2012-01-01 04:00:00,-1.5,-3.3,88,7,4.8,101.23,Fog,January
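
As an alternative to mapping month numbers through `calendar.month_name`, a `DatetimeIndex` offers `month_name()`, which returns the names directly. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical readings on three different dates (made-up values).
index = pd.to_datetime(["2012-01-15", "2012-05-01", "2012-12-31"])
toy = pd.DataFrame({"Temp (C)": [-5.0, 16.0, -3.0]}, index=index)

# month_name() yields English month names unless a locale is passed.
toy["Month"] = toy.index.month_name()

print(toy)
```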


In [83]:
#Get the list of months in which it rained with thunderstorm

def checklist(x):
    #True only when every keyword appears in the (lower-cased) weather string
    weatherList = ['rain', 'thunderstorm']
    return all(substring in x for substring in weatherList)

print("2) Get the list of months in which it rained with thunderstorm\n")

rained = df['Weather'].str.lower().apply(checklist)

month_list = df[rained]['Month'].unique()

print(month_list)

2) Get the list of months in which it rained with thunderstorm

['May' 'June' 'July' 'August' 'September']
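
The same filter can be written without a helper function by combining two `str.contains` masks. The sketch below assumes plain substring matching is what is wanted (so 'Heavy Rain Showers' counts as rain) and runs on a hypothetical five-row frame rather than the real data.

```python
import pandas as pd

# Hypothetical sample with 'Weather' and 'Month' columns (made-up rows).
toy = pd.DataFrame({
    "Weather": ["Thunderstorms,Rain", "Rain", "Thunderstorms",
                "Thunderstorms,Heavy Rain Showers", "Clear"],
    "Month": ["May", "May", "June", "July", "July"],
})

weather = toy["Weather"].str.lower()

# regex=False treats the pattern as a literal substring.
rain_and_thunder = (
    weather.str.contains("rain", regex=False)
    & weather.str.contains("thunderstorm", regex=False)
)

# Months that have at least one row matching both keywords.
months = toy.loc[rain_and_thunder, "Month"].unique().tolist()

print(months)
```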
