# Exploratory Data Analysis on Weather Data 
This is a module of the Miloo Workshop: BOOTCAMP CYCLING PREDICTION-ARTIFICIAL INTELLIGENCE.

This module gives an example of how to analyze weather data: installing and importing the required libraries, transforming the data, and building visualizations that help extract insight from the weather data.

Please refer to this link for more information about the dataset: https://www.kaggle.com/selfishgene/historical-hourly-weather-data

# 1. Install and Import Required Libraries

For this exploratory data analysis we use pandas, seaborn, matplotlib, datetime, and haversine.

In [None]:
!pip install haversine

In [None]:
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline
import datetime as dt

import seaborn as sns 
from haversine import haversine, Unit

from IPython.display import display, HTML  # public import path; IPython.core.display is deprecated for this
display(HTML("<style>.container { width:100% !important; }</style>"))

# 2. Transform Data 

In this section we transform the data and merge the separate .csv files into a single DataFrame. There are two steps:
1. Expand Time: derive the year, month (as a YYYYMM key), day (as a YYYYMMDD key), hour, and day name from the timestamp column.
2. Get Specific City Data: we compare two cities, so we filter the data down to one selected city at a time.

## 2.1 Define Expand Time Function

In [None]:
def expand_time(input_df,time_col):
    # enrich more time elements
    input_df['datetime'] = pd.to_datetime(input_df[time_col])
    input_df['year'] =  input_df['datetime'].dt.year
    input_df['month'] =  input_df['datetime'].dt.year * 100 + input_df['datetime'].dt.month
    input_df['day'] =  input_df['datetime'].dt.year * 10000 + input_df['datetime'].dt.month * 100 + input_df['datetime'].dt.day
    input_df['hour'] =  input_df['datetime'].dt.hour
    input_df['dayname'] = input_df['datetime'].apply(lambda x: dt.datetime.strftime(x, '%A'))
    
    return input_df
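
To see what the derived keys look like, here is a minimal sketch on a made-up two-row frame. The timestamps and the name `toy` are illustrative and not taken from the dataset. It repeats the arithmetic used in `expand_time` and uses `Series.dt.day_name()`, which in an English locale should give the same weekday names as the `strftime('%A')` call above.

```python
import pandas as pd

# Made-up timestamps, only to illustrate the derived columns.
toy = pd.DataFrame({"datetime": ["2015-03-07 14:00:00", "2015-12-31 23:00:00"]})
ts = pd.to_datetime(toy["datetime"])

toy["year"] = ts.dt.year
toy["month"] = ts.dt.year * 100 + ts.dt.month                         # YYYYMM key
toy["day"] = ts.dt.year * 10000 + ts.dt.month * 100 + ts.dt.day       # YYYYMMDD key
toy["hour"] = ts.dt.hour
toy["dayname"] = ts.dt.day_name()                                     # weekday name

print(toy[["year", "month", "day", "hour", "dayname"]])
```

Storing month and day as integers that include the year keeps them sortable and unique across years, which is why the later merges can join on `day` and `hour` alone.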

## 2.2 Define Filter Specific City Data Function

In [None]:
def get_one_sity(city, humidity, wind_speed, wind_dir, press, weat, temp):

    # bring all weather elements together for one city
    
    # humidity
    df_sample = pd.merge(humidity[[city,'day','hour','dayname']],weat[[city,'day','hour']],how='inner', left_on=['day','hour'],right_on=['day','hour'])
    df_sample = df_sample.dropna()
    df_sample.columns = ['humidity','day','hour','dayname','weather']

    # temperature
    df_sample = pd.merge(df_sample,temp[[city,'day','hour','dayname']],how='inner', left_on=['day','hour'],right_on=['day','hour'])
    df_sample = df_sample.dropna()
    df_sample.drop('dayname_y',inplace=True,axis=1)
    df_sample.columns = ['humidity','day','hour','dayname','weather','temperature']
    df_sample['temperature'] = df_sample['temperature']-273.15 # convert from Kelvin to Celsius

    # pressure
    df_sample = pd.merge(df_sample,press[[city,'day','hour','dayname']],how='inner', left_on=['day','hour'],right_on=['day','hour'])
    df_sample = df_sample.dropna()
    df_sample.drop('dayname_y',inplace=True,axis=1)
    df_sample.columns = ['humidity','day','hour','dayname','weather','temperature','pressure']

    # wind speed
    df_sample = pd.merge(df_sample,wind_speed[[city,'day','hour','dayname']],how='inner', left_on=['day','hour'],right_on=['day','hour'])
    df_sample = df_sample.dropna()
    df_sample.drop('dayname_y',inplace=True,axis=1)
    df_sample.columns = ['humidity','day','hour','dayname','weather','temperature','pressure', 'wind_speed']

    # wind dir
    df_sample = pd.merge(df_sample,wind_dir[[city,'day','hour','dayname']],how='inner', left_on=['day','hour'],right_on=['day','hour'])
    df_sample = df_sample.dropna()
    df_sample.drop('dayname_y',inplace=True,axis=1)
    df_sample.columns = ['humidity','day','hour','dayname','weather','temperature','pressure', 'wind_speed','wind_dir']

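    # rearrange columns (copy so the new column below is added to an independent frame)
    df_sample = df_sample[['day','hour','weather','dayname','humidity','temperature','pressure','wind_speed','wind_dir']].copy()
    
    # simplified weather: change weather granularity
    # map from this frame's own 'weather' column so each row keeps its matching category;
    # going back to df_weat would align on row index, which no longer matches after the merges and dropna
    # (dict_weather is the global lookup filled in section 2.4)
    df_sample['weather2'] = df_sample['weather'].replace(dict_weather)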

    return df_sample
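
The merge-and-rename pattern used in the function can be sketched on two tiny frames. Everything here is made up for illustration: the values, the names `humidity`, `weather`, and `simplify`, and the category labels. Because both inputs carry a column named after the city, pandas adds the `_x` and `_y` suffixes, which is why the function overwrites `columns` after each merge.

```python
import pandas as pd

# Toy wide tables: one column per city plus the join keys.
humidity = pd.DataFrame({"Miami": [80.0, 75.0, None],
                         "day": [20150101] * 3, "hour": [0, 1, 2]})
weather = pd.DataFrame({"Miami": ["light rain", "sky is clear", "mist"],
                        "day": [20150101] * 3, "hour": [0, 1, 2]})

merged = pd.merge(humidity, weather, how="inner", on=["day", "hour"])
raw_columns = list(merged.columns)   # the city column appears twice, with suffixes

merged = merged.dropna()             # drop hours with a missing reading
merged.columns = ["humidity", "day", "hour", "weather"]

# Hypothetical lookup from detailed description to coarse category.
simplify = {"light rain": "rain", "sky is clear": "clear"}
merged["weather2"] = merged["weather"].replace(simplify)
print(merged)
```

Mapping from the merged frame's own `weather` column keeps each simplified label on the same row as its description, regardless of which rows were dropped along the way.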

## 2.3 Read all CSV files
In this sub-section, we download the dataset from GitHub and load it into the Jupyter notebook.

In [None]:
!wget https://raw.githubusercontent.com/Miloo-workshop/weather-prediction/main/data_archive.zip

In [None]:
!unzip data_archive.zip

In [None]:
df_hum = pd.read_csv('archive/humidity.csv')
df_wind_dir = pd.read_csv('archive/wind_direction.csv')
df_wind_sp = pd.read_csv('archive/wind_speed.csv')
df_pres = pd.read_csv('archive/pressure.csv')
df_temp = pd.read_csv('archive/temperature.csv')
df_weat = pd.read_csv('archive/weather_description.csv')
df_city = pd.read_csv('archive/city_attributes.csv')
df_weat_sim = pd.read_excel('archive/weather_category_simplified.xlsx')
df_weat_sim.drop('count',axis=1,inplace=True)

dict_weather = {}

## 2.4 Convert the simplified weather categories to a dict

In [None]:
for index, row in df_weat_sim.iterrows():
    dict_weather[row['weather']] = row['weather2']
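
As a side note, the same lookup can be built without an explicit loop. The sketch below uses a made-up stand-in for `df_weat_sim` (the real categories come from the .xlsx file) and checks that `dict(zip(...))` gives the same result as the row-by-row version.

```python
import pandas as pd

# Hypothetical stand-in for df_weat_sim.
lookup = pd.DataFrame({
    "weather": ["light rain", "heavy intensity rain", "sky is clear"],
    "weather2": ["rain", "rain", "clear"],
})

# Row-by-row version, as in the cell above.
by_loop = {}
for _, row in lookup.iterrows():
    by_loop[row["weather"]] = row["weather2"]

# Vectorized equivalent: pair the two columns and build the dict in one step.
by_zip = dict(zip(lookup["weather"], lookup["weather2"]))

print(by_zip)
```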

## 2.5 Expand time column

In [None]:
df_hum = expand_time(df_hum,'datetime')
df_wind_dir = expand_time(df_wind_dir,'datetime')
df_wind_sp = expand_time(df_wind_sp,'datetime')
df_pres = expand_time(df_pres,'datetime')
df_weat = expand_time(df_weat,'datetime')
df_temp = expand_time(df_temp,'datetime')

## 2.6 Filter Selected City : Miami and Vancouver

In [None]:
df_miami = get_one_sity(city='Miami',humidity = df_hum, wind_speed = df_wind_sp, wind_dir = df_wind_dir, press = df_pres, weat = df_weat, temp = df_temp)
df_vancouver = get_one_sity(city='Vancouver',humidity = df_hum, wind_speed = df_wind_sp, wind_dir = df_wind_dir, press = df_pres, weat = df_weat, temp = df_temp)

In [None]:
df_miami.head()

In [None]:
df_miami.shape, df_vancouver.shape

# 3. Exploratory Data Analysis
In this section, we explore and compare the data for both cities (Miami and Vancouver) on selected features such as humidity, temperature, wind speed, and pressure. We use several kinds of visualization to make the differences easy to understand.

## 3.1 Check data consistency
We check how many hourly records each day has in both cities and visualize the counts to see how many dates do not have a complete set of hours.

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
df_miami.groupby(['day']).agg({'hour':'count'}).reset_index()['hour'].hist(bins = 50)

In [None]:
df_miami.groupby(['day']).agg({'hour':'count'}).reset_index()['hour'].value_counts(ascending=False)

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
df_vancouver.groupby(['day']).agg({'hour':'count'}).reset_index()['hour'].hist(bins = 50)

In [None]:
df_vancouver.groupby(['day']).agg({'hour':'count'}).reset_index()['hour'].value_counts().sort_values(ascending=False)
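
The histograms and value counts show how many days are incomplete, but not which ones. A minimal sketch of listing them, on made-up data where the second day is missing two hours (the frame `toy` and the threshold of 24 hours per day are assumptions for illustration):

```python
import pandas as pd

# Toy hourly records: a complete day and a day with only 22 hours.
toy = pd.DataFrame({
    "day": [20150101] * 24 + [20150102] * 22,
    "hour": list(range(24)) + list(range(22)),
})

hours_per_day = toy.groupby("day")["hour"].count()
incomplete_days = hours_per_day[hours_per_day < 24]   # days lacking some hours

print(incomplete_days)
```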

## 3.2 Join both datasets to get balanced granularity

In [None]:
df_inner = pd.merge(df_miami[['day','hour']],df_vancouver[['day','hour']],'inner',left_on=['day','hour'],right_on=['day','hour'])
df_inner.columns = ['inner_day','inner_hour']

df_miami = pd.merge(df_inner, df_miami, 'inner',right_on=['day','hour'],left_on=['inner_day','inner_hour'])
df_miami = df_miami.drop(['day','hour'],axis=1)
df_miami = df_miami.rename(columns={"inner_hour": "hour", "inner_day": "day"})


df_vancouver = pd.merge(df_inner, df_vancouver, 'inner',right_on=['day','hour'],left_on=['inner_day','inner_hour'])
df_vancouver = df_vancouver.drop(['day','hour'],axis=1)
df_vancouver = df_vancouver.rename(columns={"inner_hour": "hour", "inner_day": "day"})
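
The idea behind this step is to keep only the (day, hour) pairs present in both cities, so that the two frames end up with the same rows. A compact sketch of the same idea on made-up data, using `on=` instead of renaming key columns (the frame names and values are illustrative):

```python
import pandas as pd

keys = ["day", "hour"]
miami = pd.DataFrame({"day": [20150101] * 3, "hour": [0, 1, 2],
                      "temperature": [24.0, 23.5, 23.0]})
vancouver = pd.DataFrame({"day": [20150101] * 3, "hour": [1, 2, 3],
                          "temperature": [3.0, 2.5, 2.0]})

# (day, hour) pairs that exist in both cities.
common = pd.merge(miami[keys], vancouver[keys], how="inner", on=keys)

# Restrict each city to the shared pairs.
miami_bal = miami.merge(common, how="inner", on=keys)
vancouver_bal = vancouver.merge(common, how="inner", on=keys)

print(len(common), miami_bal.shape, vancouver_bal.shape)
```

This assumes each (day, hour) pair occurs at most once per city; duplicated pairs would multiply rows in the inner join.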

In [None]:
df_inner.shape, df_miami.shape, df_vancouver.shape

## 3.3 Explore & Compare Data : Temperature
We explore the differences between Miami and Vancouver in temperature, looking at the overall distribution, the monthly averages, and the daily trend.

In [None]:
df_temp_m = pd.DataFrame()
df_temp_m['temperature'] = df_miami['temperature']
df_temp_m['city'] = 'miami'

df_temp_v = pd.DataFrame()
df_temp_v['temperature'] = df_vancouver['temperature']
df_temp_v['city'] = 'vancouver'

df_temp_mv = pd.concat([df_temp_m,df_temp_v])
df_temp_mv['temperature'] = df_temp_mv['temperature'].astype(float)

In [None]:
df_temp_mv.tail()

### 3.3.1 Compare distribution of temperature data of both cities

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
sns.histplot(data=df_temp_mv, x="temperature", hue='city')

### 3.3.2 Compare average temperature of each month
This sub-section explores how the average temperature of each month changes from year to year.

In [None]:
# miami

df_miami_temp = df_miami[['temperature','dayname','hour','day']].copy()  # copy to avoid SettingWithCopyWarning
df_miami_temp['datetime'] = pd.to_datetime(df_miami_temp['day'].astype(str), format='%Y%m%d')
df_miami_temp['month'] = df_miami_temp['datetime'].dt.month_name()
df_miami_temp['year'] = df_miami_temp['datetime'].dt.year

# aggregate hourly records to a mean per year and month
df_miami_temp_hourly = df_miami_temp[['year','month','temperature']].groupby(['year','month']).agg({'temperature':'mean'}).reset_index()
df_miami_temp_hourly_pivot = df_miami_temp_hourly.pivot(index='year', columns='month', values='temperature').fillna(0)

sns.lineplot(data=df_miami_temp_hourly, x="year", y="temperature", hue="month")
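
Because `month_name()` returns plain strings, grouping sorts the months alphabetically (April before January). If calendar order is preferred, one option is an ordered categorical. This is a sketch on made-up data and not part of the original workflow; `toy` and `month_order` are illustrative names.

```python
import calendar
import pandas as pd

toy = pd.DataFrame({
    "day": [20150401, 20150402, 20150101, 20150102],
    "temperature": [10.0, 12.0, 2.0, 4.0],
})
dates = pd.to_datetime(toy["day"].astype(str), format="%Y%m%d")
toy["year"] = dates.dt.year

# January..December in calendar order (index 0 of calendar.month_name is empty).
month_order = list(calendar.month_name)[1:]
toy["month"] = pd.Categorical(dates.dt.month_name(),
                              categories=month_order, ordered=True)

# observed=True keeps only the months that actually occur in the data.
monthly = (toy.groupby(["year", "month"], observed=True)["temperature"]
              .mean()
              .reset_index())
print(monthly)
```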

In [None]:
# vancouver

df_van_temp = df_vancouver[['temperature','dayname','hour','day']].copy()  # copy to avoid SettingWithCopyWarning
df_van_temp['datetime'] = pd.to_datetime(df_van_temp['day'].astype(str), format='%Y%m%d')
df_van_temp['month'] = df_van_temp['datetime'].dt.month_name()
df_van_temp['year'] = df_van_temp['datetime'].dt.year

# aggregate hourly records to a mean per year and month
df_van_temp_hourly = df_van_temp[['year','month','temperature']].groupby(['year','month']).agg({'temperature':'mean'}).reset_index()
df_van_temp_hourly_pivot = df_van_temp_hourly.pivot(index='year', columns='month', values='temperature').fillna(0)

sns.lineplot(data=df_van_temp_hourly, x="year", y="temperature", hue="month")

### 3.3.3 Annual trend

In [None]:
df_van_ann_temp = pd.DataFrame()
df_van_ann_temp['year'] = df_vancouver['day'].apply(lambda x:str(x)[:4])
df_van_ann_temp['day'] = df_vancouver['day']
df_van_ann_temp['temperature'] = df_vancouver['temperature']
df_van_ann_temp = df_van_ann_temp[df_van_ann_temp['year']=='2015']
df_van_ann_temp_agg = df_van_ann_temp[['day','temperature']].groupby('day').agg({'temperature':'mean'}).reset_index()
df_van_ann_temp_agg['dateime'] = df_van_ann_temp_agg['day'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df_van_ann_temp_agg['city'] = 'vancouver'

df_mia_ann_temp = pd.DataFrame()
df_mia_ann_temp['year'] = df_miami['day'].apply(lambda x:str(x)[:4])
df_mia_ann_temp['day'] = df_miami['day']
df_mia_ann_temp['temperature'] = df_miami['temperature']
df_mia_ann_temp = df_mia_ann_temp[df_mia_ann_temp['year']=='2015']
df_mia_ann_temp_agg = df_mia_ann_temp[['day','temperature']].groupby('day').agg({'temperature':'mean'}).reset_index()
df_mia_ann_temp_agg['dateime'] = df_mia_ann_temp_agg['day'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df_mia_ann_temp_agg['city'] = 'miami'

df_ann_temp = pd.concat([df_mia_ann_temp_agg,df_van_ann_temp_agg],axis=0)

sns.lineplot(data=df_ann_temp, x="dateime", y="temperature",hue='city')
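
Since `day` is an integer in YYYYMMDD form, the year filter can also be written with integer division instead of string slicing. A minimal sketch on made-up data (the frame `toy` and its values are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "day": [20141231, 20150101, 20150101, 20150102, 20160101],
    "temperature": [1.0, 2.0, 4.0, 5.0, 9.0],
})

# 20150101 // 10000 == 2015, so this keeps only the 2015 rows.
in_2015 = toy[toy["day"] // 10000 == 2015]

# Daily mean, then a real date column for plotting on a time axis.
daily_mean = in_2015.groupby("day", as_index=False)["temperature"].mean()
daily_mean["date"] = pd.to_datetime(daily_mean["day"].astype(str), format="%Y%m%d")
print(daily_mean)
```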

### 3.3.4 Boxplot to compare temperature's distribution and skewness for both Miami and Vancouver

In [None]:
df_miami_2 = df_miami.drop(['weather','weather2','dayname'],axis=1).melt(id_vars=["day", "hour"], 
        var_name="type", 
        value_name="value")
df_miami_2['city'] = 'miami'

df_vancouver_2 = df_vancouver.drop(['weather','weather2','dayname'],axis=1).melt(id_vars=["day", "hour"], 
        var_name="type", 
        value_name="value")
df_vancouver_2['city'] = 'vancouver'

df_van_miami_2 = pd.concat([df_miami_2, df_vancouver_2])

sns.boxplot(x="type", y="value", hue= 'city', data=df_van_miami_2[df_van_miami_2['type']=='temperature'])
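
To read the boxplot numerically, the quantities it draws can be computed directly. The sketch below uses made-up values and assumes the usual 1.5 × IQR whisker rule, which is the default in seaborn and matplotlib; points beyond the fences are the ones drawn as individual markers.

```python
import pandas as pd

# Made-up sample with one clearly extreme value.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1                      # height of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
skewness = values.skew()           # > 0 means a longer right tail

print(q1, q3, iqr, outliers.tolist(), skewness)
```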

## 3.4 Explore & Compare Data : Weather
We explore the differences between Miami and Vancouver in weather conditions, showing the proportion of each weather category and the weather pattern of each city.

### 3.4.1 Weather proportion in both cities

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower

df_miami['weather2'].value_counts().plot.pie()

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower

df_vancouver['weather2'].value_counts().plot.pie()

### 3.4.2 Average rainfall pattern in one day for each month
We measure the average number of rain hours per day, broken down by month and day of the week, for Miami and Vancouver to find each city's pattern.

In [None]:
# miami

df_miami_rain = df_miami[['weather2','dayname','hour','day']].copy()  # copy to avoid SettingWithCopyWarning
df_miami_rain['datetime'] = pd.to_datetime(df_miami_rain['day'].astype(str), format='%Y%m%d')
df_miami_rain['month'] = df_miami_rain['datetime'].dt.month_name()
df_miami_rain['year'] = df_miami_rain['datetime'].dt.year

# count hours per day and weather category, then keep only rain
df_miami_rain_hourly = df_miami_rain[['year','month','dayname','weather2','day','hour']].groupby(['year','month','dayname','day','weather2']).agg({'hour':'count'}).reset_index()
df_miami_rain_hourly = df_miami_rain_hourly[df_miami_rain_hourly['weather2'] == 'rain']

#agg daily
df_miami_rain_daily = df_miami_rain_hourly[['month','dayname','hour']].groupby(['month','dayname']).agg({'hour':'mean'}).reset_index()

df_miami_rain_daily_pivot = df_miami_rain_daily.pivot(index='month', columns='dayname', values='hour')

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
sns.heatmap(df_miami_rain_daily_pivot, annot=True, cmap="YlGnBu")

In [None]:
# vancouver

df_van_rain = df_vancouver[['weather2','dayname','hour','day']].copy()  # copy to avoid SettingWithCopyWarning
df_van_rain['datetime'] = pd.to_datetime(df_van_rain['day'].astype(str), format='%Y%m%d')
df_van_rain['month'] = df_van_rain['datetime'].dt.month_name()
df_van_rain['year'] = df_van_rain['datetime'].dt.year

# count hours per day and weather category, then keep only rain
df_van_rain_hourly = df_van_rain[['year','month','dayname','weather2','day','hour']].groupby(['year','month','dayname','day','weather2']).agg({'hour':'count'}).reset_index()
df_van_rain_hourly = df_van_rain_hourly[df_van_rain_hourly['weather2'] == 'rain']

#agg daily
df_van_rain_daily = df_van_rain_hourly[['month','dayname','hour']].groupby(['month','dayname']).agg({'hour':'mean'}).reset_index()

df_van_rain_daily_pivot = df_van_rain_daily.pivot(index='month', columns='dayname', values='hour')

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
sns.heatmap(df_van_rain_daily_pivot, annot=True, cmap="YlGnBu")
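
One detail worth keeping in mind when reading the heatmaps: the hourly counts are filtered to `weather2 == 'rain'` before averaging, so days with no rain at all drop out and each cell is the mean number of rain hours over days that had some rain. The sketch below, on made-up data, contrasts this with an average over all days. Treating dry days as zero is shown only as an alternative reading, not as the workshop's method.

```python
import pandas as pd

# Three toy days with four hourly records each: 2, 0, and 4 rain hours.
toy = pd.DataFrame({
    "day": [20150101] * 4 + [20150102] * 4 + [20150103] * 4,
    "weather2": ["rain", "rain", "clear", "clear",
                 "clear", "clear", "clear", "clear",
                 "rain", "rain", "rain", "rain"],
})

# Rain hours per day; the dry day is absent from this Series.
rain_hours = toy[toy["weather2"] == "rain"].groupby("day").size()
mean_on_rainy_days = rain_hours.mean()

# Alternative: put the dry day back with 0 rain hours before averaging.
all_days = toy["day"].unique()
mean_over_all_days = rain_hours.reindex(all_days, fill_value=0).mean()

print(mean_on_rainy_days, mean_over_all_days)
```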

### 3.4.3 Weather Comparison
We compare the counts of each weather category in both cities to see how often each kind of weather occurs in each city.

In [None]:
df_weat_m = pd.DataFrame(df_miami.groupby(['weather2']).size()).reset_index()
df_weat_m.columns = ['weather','count']
df_weat_m['city'] = 'miami'
df_weat_v = pd.DataFrame(df_vancouver.groupby(['weather2']).size()).reset_index()
df_weat_v.columns = ['weather','count']
df_weat_v['city'] = 'vancouver'

df_weat_mv = pd.concat([df_weat_m, df_weat_v])
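
The same comparison can also be produced as a table with `pd.crosstab`, which additionally gives per-city shares through `normalize="columns"`. This is a sketch on made-up data; the frame `toy` and its category labels are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "city": ["miami"] * 4 + ["vancouver"] * 4,
    "weather2": ["clear", "clear", "rain", "cloud",
                 "rain", "rain", "cloud", "clear"],
})

# Rows: weather category, columns: city, cells: number of hourly records.
counts = pd.crosstab(toy["weather2"], toy["city"])

# Same table as a share of each city's records (each column sums to 1).
shares = pd.crosstab(toy["weather2"], toy["city"], normalize="columns")

print(counts)
print(shares)
```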

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower

sns.catplot(
    data=df_weat_mv, kind="bar",
    x="weather", y="count", hue="city",
)

## 3.5 Explore & Compare Data : Humidity
We will explore differences between Miami and Vancouver based on humidity data and show the distribution and trend 

### 3.5.1 Humidity data distribution

In [None]:
df_hum_m = pd.DataFrame()
df_hum_m['humidity'] = df_miami['humidity']
df_hum_m['city'] = 'miami'

df_hum_v = pd.DataFrame()
df_hum_v['humidity'] = df_vancouver['humidity']
df_hum_v['city'] = 'vancouver'

df_hum_mv = pd.concat([df_hum_m,df_hum_v])
df_hum_mv['humidity'] = df_hum_mv['humidity'].astype(float)

In [None]:
df_hum_mv.head()

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
sns.histplot(data=df_hum_mv, x="humidity", hue='city')

### 3.5.2 Annual trend

In [None]:
# Annual 

df_van_ann_hum = pd.DataFrame()
df_van_ann_hum['year'] = df_vancouver['day'].apply(lambda x:str(x)[:4])
df_van_ann_hum['day'] = df_vancouver['day']
df_van_ann_hum['humidity'] = df_vancouver['humidity']
df_van_ann_hum = df_van_ann_hum[df_van_ann_hum['year']=='2015']
df_van_ann_hum_agg = df_van_ann_hum[['day','humidity']].groupby('day').agg({'humidity':'mean'}).reset_index()
df_van_ann_hum_agg['dateime'] = df_van_ann_hum_agg['day'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df_van_ann_hum_agg['city'] = 'vancouver'

df_mia_ann_hum = pd.DataFrame()
df_mia_ann_hum['year'] = df_miami['day'].apply(lambda x:str(x)[:4])
df_mia_ann_hum['day'] = df_miami['day']
df_mia_ann_hum['humidity'] = df_miami['humidity']
df_mia_ann_hum = df_mia_ann_hum[df_mia_ann_hum['year']=='2015']
df_mia_ann_hum_agg = df_mia_ann_hum[['day','humidity']].groupby('day').agg({'humidity':'mean'}).reset_index()
df_mia_ann_hum_agg['dateime'] = df_mia_ann_hum_agg['day'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df_mia_ann_hum_agg['city'] = 'miami'

df_ann_hum = pd.concat([df_van_ann_hum_agg,df_mia_ann_hum_agg],axis=0)

sns.lineplot(data=df_ann_hum, x="dateime", y="humidity",hue='city')

### 3.5.3 Boxplot to compare humidity's distribution and skewness for both Miami and Vancouver

In [None]:
sns.boxplot(x="type", y="value", hue= 'city', data=df_van_miami_2[df_van_miami_2['type']=='humidity'])

## 3.6 Explore & Compare Data : Wind Speed
We will explore differences between Miami and Vancouver based on wind speed data and show the distribution and trend 

### 3.6.1 Wind speed data distribution

In [None]:
df_wind_m = pd.DataFrame()
df_wind_m['wind_speed'] = df_miami['wind_speed']
df_wind_m['city'] = 'miami'

df_wind_v = pd.DataFrame()
df_wind_v['wind_speed'] = df_vancouver['wind_speed']
df_wind_v['city'] = 'vancouver'

df_wind_mv = pd.concat([df_wind_m,df_wind_v])
df_wind_mv['wind_speed'] = df_wind_mv['wind_speed'].astype(float)

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
sns.histplot(data=df_wind_mv, x="wind_speed", hue='city')

### 3.6.2 Boxplot to compare wind speed's distribution and skewness for both Miami and Vancouver

In [None]:
sns.boxplot(x="type", y="value", hue= 'city', data=df_van_miami_2[df_van_miami_2['type']=='wind_speed'])

## 3.7 Explore & Compare Data : Pressure
We will explore differences between Miami and Vancouver based on pressure data and show the distribution and trend

### 3.7.1 Pressure data distribution

In [None]:
df_press_m = pd.DataFrame()
df_press_m['pressure'] = df_miami['pressure']
df_press_m['city'] = 'miami'

df_press_v = pd.DataFrame()
df_press_v['pressure'] = df_vancouver['pressure']
df_press_v['city'] = 'vancouver'

df_press_mv = pd.concat([df_press_m,df_press_v])
df_press_mv['pressure'] = df_press_mv['pressure'].astype(float)

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
sns.histplot(data=df_press_mv, x="pressure", hue='city')

### 3.7.2 Boxplot to compare pressure's distribution and skewness for both Miami and Vancouver

In [None]:
sns.boxplot(x="type", y="value", hue= 'city', data=df_van_miami_2[df_van_miami_2['type']=='pressure'])

## 3.8 Explore & Compare Data : Weather distribution based on temperature and humidity 
We want to explore how the weather categories are distributed over temperature and humidity, to see which values of temperature and humidity go along with the occurrence of each weather phenomenon.


In [None]:
df_temp_hum_mi = df_miami[(df_miami['day'] >= 20150101) & (df_miami['day'] <= 20151231)][['day','temperature','humidity','weather2']]
df_temp_hum_mi['city'] = 'miami'
df_temp_hum_van = df_vancouver[(df_vancouver['day'] >= 20150101) & (df_vancouver['day'] <= 20151231)][['day','temperature','humidity','weather2']]
df_temp_hum_van['city'] = 'vancouver'

df_temp_hum_vanmi = pd.concat([df_temp_hum_mi,df_temp_hum_van])
# df_temp_hum_vanmi = df_temp_hum_vanmi[(df_temp_hum_vanmi['day'] >= 20150101) & (df_temp_hum_vanmi['day'] <= 20151231)]

In [None]:
sns.scatterplot(data=df_temp_hum_mi, x="temperature", y="humidity", hue = 'weather2')

In [None]:
sns.scatterplot(data=df_temp_hum_van, x="temperature", y="humidity", hue = 'weather2')
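
A numeric companion to the scatter plots is the mean of each variable per weather category, which makes it easier to say where each category sits. This is a minimal sketch on made-up values; the frame `toy` is illustrative, and the same pattern would apply to the per-city frames above.

```python
import pandas as pd

toy = pd.DataFrame({
    "temperature": [25.0, 27.0, 22.0, 20.0],
    "humidity": [60.0, 70.0, 90.0, 94.0],
    "weather2": ["clear", "clear", "rain", "rain"],
})

# Mean temperature and humidity for each simplified weather category.
summary = toy.groupby("weather2")[["temperature", "humidity"]].mean()
print(summary)
```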

## 3.9 Explore & Compare Data : Weather distribution based on pressure and temperature 
We want to explore how the weather categories are distributed over temperature and pressure, to see which values of temperature and pressure go along with the occurrence of each weather phenomenon.

In [None]:
df_temp_pres_mi = df_miami[(df_miami['day'] >= 20150101) & (df_miami['day'] <= 20151231)][['day','temperature','pressure','weather2']]
df_temp_pres_mi['city'] = 'miami'
df_temp_pres_van = df_vancouver[(df_vancouver['day'] >= 20150101) & (df_vancouver['day'] <= 20151231)][['day','temperature','pressure','weather2']]
df_temp_pres_van['city'] = 'vancouver'

df_temp_pres_vanmi = pd.concat([df_temp_pres_mi,df_temp_pres_van])
df_temp_pres_vanmi = df_temp_pres_vanmi[(df_temp_pres_vanmi['day'] >= 20150101) & (df_temp_pres_vanmi['day'] <= 20151231)]

In [None]:
sns.scatterplot(data=df_temp_pres_mi, x="temperature", y="pressure", hue = 'weather2')

In [None]:
sns.scatterplot(data=df_temp_pres_van, x="temperature", y="pressure", hue = 'weather2')

## 3.10 Explore & Compare Data : Weather distribution based on pressure and humidity 
We want to explore how the weather categories are distributed over pressure and humidity, to see which values of pressure and humidity go along with the occurrence of each weather phenomenon.

In [None]:
df_hum_pres_mi =  df_miami[(df_miami['day'] >= 20150101) & (df_miami['day'] <= 20151231)][['day','humidity','pressure','weather2']]
df_hum_pres_mi['city'] = 'miami'
df_hum_pres_van = df_vancouver[(df_vancouver['day'] >= 20150101) & (df_vancouver['day'] <= 20151231)][['day','humidity','pressure','weather2']]
df_hum_pres_van['city'] = 'vancouver'

df_hum_pres_vanmi = pd.concat([df_hum_pres_mi,df_hum_pres_van])
df_hum_pres_vanmi = df_hum_pres_vanmi[(df_hum_pres_vanmi['day'] >= 20150101) & (df_hum_pres_vanmi['day'] <= 20151231)]

In [None]:
sns.scatterplot(data=df_hum_pres_mi, x="humidity", y="pressure", hue = 'weather2')

In [None]:
sns.scatterplot(data=df_hum_pres_van, x="humidity", y="pressure", hue = 'weather2')

# *That's it.* Let's wrap up and draw some insights!