# Metro Interstate Traffic Volume

**Data Set Information:**

Hourly westbound traffic volume on Interstate 94, recorded at MN DoT ATR station 301, roughly midway between Minneapolis and St Paul, MN. Hourly weather features and holiday indicators are included so their impact on traffic volume can be studied.


**Attribute Information:**

**- holiday:** Categorical. US national holidays plus one regional holiday, the Minnesota State Fair;<br>
**- temp:** Numeric. Average temperature in kelvin;<br>
**- rain_1h:** Numeric. Amount of rain, in mm, that fell during the hour;<br>
**- snow_1h:** Numeric. Amount of snow, in mm, that fell during the hour;<br>
**- clouds_all:** Numeric. Percentage of cloud cover;<br>
**- weather_main:** Categorical. Short textual description of the current weather;<br>
**- weather_description:** Categorical. Longer textual description of the current weather;<br>
**- date_time:** DateTime. Hour at which the data was collected, in local time (CST);<br>
**- traffic_volume:** Numeric. Hourly westbound traffic volume reported by I-94 ATR station 301.
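
Because `temp` is stored in kelvin, converting it to degrees Celsius can make the values easier to sanity-check. The following is a minimal sketch on a hypothetical three-row sample (the values and the `temp_c` column name are assumptions for illustration, not part of the dataset):

```python
import pandas as pd

# Hypothetical sample rows; only the column name `temp` comes from the attribute list.
sample = pd.DataFrame({"temp": [273.15, 288.28, 0.0]})

# Kelvin to Celsius: subtract 273.15.
sample["temp_c"] = sample["temp"] - 273.15

print(sample)
```

A reading of 0 K corresponds to -273.15 °C, which is not a plausible air temperature, so such rows are candidates for the outlier check later in the notebook.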

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Get the Data

In [None]:
data = pd.read_csv('MetroInterstateTrafficVolume.csv')

In [None]:
data

### Basic Data Information

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
data.describe(include = 'object')

### Check Missing Values

In [None]:
print(data.isnull().sum(axis=0))
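
One caveat when reading the missing-value counts: the `holiday` column uses the literal text `None` for ordinary days, and whether `pd.read_csv` keeps that as a string or converts it to `NaN` depends on the parser's list of default NA markers, which is understood to differ between pandas releases. The sketch below, on a hypothetical three-row excerpt, shows one way to keep `None` as a plain string by disabling the default markers; the sample values are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical excerpt with the same shape as the holiday and traffic_volume columns.
csv_text = "holiday,traffic_volume\nNone,5545\nColumbus Day,455\nNone,4516\n"

# keep_default_na=False turns off the built-in NA markers;
# with na_values=[""] only empty fields are treated as missing.
sample = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

With this setting the `holiday` column reports no missing values, and the comparison `data['holiday'] != 'None'` behaves the same way regardless of the installed pandas version.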

### Exploratory Data Analysis

**Traffic Volume**

*Histogram of Traffic Volume distribution*

In [None]:
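fig = plt.figure(figsize=(20, 5))
sns.set_style('darkgrid')
# np.arange excludes the stop value; stopping at 7501 keeps 7500 as the last bin edge,
# so volumes above 7250 are not silently dropped from the histogram
bins = np.arange(0, 7501, 250).tolist()
data['traffic_volume'].hist(bins=bins)
plt.xticks(bins)
plt.xlabel('traffic_volume')
plt.ylabel('count')
plt.show()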

**Traffic Volume vs Holiday**

*Check holidays included in the dataset*

In [None]:
data['holiday'].unique()

*Box plot of Traffic Volume vs Holiday distribution*

In [None]:
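plt.figure(figsize=(20, 8))
# Recent seaborn releases require x and y to be passed as keyword arguments
sns.boxplot(x=data['holiday'], y=data['traffic_volume'])
plt.xticks(rotation=45)  # holiday names are long, so tilt the labels
plt.show()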

*Distribution only with the holidays*

In [None]:
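# Keep holiday rows only. Depending on the pandas version, ordinary days may be
# read as the string 'None' or as NaN, so both cases are excluded here.
data_holidays = data.loc[data['holiday'].notna() & (data['holiday'] != 'None')]
data_holidays.index = np.arange(1, len(data_holidays) + 1)
data_holidays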

*Box plot of Traffic Volume vs Holiday distribution (only holidays included)*

In [None]:
plt.figure(figsize=(20, 8))
sns.boxplot(x=data_holidays['holiday'], y=data_holidays['traffic_volume'])
plt.xticks(rotation=45)  # holiday names are long, so tilt the labels
plt.show()

**Traffic Volume vs Temperature**

*Plot of Traffic Volume vs Temperature distribution*

In [None]:
fig = sns.jointplot(x=data['temp'], y=data['traffic_volume'], kind='reg')

*Removing outliers*

In [None]:
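# temp is in kelvin, so readings at or below 50 K are not plausible air temperatures
outliers = data[(data['temp'] <= 50)]
data = data.drop(outliers.index)
# Renumber the remaining rows from 1
data.index = np.arange(1, len(data) + 1)
outliers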
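
The same outlier removal can also be written with a boolean mask, which avoids the intermediate `drop` call. This is a minimal sketch on a hypothetical four-row sample; the values and the names `is_outlier` and `cleaned` are assumptions for illustration, and the 50 K threshold is the one used in the cell above.

```python
import pandas as pd

# Hypothetical sample: two plausible readings and two 0 K readings.
sample = pd.DataFrame({
    "temp": [288.28, 0.0, 275.5, 0.0],
    "traffic_volume": [5545, 361, 4516, 734],
})

# Flag rows whose temperature is at or below the 50 K threshold.
is_outlier = sample["temp"] <= 50

outliers = sample[is_outlier]
cleaned = sample[~is_outlier].copy()

# Renumber the remaining rows from 1, matching the notebook's convention.
cleaned.index = range(1, len(cleaned) + 1)

print(outliers)
print(cleaned)
```

Keeping the mask in its own variable makes it easy to inspect how many rows are affected before anything is removed.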

*Plot of Traffic Volume vs Temperature distribution (without outliers)*

In [None]:
fig = sns.jointplot(x=data['temp'], y=data['traffic_volume'], kind='reg')

**Traffic Volume vs Rain**

*Plot of Traffic Volume vs Rain distribution*

In [None]:
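# A single axes is enough here; the 2x3 subplot grid left most of the figure empty
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.scatter(data['rain_1h'], data['traffic_volume'])
ax1.set_xlabel('rain_1h')
ax1.set_ylabel('traffic_volume')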

*Removing outliers*

In [None]:
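# 1000 mm or more of rain in a single hour is not plausible, so treat it as a recording error
outliers = data[(data['rain_1h'] >= 1000)]
data = data.drop(outliers.index)
# Renumber the remaining rows from 1
data.index = np.arange(1, len(data) + 1)
outliers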

*Plot of Traffic Volume vs Rain distribution (without outliers)*

In [None]:
fig = sns.jointplot(x=data['rain_1h'], y=data['traffic_volume'], kind='reg')

*Distribution restricted to rainy hours*

In [None]:
data_rainy = data.loc[(data['rain_1h'] > 0)]
#data_rainy = data.loc[(data['weather_main'] == "Rain")]
data_rainy.index = np.arange(1, len(data_rainy) + 1)
data_rainy

*Plot of Traffic Volume vs Rain distribution (only rainy hours included)*

In [None]:
fig = sns.jointplot(x=data_rainy['rain_1h'], y=data_rainy['traffic_volume'], kind='reg')

**Traffic Volume vs Snow**

*Plot of Traffic Volume vs Snow distribution*

In [None]:
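# A single axes is enough here; the 2x3 subplot grid left most of the figure empty
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.scatter(data['snow_1h'], data['traffic_volume'])
ax1.set_xlabel('snow_1h')
ax1.set_ylabel('traffic_volume')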

*Distribution restricted to snowy hours*

In [None]:
data_snowy = data.loc[(data['snow_1h'] > 0)]
#data_snowy = data.loc[(data['weather_main'] == "Snow")]
data_snowy.index = np.arange(1, len(data_snowy) + 1)
data_snowy

*Plot of Traffic Volume vs Snow distribution (only snowy hours included)*

In [None]:
fig = sns.jointplot(x=data_snowy['snow_1h'], y=data_snowy['traffic_volume'], kind='reg')

**Traffic Volume vs Cloud cover**

*Plot of Traffic Volume vs Cloud cover distribution*

In [None]:
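# A single axes is enough here; the 2x3 subplot grid left most of the figure empty
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.scatter(data['clouds_all'], data['traffic_volume'])
ax1.set_xlabel('clouds_all')
ax1.set_ylabel('traffic_volume')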

In [None]:
#data['clouds_all'].unique()
#x = data['clouds_all'].value_counts()
#y = list(x[:10].index)

#plt.figure(figsize=(20, 8))
#sns.boxplot(data['clouds_all'], data['traffic_volume'])
#plt.show()

**Traffic Volume vs Current weather**

*Box plot of Traffic Volume vs Current weather distribution*

In [None]:
plt.figure(figsize=(20, 8))
sns.boxplot(x=data['weather_main'], y=data['traffic_volume'])
plt.show()

**Traffic Volume vs Date time**

*Separation of the date elements*

In [None]:
data

In [None]:
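# Split date_time into its components with a regex; the new columns hold strings (e.g. '09'), not integers
data[['year','month','day','hour','minutes','seconds']] = data['date_time'].str.extract(r'(\d+)-(\d+)-(\d+)\s*(\d+):(\d+):(\d+)', expand=True)
# The original column is no longer needed once the parts are extracted
data = data.drop(['date_time'], axis=1)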
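
As an alternative to the regular expression, the date parts can be obtained by parsing the column with `pd.to_datetime` and reading the `.dt` accessors, which yields integer columns instead of strings. The sketch below uses a hypothetical two-row sample and assumes the timestamps follow the `YYYY-MM-DD HH:MM:SS` layout that the regex above expects.

```python
import pandas as pd

# Hypothetical sample rows in the assumed timestamp layout.
sample = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2013-01-15 17:00:00"],
    "traffic_volume": [5545, 6000],
})

# Parse the text once, then pull out each component as an integer column.
parsed = pd.to_datetime(sample["date_time"], format="%Y-%m-%d %H:%M:%S")
sample["year"] = parsed.dt.year
sample["month"] = parsed.dt.month
sample["day"] = parsed.dt.day
sample["hour"] = parsed.dt.hour

# Drop the original text column, as the notebook does.
sample = sample.drop(columns=["date_time"])

print(sample)
```

Integer components sort numerically on a plot axis, whereas the string columns produced by `str.extract` are ordered as text.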

*Dataset with new labels*

In [None]:
data

*Box plot of Traffic Volume vs Year*

In [None]:
plt.figure(figsize=(20, 8))
sns.boxplot(x=data['year'], y=data['traffic_volume'])
plt.show()

*Box plot of Traffic Volume vs Month*

In [None]:
plt.figure(figsize=(20, 8))
sns.boxplot(x=data['month'], y=data['traffic_volume'])
plt.show()

*Box plot of Traffic Volume vs Hour*

In [None]:
plt.figure(figsize=(20, 8))
sns.boxplot(x=data['hour'], y=data['traffic_volume'])
plt.show()
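
The hourly box plot can be complemented with a numeric summary per hour. This is a minimal sketch on a hypothetical five-row sample; the values and the name `hourly` are assumptions for illustration, and `hour` is kept as a string to match the columns created by `str.extract` above.

```python
import pandas as pd

# Hypothetical sample with string hours, as produced by the regex split.
sample = pd.DataFrame({
    "hour": ["08", "08", "17", "17", "03"],
    "traffic_volume": [5000, 5400, 6000, 6200, 300],
})

# Median, mean, and row count of traffic volume for each hour of the day.
hourly = sample.groupby("hour")["traffic_volume"].agg(["median", "mean", "count"])

print(hourly)
```

The median column corresponds to the centre line of each box in the plot, which makes it straightforward to compare hours without reading values off the figure.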