# Westbound traffic on the I-94 Interstate Highway

The goal of this analysis is to determine a few indicators of heavy traffic on I-94. These indicators can be weather type, time of the day, time of the week, etc.

In [None]:
#Read dataset
import pandas as pd
traffic = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
print(traffic.head(5))
print(traffic.tail(5))

In [None]:
traffic.info()

From the above, we can see that there are 9 columns in the dataset.

# Analyzing traffic data

We will plot a histogram to visualize the distribution of the traffic volume column.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

traffic['traffic_volume'].plot.hist()

In [None]:
traffic['traffic_volume'].describe()

Based on the distribution above:

1. About 25% of the time, 1,193 or fewer cars passed the station each hour. This probably happens at night or when the road is under construction.
2. About 75% of the time, the traffic volume was 4,933 cars or fewer. The remaining 25% of the time, the traffic volume exceeded 4,933 cars.
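
The quartiles reported by describe() can also be read directly with quantile(). The sketch below uses a small made-up series instead of the real dataset, so the numbers are illustrative only.

```python
import pandas as pd

# Synthetic hourly counts, invented for illustration (not the real dataset).
volume = pd.Series(
    [300, 800, 1200, 2500, 3400, 4100, 4900, 5600, 6200],
    name='traffic_volume',
)

# The 25th, 50th and 75th percentiles are the same statistics describe() reports.
quartiles = volume.quantile([0.25, 0.5, 0.75])
print(quartiles)

# Share of hours at or below the first quartile (at least 25% by definition).
share_below_q1 = (volume <= quartiles.loc[0.25]).mean()
print(share_below_q1)
```

On the real data, the same two calls on traffic['traffic_volume'] should reproduce the 25% and 75% figures quoted above.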

# Traffic volume: Day vs Night

The distribution above suggests that time of day influences traffic volume, so we'll compare daytime and nighttime data. We'll start by dividing the dataset into two parts:

1. Daytime data: hours from 7 a.m. to 7 p.m. (12 hours) 
2. Nighttime data: hours from 7 p.m. to 7 a.m. (12 hours)

In [None]:
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
print(traffic['date_time'])

In [None]:
traffic['date_time'].dt.hour

In [None]:
# Take a copy so that columns added later do not trigger SettingWithCopyWarning
daytime = traffic[(traffic['date_time'].dt.hour >= 7) & (traffic['date_time'].dt.hour < 19)].copy()
print(daytime)

In [None]:
# Hours 19-23 and 0-6: anything at or after 7 p.m., or before 7 a.m.
nighttime = traffic[(traffic['date_time'].dt.hour >= 19) | (traffic['date_time'].dt.hour < 7)]
print(nighttime)
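
A quick sanity check on a split like this is to confirm that every row lands in exactly one of the two parts. The sketch below does that on a made-up frame with one timestamp per hour. The names sample, is_day, day_part and night_part are hypothetical, and building the nighttime mask as the negation of the daytime mask is one possible approach, not the only one.

```python
import pandas as pd

# Synthetic timestamps: one row per hour of a single invented day.
sample = pd.DataFrame({
    'date_time': pd.to_datetime([f'2020-01-01 {h:02d}:00:00' for h in range(24)])
})

hours = sample['date_time'].dt.hour

# Daytime is 7 a.m. (inclusive) to 7 p.m. (exclusive); nighttime is everything else.
is_day = (hours >= 7) & (hours < 19)
day_part = sample[is_day]
night_part = sample[~is_day]

# The two parts should be the same size here and add up to the full frame.
print(len(day_part), len(night_part), len(sample))
```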

In [None]:
#Plot histograms for both day and night
plt.figure(figsize=(10,4))

plt.subplot(1, 2, 1)
plt.hist(daytime['traffic_volume'])
plt.title('Daytime Traffic')
plt.xlabel('Traffic Volume')
plt.ylabel('Frequency')

plt.subplot(1, 2, 2)
plt.hist(nighttime['traffic_volume'])
plt.title('Nighttime Traffic')
plt.xlabel('Traffic Volume')
plt.ylabel('Frequency')

Based on both plots:

1. Both plots have a skewed distribution.
2. Traffic at night is generally light. Since our goal is to find indicators of heavy traffic, we'll set the nighttime data aside and focus on the daytime data from here on.

# Time indicators

One of the possible indicators of heavy traffic is time. There might be more people on the road in a certain month, on a certain day, or at a certain time of the day.

We're going to look at a few line plots showing how the traffic volume changed according to the following parameters:

1. Month
2. Day of the week
3. Time of day

In [None]:
daytime['traffic_volume'].describe()

In [None]:
nighttime['traffic_volume'].describe()

From the above, we can clearly see that traffic volume is much lower during the night.

In [None]:
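daytime['month'] = daytime['date_time'].dt.month # month number (1-12) of each row
# numeric_only=True skips text columns, which newer pandas versions refuse to average
by_month = daytime.groupby('month').mean(numeric_only=True) # average of each numeric column per month
by_month['traffic_volume']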

In [None]:
# x= and y= are ignored when plotting a Series, so label the axes explicitly
by_month['traffic_volume'].plot.line()
plt.xlabel('Month')
plt.ylabel('Traffic Volume')
plt.show()

The line plot shows that traffic volume was highest in August and November and lowest in December.

In [None]:
daytime['dayofweek'] = daytime['date_time'].dt.dayofweek # day of the week for each row
by_dayofweek = daytime.groupby('dayofweek').mean(numeric_only=True) # average of each numeric column per day
by_dayofweek['traffic_volume']  # 0 is Monday, 6 is Sunday

In [None]:
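# x= and y= are ignored when plotting a Series, so label the axes explicitly
by_dayofweek['traffic_volume'].plot.line()
plt.xlabel('Day of week (0 = Monday)')
plt.ylabel('Traffic Volume')
plt.show()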

Traffic volume is higher on business days than on weekends, most likely because people commute to and from work on business days and fewer people work on weekends.

In [None]:
daytime['hour'] = daytime['date_time'].dt.hour # hour of the day (0-23) for each row
business_days = daytime[daytime['dayofweek'] <= 4].copy() # 4 == Friday
weekend = daytime[daytime['dayofweek'] >= 5].copy() # 5 == Saturday
# numeric_only=True skips text columns, which newer pandas versions refuse to average
by_hour_business = business_days.groupby('hour').mean(numeric_only=True)
by_hour_weekend = weekend.groupby('hour').mean(numeric_only=True)

print(by_hour_business['traffic_volume'])
print(by_hour_weekend['traffic_volume'])

In [None]:
#Plot line plots for both weekdays and weekends
plt.figure(figsize=(10,4))

plt.subplot(1,2,1)
plt.plot(by_hour_business['traffic_volume'])
plt.title('Weekday Traffic')
plt.xlabel('Hour')
plt.ylabel('Traffic Volume')

plt.subplot(1,2,2)
plt.plot(by_hour_weekend['traffic_volume'])
plt.title('Weekend Traffic')
plt.xlabel('Hour')
plt.ylabel('Traffic Volume')

To summarize, traffic volume is higher on business days than on weekends. On business days, traffic volume peaks at around 7 a.m. and 4 p.m., when people commute to work in the morning and head home in the late afternoon.
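
The rush hours can also be read off programmatically rather than from the plot. Here is a minimal sketch using nlargest() on a made-up series of hourly averages. The values and the names by_hour and peak_hours are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Synthetic weekday averages for hours 7 through 18, invented for illustration.
by_hour = pd.Series(
    [6000, 5500, 4900, 4400, 4500, 4800, 4900, 5200, 5600, 6200, 5800, 4300],
    index=range(7, 19),
    name='traffic_volume',
)
by_hour.index.name = 'hour'

# nlargest(2) returns the two hours with the highest average volume,
# sorted from busiest to second busiest.
peak_hours = by_hour.nlargest(2)
print(peak_hours)
```

On the real data, the same call on by_hour_business['traffic_volume'] would list the busiest hours directly.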

In [None]:
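# numeric_only=True restricts the correlation to numeric columns (required in newer pandas versions)
daytime.corr(numeric_only=True)['traffic_volume']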

# Weather indicators

Another possible indicator of heavy traffic is weather. The dataset provides us with a few useful columns about weather: temp, rain_1h, snow_1h, clouds_all, weather_main, weather_description.

In [None]:
daytime.plot.scatter(x='traffic_volume',y='temp')

The scatter plot shows no clear relationship between temperature and traffic volume, so temperature does not look like a reliable indicator of heavy traffic. To see if we can find more useful data, we'll look next at the categorical weather-related columns: weather_main and weather_description.
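
A visual impression from a scatter plot can be backed up with a single number, the Pearson correlation coefficient. The sketch below computes it with Series.corr() on a small made-up frame. The values are invented, so the coefficient says nothing about the real data.

```python
import pandas as pd

# Synthetic temperature (kelvin) and traffic readings, invented for illustration.
sample = pd.DataFrame({
    'temp': [270.0, 275.0, 280.0, 285.0, 290.0, 295.0],
    'traffic_volume': [4800, 4300, 5100, 4600, 5000, 4700],
})

# Pearson correlation lies between -1 and 1; values near 0 mean
# there is little linear relationship between the two columns.
r = sample['traffic_volume'].corr(sample['temp'])
print(round(r, 3))
```

A coefficient close to zero on the real daytime data would agree with what the scatter plot suggests.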

# Weather types

We're going to calculate the average traffic volume associated with each unique value in these two columns. To do that, we group the data by weather_main and weather_description and use the mean as the aggregate function.

In [None]:
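# numeric_only=True skips text columns, which newer pandas versions refuse to average
by_weather_main = daytime.groupby('weather_main').mean(numeric_only=True)
by_weather_description = daytime.groupby('weather_description').mean(numeric_only=True)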

In [None]:
by_weather_main['traffic_volume'].plot.barh()

In [None]:
by_weather_description['traffic_volume'].plot.barh(figsize=(7,10))
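
With many weather descriptions, it can be easier to sort the averages and keep only those above a cutoff than to read them off a long bar chart. The sketch below does this on a made-up frame. The 5,000-car threshold is a hypothetical cutoff chosen for illustration, and the values are invented.

```python
import pandas as pd

# Synthetic rows, invented for illustration (not the real dataset).
sample = pd.DataFrame({
    'weather_description': ['shower snow', 'light rain', 'shower snow',
                            'mist', 'light rain', 'mist'],
    'traffic_volume': [5600, 4700, 5800, 4500, 4900, 4600],
})

# Average volume per description, sorted from heaviest to lightest.
mean_by_description = (
    sample.groupby('weather_description')['traffic_volume']
    .mean()
    .sort_values(ascending=False)
)

# Hypothetical cutoff for "heavy" traffic; adjust to suit the analysis.
threshold = 5000
heavy = mean_by_description[mean_by_description > threshold]
print(heavy)
```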

# Conclusion

In this project, we tried to find a few indicators of heavy traffic on the I-94 Interstate highway. We managed to find two types of indicators:

1. Time indicators

   a) The traffic is usually heavier during warm months (March–October) compared to cold months (November–February).
   b) The traffic is usually heavier on business days compared to the weekends.
   c) On business days, the rush hours are around 7 a.m. and 4 p.m.

2. Weather indicators

   a) Shower snow
   b) Light rain and snow
   c) Proximity thunderstorm with drizzle