# Westbound I-94 Traffic 

## Table of Contents

1. [**Introduction**](#1)
    - Project Description
    - Data Description
2. [**Acquiring and Loading Data**](#2)
    - Importing Libraries and Notebook Setup
    - Loading Data
    - Basic Data Exploration
    - Areas to Fix
3. [**Data Preprocessing**](#3)
4. [**Exploratory Data Analysis**](#4)
5. [**Conclusion**](#5)
    - Insights
    - Suggestions
    - Possible Next Steps
6. [**Epilogue**](#6) 
    - References
    - Versioning

---

# 1

## Introduction

Insert Image Here (by dragging it)

### Project Description

**Goal/Purpose:** 

The goal of this project is to determine indicators of heavy traffic on I-94. 

<p>&nbsp;</p>

**Questions to be Answered:**

- How does weather impact traffic? 
- What are the seasonal impacts on traffic?
- What is the average impact on travel during heavy commute periods?

<p>&nbsp;</p>

**Assumptions/Methodology/Scope:** 

Briefly describe the assumptions, processing steps, and scope of this project.

<p>&nbsp;</p>

### Data Description

**Content:** 

This dataset is a CSV file of Minneapolis-St. Paul traffic. It covers 2012 to 2018 and contains hourly information about westbound traffic on I-94, including weather and holidays.

<p>&nbsp;</p>

**Description of Attributes:** 

Here you can describe what each column represents.

| Column | Type | Description |
| :----- | :--- | :---------- |
| holiday | Categorical | US national holidays plus regional holiday, Minnesota State Fair |
| temp | Numeric | Average temp in kelvin |
| rain_1h | Numeric | Amount in mm of rain that occurred in the hour |
| snow_1h | Numeric | Amount in mm of snow that occurred in the hour |
| clouds_all | Numeric | Percentage of cloud cover |
| weather_main | Categorical | Short textual description of the current weather |
| weather_description | Categorical | Longer textual description of the current weather |
| date_time | DateTime | Hour of the data collected, in local CST time |
| traffic_volume | Numeric | Hourly I-94 ATR 301 reported westbound traffic |
<p>&nbsp;</p>
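As a rough illustration of how the columns above might be read, the sketch below parses a tiny in-memory CSV that follows the same layout. The two rows are invented for the example and are not taken from the dataset; the names `sample_csv`, `sample`, and `temp_c` are hypothetical.

```python
import io

import pandas as pd

# Two invented rows that mimic the column layout described above (not real data)
sample_csv = io.StringIO(
    "holiday,temp,rain_1h,snow_1h,clouds_all,weather_main,"
    "weather_description,date_time,traffic_volume\n"
    "Labor Day,280.15,0.0,0.0,40,Clouds,scattered clouds,2016-09-05 00:00:00,1100\n"
    "Columbus Day,290.15,0.5,0.0,90,Rain,light rain,2016-10-10 00:00:00,950\n"
)

# Parse date_time as a datetime column while reading
sample = pd.read_csv(sample_csv, parse_dates=["date_time"])

# temp is in kelvin; derive Celsius for readability (hypothetical helper column)
sample["temp_c"] = sample["temp"] - 273.15

print(sample.dtypes)
```

Parsing `date_time` at load time is only one option; converting it later in the preprocessing section works equally well.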

**Acknowledgements:** 

This dataset is provided by John Hogue, Social Data Science & General Mills, and the original source can be found on [Metro Interstate Traffic Volume Data Set - UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume#).

---

# 2

## Acquiring and Loading Data
### Importing Libraries and Notebook Setup

In [None]:
# Ignore warnings if needed
import warnings
warnings.filterwarnings('ignore')

# Data manipulation
import datetime
import numpy as np
import pandas as pd
import pandas.api.types as ptypes

# Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Pandas settings
pd.options.display.max_columns = None
pd.options.display.max_colwidth = 60
pd.options.display.float_format = '{:,.3f}'.format

# Visualization settings
from matplotlib import rcParams
plt.style.use('fivethirtyeight')
rcParams['figure.figsize'] = (16, 5)   
rcParams['axes.spines.right'] = False
rcParams['axes.spines.top'] = False
rcParams['font.size'] = 12
# rcParams['figure.dpi'] = 300
rcParams['savefig.dpi'] = 300
plt.rc('xtick', labelsize=11)
plt.rc('ytick', labelsize=11)
custom_palette = ['#003f5c', '#444e86', '#955196', '#dd5182', '#ff6e54', '#ffa600']
custom_hue = ['#004c6d', '#346888', '#5886a5', '#7aa6c2', '#9dc6e0', '#c1e7ff']
custom_divergent = ['#00876c', '#6aaa96', '#aecdc2', '#f1f1f1', '#f0b8b8', '#e67f83', '#d43d51']
sns.set_palette(custom_palette)
%config InlineBackend.figure_format = 'retina'

### Loading Data

In [None]:
# # Load DataFrame
# file = 'file.csv'
# df = pd.read_csv(file)

### Basic Data Exploration

#### Number of Rows and Columns

In [None]:
# # Show rows and columns count
# print(f"Rows count: {df.shape[0]}\nColumns count: {df.shape[1]}")

#### Display First and Last Rows

In [None]:
# # Look at first 5 rows
# df.head()

In [None]:
# # Look at last 5 rows
# df.tail()

#### Check Data Types

In [None]:
# # Show data types
# df.info()

- `column1`, `column2`, `column3` are **strings**.
- `column4` and `column5` are **floats**.
- `column6` is an **integer**.

`column3` should be a **datetime** type instead.

#### Check Missing Data

In [None]:
# # Print percentage of missing values
# missing_percent = df.isna().mean().sort_values(ascending=False)
# print('---- Percentage of Missing Values (%) -----')
# if missing_percent.sum():
#     print(missing_percent[missing_percent > 0] * 100)
# else:
#     print(None)

#### Check for Duplicate Rows

In [None]:
# # Show number of duplicated rows
# print(f"No. of entirely duplicated rows: {df.duplicated().sum()}")

# # Show duplicated rows
# df[df.duplicated()]

#### Check Uniqueness of Data

In [None]:
# # Print the number of unique values
# num_unique = df.nunique().sort_values()
# print('---- Number of Unique Values -----')
# print(num_unique)

#### Check Data Range

In [None]:
# # Print summary statistics
# df.describe(include='all')

### Areas to Fix
**Data Types**
- [ ] Issue 1

**Missing Data**
- [ ] 

**Duplicate Rows**
- [ ]

**Uniqueness of Data**
- [ ]

**Data Range**
- [ ]

---

# 3

## Data Preprocessing

Here you can add sections like:

- Renaming columns
- Drop Redundant Columns
- Changing Data Types
- Dropping Duplicates
- Handling Missing Values
- Handling Unreasonable Data Ranges
- Feature Engineering / Transformation

Use `assert` where possible to show that preprocessing is done.
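A minimal sketch of this assert-based verification is shown here on a small invented frame. The name `demo` and its values are hypothetical and stand in for the project DataFrame.

```python
import pandas as pd
import pandas.api.types as ptypes

# Hypothetical stand-in for the project DataFrame (values are invented)
demo = pd.DataFrame(
    {
        "date_time": [
            "2016-01-01 00:00:00",
            "2016-01-01 01:00:00",
            "2016-01-01 01:00:00",  # exact duplicate of the previous row
        ],
        "traffic_volume": [1200, 800, 800],
    }
)

# Preprocessing steps
demo["date_time"] = pd.to_datetime(demo["date_time"])
demo = demo.drop_duplicates(ignore_index=True)

# Each assert documents, and enforces, that a step took effect
assert ptypes.is_datetime64_any_dtype(demo["date_time"])
assert demo.duplicated().sum() == 0
```

If a step silently fails, the cell raises an `AssertionError` instead of letting the notebook continue with unprocessed data.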

### Rename Columns

In [None]:
# # Rename columns
# columns_to_rename = {}
# df.rename(columns=columns_to_rename, inplace=True)

In [None]:
# # Verify columns are renamed
# df.columns

### Drop Redundant Columns

In [None]:
# # Check the proportion of the most frequent value in each column
# print('---- Frequency of the Mode (%) -----')
# mode_dict = {col: (df[col].value_counts().iat[0] / df[col].size * 100) for col in df.columns}
# mode_series = pd.Series(mode_dict)
# mode_series

In [None]:
# # Show the value frequency of each column greater than the mode's threshold
# threshold = 80
# for col in mode_series[mode_series > threshold].index:
#     print(df[col].value_counts(dropna=False))
#     print()

In [None]:
# # Drop columns 
# cols_to_drop = []
# df.drop(columns=cols_to_drop, inplace=True)

In [None]:
# # Verify columns dropped
# assert all(col not in df.columns for col in cols_to_drop)

### Changing Data Types

In [None]:
# # Convert columns to the right data types
# df[col] = df[col].astype('string')
# df[col] = df[col].astype('int')
# df[col] = pd.to_datetime(df[col])

In [None]:
# # Verify conversion
# assert ptypes.is_string_dtype(df[col])
# assert ptypes.is_numeric_dtype(df[col])
# cols_to_check = []
# assert all(ptypes.is_datetime64_any_dtype(df[col]) for col in cols_to_check)

### Dropping Duplicates

In [None]:
# # Drop entirely duplicated rows
# df.drop_duplicates(inplace=True, ignore_index=True)

In [None]:
# # Verify rows dropped
# assert df.duplicated().sum()==0

### Handling Missing Values
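One possible way to handle missing values is sketched below: drop rows where the target is missing, then fill remaining numeric gaps with the column median. This is an assumption about a reasonable default, not a decision made for this project, and the frame `mv` with its values is invented.

```python
import numpy as np
import pandas as pd

# Invented frame with gaps in both a feature and the target
mv = pd.DataFrame(
    {
        "temp": [280.0, np.nan, 284.0],
        "traffic_volume": [1000.0, 2000.0, np.nan],
    }
)

# Rows without a target value cannot be analysed, so drop them
mv = mv.dropna(subset=["traffic_volume"]).reset_index(drop=True)

# Fill remaining numeric gaps with the median of the column
mv["temp"] = mv["temp"].fillna(mv["temp"].median())

# Verify no missing values remain
assert mv.isna().sum().sum() == 0
```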

### Handling Unreasonable Data Ranges

In [None]:
# # Drop affected rows
# df = df.loc[~((df['A'] == 0) | (df['B'] > 100))].reset_index(drop=True)

In [None]:
# # Verify rows dropped
# assert not ((df['A'] == 0) | (df['B'] > 100)).any()

### Feature Engineering / Transformation
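Since the project questions concern seasonal and commute-period effects, time-based features derived from `date_time` are a natural candidate. The sketch below assumes `date_time` is already a datetime column. The frame `fe`, its values, and the new column names are hypothetical.

```python
import pandas as pd

# Invented rows: a Monday morning and a Saturday afternoon
fe = pd.DataFrame(
    {
        "date_time": pd.to_datetime(["2016-07-04 08:00:00", "2016-07-09 17:00:00"]),
        "traffic_volume": [3000, 2500],
    }
)

# Derive calendar features with the .dt accessor
fe["hour"] = fe["date_time"].dt.hour
fe["month"] = fe["date_time"].dt.month
fe["day_of_week"] = fe["date_time"].dt.dayofweek  # Monday=0 ... Sunday=6
fe["is_weekend"] = fe["day_of_week"] >= 5

print(fe)
```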

---

# 4

## Exploratory Data Analysis

Here is where your analysis begins. You can add different sections based on your project goals.
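As one possible starting point, the sketch below averages traffic volume by hour of day, which speaks to the commute-period question. The frame `eda` and its values are invented and only illustrate the grouping.

```python
import pandas as pd

# Invented observations: two days, one rush hour and one night hour each
eda = pd.DataFrame(
    {
        "date_time": pd.to_datetime(
            [
                "2016-03-01 02:00:00",
                "2016-03-01 07:00:00",
                "2016-03-02 02:00:00",
                "2016-03-02 07:00:00",
            ]
        ),
        "traffic_volume": [300, 5000, 500, 6000],
    }
)

# Mean traffic volume for each hour of the day
hourly_mean = eda.groupby(eda["date_time"].dt.hour)["traffic_volume"].mean()

print(hourly_mean)
```

The resulting Series can be passed to `hourly_mean.plot(kind="bar")` to visualize the daily profile with the Matplotlib settings configured earlier.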

### Exploring `Column Name`

In [None]:
# Code and visualization

**Observations**
- Ob 1
- Ob 2
- Ob 3

---

# 5

## Conclusion

### Insights 
State the insights/outcomes of your project or notebook.

### Suggestions

Make suggestions based on insights.

### Possible Next Steps
Areas to expand on:
- (if there is any)

---

# 6

## Epilogue

### References

This is how we use inline citation[<sup id="fn1-back">[1]</sup>](#fn1).

[<span id="fn1">1.</span>](#fn1-back) _subject (date)._ Title. Available at: https://website.com (Accessed: Date). 

> Use [https://www.citethisforme.com/](https://www.citethisforme.com/) to create citations.

### Versioning
Notebook and insights by (author).
- Version: 1.0.0
- Date: 