### Baltimore recently received access to Waze data through the Connected Citizens Program. Using proprietary data from Waze, we can access ridership data for every major street segment in Baltimore. One of our goals is to use this ridership data to understand how traffic relates to the number of lanes on a street: for example, whether highways tend to be more congested and, if so, which highways carry particularly heavy traffic. Using the AADT (annual average daily traffic) data, we will fit linear and polynomial regression models of traffic against number of lanes. Doing so should help us model these datasets for future forecasting of Baltimore traffic and for optimizing lane layouts.

# Import Libraries 

In [75]:
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt 
import numpy as np

# Import Data

In [14]:
## We are working with five CSV documents 
## In this specific analysis, we will be working with the AADT dataset 

aadt = pd.read_csv('AADTData.csv')
loc_data = pd.read_csv('LocationData.csv')
accident = pd.read_csv('TrafficAccidents.csv')
alert = pd.read_csv('TrafficAlerts.csv')
irreg = pd.read_csv('TrafficIrregularities.csv')

# Clean Data 

In [111]:
# Clean the AADT dataset

# Keep only rows where both AADT and NUM_LANES are present. Filtering the two
# columns separately would leave them with different lengths and misaligned rows.
aadt_clean = aadt.dropna(subset=['AADT', 'NUM_LANES'])
traffic = aadt_clean['AADT']         # Traffic volume is given by the AADT column
num_lanes = aadt_clean['NUM_LANES']  # Number of lanes is given by the NUM_LANES column

# Average AADT within each number-of-lanes category
trafficmean = aadt_clean.groupby('NUM_LANES', as_index=False)['AADT'].mean()

# Drop rows 3 and 4, which had non-integer lane values.
# drop() returns a new DataFrame, so the result must be assigned back.
trafficmean = trafficmean.drop(labels=[3, 4])

num = np.asarray(trafficmean['NUM_LANES'])  # Convert column to NumPy array for plotting
traff = np.asarray(trafficmean['AADT'])     # Convert column to NumPy array for plotting



# Build a Data Visualization with Plotly 

In [104]:
# Plotly scatter plot of number of lanes (x-axis) against mean AADT (y-axis)
# This gives a first look at the relationship between number of lanes and traffic
# x and y are passed as plain arrays, so the label keys are 'x' and 'y'
fig = px.scatter(x=num, y=traff,
                 title="Relationship between Number of Lanes and Traffic (AADT) in Baltimore",
                 labels={'x': 'Number of Lanes', 'y': 'Mean AADT'})
fig.show()

# What is interesting about the data and what are your next steps? 

### What does this data show?

#### This data shows a roughly logarithmic trend: as the number of lanes increases, traffic increases, but at a diminishing rate. Although we initially expected a fairly linear trend, the observation is consistent with our expectation that roads with more lanes carry more traffic. Next steps in the analysis would be to 1) repeat the analysis with respect to ridership (to standardize), 2) fit linear and logarithmic regression models and compare them quantitatively to find the more accurate one, and 3) address whether number of lanes affects number of accidents with respect to ridership. Understanding these relationships would show how Baltimore's government can adjust road layouts to reduce traffic congestion in the city.
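The model comparison in step 2 could be sketched roughly as follows. This is a minimal illustration and not the analysis itself: the `lanes` and `mean_aadt` values are made-up placeholders (in this notebook they would be `num` and `traff`), and `r_squared` is a helper defined here for illustration. Both models are fit with `np.polyfit`, with the logarithmic one obtained by regressing on `np.log(lanes)`.

```python
import numpy as np

# Placeholder values for illustration only, NOT real Baltimore data.
lanes = np.array([1, 2, 3, 4, 5, 6, 8], dtype=float)
mean_aadt = np.array([4000, 9000, 12500, 14500, 16500, 17500, 19500], dtype=float)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Linear model: AADT ~ a * lanes + b
a_lin, b_lin = np.polyfit(lanes, mean_aadt, deg=1)
pred_lin = a_lin * lanes + b_lin

# Logarithmic model: AADT ~ a * ln(lanes) + b (requires lanes > 0)
a_log, b_log = np.polyfit(np.log(lanes), mean_aadt, deg=1)
pred_log = a_log * np.log(lanes) + b_log

r2_lin = r_squared(mean_aadt, pred_lin)
r2_log = r_squared(mean_aadt, pred_log)
print(f"linear R^2      = {r2_lin:.3f}")
print(f"logarithmic R^2 = {r2_log:.3f}")
```

R² is only one possible yardstick. With just a handful of lane categories, a held-out error or a residual plot may be a more honest way to choose between the two models.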

#### This chart, overall, is not fully representative of the actual relationship between number of lanes and traffic patterns, because the data must first be adjusted for ridership: in other words, we need to standardize roads by number of Wazers. For example, highways will always have more drivers by virtue of being highways. We need to adjust for these factors when preprocessing the data, and we will complete this step before running the linear and logarithmic regression models in future analyses.
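One way to picture this standardization step is sketched below. The `SEGMENT` and `WAZE_REPORTS` columns and all of the numbers are hypothetical, invented for illustration; they are not taken from the Waze or AADT files. Using AADT as the denominator is also an assumption. The idea is to divide a raw count on each segment by its traffic volume so that busy highways and quiet streets are compared on the same per-vehicle scale.

```python
import pandas as pd

# Hypothetical example data: column names and values are invented for illustration.
segments = pd.DataFrame({
    "SEGMENT": ["highway_a", "arterial_b", "local_c"],
    "NUM_LANES": [6, 4, 2],
    "AADT": [60000, 20000, 5000],     # traffic volume on the segment
    "WAZE_REPORTS": [300, 150, 50],   # raw report count (hypothetical column)
})

# Raw counts favor the highway simply because more vehicles use it.
# Dividing by AADT gives a rate per 1,000 vehicles that is comparable
# across road types.
segments["REPORTS_PER_1000"] = segments["WAZE_REPORTS"] / segments["AADT"] * 1000

print(segments[["SEGMENT", "WAZE_REPORTS", "REPORTS_PER_1000"]])
```

In this made-up example the highway has the most raw reports but the lowest rate per vehicle, which is the kind of reversal that standardization is meant to expose.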

# Export an interactive visualization for your city department

In [112]:
# Export visualization as an html document
fig.write_html("SiddharthArun_Visualization.html")