# Problem 1

Urban traffic management requires balancing multiple conflicting objectives, such as
minimizing travel time, reducing fuel consumption, and minimizing air pollution. Your task is
to apply a Multi-Objective Evolutionary Algorithm (MOEA) to optimize traffic management
strategies for selected New York City (NYC) areas. The goal is to minimize conflicting
objectives, Total Travel Time (TTT) and Fuel Consumption (FC), using real-world traffic data
from NYC Open Data.

In [35]:
import os
import numpy as np

# Set up matplotlib and pandas
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8-darkgrid")

import pandas as pd
pd.set_option("display.max_columns", 200)

# Set up imports.
# importlib is used to reload modules such that
# changes to the code are reflected in the notebook.

from importlib import reload

import utils.data_loader
reload(utils.data_loader)
from utils.data_loader import load_data

# Download data

In [36]:
segment_df, traffic_df = load_data(root=os.path.join(os.getcwd(), "data", "raw"))

# Discover & Preprocess data

In this section, we explore the data to understand its contents and the relationships between the variables.
We must also clean the data: remove null values and duplicates, and exclude anything that is not relevant to the analysis, such as traffic data for roads outside the chosen road segments.
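
The cleaning steps described above could be sketched roughly as follows. The helper name `clean_traffic`, the toy frame, and the set `chosen_segment_ids` are assumptions made for illustration only; in particular, how the chosen road segments map onto `SegmentID` is not stated in this notebook.

```python
import pandas as pd


def clean_traffic(df: pd.DataFrame, chosen_segment_ids: set) -> pd.DataFrame:
    """Drop null rows and duplicates, then keep only the chosen segments.

    `chosen_segment_ids` is a hypothetical set of SegmentID values; the
    notebook does not say how the chosen road segments are identified.
    """
    cleaned = df.dropna()                  # remove rows with any null value
    cleaned = cleaned.drop_duplicates()    # remove exact duplicate rows
    cleaned = cleaned[cleaned["SegmentID"].isin(chosen_segment_ids)]
    return cleaned.reset_index(drop=True)


# Toy data (made up): one duplicate row, one null row, one irrelevant segment.
toy = pd.DataFrame(
    {
        "SegmentID": [15540, 15540, 15540, 99999, 15540],
        "Date": ["01/09/2012", "01/09/2012", "01/10/2012", "01/09/2012", "01/11/2012"],
        "7:00-8:00AM": [66.0, 66.0, None, 12.0, 75.0],
    }
)
cleaned_toy = clean_traffic(toy, chosen_segment_ids={15540})
print(cleaned_toy)
```

Dropping every row that contains a null is the bluntest option; whether that is acceptable depends on how many rows it removes.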

### Data Discovery


In [39]:
segment_df.head()

Unnamed: 0,ID,SPEED,TRAVEL_TIME,DATA_AS_OF,LINK_ID,LINK_POINTS,BOROUGH,LINK_NAME
0,170,39.14,96,09/24/2024 09:04:11 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
1,170,39.76,95,09/24/2024 09:09:12 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
2,170,39.76,94,09/24/2024 09:14:10 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
3,170,40.38,93,09/24/2024 09:19:10 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
4,170,39.14,96,09/24/2024 09:24:10 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway


In [40]:
traffic_df.head()

Unnamed: 0,ID,SegmentID,Roadway Name,From,To,Direction,Date,12:00-1:00 AM,1:00-2:00AM,2:00-3:00AM,3:00-4:00AM,4:00-5:00AM,5:00-6:00AM,6:00-7:00AM,7:00-8:00AM,8:00-9:00AM,9:00-10:00AM,10:00-11:00AM,11:00-12:00PM,12:00-1:00PM,1:00-2:00PM,2:00-3:00PM,3:00-4:00PM,4:00-5:00PM,5:00-6:00PM,6:00-7:00PM,7:00-8:00PM,8:00-9:00PM,9:00-10:00PM,10:00-11:00PM,11:00-12:00AM
0,1,15540,BEACH STREET,UNION PLACE,VAN DUZER STREET,NB,01/09/2012,20.0,10.0,11.0,14.0,13.0,20.0,34.0,66.0,100.0,52.0,68.0,85.0,85.0,94.0,104.0,105.0,147.0,120.0,91.0,83.0,74.0,49.0,42.0,42.0
1,2,15540,BEACH STREET,UNION PLACE,VAN DUZER STREET,NB,01/10/2012,21.0,16.0,8.0,6.0,13.0,13.0,31.0,70.0,67.0,45.0,57.0,67.0,73.0,95.0,102.0,98.0,133.0,131.0,95.0,73.0,70.0,63.0,42.0,35.0
2,3,15540,BEACH STREET,UNION PLACE,VAN DUZER STREET,NB,01/11/2012,27.0,14.0,6.0,5.0,12.0,16.0,34.0,75.0,69.0,71.0,67.0,70.0,90.0,89.0,115.0,115.0,130.0,143.0,106.0,89.0,68.0,64.0,56.0,43.0
3,4,15540,BEACH STREET,UNION PLACE,VAN DUZER STREET,NB,01/12/2012,22.0,7.0,7.0,8.0,11.0,12.0,33.0,75.0,89.0,66.0,70.0,60.0,105.0,103.0,71.0,127.0,122.0,144.0,122.0,76.0,64.0,58.0,64.0,43.0
4,5,15540,BEACH STREET,UNION PLACE,VAN DUZER STREET,NB,01/13/2012,31.0,17.0,7.0,5.0,13.0,28.0,29.0,68.0,84.0,64.0,83.0,89.0,88.0,113.0,113.0,126.0,133.0,135.0,102.0,106.0,58.0,58.0,55.0,54.0


In [45]:
print(f"Traffic Data Time Range: {traffic_df['Date'].min()} - {traffic_df['Date'].max()}")
print(f"Segment Data Time Range: {segment_df['DATA_AS_OF'].min()} - {segment_df['DATA_AS_OF'].max()}")

Traffic Data Time Range: 01/08/2012 - 12/13/2020
Segment Data Time Range: 09/24/2024 01:04:03 PM - 09/27/2024 12:59:09 AM
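
Note that both columns have dtype `object`, so `min()` and `max()` compare the values as strings, character by character, not chronologically. This is visible in the output: `segment_df.head()` shows a reading at 09/24/2024 09:04:11 AM, which is earlier than the reported minimum of 09/24/2024 01:04:03 PM. A minimal sketch of the difference, using made-up toy values in the same formats (the format strings are inferred from the samples shown and are an assumption):

```python
import pandas as pd

# Toy values (made up) in the same formats as `Date` and `DATA_AS_OF`.
dates = pd.Series(["01/09/2012", "12/31/2012", "02/01/2020"])
stamps = pd.Series(
    ["09/24/2024 01:04:03 PM", "09/24/2024 09:04:11 AM", "09/27/2024 12:59:09 AM"]
)

# String comparison is lexicographic, so the month is compared before the year.
string_max = dates.max()

# Parsing with an explicit format gives chronological ordering.
parsed_dates = pd.to_datetime(dates, format="%m/%d/%Y")
parsed_stamps = pd.to_datetime(stamps, format="%m/%d/%Y %I:%M:%S %p")

print("string max:", string_max)
print("parsed max:", parsed_dates.max().date())
print("string min:", stamps.min())
print("parsed min:", parsed_stamps.min())
```

Applying the same conversion to `traffic_df['Date']` and `segment_df['DATA_AS_OF']` before calling `min()` and `max()` would give the true time ranges.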


One quick observation is that traffic_df is much larger than segment_df: 42,756 rows versus 4,323. This is because the traffic_df contains data for all the roads in NYC, while the segment_df only contains data for the selected road segments. Additionally, the traffic_df has a lot more data because i

In [28]:
segment_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4323 entries, 0 to 836
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   ID           4323 non-null   int64  
 1   SPEED        4323 non-null   float64
 2   TRAVEL_TIME  4323 non-null   int64  
 3   DATA_AS_OF   4323 non-null   object 
 4   LINK_ID      4323 non-null   int64  
 5   LINK_POINTS  4323 non-null   object 
 6   BOROUGH      4323 non-null   object 
 7   LINK_NAME    4323 non-null   object 
dtypes: float64(1), int64(3), object(4)
memory usage: 304.0+ KB
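
The index line above reports 4323 entries with labels running only from 0 to 836, which suggests that index labels repeat. One plausible cause, assumed here and not confirmed by the notebook, is that `load_data` concatenates several files without resetting the index. A small sketch of that situation with made-up frames (the second `LINK_ID` is hypothetical):

```python
import pandas as pd

# Two toy frames (made up), each with its own default 0..n-1 index.
part_a = pd.DataFrame({"LINK_ID": [4616356, 4616356], "SPEED": [39.14, 39.76]})
part_b = pd.DataFrame({"LINK_ID": [4616357, 4616357], "SPEED": [21.0, 22.5]})

combined = pd.concat([part_a, part_b])    # index labels: 0, 1, 0, 1
print(combined.index.is_unique)           # False

tidy = combined.reset_index(drop=True)    # index labels: 0, 1, 2, 3
print(tidy.index.is_unique)               # True
```

If the labels do repeat, `segment_df.reset_index(drop=True)` avoids surprises with label-based selection such as `.loc[0]` returning several rows.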


In [41]:
traffic_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42756 entries, 0 to 42755
Data columns (total 31 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             42756 non-null  int64  
 1   SegmentID      42756 non-null  int64  
 2   Roadway Name   42756 non-null  object 
 3   From           42756 non-null  object 
 4   To             42756 non-null  object 
 5   Direction      42756 non-null  object 
 6   Date           42756 non-null  object 
 7   12:00-1:00 AM  42752 non-null  float64
 8   1:00-2:00AM    42752 non-null  float64
 9   2:00-3:00AM    42752 non-null  float64
 10  3:00-4:00AM    42752 non-null  float64
 11  4:00-5:00AM    42752 non-null  float64
 12  5:00-6:00AM    42752 non-null  float64
 13  6:00-7:00AM    42752 non-null  float64
 14  7:00-8:00AM    42752 non-null  float64
 15  8:00-9:00AM    42752 non-null  float64
 16  9:00-10:00AM   42752 non-null  float64
 17  10:00-11:00AM  42753 non-null  float64
 18  11:00-

In [37]:
traffic_df.isnull().sum()

ID                 0
SegmentID          0
Roadway Name       0
From               0
To                 0
Direction          0
Date               0
12:00-1:00 AM      4
1:00-2:00AM        4
2:00-3:00AM        4
3:00-4:00AM        4
4:00-5:00AM        4
5:00-6:00AM        4
6:00-7:00AM        4
7:00-8:00AM        4
8:00-9:00AM        4
9:00-10:00AM       4
10:00-11:00AM      3
11:00-12:00PM      1
12:00-1:00PM     253
1:00-2:00PM      253
2:00-3:00PM      253
3:00-4:00PM      253
4:00-5:00PM      253
5:00-6:00PM      253
6:00-7:00PM      253
7:00-8:00PM      253
8:00-9:00PM      253
9:00-10:00PM     253
10:00-11:00PM    253
11:00-12:00AM    253
dtype: int64
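
The per-column counts above do not show how many rows are affected, since the same rows may be missing several hours at once. A minimal sketch for counting and dropping incomplete rows, using a made-up toy frame and the assumption that every column other than the identifier holds an hourly count:

```python
import pandas as pd

# Toy frame (made up) using a few of the hourly column names.
toy = pd.DataFrame(
    {
        "SegmentID": [1, 2, 3, 4],
        "11:00-12:00PM": [85.0, None, 70.0, 60.0],
        "12:00-1:00PM": [85.0, None, None, 105.0],
        "1:00-2:00PM": [94.0, 95.0, None, 103.0],
    }
)

# Assumption: every column other than the identifier is an hourly count.
hour_cols = [c for c in toy.columns if c != "SegmentID"]

# True for each row where at least one hourly value is missing.
rows_with_gap = toy[hour_cols].isnull().any(axis=1)
print("rows with at least one missing hour:", int(rows_with_gap.sum()))

# Keep only the rows with a full set of hourly counts.
complete = toy.loc[~rows_with_gap].reset_index(drop=True)
print("rows kept:", len(complete))
```

On `traffic_df`, `hour_cols` would be the 24 hourly columns; comparing the row-level count with the 253 nulls per afternoon column shows whether the gaps are concentrated in the same rows.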