# Problem 1

Urban  traffic  management  requires  balancing  multiple  conflicting  objectives,  such  as 
minimizing travel time, reducing fuel consumption, and minimizing air pollution. Your task is 
to apply a Multi-Objective Evolutionary Algorithm (MOEA) to optimize traffic management 
strategies  for  selected  New  York  City  (NYC)  areas.  The  goal  is  to  minimize  conflicting 
objectives, Total Travel Time (TTT) and Fuel Consumption (FC), using real-world traffic data 
from NYC Open Data.

In [17]:
import os
import numpy as np
import datetime

""" Set up matplotlib and pandas """
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8-darkgrid")

import pandas as pd
pd.set_option("display.max_columns", 200)

""" Set up imports.
importlib is used to reload modules such that
changes to the code are reflected in the notebook.
"""

from importlib import reload

import utils.data_loader
reload(utils.data_loader)
from utils.data_loader import load_data

# Download data

In [18]:
speed_df, volume_df = load_data(root=os.path.join(os.getcwd(), "data", "raw"))

# Discover & Preprocess data

In this section, we will explore the data to get a better understanding of the data and the relationships between the variables.
We must also clean the data to remove any null values and any duplicates, and ensure that we do not include any data that is not relevant to the analysis. Such as removing any traffic data that is not in the chosen road segments.

### Data Discovery


In [19]:
speed_df.head()

Unnamed: 0,ID,SPEED,TRAVEL_TIME,DATA_AS_OF,LINK_ID,LINK_POINTS,BOROUGH,LINK_NAME
0,170,39.14,96,09/24/2024 09:04:11 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
1,170,39.76,95,09/24/2024 09:09:12 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
2,170,39.76,94,09/24/2024 09:14:10 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
3,170,40.38,93,09/24/2024 09:19:10 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway
4,170,39.14,96,09/24/2024 09:24:10 AM,4616356,"40.6665206,-73.76246 40.66738,-73.77021 40.667...",Queens,Belt Pkwy W 182nd St - JFK Expressway


In [20]:
volume_df.head()

Unnamed: 0,RequestID,Boro,Yr,M,D,HH,MM,Vol,SegmentID,WktGeom,street,fromSt,toSt,Direction
0,37697,Brooklyn,2024,6,10,22,45,154,28962,POINT (990590.8197336476 188336.94708352597),FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB
1,37697,Brooklyn,2024,6,10,23,0,154,28962,POINT (990590.8197336476 188336.94708352597),FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB
2,37697,Brooklyn,2024,6,10,23,15,141,28962,POINT (990590.8197336476 188336.94708352597),FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB
3,37697,Brooklyn,2024,6,10,23,30,154,28962,POINT (990590.8197336476 188336.94708352597),FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB
4,37697,Brooklyn,2024,6,10,23,45,121,28962,POINT (990590.8197336476 188336.94708352597),FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB


In [21]:
speed_df = speed_df.assign(LINK_POINTS=speed_df['LINK_POINTS'].str.split(' ')).explode('LINK_POINTS')
speed_df[['Latitude', 'Longitude']] = speed_df['LINK_POINTS'].str.split(',', expand=True)
speed_df['Location'] = speed_df.apply(lambda x : f"{x['Latitude']}, {x['Longitude']}", axis=1)
speed_df.drop(columns=['Latitude', 'Longitude', 'LINK_POINTS'], inplace=True)

In [22]:
speed_df.head()

Unnamed: 0,ID,SPEED,TRAVEL_TIME,DATA_AS_OF,LINK_ID,BOROUGH,LINK_NAME,Location
0,170,39.14,96,09/24/2024 09:04:11 AM,4616356,Queens,Belt Pkwy W 182nd St - JFK Expressway,"40.6665206, -73.76246"
0,170,39.14,96,09/24/2024 09:04:11 AM,4616356,Queens,Belt Pkwy W 182nd St - JFK Expressway,"40.66738, -73.77021"
0,170,39.14,96,09/24/2024 09:04:11 AM,4616356,Queens,Belt Pkwy W 182nd St - JFK Expressway,"40.66751, -73.77209"
0,170,39.14,96,09/24/2024 09:04:11 AM,4616356,Queens,Belt Pkwy W 182nd St - JFK Expressway,"40.66752, -73.772861"
0,170,39.14,96,09/24/2024 09:04:11 AM,4616356,Queens,Belt Pkwy W 182nd St - JFK Expressway,"40.66749, -73.775591"


In [23]:
import pyproj

volume_df['Date'] = volume_df.apply(lambda x : datetime.datetime(year=int(x['Yr']), month=x['M'], day=x['D'], hour=x['HH'], minute=x['MM']).strftime('%m/%d/%Y %I:%M:%S %p'), axis=1)

# Extract coordinates from WktGeom
volume_df[['x', 'y']] = volume_df['WktGeom'].str.extract(r'POINT \((.+) (.+)\)')
volume_df['x'] = volume_df['x'].astype(float)
volume_df['y'] = volume_df['y'].astype(float)

# Define the coordinate transformation
# Assuming the input is NY State Plane Long Island (EPSG:2263)
# and we want to convert to WGS84 (EPSG:4326)
transformer = pyproj.Transformer.from_crs("EPSG:2263", "EPSG:4326", always_xy=True)

# Apply the transformation
volume_df['Longitude'], volume_df['Latitude'] = transformer.transform(volume_df['x'], volume_df['y'])

# Create the Location column in the desired format
volume_df['Location'] = volume_df.apply(lambda row: f"{row['Latitude']:.5f}, {row['Longitude']:.5f}", axis=1)

volume_df.drop(columns=['Yr', 'M', 'D', 'HH', 'MM', 'WktGeom', 'x', 'y', 'Latitude', 'Longitude'], inplace=True)

print(f"Volume Data Time Range: {volume_df['Date'].min()} - {volume_df['Date'].max()}")
print(f"Speed Data Time Range: {speed_df['DATA_AS_OF'].min()} - {speed_df['DATA_AS_OF'].max()}")

Volume Data Time Range: 01/06/2024 01:00:00 AM - 06/10/2024 12:45:00 PM
Speed Data Time Range: 09/24/2024 01:04:03 PM - 09/27/2024 12:59:09 AM


In [34]:
print(f"Loc: [V: {volume_df['Location'][0]}, S: {speed_df['Location'][0]}]")


Loc: [V: 40.68362, -73.97714, S: 0        40.6665206, -73.76246
0          40.66738, -73.77021
0          40.66751, -73.77209
0         40.66752, -73.772861
0         40.66749, -73.775591
0          40.66722, -73.78108
0         40.66673, -73.786471
0       40.8240706, -73.874311
0           40.8247, -73.86959
0        40.8251906, -73.86596
0          40.82536, -73.86426
0          40.82587, -73.85961
0        40.8266006, -73.85424
0        40.8271806, -73.84994
0           40.61632, -74.0263
0          40.61928, -74.02375
0       40.6237206, -74.019951
0           40.6248, -74.01925
0         40.62583, -74.018741
0       40.6279506, -74.017891
0          40.63065, -74.01698
0         40.63237, -74.016321
0       40.6332305, -74.016151
0    40.73744001, -73.85188001
0                       , None
0      40.737015, -73.85373001
0                       , None
0       40.73673, -73.85543001
0                       , None
0      40.736526, -73.85631001
0                       , None
0     

In [24]:
volume_df.head()

Unnamed: 0,RequestID,Boro,Vol,SegmentID,street,fromSt,toSt,Direction,Date,Location
0,37697,Brooklyn,154,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 10:45:00 PM,"40.68362, -73.97714"
1,37697,Brooklyn,154,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:00:00 PM,"40.68362, -73.97714"
2,37697,Brooklyn,141,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:15:00 PM,"40.68362, -73.97714"
3,37697,Brooklyn,154,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:30:00 PM,"40.68362, -73.97714"
4,37697,Brooklyn,121,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:45:00 PM,"40.68362, -73.97714"


In [25]:
merged_df = pd.merge(speed_df, volume_df, on=['Location'])

In [26]:
merged_df.head()

Unnamed: 0,ID,SPEED,TRAVEL_TIME,DATA_AS_OF,LINK_ID,BOROUGH,LINK_NAME,Location,RequestID,Boro,Vol,SegmentID,street,fromSt,toSt,Direction,Date


In [27]:
volume_df.head()

Unnamed: 0,RequestID,Boro,Vol,SegmentID,street,fromSt,toSt,Direction,Date,Location
0,37697,Brooklyn,154,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 10:45:00 PM,"40.68362, -73.97714"
1,37697,Brooklyn,154,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:00:00 PM,"40.68362, -73.97714"
2,37697,Brooklyn,141,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:15:00 PM,"40.68362, -73.97714"
3,37697,Brooklyn,154,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:30:00 PM,"40.68362, -73.97714"
4,37697,Brooklyn,121,28962,FLATBUSH AVENUE,Atlantic Avenue,Eastern Parkway Line,NB,06/10/2024 11:45:00 PM,"40.68362, -73.97714"


One quick observation is that the traffic_df has a lot more data than the segment_df. This is because the traffic_df contains data for all the roads in NYC, while the segment_df only contains data for the selected road segments. Additionally, the traffic_df has a lot more data because i

In [28]:
speed_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 38622 entries, 0 to 836
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   ID           38622 non-null  int64  
 1   SPEED        38622 non-null  float64
 2   TRAVEL_TIME  38622 non-null  int64  
 3   DATA_AS_OF   38622 non-null  object 
 4   LINK_ID      38622 non-null  int64  
 5   BOROUGH      38622 non-null  object 
 6   LINK_NAME    38622 non-null  object 
 7   Location     38622 non-null  object 
dtypes: float64(1), int64(3), object(4)
memory usage: 2.7+ MB


In [29]:
volume_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37920 entries, 0 to 37919
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   RequestID  37920 non-null  int64 
 1   Boro       37920 non-null  object
 2   Vol        37920 non-null  int64 
 3   SegmentID  37920 non-null  int64 
 4   street     37920 non-null  object
 5   fromSt     37920 non-null  object
 6   toSt       37920 non-null  object
 7   Direction  37920 non-null  object
 8   Date       37920 non-null  object
 9   Location   37920 non-null  object
dtypes: int64(3), object(7)
memory usage: 2.9+ MB


In [30]:
volume_df.isnull().sum()

RequestID    0
Boro         0
Vol          0
SegmentID    0
street       0
fromSt       0
toSt         0
Direction    0
Date         0
Location     0
dtype: int64