# Project Phase II
## Goal
Find the features that affect the number of traffic accidents in the city of Calgary.
All data is sourced from https://data.calgary.ca/browse

## Features
### Road Features
#### 1. Road Speed
    https://data.calgary.ca/Health-and-Safety/Speed-Limits-Map/rbfp-3tic
#### 2. Average Traffic Volume
    2018 (Traffic_Volumes_for_2018.csv)
#### 3. Road Signals
    a. Traffic Signals (Traffic_Signals.csv)
    b. Traffic Signs (Traffic_Signs.csv)
    c. Traffic cameras (Traffic_Camera_Locations.csv)
### Weather Features
    ● Temperature
    ● Visibility
    Ref: climate.weather.gc.ca
## Marking
    ● Analysing Data- (Visualization: 10 Marks + Conclusion: 5 Marks) (15 Marks)
    ● Visualizing speed limit (5 Marks)
    ● Visualizing Traffic heatmap (5 Marks)
    ● Project Demo (5 Marks)
    ● Total Mark: 30 Marks
## Due date 
Upload the report (presentation slides) and source code by 13-Aug, 11:59 PM.

## 1. Data Preparation

### 1.1 Data Cleaning
    ● Read the Calgary boundary from City_Boundary_layer.csv.
    ● Draw a rectangle on the Calgary map that shows the boundary of the city.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# load boundary from csv and store long/lat separately in df
import re
city_boundary_df = pd.read_csv('City_Boundary_layer.csv')
geom = city_boundary_df.iloc[0]['the_geom']
g=re.split("POLYGON", geom)[1].strip()
temp = pd.DataFrame(re.sub('[()]', '', g).split(', '))
boundary_coordinates_df=temp[0].str.split(" ", n = 1, expand = True).astype(float)
boundary_coordinates_df.columns=['Longitude', 'Latitude']
#boundary_coordinates_df.describe()
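
The regex parsing above could also be handled by shapely's WKT reader. The following is a minimal sketch, assuming `the_geom` holds standard WKT text; the sample string is made up for illustration and is not taken from City_Boundary_layer.csv.

```python
from shapely import wkt

# Hypothetical WKT value standing in for city_boundary_df.iloc[0]['the_geom']
sample_geom = "POLYGON ((-114.3 51.2, -114.3 50.8, -113.9 50.8, -113.9 51.2, -114.3 51.2))"

boundary = wkt.loads(sample_geom)
# bounds is (minx, miny, maxx, maxy), i.e. (west, south, east, north) for lon/lat data
west, south, east, north = boundary.bounds
print(west, south, east, north)
```

With this approach the four extremes come straight from `bounds`, so no intermediate coordinate DataFrame is needed unless the individual vertices are of interest.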

In [3]:
from shapely.geometry import Polygon
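# boundaries in four directions w, e, n, s
# NOTE: Calgary longitudes are negative, so max() is the eastern edge and min() is the
# western edge; w and e are therefore swapped here. The bounding rectangle is not
# affected, and the grid cell in section 1.2 reverses the order again with [::-1].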
w = boundary_coordinates_df['Longitude'].max()
e = boundary_coordinates_df['Longitude'].min()
n = boundary_coordinates_df['Latitude'].max()
s = boundary_coordinates_df['Latitude'].min()
polygon_geom = Polygon([(w, n), (w, s), (e, s), (e, n)])
crs = 'epsg:4326'
polygon = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])       
print(polygon.geometry)

0    POLYGON ((-113.85990 51.21243, -113.85990 50.8...
Name: geometry, dtype: geometry


In [4]:
import folium


map = folium.Map(location=[51.03011, -114.08529], zoom_start = 10)
folium.GeoJson(polygon).add_to(map)
folium.LatLngPopup().add_to(map)
map
# map.choropleth(geo_data=city_boundary_gdf['geometry'])

### 1.2 Data Merging
    ● Divide Calgary into a 10x10 grid of areas.
      You need to investigate each area according to different features.

In [5]:

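# generate a 1-d array of 11 longitude edges; w holds the max longitude, so
# linspace(w, e) runs east to west and [::-1] flips it to true west-to-east order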
x = np.linspace(w, e, num=11)[::-1]
# generate 1*11 1-d array, listing latitude from n to s
y = np.linspace(n, s, num=11)
# generate 11*11 2-d array, listing longitude and latitude separately in xv and yv
xv, yv = np.meshgrid(x, y, indexing='ij')
# https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html
# grids are named following the pattern "grid" + x + y, 
# e.g. northwest corner is named grid00 and southeast corner grid is named grid99
grid_names=[]
west_boundary=[]
east_boundary=[]
north_boundary=[]
south_boundary=[]
polygons=[]
for i in range(10):
    for j in range(10):
        # collect the grid name
        grid_names.append('grid{}{}'.format(i,j))
        # collect the 4 boundaries
        west_boundary.append(x[i])
        east_boundary.append(x[i+1])
        north_boundary.append(y[j])
        south_boundary.append(y[j+1])
        # build the nw, sw, se, ne corner coordinates of this grid
        grid_corners=[
            (xv[i][j], yv[i][j]), 
            (xv[i][j+1], yv[i][j+1]), 
            (xv[i+1][j+1], yv[i+1][j+1]), 
            (xv[i+1][j], yv[i+1][j])
        ]    
        polygons.append(Polygon(grid_corners))

grid_df = pd.DataFrame({'Grid Names':grid_names, 
                       'West Boundary':west_boundary, 
                       'East Boundary':east_boundary, 
                       'North Boundary':north_boundary, 
                       'South Boundary':south_boundary})

polygon_gdf = gpd.GeoDataFrame(crs='epsg:4326', geometry=polygons) 
grid_df = pd.concat([grid_df, polygon_gdf['geometry']], axis=1)
folium.GeoJson(polygon_gdf).add_to(map)
map
grid_df

| | Grid Names | West Boundary | East Boundary | North Boundary | South Boundary | geometry |
|---|---|---|---|---|---|---|
| 0 | grid00 | -114.315796 | -114.270207 | 51.212425 | 51.175465 | POLYGON ((-114.31580 51.21243, -114.31580 51.1... |
| 1 | grid01 | -114.315796 | -114.270207 | 51.175465 | 51.138504 | POLYGON ((-114.31580 51.17546, -114.31580 51.1... |
| 2 | grid02 | -114.315796 | -114.270207 | 51.138504 | 51.101544 | POLYGON ((-114.31580 51.13850, -114.31580 51.1... |
| 3 | grid03 | -114.315796 | -114.270207 | 51.101544 | 51.064584 | POLYGON ((-114.31580 51.10154, -114.31580 51.0... |
| 4 | grid04 | -114.315796 | -114.270207 | 51.064584 | 51.027624 | POLYGON ((-114.31580 51.06458, -114.31580 51.0... |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | grid95 | -113.905494 | -113.859905 | 51.027624 | 50.990663 | POLYGON ((-113.90549 51.02762, -113.90549 50.9... |
| 96 | grid96 | -113.905494 | -113.859905 | 50.990663 | 50.953703 | POLYGON ((-113.90549 50.99066, -113.90549 50.9... |
| 97 | grid97 | -113.905494 | -113.859905 | 50.953703 | 50.916743 | POLYGON ((-113.90549 50.95370, -113.90549 50.9... |
| 98 | grid98 | -113.905494 | -113.859905 | 50.916743 | 50.879782 | POLYGON ((-113.90549 50.91674, -113.90549 50.8... |
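
Because the cells are equal-sized, the grid a point belongs to can also be computed arithmetically instead of testing every polygon. The sketch below is one possible helper, not part of the original notebook: `grid_name` is a hypothetical function, the west, east, and north edges are read from the table above, and the south edge is inferred from the cell height, so treat it as approximate.

```python
# Rectangle edges: west, east and north come from the grid table;
# SOUTH is inferred (last listed south boundary minus one cell height).
WEST, EAST = -114.315796, -113.859905
NORTH, SOUTH = 51.212425, 50.842822

def grid_name(lon, lat, west=WEST, east=EAST, north=NORTH, south=SOUTH, n=10):
    """Return 'grid{i}{j}' for a point, or None if it lies outside the rectangle."""
    if not (west <= lon <= east and south <= lat <= north):
        return None
    # i counts columns from west to east, j counts rows from north to south;
    # min() keeps points on the east or south edge inside the last cell
    i = min(int((lon - west) / (east - west) * n), n - 1)
    j = min(int((north - lat) / (north - south) * n), n - 1)
    return 'grid{}{}'.format(i, j)

print(grid_name(-114.0719, 51.0447))  # a point in central Calgary
```

Applied to a longitude and latitude column pair, this yields a grid label per row that can be used with `groupby` for the aggregations in the next section.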


## 2. Data Aggregation
For each area (grid), calculate the following features: (15 Marks)

    ● Average speed limit
    ● Average Traffic volume
    ● Average number of traffic cameras
    ● Number of Traffic Signals
    ● Number of Traffic Signs
    ● Daily Weather Condition
        ○ Temperature
        ○ Visibility
    ● Target: Average number of Traffic accidents
    ● Analyse the data and interpret the relationship between the number of
    accidents and the above features in 2018. (Use different visualization techniques,
    such as histograms, scatter plots, line graphs, and heatmaps, to support your answer.)

### 2.1 Analysing a specific group of data

In [6]:
speed_df = pd.read_csv('Speed_Limits.csv')  # coord in MULTILINESTRING
volume_df = pd.read_csv('Traffic_Volumes_for_2018.csv')  # coord in MULTILINESTRING
cameras_df = pd.read_csv('Traffic_Camera_Locations.csv')  # coord in separate longitude and latitude columns
signals_df = pd.read_csv('Traffic_Signals.csv')  # coord in separate longitude and latitude columns
signs_df = pd.read_csv('Traffic_Signs.csv')  # coord in POINT columns
incident_df = pd.read_csv('Traffic_Incidents.csv')  # coord in separate longitude and latitude columns
# TODO weather_df = 
cameras_df

| | Camera Location | Quadrant | Camera URL | longitude | latitude |
|---|---|---|---|---|---|
| 0 | Stoney Trail / Deerfoot Trail SE | SE | http://trafficcam.calgary.ca/loc86.jpg | -113.976606 | 50.900726 |
| 1 | Memorial Drive / 52 Street E | NE | http://trafficcam.calgary.ca/loc3.jpg | -113.955818 | 51.053253 |
| 2 | Crowchild Trail / Shaganappi Trail NW | NW | http://trafficcam.calgary.ca/loc37.jpg | -114.149379 | 51.098849 |
| 3 | Crowchild Trail / Sarcee Trail NW | NW | http://trafficcam.calgary.ca/loc126.jpg | -114.178204 | 51.111255 |
| 4 | Airport Trail / Barlow Trail NE | NE | http://trafficcam.calgary.ca/loc114.jpg | -114.001451 | 51.139352 |
| ... | ... | ... | ... | ... | ... |
| 121 | Memorial Drive / Edmonton Trail NE | NE | http://trafficcam.calgary.ca/loc30.jpg | -114.050136 | 51.050802 |
| 122 | Glenmore Trail / Barlow Trail SE | SE | http://trafficcam.calgary.ca/loc98.jpg | -113.981495 | 50.979446 |
| 123 | Glenmore Trail / Stoney Trail SE | SE | http://trafficcam.calgary.ca/loc128.jpg | -113.929263 | 50.979635 |
| 124 | 5 Avenue / 5 Street SW | SW | http://trafficcam.calgary.ca/loc122.jpg | -114.073644 | 51.048677 |


#### 2.1.1 Average speed limit
##### Several options to interpret coordinates in MULTILINESTRING:
1. Most complex: parse all coordinates inside the MULTILINESTRING and calculate the weighted speed/volume of the sections that fall in each grid.
2. Simplest: use only the start, end, or middle point of the MULTILINESTRING.
3. Intermediate: parse the whole MULTILINESTRING and count it in every grid that its coordinates lie in.
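
Option 1 could be sketched as follows, weighting each road's speed limit by the length of the road that falls inside a cell. Everything here is illustrative: the two road segments, their speeds, and the two toy cells are made up, and `weighted_speed` is a hypothetical helper. The real Speed_Limits.csv column names are not shown in this notebook, so they are not assumed.

```python
from shapely import wkt
from shapely.geometry import box

# Hypothetical road segments: (WKT geometry, speed limit in km/h)
roads = [
    ("MULTILINESTRING ((0 0.5, 2 0.5))", 50),    # 1 unit in each cell
    ("MULTILINESTRING ((0 0.2, 0.5 0.2))", 30),  # 0.5 units, left cell only
]
parsed = [(wkt.loads(geom_wkt), speed) for geom_wkt, speed in roads]

# Two toy cells standing in for the polygons in grid_df['geometry']
cells = {"left": box(0, 0, 1, 1), "right": box(1, 0, 2, 1)}

def weighted_speed(cell, parsed_roads):
    """Average speed limit in a cell, weighted by road length inside the cell."""
    total_len = 0.0
    total = 0.0
    for line, speed in parsed_roads:
        part = line.intersection(cell)  # piece of the road inside this cell
        total_len += part.length
        total += part.length * speed
    return total / total_len if total_len > 0 else float("nan")

avg = {name: weighted_speed(cell, parsed) for name, cell in cells.items()}
print(avg)
```

Lengths are measured in coordinate units, which for longitude/latitude data means degrees. That is only a rough proxy for distance, so projecting to a metric CRS first would give more faithful weights.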

#### 2.1.2 Average Traffic volume
##### Several options to interpret coordinates in MULTILINESTRING:
1. Most complex: parse all coordinates inside the MULTILINESTRING and calculate the weighted speed/volume of the sections that fall in each grid.
2. Simplest: use only the start, end, or middle point of the MULTILINESTRING.
3. Intermediate: parse the whole MULTILINESTRING and count it in every grid that its coordinates lie in.
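
Option 2 might look like the sketch below, which assigns each segment to the single cell containing a representative point and then averages the volumes per cell. The centroid is used here as a stand-in for the "middle point"; for curved roads it can fall off the line. The segments, volumes, and cells are invented for illustration only.

```python
from shapely import wkt
from shapely.geometry import box

# Hypothetical segments: (WKT geometry, traffic volume)
segments = [
    ("MULTILINESTRING ((0.2 0.5, 0.8 0.5))", 12000),
    ("MULTILINESTRING ((0.6 0.1, 1.8 0.1))", 8000),
    ("MULTILINESTRING ((1.1 0.9, 1.9 0.9))", 4000),
]
# Two toy cells standing in for the polygons in grid_df['geometry']
cells = {"left": box(0, 0, 1, 1), "right": box(1, 0, 2, 1)}

volumes = {name: [] for name in cells}
for geom_wkt, volume in segments:
    mid = wkt.loads(geom_wkt).centroid  # representative point of the segment
    for name, cell in cells.items():
        if cell.contains(mid):
            volumes[name].append(volume)
            break  # each segment is counted in one cell only

avg_volume = {name: (sum(v) / len(v) if v else float("nan"))
              for name, v in volumes.items()}
print(avg_volume)
```

The second segment crosses both cells but is credited only to the one holding its centroid, which is the trade-off of this simpler option compared with length weighting.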

#### 2.1.3 Total Number of Traffic Cameras

In [13]:
# count the cameras that fall inside each grid polygon
from shapely.geometry import Point
# assign() returns a new DataFrame, so the result has to be kept
grid_df = grid_df.assign(Camera_Count=np.nan)
camera_points = [Point(lon, lat)
                 for lon, lat in zip(cameras_df['longitude'], cameras_df['latitude'])]
for idx, cell in enumerate(grid_df['geometry']):
    grid_df.at[idx, 'Camera_Count'] = sum(p.within(cell) for p in camera_points)
grid_df['Camera_Count']

0     0.0
1     0.0
2     0.0
3     0.0
4     0.0
     ... 
95    0.0
96    0.0
97    0.0
98    0.0
99    0.0
Name: Camera_Count, Length: 100, dtype: float64
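
For point data on a regular grid, the nested loop can be replaced by a single 2-D histogram. This is a sketch under assumptions: the rectangle edges and the four points below are toy values, not the real boundary or camera locations. The same pattern should apply to the signals and incidents data, which also have separate longitude and latitude columns.

```python
import numpy as np

# Toy rectangle and points (not real data)
west, east, south, north = -114.3, -113.9, 50.8, 51.2
lon = np.array([-114.29, -114.29, -113.91, -114.09])
lat = np.array([51.19, 51.18, 50.81, 51.01])

lon_edges = np.linspace(west, east, 11)    # increasing, west to east
lat_edges = np.linspace(south, north, 11)  # increasing, south to north

# counts[i, k]: i = longitude bin (west to east), k = latitude bin (south to north)
counts, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])
# flip the latitude axis so that j = 0 is the northern row, matching 'grid{i}{j}'
counts = counts[:, ::-1].astype(int)
print(counts[0, 0], counts[5, 4], counts[9, 9])
```

Points outside the edges are silently dropped by `np.histogram2d`, so comparing `counts.sum()` with the number of input rows is a quick way to spot records outside the city rectangle.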

#### 2.1.4 Total Number of Traffic Signals

#### 2.1.5 Total Number of Traffic Signs

### 2.2 Traffic Accidents Analysis
#### 2.2.1 Average number of Traffic Accidents

#### 2.2.2 Daily Weather Conditions

#### 2.2.3 Correlation Analysis between features and Traffic Accidents
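
One possible starting point for this step is a Pearson correlation between each per-grid feature and the accident count. The sketch below uses made-up numbers and hypothetical column names (`Avg_Speed`, `Signal_Count`, `Accident_Count`); it shows only the mechanics and says nothing about the real Calgary data.

```python
import pandas as pd

# Made-up per-grid values, purely illustrative
features = pd.DataFrame({
    'Avg_Speed': [50, 60, 80, 40, 70],
    'Signal_Count': [12, 8, 2, 20, 5],
    'Accident_Count': [30, 22, 6, 48, 14],
})

corr = features.corr()  # Pearson correlation by default
# correlation of every feature with the target, sorted from lowest to highest
accident_corr = corr['Accident_Count'].drop('Accident_Count').sort_values()
print(accident_corr)
```

The full `corr` matrix can be passed to `sns.heatmap(corr, annot=True)` for a visual summary. Correlation alone does not establish that a feature causes accidents, so scatter plots of individual features are worth checking alongside it.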

## 3. Visualization

### 3.1 Visualize the speed limit according to the roads. (5 Marks)

### 3.2 Show traffic heatmap of 2018. (5 Marks)