# Project Phase II
## Goal
Identify the features that may affect the number of traffic accidents in the City of Calgary.
All provided data comes from https://data.calgary.ca/browse

## Features
### Road Features
#### 1. Road Speed
    https://data.calgary.ca/Health-and-Safety/Speed-Limits-Map/rbfp-3tic
#### 2. Average Traffic Volume
    2018 (Traffic_Volumes_for_2018.csv)
#### 3. Road Signals
    a. Traffic Signals (Traffic_Signals.csv)
    b. Traffic Signs (Traffic_Signs.csv)
    c. Traffic Cameras (Traffic_Camera_Locations.csv)
### Weather Features
    ● Temperature
    ● Visibility
    Ref: climate.weather.gc.ca
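Because the weather features are daily, each accident can be matched to the weather of its calendar day. Below is a minimal sketch of that join. It assumes the weather records have already been loaded into a dict keyed by date. The helper `attach_weather`, the field names `temperature` and `visibility`, and all values are hypothetical and are not taken from climate.weather.gc.ca.

```python
from datetime import date, datetime

# Hypothetical daily weather records keyed by calendar date (values made up).
daily_weather = {
    date(2018, 1, 15): {"temperature": -12.5, "visibility": 8.0},
    date(2018, 1, 16): {"temperature": -3.0, "visibility": 24.0},
}


def attach_weather(incident_times, weather):
    """Pair each incident timestamp with the weather record of its day.

    Incidents on a day with no weather record are paired with None.
    """
    # ts.date() drops the time of day so the timestamp matches a daily key.
    return [(ts, weather.get(ts.date())) for ts in incident_times]


# Usage with two made-up incident timestamps.
incidents = [datetime(2018, 1, 15, 8, 30), datetime(2018, 1, 17, 17, 5)]
for ts, record in attach_weather(incidents, daily_weather):
    print(ts, record)
```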
## Marking
    ● Analysing Data (15 Marks): Visualization (10 Marks) + Conclusion (5 Marks)
    ● Visualizing speed limits (5 Marks)
    ● Visualizing traffic heatmap (5 Marks)
    ● Project Demo (5 Marks)
    ● Total: 30 Marks
## Due Date
Upload the report (presentation slides) and source code by 13 Aug, 11:59 PM.

## 1. Data Preparation

### 1.1 Data Cleaning
    ● Read the Calgary boundary from City_Boundary_layer.csv.
    ● Draw a rectangle on the map that shows the bounding box of Calgary.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Load the boundary WKT from the csv and store longitude/latitude
# separately in a DataFrame
import re
city_boundary_df = pd.read_csv('City_Boundary_layer.csv')
geom = city_boundary_df.iloc[0]['the_geom']
# Keep the text after the "POLYGON" keyword (also works for "MULTIPOLYGON")
g = re.split("POLYGON", geom)[1].strip()
# Strip all parentheses and split into "lon lat" strings
temp = pd.DataFrame(re.sub('[()]', '', g).split(', '))
# Split each "lon lat" string into two float columns
boundary_coordinates_df = temp[0].str.split(" ", n=1, expand=True).astype(float)
boundary_coordinates_df.columns = ['Longitude', 'Latitude']
#boundary_coordinates_df.describe()
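The parsing above depends on the exact spacing of the WKT text, since it splits on `', '`. A regex that pulls out every `lon lat` pair tolerates missing spaces after commas and ignores the geometry keyword and parentheses altogether. Below is a minimal standard-library sketch, assuming 2-D coordinates. `parse_wkt_coords` is a hypothetical helper and the sample string is made up. If Shapely is installed, `shapely.wkt.loads` parses WKT into a geometry object directly.

```python
import re

# One signed decimal number, optionally in scientific notation.
_NUM = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
# Two numbers separated by whitespace form one "lon lat" pair.
_PAIR = re.compile(rf"({_NUM})\s+({_NUM})")


def parse_wkt_coords(wkt):
    """Return every (longitude, latitude) pair in a 2-D WKT string.

    The geometry keyword and all parentheses are ignored, so POLYGON,
    MULTIPOLYGON, LINESTRING and MULTILINESTRING text all work; the
    ring/part structure is flattened away.
    """
    return [(float(lon), float(lat)) for lon, lat in _PAIR.findall(wkt)]


# Usage with a made-up boundary string (note the missing space after a comma).
sample = "MULTIPOLYGON (((-114.31 51.21, -113.86 51.21,-113.86 50.84, -114.31 51.21)))"
print(parse_wkt_coords(sample))
```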

In [3]:
from shapely.geometry import Polygon
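# Bounding box in four directions w, e, n, s. Longitude increases eastward, so
# Longitude.max() is actually the EAST edge and Longitude.min() the WEST edge:
# the names w and e are swapped here. The grid code in section 1.2 reverses
# its longitude linspace to compensate, so the grid itself runs west to east.
w = boundary_coordinates_df['Longitude'].max()
e = boundary_coordinates_df['Longitude'].min()
n = boundary_coordinates_df['Latitude'].max()
s = boundary_coordinates_df['Latitude'].min()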
polygon_geom = Polygon([(w, n), (w, s), (e, s), (e, n)])
crs = 'epsg:4326'
polygon = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])       
print(polygon.geometry)

0    POLYGON ((-113.85990 51.21243, -113.85990 50.8...
Name: geometry, dtype: geometry
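Longitude increases eastward, so the western edge of a bounding box is the minimum longitude and the eastern edge is the maximum. Below is a small sketch of the bounding-box step with consistently named edges. `bounding_box` and `rectangle` are hypothetical helpers and the coordinates are made up. The corner list has the same NW, SW, SE, NE order as the `Polygon` built above, so it could be passed to `shapely.geometry.Polygon` in the same way.

```python
def bounding_box(coords):
    """West, east, north and south edges of a list of (lon, lat) pairs.

    Longitude increases eastward, so west = min and east = max; latitude
    increases northward, so north = max and south = min.
    """
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    return {"w": min(lons), "e": max(lons), "n": max(lats), "s": min(lats)}


def rectangle(box):
    """Corners of the bounding rectangle in NW, SW, SE, NE order."""
    return [(box["w"], box["n"]), (box["w"], box["s"]),
            (box["e"], box["s"]), (box["e"], box["n"])]


# Usage with made-up boundary points.
box = bounding_box([(-114.3, 51.2), (-113.9, 51.1), (-114.0, 50.8)])
print(box)
print(rectangle(box))
```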


In [4]:
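import folium

# Base map centred on Calgary (note: the name `map` shadows Python's built-in map())
map = folium.Map(location=[51.03011, -114.08529], zoom_start=10)
# Overlay the bounding rectangle and show the clicked lat/long in a popup
folium.GeoJson(polygon).add_to(map)
folium.LatLngPopup().add_to(map)
map
# map.choropleth(geo_data=city_boundary_gdf['geometry'])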

### 1.2 Data Merging
    ● Divide Calgary into a 10x10 grid of areas.
      Investigate each area with respect to the different features.
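To assign a point (a signal, a sign, an accident) to one of the 100 areas, its position inside the bounding box can be scaled to a column and a row index. Below is a minimal sketch, assuming `w < e` and `n > s` and the naming used in the grid cell below ('grid' + column + row, with grid00 in the north-west). `grid_cell` is a hypothetical helper. With this notebook's variables, where `w` holds the maximum longitude, the call would be `grid_cell(lon, lat, e, w, n, s)`.

```python
def grid_cell(lon, lat, w, e, n, s, size=10):
    """Grid name of a point, or None if it lies outside the bounding box.

    Naming: 'grid' + column + row, with columns counted west to east and
    rows north to south (grid00 = north-west cell, grid99 = south-east).
    Assumes w < e (west = minimum longitude) and n > s.
    """
    if not (w <= lon <= e and s <= lat <= n):
        return None
    # Scale the position inside the box to a 0..size index.
    i = int((lon - w) / (e - w) * size)
    j = int((n - lat) / (n - s) * size)
    # A point exactly on the east or south edge would get index == size.
    i = min(i, size - 1)
    j = min(j, size - 1)
    return "grid{}{}".format(i, j)


# Usage with a made-up bounding box.
west, east, north, south = -114.3, -113.9, 51.2, 50.8
print(grid_cell(-114.08, 51.13, west, east, north, south))
```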

In [5]:

# 1-d array of 11 longitude edges; w holds the maximum longitude, so
# reversing makes the edges run from west to east
x = np.linspace(w, e, num=11)[::-1]
# 1-d array of 11 latitude edges, running from north to south
y = np.linspace(n, s, num=11)
# 11x11 coordinate matrices with ij indexing: xv[i][j] = x[i], yv[i][j] = y[j]
xv, yv = np.meshgrid(x, y, indexing='ij')
# https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html
# grids are named following the pattern "grid" + x index + y index,
# e.g. the northwest cell is grid00 and the southeast cell is grid99
grid_names=[]
west_boundary=[]
east_boundary=[]
north_boundary=[]
south_boundary=[]
polygons=[]
for i in range(10):
    for j in range(10):
        # record the grid name
        grid_names.append('grid{}{}'.format(i,j))
        # record the four edges of the cell
        west_boundary.append(x[i])
        east_boundary.append(x[i+1])
        north_boundary.append(y[j])
        south_boundary.append(y[j+1])
        # corner coordinates of the cell in nw, sw, se, ne order
        grid_corners=[
            (xv[i][j], yv[i][j]), 
            (xv[i][j+1], yv[i][j+1]), 
            (xv[i+1][j+1], yv[i+1][j+1]), 
            (xv[i+1][j], yv[i+1][j])
        ]    
        polygons.append(Polygon(grid_corners))

grid_df = pd.DataFrame({'Grid Names':grid_names, 
                       'West Boundary':west_boundary, 
                       'East Boundary':east_boundary, 
                       'North Boundary':north_boundary, 
                       'South Boundary':south_boundary})

polygon_gdf = gpd.GeoDataFrame(crs='epsg:4326', geometry=polygons) 
grid_df = pd.concat([grid_df, polygon_gdf['geometry']], axis=1)
folium.GeoJson(polygon_gdf).add_to(map)
map

## 2. Data Aggregation
For each area (grid), calculate the following features (15 Marks):

    ● Average speed limit
    ● Average Traffic volume
    ● Average number of traffic cameras
    ● Number of Traffic Signals
    ● Number of Traffic Signs
    ● Daily Weather Condition
        ○ Temperature
        ○ Visibility
    ● Target: Average number of Traffic accidents
    ● Analyse the data and interpret the relationship between the number of
    accidents and each of the above features in 2018. Use different visualization
    techniques, such as histograms, scatter plots, line graphs, and heatmaps, to
    support your interpretation.
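Once every record has a grid name, the per-area features reduce to counts (signals, signs, cameras, accidents) and averages (speed limit, traffic volume). The relation to accidents can then be summarised with a correlation coefficient alongside the plots. Below is a minimal sketch with hypothetical helper names and made-up inputs. Pearson's r only measures linear association, so it complements the scatter plots and heatmaps rather than replacing them.

```python
from collections import Counter, defaultdict


def count_per_cell(cells):
    """Number of items (signals, signs, accidents, ...) in each grid cell.

    `cells` holds one grid name per item; None marks items outside the grid.
    """
    return Counter(cell for cell in cells if cell is not None)


def mean_per_cell(cells, values):
    """Average of a numeric attribute (speed limit, volume, ...) per cell."""
    groups = defaultdict(list)
    for cell, value in zip(cells, values):
        if cell is not None:
            groups[cell].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def feature_vs_accidents(feature_by_cell, accidents_by_cell):
    """Correlate a per-cell feature with accident counts.

    Uses the cells that have a feature value; a cell with no recorded
    accident counts as 0.
    """
    cells = sorted(feature_by_cell)
    xs = [feature_by_cell[c] for c in cells]
    ys = [accidents_by_cell.get(c, 0) for c in cells]
    return pearson(xs, ys)


# Usage with made-up grid assignments.
signals = count_per_cell(["grid00", "grid00", "grid01", None, "grid02"])
accidents = count_per_cell(["grid00", "grid00", "grid00", "grid01", "grid02"])
print(signals, accidents)
print(feature_vs_accidents(signals, accidents))
```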

### Analysing a specific group of data

In [6]:
speed_df = pd.read_csv('Speed_Limits.csv')
# Speed limits are stored as MULTILINESTRINGs, i.e. whole road sections rather than points.
# Open questions: how should a road that lies in two grid cells be categorized,
# and should it be counted in both cells?
volume_df = pd.read_csv('Traffic_Volumes_for_2018.csv')
cameras_df = pd.read_csv('Traffic_Camera_Locations.csv')
signals_df = pd.read_csv('Traffic_Signals.csv')
signs_df = pd.read_csv('Traffic_Signs.csv')
incident_df = pd.read_csv('Traffic_Incidents.csv')
# TODO weather_df = 
speed_df.iloc[0]['multiline']

'MULTILINESTRING ((-114.073657541927 50.913577283979, -114.073651712573 50.913632013001, -114.073643732584 50.913725187947, -114.073634587776 50.91381558214, -114.073631723247 50.913905387125), (-114.073631723247 50.913905387125, -114.073628017171 50.914021535851, -114.073629382726 50.914151554647, -114.073636539247 50.914271837283, -114.073643667251 50.914375828372, -114.073657814012 50.914492017071), (-114.073657814012 50.914492017071, -114.073664433685 50.914548703432, -114.07367509928 50.914618951254, -114.073687008866 50.914690768832, -114.07370139649 50.914759444658, -114.073715157308 50.914823803911, -114.073730161507 50.914889339159, -114.073737038939 50.914919163422, -114.073751400655 50.914971355311, -114.073758898767 50.91500157204, -114.073763902379 50.915020896341), (-114.073763902379 50.915020896341, -114.073778875428 50.915070244323, -114.073795110735 50.915129106452, -114.073816943444 50.915194245408, -114.073837530373 50.915256637824, -114.07386311831 50.915340220843, 
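One possible answer to the open questions in the cell above is to break each MULTILINESTRING into straight segments, give each segment to the cell that contains its midpoint, and weight the speed limit by segment length. A road crossing several cells then contributes to each of them, but no segment is counted twice. This is a sketch under assumptions. The speed-limit column name is not shown here, so `roads` is a hypothetical list of `(wkt, speed)` pairs. All helper names are hypothetical, and lengths use a flat-earth approximation that should be adequate for weighting within a city.

```python
import math
import re
from collections import defaultdict


def parse_multilinestring(wkt):
    """Split MULTILINESTRING text into parts, each a list of (lon, lat)."""
    parts = []
    # Innermost parentheses hold one part: "lon lat, lon lat, ..."
    for inner in re.findall(r"\(([^()]+)\)", wkt):
        points = []
        for pair in inner.split(","):
            lon, lat = pair.split()
            points.append((float(lon), float(lat)))
        parts.append(points)
    return parts


def cell_of(lon, lat, w, e, n, s, size=10):
    """Grid name ('grid' + column + row, grid00 = north-west) or None.

    Assumes w < e and n > s.
    """
    if not (w <= lon <= e and s <= lat <= n):
        return None
    i = min(int((lon - w) / (e - w) * size), size - 1)
    j = min(int((n - lat) / (n - s) * size), size - 1)
    return "grid{}{}".format(i, j)


def speed_by_cell(roads, w, e, n, s):
    """Length-weighted average speed limit per grid cell.

    `roads` is a list of (multilinestring_wkt, speed_limit) pairs. Each
    straight segment is assigned to the cell containing its midpoint.
    """
    weighted = defaultdict(float)  # sum of speed * length per cell
    lengths = defaultdict(float)   # sum of segment length per cell
    for wkt, speed in roads:
        for part in parse_multilinestring(wkt):
            for (x1, y1), (x2, y2) in zip(part, part[1:]):
                mid_lon, mid_lat = (x1 + x2) / 2, (y1 + y2) / 2
                cell = cell_of(mid_lon, mid_lat, w, e, n, s)
                if cell is None:
                    continue
                # Flat-earth approximation: scale longitude differences by
                # cos(latitude) so both axes are in comparable units.
                dx = (x2 - x1) * math.cos(math.radians(mid_lat))
                dy = y2 - y1
                length = math.hypot(dx, dy)
                weighted[cell] += speed * length
                lengths[cell] += length
    return {cell: weighted[cell] / lengths[cell]
            for cell in lengths if lengths[cell] > 0}


# Usage with a made-up bounding box and roads.
west, east, north, south = -114.3, -113.9, 51.2, 50.8
roads = [
    ("MULTILINESTRING ((-114.29 51.19, -114.27 51.19))", 50),
    ("MULTILINESTRING ((-114.29 51.19, -114.25 51.19))", 80),
    ("MULTILINESTRING ((-114.29 50.85, -114.27 50.85), "
     "(-113.93 50.85, -113.91 50.85))", 60),
]
print(speed_by_cell(roads, west, east, north, south))
```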

## 3. Visualization

### 3.1 Visualize the speed limits of the roads (5 Marks)

### 3.2 Show the traffic heatmap for 2018 (5 Marks)
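For a heatmap, the per-cell values (for example accident counts or average volumes) have to be laid out as a 10x10 matrix. Its rows run north to south and its columns run west to east, so with the 'grid' + column + row naming, cell `gridIJ` goes to `matrix[J][I]`. Below is a minimal sketch. `heatmap_matrix` is a hypothetical helper and the counts are made up. The result is a plain 2-D list that `seaborn.heatmap` or `plt.imshow` accept as input.

```python
def heatmap_matrix(values_by_cell, size=10):
    """Arrange per-cell values as rows (north -> south) x columns (west -> east).

    Grid names are 'grid' + i + j with i = column (west to east) and
    j = row (north to south), so cell gridIJ lands at matrix[J][I].
    Cells without a value are filled with 0.
    """
    return [[values_by_cell.get("grid{}{}".format(i, j), 0) for i in range(size)]
            for j in range(size)]


# Usage with made-up counts in the four corner cells.
counts = {"grid00": 5, "grid90": 7, "grid09": 3, "grid99": 1}
matrix = heatmap_matrix(counts)
for row in matrix:
    print(row)
# With seaborn imported as sns (as in this notebook), one option is:
# sns.heatmap(matrix)
```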