# ToDo: 
- quantify simplify function
- [DONE] build tool to "zoom in" on roads (sjoin)
- [DONE] create network of all roads, with weights
- investigate algorithm for filling up links randomly with weight threshold

# Road network of Milan

## Data description

The following code takes data from the 2020 shapefiles of the Milan road network, in order to study street widths and to simulate the implementation of dedicated bus lanes (DBLs).
The datasets used are the following:
- AC_VEI_AC_VEI_SUP_SR.shp: all parts of the city that vehicles have access to, i.e. roads, parking etc.
- AC_PED_AC_PED_SUP_SR.shp: pedestrian access (sidewalk network).
- AC_CIC_AC_CIC_SUP_SR.shp: cycling network. All of the above can be combined into AR_STR_AR_STR_SUP_SR.shp, which also includes objects of transportation infrastructure (e.g. road bumps and traffic dividers).
- EL_TRV_EL_TRV_TRA_SG.shp: elements of the road that have tram infrastructure.

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
import contextily as cx
import matplotlib.pyplot as plt
import networkx as nx

In [None]:
vehicle_path = "C:/Users/rickb/Documents/scuola/THESIS/datasets/Milan/DBT_2020/SHAPE/AC_VEI_AC_VEI_SUP_SR.shp"
gdf = gpd.read_file(vehicle_path)

Here's an example of the kind of information contained in our shapefiles:

In [None]:
gdf.head(10)

We have:
- NOME: name of the region. If it's a street it will be "via ...", but it could also be "Parcheggio" (parking lot), or others.
- SUBREGID: a unique identifier for the polygon.
- AC_VEI_FON: either 01 (paved street) or 02 (unpaved).
- AC_VEI_LIV: either 01 (in an underpass) or 02 (not in an underpass).
- AC_VEI_SED: either 01 (street level), 02 (on a bridge of sorts), 03 (in a tunnel, *galleria*), 04 (in a dam).
- AC_VEI_ZON: classifies the type of region according to whether it's a road, a roundabout, a parking lot etc. This is useful for filtering the data the way we need to.
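The AC_VEI_ZON (later TYPE) codes are used below as hierarchical prefixes. As a sketch, the prefixes mentioned in this notebook ('01' road portions, '02' intersections/squares/roundabouts, '0204' squares, '0205' incrocio, '0206' roundabouts) can be decoded by checking the longest matching prefix first. The helper name and the sample codes '0101', '0299' and '0301' are hypothetical; the DBT documentation has the full code list.

```python
# Hypothetical helper: map an AC_VEI_ZON / TYPE code to a readable label.
# Only the prefixes mentioned in this notebook are listed; anything else is "other".
TYPE_PREFIXES = {
    '0204': 'square (piazza)',
    '0205': 'intersection (incrocio)',
    '0206': 'roundabout',
    '01': 'road portion',
    '02': 'intersection/square/roundabout (other)',
}

def describe_type(code):
    """Return the label of the longest prefix in TYPE_PREFIXES that matches code."""
    for prefix in sorted(TYPE_PREFIXES, key=len, reverse=True):
        if code.startswith(prefix):
            return TYPE_PREFIXES[prefix]
    return 'other (e.g. parking)'

print(describe_type('0206'))  # roundabout
print(describe_type('0101'))  # road portion (hypothetical code)
print(describe_type('0301'))  # other (e.g. parking) (hypothetical code)
```

After the renaming cell, it could be applied as `gdf_tot['TYPE'].map(describe_type)`, provided TYPE has no missing values.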

Let's see some of the possible labels for street names, besides the most common ones

In [None]:
x = gdf[~gdf['NOME'].str.contains('VIA|CORSO|PIAZZ|STRADA|LARGO', regex = True, na = False)] #excluding common street designations (na=False keeps unnamed polygons instead of breaking the ~)

In [None]:
x['NOME'].unique()

These tags could be useful if we decide to exclude certain data entries based on the NOME field.  
For now, there seem to be better ways to subdivide our data (see below).  

In [None]:
#cleaning up the dataset, and making a copy to work on

gdf.drop(['AC_VEI_FON', 'AC_VEI_LIV', 'AC_VEI_SED', 'CLASSREF'],axis = 1, inplace = True)
gdf.rename(columns={'SUBREGID':'ID', 'NOME': 'NAME', 'AC_VEI_ZON': 'TYPE'}, inplace = True)
gdf_tot = gdf.copy()

First, we remove the Tangenziali (urban highways, which are difficult to treat for now) from our dataset, and keep only roads and intersections, excluding other areas such as parking spaces.


In [None]:
pattern1 = ('01','02')
# portions of road (e.g. not intersections or parking lots) start with 01 in TYPE
# intersections, squares, and roundabouts start with 02 in TYPE
gdf_tot = gdf_tot[~gdf_tot['NAME'].str.contains('TANGENZIALE', regex = False, na = False)] #removing tangenziali (unnamed polygons are kept)
gdf_tot = gdf_tot[gdf_tot.TYPE.str.startswith(pattern1)]
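The two filters above can be tried on a small toy frame to see what they keep. The names and TYPE codes below are made up for illustration (in particular '0301' for a parking lot is an assumption); passing na=False to str.contains treats missing names as non-matches, so the ~ negation works on a purely boolean Series.

```python
import pandas as pd

# Toy stand-in for gdf_tot: names and TYPE codes are invented for illustration.
toy = pd.DataFrame({
    'NAME': ['VIA VOLVINIO', 'TANGENZIALE OVEST', None, 'PIAZZA DELLA REPUBBLICA', 'PARCHEGGIO'],
    'TYPE': ['0101', '0101', '0205', '0204', '0301'],
})

# na=False makes a missing name count as "not a tangenziale" instead of yielding NaN,
# which can make the ~ negation fail
not_tangenziale = ~toy['NAME'].str.contains('TANGENZIALE', regex=False, na=False)
# str.startswith accepts a tuple of prefixes, like Python's own str.startswith
road_or_intersection = toy['TYPE'].str.startswith(('01', '02'))

filtered = toy[not_tangenziale & road_or_intersection]
print(filtered)
```

The unnamed intersection survives, while the tangenziale and the parking lot are dropped.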

In [None]:
# NB: AC_VEI_ZON was renamed to TYPE (inplace) in the cleaning cell, so we filter on TYPE
pattern = '01' # portions of road (e.g. not intersections or parking lots) start with 01 in TYPE
gdf_roads = gdf[gdf['TYPE'].str.startswith(pattern)]
pattern2 = ('01','0206','0204') # roads, roundabouts and squares
pattern3 = '0206' # roundabouts
pattern4 = '0204' # squares (piazze)
gdf_roads_piaz = gdf[gdf['TYPE'].str.startswith(pattern2)]
gdf_round = gdf[gdf['TYPE'].str.startswith(pattern3)]
gdf_piaz = gdf[gdf['TYPE'].str.startswith(pattern4)]

## Areas and perimeters of streets
We now plot the areas and perimeters of streets, with and without the simplify method, to see whether there are significant differences between the two.
The plots below use gdf_tot (roads and intersections); the subsets gdf_roads, gdf_piaz, gdf_round and gdf_roads_piaz are also available.

NB: for the moment we're excluding tangenziali, because some of their blocks are very large and make visualization difficult.

In [None]:
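OSM_crs = 3857 # Web Mercator, the same CRS as the contextily basemap tiles
gdf_tot.to_crs(epsg=OSM_crs, inplace = True)

# NB: EPSG:3857 preserves neither distances nor areas. At Milan's latitude (~45.5°N) its units
# overstate true lengths by roughly 1/cos(lat) ≈ 1.43 and areas by roughly 2, so 'Perimeter'
# and 'Area' below are in projected units rather than true metres.
gdf_tot['Perimeter'] = gdf_tot.length
gdf_tot['Area'] = gdf_tot.area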
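EPSG:3857 (Web Mercator) does not preserve lengths or areas. On the sphere, it stretches lengths by $1/\cos\varphi$ at latitude $\varphi$ and areas by the square of that factor. The sketch below quantifies this for Milan; the latitude 45.46° is an approximate value assumed here, and the spherical formula is itself an approximation for EPSG:3857.

```python
import math

def web_mercator_scale(lat_deg):
    """Approximate linear scale factor of Web Mercator at latitude lat_deg (spherical formula)."""
    return 1.0 / math.cos(math.radians(lat_deg))

MILAN_LAT = 45.46  # approximate latitude of central Milan (assumed value)
k = web_mercator_scale(MILAN_LAT)
print(f"length factor: {k:.3f}, area factor: {k**2:.3f}")
# Dividing EPSG:3857 lengths by k (and areas by k**2) roughly converts them to true metres / m^2.
```

A common alternative is to measure in a local projected CRS instead, e.g. `gdf_tot.to_crs(epsg=32632).area` (UTM zone 32N, which covers Milan), and keep EPSG:3857 only for plotting on basemaps.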

In [None]:
x = gdf_tot.Area.idxmax()

In [None]:
gdf_tot.loc[x] # inspect the largest road/intersection, to check for "glitches"

Piazza della Repubblica is a large square in Milan, so it makes sense.

In [None]:
bins = range(0,10000,40)
#bins = range(np.floor(gdf_roads_piaz.Area.min()).astype(int), np.floor(gdf_roads_piaz.Area.max()).astype(int)+1,40)
fig, ax = plt.subplots(1, figsize=(4,4))
gdf_tot.Area.hist(bins = bins, ax = ax, color = 'green', alpha = 0.5)
plt.title('Areas of the streets of Milan')
#ax.set_xscale('log')
std = gdf_tot.Area.std()
mean = gdf_tot.Area.mean()
ax.set_xlabel('Area $(m^2)$')
ax.set_ylabel('Counts')
plt.annotate(f'Mean: {mean:.2f} $m^2$', xy=(0.5, 0.95), xycoords='axes fraction')
plt.annotate(f'std: {std:.2f} $m^2$', xy=(0.5, 0.85), xycoords='axes fraction')

plt.show()

In [None]:
bins = list(range(0, 2000,40))
fig, ax = plt.subplots(1, figsize=(4,4))
gdf_tot.Perimeter.hist(bins = bins, ax = ax, color = 'green', alpha = 0.5)
plt.title('Perimeters of the streets of Milan')
#ax.set_xscale('log')
std = gdf_tot.Perimeter.std()
mean = gdf_tot.Perimeter.mean()
ax.set_xlabel('Perimeter (m)')
ax.set_ylabel('Counts')
plt.annotate(f'Mean: {mean:.2f} m', xy=(0.5, 0.95), xycoords='axes fraction')
plt.annotate(f'std: {std:.2f} m', xy=(0.5, 0.85), xycoords='axes fraction')
plt.show()

### Simplified road network


In [None]:
gdf_tot['SimpArea1'] = gdf_tot.geometry.simplify(1).area
gdf_tot['SimpArea2'] = gdf_tot.geometry.simplify(5).area
gdf_tot['SimpArea3'] = gdf_tot.geometry.simplify(10).area
gdf_tot['SimpPeri1'] = gdf_tot.geometry.simplify(1).length
gdf_tot['SimpPeri2'] = gdf_tot.geometry.simplify(5).length
gdf_tot['SimpPeri3'] = gdf_tot.geometry.simplify(10).length

In [None]:
bins = list(range(0, 3000, 40))
fig, axs = plt.subplots(1, 4, figsize=(12, 4))
# (column, panel label, colour, alpha) for the original perimeter and the three simplify tolerances
panels = [('Perimeter', 'unsimplified', None, 1.0),
          ('SimpPeri1', 'simplified 1', 'green', 0.5),
          ('SimpPeri2', 'simplified 5', 'red', 0.5),
          ('SimpPeri3', 'simplified 10', 'yellow', 0.5)]
for ax, (col, label, color, alpha) in zip(axs, panels):
    gdf_tot[col].hist(bins=bins, ax=ax, color=color, alpha=alpha)
    # perimeters are lengths, so the unit is m
    ax.annotate(f'Mean: {gdf_tot[col].mean():.2f} m', xy=(0.4, 0.95), xycoords='axes fraction')
    ax.annotate(f'std: {gdf_tot[col].std():.2f} m', xy=(0.4, 0.85), xycoords='axes fraction')
    ax.set_xlabel(label)

fig.suptitle('Perimeters of the streets of Milan')  # plt.title would only title the last panel
plt.show()

In [None]:
bins = list(range(0, 10000, 40))
fig, axs = plt.subplots(1, 4, figsize=(12, 4))
# (column, panel label, colour, alpha) for the original area and the three simplify tolerances
panels = [('Area', 'unsimplified', None, 1.0),
          ('SimpArea1', 'simplified 1', 'green', 0.5),
          ('SimpArea2', 'simplified 5', 'red', 0.5),
          ('SimpArea3', 'simplified 10', 'yellow', 0.5)]
for ax, (col, label, color, alpha) in zip(axs, panels):
    gdf_tot[col].hist(bins=bins, ax=ax, color=color, alpha=alpha)
    ax.annotate(f'Mean: {gdf_tot[col].mean():.2f} $m^2$', xy=(0.4, 0.95), xycoords='axes fraction')
    ax.annotate(f'std: {gdf_tot[col].std():.2f} $m^2$', xy=(0.4, 0.85), xycoords='axes fraction')
    ax.set_xlabel(label)

plt.show()

Interestingly, area increases at first and then decreases as the tolerance grows. Why is this?
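One possible mechanism, sketched on toy shapes: Douglas-Peucker-style simplification (which `simplify` is based on) drops vertices that lie closer than the tolerance to the simplified outline. Removing an inward notch therefore adds area, while removing an outward bump subtracts it, so the net effect depends on the shapes. The code below is a plain recursive Douglas-Peucker on hand-made rings, not GEOS's topology-preserving implementation, so it only illustrates the sign of the effect; all function names are ours.

```python
import math

def shoelace_area(ring):
    """Area of a closed ring given as [(x, y), ..., first point repeated at the end]."""
    s = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))
    return abs(s) / 2

def point_line_dist(p, a, b):
    """Perpendicular distance from p to the line through a and b (distance to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def douglas_peucker(points, tol):
    """Recursive Douglas-Peucker: keep the farthest vertex only if it is farther than tol."""
    if len(points) < 3:
        return list(points)
    dists = [point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i_max - 1] <= tol:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:i_max + 1], tol)
    right = douglas_peucker(points[i_max:], tol)
    return left[:-1] + right

notch = [(0, 0), (10, 0), (10, 4), (6, 4), (5, 3.5), (4, 4), (0, 4), (0, 0)]  # inward notch on top edge
bump = [(0, 0), (4, 0), (5, -0.5), (6, 0), (10, 0), (10, 4), (0, 4), (0, 0)]  # outward bump on bottom edge

for name, ring in [('notch', notch), ('bump', bump)]:
    before = shoelace_area(ring)
    after = shoelace_area(douglas_peucker(ring, tol=1))
    change = (after - before) / before * 100
    print(f"{name}: {before} -> {after} ({change:+.2f} %)")
```

Both rings simplify to the same 10 x 4 rectangle: the notched one gains about 1.27 % of its area and the bumped one loses about 1.23 %.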

Can we show the percentage change in area as a function of area, for each simplification?

$\Delta_{\%} = \frac{\mathrm{SimpArea} - \mathrm{Area}}{\mathrm{Area}} \times 100$

In [None]:
fig, axs = plt.subplots(1, 3, figsize=(9, 4))
for ax, (col, tol) in zip(axs, [('SimpArea1', 1), ('SimpArea2', 5), ('SimpArea3', 10)]):
    # percentage change in area introduced by simplify(tol)
    change = (gdf_tot[col] - gdf_tot.Area) / gdf_tot.Area * 100
    ax.plot(gdf_tot.Area, change, 'bo', markersize=3)
    # annotate rather than legend(): with one plotted line, legend() shows only the first label
    ax.annotate(f'Average: {change.mean():.2f} %', xy=(0.4, 0.95), xycoords='axes fraction')
    ax.annotate(f'std: {change.std():.2f} %', xy=(0.4, 0.85), xycoords='axes fraction')
    ax.set_title(f'simplified {tol}')
    ax.set_xlabel('Area $(m^2)$')
axs[0].set_ylabel('Change in area (%)')

plt.show()

## Correlation plots

Correlate the area of each street with its simplified area, and its perimeter with its simplified perimeter.

In [None]:
fig, axs = plt.subplots(1, 3, figsize=(14, 8))
for a, (col, tol) in zip(axs, [('SimpArea1', 1), ('SimpArea2', 5), ('SimpArea3', 10)]):
    a.plot(gdf_tot.Area, gdf_tot[col], 'bo', markersize=3)
    # common limits for both axes, so the identity line y = x is the diagonal
    lims = [
        np.min([a.get_xlim(), a.get_ylim()]),  # min of both axes
        np.max([a.get_xlim(), a.get_ylim()]),  # max of both axes
    ]
    # identity line: points above it have a larger simplified area
    a.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
    a.set_aspect('equal')
    a.set_xlim(lims)
    a.set_ylim(lims)
    a.set_xlabel('area')
    a.set_title(f'simplify({tol})')
fig.suptitle('area vs simplified area correlation')  # plt.title would be overwritten by the last panel title

plt.show()

Simplifying tends to overestimate the area.

In [None]:
fig,axs = plt.subplots(1,3, figsize=(8,8))
axs[0].plot(gdf_tot.Perimeter, gdf_tot.SimpPeri1, 'bo', markersize = 3)
axs[1].plot(gdf_tot.Perimeter, gdf_tot.SimpPeri2, 'bo', markersize = 3)
axs[2].plot(gdf_tot.Perimeter, gdf_tot.SimpPeri3, 'bo', markersize = 3)
#ax.plot(range(max(gdf_tot.Area)),'r')
for a in axs:
    
    lims = [
        np.min([a.get_xlim(), a.get_ylim()]),  # min of both axes
        np.max([a.get_xlim(), a.get_ylim()]),  # max of both axes
    ]

    # now plot both limits against each other
    a.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
    a.set_aspect('equal')
    a.set_xlim(lims)
    a.set_ylim(lims)
    a.set_xlabel('perimeter')
axs[0].set_title('simplify(1)')
axs[1].set_title('simplify(5)')
axs[2].set_title('simplify(10)')
plt.show()

Simplifying has little effect on the perimeter.

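### Average width calculation
Treating each street polygon as a rectangle of length $l$ and width $w$, the area is length times width and the perimeter is twice the length plus twice the width:

$A = lw$

$P = 2l+2w$

Substituting $l = \frac{A}{w}$ into the perimeter gives

$P = 2\frac{A}{w}+2w$

and multiplying by $\frac{w}{2}$ and rearranging gives

$w^2 -\frac{P}{2}w+A = 0$

The two roots, $\frac{P}{4} \pm \sqrt{\frac{P^2}{16} - A}$, are the length and the width of the rectangle (their sum is $\frac{P}{2}$ and their product is $A$), so the width is the smaller one: $w = \frac{P}{4} - \sqrt{\frac{P^2}{16} - A}$.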
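A quick numerical check of the width formula, i.e. the smaller root w = P/4 - sqrt(P^2/16 - A), on shapes whose width is known. This is pure Python and independent of the dataset; the function name is ours, and clipping the discriminant at zero for compact shapes is a choice made here.

```python
import math

def rectangle_width(area, perimeter):
    """Smaller root of w^2 - (P/2) w + A = 0, i.e. the width of the equivalent rectangle.

    For compact shapes with P^2 < 16A (e.g. a circle) there is no real root; clipping the
    discriminant at 0 returns P/4, the side of the square with the same perimeter.
    """
    q = perimeter / 4
    return q - math.sqrt(max(q * q - area, 0.0))

print(rectangle_width(40, 28))     # 10 m x 4 m rectangle -> 4.0
print(rectangle_width(1600, 416))  # 200 m x 8 m street -> 8.0
print(rectangle_width(16, 16))     # 4 m square -> 4.0
```

The clipped case matters for round squares and roundabouts, whose $P^2$ can be smaller than $16A$.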

In [None]:
# Width = smaller root of w^2 - (P/2) w + A = 0, i.e. w = P/4 - sqrt(P^2/16 - A).
# The smaller root is always the width, whereas the order of the roots returned by np.roots is not guaranteed.
# For compact shapes with P^2 < 16A (e.g. round squares) the discriminant is negative;
# clipping it at 0 gives w = P/4, which is also the real part of the complex roots.
quarter_peri = gdf_tot.Perimeter / 4
gdf_tot['width'] = quarter_peri - np.sqrt(np.clip(quarter_peri**2 - gdf_tot.Area, 0, None))
gdf_tot = gdf_tot.drop(['Perimeter', 'Area'], axis = 1)

In [None]:
gdf_tot.width.hist(bins = np.linspace(0,30,100))
plt.title('Width distribution of roads in Milan')
plt.xlabel('Width (meters)')
plt.ylabel('Counts')
plt.show()

## Visualization

Various visualizations of our data

In [None]:
y = gdf_tot[gdf_tot.TYPE.str.startswith('01')]
ax = y.plot(figsize=(10, 10), alpha=0.5, edgecolor="blue")
z = gdf_tot[gdf_tot.TYPE.str.startswith('02')]
z.plot(ax=ax, edgecolor = 'red')
cx.add_basemap(ax, crs=y.crs, zoom = 13, source=cx.providers.CartoDB.Positron) #providers.Esri.WorldImagery for satellite
plt.show()

### Color map plot
Passing column= to the GeoDataFrame plot method colors each polygon according to the value of that column (here, the road width).


In [None]:
plt.close('all')

In [None]:
fig, ax = plt.subplots(1,1, figsize=(12,12))
gdf_tot.plot(ax = ax, cmap = 'viridis', column = 'width', legend = True, vmin = 5, vmax = 30 )
cx.add_basemap(ax, crs=gdf_tot.crs, source=cx.providers.Esri.WorldImagery, alpha =0.3) #providers.Esri.WorldImagery for satellite
plt.show()

In [None]:
# roads, roundabouts and squares (same TYPE prefixes as gdf_roads_piaz), taken from gdf_tot
# because that is where the 'width' column was computed
gdf_rp = gdf_tot[gdf_tot.TYPE.str.startswith(('01', '0206', '0204'))]
for split in [5, 10, 15, 20]:
    fig, axes = plt.subplots(1,2, figsize=(16,8))
    gdf_small = gdf_rp[gdf_rp['width'] <= split]
    gdf_large = gdf_rp[gdf_rp['width'] > split]
    ax = axes[0]
    gdf_small.plot(ax = ax, cmap = 'viridis',column = 'width', legend = True, vmin = 1, vmax = 20 )
    ax.set_title('Smaller streets')
    ax = axes[1]
    gdf_large.plot(ax = ax, cmap = 'viridis',column = 'width', legend = True, vmin = 5, vmax = 30 )
    # cx.add_basemap(ax, crs=gdf_large.crs, source=cx.providers.Esri.WorldImagery, alpha =0.3) #providers.Esri.WorldImagery for satellite
    ax.set_title('Larger streets')
    fig.suptitle(f'Threshold {split} meters')
    plt.show()

### Adding zones with sjoin
Let's try to get a finer division into zones (Milan's municipi, i.e. administrative districts), to allow more in-depth plots.

In [None]:
administrative_path = "C:/Users/rickb/Documents/scuola/THESIS/datasets/Milan/DBT_2020_new/DBT 2020 - SHAPE/Municipi.shp"
gdf2 = gpd.read_file(administrative_path)

In [None]:
gdf2 = gdf2.to_crs(epsg = OSM_crs) # reproject before plotting, so the polygons line up with the basemap
fig, ax = plt.subplots(1,1, figsize = (8,8))
gdf2.plot(ax = ax, alpha = 0.1, edgecolor = 'black')
cx.add_basemap(ax, crs=gdf2.crs, source=cx.providers.CartoDB.Positron, alpha =1) #providers.Esri.WorldImagery for satellite
plt.show()

In [None]:
gdf2

In [None]:
from geopandas.tools import sjoin

In [None]:
gdf_zone_tot = gdf_tot.sjoin(gdf2, how = 'inner',predicate = 'intersects') # requires gpd > 0.9

#gdf_zone = gpd.sjoin(gdf_roads_piaz, gdf2, how = 'inner', op = 'intersects') #is equivalent, with older syntax

gdf_zone_tot = gdf_zone_tot.drop(['SimpArea1', 'SimpArea2','SimpArea3', 'SimpPeri1', 'SimpPeri2', 'SimpPeri3', 'index_right'], axis =1)

We can also add neighborhoods to our subdivisions

In [None]:
neighborhood_path = "C:/Users/rickb/Documents/scuola/THESIS/datasets/Milan/Quartieri milano_real/NIL_WM.shp"
gdf_N = gpd.read_file(neighborhood_path)
gdf_N = gdf_N.to_crs(epsg = OSM_crs)
gdf_N = gdf_N.drop(['Valido_dal', 'Fonte', 'Shape_Leng', 'Shape_Area', 'OBJECTID', 'Valido_al'] ,axis=1)


gdf_N_tot = gdf_zone_tot.sjoin(gdf_N, how = 'inner',predicate = 'intersects')
gdf_N_tot = gdf_N_tot.drop(['index_right'], axis = 1)


In [None]:
fig, ax = plt.subplots(1,1, figsize = (8,8))
gdf_N.plot(ax = ax, alpha = 0.1, edgecolor = 'black')
cx.add_basemap(ax, crs=gdf_N.crs, source=cx.providers.CartoDB.Positron, alpha =1) #providers.Esri.WorldImagery for satellite
plt.show()

Finally, let's split our main dataframe into two: one with only roads, and one with only intersections.


In [None]:
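# roads have TYPE starting with 01, intersections with 02
# (the mask must come from gdf_N_tot itself: sjoin can duplicate rows, so gdf_tot's index does not align)
gdf_no_int = gdf_N_tot[gdf_N_tot.TYPE.str.startswith('01')]
gdf_int = gdf_N_tot[gdf_N_tot.TYPE.str.startswith('02')]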

In [None]:
#Now isolate an example neighborhood
gdf_N_Stadera = gdf_N_tot[gdf_N_tot['NIL'] == 'STADERA - CHIESA ROSSA - Q.RE TORRETTA - CONCA FALLATA']
gdf_int_Stadera = gdf_int[gdf_int['NIL'] == 'STADERA - CHIESA ROSSA - Q.RE TORRETTA - CONCA FALLATA']
gdf_no_int_Stadera = gdf_no_int[gdf_no_int['NIL'] == 'STADERA - CHIESA ROSSA - Q.RE TORRETTA - CONCA FALLATA']
#and a single street in that neighborhood
gdf_N_Volv = gdf_N_Stadera[gdf_N_Stadera['NAME'] == 'VIA VOLVINIO']
gdf_no_int_Volv = gdf_no_int[gdf_no_int['NAME'] == 'VIA VOLVINIO']
gdf_int_Volv = gdf_int[gdf_int['NAME'] == 'VIA VOLVINIO']


Here's an example plot of a neighborhood:

In [None]:
fig, ax = plt.subplots(1,1, figsize = (8,8))
gdf_N_Stadera.plot(ax = ax, alpha = 0.5)
cx.add_basemap(ax, crs=gdf_N.crs, zoom = 15, source=cx.providers.CartoDB.Positron) #providers.Esri.WorldImagery for satellite
plt.title("The Stadera Neighborhood of Milan")
plt.show()

## Creating custom ranges by distance

Here's a function that creates a GeoDataFrame with all roads within a given distance of the road given as input.

In [None]:
def within_dist(street, dist, gdf):
    """Return a GeoDataFrame with all streets of gdf within distance dist (in CRS units, nominally meters) of street.

    street is a GeoDataFrame, dist is a positive number, and gdf is the GeoDataFrame dataset.
    """
    temp = street.copy()
    temp.geometry = temp.geometry.buffer(dist)
    temp = temp.filter(['geometry']) #so sjoin doesn't add suffixes and we don't have to rename later
    gdf_distanced = gdf.sjoin(temp, how='inner', predicate='intersects')
    gdf_distanced = gdf_distanced.dropna()
    # a street within the buffers of two polygons of `street` appears twice: keep one row per polygon ID
    gdf_distanced = gdf_distanced.drop_duplicates(subset=['ID'], keep='first')
    gdf_distanced = gdf_distanced.drop(columns='index_right')
    return gdf_distanced
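The buffer-then-join idea behind within_dist, including why duplicates appear when the query street is made of several polygons, can be sketched without geopandas. This hypothetical version uses axis-aligned boxes as stand-ins for the polygons: expanding a box by d is only an approximation of buffer(d), which rounds the corners, so boxes slightly overcount near corners.

```python
def expand(box, d):
    """Grow an axis-aligned box (minx, miny, maxx, maxy) by d on every side."""
    minx, miny, maxx, maxy = box
    return (minx - d, miny - d, maxx + d, maxy + d)

def boxes_intersect(a, b):
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def within_dist_boxes(street_pieces, dist, dataset):
    """Hypothetical box-based analogue of within_dist.

    street_pieces: list of boxes, one per polygon of the query street.
    dataset: dict road ID -> box.
    Returns the IDs of roads within dist of any piece, each listed once.
    """
    hits = []
    for piece in street_pieces:
        zone = expand(piece, dist)  # stand-in for geometry.buffer(dist)
        hits += [rid for rid, box in dataset.items() if boxes_intersect(zone, box)]
    # a road near two pieces of the query street is hit twice: keep the first occurrence
    return list(dict.fromkeys(hits))

dataset = {'A': (0, 0, 10, 2), 'B': (11, 0, 20, 2), 'C': (50, 50, 60, 52)}
street = [(0, 3, 10, 4), (10.5, 3, 20, 4)]  # a street split into two polygons
print(within_dist_boxes(street, 1.5, dataset))  # ['A', 'B']
print(within_dist_boxes(street, 0.5, dataset))  # []
```

With dist = 1.5 both roads A and B are hit by both pieces of the street, which is exactly the duplication that deduplicating on the road ID removes.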

In [None]:
M = 100
gdf_test = within_dist(gdf_N_Volv, M, gdf_tot)


In [None]:
fig, ax = plt.subplots(1,1, figsize = (8,8))

gdf_test.plot(ax = ax, alpha = 0.5)
cx.add_basemap(ax, crs=gdf_N.crs, zoom = 15, source=cx.providers.CartoDB.Positron) #providers.Esri.WorldImagery for satellite
plt.title(f"roads within {M} meters from Via Volvinio")
plt.show()

Let's try to plot roads color-coding by width

In [None]:
fig, ax = plt.subplots(1,1, figsize=(8,8))
gdf_test.plot(ax = ax, cmap = 'viridis', column = 'width', legend = True, vmin = 1, vmax = 30 )
cx.add_basemap(ax, crs=gdf_test.crs, source=cx.providers.Esri.WorldImagery, alpha =0.3) #providers.Esri.WorldImagery for satellite
plt.title(f"roads within {M} meters from Via Volvinio \n color coded by width")
plt.show()

# The Network
It should be pretty straightforward: make intersections the nodes and roads the edges, with road widths as the weights.
However, it is not obvious what exactly counts as an intersection.
NB: for now we will treat all streets in the most convenient way, i.e. as two-way streets, unless otherwise specified.
### Examining intersections
Let's examine the case of a relatively simple street, Via Volvinio



In [None]:
fig, ax = plt.subplots(1,1, figsize = (6,6))

gdf_no_int_Volv.plot(ax = ax, alpha = 0.5)
cx.add_basemap(ax, crs=gdf_N.crs, zoom = 17, source=cx.providers.CartoDB.Positron) #providers.Esri.WorldImagery for satellite
plt.show()

The upper part of the street splits in two because of a barrier, and then there is a blank space (classified as an intersection), even though the road continues onwards without being intersected. How should this be treated? A new node seems excessive, but what else can we do? Perhaps, after building the network, we can eliminate that node, since it connects to only two roads, one of which has a double connection.
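The clean-up idea above (removing a node that only links two neighbouring intersections) can be sketched on a plain edge list of (from, to, width) tuples, the same shape as the edge list built later. This is a hypothetical helper, not an existing networkx function, and the way widths are combined is an assumption: widths of parallel edges to the same neighbour are summed (two carriageways of one road), and the two sides are combined with min (the bottleneck).

```python
from collections import defaultdict

def remove_pass_through_nodes(edges, parallel=sum, series=min):
    """Contract nodes that touch exactly two distinct neighbours (hypothetical helper).

    edges: list of (u, v, width) tuples of an undirected multigraph.
    parallel: combines the widths of parallel edges to the same neighbour (assumed: sum).
    series: combines the two resulting widths into the new edge (assumed: min, the bottleneck).
    """
    edges = list(edges)
    while True:
        # node -> {neighbour: [widths of the edges to that neighbour]}, ignoring self-loops
        nbrs = defaultdict(lambda: defaultdict(list))
        loops = set()
        for u, v, w in edges:
            if u == v:
                loops.add(u)
                continue
            nbrs[u][v].append(w)
            nbrs[v][u].append(w)
        target = next((n for n, d in nbrs.items() if len(d) == 2 and n not in loops), None)
        if target is None:
            return edges
        (a, wa), (b, wb) = nbrs[target].items()
        # drop every edge touching the target and join its two neighbours directly
        edges = [e for e in edges if target not in (e[0], e[1])]
        edges.append((a, b, series(parallel(wa), parallel(wb))))

# Node 1 is the "blank space": two parallel carriageways from node 0, one road on to node 2
print(remove_pass_through_nodes([(0, 1, 3.0), (0, 1, 3.5), (1, 2, 6.0)]))  # [(0, 2, 6.0)]
```

As written, the rule also contracts nodes along any simple chain or loop, so on the real network it may need restricting, for instance to nodes whose TYPE is 0205.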

In [None]:
fig, ax = plt.subplots(1,1, figsize = (6,6))

gdf_N_Volv.plot(ax = ax, column = 'TYPE', alpha = 0.5)
cx.add_basemap(ax, crs=gdf_N.crs, zoom = 17, source=cx.providers.CartoDB.Positron) #providers.Esri.WorldImagery for satellite
plt.show()

According to our pdf documentation, 0205 corresponds to "incrocio" (intersection), so it seems reasonable to treat it as such.

## Actually trying to create the network

Remember, our network has:
- intersections as nodes;
- the actual roads as edges;
- road width and name as edge attributes, with the width as the weight.


To find the edges of the network, we take an intersection and use a variant of the within_dist function (within_dist_dupes, defined below) with distance = 1 to find all roads immediately adjacent to the intersection.
These are the "stubs" of our graph, i.e. lines that connect to a node and nothing else.
When we do this for all intersections, the edges of the network are simply the stubs shared by pairs of nodes.
Our road network will be a MultiGraph, because some intersections may be connected by two or more different roads.
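The "shared stubs" rule just described can be sketched in pure Python, independently of geopandas: given, for each intersection, the set of road IDs touching it, every road seen at two nodes yields one edge between them. The function name, node ids, road IDs and widths below are hypothetical. Parallel edges are kept, as in a MultiGraph, and a road touching three intersections would produce one edge per pair.

```python
from collections import defaultdict
from itertools import combinations

def edges_from_stubs(stubs, widths):
    """Pair up intersections that share a road.

    stubs: dict node -> set of road IDs adjacent to that node (its stubs).
    widths: dict road ID -> road width, used as the edge weight.
    Returns a list of (node_u, node_v, road_id, width) tuples; parallel edges are kept.
    """
    nodes_by_road = defaultdict(list)
    for node, roads in stubs.items():
        for road in roads:
            nodes_by_road[road].append(node)
    edges = []
    for road, nodes in nodes_by_road.items():
        # a stub seen at only one node is a dead end and produces no edge
        for u, v in combinations(sorted(nodes), 2):
            edges.append((u, v, road, widths[road]))
    return edges

stubs = {0: {'A', 'B'}, 1: {'A', 'B'}, 2: {'C'}}  # hypothetical road IDs
widths = {'A': 8.0, 'B': 6.5, 'C': 12.0}
print(sorted(edges_from_stubs(stubs, widths)))
```

Here intersections 0 and 1 are joined by two different roads, giving two parallel edges, while road C only touches node 2 and stays a dangling stub.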

In [None]:
#we define a variation of the within_dist function. This one keeps duplicate entries because they are useful for finding stubs.

def within_dist_dupes(street, dist, gdf):
    #function creates geodataframe with all streets of gdf within distance dist (in meters) of street.
    #street is a geodataframe, dist is a positive number, and gdf is the geodataframe dataset.
    temp = street.copy()
    temp.geometry = temp.geometry.buffer(dist)
    temp = temp.filter(['geometry']) #so sjoin doesn't give suffixes and i don't have to rename later
    gdf_distanced = gdf.sjoin(temp, how='inner', predicate='intersects')
    gdf_distanced = gdf_distanced.dropna()
    return gdf_distanced


In [None]:
def make_edges(gdf_tot):
    """Build an edge list (from, to, weight) from a dataset of roads and intersections.

    Nodes are intersections (TYPE starting with '02'), numbered 0..n-1 after reset_index;
    two nodes are joined by an edge for every road polygon (TYPE starting with '01', same ID)
    that touches both, weighted by that road's width.
    """
    no_ints = gdf_tot[gdf_tot.TYPE.str.startswith('01')].reset_index(drop = True)
    ints = gdf_tot[gdf_tot.TYPE.str.startswith('02')].reset_index(drop = True) # node ids start from 0
    # all stubs, i.e. every road within 1 m of every node; index_right holds the node id
    stubs = within_dist_dupes(ints, 1, no_ints)
    pieces = []
    for node, group in stubs.groupby('index_right'): # one group of stubs per node
        # removing "self" from the frame we merge onto avoids self connections; since processed
        # nodes are removed progressively, each pair of nodes is matched only once
        stubs = stubs[stubs['index_right'] != node]
        shared = pd.merge(group, stubs, on = 'ID', how = 'inner') # roads shared with the remaining nodes
        pieces.append(pd.DataFrame({'from': shared.index_right_x, 'to': shared.index_right_y, 'weight': shared.width_x}))
    if not pieces:
        return pd.DataFrame(columns = ['from', 'to', 'weight'])
    return pd.concat(pieces, ignore_index = True)
#now edges should be a dictionary where each key will have only the nodes it is connected to as values.
#the final step would be to make a list where each key is 

In [None]:
edge_list_t = make_edges(gdf_N_Stadera)
G = nx.from_pandas_edgelist(edge_list_t, 'from', 'to', edge_attr=["weight"] , create_using=nx.MultiGraph())

In [None]:
edge_list_t.head(10)

The first two nodes have several connections to a single other node, because they are rather unusual peripheral roads.  
This shouldn't be a problem in general, since most roads are well behaved.
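To check how widespread such multiple connections are, one can count how often each unordered pair of nodes appears in the edge list. A minimal sketch with a hypothetical helper follows; on the notebook's data it could be called as `parallel_pairs(zip(edge_list_t['from'], edge_list_t['to']))`.

```python
from collections import Counter

def parallel_pairs(edges):
    """Return {(u, v): count} for node pairs joined by more than one edge (undirected)."""
    counts = Counter(tuple(sorted(pair)) for pair in edges)
    return {pair: n for pair, n in counts.items() if n > 1}

print(parallel_pairs([(0, 1), (1, 0), (1, 2)]))  # {(0, 1): 2}
```

Sorting each pair makes (0, 1) and (1, 0) count as the same connection, consistent with treating all streets as two-way.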

In [None]:
temp = gdf_N_Stadera[gdf_N_Stadera.NAME == 'VIA DEL MARE']
t = within_dist(temp, 5, gdf_N_Stadera)
fig, ax = plt.subplots(1,1, figsize = (6,6))
t.plot(ax = ax, column = 'NAME', alpha = 0.5)
cx.add_basemap(ax, crs=gdf_N.crs, zoom = 16, source=cx.providers.CartoDB.Positron) #providers.Esri.WorldImagery for satellite
plt.title('roads with multiple connections between each other')
plt.show()

In [None]:
pos = nx.random_layout(G)
nx.draw_networkx_nodes(G, pos, node_color = 'r', node_size = 100, alpha = 1)
ax = plt.gca()
for e in G.edges:
    ax.annotate("",
                xy=pos[e[0]], xycoords='data',
                xytext=pos[e[1]], textcoords='data',
                arrowprops=dict(arrowstyle="-", color="0.5",
                                shrinkA=5, shrinkB=5,
                                patchA=None, patchB=None,
                                connectionstyle=f"arc3,rad={0.3 * e[2]}",
                                ),
                )
plt.axis('off')
plt.show()
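nx.random_layout places the nodes arbitrarily, so the drawing above says nothing about geography. One option, sketched here under assumptions, is to place each node at the centroid of its intersection polygon. The pure-Python polygon-centroid formula below uses hypothetical node ids and coordinates; with geopandas the same positions could come from `ints.geometry.centroid` (its `.x` and `.y`), where `ints` is the reset-index intersections frame built inside make_edges, which the function does not currently return.

```python
def ring_centroid(ring):
    """Centroid of a simple polygon given as a closed ring [(x, y), ..., first point repeated]."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a /= 2  # signed area; the sign cancels in the divisions below
    return cx / (6 * a), cy / (6 * a)

# Toy intersections (hypothetical node ids and coordinates): node id -> polygon ring
rings = {
    0: [(0, 0), (10, 0), (10, 4), (0, 4), (0, 0)],
    1: [(20, 0), (24, 0), (24, 4), (20, 4), (20, 0)],
}
pos = {node: ring_centroid(r) for node, r in rings.items()}
print(pos)
```

A dict like pos can be passed directly as the pos argument of nx.draw_networkx_nodes, so the graph is drawn in map coordinates.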

# Network with OSMnx package

to be completed

## Pedestrian Access network
We can also visualize and work with the pedestrian network. This covers more than just sidewalks: it includes all pedestrian-only zones.

In [None]:
pedestrian_path = "C:/Users/rickb/Documents/scuola/THESIS/datasets/Milan/DBT_2020/SHAPE/AC_PED_AC_PED_SUP_SR.shp"
gdf2 = gpd.read_file(pedestrian_path)

In [None]:
gdf2_crs = gdf2[gdf2['AC_PED_ZON'] != '03'].to_crs(epsg=4326) #removing pedestrian islands in the middle of roads. useless for mobility

## Cycling access network

In [None]:
bicycle_path = "C:/Users/rickb/Documents/scuola/THESIS/datasets/Milan/DBT_2020/SHAPE/AC_CIC_AC_CIC_SUP_SR.shp"
gdf3 = gpd.read_file(bicycle_path)

In [None]:
gdf3

## Tram infrastructure network