# **Preparation of PIMUS Network for GTAModel**

### **General pipeline overview**

![methodology](imgs/pipeline.jpg)

### **Pipeline**

* **Install modules**

In [1]:
!pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu11 dask-cudf-cu11 cuml-cu11 cugraph-cu11 cuspatial-cu11 cuproj-cu11 cuxfilter-cu11 cucim



Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting cudf-cu11
  Using cached https://pypi.nvidia.com/cudf-cu11/cudf_cu11-23.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (499.7 MB)
Collecting dask-cudf-cu11
  Using cached https://pypi.nvidia.com/dask-cudf-cu11/dask_cudf_cu11-23.8.0-py3-none-any.whl (81 kB)
Collecting cuml-cu11
  Using cached https://pypi.nvidia.com/cuml-cu11/cuml_cu11-23.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1081.8 MB)
Collecting cugraph-cu11
  Using cached https://pypi.nvidia.com/cugraph-cu11/cugraph_cu11-23.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1212.4 MB)
Collecting cuspatial-cu11
  Using cached https://pypi.nvidia.com/cuspatial-cu11/cuspatial_cu11-23.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136.2 MB)
Collecting cuproj-cu11
  Using cached https://pypi.nvidia.com/cuproj-cu11/cuproj_cu11-23.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)

**Convenience of using RAPIDS**


![speedup](https://developer-blogs.nvidia.com/wp-content/uploads/2023/03/performance-comparison-pandas-cudf-1-625x386.png)

* **Import modules**

In [1]:
import cudf
import cuspatial
import cupy
import geopandas
from glob import glob
import pandas as pd
import numpy as np
from shapely.geometry import *
from shapely import wkt
import zipfile

In [2]:
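def cleanByList(df, column: str, values: list):
    """Drop rows whose lowercased `column` equals any entry of `values` (entries must be lowercase)."""
    for v in values:
        df = df[df[column].str.lower() != v]
    return df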
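
The helper above can be tried out without a GPU. The following is a minimal sketch on a made-up pandas DataFrame (cuDF mirrors this part of the pandas API). `clean_by_list` and the toy rows are illustrative and not part of the original notebook. The sketch uses a single `isin` mask instead of a loop and lowercases the values as well, which is an assumption about the intended behaviour.

```python
import pandas as pd

def clean_by_list(df, column: str, values: list):
    """Drop rows whose `column` matches any of `values`, ignoring case."""
    lowered = {v.lower() for v in values}
    # Rows with a missing value in `column` never match and are kept.
    return df[~df[column].str.lower().isin(lowered)]

# Toy data (hypothetical): two of the four names are proposals to remove.
toy = pd.DataFrame({"NO": [1, 2, 3, 4],
                    "NAME": ["Metro", "Tren", "BRT", "REGIONAL"]})
kept = clean_by_list(toy, "NAME", ["tren", "regional"])
print(kept["NAME"].tolist())
```

A single mask avoids re-filtering the frame once per value, which matters more as the list of names grows.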

* **Extract the files of each network object**

In [3]:
# unzip the network archive
!unzip NetModel/BaseVisumProject/Networks.zip
# build a dict that maps each object name to the path of its CSV file
objFiles = {(f.split('/')[-1]).split('.')[0]: f for f in glob('Networks/*.csv')}
objFiles

Archive:  NetModel/BaseVisumProject/Networks.zip
   creating: Networks/
  inflating: Networks/Vehiclecombinationitems.csv  
  inflating: Networks/Turns_3.csv    
  inflating: Networks/Demandsegments.csv  
  inflating: Networks/Turns_2.csv    
  inflating: Networks/Faresystems.csv  
  inflating: Networks/Network.csv    
  inflating: Networks/Stoppoints.csv  
  inflating: Networks/Vehiclejourneysections.csv  
  inflating: Networks/Base_Year_2020_network.net  
  inflating: Networks/Turns_1.csv    
  inflating: Networks/Blockitemtypes.csv  
  inflating: Networks/Turns_5.csv    
  inflating: Networks/FaresystemtickettypesbyDSeg.csv  
  inflating: Networks/Stops.csv      
  inflating: Networks/Turns_4.csv    
  inflating: Networks/Timeprofiles.csv  
  inflating: Networks/Turns_6.csv    
  inflating: Networks/Validdays.csv  
  inflating: Networks/Vehiclejourneys.csv  
  inflating: Networks/Faceitems.csv  
  inflating: Networks/Turns_7.csv    
  inflating: Networks/Transferfares.csv  
  inflat

{'Transferwalktimesbetweenstopareas': 'Networks/Transferwalktimesbetweenstopareas.csv',
 'Turns_8': 'Networks/Turns_8.csv',
 'Turns_9': 'Networks/Turns_9.csv',
 'Faceitems': 'Networks/Faceitems.csv',
 'Modes': 'Networks/Modes.csv',
 'Lineroutes': 'Networks/Lineroutes.csv',
 'Operators': 'Networks/Operators.csv',
 'Edges': 'Networks/Edges.csv',
 'Transferfares': 'Networks/Transferfares.csv',
 'Blockitemtypes': 'Networks/Blockitemtypes.csv',
 'Turns_5': 'Networks/Turns_5.csv',
 'Timeprofiles': 'Networks/Timeprofiles.csv',
 'Validdays': 'Networks/Validdays.csv',
 'Links_1': 'Networks/Links_1.csv',
 'Mainzones': 'Networks/Mainzones.csv',
 'Turns_10': 'Networks/Turns_10.csv',
 'Linktypes': 'Networks/Linktypes.csv',
 'Zones': 'Networks/Zones.csv',
 'Calendarperiods': 'Networks/Calendarperiods.csv',
 'Surfaces': 'Networks/Surfaces.csv',
 'Turns_3': 'Networks/Turns_3.csv',
 'User-definedattributes': 'Networks/User-definedattributes.csv',
 'Timeprofileitems': 'Networks/Timeprofileitems.csv',
 '
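
The shell `unzip` call and the `split`-based key extraction can also be done with the standard library (`zipfile` is already imported above). This is only a sketch: `extract_csv_index` is a hypothetical helper, and the archive it reads is a throwaway one built inside the example, not the real `Networks.zip`.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_csv_index(archive: Path, dest: Path) -> dict:
    """Extract `archive` into `dest` and map each CSV's stem to its path."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return {p.stem: str(p) for p in sorted(dest.rglob("*.csv"))}

# Build a small stand-in archive so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    archive = tmp_dir / "Networks.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("Networks/Links_1.csv", "NO,TSYSSET\n1,W\n")
        zf.writestr("Networks/Zones.csv", "NO\n1\n")
        zf.writestr("Networks/Base_Year_2020_network.net", "not a csv")
    index = extract_csv_index(archive, tmp_dir / "out")
    keys = sorted(index)
print(keys)
```

Using `Path.stem` keeps the key logic independent of the path separator, and the non-CSV `.net` file is skipped by the glob pattern.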

* **Read the links and filter out walk-only links**

In [4]:
# extract the keys of the link files (Links_1, Links_2, ...)
kLinks = [key for key in objFiles if key.lower().startswith('links')]
# read every link file and stack them into a single cuDF DataFrame
links = cudf.concat([cudf.read_csv(objFiles[k]) for k in kLinks])
print(links.shape)
type(links)

(276474, 278)


cudf.core.dataframe.DataFrame
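
Concatenating the per-file frames keeps each file's own row index, so index labels repeat across parts. The sketch below shows the effect and the `ignore_index=True` alternative using pandas on two tiny in-memory CSVs (made-up data; pandas is used so the example runs without a GPU, and `cudf.concat` accepts the same argument).

```python
import io
import pandas as pd

# Two hypothetical link files, standing in for Links_1.csv and Links_2.csv.
part_1 = io.StringIO('NO,TSYSSET\n1,W\n2,"Auto_C,W"\n')
part_2 = io.StringIO('NO,TSYSSET\n3,W\n')

frames = [pd.read_csv(part) for part in (part_1, part_2)]

stacked = pd.concat(frames)                         # index labels repeat
renumbered = pd.concat(frames, ignore_index=True)   # fresh 0..n-1 index

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

A unique index is not required for the boolean filters used later, but it avoids surprises with label-based lookups such as `.loc`.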

In [9]:
# filter the links, dropping the walk-only ones
# *************** it may be better to keep walk-only links ***********************

# remove rows with a null TSYSSET
links_c = links[~links['TSYSSET'].isnull()]
print(len(links_c))
# remove walk-only links (TSYSSET is exactly 'W')
links_c_f = links_c[links_c['TSYSSET'] != 'W']
print(len(links_c_f))
# display the filtered links
links_c_f

273417
42869


Unnamed: 0,NO,FROMNODENO,TONODENO,NAME,TYPENO,TSYSSET,USERDIRECTION,LENGTH,NUMLANES,PLANNO,...,TYPE_LINK_CGA,TYPE_LINK_PRV,VEL_CGA,VEL_FLUJO,VEL_PRV,VEL_PUB,VIALIDAD_PROY,VOL_CAP,VOL_TESC,VOL_TPER
1,1,107948,107820,,91,"Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C,R,W",1,0.503km,2,0,...,99,91,22.0,40,30.0,26.795,,1,0.00,0.0
5,3,2330,2329,,90,"Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C,R,W",1,0.263km,2,0,...,99,90,19.0,30,28.0,23.822,0.0,0,0.00,0.0
10,6,2335,2336,,90,"Al,Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C,R,W",0,0.332km,1,0,...,99,90,33.0,30,33.0,22.787,,0,0.00,0.0
11,6,2336,2335,,90,"Al,Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C,R,W",1,0.332km,1,0,...,99,90,39.0,30,39.0,12.559,,0,14.24,0.0
14,8,2339,2340,,90,"Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C,R,W",0,0.169km,2,0,...,99,90,11.0,30,14.0,12.349,0.0,0,0.00,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76313,201549,140382,131119,,0,"Auto_C,R,W",1,0.044km,1,0,...,99,0,0.0,30,,0.0,,0,0.00,0.0
76316,201553,116460,140834,,94,"Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C",0,11.980km,3,0,...,99,94,60.0,80,60.0,0.0,0.0,1,0.00,0.0
76317,201553,140834,116460,,94,"Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C",1,11.980km,3,0,...,99,94,60.0,80,60.0,0.0,0.0,1,0.00,0.0
76318,201554,140287,140834,,94,"Auto_C,Ca1_C,Ca2_C,Ca_BD,Cu_BD,Cu_C",0,6.652km,3,0,...,99,94,60.0,80,60.0,0.0,0.0,1,0.00,0.0
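
`TSYSSET` holds a comma-separated list of transport system codes, so the `!= 'W'` comparison only removes links whose set is exactly `W`. The sketch below, in plain Python, makes that explicit and also shows how a `LENGTH` value such as `0.503km` could be turned into a number. The function names are hypothetical; the sample strings are taken from the output above.

```python
def parse_tsysset(value: str) -> frozenset:
    """Split a comma-separated TSYSSET string into a set of codes."""
    return frozenset(code.strip() for code in value.split(",") if code.strip())

def is_walk_only(value: str, walk_code: str = "W") -> bool:
    """True when the only transport system allowed on the link is walking."""
    return parse_tsysset(value) == {walk_code}

def length_km(value: str) -> float:
    """Convert a length such as '0.503km' to a float in kilometres."""
    text = value.strip()
    if text.endswith("km"):
        text = text[:-2]
    return float(text)

print(is_walk_only("W"))
print(is_walk_only("Auto_C,R,W"))
print(length_km("0.503km"))
```

Working on the parsed set also makes it easy to ask other questions, for example whether a link allows walking at all (`"W" in parse_tsysset(value)`).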


* **Filter all objects that reference the proposed transport systems (TSys)**

In [8]:
# names (lowercase) of the proposed transport systems to remove
props = ['tren_aero', 'regional', 'tren', 'corredor', 'alimentadora corredor']
# list the files in which these TSys appear
p = !grep -lR "Corredor"
# keep only the CSV files and extract their object keys
kPropFiles = [(v.split('/')[-1]).split('.')[0] for v in p if v.endswith(".csv")]
print(f'[INFO] Objects to clean: {kPropFiles}')
# load the objects to clean (same order as kPropFiles)
objs2clean = [cudf.read_csv(objFiles[k]) for k in kPropFiles]
# extract the codes of the proposals from the Transportsystems object (index 1)
cProps = [list((objs2clean[1][objs2clean[1]['NAME'].str.lower() == p].CODE).to_dict().values()) for p in props]
cProps = [c[0] for c in cProps if c != []]
print(cProps)
# extract the numbers (NO) of the proposals from the Faresystems object (index 0)
nProps = [list((objs2clean[0][objs2clean[0]['NAME'].str.lower() == p].NO).to_dict().values()) for p in props]
nProps = [c[0] for c in nProps if c != []]
print(nProps)
# remove the proposals from each object by name
objs_c = [cleanByList(o, 'NAME', props) for o in objs2clean]
objs_c[0]

[INFO] Objects to clean: ['Faresystems', 'Transportsystems']
['RE', 'S', 'C', 'AC']
[13, 12, 11, 10, 9]


Unnamed: 0,NO,NAME,RANK,JOINTFARECOMPUTATION,FAREWEIGHT,INITIALFARE,TSYSSETNONPUTLINE
0,1,Alimentadora,1,0,1.0,12.0,
1,2,Metro,1,0,1.0,4.5,
2,3,Remanente,1,0,1.0,12.0,
3,4,BRT,1,0,1.0,14.6,
4,5,TM,1,0,1.0,4.5,
5,6,MBUS,1,0,1.0,12.0,
6,7,Remanente 15,1,0,1.0,15.0,
7,8,Remanente 17,1,0,1.0,17.0,
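
The `grep -lR` step returns files in whatever order the shell walks the directory, while the code above indexes the results by position. As a hedged alternative, the search can be done in Python with a sorted, deterministic result. `csv_objects_containing` is a hypothetical helper and the files below are made-up stand-ins for the real network objects.

```python
import tempfile
from pathlib import Path

def csv_objects_containing(root: Path, needle: str) -> list:
    """Return the stems of the CSV files under `root` whose text contains `needle`."""
    hits = []
    for path in sorted(root.rglob("*.csv")):
        if needle in path.read_text(encoding="utf-8", errors="ignore"):
            hits.append(path.stem)
    return hits

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "Networks"
    root.mkdir()
    # Toy files: two mention the search term, one does not, one is not a CSV.
    (root / "Transportsystems.csv").write_text("CODE,NAME\nC,Corredor\nM,Metro\n", encoding="utf-8")
    (root / "Faresystems.csv").write_text("NO,NAME\n9,Corredor\n2,Metro\n", encoding="utf-8")
    (root / "Zones.csv").write_text("NO\n1\n", encoding="utf-8")
    (root / "notes.txt").write_text("Corredor", encoding="utf-8")
    found = csv_objects_containing(root, "Corredor")
print(found)
```

With the names in hand, the loaded objects could be kept in a dict keyed by name, so that `Transportsystems` and `Faresystems` are looked up explicitly rather than as index 1 and index 0.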


In [38]:
# cleaning objects by TSys proposal codes
c = !grep -lR "MBUS"
c


# cleaning fare objects by TSys proposal numbers


['Networks/Modes.csv',
 'Networks/Links_1.csv',
 'Networks/Linktypes.csv',
 'Networks/Turns_3.csv',
 'Networks/Stoppoints.csv',
 'Networks/Links_2.csv',
 'Networks/Turns_11.csv',
 'Networks/Faresystems.csv',
 'Networks/Lines.csv',
 'Networks/Turns_4.csv',
 'Networks/Transportsystems.csv',
 'Networks/Links_3.csv',
 'Networks/Faresupplements.csv',
 'Networks/Turns_2.csv',
 'Networks/Base_Year_2020_network.net',
 'Networks/Vehicleunits.csv']
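
The comments in the cell above mention cleaning by TSys codes, but no code for it is shown. One way it could look, as a sketch only: remove the proposal codes from a comma-separated `TSYSSET` string by exact token match. `strip_codes` is a hypothetical helper, the code list is the one printed for `cProps`, and the sample string is made up.

```python
def strip_codes(tsysset: str, codes) -> str:
    """Remove every code in `codes` from a comma-separated TSYSSET string."""
    drop = set(codes)
    kept = [code for code in tsysset.split(",") if code.strip() not in drop]
    return ",".join(kept)

proposal_codes = ["RE", "S", "C", "AC"]  # as printed for cProps above

# Made-up example: "C" and "AC" are proposals, "Auto_C" is not.
cleaned = strip_codes("Auto_C,C,R,W,AC", proposal_codes)
print(cleaned)
```

Matching whole tokens matters here: a substring search for `C` would also hit unrelated codes such as `Auto_C`.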

* **Import, clean and prune non-stop nodes without links**

In [102]:
# read stop points and nodes from their shapefiles (shp_files must list the shapefile paths)
stop_points = cuspatial.from_geopandas(geopandas.read_file(shp_files[3]))
nodes = cuspatial.from_geopandas(geopandas.read_file(shp_files[1]))

In [94]:
#display nodes
print(len(nodes))
nodes.head()

128975


Unnamed: 0,NO,CODE,NAME,TYPENO,CONTROLT~1,XCOORD,YCOORD,T0PRT,VOLPRT,SCTYPE,geometry
0,2327,,,0,0,-11152520.0,2957484.0,0min,0,,POINT (-11152515.334 2957484.208)
1,2328,,,0,0,-11152520.0,2957432.0,0min,0,,POINT (-11152520.894 2957432.260)
2,2329,,,0,0,-11139560.0,2957192.0,0min,1556,,POINT (-11139556.232 2957191.894)
3,2330,,,0,0,-11139520.0,2957483.0,0min,0,,POINT (-11139520.983 2957482.604)
4,2331,,,0,0,-11134990.0,2954803.0,0min,0,,POINT (-11134987.677 2954803.012)


In [103]:
#display stop points
print(len(stop_points))
stop_points.head()

4842


Unnamed: 0,NO,STOPAREANO,CODE,NAME,TYPENO,DIRECTED,NODENO,FROMNODENO,LINKNO,NUMLINES,PASSBOAR~1,PASSALIG~2,PASSORIG~3,PASSDEST~4,PASSTRAN~5,PASSTHRO~6,PASSTHRO~7,geometry
0,2329,2329,2329,,0,0,2329,,,6,741,0,741,0,0,590,0,POINT (-11139556.232 2957191.894)
1,2330,2330,2330,,0,0,2330,,,3,0,0,0,0,0,0,590,POINT (-11139520.983 2957482.604)
2,2336,2336,2336,,0,0,2336,,,2,0,0,0,0,0,0,51,POINT (-11134932.747 2963393.918)
3,2354,2354,2354,,0,0,2354,,,1,0,0,0,0,0,285,0,POINT (-11133992.816 2925003.157)
4,2363,2363,2363,,0,0,2363,,,12,1686,57,1686,57,0,10073,0,POINT (-11134471.733 2953382.107)


In [97]:
# get the distinct nodes used by the cleaned network
n1 = links_c_f['FROMNODENO'].unique()
print(len(n1))
n2 = links_c_f['TONODENO'].unique()
print(len(n2))
# intersection by node number (not by index position)
n = n1[n1.isin(n2)]


23615
23986
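
Comparing node sets has to be done on the node numbers themselves, since the index of a `unique()` result is just a running position. The sketch below shows a value-based intersection and union, followed by one possible pruning rule: keep a node if a link uses it or if a stop point sits on it. It uses pandas on made-up data so it runs without a GPU; the pruning rule is an assumption based on the section title, not code from the notebook. `drop_duplicates()` is used because pandas `unique()` returns a NumPy array rather than a Series.

```python
import pandas as pd

# Toy network (hypothetical): node 5 is unused, node 6 only hosts a stop point.
links = pd.DataFrame({"FROMNODENO": [1, 2, 3], "TONODENO": [2, 3, 4]})
nodes = pd.DataFrame({"NO": [1, 2, 3, 4, 5, 6]})
stop_nodes = pd.Series([6], name="NODENO")

from_nodes = links["FROMNODENO"].drop_duplicates()
to_nodes = links["TONODENO"].drop_duplicates()

# Nodes that appear as both an origin and a destination of some link.
both = from_nodes[from_nodes.isin(to_nodes)]
# Nodes that appear at either end of some link.
used = pd.concat([from_nodes, to_nodes]).drop_duplicates()

# Keep nodes referenced by a link or carrying a stop point.
keep = nodes["NO"].isin(used) | nodes["NO"].isin(stop_nodes)
pruned = nodes[keep]

print(both.tolist(), sorted(used.tolist()), pruned["NO"].tolist())
```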


* **Remove the Networks folder with the CSVs**

In [39]:
!rm -rf Networks