>
> # MaaSSim tutorial
>
> ## Data structures, pandas DataFrames
>
-----
MaaSSim uses:
* `pandas` to store, read and load the data,
* the `.csv` format whenever we store the data,
* Python-native `list()` and `dict()` whenever speed is needed, and occasionally `NamedTuple`.
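
As a rough illustration of the points above, the sketch below writes a small table to CSV, reads it back with `pandas`, and then converts one column to a plain `dict` for fast lookups. The table contents are copied from the vehicle examples in this tutorial; the in-memory buffer and the variable names are assumptions made for this example and are not part of MaaSSim.

```python
import io
import pandas as pd

# Small vehicles table; positions copied from the examples in this tutorial
vehicles = pd.DataFrame(
    {"pos": [44976029, 472323872], "shift_start": [0, 0], "shift_end": [86400, 86400]},
    index=pd.Index([1, 2], name="id"),
)

# Write to CSV (an in-memory buffer stands in for a file on disk)
buffer = io.StringIO()
vehicles.to_csv(buffer)

# Read it back, restoring the index column
buffer.seek(0)
loaded = pd.read_csv(buffer, index_col="id")

# Where speed matters, a plain dict lookup avoids the pandas overhead
pos_by_id = loaded["pos"].to_dict()
print(pos_by_id[1])
```

Passing `index_col` on load keeps the identifiers as the index instead of turning them into an unnamed extra column.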

## 1. Main containers (data structures)
* `inData` is a nested dictionary (*DotMap*, see below) holding the inputs of the simulation,
* `params` is a *DotMap* of all the parameters needed to run the simulation; it is defined in `params.ipynb` (see the tutorial on Config),
* `sim` is a *DotMap* of all the variables that change during the simulation.

## 2. `inData`

In [1]:
import os, sys # add MaaSSim to path (not needed if MaaSSim is already in path)
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from MaaSSim.data_structures import structures as inData
from MaaSSim.simulators import simulate
import MaaSSim.utils

In [4]:
params = MaaSSim.utils.get_config('../../data/config/default.json', set_t0 = True) # load the default configuration
inData = MaaSSim.utils.load_G(inData, params) 
inData = MaaSSim.utils.prep_supply_and_demand(inData, params)

Tables in `inData`:

In [5]:
for key in inData.keys(): # list the tables stored in inData
    print('inData.'+key)

inData.passengers
inData.vehicles
inData.platforms
inData.requests
inData.schedule
inData.G
inData.nodes
inData.skim
inData.stats


In [6]:
inData.passengers.head(3) # passengers with their id, position, and status

Unnamed: 0,pos,event,platforms
0,45020096,,[0]
1,500966010,,[0]
2,44990927,,[0]


In [7]:
inData.vehicles.head(3) # vehicles with their id, position, and status

Unnamed: 0,pos,event,shift_start,shift_end,platform,expected_income
1,44976029,driverEvent.STARTS_DAY,0,86400,0,
2,472323872,driverEvent.STARTS_DAY,0,86400,0,
3,510874621,driverEvent.STARTS_DAY,0,86400,0,


In [8]:
inData.requests.treq = inData.requests.treq.dt.round('1s') # for display only
inData.requests.tarr = inData.requests.tarr.dt.round('1s')
inData.requests.ttrav = inData.requests.ttrav.dt.round('1s')
inData.requests.head().dropna(axis=1)

Unnamed: 0,pax_id,origin,destination,treq,ttrav,tarr,shareable,dist,platform
0,0,45020096,44985747,2020-10-01 10:14:54,00:02:52,2020-10-01 10:17:46,False,1726,0
1,1,500966010,2802457954,2020-10-01 10:15:58,00:04:46,2020-10-01 10:20:44,False,2867,0
2,2,44990927,44970065,2020-10-01 10:21:16,00:02:50,2020-10-01 10:24:06,False,1700,0
3,3,45000745,5040974850,2020-10-01 10:24:36,00:02:10,2020-10-01 10:26:46,False,1301,0
4,4,45022840,45017583,2020-10-01 10:28:37,00:00:21,2020-10-01 10:28:58,False,217,0


In [9]:
print('each request defined through:')
for col in inData.requests.head().dropna(axis=1).columns:
    print(col)

each request defined through:
pax_id
origin
destination
treq
ttrav
tarr
shareable
dist
platform


In [10]:
inData.G # graph (networkX object)

<networkx.classes.multidigraph.MultiDiGraph at 0x127758f90>

In [11]:
inData.nodes.head() # nodes

Unnamed: 0,y,x,osmid,highway
45008896,52.046208,4.390193,45008896,
45035529,52.052429,4.402417,45035529,
662403083,52.038869,4.407402,662403083,
520773643,52.049816,4.385876,520773643,traffic_signals
44998670,52.044315,4.400431,44998670,


In [12]:
inData.skim.head().iloc[:,1:5] #node x node skim matrix (distance)

Unnamed: 0,45009726,45007341,45006058,45004316
45008896,35,121,158,214
45035529,3299,3385,3422,3294
662403083,2453,2539,2576,2448
520773643,741,827,864,735
44998670,1350,1436,1473,1345


In [14]:
inData.skim[MaaSSim.utils.rand_node(inData.nodes)][MaaSSim.utils.rand_node(inData.nodes)] # querying the matrix
# first we select a column of the pd.DataFrame, then we select a field of the resulting pd.Series
# for more advanced calls see e.g. `sim/interactions/match()`:
# veh_times = inData.skim[sim.vehicles.loc[sim.vehQ].pos].loc[request.origin]

306
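
The lookups in the cell above can be tried on a toy matrix. The following is a minimal sketch, not MaaSSim code: the node ids, distances and vehicle labels are made up for illustration. It compares the chained lookup with the equivalent `.loc` call and mirrors the vectorised pattern from the `match()` comment.

```python
import pandas as pd

# Toy skim matrix; node ids and distances are made up for illustration
nodes = [10, 20, 30]
skim = pd.DataFrame(
    [[0, 120, 300],
     [150, 0, 210],
     [310, 220, 0]],
    index=nodes, columns=nodes,
)

# Chained lookup: skim[a] selects column a (a Series), then [b] picks row label b
d_chained = skim[10][30]

# Equivalent single-step label lookup (row label first, then column label)
d_loc = skim.loc[30, 10]

# Vectorised lookup: one value per vehicle position for a single request origin
vehicle_pos = pd.Series([20, 30], index=["veh_1", "veh_2"])  # hypothetical vehicle positions
origin = 10
veh_dist = skim[vehicle_pos].loc[origin]
print(d_chained, d_loc)
print(veh_dist)
```

The vectorised form selects all relevant columns at once and then takes a single row, so no Python loop over vehicles is needed.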

In [15]:
inData.stats # basic network stats needed for the demand

DotMap(center=45002624, radius=1732.0)

#### DotMaps
Thanks to the `DotMap` module (or rather snippet), we can do the following and keep the nested variables tidy (e.g. in `inData` and `params`).

In [16]:
from dotmap import DotMap
foo = DotMap()
foo.name = 'My Name'
foo.myData = [1,2,3,4]

In [17]:
foo.myData

[1, 2, 3, 4]

## 3. Data types

In [23]:
import pandas as pd
import numpy as np

In [24]:
pd.DataFrame(columns=['id','pos','status']).set_index('id') #df - a sql-like table

id,pos,status


In [25]:
inData.requests.loc[3] #single row of df

pax_id                           3
origin                    45000745
destination             5040974850
treq           2020-10-01 10:24:36
tdep                           NaN
ttrav              0 days 00:02:10
tarr           2020-10-01 10:26:46
tdrop                          NaN
shareable                    False
schedule_id                    NaN
dist                          1301
platform                         0
Name: 3, dtype: object
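
A single row selected with `.loc` is returned as a `pd.Series` whose index holds the column names. The sketch below rebuilds two of the requests shown earlier (only a subset of the columns) so that it runs without MaaSSim; the variable names are chosen for this example.

```python
import pandas as pd

# Two requests copied from the table shown earlier in this tutorial
requests = pd.DataFrame(
    {
        "pax_id": [3, 4],
        "origin": [45000745, 45022840],
        "destination": [5040974850, 45017583],
        "dist": [1301, 217],
        "shareable": [False, False],
    },
    index=[3, 4],
)

row = requests.loc[3]              # one row, returned as a pd.Series
print(type(row).__name__)          # class name of the returned object
print(row["dist"], row["origin"])  # fields are accessed by column name
print(row.name)                    # the index label of the selected row
```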

In [26]:
params.t0 = pd.Timestamp.now() # datetime and timedelta
treq = np.random.normal(params.simTime*60*60/2,
                        params.demand_structure.temporal_dispertion * params.simTime*60*60/2,
                        params.nP) # draw request times (seconds after t0) from a normal distribution

inData.requests['treq'] = [params.t0.floor('1s')+pd.Timedelta(int(_),'s') for _ in treq]
inData.requests.treq.head()

0   2020-10-01 11:11:01
1   2020-10-01 11:17:28
2   2020-10-01 11:09:17
3   2020-10-01 11:30:28
4   2020-10-01 11:02:56
Name: treq, dtype: datetime64[ns]
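
The cell above depends on a loaded `params` object. A self-contained version of the same idea might look like the sketch below, where the parameter values, the fixed `t0` and the seeded generator are assumptions made for reproducibility and are not taken from the MaaSSim defaults.

```python
import numpy as np
import pandas as pd

# Hypothetical parameter values, chosen only for this sketch
sim_time_h = 1        # simulated period in hours (stands in for params.simTime)
dispersion = 0.3      # stands in for params.demand_structure.temporal_dispertion
n_pax = 5             # number of passengers (stands in for params.nP)
t0 = pd.Timestamp("2020-10-01 10:00:00")

rng = np.random.default_rng(seed=0)           # seeded so the draw is reproducible
mean_s = sim_time_h * 3600 / 2                # centre of the simulated period, in seconds
std_s = dispersion * sim_time_h * 3600 / 2    # spread of the request times, in seconds
offsets_s = rng.normal(mean_s, std_s, n_pax)

# Convert the offsets (truncated to whole seconds) to timestamps relative to t0
treq = pd.Series(t0 + pd.to_timedelta(offsets_s.astype(int), unit="s"), name="treq")
print(treq)
```

A normal draw is unbounded, so individual request times can fall before `t0` or after the end of the simulated period.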

In [27]:
pd.to_datetime('08:20')

Timestamp('2020-10-01 08:20:00')

In [28]:
pd.Timedelta('30m')

Timedelta('0 days 00:30:00')
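
Timestamps and timedeltas combine arithmetically, which is how the `treq`, `ttrav` and `tarr` columns of the requests table relate to each other. The short sketch below uses the values of the first request shown earlier; the variable `elapsed` is introduced only for this example.

```python
import pandas as pd

treq = pd.Timestamp("2020-10-01 10:14:54")   # request time
ttrav = pd.Timedelta("00:02:52")             # travel time
tarr = treq + ttrav                          # Timestamp + Timedelta gives a Timestamp

# Timestamp - Timestamp gives a Timedelta
elapsed = pd.Timestamp("2020-10-01 10:17:46") - treq
print(tarr, elapsed.total_seconds())
```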

---
(c) Rafał Kucharski, Delft, 2020