# VencoPy Tutorial 2

This tutorial gives a more in-depth overview of the DataParser class and showcases some features that can be customised.

In [6]:
import os, sys
import pandas as pd
import numpy as np
import yaml
import pathlib
from ruamel.yaml import YAML

path = '../..'
os.chdir(path)

from classes.dataParsers import DataParser
from classes.tripDiaryBuilders import TripDiaryBuilder
from classes.gridModelers import GridModeler
from classes.flexEstimators import FlexEstimator
from classes.evaluators import Evaluator

print("Current working directory: {0}".format(os.getcwd()))

Current working directory: C:\8_Work\VencoPy\VencoPy_internal\vencopy


In [7]:
def loadConfig(name):
    """Load config/<name>.yaml relative to the current working directory."""
    with open(pathlib.Path.cwd() / 'config' / f'{name}.yaml') as ipf:
        return yaml.load(ipf, Loader=yaml.SafeLoader)


# Load all config files used by the VencoPy classes
globalConfig = loadConfig('globalConfig')
localPathConfig = loadConfig('localPathConfig')
parseConfig = loadConfig('parseConfig')
tripConfig = loadConfig('tripConfig')
gridConfig = loadConfig('gridConfig')
evaluatorConfig = loadConfig('evaluatorConfig')
flexConfig = loadConfig('flexConfig')
    
    
# Set reference dataset 
datasetID = 'MiD17'

# Point localPathConfig to the data_sampling folder in the tutorials directory, which holds the .csv dataset used in the tutorials
localPathConfig['pathAbsolute'][datasetID] = pathlib.Path.cwd() / 'tutorials' / 'data_sampling'

# Assign to vencoPyRoot the folder in which you cloned your repository
localPathConfig['pathAbsolute']['vencoPyRoot'] = pathlib.Path.cwd()

# Similarly, set the file name of the raw trip data for this datasetID in the global config
globalConfig['files'][datasetID]['tripsDataRaw'] = datasetID + '.csv'

# We also modify the parseConfig by removing some columns that are normally parsed from the MiD but are not available in our simplified test dataframe
del parseConfig['dataVariables']['hhID'] 
del parseConfig['dataVariables']['personID'] 

## DataParser config file

The DataParser config file defines which variables are to be parsed (i.e. the ones needed to create trip diaries and calculate fleet flexibility) and sets some filtering options, such as the conditions for trips to be included in or excluded from the parsing.

<div class="alert alert-block alert-danger"><b>Warning:</b> The list is very long.</div>

In [8]:
yaml.dump(parseConfig, sys.stdout)

dataVariables:
  datasetID:
  - MiD08
  - MiD17
  hhPersonID:
  - NA
  - HP_ID_Reg
  isMIVDriver:
  - pkw_f
  - W_VM_G
  travelTime:
  - wegmin_k
  - wegmin_imp1
  tripDistance:
  - wegkm_k
  - wegkm
  tripEndClock:
  - en_time
  - W_AZ
  tripEndHour:
  - en_std
  - W_AZS
  tripEndMinute:
  - en_min
  - W_AZM
  tripEndNextDay:
  - en_dat
  - W_FOLGETAG
  tripID:
  - wid
  - W_ID
  tripIsIntermodal:
  - NA
  - weg_intermod
  tripPurpose:
  - w04
  - zweck
  tripScaleFactor:
  - NA
  - W_HOCH
  tripStartClock:
  - st_time
  - W_SZ
  tripStartHour:
  - st_std
  - W_SZS
  tripStartMinute:
  - st_min
  - W_SZM
  tripStartMonth:
  - stich_m
  - ST_MONAT
  tripStartWeek:
  - stichwo
  - ST_WOCHE
  tripStartWeekday:
  - stichtag
  - ST_WOTAG
  tripStartYear:
  - stich_j
  - ST_JAHR
  tripWeight:
  - w_gew
  - W_GEW
encryptionPW: PW
filterDicts:
  MiD08:
    exclude:
      tripEndClock:
      - '301:00'
      tripEndHour:
      - 301
      tripEndMinute:
      - 301
      tripPurpose:
      - 9
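The printed `dataVariables` block lists, for each harmonised variable name, one raw column name per dataset, in the same order as the `datasetID` entry. The following is a minimal sketch of how such a mapping could be resolved for one dataset. The helper `columnsForDataset` is hypothetical and not part of VencoPy, and treating `'NA'` as "not available in this dataset" is an assumption based on the printed values.

```python
# Excerpt of parseConfig['dataVariables'] as printed above
dataVariables = {
    'datasetID': ['MiD08', 'MiD17'],
    'tripDistance': ['wegkm_k', 'wegkm'],
    'tripPurpose': ['w04', 'zweck'],
    'tripScaleFactor': ['NA', 'W_HOCH'],
}


def columnsForDataset(dataVariables, datasetID):
    """Hypothetical helper: map harmonised names to raw column names."""
    # Position of the dataset in the 'datasetID' list selects the column name
    position = dataVariables['datasetID'].index(datasetID)
    return {
        name: columns[position]
        for name, columns in dataVariables.items()
        # Skip the index entry itself and variables marked 'NA' (assumed: not available)
        if name != 'datasetID' and columns[position] != 'NA'
    }


mid17Columns = columnsForDataset(dataVariables, 'MiD17')
mid08Columns = columnsForDataset(dataVariables, 'MiD08')
print(mid17Columns)
print(mid08Columns)
```

Under these assumptions, `tripScaleFactor` appears only in the MiD17 mapping, because its MiD08 entry is `NA`.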

## _DataParser_ class

Let's first run the class and see the outputs we get.

In [9]:
vpData = DataParser(datasetID=datasetID, parseConfig=parseConfig, globalConfig=globalConfig, localPathConfig=localPathConfig, loadEncrypted=False)

Parsing properties set up
Starting to retrieve local data file from C:\8_Work\VencoPy\VencoPy_internal\vencopy\tutorials\data_sampling\MiD17.csv
Finished loading 2124 rows of raw data of type .csv
Finished harmonization of variables
Starting filtering, applying 8 filters.
The following values were taken into account after filtering:
{'isMIVDriver': 1287,
 'tripDistance': 1948,
 'tripEndClock': 2124,
 'tripEndHour': 2124,
 'tripIsIntermodal': 1682,
 'tripPurpose': 2115,
 'tripStartClock': 2124,
 'tripStartHour': 2124}
All filters combined yielded a total of 950 was taken into account
This corresponds to 44.72693032015066 percent of the original data
Parsing completed


We can see from the print statements in the class that, after reading in the initial dataset of 2124 rows and applying 8 filters, we end up with 950 suitable entries, which corresponds to about 45% of the initial sample.
These trips all satisfy the condition that they are shorter than 1000 km, which is set in the parseConfig under the 'filterDicts' key.
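The combined count (950) is lower than every per-filter count in the log, which is consistent with a trip having to pass all filters at once. The sketch below illustrates that idea on invented toy data in plain Python. It is not the DataParser implementation: the predicates, the strict `<` comparison and the trip values are assumptions made for illustration.

```python
# Toy trips; values are invented for illustration only
trips = [
    {'tripDistance': 12.0, 'isMIVDriver': True},
    {'tripDistance': 1500.0, 'isMIVDriver': True},
    {'tripDistance': 8.5, 'isMIVDriver': False},
    {'tripDistance': 2000.0, 'isMIVDriver': False},
]

# One predicate per filter; the keys mirror names from the log output
filters = {
    'isMIVDriver': lambda trip: trip['isMIVDriver'],
    'tripDistance': lambda trip: trip['tripDistance'] < 1000,  # assumed strict comparison
}

# Number of trips that pass each filter on its own
passedPerFilter = {
    name: sum(1 for trip in trips if keep(trip))
    for name, keep in filters.items()
}

# Trips that pass every filter simultaneously (logical AND)
selected = [trip for trip in trips if all(keep(trip) for keep in filters.values())]

print(passedPerFilter)
print(len(selected), 'of', len(trips), 'trips pass all filters')
```

Each filter alone keeps two of the four toy trips, but only one trip passes both, which mirrors how the combined total in the log can be well below the individual counts.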

Now we can, for example, lower the maximum allowed trip distance in the filters from 1000 km to 50 km and see how this affects the number of remaining trips (the extreme value of 50 km is used only for the purposes of this tutorial).
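The cell that follows overwrites the threshold in `parseConfig` in place. Where both variants are needed side by side, one option is to modify a deep copy instead. This is a minimal sketch using a stand-in dictionary: only the key path is taken from the tutorial, the 1000 km starting value is taken from the text, and the name `parseConfigShort` is hypothetical.

```python
import copy

# Stand-in for the relevant branch of parseConfig; the rest of the config is omitted
parseConfig = {
    'filterDicts': {'MiD17': {'smallerThan': {'tripDistance': [1000]}}}
}

# deepcopy duplicates the nested dicts, so the original stays untouched
parseConfigShort = copy.deepcopy(parseConfig)
parseConfigShort['filterDicts']['MiD17']['smallerThan']['tripDistance'] = [50]

originalLimit = parseConfig['filterDicts']['MiD17']['smallerThan']['tripDistance']
shortLimit = parseConfigShort['filterDicts']['MiD17']['smallerThan']['tripDistance']
print(originalLimit, shortLimit)
```

A shallow copy (`dict(parseConfig)`) would not be enough here, because the nested `filterDicts` dictionaries would still be shared between the two configs.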

In [20]:
parseConfig['filterDicts']['MiD17']['smallerThan']['tripDistance'] = [50]
yaml.dump(parseConfig, sys.stdout)

dataVariables:
  datasetID:
  - MiD08
  - MiD17
  hhPersonID:
  - NA
  - HP_ID_Reg
  isMIVDriver:
  - pkw_f
  - W_VM_G
  travelTime:
  - wegmin_k
  - wegmin_imp1
  tripDistance:
  - wegkm_k
  - wegkm
  tripEndClock:
  - en_time
  - W_AZ
  tripEndHour:
  - en_std
  - W_AZS
  tripEndMinute:
  - en_min
  - W_AZM
  tripEndNextDay:
  - en_dat
  - W_FOLGETAG
  tripID:
  - wid
  - W_ID
  tripIsIntermodal:
  - NA
  - weg_intermod
  tripPurpose:
  - w04
  - zweck
  tripScaleFactor:
  - NA
  - W_HOCH
  tripStartClock:
  - st_time
  - W_SZ
  tripStartHour:
  - st_std
  - W_SZS
  tripStartMinute:
  - st_min
  - W_SZM
  tripStartMonth:
  - stich_m
  - ST_MONAT
  tripStartWeek:
  - stichwo
  - ST_WOCHE
  tripStartWeekday:
  - stichtag
  - ST_WOTAG
  tripStartYear:
  - stich_j
  - ST_JAHR
  tripWeight:
  - w_gew
  - W_GEW
encryptionPW: PW
filterDicts:
  MiD08:
    exclude:
      tripEndClock:
      - '301:00'
      tripEndHour:
      - 301
      tripEndMinute:
      - 301
      tripPurpose:
      - 9

In [21]:
vpData = DataParser(datasetID=datasetID, parseConfig=parseConfig, globalConfig=globalConfig, localPathConfig=localPathConfig, loadEncrypted=False)

Parsing properties set up
Starting to retrieve local data file from C:\8_Work\VencoPy\VencoPy_internal\vencopy\tutorials\data_sampling\MiD17.csv
Finished loading 2124 rows of raw data of type .csv
Finished harmonization of variables
Starting filtering, applying 8 filters.
The following values were taken into account after filtering:
{'isMIVDriver': 1287,
 'tripDistance': 1892,
 'tripEndClock': 2124,
 'tripEndHour': 2124,
 'tripIsIntermodal': 1682,
 'tripPurpose': 2115,
 'tripStartClock': 2124,
 'tripStartHour': 2124}
All filters combined yielded a total of 914 was taken into account
This corresponds to 43.03201506591337 percent of the original data
Parsing completed


With a maximum trip distance of 1000 km, all filters combined yielded a total of 950 trips, which corresponds to about 45% of the original dataset. Changing this value to 50 km excludes an additional 36 trips, leaving 914 trips (43% of the initial dataset).
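The figures quoted above can be recomputed directly from the numbers in the two log outputs. The variable names are illustrative. The closing interpretation assumes that the filters are combined with a logical AND, which the logs suggest but do not state.

```python
rawRows = 2124        # rows loaded from MiD17.csv
keptAt1000km = 950    # combined result with the 1000 km limit
keptAt50km = 914      # combined result with the 50 km limit

shareAt1000km = 100 * keptAt1000km / rawRows
shareAt50km = 100 * keptAt50km / rawRows
additionallyExcluded = keptAt1000km - keptAt50km

# Per-filter 'tripDistance' counts from the two runs: trips passing the
# distance filter on its own dropped from 1948 to 1892
failedDistanceAlone = 1948 - 1892

print(f'{shareAt1000km:.1f} % -> {shareAt50km:.1f} %')
print(additionallyExcluded, failedDistanceAlone)
```

The distance filter on its own rejects 56 more trips at 50 km, yet the combined result shrinks by only 36. Under the AND assumption, the remaining 20 trips had already been removed by one of the other filters.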

## Next Steps

In the next tutorial, you will learn in more detail about the internal workings of the TripDiaryBuilder class and how to customise some of its settings.