# VencoPy Tutorial 2

This tutorial gives a more in-depth overview of the DataParser class and showcases some features that can be customised.

In [None]:
import sys
import os
from os import path
from pathlib import Path

sys.path.append((path.dirname(path.dirname(Path.cwd()))))

from vencopy.core.dataparsers import parse_data
from vencopy.utils.utils import load_configs, create_output_folders

print("Current working directory: {0}".format(os.getcwd()))

Current working directory: c:\Thesis\vencopy\vencopy\tutorials


In [2]:
base_path = Path.cwd().parent
configs = load_configs(base_path)
create_output_folders(configs=configs)
    
# Adapt relative paths in config for tutorials
project_root = Path.cwd().parent.parent
relative_paths = configs['dev_config']['global']['relative_path']
for key in ('parse_output', 'diary_output', 'grid_output', 'flex_output', 'aggregator_output', 'processor_output'):
    relative_paths[key] = project_root / relative_paths[key]

# Set reference dataset
datasetID = 'MiD17'

# Point the user config to the data_sampling folder in the tutorials directory, which holds the .csv dataset used in the tutorials
configs['user_config']['global']['absolute_path'][datasetID] = Path.cwd() / 'data_sampling'

# Similarly, we set the raw trips file name for this dataset in the dev config
configs['dev_config']['global']['files'][datasetID]['trips_data_raw'] = datasetID + '.csv'

# We also modify the dataparsers config by removing some columns that are normally parsed from the MiD but are not available in our simplified test dataset
del configs['dev_config']['dataparsers']['data_variables']['household_id']
del configs['dev_config']['dataparsers']['data_variables']['person_id']
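Note that `del` raises a `KeyError` if the cell above is run a second time, because the keys are already gone. A minimal sketch of a re-runnable alternative is shown below on a toy dictionary; the dictionary contents are placeholders chosen for illustration and are not the real VencoPy variable mapping.

```python
# Toy stand-in for configs['dev_config']['dataparsers']['data_variables'].
# The values are hypothetical placeholders, not real MiD column names.
data_variables = {
    "household_id": "placeholder_household_column",
    "person_id": "placeholder_person_column",
    "trip_distance": "placeholder_distance_column",
}

# pop(key, None) removes the key if it is present and does nothing otherwise,
# so repeating the removal does not raise KeyError.
for column in ("household_id", "person_id"):
    data_variables.pop(column, None)

# Running the same loop again is harmless.
for column in ("household_id", "person_id"):
    data_variables.pop(column, None)

print(sorted(data_variables))
```

The same pattern could be applied to the nested config dictionary used in this tutorial.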


## DataParser config file

The DataParser config file defines which variables are to be parsed (i.e. the ones needed to create trip diaries and calculate fleet flexibility) and sets some filtering options, such as the conditions for trips to be included in or excluded from the parsing.

<div class="alert alert-block alert-danger"><b>Warning:</b> The list is very long.</div>

In [3]:
configs['dev_config']

{'global': {'relative_path': {'parse_output': WindowsPath('c:/Thesis/vencopy/output/dataparser'),
   'diary_output': WindowsPath('c:/Thesis/vencopy/output/diarybuilder'),
   'grid_output': WindowsPath('c:/Thesis/vencopy/output/gridmodeller'),
   'flex_output': WindowsPath('c:/Thesis/vencopy/output/flexestimator'),
   'aggregator_output': WindowsPath('c:/Thesis/vencopy/output/profileaggregator'),
   'processor_output': WindowsPath('c:/Thesis/vencopy/output/postprocessor'),
   'config': './config/'},
  'files': {'MiD17': {'enrypted_zip_file_B1': 'B1_Standard-DatensatzpaketEncrypted.zip',
    'enrypted_zip_file_B2': 'B2_Regional-DatensatzpaketEncrypted.zip',
    'households_data_raw': 'MiD2017_Regional_Haushalte.csv',
    'persons_data_raw': 'MiD2017_Regional_Personen.csv',
    'trips_data_raw': 'MiD17.csv'},
   'MiD08': {'households_data_raw': 'MiD2008_PUF_Haushalte.dta',
    'person_data_raw': 'MiD2008_PUF_Personen.dta',
    'trips_data_raw': 'MiD2008_PUF_Wege.dta'},
   'KiD': {'trips_d
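
Because the full dictionary is long, it can help to print only its key hierarchy down to a chosen depth. The helper below is a hypothetical sketch and not part of VencoPy; `summarise_keys` and `toy_config` are names introduced here for illustration, and the toy dictionary mirrors only a small part of the structure shown above.

```python
def summarise_keys(mapping, max_depth=2, depth=0):
    """Return indented key names of a nested dict down to max_depth levels."""
    lines = []
    for key, value in mapping.items():
        lines.append("  " * depth + str(key))
        # Descend only into nested dicts and only while below max_depth
        if isinstance(value, dict) and depth + 1 < max_depth:
            lines.extend(summarise_keys(value, max_depth, depth + 1))
    return lines


# Toy config mirroring a small part of the dev config structure
toy_config = {
    "global": {
        "relative_path": {"parse_output": "output/dataparser"},
        "files": {"MiD17": {"trips_data_raw": "MiD17.csv"}},
    },
    "dataparsers": {
        "filters": {"MiD17": {"smaller_than": {"trip_distance": [1000]}}},
    },
}

summary = summarise_keys(toy_config, max_depth=2)
print("\n".join(summary))
```

In the notebook, the same function could be called as `summarise_keys(configs['dev_config'], max_depth=3)` to get a compact overview before inspecting individual entries.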

## _DataParser_ class

Let's first run the class and see the outputs we get.

In [4]:
data = parse_data(configs=configs)
data.process()

Generic file parsing properties set up.
Starting to retrieve local data file from c:\Thesis\vencopy\vencopy\tutorials\data_sampling\MiD17.csv.
Finished loading 2124 rows of raw data of type .csv.
Running in debug mode.
Finished harmonization of variables.
Finished harmonization of ID variables.
Starting filtering, applying 8 filters.
All filters combined yielded that a total of 857 trips are taken into account.
This corresponds to 40.34839924670433 percent of the original data.
Completed park timestamp adjustments.
From 11791.33 km total mileage in the dataset after filtering, 0.0 % were cropped because they corresponded to split-trips from overnight trips.
Finished activity composition with 857 trips and 854 parking activites.
Parsing MiD dataset completed.


Unnamed: 0,index,unique_id,park_id,trip_id,is_driver,household_person_id,trip_weight,trip_scale_factor,trip_purpose,trip_distance,...,timestamp_start,timestamp_end,is_first_activity,is_last_activity,activity_id,next_activity_id,previous_activity_id,is_first_trip,is_first_park_activity,time_delta
0,1926,3,0.0,,True,3,3.430376,919.415872,6,,...,2017-04-08 00:00:00,2017-04-08 13:49:00,True,False,0.0,2.0,,False,True,0 days 13:49:00
1,1927,3,,2.0,True,3,3.430376,919.415872,6,15.20,...,2017-04-08 13:49:00,2017-04-08 14:15:00,False,False,2.0,2.0,0.0,True,False,0 days 00:26:00
2,1928,3,2.0,,True,3,3.430376,919.415872,6,,...,2017-04-08 14:15:00,2017-04-08 14:20:00,False,False,2.0,3.0,2.0,False,False,0 days 00:05:00
3,1929,3,,3.0,True,3,3.430376,919.415872,8,15.20,...,2017-04-08 14:20:00,2017-04-08 14:40:00,False,False,3.0,3.0,2.0,False,False,0 days 00:20:00
4,1930,3,3.0,,True,3,3.430376,919.415872,8,,...,2017-04-08 14:40:00,2017-04-09 00:00:00,False,True,3.0,,3.0,False,False,0 days 09:20:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1975,1103,2075,4.0,,True,2075,0.905471,242.686011,5,,...,2016-10-02 14:05:00,2016-10-02 14:25:00,False,False,4.0,5.0,4.0,False,False,0 days 00:20:00
1976,1104,2075,,5.0,True,2075,0.905471,242.686011,4,6.18,...,2016-10-02 14:25:00,2016-10-02 14:35:00,False,False,5.0,5.0,4.0,False,False,0 days 00:10:00
1977,1105,2075,5.0,,True,2075,0.905471,242.686011,4,,...,2016-10-02 14:35:00,2016-10-02 14:45:00,False,False,5.0,6.0,5.0,False,False,0 days 00:10:00
1978,1106,2075,,6.0,True,2075,0.905471,242.686011,8,11.40,...,2016-10-02 14:45:00,2016-10-02 15:05:00,False,False,6.0,6.0,5.0,False,False,0 days 00:20:00


We can see from the print statements in the class that after reading in the initial dataset, which contained 2124 rows, and applying 8 filters, we end up with 857 suitable trips, which corresponds to about 40% of the initial sample.
Among other conditions, these trips all need to be shorter than 1000 km, a limit that is set in the dev config under `configs['dev_config']['dataparsers']['filters'][datasetID]['smaller_than']['trip_distance']`.

Now we can, for example, lower the maximum allowed trip distance in the filters from 1000 km to 50 km and see how this affects the number of available trips (the extreme value of 50 km is used only for tutorial purposes).
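
To illustrate what such a threshold does, here is a minimal sketch of a "smaller than" filter on a handful of made-up trips. It is not VencoPy's implementation: the function `apply_smaller_than`, the sample trips, and the choice of a strict `<` comparison are assumptions made for this example. Only the shape of the filter entry (a variable name mapped to a one-element list) follows the config used in this tutorial.

```python
def apply_smaller_than(trips, thresholds):
    """Keep trips whose value is below the threshold for every listed variable."""
    kept = []
    for trip in trips:
        # Each threshold is stored as a one-element list, e.g. [1000]
        if all(trip[var] < limits[0] for var, limits in thresholds.items()):
            kept.append(trip)
    return kept


# Made-up sample trips (distances in km)
trips = [
    {"trip_id": 1, "trip_distance": 15.2},
    {"trip_id": 2, "trip_distance": 62.0},
    {"trip_id": 3, "trip_distance": 6.18},
    {"trip_id": 4, "trip_distance": 1200.0},
]

default_filter = {"trip_distance": [1000]}
strict_filter = {"trip_distance": [50]}

kept_default = apply_smaller_than(trips, default_filter)
kept_strict = apply_smaller_than(trips, strict_filter)

print(len(kept_default), "trips kept with the 1000 km limit")
print(len(kept_strict), "trips kept with the 50 km limit")
```

Tightening the threshold can only remove trips, never add them, which is the effect we expect to see when re-running the parser.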

In [5]:
# Lower the maximum allowed trip distance from 1000 km to 50 km
configs['dev_config']['dataparsers']['filters'][datasetID]['smaller_than']['trip_distance'] = [50]

In [6]:
data = parse_data(configs=configs)
data.process()

Generic file parsing properties set up.
Starting to retrieve local data file from c:\Thesis\vencopy\vencopy\tutorials\data_sampling\MiD17.csv.
Finished loading 2124 rows of raw data of type .csv.
Running in debug mode.
Finished harmonization of variables.
Finished harmonization of ID variables.
Starting filtering, applying 8 filters.
All filters combined yielded that a total of 821 trips are taken into account.
This corresponds to 38.653483992467045 percent of the original data.
Completed park timestamp adjustments.
From 7819.38 km total mileage in the dataset after filtering, 0.0 % were cropped because they corresponded to split-trips from overnight trips.
Finished activity composition with 821 trips and 818 parking activites.
Parsing MiD dataset completed.


Unnamed: 0,index,unique_id,park_id,trip_id,is_driver,household_person_id,trip_weight,trip_scale_factor,trip_purpose,trip_distance,...,timestamp_start,timestamp_end,is_first_activity,is_last_activity,activity_id,next_activity_id,previous_activity_id,is_first_trip,is_first_park_activity,time_delta
0,1844,3,0.0,,True,3,3.430376,919.415872,6,,...,2017-04-08 00:00:00,2017-04-08 13:49:00,True,False,0.0,2.0,,False,True,0 days 13:49:00
1,1845,3,,2.0,True,3,3.430376,919.415872,6,15.20,...,2017-04-08 13:49:00,2017-04-08 14:15:00,False,False,2.0,2.0,0.0,True,False,0 days 00:26:00
2,1846,3,2.0,,True,3,3.430376,919.415872,6,,...,2017-04-08 14:15:00,2017-04-08 14:20:00,False,False,2.0,3.0,2.0,False,False,0 days 00:05:00
3,1847,3,,3.0,True,3,3.430376,919.415872,8,15.20,...,2017-04-08 14:20:00,2017-04-08 14:40:00,False,False,3.0,3.0,2.0,False,False,0 days 00:20:00
4,1848,3,3.0,,True,3,3.430376,919.415872,8,,...,2017-04-08 14:40:00,2017-04-09 00:00:00,False,True,3.0,,3.0,False,False,0 days 09:20:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1893,1056,2075,4.0,,True,2075,0.905471,242.686011,5,,...,2016-10-02 14:05:00,2016-10-02 14:25:00,False,False,4.0,5.0,4.0,False,False,0 days 00:20:00
1894,1057,2075,,5.0,True,2075,0.905471,242.686011,4,6.18,...,2016-10-02 14:25:00,2016-10-02 14:35:00,False,False,5.0,5.0,4.0,False,False,0 days 00:10:00
1895,1058,2075,5.0,,True,2075,0.905471,242.686011,4,,...,2016-10-02 14:35:00,2016-10-02 14:45:00,False,False,5.0,6.0,5.0,False,False,0 days 00:10:00
1896,1059,2075,,6.0,True,2075,0.905471,242.686011,8,11.40,...,2016-10-02 14:45:00,2016-10-02 15:05:00,False,False,6.0,6.0,5.0,False,False,0 days 00:20:00


We can see that with a maximum trip distance of 1000 km, all filters combined yielded a total of 857 trips, which corresponds to about 40% of the original dataset. Changing this value to 50 km excludes an additional 36 trips, resulting in 821 trips (about 39% of the initial dataset), and the total mileage after filtering drops from 11791.33 km to 7819.38 km.
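
The shares can be checked directly from the counts reported in the two log outputs above (2124 raw rows, 857 trips with the 1000 km limit, 821 trips with the 50 km limit). The variable names in this short sketch are chosen for illustration only.

```python
# Counts taken from the parser log output of the two runs
raw_rows = 2124
trips_1000km = 857
trips_50km = 821

# Share of the raw data that remains after filtering, in percent
share_1000km = 100 * trips_1000km / raw_rows
share_50km = 100 * trips_50km / raw_rows

# Trips removed additionally by tightening the distance limit
additional_excluded = trips_1000km - trips_50km

print(f"{share_1000km:.1f} % retained with the 1000 km limit")
print(f"{share_50km:.1f} % retained with the 50 km limit")
print(f"{additional_excluded} additional trips excluded")
```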

## Next Steps

In the next tutorial, you will learn in more detail about the internal workings of the TripDiaryBuilder class and how to customise some of its settings.