Our first resource is a synthetic dataset on mode choice for 500 travellers. For each individual, the data contains two revealed preference (RP) inter-city trips, where the possible modes are **car**, **bus**, **air** and **rail**, and where each individual has at least two of these four modes available. The journey options are described by access time (except for car), travel time and cost, with times in minutes and costs in $. The data also contains 14 stated preference (SP) tasks per person, using the same alternatives as those available on that person’s RP journeys, but with an additional categorical quality-of-service attribute for air and rail, which takes three levels: *no frills*, *wifi available*, or *food available*. For each individual, the dataset also contains information on gender, whether the journey was a business trip, and the individual’s income.

In [1]:
#clean memory
import gc
gc.collect()

#start timer
import time
start_time = time.time()

Importing libraries and the data

In [2]:
import numpy as np
import pandas as pd
import biogeme.database as db
import biogeme.biogeme as bio
from biogeme import models, tools
from biogeme.expressions import Beta, Variable, bioDraws, MonteCarlo, log, Power, exp, Derive
import scipy.stats as st


In [3]:
# Read the CSV file
df = pd.read_csv(r"C:\Users\alexi\Desktop\Semester Project GIT\Semester-project\Data\apollo_modeChoiceData.csv")
df


Unnamed: 0,ID,RP,SP,RP_journey,SP_task,av_car,av_bus,av_air,av_rail,time_car,...,access_air,service_air,time_rail,cost_rail,access_rail,service_rail,female,business,income,choice
0,1,1,0,1.0,,0,0,1,1,0,...,55,0,140,55,5,0,0,0,46705,4
1,1,1,0,2.0,,0,0,1,1,0,...,45,0,170,45,20,0,0,0,46705,4
2,1,0,1,,1.0,0,0,1,1,0,...,55,3,170,35,5,2,0,0,46705,4
3,1,0,1,,2.0,0,0,1,1,0,...,45,1,120,75,5,3,0,0,46705,4
4,1,0,1,,3.0,0,0,1,1,0,...,40,1,155,75,25,2,0,0,46705,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7995,500,0,1,,10.0,1,1,0,1,300,...,0,0,170,35,5,3,0,1,19910,4
7996,500,0,1,,11.0,1,1,0,1,275,...,0,0,130,65,15,3,0,1,19910,4
7997,500,0,1,,12.0,1,1,0,1,345,...,0,0,140,75,5,2,0,1,19910,1
7998,500,0,1,,13.0,1,1,0,1,250,...,0,0,130,65,10,2,0,1,19910,4


In [4]:
# Fill NaN values with 0 (same as in Apollo)
df = df.fillna(0)

#creating biogeme database

database = db.Database('SP_RP_model', df)
globals().update(database.variables) #transform all columns into variables

#only want revealed preferences
exclude = SP != 0
database.remove(exclude)
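
As a sanity check on the filter above, the expected row counts follow directly from the dataset description (500 individuals, each with 2 RP and 14 SP rows). The sketch below is plain Python on a few hypothetical rows, not the real file; it only mirrors the `SP != 0` exclusion rule.

```python
# Row counts implied by the dataset description.
n_individuals = 500
n_rp_per_person = 2
n_sp_per_person = 14

n_rows = n_individuals * (n_rp_per_person + n_sp_per_person)
n_rp_rows = n_individuals * n_rp_per_person
n_sp_rows = n_individuals * n_sp_per_person

# Hypothetical rows mimicking the RP/SP indicator columns (illustration only).
rows = [
    {"ID": 1, "RP": 1, "SP": 0},
    {"ID": 1, "RP": 1, "SP": 0},
    {"ID": 1, "RP": 0, "SP": 1},
    {"ID": 1, "RP": 0, "SP": 1},
]

# Same rule as `exclude = SP != 0`: a row is dropped when its SP flag is non-zero.
kept = [row for row in rows if not (row["SP"] != 0)]

print(n_rows, n_rp_rows, n_sp_rows)
print(len(kept))
```

After the exclusion, the estimation sample should therefore contain 1000 observations, with 7000 excluded.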


Defining Parameters

In [5]:
asc_car = Beta('asc_car', 0, None, None, 1)  # fixed to 0 for normalisation (last argument 1 = not estimated)
asc_bus = Beta('asc_bus', 0, None, None, 0)
asc_air = Beta('asc_air', 0, None, None, 0)
asc_rail = Beta('asc_rail', 0, None, None, 0)


b_tt_car = Beta('b_tt_car', 0, None, None, 0)
b_tt_bus = Beta('b_tt_bus', 0, None, None, 0)
b_tt_air = Beta('b_tt_air', 0, None, None, 0)
b_tt_rail = Beta('b_tt_rail', 0, None, None, 0)

b_cost = Beta('b_cost', 0, None, None, 0)
b_access = Beta('b_access', 0, None, None, 0)


Defining the Model


In [6]:
V_car = asc_car + b_tt_car * time_car + b_cost * cost_car
V_bus = asc_bus + b_tt_bus * time_bus + b_access * access_bus + b_cost * cost_bus
V_air = asc_air + b_tt_air * time_air + b_access * access_air + b_cost * cost_air
V_rail = asc_rail + b_tt_rail * time_rail + b_access * access_rail + b_cost * cost_rail

V = {1: V_car, 2: V_bus, 3: V_air, 4: V_rail}
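
To make explicit what the utility dictionary feeds into, here is a hand-written sketch of multinomial logit probabilities with an availability mask. It is an illustration only: `logit_probabilities` is a hypothetical helper (not a Biogeme function), and all parameter and attribute values are made up rather than taken from the data or the estimates.

```python
import math

def logit_probabilities(utilities, availability):
    """Multinomial logit probabilities over the available alternatives.

    utilities:    dict {alternative id: systematic utility V_j}
    availability: dict {alternative id: 1 if available, 0 otherwise}
    """
    available = [j for j in utilities if availability[j]]
    # Subtract the largest utility before exponentiating, for numerical stability.
    v_max = max(utilities[j] for j in available)
    exp_v = {j: math.exp(utilities[j] - v_max) for j in available}
    denom = sum(exp_v.values())
    # Unavailable alternatives get probability 0.
    return {j: (exp_v[j] / denom if j in exp_v else 0.0) for j in utilities}

# Hypothetical parameter values (NOT the estimates from this notebook).
beta = {"asc_air": -0.3, "asc_rail": 0.1, "b_tt_air": -0.02,
        "b_tt_rail": -0.01, "b_access": -0.02, "b_cost": -0.05}

# Hypothetical trip where only air (3) and rail (4) are available.
utilities = {
    1: 0.0,
    2: 0.0,
    3: beta["asc_air"] + beta["b_tt_air"] * 60 + beta["b_access"] * 55 + beta["b_cost"] * 80,
    4: beta["asc_rail"] + beta["b_tt_rail"] * 140 + beta["b_access"] * 5 + beta["b_cost"] * 55,
}
availability = {1: 0, 2: 0, 3: 1, 4: 1}

probs = logit_probabilities(utilities, availability)
# Log-likelihood contribution of this observation if rail (4) was chosen.
log_lik_contribution = math.log(probs[4])
print(probs, log_lik_contribution)
```

In Biogeme, the second argument of `models.loglogit` plays the role of `availability` here; a dictionary built from the `av_car`, `av_bus`, `av_air` and `av_rail` columns could be passed in that position.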

Estimating the Model


In [7]:
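# Define the model. Passing None as the availability argument treats all four
# alternatives as available for every observation (the av_* columns are not used).
logprob_RP = models.loglogit(V, None, choice)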

# Estimate the model
biogeme_RP = bio.BIOGEME(database, logprob_RP)
biogeme_RP.modelName = 'RP_Model'
biogeme_RP.generate_html = False  # Disable HTML file generation
biogeme_RP.generate_pickle = False  # Disable PICKLE file generation
biogeme_RP.save_iterations = False  # Disable ITER file generation
results_model_RP = biogeme_RP.estimate()

# Output
print(results_model_RP.getEstimatedParameters())


              Value  Rob. Std err  Rob. t-test  Rob. p-value
asc_air   -0.334637      0.201463    -1.661029  9.670768e-02
asc_bus   -0.289902      0.311193    -0.931583  3.515523e-01
asc_rail   0.145366      0.211744     0.686517  4.923874e-01
b_access   0.020796      0.005380     3.865860  1.106983e-04
b_cost    -0.001833      0.002298    -0.797581  4.251139e-01
b_tt_air   0.020019      0.003828     5.230046  1.694674e-07
b_tt_bus   0.003094      0.000776     3.985524  6.733149e-05
b_tt_car   0.007839      0.000561    13.975623  0.000000e+00
b_tt_rail  0.012650      0.001495     8.463738  0.000000e+00
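
The last two columns of the table can be recomputed from the first two. The sketch below assumes the t-test is the estimate divided by its robust standard error and that the p-value is two-sided under a standard normal distribution, which reproduces the printed numbers up to rounding. `robust_t_and_p` is a hypothetical helper name.

```python
import math

def robust_t_and_p(value, std_err):
    """t-ratio against zero and two-sided p-value from the standard normal."""
    t = value / std_err
    # Two-sided tail probability: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Values copied from the table above.
t_air, p_air = robust_t_and_p(-0.334637, 0.201463)
t_cost, p_cost = robust_t_and_p(-0.001833, 0.002298)

print(f"asc_air: t = {t_air:.3f}, p = {p_air:.4f}")
print(f"b_cost:  t = {t_cost:.3f}, p = {p_cost:.4f}")
```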


In [8]:
# Retrieve the general statistics from the results
general_stats_model_RP = results_model_RP.getGeneralStatistics()
print(results_model_RP.printGeneralStatistics())

Number of estimated parameters:	9
Sample size:	1000
Excluded observations:	7000
Init log likelihood:	-1386.294
Final log likelihood:	-1162.546
Likelihood ratio test for the init. model:	447.4959
Rho-square for the init. model:	0.161
Rho-square-bar for the init. model:	0.155
Akaike Information Criterion:	2343.093
Bayesian Information Criterion:	2387.263
Final gradient norm:	2.4559E-04
Nbr of threads:	8
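
The fit statistics above can be cross-checked by hand. The sketch below uses the standard textbook definitions and takes the sample size, number of parameters and final log likelihood from the printed output; the initial log likelihood assumes all parameters start at 0 and all four alternatives are available, so that each has probability 1/4.

```python
import math

n_obs = 1000          # sample size (RP rows)
n_alternatives = 4    # car, bus, air, rail, all treated as available
n_params = 9          # number of estimated parameters
final_ll = -1162.546  # final log likelihood, copied from the output above

# With all parameters at 0, every alternative has probability 1/4.
init_ll = n_obs * math.log(1 / n_alternatives)

lr_test = -2 * (init_ll - final_ll)                # likelihood ratio test
rho_sq = 1 - final_ll / init_ll                    # rho-square
rho_sq_bar = 1 - (final_ll - n_params) / init_ll   # adjusted rho-square
aic = 2 * n_params - 2 * final_ll                  # Akaike Information Criterion
bic = n_params * math.log(n_obs) - 2 * final_ll    # Bayesian Information Criterion

print(f"init LL = {init_ll:.3f}, LR = {lr_test:.3f}")
print(f"rho2 = {rho_sq:.3f}, rho2_bar = {rho_sq_bar:.3f}")
print(f"AIC = {aic:.3f}, BIC = {bic:.3f}")
```

Small differences in the last digit relative to the printed values come from using the rounded final log likelihood.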



In [9]:
end_time = time.time()
elapsed_time = end_time - start_time

print(f"Elapsed Time: {elapsed_time} seconds")

Elapsed Time: 1.523756742477417 seconds
