## Tutorial 0: Debugging common error messages and issues in Python and Biogeme

This tutorial covers some of the most common error messages and issues that may arise when working with Python and Biogeme for discrete choice modelling. We will go through typical mistakes, explain the error messages, and provide solutions to fix them.

In [None]:
# Import the libraries
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# 1. General Python Errors

### Import errors

#### ModuleNotFoundError: No module named 'utils'

In [2]:
# Add the utils folder to the Python path
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..', '..')))

# Import the bio_estimation_fcns from the utils folder
from utils.bio_estimation_fcns import print_results

ModuleNotFoundError: No module named 'utils'

In [3]:
# Explanation
print(f"Python looked for the folder named 'utils' in the directory:\n{os.path.abspath(os.path.join(os.getcwd(), '..', '..'))}")

# However, the utils folder is not in this folder
# It is NOT located TWO levels up from the current working directory os.getcwd(), but rather ONE level up.
# Note: '..' means go up one level in the directory tree. So, '..', '..' means go up two levels in the directory tree

print(f"The correct path to look for the utils folder should be:\n{os.path.abspath(os.path.join(os.getcwd(), '..'))}")

# Now that we add the correct path, we can import the bio_estimation_fcns from the utils folder
# We use os.path.abspath() to ensure we have a clean absolute path
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
from utils.bio_estimation_fcns import print_results, estimate_mnl, estimate_mxl, estimate_panel_mxl

Python looked for the folder named 'utils' in the directory:
/Users/sandervancranenburgh/Documents/Repos_and_data/SEN1221
The correct path to look for the utils folder should be:
/Users/sandervancranenburgh/Documents/Repos_and_data/SEN1221/tutorials
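
Counting the number of `'..'` steps by hand is error-prone. As a minimal sketch, the helper below walks up from a starting directory until it finds one that contains the requested folder. The name `find_parent_containing` is hypothetical and not part of the course's `utils` package. It uses only `pathlib`, and the demonstration builds a throwaway directory tree that mimics the layout assumed in this tutorial.

```python
import tempfile
from pathlib import Path


def find_parent_containing(folder_name, start=None):
    """Return the closest directory, at or above `start`, that contains `folder_name`.

    Hypothetical helper for illustration; returns None if no such directory exists.
    """
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / folder_name).is_dir():
            return candidate
    return None


# Demonstration on a temporary tree: <root>/tutorials/utils and <root>/tutorials/tutorial0
with tempfile.TemporaryDirectory() as tmp:
    tutorials = Path(tmp).resolve() / "tutorials"
    (tutorials / "utils").mkdir(parents=True)
    (tutorials / "tutorial0").mkdir()

    found = find_parent_containing("utils", start=tutorials / "tutorial0")
    found_matches = found == tutorials  # 'utils' sits ONE level above tutorial0
    print(found_matches)
```

In a notebook, one could call `find_parent_containing("utils")` and, if the result is not `None`, append `str(result)` to `sys.path` before importing.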


#### Could not find a version that satisfies the requirement

In [4]:
pip install whatever-package

ERROR: Could not find a version that satisfies the requirement whatever-package (from versions: none)
ERROR: No matching distribution found for whatever-package
Note: you may need to restart the kernel to use updated packages.


In [5]:
# This error occurs because pip cannot find a package with the specified name (or a version compatible with your Python installation) on the package index.
# A common cause is a typo in the package name. Here, 'whatever-package' simply does not exist.
# To fix this, check the exact package name, for example in the package's documentation, and install it with pip or conda, depending on your package manager.
# If you are using pip, you can run the following command in your terminal or command prompt, replacing package-name with the correct name:
# pip install package-name
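
Before reinstalling anything, it can help to check what the current environment already provides. The sketch below uses only the standard library (`importlib.util.find_spec` and `importlib.metadata`); the function names `is_importable` and `installed_version` are hypothetical helpers introduced for illustration. Note that the import name of a module and the name of the distribution installed by pip are not always identical.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this import name."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


print(is_importable("json"))                  # a standard-library module
print(is_importable("whatever_package"))      # placeholder name used in this tutorial
print(installed_version("whatever-package"))  # placeholder name used in this tutorial
```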

# 2. Pandas errors

#### FileNotFoundError: [Errno 2] No such file or directory:

In [6]:
# Load the choice data
data_path = Path('synthetic_VTTdata_tutorial1.dat')
df = pd.read_csv(data_path)

FileNotFoundError: [Errno 2] No such file or directory: 'synthetic_VTTdata_tutorial1.dat'

In [7]:
# We try to load the data using pandas, but the file does not exist in the specified path.
# To fix this, ensure that the path pointing to the data file is correct.
print(f"Python looked for the data file in the path:\n{data_path.resolve()}")

# However, the file is not located there. It is located in the 'data' folder inside the current working directory.
data_path = Path('data/synthetic_VTTdata_tutorial1.dat')

# Now, we can load the data using the correct path
df = pd.read_csv(data_path)

# Show the data
df.head(10)

Python looked for the data file in the path:
/Users/sandervancranenburgh/Documents/Repos_and_data/SEN1221/tutorials/tutorial0/synthetic_VTTdata_tutorial1.dat


   RESP  TC1  TT1  TC2  TT2  CHOICE
0     1    4   35    8   25       2
1     1    4   35    8   30       2
2     1    6   40    7   20       2
3     1    5   30    6   25       2
4     1    5   40    7   20       1
5     2    4   35    8   25       1
6     2    4   35    8   30       1
7     2    6   40    7   20       2
8     2    5   30    6   25       2
9     2    5   40    7   20       2
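
When a relative path fails, it is often quicker to inspect what actually exists on disk than to guess. The following is a minimal sketch, assuming only `pathlib`; `describe_path` is a hypothetical helper name. It reports the resolved absolute path and, if the file is missing, lists the contents of the closest folder that does exist.

```python
from pathlib import Path


def describe_path(path):
    """Return a short diagnostic string for a (possibly relative) file path."""
    resolved = Path(path).resolve()
    if resolved.is_file():
        return f"Found: {resolved}"
    # Walk up to the closest ancestor folder that exists and list its contents
    existing = next((p for p in resolved.parents if p.is_dir()), None)
    listing = sorted(p.name for p in existing.iterdir()) if existing else []
    return (
        f"Missing: {resolved}\n"
        f"Closest existing folder: {existing}\n"
        f"It contains: {listing}"
    )


print(describe_path("data/synthetic_VTTdata_tutorial1.dat"))
```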


# 3. Biogeme errors

In [8]:
# Import biogeme libraries
import biogeme.database as db
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, Variable, log, exp
import biogeme.expressions

#### AttributeError: 'DataFrame' object has no attribute 'data'

In [9]:
# Give a name to the model    
model_name = 'Linear-additive RUM-MNL'

# We create Variable objects for each of the variables in the data set that we want to use in the model
TT1  = Variable('TT1')
TC1  = Variable('TC1')
TT2  = Variable('TT2')
TC2  = Variable('TC2')
CHOICE = Variable('CHOICE')

# Define the parameters to be estimated
ASC1 = Beta('ASC1', 0, None, None, 1)
ASC2 = Beta('ASC2', 0, None, None, 0)
B_TT = Beta('B_TT', 0, None, None, 0)
B_TC = Beta('B_TC', 0, None, None, 0)

# Define the utility functions
V1 = ASC1 + B_TT * TT1 + B_TC * TC1
V2 = ASC2 + B_TT * TT2 + B_TC * TC2

# Create a dictionary to list the utility functions with the numbering of alternatives
V = {1: V1, 2: V2}
    
# Create a dictionary called av to describe the availability conditions of each alternative, where 1 indicates that the alternative is available, and 0 indicates that the alternative is not available.
av = {1: 1, 2: 1} 

results_mnl = estimate_mnl(V, av, CHOICE, df, model_name)


AttributeError: 'DataFrame' object has no attribute 'data'

In [10]:
# The error occurs because Biogeme, which is invoked inside estimate_mnl(), expects a Biogeme database object, but a pandas DataFrame is passed instead.

# To fix this, create a Biogeme database object from the pandas DataFrame before passing it to estimate_mnl().
# You can do this using the biogeme.database.Database class
biodata = db.Database("my_data", df)

print(f"The object 'biodata' has type {type(biodata)}")

# Now we can pass the 'biodata' object to the estimate_mnl() function
results_mnl = estimate_mnl(V, av, CHOICE, biodata, model_name)

# And print the results
print_results(results_mnl)

The object 'biodata' has type <class 'biogeme.database.Database'>


Results for model Linear-additive RUM-MNL
Nbr of parameters:		3
Sample size:			1000
Excluded data:			0
Null log likelihood:		-693.1472
Final log likelihood:		-516.5406
Likelihood ratio test (null):		353.2132
Rho square (null):			0.255
Rho bar square (null):			0.25
Akaike Information Criterion:	1039.081
Bayesian Information Criterion:	1053.804

       Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC2  1.0996        0.2564         4.29           0.0
B_TC -0.3814        0.0584        -6.53           0.0
B_TT -0.0825        0.0138        -5.99           0.0


#### KeyError: 'Chosen'

In [11]:
# Create a Biogeme database object from the pandas DataFrame 
biodata = db.Database("my_data", df)

# Give a name to the model    
model_name = 'Linear-additive RUM-MNL'

# We create Variable objects for each of the variables in the data set that we want to use in the model
TT1  = Variable('TT1')
TC1  = Variable('TC1')
TT2  = Variable('TT2')
TC2  = Variable('TC2')
CHOICE = Variable('Chosen')

# Define the parameters to be estimated
ASC1 = Beta('ASC1', 0, None, None, 1)
ASC2 = Beta('ASC2', 0, None, None, 0)
B_TT = Beta('B_TT', 0, None, None, 0)
B_TC = Beta('B_TC', 0, None, None, 0)

# Define the utility functions
V1 = ASC1 + B_TT * TT1 + B_TC * TC1
V2 = ASC2 + B_TT * TT2 + B_TC * TC2

# Create a dictionary to list the utility functions with the numbering of alternatives
V = {1: V1, 2: V2}
    
# Create a dictionary called av to describe the availability conditions of each alternative, where 1 indicates that the alternative is available, and 0 indicates that the alternative is not available.
av = {1: 1, 2: 1} 

results_mnl = estimate_mnl(V, av, CHOICE, biodata, model_name)

KeyError: 'Chosen'

In [12]:
# This error occurs because the variable 'Chosen' does not exist in the Biogeme database object 'biodata'.
# To fix this, ensure that the variable name used in the Variable() function matches the name of the variable in the data set.

# We can print the column names of the DataFrame to check the exact variable names
print("Column names in the DataFrame:")
print(biodata.data.columns)

# We see that the correct variable name is 'CHOICE', not 'Chosen'.
# So, we need to change the line where we create the Variable object for CHOICE
CHOICE = Variable('CHOICE')

# Now we can run the estimation again
results_mnl = estimate_mnl(V, av, CHOICE, biodata, model_name)

# And print the results
print_results(results_mnl)

Column names in the DataFrame:
Index(['RESP', 'TC1', 'TT1', 'TC2', 'TT2', 'CHOICE'], dtype='object')


Results for model Linear-additive RUM-MNL
Nbr of parameters:		3
Sample size:			1000
Excluded data:			0
Null log likelihood:		-693.1472
Final log likelihood:		-516.5406
Likelihood ratio test (null):		353.2132
Rho square (null):			0.255
Rho bar square (null):			0.25
Akaike Information Criterion:	1039.081
Bayesian Information Criterion:	1053.804

       Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC2  1.0996        0.2564         4.29           0.0
B_TC -0.3814        0.0584        -6.53           0.0
B_TT -0.0825        0.0138        -5.99           0.0
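
A mismatch like this can be caught before estimation by comparing the names passed to `Variable()` against the column names. The sketch below is a hypothetical helper (`check_variable_names`) built on the standard-library `difflib`; it works on plain lists of strings and does not inspect Biogeme objects. The comparison is case-insensitive, which is an assumption made here so that `'Chosen'` can be matched to `'CHOICE'`.

```python
import difflib


def check_variable_names(requested, columns):
    """Map each requested name that is not a column to a list of similar column names."""
    lowered = {c.lower(): c for c in columns}
    problems = {}
    for name in requested:
        if name not in columns:
            close = difflib.get_close_matches(name.lower(), list(lowered), n=3)
            problems[name] = [lowered[c] for c in close]
    return problems


# Column names as printed in the output above
columns = ['RESP', 'TC1', 'TT1', 'TC2', 'TT2', 'CHOICE']
requested = ['TT1', 'TC1', 'TT2', 'TC2', 'Chosen']

problems = check_variable_names(requested, columns)
print(problems)
```

In the notebook, `columns` could be taken from `list(biodata.data.columns)` instead of being typed by hand.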


#### BiogemeError: Error in the loglikelihood function. Some variables are not inside PanelLikelihoodTrajectory:

In [13]:
# This line informs Biogeme that biodata is panel data, where 'RESP' is the variable in the data that identifies the individuals
biodata.panel('RESP')

# We estimate a simple MNL model (which ignores the panel structure)
results_mnl = estimate_mnl(V, av, CHOICE, biodata, model_name)

BiogemeError: Error in the loglikelihood function. Some variables are not inside PanelLikelihoodTrajectory: {'TC2', 'TC1', 'CHOICE', 'TT1', 'TT2'} .If the database is organized as panel data, all variables must be used inside a PanelLikelihoodTrajectory. If it is not consistent with your model, generate a flat version of the data using the function `generateFlatPanelDataframe`.

In [14]:
# The error occurs because Biogeme expects a model that accounts for the panel structure of the data: it checks that all variables are used inside a PanelLikelihoodTrajectory expression.
# But the MNL model does not account for the panel structure.
# To fix this, do not declare the panel structure of the data when estimating a simple MNL model.
# That is, do not call biodata.panel('RESP') before estimating the MNL model.
# Alternatively, we can revert biodata to a non-panel database using the following line:
biodata.panelColumn = None

# Check if the biodata is indeed non-panel
print(biodata.is_panel())

# Now we can estimate the MNL model again
results_mnl = estimate_mnl(V, av, CHOICE, biodata, model_name)

# And print the results
print_results(results_mnl)

False


Results for model Linear-additive RUM-MNL
Nbr of parameters:		3
Sample size:			1000
Excluded data:			0
Null log likelihood:		-693.1472
Final log likelihood:		-516.5406
Likelihood ratio test (null):		353.2132
Rho square (null):			0.255
Rho bar square (null):			0.25
Akaike Information Criterion:	1039.081
Bayesian Information Criterion:	1053.804

       Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC2  1.0996        0.2564         4.29           0.0
B_TC -0.3814        0.0584        -6.53           0.0
B_TT -0.0825        0.0138        -5.99           0.0


#### BiogemeError: The argument of MonteCarlo must contain a bioDraws:

In [None]:
# Give a name to the model    
model_name = 'Linear-additive Mixed Logit'

# Define the parameters to be estimated
ASC1 = Beta('ASC1', 0, None, None, 1)
ASC2 = Beta('ASC2', 0, None, None, 0)
B_TT = Beta('B_TT', 0, None, None, 0)
B_TC = Beta('B_TC', 0, None, None, 0)
sigma_TC = Beta('sigma_TC', 0.1, 0, None, 0)

# Define the random parameters
B_TC_rnd = B_TC + sigma_TC * biogeme.expressions.bioDraws('B_TC_rnd', 'NORMAL')

# Define the utility functions
V1 = ASC1 + B_TT * TT1 + B_TC * TC1
V2 = ASC2 + B_TT * TT2 + B_TC * TC2

# Create a dictionary to list the utility functions with the numbering of alternatives
V = {1: V1, 2: V2}
    
# Create a dictionary called av to describe the availability conditions of each alternative, where 1 indicates that the alternative is available, and 0 indicates that the alternative is not available.
av = {1: 1, 2: 1} 

results_mnl = estimate_mxl(V, av, CHOICE, biodata, model_name)


BiogemeError: The argument of MonteCarlo must contain a bioDraws: MonteCarlo(exp(_bioLogLogit[choice=CHOICE]U=(1:((Beta('ASC1', 0, None, None, 1) + (Beta('B_TT', 0, None, None, 0) * TT1)) + (Beta('B_TC', 0, None, None, 0) * TC1)), 2:((Beta('ASC2', 0, None, None, 0) + (Beta('B_TT', 0, None, None, 0) * TT2)) + (Beta('B_TC', 0, None, None, 0) * TC2)))av=(1:`1.0`, 2:`1.0`)))

In [19]:
# Give a name to the model    
model_name = 'Linear-additive Mixed Logit'

# Define the parameters to be estimated
ASC1 = Beta('ASC1', 0, None, None, 1)
ASC2 = Beta('ASC2', 0, None, None, 0)
B_TT = Beta('B_TT', 0, None, None, 0)
B_TC = Beta('B_TC', 0, None, None, 0)
sigma_TC = Beta('sigma_TC', 0.1, 0, None, 0)

# Define the random parameters
B_TC_rnd = B_TC + sigma_TC * biogeme.expressions.bioDraws('B_TC_rnd', 'NORMAL')

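# The error occurred because the random parameter B_TC_rnd was defined but not used in the utility functions.
# As a result, the likelihood contained no draws to integrate over in the Monte Carlo simulation.
# To fix this, we use B_TC_rnd (instead of B_TC) in the utility functions
V1 = ASC1 + B_TT * TT1 + B_TC_rnd * TC1
V2 = ASC2 + B_TT * TT2 + B_TC_rnd * TC2

# Create a dictionary to list the utility functions with the numbering of alternatives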
V = {1: V1, 2: V2}
    
# Create a dictionary called av to describe the availability conditions of each alternative, where 1 indicates that the alternative is available, and 0 indicates that the alternative is not available.
av = {1: 1, 2: 1} 

results_mxl = estimate_mxl(V, av, CHOICE, biodata, model_name)

# Show the results
print_results(results_mxl)

The number of draws (100) is low. The results may not be meaningful.




Number of estimated parameters:	4
Sample size:	1000
Excluded observations:	0
Init log likelihood:	-692.4261
Final log likelihood:	-513.616
Likelihood ratio test for the init. model:	357.6203
Rho-square for the init. model:	0.258
Rho-square-bar for the init. model:	0.252
Akaike Information Criterion:	1035.232
Bayesian Information Criterion:	1054.863
Final gradient norm:	4.7041E-02
Number of draws:	100
Draws generation time:	0:00:00.081652
Types of draws:	['B_TC_rnd: NORMAL']
Nbr of threads:	10

           Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC2      1.0008        0.2845         3.52          0.00
B_TC     -0.3591        0.0858        -4.18          0.00
B_TT     -0.1096        0.0238        -4.61          0.00
sigma_TC  0.6928        0.2921         2.37          0.02
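
To see why the utilities must contain the draws, it may help to look at what simulation does conceptually. The sketch below is plain Python and is not how Biogeme implements it: it averages a binary logit probability over normal draws of the cost coefficient. The function names are hypothetical, the attribute values are those of the first row of the data, and the coefficients are copied from the output above. With `sigma_tc = 0` every draw gives the same value, so there is nothing to integrate over, which mirrors the situation in the failing cell.

```python
import math
import random


def logit_prob_alt1(b_tc, b_tt, asc2, tc1, tt1, tc2, tt2):
    """Binary logit probability of alternative 1 (ASC1 is fixed to 0)."""
    v1 = b_tt * tt1 + b_tc * tc1
    v2 = asc2 + b_tt * tt2 + b_tc * tc2
    return 1.0 / (1.0 + math.exp(v2 - v1))


def simulated_prob_alt1(b_tc, sigma_tc, n_draws, seed=0, **kwargs):
    """Average the logit probability over normal draws of the cost coefficient."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        b_tc_rnd = b_tc + sigma_tc * rng.gauss(0.0, 1.0)  # plays the role of B_TC_rnd
        total += logit_prob_alt1(b_tc_rnd, **kwargs)
    return total / n_draws


# Attributes of the first row of the data; coefficients from the estimation output
row = dict(tc1=4, tt1=35, tc2=8, tt2=25)
coefs = dict(b_tt=-0.1096, asc2=1.0008)

p_fixed = logit_prob_alt1(-0.3591, **coefs, **row)
p_no_draws = simulated_prob_alt1(-0.3591, 0.0, n_draws=100, **coefs, **row)
p_mixed = simulated_prob_alt1(-0.3591, 0.6928, n_draws=100, **coefs, **row)
print(p_fixed, p_no_draws, p_mixed)
```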


#### BiogemeError: The following elementary expressions are defined more than once:

In [16]:
# Create a Biogeme database object from the pandas DataFrame 
biodata = db.Database("my_data", df)

# Give a name to the model    
model_name = 'Linear-additive RUM-MNL'

# We create Variable objects for each of the variables in the data set that we want to use in the model
TT1  = Variable('TT1')
TC1  = Variable('TC1')
TT2  = Variable('TT2')
TC2  = Variable('TC2')
CHOICE = Variable('CHOICE')

# Define the parameters to be estimated
ASC1 = Beta('ASC', 0, None, None, 1)
ASC2 = Beta('ASC', 0, None, None, 0)
B_TT = Beta('B_TT', 0, None, None, 0)
B_TC = Beta('B_TC', 0, None, None, 0)

# Define the utility functions
V1 = ASC1 + B_TT * TT1 + B_TC * TC1
V2 = ASC2 + B_TT * TT2 + B_TC * TC2

# Create a dictionary to list the utility functions with the numbering of alternatives
V = {1: V1, 2: V2}
    
# Create a dictionary called av to describe the availability conditions of each alternative, where 1 indicates that the alternative is available, and 0 indicates that the alternative is not available.
av = {1: 1, 2: 1} 

results_mnl = estimate_mnl(V, av, CHOICE, biodata, model_name)

BiogemeError: The following elementary expressions are defined more than once: ...

In [17]:
# The error occurs because both ASC parameters are defined with the same name 'ASC'.
# In Biogeme, each parameter must have a unique name.

# To fix this, we need to RESTART the KERNEL.
# This clears the previous definitions. After the restart, we can redefine the ASC parameters with unique names,
# e.g. ASC1 = Beta('ASC1', 0, None, None, 1) and ASC2 = Beta('ASC2', 0, None, None, 0).
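
Duplicate names are easy to miss in longer model specifications. As a minimal sketch, assuming the names passed to `Beta(...)` are collected in a plain list of strings, the hypothetical helper `duplicated_names` below flags any name that occurs more than once. It does not inspect Biogeme objects.

```python
from collections import Counter


def duplicated_names(names):
    """Return the names that occur more than once, in order of first appearance."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


# Names passed to Beta(...) in the failing cell and in a corrected version
failing = ['ASC', 'ASC', 'B_TT', 'B_TC']
corrected = ['ASC1', 'ASC2', 'B_TT', 'B_TC']

print(duplicated_names(failing))
print(duplicated_names(corrected))
```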