In [27]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

In [64]:
data_path = os.path.join('data', 'trip_generation.csv')
df = pd.read_csv(data_path)
df.head(5)

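|   | zone | dwelling_unit_type | number_of_persons | number_of_vehicles | licensed_drivers | fulltime_workers | partime_workers | work_at_home_persons | number_of_students | number_of_females | number_of_males | number_of_children | number_of_adults | daily_work_trips | daily_non_work_trips |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29 | 2 | 3 | 1 | 3 | 1 | 1 | 0 | 1 | 2 | 1 | 0 | 0 | 2 | 9 |
| 1 | 6 | 2 | 2 | 1 | 2 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 3 | 7 |
| 2 | 11 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 3 | 14 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 3 |
| 4 | 24 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 0 | 0 | 0 | 4 |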


## Research Questions

1. What are the key determinants of daily work trip generation?

2. What are the key determinants of daily non-work trip generation?

3. What factors affect daily trip generation in a metropolitan area?

In [65]:
df.columns

Index(['zone', 'dwelling_unit_type', 'number_of_persons', 'number_of_vehicles',
       'licensed_drivers', 'fulltime_workers', 'partime_workers',
       'work_at_home_persons', 'number_of_students', 'number_of_females',
       'number_of_males', 'number_of_children', 'number_of_adults',
       'daily_work_trips', 'daily_non_work_trips'],
      dtype='object')

### 1. What are the key determinants of daily work trip generation?
#### Multiple Regression Model 1 - Daily Work Trips

In this model, three explanatory variables were selected, all of them household characteristics: the number of work-at-home persons, the number of full-time workers, and the number of part-time workers.

The number of work-at-home persons was selected because it directly influences the number of work trips: people who work remotely do not travel to work, while those who commute to an office generate at least two trips a day (to work and back).

The same reasoning applies to the type of employment. Full-time workers who do not work remotely travel to their workplace every working day, whereas part-time workers make fewer trips to their workplace even when their job has no remote option.
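For reference, the model fitted in the next cell corresponds to the linear specification below, where $i$ indexes households, $\mathrm{FT}_i$, $\mathrm{PT}_i$ and $\mathrm{WAH}_i$ stand for `fulltime_workers`, `partime_workers` and `work_at_home_persons`, and $\varepsilon_i$ is the error term. The rationale above suggests $\beta_1 > 0$, $\beta_3 < 0$ and $\beta_2 < \beta_1$; these are expectations implied by the reasoning, not estimated results.

$$
\text{daily\_work\_trips}_i = \beta_0 + \beta_1\,\mathrm{FT}_i + \beta_2\,\mathrm{PT}_i + \beta_3\,\mathrm{WAH}_i + \varepsilon_i
$$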

In [66]:
y = df['daily_work_trips']
x = df[['fulltime_workers', 'partime_workers', 'work_at_home_persons']]
x = sm.add_constant(x)
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:       daily_work_trips   R-squared:                       0.481
Model:                            OLS   Adj. R-squared:                  0.481
Method:                 Least Squares   F-statistic:                     611.9
Date:                Sat, 06 Jul 2024   Prob (F-statistic):          2.52e-281
Time:                        09:37:34   Log-Likelihood:                -2188.8
No. Observations:                1982   AIC:                             4386.
Df Residuals:                    1978   BIC:                             4408.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    0.0512 

### 2. What are the key determinants of daily non-work trip generation?
#### Multiple Regression Model 2 - Daily Non-Work Trips

The objective of this section was to build a model that can be used to estimate the number of daily non-work trips made by a household.

The first variable selected was the number of students, on the assumption that students are the main contributors to non-work trips in a household.

Second, students often have to be driven to school, so the number of licensed drivers in a household was considered a vital variable in modelling non-work trips.

Finally, the number of persons in the household was included as the third variable. This choice rests on the assumption that the more people a household has, the more likely it is to include students and licensed drivers, both of which directly affect the number of non-work trips.
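The reasoning for the third variable implies that household size is correlated with the number of students and licensed drivers, so the three predictors overlap. A common way to gauge that overlap is the variance inflation factor, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. The sketch below computes it with plain numpy; statsmodels also offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`. The helper name `vif` is hypothetical, and the sample uses only the five households shown in `df.head()` above, so its values are illustrative only. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity.

```python
import numpy as np
import pandas as pd


def vif(frame: pd.DataFrame) -> pd.Series:
    """Variance inflation factor for every column of `frame`.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
    column j on all the other columns plus an intercept.
    """
    values = frame.to_numpy(dtype=float)
    n = values.shape[0]
    result = {}
    for j, name in enumerate(frame.columns):
        target = values[:, j]
        others = np.delete(values, j, axis=1)
        X = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coefs
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        result[name] = 1.0 / (1.0 - r2)
    return pd.Series(result)


# The five households from df.head() above, Model 2 predictors only
sample = pd.DataFrame({
    "number_of_persons": [3, 2, 1, 1, 2],
    "licensed_drivers": [3, 2, 0, 1, 0],
    "number_of_students": [1, 0, 0, 1, 2],
})
print(vif(sample))
```

On the full dataset the call would be `vif(df[['number_of_persons', 'licensed_drivers', 'number_of_students']])`.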

In [43]:
y = df['daily_non_work_trips']
x = df[['number_of_persons', 'licensed_drivers', 'number_of_students']]
x = sm.add_constant(x)
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())

                             OLS Regression Results                             
Dep. Variable:     daily_non_work_trips   R-squared:                       0.335
Model:                              OLS   Adj. R-squared:                  0.334
Method:                   Least Squares   F-statistic:                     332.1
Date:                  Fri, 05 Jul 2024   Prob (F-statistic):          1.26e-174
Time:                          20:34:47   Log-Likelihood:                -4125.3
No. Observations:                  1982   AIC:                             8259.
Df Residuals:                      1978   BIC:                             8281.
Df Model:                             3                                         
Covariance Type:              nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const           

### 3. What factors affect daily trip generation in a metropolitan area?
#### Multiple Regression Model 3 - Total Trips

The total trip-generation model builds on the first two models (work and non-work trips). Consequently, it is based on a combination of the variables that affect the work and non-work trips generated by each household.

To build the model, the variables from the two previous models were pooled, and the five with the largest t-statistics were selected for the final model. The five selected variables were the number of licensed drivers, the number of work-at-home persons, the numbers of full-time and part-time workers, and the number of students in each household.

The numbers of work-at-home persons and full-time workers affect work trips, while the number of students directly affects non-work trips. The number of licensed drivers affects both, since a driver must be licensed whether the trip is for work or not.
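The selection step described above (pool the candidate variables from Models 1 and 2, keep the five with the largest t-statistics) can be sketched as follows. Two assumptions are built in: variables are ranked by the absolute value of t, so a strongly negative effect counts as strong, and a variable appearing in both models keeps its larger |t|. The names `select_top_by_t`, `results_work` and `results_non_work` are hypothetical; the notebook itself reuses the single name `results` for each fit.

```python
import pandas as pd


def select_top_by_t(tvalue_list, k=5, exclude=("const",)):
    """Pool t-statistics from several fitted models and return the names
    of the k predictors with the largest absolute t-statistic."""
    pooled = pd.concat(tvalue_list)
    # Drop the intercept(s); errors="ignore" tolerates models without one
    pooled = pooled.drop(labels=list(exclude), errors="ignore")
    # A predictor that appears in more than one model keeps its largest |t|
    best = pooled.abs().groupby(level=0).max()
    return best.sort_values(ascending=False).head(k).index.tolist()

# Usage, assuming the Model 1 and Model 2 fits were kept under separate names:
# select_top_by_t([results_work.tvalues, results_non_work.tvalues], k=5)
```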

In [112]:
y = df['daily_work_trips']
# Five predictors pooled from Models 1 and 2
x = df[['fulltime_workers', 'partime_workers', 'work_at_home_persons',
        'licensed_drivers', 'number_of_students']]
x = sm.add_constant(x)
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:       daily_work_trips   R-squared:                       0.483
Model:                            OLS   Adj. R-squared:                  0.482
Method:                 Least Squares   F-statistic:                     369.6
Date:                Sat, 06 Jul 2024   Prob (F-statistic):          4.00e-280
Time:                        10:09:44   Log-Likelihood:                -2185.2
No. Observations:                1982   AIC:                             4382.
Df Residuals:                    1976   BIC:                             4416.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    0.0507 

In [84]:
# Total daily trips per household = work trips + non-work trips
df['total_trips'] = df['daily_work_trips'] + df['daily_non_work_trips']

In [85]:
df.head(5)

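|   | zone | dwelling_unit_type | number_of_persons | number_of_vehicles | licensed_drivers | fulltime_workers | partime_workers | work_at_home_persons | number_of_students | number_of_females | number_of_males | number_of_children | number_of_adults | daily_work_trips | daily_non_work_trips | total_trips |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29 | 2 | 3 | 1 | 3 | 1 | 1 | 0 | 1 | 2 | 1 | 0 | 0 | 2 | 9 | 11 |
| 1 | 6 | 2 | 2 | 1 | 2 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 3 | 7 | 10 |
| 2 | 11 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 14 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 3 | 3 |
| 4 | 24 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 0 | 0 | 0 | 4 | 4 |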


#### Total Trips with Daily Non-Work Trips as a Predictor

In [101]:
y = df['total_trips']
x = df[['licensed_drivers', 'fulltime_workers', 'partime_workers', 
        'daily_non_work_trips', 'number_of_children']]
x = sm.add_constant(x)
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:            total_trips   R-squared:                       0.935
Model:                            OLS   Adj. R-squared:                  0.935
Method:                 Least Squares   F-statistic:                     5732.
Date:                Sat, 06 Jul 2024   Prob (F-statistic):               0.00
Time:                        10:02:22   Log-Likelihood:                -2179.1
No. Observations:                1982   AIC:                             4370.
Df Residuals:                    1976   BIC:                             4404.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    0.0169 

#### Total Trips with Daily Work Trips as a Predictor

In [107]:
y = df['total_trips']
x = df[['licensed_drivers', 'fulltime_workers', 'partime_workers', 'number_of_students', 
        'daily_work_trips', 'number_of_children', 'work_at_home_persons']]
x = sm.add_constant(x)
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:            total_trips   R-squared:                       0.558
Model:                            OLS   Adj. R-squared:                  0.557
Method:                 Least Squares   F-statistic:                     356.5
Date:                Sat, 06 Jul 2024   Prob (F-statistic):               0.00
Time:                        10:04:08   Log-Likelihood:                -4085.6
No. Observations:                1982   AIC:                             8187.
Df Residuals:                    1974   BIC:                             8232.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    0.7782 
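Because `total_trips` was constructed as `daily_work_trips + daily_non_work_trips`, the last two models regress the total on one of its own components. A standard property of least squares then ties their coefficients exactly to a simpler regression: if component $a$ is one of the predictors, regressing $w + a$ on the predictors yields the same coefficients as regressing $w$ on them, except that the coefficient on $a$ is larger by exactly 1. This is one plausible reason the model that includes `daily_non_work_trips` reaches an R-squared of 0.935. The sketch below checks the identity on SYNTHETIC random data (not the survey data); the helper name `ols_coefs` is hypothetical.

```python
import numpy as np


def ols_coefs(X, y):
    """Ordinary least-squares coefficients via numpy.linalg.lstsq."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# SYNTHETIC stand-in data, not the trip-generation survey:
# w plays daily_work_trips, a plays daily_non_work_trips,
# and z1, z2 play two other household predictors.
rng = np.random.default_rng(0)
n = 200
z = rng.integers(0, 4, size=(n, 2)).astype(float)
a = rng.integers(0, 10, size=n).astype(float)
w = rng.integers(0, 5, size=n).astype(float)
total = w + a  # plays total_trips

# Design matrix: intercept, z1, z2, and the component a itself
X = np.column_stack([np.ones(n), z, a])

beta_total = ols_coefs(X, total)
beta_work = ols_coefs(X, w)

# Same coefficients, except the one on a is shifted by exactly 1
expected = beta_work + np.array([0.0, 0.0, 0.0, 1.0])
print(np.allclose(beta_total, expected))  # True
```

The same identity applies, with the roles of the two trip columns swapped, to the model that includes `daily_work_trips`.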