# Transport demand

## `Tutorial: Artificial neural networks`

**Junio 2024**<br>
**Gabriel Nova & Sander van Cranenburgh** <br>
**G.N.Nova@tudelft.nl** <br>

### `Application: Modelling neighbourhood choices`

In this lab session, we will analyse neighbourhood location choice behaviour. Understanding people's preferences over neighbourhood characteristics is crucial for city planners when they (re)develop neighbourhoods or devise policies to tackle e.g. residential segregation. During this lab session, you will apply discrete choice models to uncover people's preferences over attributes, such as the distance to the city centre and the share of foreigners in their neighbourhood. Also, you will explore whether preferences interact with covariates such as age, gender, home ownership, car ownership and urbanisation level. While doing so, you will test various utility specifications and interpret the modelling outcomes of discrete choice models.

For this study, we use data from a Stated Choice (SC) experiment, which was conducted between 2017 and 2018 in four European cities: Hanover, Mainz, Bern, and Zurich.

![SC](./data/sc_experiment.png)

**`Learning objectives lab session 01A`**

After completing the following lab session, you will be able to:
* Explore choice data
* Estimate RUM-based multinomial logit discrete choice models using the Python package called `Biogeme`
* Interpret the modelling results of RUM-MNL models
* Forecast market shares by applying an estimated discrete choice model


**`This lab consists of 3 parts and has 4 exercises`**

**Part 1**: Load and explore the data set
- Exercise 1: "Representativeness of the sample"

**Part 2**: The linear-additive RUM-MNL model
- Exercise 2: "Interpreting modelling outcomes"
- Exercise 3: "Attribute importance"

**Part 3**: Market share forecasting for Zurich
- Exercise 4: "Forecasting"

#### `Import packages`

To begin, we will import all the Python libraries that we will use in this lab session.

In [1]:
# Biogeme
import biogeme.database as db
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, Variable, log, exp

# General python packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
from pathlib import Path

# Pandas setting to show all columns when displaying a pandas dataframe
pd.set_option('display.max_columns', None)

### `1. Load and explore the data set` <br>

**`Load the data set`** <br>

In [None]:
# Create the path to the data file
data_path = Path('data/choice_data_cleaned.dat')
print(data_path)

Load the choice data, using `read_csv()` from Pandas: 

In [3]:
# Load the data as a pandas dataframe
df = pd.read_csv(data_path, sep='\t')

**`Explore the data set`**<br>

Now, let's explore the data set and examine the variables in the data.<br>
You can use `head()` to look at the first 5 rows of the data set.

In [None]:
df.head()

**Description of variables**<br>

The number concatenated to the variable refers to the alternative. Hence, `STORES1` is the column containing the attribute levels of alternative 1 for attribute STORES.<br>

| Variable       | Description                                                    | Type/Levels |
|-------------|----------------------------------------------------------------|--------------|
| `ID`        | This is the ID number of the respondent                         | Integer      |
| `TASK_ID`   | This is the number of the respondent's choice task              | Integer      |
| `STORES`    | Distance to grocery store in walking minutes                    | 2 Min., 5 Min., 10 Min., 15 Min.     |
| `TRANSPORT` | Distance to public transportation in walking minutes            | 2 Min., 5 Min., 10 Min., 15 Min.      |
| `CITY`      | Distance to city centre in km                                   | Below 1 km, 1 to 2 km, 3 to 4 km, over 4 km      |
| `NOISE`     | Street traffic noise                                            | 1 = None, 2 = Little, 3 = Medium, 4 = High      |
| `GREEN`     | Green areas in residential area                                 | 1 = None, 2 = Few, 3 = Some, 4 = Many       |
| `FOREIGN`   | Share of foreigners in residential areas                        | 0.10, 0.20, 0.30, 0.40      |
| `CHOICE`    | Indicates the choice.                                           | Integer  |
| `RESPCITY`  | Indicates the city. 1 = Mainz, 2 = Hanover, 3 = Bern, 4 = Zurich| Categorical  |
| `WOMAN`     | Indicates 1 if woman and 0 otherwise                            | Binary       |
| `AGE`       | Age in years                                                    | Integer      |
| `ENVCONC`   | Environmental concern from 1 to 5, with 5 being the highest degree of concern | Ordinal |
| `EDUYEARS`  | Number of years in education                                    | Numeric      |
| `RESPFOREIGN`| 1 if the respondent is a foreigner, 0 otherwise                | Binary       |
| `HOMEOWNER` | Indicates 1 if the respondent is a home owner and 0 otherwise   | Binary       |
| `CAROWNER`  | Indicates 1 if the respondent is a car owner and 0 otherwise    | Binary       |
| `JOB`       | 1 if the respondent is working, 0 otherwise                     | Binary       |
| `NONWESTERN`| 1 if the respondent is non-western, 0 otherwise                 | Binary       |
| `WESTERN`   | 1 if the respondent is western, 0 otherwise                     | Binary       |

**`Descriptive statistics`**<br>

We can use `describe()` to view descriptive statistics, such as count, mean, std, min, percentiles, and max about the **attribute levels** of the alternatives.

In [None]:
attributes =   ['STORES1', 'TRANSPORT1', 'CITY1', 'NOISE1', 'GREEN1', 'FOREIGN1', 
                'STORES2', 'TRANSPORT2', 'CITY2', 'NOISE2', 'GREEN2', 'FOREIGN2',
                'STORES3', 'TRANSPORT3', 'CITY3', 'NOISE3', 'GREEN3', 'FOREIGN3']
round(df[attributes].describe(),2)

**`Frequency and percentage of choices`**<br>

When modelling choices, we are also interested in the frequency at which alternatives are chosen. In experiments with **unlabelled** alternatives (like this one), this analysis tells us whether the choices are 'balanced'. This means that the alternatives have been chosen in a similar proportion. If the data are not balanced, it may indicate that the experimental design was not sufficiently randomised. (In the lectures by Eric Molin you will learn more about the design of choice experiments).

In [None]:
# Counts the number of times each  alternative is chosen
choice_freq = df['CHOICE'].value_counts()

# Calculate the percentage of the chosen alternatives
choice_percent = round(choice_freq / len(df['CHOICE']) * 100,2)

# Table Summary
choice_table = pd.DataFrame({'Choice': choice_freq.index, 'Frequency': choice_freq.values, 'Percentage':choice_percent.values} )

# Show the table
choice_table

As can be seen, all alternatives attain an (almost) equal share. This suggests that the design of the experiment was sufficiently randomised, and that we do not need to account for artefacts arising from the experimental design (e.g. using constants).
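The same balance check can be written more compactly with `value_counts(normalize=True)`. The sketch below runs on a small made-up `CHOICE` column (hypothetical values, not the lab data) and reports how far the largest share deviates from a perfectly balanced share of one third.

```python
import pandas as pd

# Hypothetical choices among three unlabelled alternatives (not the lab data)
toy = pd.DataFrame({'CHOICE': [1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 1, 1]})

# Share of each alternative in percent, ordered by alternative number
shares = toy['CHOICE'].value_counts(normalize=True).sort_index() * 100

# Largest deviation from a perfectly balanced share of 100/3 percent
max_deviation = (shares - 100 / 3).abs().max()

print(shares.round(2))
print(f'Largest deviation from a balanced share: {max_deviation:.2f} percentage points')
```

With real data, a small deviation is expected from sampling noise alone; what counts as "too large" is a judgment call that depends on the sample size.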

### `Exercise 1: Representativeness of the sample`

When modelling choice behaviour, it is also important to understand whether the sample (i.e. the collected data) is representative of the target population. If you are working with a non-representative sample, the results and conclusions cannot be generalised to the population. This is particularly important when the objective is to determine e.g. willingness-to-pay estimates.<br>

To assess whether the sample is representative of our target population, we compare the sample statistics of the socio-demographic variables with statistics of the population. Usually, population statistics are made available by the National Bureaus of Statistics. In the Netherlands, this institute is called CBS (Centraal Bureau voor de Statistiek).

Explore the sample statistics.<br>

`A` Identify the columns with socio-demographic variables <br>
`B` Use `describe()` to summarise the socio-demographic variables, and create histograms for these variables<br>
`C` Reflect on the representativeness of the sample, without comparing it to the population statistics<br>

In [7]:
# your code here

### `2. The linear-additive RUM-MNL model` <br>

Now that we have developed a feeling for our data, we can start with estimating discrete choice models. For this, we will use the Python package called `Biogeme`. 


**`Biogeme database`**<br>
To use this package, we first need to create the data set as a Biogeme database object using `db.Database()`. This object contains the data in a format compatible with the library functions for model estimation in Biogeme.

In [8]:
# db.Database takes as arguments (1) a name (string) and (2) a data set (pandas dataframe)
biodata = db.Database('Neighbourhood_choice_data', df)

**`Biogeme variables`**<br>

Also, we need to create Biogeme objects for all the variables in our data set that we want to use in our model specifications.<br>
The `Variable()` function creates an object that represents the variable values and will allow it to be included in the model estimation function.

In [9]:
# We create Variable objects for each of the variables in the data set that we want to use in the model

# Attributes of alternative 1
STORES1     = Variable('STORES1')
TRANSPORT1  = Variable('TRANSPORT1')
CITY1       = Variable('CITY1')
NOISE1      = Variable('NOISE1')
GREEN1      = Variable('GREEN1')
FOREIGN1    = Variable('FOREIGN1')

# Attributes of alternative 2    
STORES2     = Variable('STORES2')
TRANSPORT2  = Variable('TRANSPORT2')
CITY2       = Variable('CITY2')
NOISE2      = Variable('NOISE2')
GREEN2      = Variable('GREEN2')
FOREIGN2    = Variable('FOREIGN2')
    
# Attributes of alternative 3
STORES3     = Variable('STORES3')
TRANSPORT3  = Variable('TRANSPORT3')
CITY3       = Variable('CITY3')
NOISE3      = Variable('NOISE3')
GREEN3      = Variable('GREEN3')
FOREIGN3    = Variable('FOREIGN3')

# The choice
CHOICE      = Variable('CHOICE')

# Socio-economic variables
AGE         = Variable('AGE')
WOMAN       = Variable('WOMAN')
HOMEOWNER   = Variable('HOMEOWNER')
CAROWNER    = Variable('CAROWNER')
RESPCITY    = Variable('RESPCITY')
JOB         = Variable('JOB')

**`The linear-additive utility specification`**

We start with defining the utility specification of the model that we wish to estimate.<br>

For that, we must define the parameters to be estimated and specify the utility functions.<br>

In the linear-additive RUM-MNL model, the observed utility *V* of alternative *i* is given by:

$V_i = \beta_1 \cdot \text{x}_{1i} + \beta_2 \cdot \text{x}_{2i} + \ldots + \beta_M \cdot \text{x}_{Mi}  $

Where:
- $\beta_1, \beta_2, \ldots, \beta_M$ denote the marginal utility associated with each attribute $m$.
- $\text{x}_{1i}, \text{x}_{2i}, \ldots, \text{x}_{Mi} $ correspond to the attribute values of alternative *i*.

The cell below creates this utility function in Biogeme.

In [10]:
# Give a name to the model    
model_name = 'Linear-additive RUM-MNL'

# Define the model parameters, using the function "Beta()", in which you must define:
# the name of the parameter,
# starting value, 
# lower bound,
# upper bound, 
# 0 or 1, indicating if the parameter must be estimated. 0 means estimated, 1 means fixed to the starting value. 
B_stores    = Beta('B_stores'   , 0, None, None, 0)
B_transport = Beta('B_transport', 0, None, None, 0)
B_city      = Beta('B_city'     , 0, None, None, 0)
B_noise     = Beta('B_noise'    , 0, None, None, 0)
B_green     = Beta('B_green'    , 0, None, None, 0)
B_foreign   = Beta('B_foreign'  , 0, None, None, 0)

# Define the utility functions
V1 = B_stores * STORES1 + B_transport * TRANSPORT1 + B_city * CITY1 + B_noise * NOISE1 + B_green * GREEN1 + B_foreign * FOREIGN1
V2 = B_stores * STORES2 + B_transport * TRANSPORT2 + B_city * CITY2 + B_noise * NOISE2 + B_green * GREEN2 + B_foreign * FOREIGN2
V3 = B_stores * STORES3 + B_transport * TRANSPORT3 + B_city * CITY3 + B_noise * NOISE3 + B_green * GREEN3 + B_foreign * FOREIGN3
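To make the formula concrete, the following sketch evaluates the linear-additive utility and the resulting MNL probabilities for a single choice task in plain NumPy. The parameter values and attribute levels are hypothetical, chosen only for illustration; they are not estimates from this lab, and this is not how Biogeme evaluates the expressions internally.

```python
import numpy as np

# Hypothetical marginal utilities (NOT estimates), in the order:
# stores, transport, city, noise, green, foreign
beta = np.array([-0.05, -0.03, -0.10, -0.30, 0.20, -1.00])

# One hypothetical choice task: rows are alternatives 1-3,
# columns follow the same order as beta
X = np.array([
    [ 5.0, 10.0, 2.0, 3.0, 2.0, 0.20],
    [10.0,  5.0, 1.0, 2.0, 3.0, 0.30],
    [15.0,  2.0, 4.0, 1.0, 4.0, 0.10],
])

# Linear-additive utility: V_i = sum over m of beta_m * x_mi
V = X @ beta

# MNL choice probabilities: P_i = exp(V_i) / sum_j exp(V_j)
P = np.exp(V) / np.exp(V).sum()

print('Utilities:    ', V.round(3))
print('Probabilities:', P.round(3))
```

The alternative with the highest utility receives the highest choice probability, but every alternative keeps a non-zero probability.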

**`Estimation function`** 

Now that we have specified the model, we need to estimate it. To do so, we create the following function `estimate_mnl` which we can re-use.

The estimation function takes the following inputs:
* Systematic utilities function (**V1, V2, V3**)
* Chosen alternatives array (**CHOICE**)
* Database which contains the relevant attributes and characteristics (**database**)
* Model name (**"string"**)


In [11]:
# This function estimates the MNL model and returns the estimation results
# input values: utilities for all three alternatives, the choices, the database, and the model name

def estimate_mnl(V1,V2,V3,CHOICE,database,name):
    
    # Create a dictionary to list the utility functions with the numbering of alternatives
    V = {1: V1, 2: V2, 3: V3}
        
    # Create a dictionary called av to describe the availability conditions of each alternative, where 1 indicates that the alternative is available, and 0 indicates that the alternative is not available.
    # This shows that all alternatives were available to all respondents. 
    av = {1: 1, 2: 1, 3: 1} 

    # Define the choice model: The function models.logit() computes the MNL choice probabilities of the chosen alternative given the V. 
    prob = models.logit(V, av, CHOICE)

    # Define the log-likelihood   
    LL = log(prob)
   
    # Create the Biogeme object containing the object database and the formula for the contribution to the log-likelihood of each row using the following syntax:
    biogeme = bio.BIOGEME(database, LL)
    
    # The following syntax passes the name of the model:
    biogeme.modelName = name

    # Some object settings regarding whether to save the results and outputs 
    biogeme.generate_pickle = False
    biogeme.generate_html = False
    biogeme.save_iterations = False
    

    # Syntax to calculate the null log-likelihood. The null-log-likelihood is used to compute the rho-square 
    biogeme.calculate_null_loglikelihood(av)

    # This line starts the estimation and returns the results object.
    results = biogeme.estimate()
     
    return results
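As a rough illustration of the quantity being maximised, the sketch below computes an MNL log-likelihood by hand in NumPy: the sum over observations of the log-probability of the chosen alternative. The helper `mnl_loglik` and the toy utilities are hypothetical and only meant to show the idea; Biogeme's own implementation differs.

```python
import numpy as np

def mnl_loglik(V, chosen):
    """Sum of log-probabilities of the chosen alternatives.

    V: (N, J) array of utilities; chosen: length-N array with
    1-based alternative numbers (as in the CHOICE column).
    """
    V = np.asarray(V, dtype=float)
    # Subtract the row maximum for numerical stability (probabilities are unchanged)
    V_shift = V - V.max(axis=1, keepdims=True)
    log_P = V_shift - np.log(np.exp(V_shift).sum(axis=1, keepdims=True))
    rows = np.arange(V.shape[0])
    return log_P[rows, np.asarray(chosen) - 1].sum()

# Hypothetical utilities for 4 observations and 3 alternatives
V_toy = np.array([[ 0.2, -0.1,  0.0],
                  [-0.5,  0.3,  0.1],
                  [ 0.0,  0.0,  0.4],
                  [ 0.1,  0.6, -0.2]])
chosen_toy = np.array([1, 2, 3, 2])

LL_toy = mnl_loglik(V_toy, chosen_toy)

# Null model: all utilities are zero, so each alternative has probability 1/3
LL_null = mnl_loglik(np.zeros_like(V_toy), chosen_toy)

print(f'LL (toy utilities): {LL_toy:.3f}')
print(f'LL (null model):    {LL_null:.3f}')
```

With all three alternatives always available, the null log-likelihood reduces to N · ln(1/3), which is the benchmark used for the rho-square.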

**`Estimation`**

We have created a biogeme database (biodata); we have defined our utility functions; and, we have created an estimation function to estimate MNL models (estimate_mnl).<br> 
Now, we only need to invoke the estimation by bringing these ingredients together. We pass the model specifications and the database to the estimation function. The function `estimate_mnl` returns an object which contains the estimation results.

In [12]:
# Estimate the model
results_MNL = estimate_mnl(V1,V2,V3,CHOICE,biodata,model_name)

**`View estimation results`**<br>

**Estimation statistics**

We can display a summary of the estimation statistics using `results.short_summary()` in which we see: 

* `Number of parameters`: Parameters being estimated.
* `Sample size`: The number of observations in the data set (used for estimating the model).
* `Excluded data`: The number of observations in the data set that were excluded for estimation.
* `Null log-likelihood`: The log-likelihood of the null model.
* `Final log-likelihood`: The log-likelihood of the estimated model.
* `Likelihood ratio test (null)`: A statistical test comparing the null model's likelihood with the likelihood of the estimated model. 
* `Rho square (null)`: Quantifies how well the model explains the data compared to the null model.
* `Rho bar square (null)`: Quantifies how well the model explains the data compared to the null model while penalising for the number of model parameters.
* `Akaike Information Criterion (AIC)`: A measure that shows the goodness of fit of the model, where lower AIC values indicate better models.
* `Bayesian Information Criterion (BIC)`: Similar to AIC, but it penalises model complexity more heavily; lower values indicate a better trade-off between fit and complexity.
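The statistics in this list are all simple functions of the log-likelihoods, the number of parameters and the sample size. The sketch below uses the usual textbook definitions, which we assume correspond to what Biogeme reports; all input numbers are hypothetical and not results from this lab.

```python
import math

# Hypothetical inputs, for illustration only (not results from this lab)
N = 1000            # number of observations
J = 3               # alternatives per choice task, all available
K = 6               # number of estimated parameters
LL_final = -950.0   # final log-likelihood

# Null model: every alternative is chosen with probability 1/J
LL_null = N * math.log(1 / J)

# Likelihood ratio test statistic against the null model
LR_null = -2 * (LL_null - LL_final)

# Rho-square, and rho-bar-square which penalises the number of parameters
rho_sq = 1 - LL_final / LL_null
rho_bar_sq = 1 - (LL_final - K) / LL_null

# Information criteria (lower is better)
aic = 2 * K - 2 * LL_final
bic = K * math.log(N) - 2 * LL_final

print(f'Null LL = {LL_null:.2f}, LR = {LR_null:.2f}')
print(f'rho2 = {rho_sq:.4f}, rho-bar2 = {rho_bar_sq:.4f}')
print(f'AIC = {aic:.2f}, BIC = {bic:.2f}')
```

Because ln(N) exceeds 2 for any realistic sample size, BIC charges more per parameter than AIC.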

**Parameter estimates**

We can display the estimated parameters using `results.get_estimated_parameters()`. Besides the maximum likelihood estimates, we also see the associated standard errors, t-test values and p-values. The t-test values and p-values indicate whether each effect is statistically significant, i.e. whether the relationship found in the sample can be generalised to the population.

In [None]:
# Print the estimation statistics
print(results_MNL.short_summary())

# Get the model parameters in a pandas table and  print it
beta_hat_MNL = results_MNL.get_estimated_parameters()
statistics_MNL = results_MNL.get_general_statistics()
print(beta_hat_MNL)

# Store the LL of the MNL model for later use
LL_MNL = results_MNL.data.logLike
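To see how a t-test value and p-value follow from an estimate and its standard error, here is a minimal sketch using only the standard library. The estimate and standard error are hypothetical, and the p-value uses the standard normal approximation for a two-sided test against zero, which we assume is close to what is reported for large samples.

```python
import math

# Hypothetical estimate and standard error (not results from this lab)
beta_hat = -0.30
std_err = 0.10

# t-test value for the null hypothesis beta = 0
t_stat = beta_hat / std_err

# Two-sided p-value under the standard normal approximation:
# p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f't = {t_stat:.2f}, p = {p_value:.4f}')
print('Significant at the 5% level:', p_value < 0.05)
```

As a rule of thumb, an absolute t-test value above 1.96 corresponds to significance at the 5% level.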

### `Exercise 2:  Interpreting modelling outcomes`<br>

Interpret the outcomes of your MNL model by answering the following questions:

`A` Did the model converge?<br>

`B` Are all estimated parameters of the expected sign?<br>

`C` Are they significant at the 5% level? <br>

`D` Based on the Likelihood Ratio Test: is the estimated model statistically superior to a model that determines choices by ‘rolling a die’ (i.e., the Null model)? A [Chi-Square Distribution Table](https://github.com/SEN1221TUD/Q2_2024/blob/main/Lab_sessions/Lab_session_01/data/Chi-Square%20Distribution%20Table.pdf) is available for reference.<br>

In [14]:
# Your code and answers

**`Attribute importance`**<br>

Next, we explore the importance of each attribute to the choice behaviour. To do so, we assess how much the model fit deteriorates when we fix one of the betas to zero. We do this for all six betas. A large drop in model fit indicates that the attribute is highly important. After all, it means that without access to that attribute, the model is less capable of explaining the choice behaviour.

In [None]:
# Create a list with the parameter names
param_list = ['B_stores','B_transport','B_city','B_noise','B_green','B_foreign']

# Create an empty dataframe with the parameter names as index (rows) to store the results
df_out = pd.DataFrame(index = param_list + ['LL'])

# Loop over the parameters
for param_fix in param_list:
    
    model_name = f'linear-additive RUM-MNL with {param_fix} fixed to zero'
   
    # Parameters to be estimated
    # Note that int(param_fix == 'B_stores') returns 1 if param_fix is 'B_stores', and 0 otherwise
    B_stores    = Beta('B_stores'   , 0, None, None, int(param_fix == 'B_stores'))
    B_transport = Beta('B_transport', 0, None, None, int(param_fix == 'B_transport'))
    B_city      = Beta('B_city'     , 0, None, None, int(param_fix == 'B_city'))
    B_noise     = Beta('B_noise'    , 0, None, None, int(param_fix == 'B_noise'))
    B_green     = Beta('B_green'    , 0, None, None, int(param_fix == 'B_green'))
    B_foreign   = Beta('B_foreign'  , 0, None, None, int(param_fix == 'B_foreign'))
    
    # Definition of the utility functions
    V1 = B_stores * STORES1 + B_transport * TRANSPORT1 + B_city * CITY1 + B_noise * NOISE1 + B_green * GREEN1 + B_foreign * FOREIGN1
    V2 = B_stores * STORES2 + B_transport * TRANSPORT2 + B_city * CITY2 + B_noise * NOISE2 + B_green * GREEN2 + B_foreign * FOREIGN2
    V3 = B_stores * STORES3 + B_transport * TRANSPORT3 + B_city * CITY3 + B_noise * NOISE3 + B_green * GREEN3 + B_foreign * FOREIGN3

    # Estimate the model
    results = estimate_mnl(V1,V2,V3,CHOICE,biodata,model_name)

    # Store the parameter estimates in a dataframe
    col = param_fix + '_fixed'
    df_out.loc[:,col] = results.get_beta_values()

    # Store the log-likelihood
    df_out.loc['LL',col] = results.get_general_statistics()['Final log likelihood'][0]

# Show the dataframe with the results
df_out
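Once the restricted log-likelihoods are available, the drop in fit relative to the full model can be computed and sorted in a few lines. The sketch below shows the mechanics with hypothetical log-likelihood values for three parameters only; the numbers are made up and say nothing about the actual ranking in this lab.

```python
# Hypothetical log-likelihoods (made-up numbers, not results from this lab)
LL_full = -1000.0
LL_restricted = {
    'B_stores': -1010.0,
    'B_noise': -1080.0,
    'B_green': -1025.0,
}

# Drop in log-likelihood when each parameter is fixed to zero
LL_drop = {name: LL_full - ll for name, ll in LL_restricted.items()}

# Rank from largest to smallest drop in model fit
ranking = sorted(LL_drop, key=LL_drop.get, reverse=True)

for name in ranking:
    print(f'{name}: drop in LL = {LL_drop[name]:.1f}')
```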

### `Exercise 3: Attribute importance`<br>

`A` List the attributes from most important to least important, based on their impact on the model fit.

`B` Use the Likelihood Ratio Statistic (LRS) to test, for each of the six restricted models, whether the unrestricted model (i.e. the linear-additive RUM-MNL model with LL = -8403.772 and 6 parameters) fits the data significantly better than the restricted model.<br>
<br>
The LRS is given by: <br>
<br>
$LRS = -2 \left[LL(\beta_R)-LL(\beta_U)\right] $<br>
<br>
Where $\beta_R$ and $\beta_U$ correspond to the parameters estimated using a restricted and unrestricted model, respectively.<br>

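You need to compare the LRS with the critical $\chi^2$ value associated with a specific significance level and with degrees of freedom $df$ equal to the number of restrictions (here $df = 1$, since each restricted model fixes a single parameter). If $LRS > \chi^2_{df}$, you can conclude that the unrestricted model explains the data significantly better than the restricted model.<br>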

Use α = 0.01 as the significance level. A [Chi-Square Distribution Table](https://github.com/SEN1221TUD/Q2_2024/blob/main/Lab_sessions/Lab_session_01/data/Chi-Square%20Distribution%20Table.pdf) is available for reference.<br>
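The mechanics of the test can be sketched as follows, using hypothetical log-likelihoods rather than the values from this lab. The critical value 6.635 is the tabulated $\chi^2$ value for one degree of freedom at α = 0.01; the closed-form p-value used here is valid for one degree of freedom only.

```python
import math

# Hypothetical log-likelihoods (not results from this lab)
LL_U = -1000.0   # unrestricted model
LL_R = -1004.0   # restricted model (one parameter fixed to zero)

# Likelihood ratio statistic
LRS = -2 * (LL_R - LL_U)

# Critical chi-square value for df = 1 and alpha = 0.01 (from a chi-square table)
CHI2_CRIT_DF1_ALPHA01 = 6.635

# p-value for df = 1 only: P(chi2_1 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(LRS / 2))

# True means the restriction is rejected in favour of the unrestricted model
reject_restricted = LRS > CHI2_CRIT_DF1_ALPHA01

print(f'LRS = {LRS:.3f}, p = {p_value:.4f}')
print('Unrestricted model significantly better:', reject_restricted)
```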

In [16]:
# Your code and answers

### `3. Market share forecasting for Zurich`

Suppose the municipality of Zurich plans to redevelop one of its least accessible neighbourhoods in the North: `Affoltern, Oerlikon, Seebach` (11). Figure 1 shows the neighbourhoods of Zurich. The municipality plans to develop a new shopping area and public transport hub in the neighbourhood. The idea is that this will make the neighbourhood more attractive to live in. However, these plans are costly. Therefore, to make an informed decision, the municipality needs a good understanding of the impact of increasing the accessibility of `Affoltern, Oerlikon, Seebach` on the residential demand in that neighbourhood. <br>

The **current** situation - in terms of the attributes we have looked at in this study - is shown in the table below.<br>
**After** the redevelopment, the average distance to grocery stores and public transport is 5 minutes (STORES = 5, TRANSPORT = 5) instead of 15 minutes.


| neighbourhood name                        | ID         | CITY    | FOREIGN  | GREEN  | NOISE  | STORES   | TRANSPORT  |
|-------------------------------------------|------------|---------|----------|--------|--------|----------|------------|
| Altstadt                                  | 1          | 1       | 0.22     | 2      | 1      | 2        | 5          |
| Enge, Wollishofen, Leimbach               | 2          | 4       | 0.16     | 4      | 3      | 5        | 10         |
| Wiedikon                                  | 3          | 3       | 0.23     | 3      | 3      | 10       | 15         |
| Aussersihl                                | 4          | 2       | 0.27     | 1      | 4      | 5        | 5          |
| Industriequartier                         | 5          | 2       | 0.18     | 1      | 4      | 10       | 2          |
| Oberstrass, Unterstrass                   | 6          | 3       | 0.16     | 3      | 3      | 10       | 10         |
| Fluntern, Hottingen, Hirslanden, Witikon  | 7          | 4       | 0.16     | 4      | 1      | 10       | 15         |
| Riesbach                                  | 8          | 3       | 0.18     | 3      | 3      | 5        | 15         |
| Altstetten, Albisrieden                   | 9          | 4       | 0.31     | 3      | 2      | 5        | 10         |
| Wipkingen, Höngg                          | 10         | 4       | 0.25     | 4      | 3      | 15       | 10         |
| Affoltern, Oerlikon, Seebach              | 11         | 4       | 0.33     | 3      | 3      | 15       | 15         |
| Schwamendingen                            | 12         | 4       | 0.36     | 4      | 3      | 5        | 10         |

![Zurich](data/zurich.png)

To inform this policy decision, we will use our estimated choice models. To do so, we take the following steps:
1. We use our estimated MNL model to compute the choice probabilities of the neighbourhoods for the **current situation**
2. We use our estimated MNL model to compute the choice probabilities of the neighbourhoods for the **future situation**
3. We compare the probabilities between the present and future.

In [None]:
# Load the data as a pandas dataframe
data_path = Path('data/zurich_data.csv')
df_zurich = pd.read_csv(data_path, index_col=0)

# Show the dataframe
df_zurich

In [18]:
# Manually compute the utilities for each neighbourhood alternative in the present situation
# Let's use our benchmark linear-additive MNL model
V_zurich_present =  (beta_hat_MNL['Value']['B_city']      * df_zurich['CITY'] +
                     beta_hat_MNL['Value']['B_foreign']   * df_zurich['FOREIGN'] + 
                     beta_hat_MNL['Value']['B_green']     * df_zurich['GREEN'] + 
                     beta_hat_MNL['Value']['B_noise']     * df_zurich['NOISE'] + 
                     beta_hat_MNL['Value']['B_stores']    * df_zurich['STORES'] + 
                     beta_hat_MNL['Value']['B_transport'] * df_zurich['TRANSPORT'])

# Compute the market shares using the logit formula: Pi = exp(Vi)/sum(exp(Vj)) 
P_present = np.exp(V_zurich_present)/np.sum(np.exp(V_zurich_present))
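The logit formula can also be wrapped in a small reusable function. The sketch below is one possible variant: the helper `logit_shares` is hypothetical (not part of the lab code), and it subtracts the largest utility before exponentiating, which leaves the shares unchanged but avoids overflow for large utilities. The utilities are made-up values.

```python
import numpy as np

def logit_shares(V):
    """Logit market shares P_i = exp(V_i) / sum_j exp(V_j), computed stably."""
    V = np.asarray(V, dtype=float)
    # Subtracting the maximum does not change the shares, but prevents overflow
    expV = np.exp(V - V.max())
    return expV / expV.sum()

# Hypothetical utilities for four alternatives (not the Zurich data)
V_toy = np.array([-1.2, -0.4, -2.0, -0.9])

shares = logit_shares(V_toy)
naive = np.exp(V_toy) / np.exp(V_toy).sum()

print('Shares [%]:', (shares * 100).round(2))
print('Matches the direct formula:', np.allclose(shares, naive))
```

For utilities of the magnitude typically seen in choice models, the direct formula works fine; the shifted version mainly matters when utilities become very large.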

In [19]:
# Manually compute the market shares for each neighbourhood alternative in the future scenario

# Create a copy of the dataframe, and change the accessibility of stores and transport in the neighbourhood 'Affoltern, Oerlikon, Seebach'
df_zurich_future = df_zurich.copy()
df_zurich_future.loc['Affoltern, Oerlikon, Seebach','STORES'] = 5
df_zurich_future.loc['Affoltern, Oerlikon, Seebach','TRANSPORT'] = 5

# Manually compute the utilities for each neighbourhood alternative in the future situation
V_zurich_future = (beta_hat_MNL['Value']['B_city']      * df_zurich_future['CITY'] +
                   beta_hat_MNL['Value']['B_foreign']   * df_zurich_future['FOREIGN'] + 
                   beta_hat_MNL['Value']['B_green']     * df_zurich_future['GREEN'] + 
                   beta_hat_MNL['Value']['B_noise']     * df_zurich_future['NOISE'] + 
                   beta_hat_MNL['Value']['B_stores']    * df_zurich_future['STORES'] + 
                   beta_hat_MNL['Value']['B_transport'] * df_zurich_future['TRANSPORT'])

# Compute the market shares using the logit formula
P_future = np.exp(V_zurich_future)/np.sum(np.exp(V_zurich_future))

In [None]:
# Create a dataframe with the market shares in the present and future scenarios
df_zurich_marketshares = pd.DataFrame({'Present [%]': P_present*100, 'Future [%]': P_future*100})

# Show the dataframe
df_zurich_marketshares.round(2)

The table shows that the market share of `Affoltern, Oerlikon, Seebach` increases substantially: it more than doubles.<br> Hence, the model suggests that increasing accessibility will make the neighbourhood considerably more attractive. 


### `Exercise 4: Forecasting`

`A` Determine which neighbourhoods lose the most market share in (a) absolute and (b) relative terms.<br>

`B` Reflect on the behavioural realism of your results, especially w.r.t. the relative changes in market shares. <br>

`C` Currently, `Affoltern, Oerlikon, Seebach` is a quiet urban area. The creation of a public transport hub there will likely increase the average noise levels. Our earlier analysis showed that noise is an important factor in the residential location choice. Therefore, a change in noise levels also needs to be taken into account. <br>

* Create a plot showing how increasing the noise level from 1 to 4 reduces the market share of `Affoltern, Oerlikon, Seebach`. Thus, the *x*-axis shows the noise level, and the *y*-axis the market share.<br>

* Based on your results, what would you recommend to the planners of Zurich?

In [21]:
# Your code and answers

## END