In [3]:
# We will import the LinearModels module
# But first we need to make sure that we look for modules one folder up.
from sys import path
path.append('../')
import NonLinearModels_ante as nlm
import LinearModels as lm

In [4]:
import numpy as np
from numpy import linalg as la
from scipy.stats import norm
from scipy import optimize
from tabulate import tabulate

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Problem set 08: Non-linear models
## Labor participation of married women

The goal of this week's problem set is to investigate the labor participation of 
married women, using three different types of binary response models.
Binary response models are relevant when the dependent variable $y$ has two possible outcomes, 
e.g., $y=1$ if a person participates in the labor force, and $y=0$ if she does not.
The three models that you are asked to estimate are the Linear Probability Model (LPM), 
the Probit model and the Logit model. 

_Note:_ This week, most of the code has been created for you - you just need to fill in some blanks in the module `NonLinearModel.py`. To estimate the LPM-model using OLS, we will use the code that we have already used in the course, which is in the `LinearModels.py` file, and is preloaded as `lm`.

## Data

To conduct your analysis, you will use data coming from the following article, and reproduce
some of its results: 

> Michael Gerfin (1996): "Parametric and Semi-Parametric Estimation of the Binary Response Model of Labour Market Participation", _Journal of Applied Econometrics_ , Vol. 11, Issue 3, pp. 321-339, [DOI link](https://doi.org/10.1002/(SICI)1099-1255(199605)11:3%3C321::AID-JAE391%3E3.0.CO;2-K)

This article compares parametric and semiparametric methods for the estimation of binary choice
models, using two different data sets for Swiss and German women. In this assignment, you will
only work with the Swiss data, and implement parametric methods - we will discuss semiparametric 
methods in a later lecture. 

The data set $\texttt{swiss.txt}$ contains information about 872 women, of which 401
participate in the labor market (The data set was obtained from the Journal of Applied Econometrics Data Archive
at http://qed.econ.queensu.ca/jae/1996-v11.3/gerfin/.


The variables are defined in the table below (see, also, section 3 page 326 of the article).

|Var | Definition |
|--|--|
| $\texttt{LFP}     $|  = 1 if in labor force, 0 otherwise | 
| $\texttt{AGE}     $|  age in years (divided by 10) | 
| $\texttt{EDUC}    $|  number of years of formal education | 
| $\texttt{NYC}     $|  number of young children | 
| $\texttt{NOC}     $|  number of older children | 
| $\texttt{NLINC}   $|  logarithm of yearly non-labor income | 
| $\texttt{FOREIGN} $|  = 1 if permanent foreign resident, 0 otherwise |



### The following cells load the data for you.

In [5]:
# Load data
data =  np.loadtxt('swiss.txt')
n = data.shape[0]

lfp = data[:, 0].reshape(-1, 1)
nlinc = data[:, 1].reshape(-1, 1)
age = data[:, 2].reshape(-1, 1)
agesq = np.power(age, 2).reshape(-1, 1)
educ = data[:, 3].reshape(-1, 1)
nyc = data[:, 4].reshape(-1, 1)
noc = data[:, 5].reshape(-1, 1)
foreign = data[:, 6].reshape(-1, 1)

In [6]:
# Declare variables
y = lfp 
ones = np.ones((n, 1))

x = np.hstack((ones, age, agesq, educ, nyc, noc, nlinc, foreign))
k = x.shape[1]

In [7]:
# Declare labels
y_lab = 'lfp'
x_lab = [
    'const', 'age', 'agesq', 'educ', 'nyc', 'noc', 'nlinc', 'foreign'
]

### Question 1: Estimate model using LPM
We model Labour participation of females using an LPM model, which we estimate using OLS. Use the given `lm` module, and print it out in a nice table.

In [10]:
ols_results =  None # Fill in - Use the lm module and estimate y on x using OLS
# Print out in a nice table

### Question 2
Estimate the binary outcome using the probit model, but first, some theoretical foundation.

For binary outcome data the dependent variable takes on the values,

$$
y=\left\{ \begin{array}{c}
\hspace{-1.75em}1\quad\text{with prob. }p,\\
0\quad\text{with prob. }1-p.
\end{array}\right.
$$ 

We can then specify a regression model by
parameterizing the probability $p$ to depend on a regressor vector, $x$,
which is $N\times K$ and a parameter vector, $\beta$, which is
$K\times 1$. The conditional probability is given as,
$$
p_{i}=P\left(y_{i}=1\left|x\right.\right)=G\left(x_{i}^{\prime}\beta\right)
$$


### The Probit Model

For the Probit model the response probability is non-linear,
$$
P\left(y_{i}=1\left|x\right.\right)=G\left(x^{\prime}\beta\right)=\Phi\left(x^{\prime}\beta\right)=\int_{-\infty}^{x^{\prime}\beta}\phi\left(z\right)dz, \tag{1}
$$
where $\Phi\left(\cdot\right)$ is the standard normal cdf, with
derivative,
$$
\phi\left(z\right)=\left(\frac{1}{\sqrt{2\pi}}\right)\exp\left(\frac{-z^{2}}{2}\right)
$$
which is the standard normal density function.

The outcome in binary models is Bernoulli distributed (the binomial
distribution with only one trial). The probability mass function for
$y_{i}$ is
$$
y_{i}=f\left(y_{i}\left|x_{i}\right.\right)=p_{i}^{y_{i}}\left(1-p_{i}\right)^{1-y_{i}}\quad y_{i} \in \{0,1\}.
$$
The log-likelihood contribution for observation $i$ is,
$$
L\left(\beta\right)=y_{i}\log G\left(x_{i}^{\prime}\beta\right)+\left(1-y_{i}\right)\log\left(1-G\left(x_{i}^{\prime}\beta\right)\right)
$$
And the log--likelihood function is, 
$$
\begin{aligned}
L_{N}\left(\beta\right)=\sum_{i=1}^{N}\left\{ y_{i}\log G\left(x_{i}^{\prime}\beta\right)+\left(1-y_{i}\right)\log\left(1-G\left(x_{i}^{\prime}\beta\right)\right)\right\}
\end{aligned}
$$

In [11]:
# Fill in the empty parts in the probit_citerion() function. This takes some best guess beta parameters (in this case the beta parameters from a LPM is a good first guess), and the data values y and x. It returns a vector of likelihood 

# z should be the input of G, what does eq. (1) suggest that z is?
# What does eq. (1) suggets that the functional form of G?
# For G you can use scipy's norm.cdf()
# Finaly, write up the function for the likelihood contribution and return it

# You can check with the cell below that you have written the probit_criterion correctly.

In [12]:
np.isclose(np.sum(nlm.probit_criterion(ols_results['b_hat'], y, x)), -633.1)

TypeError: 'NoneType' object is not subscriptable

In [None]:
# You need to finish the estimate function in the nlm module. Try to do it yourself, but you can look at the previous problem set if you are stuck.

In [13]:
probit_result = nlm.estimate(
    nlm.probit_criterion, ols_results['b_hat'], y, x
)

TypeError: 'NoneType' object is not subscriptable

In [14]:
nlm.print_table(
    (y_lab, x_lab), probit_result, 
    title='Probit results', floatfmt='.3f'
)

NameError: name 'probit_result' is not defined

Your table should look aprox. this:

Probit results <br>
Dependent variable: lfp

|         |   Beta |    Se |   t-values |
|---------|--------|-------|------------|
| const   |  3.749 | 1.495 |      2.508 |
| age     |  2.075 | 0.417 |      4.978 |
| agesq   | -0.294 | 0.051 |     -5.783 |
| educ    |  0.019 | 0.018 |      1.062 |
| nyc     | -0.714 | 0.096 |     -7.417 |
| noc     | -0.147 | 0.050 |     -2.922 |
| nlinc   | -0.667 | 0.137 |     -4.861 |
| foreign |  0.714 | 0.121 |      5.920 |
In 36 iterations and 360 function evaluations.

### Question 3
Compare your results to those published in Gerfin (1996, p. 327 Table I) for the Probit model, see the table below. Interpret and compare the results from the two estimation approaches, both from a statistical and an economic point of view.

| Variable | $\hat{\beta}$ | s.e |
|----|---|---|
| $\texttt{CONST}   $|  3.75	| (1.41)  |
| $\texttt{AGE}     $|  2.08	| (0.41)  |
| $\texttt{AGESQ}   $|  -0.29|& (0.05) | 
| $\texttt{EDUC}    $|  0.02	| (0.02)  |
| $\texttt{NYC}     $|  -0.71|& (0.10) | 
| $\texttt{NOC}     $|  -0.15|& (0.05) | 
| $\texttt{NLINC}   $|  -0.67|& (0.13) | 
| $\texttt{FOREIGN} $|   0.71| & (0.12)|

### Question 4
Estimate the logit model with maximum likelihood, using the same explanatory variables as in
**Question 3**.

### The Logit Model

For the Logit model the response probability is non-linear,
$$
P\left(y_{i}=1\left|x\right.\right)=G\left(x^{\prime}\beta\right)=G\left(x^{\prime}\beta\right)=\frac{\exp\left(x^{\prime}\beta\right)}{1+\exp\left(x^{\prime}\beta\right)} \tag{2}
$$
The outcome in binary models is Bernoulli distributed (the binomial
distribution with only one trial). The probability mass function for
$y_{i}$ is,
$$
y_{i}=f\left(y_{i}\left|x_{i}\right.\right)=p_{i}^{y_{i}}\left(1-p_{i}\right)^{1-y_{i}}\quad y_{i} \in \{0,1\}
$$
with
$p_{i}=G\left(x_i^{\prime}\beta\right)=\frac{\exp\left(x_i^{\prime}\beta\right)}{1+\exp\left(x_i^{\prime}\beta\right)}$.
The log-likelihood contribution for observation $i$ is,
$$
L\left(\beta\right)=y_{i}\log G\left(x_{i}^{\prime}\beta\right)+\left(1-y_{i}\right)\log\left(1-G\left(x_{i}^{\prime}\beta\right)\right)
$$
And the log--likelihood function is, 
$$
\begin{aligned}
L_{N}\left(\beta\right)=\sum_{i=1}^{N}\left\{ y_{i}\log G\left(x_{i}^{\prime}\beta\right)+\left(1-y_{i}\right)\log\left(1-G\left(x_{i}^{\prime}\beta\right)\right)\right\}
\end{aligned}
$$

In [15]:
# Fill in the empty parts in the logit_criterion() function. This takes some best guess beta parameters (in this case the beta parameters from a LPM is a good first guess), and the data values y and x. It returns a vector of likelihood 

# z should be the input of G, what does eq. (3) suggest that z is?
# What does eq. (2) suggets that the functional form of G?
# Finaly, write up the function for the likelihood contribution and return it

# You can check with the cell below that you have written the logit_criterion correctly.

In [16]:
np.isclose(np.sum(nlm.logit_criterion(ols_results['b_hat'], y, x)), -606.5379)

TypeError: 'NoneType' object is not subscriptable

In [13]:
logit_result = nlm.estimate(
    # Fill in
)

TypeError: estimate() missing 4 required positional arguments: 'func', 'theta0', 'y', and 'x'

In [15]:
nlm.print_table(
    (y_lab, x_lab), logit_result, 
    title='Logit results', floatfmt='.3f'
)

Logit results
Dependent variable: lfp

           Beta     Se    t-values
-------  ------  -----  ----------
const     6.196  2.482       2.496
age       3.437  0.709       4.846
agesq    -0.488  0.087      -5.592
educ      0.033  0.030       1.082
nyc      -1.186  0.165      -7.201
noc      -0.241  0.083      -2.894
nlinc    -1.104  0.230      -4.792
foreign   1.168  0.203       5.769
In 45 iterations and 441 function evaluations.


Your table should approx. look like this:

Logit results <br>
Dependent variable: lfp

|         |   Beta |    Se |   t-values |
|---------|--------|-------|------------|
| const   |  6.196 | 2.482 |      2.496 |
| age     |  3.437 | 0.709 |      4.846 |
| agesq   | -0.488 | 0.087 |     -5.592 |
| educ    |  0.033 | 0.030 |      1.082 |
| nyc     | -1.186 | 0.165 |     -7.201 |
| noc     | -0.241 | 0.083 |     -2.894 |
| nlinc   | -1.104 | 0.230 |     -4.792 |
| foreign |  1.168 | 0.203 |      5.769 |
In 45 iterations and 441 function evaluations.

### Question 5
Calculate the ratio between the following beta coefficients:
- $\frac{\hat{\beta}_{Logit}}{\hat{\beta}_{OLS}}$
- $\frac{\hat{\beta}_{Probit}}{\hat{\beta}_{OLS}}$
- $\frac{\hat{\beta}_{Logit}}{\hat{\beta}_{Probit}}$

The explanation for why the coefficients are not equal for the LPM, the
Logit and the Probit model is that these three models use different link
functions for the probabilities and, as mentioned by Cameron & Trivedi,
it makes more sense to compare the marginal effects across the three
models. More specifically, the location and scale are set differently in
these models, leading to different parameter estimates. A rule of thumb
is that, 

$$
\begin{aligned}
\hat{\beta}_{Logit}&\simeq4\hat{\beta}_{OLS}\\
\hat{\beta}_{Probit}&\simeq2.5\hat{\beta}_{OLS}\\
\hat{\beta}_{Logit}&\simeq1.6\hat{\beta}_{Probit}
\end{aligned}
$$

In [16]:
# Use the dictionaries from the three estimations to calculate the ratios.
# The vectors might not line up, and you might have to transpose some of them.
# Then print them out, do they align with the rule of thumb?

[[3.7244563  5.03506263 5.02590696 4.90716746 4.92771395 4.88624976
  5.1873801  4.68091829]]
[[2.2534614  3.04053118 3.0336372  2.88361919 2.96923174 2.98085622
  3.13346871 2.86207081]]
[1.65277128 1.65598125 1.65672644 1.70173908 1.65959224 1.63921015
 1.65547531 1.63550052]


### Question 6
Calculate the marginal effect of taking one additional year of
education on the probability of participating in the labor market for a
woman with the following characteristics (Remember that the variable $\texttt{AGE}$ is divided by 10, so 2.5 does make sense):

$\texttt{AGE} $= 2.5,
$\texttt{EDUC}$ = 10,
$\texttt{NYC} $= 1,
$\texttt{NOC} $= 0,
$\texttt{NLINC}$ = 10,
$\texttt{FOREIGN}$ = 0. 

Consider education as a continuous variable. The marginal effect should be calculated for the LPM, the probit and the logit models.

The partial (also called marginal) effects in the Logit and Probit
models depend upon the regressors, $x_{k}$. For continuous variables the
partial effects are given as,
$$
\frac{\partial P\left(y_{i}=1\left|x_{i}\right.\right)}{\partial x_{ik}}=\frac{\partial p_{i}}{\partial x_{ik}}=\frac{\partial G\left(x_{i}^{\prime}\beta\right)}{\partial x_{ik}^{\prime}\beta}\cdot\frac{\partial x_{i}^{\prime}\beta}{\partial x_{ik}^{\prime}}=g\left(x_{i}^{\prime}\beta\right)\beta_{k} \tag{3}
$$
where $g\left(z\right)=\frac{\partial G\left(z\right)}{\partial z}$ and
where
$g\left(z\right)=\frac{\exp\left(z\right)}{\left(1+\exp\left(z\right)\right)^{2}}$
for the Logit model and
$g\left(z\right)=\frac{1}{\sqrt{2\pi}}\exp\left(\frac{-z^{2}}{2}\right)$
for the Probit model.

In [17]:
# Let us make a vector of the values we want to investigate
x_me = np.array([1.0, 2.5, 2.5**2, 10, 1, 0, 10, 0]).reshape(1, -11)

# Let us get the beta coefficients that we are interested in.
b_pr = probit_result.get('b_hat').reshape(-1, 1)
b_lg = logit_result.get('b_hat').reshape(-1, 1)

# Calculate the marginal effects bot for the logit and probit.
# For the probit, you can use norm.pdf for g
me_educ_pr =  # Use norm.pdf as g() in eq. (3)

# For the logit, g should be straight forward using the given function.
me_educ_lg =  # Use the given g() for the logit formula.

print(me_educ_pr)
print(me_educ_lg)

[[0.00762391]]
[[0.0081144]]


### Question 7
Calculate the marginal effect of being a permanent foreign resident on the probability of participating in the labor market.


For discrete variables the partial effects are given as,
$$
G\left(\beta_{0}+\beta_{1}x_{1}+\cdots+\beta_{k-1}x_{k-1}+\beta_{k}\right)-G\left(\beta_{0}+\beta_{1}x_{1}+\cdots+\beta_{k-1}x_{k-1}\right)
$$

where

$G\left(x^{\prime}\beta\right)=\frac{\exp\left(x^{\prime}\beta\right)}{1+\exp\left(x^{\prime}\beta\right)}$

for the Logit model and

$G\left(x^{\prime}\beta\right)=\Phi\left(x^{\prime}\beta\right)=\int_{-\infty}^{x^{\prime}\beta}\phi\left(z\right)dz$

and

$\phi\left(z\right)=\left(\frac{1}{\sqrt{2\pi}}\right)\exp\left(\frac{-z^{2}}{2}\right)$

for the Probit model.


In [18]:
# We will look at the same values as previously, but we want to look at the difference for foreign = 0 and foreign = 1.
x_me2 = x_me.copy()
x_me2[:, 7] = 1  # Keep everythin the same, but change foregin to 1

# For the probit, calculate the norm.cdf for foreign = 1, and subtract foreign = 0
me_foreign_pr =  # Use norm.cdf first with the vector with foreign = 1, then subtract using norm.cdf with the vector with foreign = 0.

# Do the same for the logit, calculate using the G() function for logit, for foreign = 1, and subtract foreign = 0
me_foreign_lg = (
    # Use the G() function for logit with the vector for foreign = 1
    # Use the G() function for logit with the vector for foreign = 0
)

print(me_foreign_pr)
print(me_foreign_lg)

[[0.26995807]]
[[0.27261031]]


### Calculate the standard errors of the marginal effects.
We use the delta method to calcualte the standard errors, by getting the gradient and using this to calculate the standard errors.

In [17]:
grad_c_pr = norm.pdf(x_me@b_pr)*(np.eye(k) - (b_pr@b_pr.T)@(x_me.T@x_me))
grad_d_pr = norm.pdf(x_me2@b_pr)@x_me2 - norm.pdf(x_me@b_pr)@x_me

NameError: name 'x_me' is not defined

In [18]:
def get_se(grad, cov):
    return np.sqrt(np.diag(grad@cov@grad.T))

se_c_pr = get_se(grad_c_pr, probit_result.get('cov'))
se_d_pr = get_se(grad_d_pr, probit_result.get('cov'))

NameError: name 'grad_c_pr' is not defined

In [19]:
table = [
    ['educ', me_educ_pr, se_c_pr[3]],
    ['foreign', me_foreign_pr, se_d_pr]
]
print('Marginal effects, Probit')
print()
print(tabulate(table, ['Var', 'Coeff', 'se'], floatfmt='.4f'))

NameError: name 'me_educ_pr' is not defined