# Binary Response Models

## 2A. Binary Response Framework

### Preliminaries

- Model assumptions are analogous to MLR.1-MLR.6

    - Model is correctly specified
    
    - Random sample from the population
    
    - Conditional variation in each explanatory variable
    
    - Zero conditional mean
    
    - Homoskedasticity (or robust standard errors)
    
    - Assumption about the error distribution (e.g. normal, logistic)
    
- Goal: Learn how $x_1, x_2, ..., x_k$ affect the probability of making a choice

- Simplifying notation:

    - Let $X\beta = \beta_0 + \beta_1x_1 + ... + \beta_kx_k$
    
    - Let $X = (1, x_1, x_2, ..., x_k)$

### Latent variable model

- Let $c$ be a latent (i.e. unobserved) continuous choice variable such that:

    - $y=1$ if $c = X\beta + u > 0$
    
    - $y=0$ if $c = X\beta + u \leq 0$
    
    - $u$ is the econometric error

- Let $p$ be the response probability, with $0 \leq p \leq 1$
    
- The probability that $y = 1$:
$$p = p(y = 1|X)$$
$$p  = p(c > 0 | X)$$
$$p  = p(X\beta + u> 0 | X)$$
$$p = p(u > - X\beta|X)$$
$$p = f(X\beta)$$
    
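- $f$ is the cumulative distribution function of $u$; the last step uses the symmetry of $u$ around zero, $p(u > -X\beta|X) = p(u < X\beta|X)$. The shape of the $f(X\beta)$ function therefore depends on the distribution of $u$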

    - Probit model: $u$ ~ standard normal
    
    - Logit model: $u$ ~ standard logistic
    
- The standard logistic distribution is symmetric, like the normal, but with fatter tails.
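The latent variable setup can be checked numerically. The sketch below is only an illustration: the index value `xb = 0.5`, the sample size, and the function names are assumptions, not part of the notes. It draws $u$ from a standard logistic distribution, records how often $c = X\beta + u > 0$, and compares that share with $f(X\beta)$. It also compares the lower tails of the two distributions.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Standard logistic CDF: e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

def simulated_share(xb, n=200_000, seed=0):
    """Share of draws with y = 1 when c = xb + u and u is standard logistic."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n):
        # Inverse-CDF draw from the standard logistic; clamp to avoid log(0)
        q = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        u = math.log(q / (1.0 - q))
        if xb + u > 0:
            ones += 1
    return ones / n

xb = 0.5  # hypothetical value of the index X*beta
print("simulated share of y = 1:", simulated_share(xb))
print("f(X*beta) from the logistic CDF:", logistic_cdf(xb))

# Fatter tails: the logistic puts more probability far from zero
print("P(u < -4), logistic:", logistic_cdf(-4.0))
print("P(u < -4), normal:  ", normal_cdf(-4.0))
```

The simulated share should be close to the logistic CDF evaluated at the same index, which is the content of $p = f(X\beta)$.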

<img src="images/normal-logistic.png" alt="Distribution: Normal vs Logistic">

## 2B. Binary Probit and Logit Models

### Logit Model

$$p(y = 1|X) = f(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta} }$$

Notice:

   - $f\rightarrow 0$ as $X\beta \rightarrow -\infty$
   
   - $f\rightarrow 1$ as $X\beta \rightarrow \infty$
    
<img src="images/logit.png" alt="Logit">

#### Calculating Marginal Effects

$$p(y = 1|X) = f(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta} }$$

**Example:** Suppose $p$ depends on two variables $x_1$ and $x_2$.

   - The change in $p$ depends on the values of $x_1$ and $x_2$ at which it is evaluated. Let's set $x_2$ at its average: $x_2 = \overline{x_2}.$ 
   
   - Let's calculate the **change** in $p$ from *increasing* $x_1$ *from $1$ to $2$*.
   
$$\Delta p=  p(y = 1|x_1 = 2, x_2 = \overline{x_2}) -p(y = 1|x_1 = 1, x_2 = \overline{x_2}) $$


$$\Delta p=
\frac{e^{\beta_0 + 2\beta_1 + \beta_2\overline{x_2}}}{1 + e^{\beta_0 + 2\beta_1 + \beta_2\overline{x_2}}} - \frac{e^{\beta_0 + \beta_1 + \beta_2\overline{x_2}}}{1 + e^{\beta_0 + \beta_1 + \beta_2\overline{x_2}}}$$

**Note:** the answer would be different for increasing $x_1$ from $2$ to $3$ because of the nonlinearity of $f$.
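To make the calculation concrete, here is a minimal sketch with made-up numbers. The coefficients `b0`, `b1`, `b2` and the mean `x2_bar` are hypothetical values chosen only for illustration; they do not come from any estimated model.

```python
import math

def logit_prob(xb):
    """Logit response probability e^xb / (1 + e^xb)."""
    return 1.0 / (1.0 + math.exp(-xb))

# Hypothetical coefficients and sample mean of x2 (assumptions for illustration)
b0, b1, b2 = -1.0, 0.8, 0.5
x2_bar = 1.2

def p_at(x1):
    """p(y = 1 | x1, x2 = x2_bar) under the hypothetical coefficients."""
    return logit_prob(b0 + b1 * x1 + b2 * x2_bar)

delta_1_to_2 = p_at(2) - p_at(1)  # change in p when x1 goes from 1 to 2
delta_2_to_3 = p_at(3) - p_at(2)  # change in p when x1 goes from 2 to 3

print("change in p, x1 from 1 to 2:", delta_1_to_2)
print("change in p, x1 from 2 to 3:", delta_2_to_3)
```

With these particular numbers the two changes differ even though $x_1$ rises by one unit in both cases, which is the nonlinearity at work.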

### Probit Model
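When $u$ is standard normal, the response probability is the standard normal CDF, usually written $\Phi$, evaluated at the index:

$$p(y = 1|X) = f(X\beta) = \Phi(X\beta)$$

There is no closed form for $\Phi$, but it can be computed from the error function. The sketch below assumes nothing beyond the Python standard library; the function name `probit_prob` and the index values are illustrative.

```python
import math

def probit_prob(xb):
    """Probit response probability: standard normal CDF evaluated at xb."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

# As with the logit, f -> 0 as xb -> -infinity and f -> 1 as xb -> +infinity
for xb in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(xb, probit_prob(xb))
```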

<img src="images/probit.png" alt="Probit">

## 2C. Maximum Likelihood Estimation 

- Unlike OLS, maximum likelihood estimation requires an assumption about the distribution of the error term.

- Intuition for MLE: Choose values for the $\beta$'s that maximize the likelihood of observing the outcomes in your data.

- **Likelihood function** (for observation $i$): $$f(X_i\beta)^{y_i}[1 - f(X_i\beta)]^{(1- y_i)}$$

- **Log-likelihood function (LLF):**

$$\lambda_i(\beta) = y_iln[f(X_i\beta)] + (1-y_i)ln[1- f(X_i\beta)],$$

where $\lambda_i$ is the value of the log-likelihood function for observation $i$.
    
- **Estimation:** Choose $\hat{\beta}$ to maximize the sum of the log-likelihood values across observations, $L$.

$$L =\sum_{i=1}^{n}\lambda_i(\beta)$$

**Note: Larger values of the LLF (closer to zero, since $L \leq 0$) imply more accurate predictions for $y_i$.**
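The estimation step can be sketched for a logit with a single regressor. Everything here is an assumption made for illustration: the eight data points are invented, and plain gradient ascent stands in for the Newton-type optimizers that statistical packages typically use. For the logit, the gradient of $L$ with respect to each coefficient is $\sum_i (y_i - f(X_i\beta))x_i$, which is what the loop uses.

```python
import math

def logit_prob(xb):
    """Logit response probability e^xb / (1 + e^xb)."""
    return 1.0 / (1.0 + math.exp(-xb))

def log_likelihood(b0, b1, xs, ys):
    """L = sum over i of y_i*ln(f_i) + (1 - y_i)*ln(1 - f_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logit_prob(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(xs, ys, step=0.01, iters=20000):
    """Maximize L by gradient ascent (a simple stand-in for Newton's method)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        resid = [y - logit_prob(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(resid)                               # dL/db0
        g1 = sum(r * x for r, x in zip(resid, xs))    # dL/db1
        b0 += step * g0
        b1 += step * g1
    return b0, b1

# Invented data: outcomes overlap in x, so the MLE is finite
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

b0_hat, b1_hat = fit_logit(xs, ys)
print("estimates:", b0_hat, b1_hat)
print("L at the estimates:", log_likelihood(b0_hat, b1_hat, xs, ys))
print("L at beta = 0:     ", log_likelihood(0.0, 0.0, xs, ys))
```

The value of $L$ at the estimates should be larger (closer to zero) than at $\beta = 0$, since the estimates are chosen to maximize it.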

### Perfect Prediction

#### *Case 1: Perfect Prediction* $y_i = 1 = f(X_i\hat{\beta})$

Suppose the model perfectly predicts the outcomes that we observe in the data. That means if an agent chooses to participate ($y_i = 1$), the binary choice model predicts participation with probability $1$: 

$$y_i = 1 = f(X_i\hat{\beta})$$

Then LLF becomes


$$\lambda_i(\beta) = y_iln[f(X_i\beta)] + (1-y_i)ln[1- f(X_i\beta)]$$


$$\lambda_i(\beta) = y_i\underbrace{ln[f(X_i\beta)]}_{0} + \underbrace{(1-y_i)}_{0}ln[1- f(X_i\beta)]$$

**That is, LLF takes on a value of zero when the model predicts correctly.**

#### *Case 2: Perfect Prediction* $y_i = 0 = f(X_i\hat{\beta})$

When an agent chooses not to participate ($y_i = 0$), the binary choice model predicts participation with probability $0$, i.e. it predicts non-participation with probability $1$: 

$$y_i = 0 = f(X_i\hat{\beta})$$

Then LLF becomes


$$\lambda_i(\beta) = y_iln[f(X_i\beta)] + (1-y_i)ln[1- f(X_i\beta)]$$

$$\lambda_i(\beta) = \underbrace{y_i}_{0}ln[f(X_i\beta)] + (1-y_i)\underbrace{ln[1- f(X_i\beta)]}_{0}$$



**That is, LLF takes on a value of zero when the model predicts correctly.**

**Perfect prediction implies 0 values for LLF: $\lambda_i(\hat{\beta}) = 0$**  

### Imperfect Prediction

#### *Case 1* 

$$y_i = 1 > f(X_i\hat{\beta})$$

- Actual value is $1$

- Model predicts less than $1$


LLF becomes:
$$\lambda_i(\beta) = \underbrace{y_i}_{1}\underbrace{ln[f(X_i\beta)]}_{\text{negative}} + \underbrace{(1-y_i)}_{0}ln[1- f(X_i\beta)]$$



We just have the first term, which is **negative**.

#### *Case 2* 

$$y_i = 0 < f(X_i\hat{\beta})$$

- Actual value is $0$

- Model predicts greater than $0$

LLF becomes:
$$\lambda_i(\beta) = \underbrace{y_i}_{0}ln[f(X_i\beta)] + \underbrace{(1-y_i)}_{1}\underbrace{ln[1- f(X_i\beta)]}_{\text{negative}}$$

We just have the second term, which is **negative**.
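The perfect and imperfect prediction cases can be verified with a few numbers. This is a small sketch; the helper name `obs_loglik` and the predicted probabilities are chosen for illustration only.

```python
import math

def obs_loglik(y, p):
    """Contribution y*ln(p) + (1 - y)*ln(1 - p) of one observation.

    Only the term matching the observed outcome is evaluated; the other
    term has weight zero, which also avoids computing ln(0) when the
    prediction is perfect.
    """
    return math.log(p) if y == 1 else math.log(1.0 - p)

# Perfect prediction: contribution is zero
print(obs_loglik(1, 1.0))  # y = 1 and f = 1
print(obs_loglik(0, 0.0))  # y = 0 and f = 0

# Imperfect prediction: contribution is negative, more so for worse predictions
print(obs_loglik(1, 0.9))
print(obs_loglik(1, 0.6))
print(obs_loglik(0, 0.3))
```

The further the predicted probability is from the observed outcome, the more negative the contribution, so the sum $L$ is pulled away from zero by poorly predicted observations.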


### Hypothesis Testing for MLE

- z-tests (similar to OLS t-tests)

- Wald statistics (similar to OLS F-tests)

- Confidence intervals
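As a rough sketch of the z-test and confidence interval for a single coefficient: the estimate `0.8` and standard error `0.25` below are hypothetical numbers, not output from any model in these notes, and `z_test` is an illustrative helper name. The calculation relies on the estimator being approximately normal in large samples.

```python
import math

def z_test(beta_hat, se, z_crit=1.96):
    """z statistic for H0: beta = 0, two-sided p-value, and a confidence interval.

    z_crit = 1.96 corresponds to an approximate 95% interval.
    """
    z = beta_hat / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written with the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (beta_hat - z_crit * se, beta_hat + z_crit * se)
    return z, p_value, ci

# Hypothetical estimate and standard error
z, p_value, ci = z_test(0.8, 0.25)
print("z statistic:", z)
print("two-sided p-value:", p_value)
print("approximate 95% confidence interval:", ci)
```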

### Goodness of Fit - pseudo R-squared

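- Pseudo $R^2 = 1 - \frac{L_{UR}}{L_0}$
    - where $L_{UR}$ is the maximized LLF of the unrestricted model and $L_0$ is the (constant) maximized LLF of the model with an intercept only
    
- As the fit improves, $L_{UR}$ increases toward $0$ (both $L_{UR}$ and $L_0$ are negative)
    - which implies $\frac{L_{UR}}{L_0} \rightarrow 0$ and Pseudo $R^2 \rightarrow 1$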
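The pseudo R-squared is simple to compute once both log-likelihood values are available. In the sketch below, the function names, the outcome vector, and the value used for $L_{UR}$ are all hypothetical. It uses the fact that an intercept-only logit or probit fits every observation with the sample share of ones, so $L_0$ can be computed directly from the outcomes.

```python
import math

def null_loglik(ys):
    """L_0: log-likelihood when every fitted probability equals the sample mean of y.

    Requires at least one 0 and one 1 in ys, otherwise ln(0) is undefined.
    """
    n = len(ys)
    n1 = sum(ys)
    y_bar = n1 / n
    return n1 * math.log(y_bar) + (n - n1) * math.log(1.0 - y_bar)

def pseudo_r2(ll_ur, ll_0):
    """Pseudo R-squared = 1 - L_UR / L_0."""
    return 1.0 - ll_ur / ll_0

# Hypothetical outcomes and a hypothetical unrestricted log-likelihood
ys = [0, 0, 1, 0, 1, 1, 0, 1]
ll_0 = null_loglik(ys)
ll_ur = -4.0  # assumed value for illustration; must satisfy ll_0 <= ll_ur <= 0

print("L_0:", ll_0)
print("pseudo R-squared:", pseudo_r2(ll_ur, ll_0))
```

If the unrestricted model does no better than the intercept-only model, $L_{UR} = L_0$ and the pseudo R-squared is zero.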

###  Potential Problems

- Heteroskedasticity

- Nonrandom samples

- Violation of the zero conditional mean

- Incorrect assumption about the distribution of the model error

These can cause MLE to be biased and inconsistent.