## Problem #4. Logit

This problem is meant to help draw connections between GMM estimators and maximum likelihood estimators, with a particular focus on the 'logit' model.

The development of a maximum likelihood estimator typically begins with an assumption that some random variable has a (conditional) distribution which is known up to a k-vector of parameters $\beta$. Consider the case in which we observe N independent realizations of a Bernoulli random variable Y, with $Pr(Y = 1|X) = \sigma(\beta^{T}X)$, and $Pr(Y = 0|X) = 1 - \sigma(\beta^{T}X).$

(1) Show that under this model $E(Y - \sigma(X\beta)|X) = 0.$ Assume that $\sigma$ is a known function, and use this fact to develop a GMM estimator of $\beta$. Is your estimator just- or over-identified?

We want to show that $E(Y - \sigma{(X\beta{})}|X) = 0$. 


Because $\sigma(X\beta)$ is a function of $X$, it can be taken outside the conditional expectation, which gives $E(Y|X) - \sigma(X\beta)$.

Since $Y$ is Bernoulli, its conditional expectation is $E(Y|X) = P(Y=1|X)\cdot 1 + P(Y=0|X)\cdot 0$, so the expression becomes $P(Y=1|X)\cdot 1 + P(Y=0|X)\cdot 0 - \sigma(X\beta)$.

The term $P(Y=0|X)\cdot 0$ vanishes, and $P(Y=1|X) = \sigma(X\beta)$ under the model, which leaves $\sigma(X\beta)\cdot 1 - \sigma(X\beta) = 0$.

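The conditional moment restriction implies the unconditional moments $E[Z(Y - \sigma(X\beta))] = 0$ for any vector of instruments $Z$ that is a function of $X$. Taking $Z = X$ gives $k$ moment conditions for the $k$ parameters, so that estimator is just-identified; using more than $k$ instruments makes it over-identified. The code below passes four instrument columns for three parameters, so by this count it is over-identified. Those instruments are drawn independently of `x`, however, so they carry little information about $\beta$, which likely explains why the estimates are far from the true values $(1, 2, 3)$.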

In [4]:
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1/(1+np.exp(-z))

class GMMestimator:
    def __init__(self, endog, exog, instrument):
        self.endog = endog
        self.exog = exog
        self.instrument = instrument
        self.nobs, self.nvar = self.exog.shape
        self.null = np.zeros(self.exog.shape[1])
    
    def fit(self):
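        def moments(beta):
            # Residuals implied by the conditional moment E[Y - sigmoid(X beta) | X] = 0
            z = self.exog @ beta
            eps_hat = self.endog - sigmoid(z)
            # First element: summed sample moments Z'(y - sigmoid(X beta)), one per instrument.
            # Second element: residuals scaled by sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).
            return (self.instrument.T @ eps_hat, eps_hat * sigmoid(z) * (1 - sigmoid(z)))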
        
        def gmm_objective(beta):
            mom, _ = moments(beta)
            return mom.T @ mom / self.nobs
        
        results = minimize(gmm_objective, self.null, method='BFGS', options={'disp': False})
        self.beta = results.x
        _, eps_hat = moments(self.beta)
        self.sigma = np.cov(eps_hat.T, ddof=self.nvar)
        return results
        
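# Generate data
np.random.seed(123)
nobs = 1000
beta = np.array([1, 2, 3])
x = np.random.normal(size=(nobs, len(beta)))
# Instruments: three columns drawn independently of x, plus a constant (four in total)
z = np.random.normal(size=(nobs, 3))
z = np.column_stack((z, np.ones(nobs))) # add intercept to instrument
# eps is not used below; the draw is kept so the random stream and the printed output are unchanged
eps = np.random.normal(scale=0.5, size=nobs)
y = np.random.binomial(n=1, p=sigmoid(x @ beta), size=nobs)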

# Instantiate the GMM estimator class and fit the model
gmm = GMMestimator(y, x, z)
results = gmm.fit()

# Print the results
print("Beta estimates:", results.x)

Beta estimates: [0.88955868 0.50268812 1.91392656]
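
For comparison, here is a minimal sketch of the just-identified case, in which $X$ itself serves as the instrument. The data-generating values, the sample size, and the helper names `sample_moments` and `solve_just_identified` are assumptions made for illustration, and Newton's method stands in for the `minimize` call used above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_moments(b, y, X, Z):
    """Sample analogue of E[Z (Y - sigmoid(X b))]; one entry per instrument."""
    return Z.T @ (y - sigmoid(X @ b)) / len(y)

def solve_just_identified(y, X, n_iter=50, tol=1e-10):
    """Newton iterations on g_n(b) = 0 with instruments Z = X (hypothetical helper)."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ b)
        g = X.T @ (y - p) / len(y)                          # k sample moments
        J = -(X * (p * (1 - p))[:, None]).T @ X / len(y)    # Jacobian of g with respect to b
        step = np.linalg.solve(J, g)
        b = b - step
        if np.max(np.abs(step)) < tol:
            break
    return b

# Hypothetical simulated data (values chosen only for illustration)
rng = np.random.default_rng(0)
n = 2000
beta_true = np.array([0.5, -1.0, 1.5])
X = rng.normal(size=(n, 3))
y = rng.binomial(1, sigmoid(X @ beta_true))

b_hat = solve_just_identified(y, X)
g_hat = sample_moments(b_hat, y, X, X)
print("moments:", g_hat.size, "parameters:", b_hat.size)
print("estimate:", b_hat)
print("max |g_n(b_hat)|:", np.max(np.abs(g_hat)))
```

With as many moments as parameters, the sample moments can be set to zero exactly, so the choice of weighting matrix does not affect the estimate.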


(2) Show that the likelihood conditional on realizations of data (y, X) can be written as 

$$L(\beta{}|y, X) = \prod_{i=1}^N\sigma(\beta^T X_i)^{y_i}\left(1-\sigma(\beta^T X_i)\right)^{1-y_i}.$$ 

For the Bernoulli distribution with $Pr(Y = 1|X) = \sigma(\beta^T X)$ and $Pr(Y = 0|X) = 1-\sigma(\beta^T X)$, the likelihood of a single observation is $L(\beta|y_i,X_i) = \sigma(\beta^T X_i)^{y_i}\left(1-\sigma(\beta^T X_i)\right)^{1-y_i}$: this equals $\sigma(\beta^T X_i)$ when $y_i = 1$ and $1-\sigma(\beta^T X_i)$ when $y_i = 0$. Because the $N$ observations are independent, the joint likelihood is the product of the individual likelihoods:

$$L(\beta|y, X) = \prod_{i=1}^N\sigma(\beta^T X_i)^{y_i}\left(1-\sigma(\beta^T X_i)\right)^{1-y_i}$$

This gives us the likelihood function for logistic regression, which we can use to estimate the values of the regression coefficients that maximize the likelihood, given the observed data.
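
As a small numerical illustration, the product form and the sum-of-logs form can be compared directly. This is a sketch on hypothetical toy data with the logistic choice of $\sigma$; the function names `bernoulli_likelihood` and `stable_log_likelihood` are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_likelihood(beta, y, X):
    """Product of Bernoulli probabilities, as in the displayed formula."""
    p = sigmoid(X @ beta)
    return np.prod(p**y * (1 - p)**(1 - y))

def stable_log_likelihood(beta, y, X):
    """Sum of log Bernoulli probabilities, written as y*z - log(1 + exp(z))."""
    z = X @ beta
    return np.sum(y * z - np.logaddexp(0.0, z))

# Hypothetical toy data: four observations, an intercept and one regressor
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.0]])
y = np.array([1, 0, 1, 0])
beta = np.array([0.3, 0.8])

L = bernoulli_likelihood(beta, y, X)
ell = stable_log_likelihood(beta, y, X)
print("likelihood:", L)
print("exp(log-likelihood):", np.exp(ell))
```

In larger samples the raw product underflows quickly, which is one reason to work with the log-likelihood in practice.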

(3) To obtain the maximum likelihood estimator (MLE) one can choose b to maximize log L(b|y,X). When the likelihood is well-behaved, the MLE estimator satisfies the first order conditions (also called the "scores") from this maximization problem, in which case this is called a "type I" MLE. Let $\sigma(z) = 1/(1+e^{-z})$ (this is sometimes called the logistic function, or the sigmoid function), and obtain the scores $S_n(b)$ for this estimation problem. Show that $ES_n(\beta) = 0$. Demonstrate that these moment conditions can serve as the basis for a GMM estimator of $\beta$, and compare this estimator to the GMM estimator you developed above. Which is more efficient, and why?
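
A sketch of the derivation, writing $\sigma_i = \sigma(b^T X_i)$ and using the logistic identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$. The $1/N$ normalization of the scores is an assumption about notation. The log-likelihood is

$$\log L(b|y, X) = \sum_{i=1}^N \left[ y_i \log \sigma_i + (1-y_i)\log(1-\sigma_i) \right].$$

Differentiating with respect to $b$, with $\partial \log \sigma_i/\partial b = (1-\sigma_i)X_i$ and $\partial \log(1-\sigma_i)/\partial b = -\sigma_i X_i$, gives

$$S_n(b) = \frac{1}{N}\sum_{i=1}^N \left[ y_i(1-\sigma_i) - (1-y_i)\sigma_i \right] X_i = \frac{1}{N}\sum_{i=1}^N \left( y_i - \sigma(b^T X_i) \right) X_i.$$

Evaluating at $\beta$ and applying the law of iterated expectations together with the result from part (1),

$$ES_n(\beta) = \frac{1}{N}\sum_{i=1}^N E\left[ X_i\, E\left( y_i - \sigma(\beta^T X_i) \,\middle|\, X_i \right) \right] = 0.$$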

In [47]:
import numpy as np
from scipy.optimize import minimize

# Define the sigmoid function (before its first use below)
def sigmoid(z):
    return 1/(1+np.exp(-z))

# Generate some sample data
np.random.seed(123)
n = 100
p = 3
X = np.random.normal(size=(n,p))
beta_true = np.array([1,2,3])
y = np.random.binomial(n=1, p=sigmoid(np.dot(X, beta_true)))

# Define the log-likelihood function
def log_likelihood(beta, y, X):
    z = np.dot(X, beta)
    return np.sum(y*np.log(sigmoid(z)) + (1-y)*np.log(1-sigmoid(z)))

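# Per-observation score contributions with the sign flipped:
# this returns (sigmoid(z) - y) * X, i.e. minus the derivative of each log-likelihood term
def scores(beta, y, X):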
    z = np.dot(X, beta)
    return (sigmoid(z).reshape(-1, 1) - y.reshape(-1, 1)) * X

# Define the weight matrix function
def weight_matrix(S):
    W = np.dot(S.T, S)/n
    return W

# Define the moment condition function
def moment_condition(beta, y, X):
    z = np.dot(X, beta)
    return X*(y - sigmoid(z))[:,np.newaxis]

# Calculate the MLE
result = minimize(lambda beta: -log_likelihood(beta, y, X), np.zeros(p))
beta_mle = result.x

# Calculate the scores and GMM estimator
S = scores(beta_mle, y, X)
W = weight_matrix(S)
beta_gmm1 = beta_mle + np.dot(np.linalg.inv(W), np.dot(S.T, y - sigmoid(np.dot(X, beta_mle))))/n

# Calculate the moment conditions and second-step GMM estimator
Z = moment_condition(beta_gmm1, y, X)
G = np.dot(Z.T, Z)/n
beta_gmm2 = beta_gmm1 + np.dot(np.linalg.inv(G), np.dot(Z.T, y - sigmoid(np.dot(X, beta_gmm1))))/n


print("True beta:", beta_true)
print("MLE:", beta_mle)
print("GMM1:", beta_gmm1)
print("GMM2:", beta_gmm2)

True beta: [1 2 3]
MLE: [1.58390826 1.79227628 3.10504348]
GMM1: [1.6478029  1.56213396 2.84750278]
GMM2: [1.56187198 1.80464808 3.09554716]


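Because $\sigma'(z) = \sigma(z)(1-\sigma(z))$ for the logistic function, the scores are $S_n(b) = \frac{1}{N}\sum_{i=1}^N (y_i - \sigma(b^T X_i))X_i$, which is the moment condition from part (1) with $X$ as the instrument. This gives $k$ moment conditions for $k$ parameters, so the GMM estimator based on the scores is just-identified and solves the same first-order conditions as the MLE. It is therefore asymptotically efficient (it attains the Cramér-Rao bound) and is at least as efficient as a part (1) estimator built on any other instruments; the two coincide when the part (1) instruments are $X$ itself. The estimates labelled GMM1 and GMM2 in the output above come from one-step updates starting at the MLE rather than from minimizing a GMM objective, so they need not equal the MLE exactly.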
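
The efficiency ranking can be illustrated by comparing sample analogues of the asymptotic variances of two just-identified estimators. The sketch below assumes a hypothetical alternative instrument `X**3`, hypothetical parameter values, and weights evaluated at the true parameter; `avar_just_identified` is a made-up helper that computes the usual sandwich formula $A^{-1}BA^{-T}$, with $A = E[wZX^T]$, $B = E[wZZ^T]$ and $w = \sigma(1-\sigma)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avar_just_identified(Z, X, w):
    """Sandwich variance A^{-1} B A^{-T} for the moments Z * (y - sigmoid(X b))."""
    n = len(w)
    A = (Z * w[:, None]).T @ X / n   # minus the Jacobian of the moments
    B = (Z * w[:, None]).T @ Z / n   # moment variance, using Var(Y|X) = p(1 - p)
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T

# Hypothetical design (values chosen only for illustration)
rng = np.random.default_rng(1)
n = 5000
beta_true = np.array([0.5, -1.0, 1.5])
X = rng.normal(size=(n, 3))
p = sigmoid(X @ beta_true)
w = p * (1 - p)

V_score = avar_just_identified(X, X, w)       # instruments X: the score-based moments
V_other = avar_just_identified(X**3, X, w)    # an arbitrary alternative instrument

diff = V_other - V_score
min_eig = np.linalg.eigvalsh((diff + diff.T) / 2).min()
print("diag V_score:", np.diag(V_score))
print("diag V_other:", np.diag(V_other))
print("smallest eigenvalue of V_other - V_score:", min_eig)
```

With instruments $X$ the sandwich collapses to the inverse of the information matrix, and the difference between the two variance matrices is positive semidefinite, so its smallest eigenvalue should be nonnegative up to rounding error.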