In [1]:
library('tidyverse')

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [5]:
# Load the data
data <- read_tsv("podatki_7.txt", col_names = TRUE)

Rows: 100 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (3): weight, mpg, foreign

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [6]:
head(data)

weight,mpg,foreign
<dbl>,<dbl>,<dbl>
4100,18,0
2720,47,0
3760,13,1
2530,19,1
4720,47,0
3210,41,0


### 7.a. Likelihood Function
For each car:
- $Y_i = \text{foreign}_i \in \{0, 1\}$
- $p_i = 1 - \exp(-\exp(\beta_0 + \beta_1 \text{weight}_i + \beta_2 \text{mpg}_i))$ (the inverse of the complementary log-log link)
- $Y_i \sim \text{Bernoulli}(p_i)$

Assuming independent observations, the likelihood function is given by:
$$
L(\beta_0, \beta_1, \beta_2) = \prod_{i=1}^{n} p_i^{Y_i} (1 - p_i)^{1 - Y_i}
$$
where $n$ is the number of cars.

The log-likelihood function is:
$$
\ell(\beta_0, \beta_1, \beta_2) = \sum_{i=1}^{n} \left( Y_i \log(p_i) + (1 - Y_i) \log(1 - p_i) \right)
$$

The negative of this function is the *log-loss* (binary cross-entropy) commonly used to train neural network classifiers.
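
The log-likelihood above can be written directly as an R function. The following is a minimal sketch: the helper names `cloglog_inv` and `loglik_cloglog`, the toy inputs, and the trial coefficients are illustrative assumptions, not part of the original analysis.

```r
# Inverse complementary log-log link: p = 1 - exp(-exp(eta))
cloglog_inv <- function(eta) 1 - exp(-exp(eta))

# Log-likelihood of the Bernoulli model with a cloglog link.
# beta: c(beta0, beta1, beta2); X: matrix with columns weight and mpg; y: 0/1 vector
loglik_cloglog <- function(beta, X, y) {
  eta <- drop(beta[1] + X %*% beta[-1])   # linear predictor for each car
  p <- cloglog_inv(eta)                   # P(foreign = 1) for each car
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Toy inputs (hypothetical values, for illustration only)
X_toy <- cbind(weight = c(3000, 2500, 4000), mpg = c(20, 30, 15))
y_toy <- c(0, 1, 0)
ll_toy <- loglik_cloglog(c(0, -0.001, 0.01), X_toy, y_toy)
ll_toy
```

With the real data, the same function could be evaluated on `as.matrix(data[, c("weight", "mpg")])` and `data$foreign`.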

### 7.b. Estimate the Parameters using Maximum Likelihood Estimation (MLE)

The model above is a binomial GLM with a complementary log-log (cloglog) link, so we can fit it with `glm()`, specifying `link = "cloglog"` in place of the default logit link used by logistic regression. `glm()` finds the $\beta_0, \beta_1, \beta_2$ that maximize the log-likelihood by Fisher scoring (iteratively reweighted least squares).


In [10]:
reg.log <- glm(foreign ~ ., data = data, family = binomial(link = "cloglog"))
summary(reg.log)


Call:
glm(formula = foreign ~ ., family = binomial(link = "cloglog"), 
    data = data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.1159225  1.5112535   4.709 2.49e-06 ***
weight      -0.0021187  0.0004258  -4.975 6.51e-07 ***
mpg         -0.0915402  0.0244097  -3.750 0.000177 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 107.855  on 99  degrees of freedom
Residual deviance:  61.529  on 97  degrees of freedom
AIC: 67.529

Number of Fisher Scoring iterations: 7


The estimated coefficients (± one standard error) are:
  - $\hat{\beta}_0 = 7.1 \pm 1.5$
  - $\hat{\beta}_1 = (-2.12 \pm 0.43) {\times} 10^{-3}$
  - $\hat{\beta}_2 = (-9.2 \pm 2.4) {\times} 10^{-2}$

For all coefficients, the p-values are $\ll 0.05$, indicating strong evidence that these explanatory variables are associated with the probability of a car being foreign.
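
As a cross-check on what `glm()` does, the log-likelihood can also be maximized directly with a general-purpose optimizer. The sketch below uses simulated data (the sample size, predictors `x1` and `x2`, and true coefficients are all assumptions made for illustration, since `podatki_7.txt` is not reproduced here) and compares `optim()` with `glm()`.

```r
set.seed(1)

# Simulated data from a cloglog model (hypothetical coefficients)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = 1 - exp(-exp(-0.5 + 0.8 * x1 - 0.5 * x2)))
sim <- data.frame(y, x1, x2)

# Negative log-likelihood, written in a numerically stable form:
#   log(p)     = log(-expm1(-exp(eta)))
#   log(1 - p) = -exp(eta)
negloglik <- function(beta) {
  eta <- beta[1] + beta[2] * x1 + beta[3] * x2
  -sum(y * log(-expm1(-exp(eta))) - (1 - y) * exp(eta))
}

# Direct numerical maximization versus Fisher scoring in glm()
fit_optim <- optim(c(0, 0, 0), negloglik, method = "BFGS")
fit_glm   <- glm(y ~ x1 + x2, data = sim, family = binomial(link = "cloglog"))

rbind(optim = fit_optim$par, glm = coef(fit_glm))
```

The two rows should agree up to the optimizer's tolerance, which illustrates that the `glm()` estimates are the maximum likelihood estimates.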

### 7.c. Fisher Information Matrix

The Fisher information matrix is the negative expected value of the matrix of second derivatives (the Hessian) of the log-likelihood with respect to the parameters. Its diagonal entries measure how sharply the log-likelihood curves in the direction of each parameter: larger values mean more information about that parameter. Off-diagonal entries describe how the information about two parameters is coupled: large absolute values mean that changes in one parameter's estimate tend to go with changes in another. The magnitudes depend on the units in which each predictor is measured.

The $(j, k)$ entry of the Fisher information matrix is:

$$
[I(\beta)]_{jk} = -E\left[\frac{\partial^2 \ell(\beta)}{\partial \beta_j \partial \beta_k}\right]
$$

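Asymptotically, the inverse of this matrix is the covariance matrix of the parameter estimates, so the estimated information matrix can be recovered by inverting the covariance matrix returned by `vcov()`.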
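
For a binomial GLM, the expected information has the closed form $X^T W X$, where $W$ is diagonal with entries $w_i = (\partial p_i / \partial \eta_i)^2 / (p_i (1 - p_i))$ and $\eta_i$ is the linear predictor. The sketch below checks this against `solve(vcov(...))` on simulated data; the data, the variable names, and the tightened `glm.control()` settings are assumptions made for illustration.

```r
set.seed(2)

# Simulated cloglog data (hypothetical coefficients)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, 1 - exp(-exp(-0.5 + 0.8 * x1 - 0.5 * x2)))

# Tight convergence so the stored weights match the final estimates closely
fit <- glm(y ~ x1 + x2, family = binomial(link = "cloglog"),
           control = glm.control(epsilon = 1e-12, maxit = 100))

X   <- model.matrix(fit)
eta <- drop(X %*% coef(fit))
p   <- 1 - exp(-exp(eta))

dp_deta <- exp(eta) * (1 - p)        # derivative of the inverse cloglog link
w <- dp_deta^2 / (p * (1 - p))       # Fisher weights

info_manual <- t(X) %*% (w * X)      # X' W X
info_glm    <- solve(vcov(fit))      # inverse of the estimated covariance

# Largest relative discrepancy between the two matrices
max(abs(info_manual - info_glm) / abs(info_glm))
```

Because cloglog is not the canonical link, this expected information generally differs from the observed (negative Hessian) information; `glm()` reports standard errors based on the expected version.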

In [11]:
Sigma <- vcov(reg.log)
fisher_information_matrix <- solve(Sigma)
fisher_information_matrix

,(Intercept),weight,mpg
(Intercept),18.54474,45300.6,500.2597
weight,45300.59696,118233653.4,1153114.828
mpg,500.25968,1153114.8,15800.0761


**Interpretation:**  
The large diagonal value for `weight` (1.18e8) means that the data carry a lot of information about the `weight` coefficient, so it is estimated precisely in absolute terms (its standard error is about 4.3e-4). Keep in mind that the entries depend on the units of each predictor: `weight` takes values in the thousands, which inflates its entries relative to those for `mpg` and the intercept. The sizeable off-diagonal entries, such as those between `weight` and the intercept (about 45,300) and between `weight` and `mpg` (about 1.15e6), tell us that the uncertainties of these parameter estimates are related. In other words, if our estimate for one parameter changes, the estimate for the other can shift in a way that still keeps the model fitting the data well.

This doesn't mean our estimates are unreliable: every coefficient in the model summary is several standard errors away from zero. It just means that the uncertainties in some parameters are connected, which is normal in regression when predictors are not independent of each other.
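
Because the raw information entries depend on the units of the predictors, the coupling between estimates is easier to read on a correlation scale. A minimal sketch, assuming simulated data with two deliberately correlated predictors (all names and values here are illustrative):

```r
set.seed(3)

# Simulated cloglog data with correlated predictors (hypothetical values)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.7)
y  <- rbinom(n, 1, 1 - exp(-exp(-0.5 + 0.6 * x1 - 0.4 * x2)))

fit <- glm(y ~ x1 + x2, family = binomial(link = "cloglog"))

# Rescale the covariance matrix of the estimates to a correlation matrix
cor_hat <- cov2cor(vcov(fit))
round(cor_hat, 2)
```

For the model in this notebook, the corresponding call would be `cov2cor(Sigma)`.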


### 7.d. Standard Errors of the Estimates

The standard errors of the estimates are the square roots of the diagonal elements of the inverse of the Fisher information matrix, that is, of the covariance matrix `Sigma`. These standard errors can be used to construct confidence intervals for the parameter estimates.

We can check that they match the standard errors reported in the `glm` model summary.

In [17]:
# square roots of the variances on the diagonal, i.e. the standard errors
std_errors <- sqrt(diag(Sigma))
print(std_errors)

 (Intercept)       weight          mpg 
1.5112535414 0.0004258285 0.0244097418 


These standard errors match the `Std. Error` column of the model summary (1.5112535, 0.0004258, 0.0244097).
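
As an example of how the standard errors feed into confidence intervals, the sketch below builds 95% Wald intervals (estimate ± $z_{0.975}$ × standard error) from the values printed in the model summary. The variable names are illustrative, and the Wald form is only one of several possible interval constructions.

```r
# Estimates and standard errors copied from the summary() output
est <- c("(Intercept)" = 7.1159225, weight = -0.0021187, mpg = -0.0915402)
se  <- c("(Intercept)" = 1.5112535, weight = 0.0004258, mpg = 0.0244097)

z <- qnorm(0.975)   # normal quantile for a two-sided 95% interval

wald_ci <- cbind(lower = est - z * se, upper = est + z * se)
wald_ci
```

None of the three intervals contains zero, in line with the small p-values in the summary. On the fitted model object, `confint.default(reg.log)` should give the same Wald intervals.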

### 7.e. Test $\mathcal{H}_0: \beta_1 = \beta_2 = 0$ with a Likelihood Ratio Test

The LRT statistic is given by:

$$
\Lambda = -2 \left( \ell(\hat{\beta}_0) - \ell(\hat{\beta}) \right) = 2 \left( \ell(\hat{\beta}) - \ell(\hat{\beta}_0) \right)
$$

where $\ell(\hat{\beta})$ is the maximized log-likelihood of the full model and $\ell(\hat{\beta}_0)$ is the maximized log-likelihood of the reduced model (with $\beta_1 = \beta_2 = 0$). Here $\hat{\beta}_0$ denotes the restricted estimate, not the intercept.

Under $\mathcal{H}_0$, this statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the number of constraints in the null hypothesis, which is 2 in this case.

In [18]:
# fit the restricted model
reg.log.restricted <- glm(foreign ~ 1, data = data, family = binomial(link = "cloglog"))
summary(reg.log.restricted)


Call:
glm(formula = foreign ~ 1, family = binomial(link = "cloglog"), 
    data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.3418     0.2091  -6.417 1.39e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 107.86  on 99  degrees of freedom
Residual deviance: 107.86  on 99  degrees of freedom
AIC: 109.86

Number of Fisher Scoring iterations: 5


In [21]:
# extract the log-likelihoods of both models
logLik_full <- logLik(reg.log)
logLik_restricted <- logLik(reg.log.restricted)

# calculate the likelihood ratio statistic
likelihood_ratio_statistic <- -2 * (logLik_restricted - logLik_full)

# calculate the p-value
p_value <- pchisq(likelihood_ratio_statistic, df = 2, lower.tail = FALSE)
p_value

'log Lik.' 8.718793e-11 (df=1)

The p-value $\approx 8.72 {\times} 10^{-11} \ll 0.05$ indicates strong evidence against the null hypothesis, suggesting that at least one of the coefficients $\beta_1$ or $\beta_2$ is different from zero.
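
The same test can be cross-checked from the deviances printed in the two model summaries. For binary (0/1) responses the saturated log-likelihood is zero, so each deviance equals $-2\ell$ and the LRT statistic is the difference between the null and residual deviances. The sketch below uses the rounded values from the summaries; the variable names are illustrative.

```r
# Deviances copied from the summary() output of the two models
dev_restricted <- 107.855   # intercept-only model
dev_full       <- 61.529    # model with weight and mpg

# LRT statistic = difference in deviances (2 constraints under H0)
lr_stat <- dev_restricted - dev_full
p_lrt   <- pchisq(lr_stat, df = 2, lower.tail = FALSE)

# With 2 degrees of freedom the chi-squared tail has the closed form exp(-x / 2)
c(statistic = lr_stat, p_value = p_lrt, closed_form = exp(-lr_stat / 2))
```

This gives a statistic of about 46.3 and a p-value of about $8.72 {\times} 10^{-11}$, matching the result above up to rounding of the deviances. On the fitted objects, `anova(reg.log.restricted, reg.log, test = "Chisq")` should report the same comparison.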