*By what method are the coefficients estimated in a model whose dependent variable
follows the Poisson distribution?*

**Answer:** d) Maximizing the log-likelihood function
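To make the answer concrete, here is a minimal sketch of maximum likelihood estimation for a Poisson model with a log link, log(mu) = b0 + b1 * x. The toy data, the function names `poisson_loglik` and `fit_poisson`, and the use of plain Newton-Raphson steps are assumptions for illustration only; statistical packages use more careful numerical routines.

```python
import math

def poisson_loglik(beta, xs, ys):
    """Poisson log-likelihood with log link: log(mu_i) = b0 + b1 * x_i."""
    b0, b1 = beta
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return total

def fit_poisson(xs, ys, iterations=25):
    """Maximize the log-likelihood with Newton-Raphson steps."""
    # Start from the intercept-only solution: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu          # score with respect to b0
            g1 += (y - mu) * x    # score with respect to b1
            h00 += mu             # entries of the negative Hessian
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta + (negative Hessian)^-1 * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up toy data, not taken from the assignment
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, 3, 4, 8, 12]
beta_hat = fit_poisson(xs, ys)
print(beta_hat, poisson_loglik(beta_hat, xs, ys))
```

At the estimate the score (the gradient of the log-likelihood) is zero, which is the first-order condition that defines the maximum likelihood solution.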

*Differences between Linear and Logistic Regression with a dummy dependent variable*

**Answer:** 
	
**Linear Regression** assumes the dependent variable is **continuous**. 
**Logistic Regression** assumes the dependent variable is **binary** **(0/1)**.

**Linear Regression** uses *Ordinary Least Squares (OLS)* for estimation.
**Logistic Regression** uses *Maximum Likelihood Estimation (MLE)* for estimation.
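One practical consequence of this difference can be sketched in a few lines: a linear fitted value is unbounded, while the logistic transform always returns a value strictly between 0 and 1. The coefficients below are hypothetical and shared by both functions purely for illustration; in practice OLS and MLE would produce different estimates.

```python
import math

def linear_prediction(b0, b1, x):
    """Fitted value from a linear model; it can fall outside [0, 1]."""
    return b0 + b1 * x

def logistic_prediction(b0, b1, x):
    """Fitted probability from a logit model; always strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, chosen only to show the contrast
b0, b1 = 0.2, 0.1
for x in (0, 5, 10):
    print(x, linear_prediction(b0, b1, x), logistic_prediction(b0, b1, x))
```

With a binary dependent variable, the linear fitted value at `x = 10` exceeds 1 and so cannot be read as a probability, whereas the logistic value can.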

### Import Libraries

In [1]:
import pandas as pd
import statsmodels.formula.api as smf
from loguru import logger

In [2]:
df = pd.read_csv("employment_08_09.csv")

df["white"] = (df["race"] == 1).astype(int)
df["age2"] = df["age"] ** 2

In [6]:
model = smf.logit("employed ~ age + age2 + white + female", data=df)
result = model.fit()
logger.success(f"\n{result.summary()}")

2025-10-08 21:27:19.510 | SUCCESS | __main__:<module>:3 -
                           Logit Regression Results                           
Dep. Variable:               employed   No. Observations:                 5412
Model:                          Logit   Df Residuals:                     5407
Method:                           MLE   Df Model:                            4
Date:                Wed, 08 Oct 2025   Pseudo R-squ.:                 0.02698
Time:                        21:27:19   Log-Likelihood:                -1979.3
converged:                       True   LL-Null:                       -2034.2
Covariance Type:            nonrobust   LLR p-value:                 8.199e-23
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.6722      0.449     -5.946      0.000      -3.553      -1.791
age          

Optimization terminated successfully.
         Current function value: 0.365730
         Iterations 6


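**Intercept:** Negative (-2.6722). It is the log-odds of employment when every regressor equals zero, that is, for a non-white male at age 0. That profile is not a meaningful worker, so the intercept should not be read as a baseline employment probability.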

**Age:** Positive and highly significant: holding the other regressors fixed, the log-odds of employment increase with age.

**White:** Positive and significant: being white increases the log-odds of employment relative to being non-white.

**Female:** Negative and statistically significant: being female reduces the log-odds of employment relative to being male.

**Age^2:** Negative and highly significant: the effect of age is nonlinear. The log-odds of employment rise with age at a decreasing rate, peak, and then decline.
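The age at which the log-odds peak follows from setting the derivative of `b_age * age + b_age2 * age**2` to zero, which gives `age* = -b_age / (2 * b_age2)`. The sketch below uses a hypothetical helper `turning_point` and made-up coefficients, because the coefficient rows are cut off in the summary above. With the fitted model, the inputs would presumably be `result.params["age"]` and `result.params["age2"]`.

```python
def turning_point(b_age, b_age2):
    """Age at which b_age * age + b_age2 * age**2 reaches its maximum."""
    if b_age2 >= 0:
        raise ValueError("no interior maximum unless the age2 coefficient is negative")
    return -b_age / (2.0 * b_age2)

# Hypothetical coefficients, NOT the estimates from the regression above
example_peak = turning_point(b_age=0.3, b_age2=-0.004)
print(example_peak)
```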

### c) What is the odds ratio for a non-white female of age 30?

In [7]:
person_30 = pd.DataFrame({
    "age": [30],
    "age2": [30**2],
    "white": [0],   # non-white
    "female": [1]   # female
})

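# Predicted probability of employment for this profile
prob_30 = result.predict(person_30).iloc[0]
# Odds = p / (1 - p). Note: this is the odds for one profile,
# not a ratio of odds between two groups.
odds_30 = prob_30 / (1 - prob_30)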

logger.success(f"\nOdds ratio (non-white female age 30): {odds_30}")

2025-10-08 21:27:31.361 | SUCCESS | __main__:<module>:11 -
Odds ratio (non-white female age 30): 4.749696308251459


### d) What is the probability that a white male of age 35 is employed?

In [8]:
person_35 = pd.DataFrame({
    "age": [35],
    "age2": [35**2],
    "white": [1],   # white
    "female": [0]   # male
})

prob_35 = result.predict(person_35).iloc[0]  # first (only) predicted probability
logger.success(f"\nProbability (white male age 35): {prob_35}")

2025-10-08 21:27:41.561 | SUCCESS | __main__:<module>:9 -
Probability (white male age 35): 0.9122244260202702
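The value reported in part c) is the odds p / (1 - p) for a single profile, whereas an odds ratio compares the odds of two profiles. As a rough sketch, the two logged results can be put on the same scale and compared. The helper names below are hypothetical, and the numbers are copied from the outputs above.

```python
def odds_from_probability(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Convert odds back into a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)

# Values copied from the logged outputs of parts c) and d)
odds_30 = 4.749696308251459   # non-white female, age 30
prob_35 = 0.9122244260202702  # white male, age 35

prob_30 = probability_from_odds(odds_30)
odds_35 = odds_from_probability(prob_35)

# Odds ratio of the white male (age 35) relative to the non-white female (age 30)
odds_ratio = odds_35 / odds_30
print(prob_30, odds_35, odds_ratio)
```

An odds ratio above 1 here means the second profile has higher odds of employment than the first. It mixes the effects of age, race, and sex, so it is not the odds ratio of any single regressor.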
