### C3 p5 
Confirm the partialling-out interpretation of the OLS estimates by explicitly doing the partialling out for Example 3.2:
- data: WAGE1
  - estimated equation: log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
- First, regress educ on exper and tenure, and save the residuals r1_hat.
- Second, regress log(wage) on r1_hat.
- Compare the coefficient on r1_hat with the coefficient on educ in the regression of log(wage) on educ, exper, and tenure.
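
Before turning to WAGE1, the three steps can be sketched on synthetic data. This is only an illustration under stated assumptions: the data are simulated (not WAGE1), the helper `ols` and the names `x1`, `x2`, `x3` are hypothetical, and plain NumPy least squares stands in for `statsmodels`.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on X (X must already contain a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (hypothetical): x1 is correlated with x2 and x3.
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.2 * x2 + 0.4 * x3 + rng.normal(size=n)
const = np.ones(n)

# Step 1: regress x1 on the other regressors and keep the residuals.
Z = np.column_stack([const, x2, x3])
r1_hat = x1 - Z @ ols(x1, Z)

# Step 2: regress y on the residuals (with an intercept) and take the slope.
b_partial = ols(y, np.column_stack([const, r1_hat]))[1]

# Step 3: coefficient on x1 in the full multiple regression.
b_full = ols(y, np.column_stack([const, x1, x2, x3]))[1]

print(f"partialled-out slope:  {b_partial:.6f}")
print(f"full-regression slope: {b_full:.6f}")
```

The two printed slopes should agree to floating-point precision, which is the same property the cells below check on the real data.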


In [1]:
import pandas as pd
import statsmodels.api as sm

# Path to the WAGE1 Excel file
path = '/Users/mouyasushi/Desktop/學校課程/econometrics/DateSets/Excel/wage1.xls'

column_names = {
    'wage': 'average hourly earnings',
    'educ': 'years of education',
    'exper': 'years potential experience',
    'tenure': 'years with current employer',
    'nonwhite': '=1 if nonwhite',
    'female': '=1 if female',
    'married': '=1 if married',
    'numdep': 'number of dependents',
    'smsa': '=1 if live in SMSA',
    'northcen': '=1 if live in north central U.S',
    'south': '=1 if live in southern region',
    'west': '=1 if live in western region',
    'construc': '=1 if work in construc. indus.',
    'ndurman': '=1 if in nondur. manuf. indus.',
    'trcommpu': '=1 if in trans, commun, pub ut',
    'trade': '=1 if in wholesale or retail',
    'services': '=1 if in services indus.',
    'profserv': '=1 if in prof. serv. indus.',
    'profocc': '=1 if in profess. occupation',
    'clerocc': '=1 if in clerical occupation',
    'servocc': '=1 if in service occupation',
    'lwage': 'log(wage)',
    'expersq': 'exper^2',
    'tenursq': 'tenure^2'
}

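# Read the Excel file, assigning the column names defined above.
# NOTE: with the default header=0, the first row of the sheet is treated as a
# header and then overwritten by `names`. If the file has no header row, pass
# header=None so that the first observation is kept. WAGE1 is usually described
# as having 526 observations, while the residuals printed below have length 525,
# which suggests one observation may be lost this way.
df = pd.read_excel(path, names=list(column_names.keys()))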

In [12]:
# Step 1: Regress educ on exper & tenure
# Create X matrix with constant term
X = df[['exper', 'tenure']]
X = sm.add_constant(X)
y = df['educ']

# Fit the model
model = sm.OLS(y, X).fit()

# Save residuals (r1_hat)
r1_hat = model.resid

# Print regression results
print("Step 1: Regression of Education on Experience and Tenure")
print("=" * 100)
print(model.summary().tables[1])
print("=" * 100)
print("Saved r1_hat: ")
print(r1_hat)

Step 1: Regression of Education on Experience and Tenure
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         13.5861      0.185     73.540      0.000      13.223      13.949
exper         -0.0741      0.010     -7.588      0.000      -0.093      -0.055
tenure         0.0475      0.018      2.593      0.010       0.012       0.084
Saved r1_hat: 
0     -0.050383
1     -2.437860
2     -3.655791
3     -1.162312
4      2.700659
         ...   
520    3.356588
521   -3.437860
522    1.521698
523    2.736978
524    0.594335
Length: 525, dtype: float64


In [13]:
# Step 2: Regress log(wage) on r1_hat
# r1_hat is an unnamed Series, so its coefficient is labelled `0` in the output
X2 = sm.add_constant(r1_hat)
y2 = df['lwage']
model2 = sm.OLS(y2, X2).fit()

print("Step 2: Regression of Log(wage) on Residuals (r1_hat)")
print("===================================================")
print(model2.summary().tables[1])

Step 2: Regression of Log(wage) on Residuals (r1_hat)
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.6242      0.021     78.489      0.000       1.584       1.665
0              0.0919      0.008     11.650      0.000       0.076       0.107


In [15]:
# Original full regression
X_full = df[['educ', 'exper', 'tenure']]
X_full = sm.add_constant(X_full)
y_full = df['lwage']

model_full = sm.OLS(y_full, X_full).fit()
print("Full Regression: log(wage) on educ, exper, and tenure")
print("=" * 80)
print(model_full.summary().tables[1])

Full Regression: log(wage) on educ, exper, and tenure
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2867      0.104      2.745      0.006       0.082       0.492
educ           0.0919      0.007     12.519      0.000       0.077       0.106
exper          0.0041      0.002      2.367      0.018       0.001       0.007
tenure         0.0221      0.003      7.126      0.000       0.016       0.028


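### Step 3: compare the coefficient on r1_hat with the full-regression coefficient on educ
- coefficient on r1_hat (row labelled `0` in Step 2): 0.0919
- coefficient on educ (full regression): 0.0919
- The two coefficients are identical, which confirms the partialling-out interpretation.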
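
The agreement is not a coincidence. As a sketch of the standard partialling-out formula, write $\hat r_{i1}$ for the Step 1 residuals and $y_i$ for log(wage); this notation is assumed here and does not appear in the notebook:

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n} \hat r_{i1}\, y_i}{\sum_{i=1}^{n} \hat r_{i1}^{2}}
$$

Because the residuals have zero sample mean, this is also the slope from the simple regression of $y$ on $\hat r_1$ with an intercept. Only the point estimate carries over. The Step 2 standard error (0.008) differs from the full-regression one (0.007) because the Step 2 residual variance still contains the variation in log(wage) explained by exper and tenure.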

### C3 p6 
Use the data set WAGE2 (all of the following regressions contain an intercept):
- Run a simple regression of IQ on educ to obtain the slope coefficient (delta1_tilde).
- Run a simple regression of log(wage) on educ to obtain the slope coefficient (Beta1_tilde).
- Run a multiple regression of log(wage) on educ and IQ to obtain the slope coefficients (beta1_hat, beta2_hat).
- Verify that Beta1_tilde = beta1_hat + beta2_hat * delta1_tilde.
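
The identity in the last bullet is an algebraic property of OLS, so it can be previewed on synthetic data. The sketch below is an illustration under assumptions: the data are simulated (not WAGE2), the helper `ols` is hypothetical, and `x1` and `x2` play the roles of educ and IQ.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on X (X must already contain a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (hypothetical): x2 depends on x1, and y depends on both.
rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)
x2 = 2.0 + 1.5 * x1 + rng.normal(size=n)
y = 0.5 + 0.3 * x1 + 0.1 * x2 + rng.normal(size=n)
const = np.ones(n)

# Simple regression of x2 on x1: slope delta1_tilde.
delta1_tilde = ols(x2, np.column_stack([const, x1]))[1]

# Simple regression of y on x1: slope beta1_tilde.
beta1_tilde = ols(y, np.column_stack([const, x1]))[1]

# Multiple regression of y on x1 and x2: slopes beta1_hat and beta2_hat.
_, beta1_hat, beta2_hat = ols(y, np.column_stack([const, x1, x2]))

rhs = beta1_hat + beta2_hat * delta1_tilde
print(f"beta1_tilde:                          {beta1_tilde:.6f}")
print(f"beta1_hat + beta2_hat * delta1_tilde: {rhs:.6f}")
```

Both printed values should match to floating-point precision, regardless of the coefficients used to simulate the data.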

In [17]:
path = '/Users/mouyasushi/Desktop/學校課程/econometrics/DateSets/Excel/wage2.xls'

# Column names and descriptions for WAGE2 dataset
column_names = {
    'wage': 'monthly earnings',
    'hours': 'average weekly hours',
    'IQ': 'IQ score',
    'KWW': 'knowledge of world work score',
    'educ': 'years of education',
    'exper': 'years of work experience',
    'tenure': 'years with current employer',
    'age': 'age in years',
    'married': '=1 if married',
    'black': '=1 if black',
    'south': '=1 if live in south',
    'urban': '=1 if live in SMSA',
    'sibs': 'number of siblings',
    'brthord': 'birth order',
    'meduc': "mother's education",
    'feduc': "father's education",
    'lwage': 'natural log of wage'
}

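# Read the data, assigning the column names defined above.
# NOTE: with the default header=0, the first row of the sheet is treated as a
# header; pass header=None if the file has no header row.
df = pd.read_excel(path, names=list(column_names.keys()))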

df.head()

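|   | wage | hours | IQ | KWW | educ | exper | tenure | age | married | black | south | urban | sibs | brthord | meduc | feduc | lwage |
|---|------|-------|----|-----|------|-------|--------|-----|---------|-------|-------|-------|------|---------|-------|-------|-------|
| 0 | 808 | 50 | 119 | 41 | 18 | 11 | 16 | 37 | 1 | 0 | 0 | 1 | 1 | . | 14 | 14 | 6.694562 |
| 1 | 825 | 40 | 108 | 46 | 14 | 11 | 9 | 33 | 1 | 0 | 0 | 1 | 1 | 2 | 14 | 14 | 6.715384 |
| 2 | 650 | 40 | 96 | 32 | 12 | 13 | 7 | 32 | 1 | 0 | 0 | 1 | 4 | 3 | 12 | 12 | 6.476973 |
| 3 | 562 | 40 | 74 | 27 | 11 | 14 | 5 | 34 | 1 | 0 | 0 | 1 | 10 | 6 | 6 | 11 | 6.331502 |
| 4 | 1400 | 40 | 116 | 43 | 16 | 14 | 2 | 35 | 1 | 1 | 0 | 1 | 1 | 2 | 8 | . | 7.244227 |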


In [18]:
# Step1 : Simple regression of IQ on education
X = df['educ']
X = sm.add_constant(X)
y = df['IQ']

model = sm.OLS(y, X).fit()
print("Regression of IQ on Education")
print("=" * 80)
print(model.summary().tables[1])
print(f"\nSlope coefficient (delta1_tilde): {model.params['educ']:.4f}")

Regression of IQ on Education
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         53.7041      2.625     20.457      0.000      48.552      58.856
educ           3.5328      0.192     18.366      0.000       3.155       3.910

Slope coefficient (delta1_tilde): 3.5328


In [19]:
# Simple regression of log(wage) on education
X = df['educ']
X = sm.add_constant(X)
y = df['lwage']

model_wage = sm.OLS(y, X).fit()
print("Regression of Log(wage) on Education")
print("=" * 80)
print(model_wage.summary().tables[1])
print(f"\nSlope coefficient (Beta1_tilde): {model_wage.params['educ']:.4f}")

Regression of Log(wage) on Education
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.9733      0.081     73.341      0.000       5.813       6.133
educ           0.0598      0.006     10.025      0.000       0.048       0.072

Slope coefficient (Beta1_tilde): 0.0598


In [20]:
# Multiple regression of log(wage) on education and IQ
X = df[['educ', 'IQ']]
X = sm.add_constant(X)
y = df['lwage']

model_multi = sm.OLS(y, X).fit()
print("Multiple Regression of Log(wage) on Education and IQ")
print("=" * 80)
print(model_multi.summary().tables[1])
print(f"\nSlope coefficients:")
print(f"beta1_hat (Education): {model_multi.params['educ']:.4f}")
print(f"beta2_hat (IQ): {model_multi.params['IQ']:.4f}")

Multiple Regression of Log(wage) on Education and IQ
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.6585      0.096     58.743      0.000       5.469       5.848
educ           0.0391      0.007      5.716      0.000       0.026       0.053
IQ             0.0059      0.001      5.872      0.000       0.004       0.008

Slope coefficients:
beta1_hat (Education): 0.0391
beta2_hat (IQ): 0.0059


### Verify 
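- Beta1_tilde: 0.0598
- beta1_hat: 0.0391
- beta2_hat: 0.0059
- delta1_tilde: 3.5328

Result: 0.0598 ≈ 0.0391 + 0.0059 * 3.5328, so Beta1_tilde = beta1_hat + beta2_hat * delta1_tilde holds up to rounding of the printed coefficients.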
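
The arithmetic can be checked directly with the four-decimal values printed above. This is only a sketch with the rounded numbers typed in by hand. With the unrounded values held in `model.params`, `model_wage.params`, and `model_multi.params`, the two sides should agree to floating-point precision.

```python
# Coefficients as printed in the regression outputs, rounded to four decimals.
beta1_tilde = 0.0598   # log(wage) on educ
beta1_hat = 0.0391     # log(wage) on educ and IQ: coefficient on educ
beta2_hat = 0.0059     # log(wage) on educ and IQ: coefficient on IQ
delta1_tilde = 3.5328  # IQ on educ

rhs = beta1_hat + beta2_hat * delta1_tilde
gap = abs(beta1_tilde - rhs)

print(f"right-hand side:        {rhs:.4f}")
print(f"gap from Beta1_tilde:   {gap:.4f}")
```

The right-hand side comes out at about 0.0599, a gap of roughly 0.0001 from 0.0598. Rounding beta2_hat to 0.0059 before multiplying it by 3.5328 is enough on its own to produce a discrepancy of that size.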

### C4