# Application: Heterogeneous Effect of Gender on Wage Using Double Lasso

We use US census data from 2012 to jointly analyse the effect of gender, and of the interactions between gender and other variables, on wages. The dependent variable is the logarithm of the wage; the target variable is *female*, both on its own and in combination with other variables. All other variables denote socio-economic characteristics, e.g. marital status, education, and experience. For a detailed description of the variables we refer to the help page of the data set (`?cps2012`).



This analysis allows a closer look at how discrimination according to gender is related to other socio-economic variables.



In [1]:
library(hdm)

data(cps2012)

str(cps2012) # inspect the structure of the data frame

"package 'hdm' was built under R version 3.6.3"

'data.frame':	29217 obs. of  23 variables:
 $ year        : num  2012 2012 2012 2012 2012 ...
 $ lnw         : num  1.91 1.37 2.54 1.8 3.35 ...
 $ female      : num  1 1 0 1 0 0 0 0 0 1 ...
 $ widowed     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ divorced    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ separated   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ nevermarried: num  0 0 0 0 0 0 1 0 0 0 ...
 $ hsd08       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ hsd911      : num  0 1 0 0 0 0 0 0 0 0 ...
 $ hsg         : num  0 0 1 1 0 1 1 0 0 0 ...
 $ cg          : num  0 0 0 0 1 0 0 0 1 0 ...
 $ ad          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ mw          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ so          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ we          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ exp1        : num  22 30 19 14 15 23 33 23.5 15 15.5 ...
 $ exp2        : num  4.84 9 3.61 1.96 2.25 ...
 $ exp3        : num  10.65 27 6.86 2.74 3.38 ...
 $ exp4        : num  23.43 81 13.03 3.84 5.06 ...
 $ weight      : num  569 626 264 257 257 ...
 $ 

In [12]:
# female main effect, female interacted with every control,
# and all controls together with their pairwise interactions
X <- model.matrix(~ -1 + female + female:(widowed + divorced + separated + nevermarried +
hsd08 + hsd911 + hsg + cg + ad + mw + so + we + exp1 + exp2 + exp3) + (widowed +
divorced + separated + nevermarried + hsd08 + hsd911 + hsg + cg + ad + mw + so +
we + exp1 + exp2 + exp3)^2, data = cps2012)


# in formula notation, (a + b)^2 expands to a + b + a:b
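
The formula operators used above can be checked on a small example. The sketch below uses a hypothetical toy data frame (the columns `a`, `b`, `d` are made up for illustration and are not part of `cps2012`) to show which columns `^2` and `:` generate, and then counts the columns implied by the formula for one target variable and 15 controls.

```r
# Toy data frame with hypothetical numeric columns (not from cps2012)
toy <- data.frame(a = c(1, 0, 1, 0), b = c(0, 1, 1, 0), d = c(1, 1, 0, 0))

# (a + b)^2 gives the main effects plus the pairwise interaction a:b
M_sq <- model.matrix(~ -1 + (a + b)^2, data = toy)
colnames(M_sq)

# d:(a + b) adds only the interactions of d with a and b, no extra main effects
M_int <- model.matrix(~ -1 + d + d:(a + b), data = toy)
colnames(M_int)

# Column count for the wage formula: female, 15 female interactions,
# 15 controls, and all pairwise interactions among the 15 controls
n_cols <- 1 + 15 + 15 + choose(15, 2)
n_cols
```

The count is taken before any constant columns are removed, so the final design matrix may have fewer columns.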

In [13]:
# the formula expands the 16 base regressors (female plus 15 controls) into 136 columns

dim(X)

In [14]:
colnames(X)

In [15]:
length(colnames(X))

In [16]:
# variance of each column (MARGIN = 2 applies the function over columns)

apply(X, 2, var)



In [17]:
# create the model matrix for the covariates
X <- model.matrix(~ -1 + female + female:(widowed + divorced + separated + nevermarried +
hsd08 + hsd911 + hsg + cg + ad + mw + so + we + exp1 + exp2 + exp3) + (widowed +
divorced + separated + nevermarried + hsd08 + hsd911 + hsg + cg + ad + mw + so +
we + exp1 + exp2 + exp3)^2, data = cps2012)

X <- X[, which(apply(X, 2, var) != 0)] # exclude all constant variables

# centre every column at its sample mean
demean <- function(x) x - mean(x)
X <- apply(X, 2, demean)
dim(X)
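
The two preprocessing steps above (dropping constant columns, then centring) can be illustrated on a toy matrix. This is a minimal sketch with made-up values; `Z` and its column names are hypothetical, and `scale()` is used here as a base R alternative to the `demean` helper.

```r
# Toy matrix (hypothetical values): the column "const" has zero variance
Z <- cbind(x1 = c(1, 2, 3, 4), const = c(5, 5, 5, 5), x2 = c(0, 1, 0, 1))

# Keep only the columns whose sample variance is non-zero
keep <- apply(Z, 2, var) != 0
Z <- Z[, keep, drop = FALSE]

# Centre each remaining column; scale = FALSE subtracts the mean only
Z_centred <- scale(Z, center = TRUE, scale = FALSE)

colnames(Z_centred)
colMeans(Z_centred)
```

Using `drop = FALSE` keeps the result a matrix even if only one column survives the variance filter.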



In [18]:
# target variables: index.gender specifies the coefficients we are interested in

index.gender <- grep("female", colnames(X))
index.gender

In [19]:
y <- cps2012$lnw  # get log-wage 

The estimates for the target parameters, i.e. the coefficient on *female* and the coefficients on its interactions with the other variables, are calculated and summarized by the following commands:



In [20]:
# this cell takes a minute to run

effects.female <- rlassoEffects(x = X, y = y, index = index.gender)

# by default, rlassoEffects uses the partialling-out method
# with post-lasso in each selection step
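
To make the partialling-out idea concrete, the sketch below replaces the lasso steps with ordinary least squares on simulated data. This is not what `rlassoEffects` does internally (it uses lasso or post-lasso for the auxiliary regressions); the variables `w`, `d`, and `y` are hypothetical, and the example only shows the residual-on-residual logic.

```r
set.seed(1)
n <- 200
w <- matrix(rnorm(n * 3), n, 3)                 # hypothetical controls
d <- 0.5 * w[, 1] + rnorm(n)                    # hypothetical target regressor
y <- as.numeric(1.0 * d + w %*% c(0.3, -0.2, 0.1) + rnorm(n))

# Step 1: residualise the outcome and the target regressor on the controls
res_y <- resid(lm(y ~ w))
res_d <- resid(lm(d ~ w))

# Step 2: regress residual on residual; the slope is the partialled-out effect
beta_partial <- coef(lm(res_y ~ res_d - 1))[["res_d"]]

# With OLS in step 1, this equals the coefficient from the full regression
beta_full <- coef(lm(y ~ d + w))[["d"]]
all.equal(beta_partial, beta_full)
```

With many controls, OLS in step 1 is no longer reliable, which is where lasso-based selection comes in.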

In [23]:
typeof(effects.female)

In [28]:
names(effects.female)

In [31]:
effects.female$se

In [32]:
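result <- summary(effects.female)
result$coef
library(xtable)
# keep the estimate, standard error and p-value (columns 1, 2 and 4)
# note: digits is an argument of xtable(), not of print(), so the table
# printed here uses xtable's default of two decimals
print(xtable(result$coef[, c(1, 2, 4)]), type = "latex")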


| | Estimate. | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| female | -0.154923281 | 0.050162447 | -3.08843149 | 0.002012161 |
| female:widowed | 0.136095484 | 0.090662629 | 1.50111997 | 0.1333245 |
| female:divorced | 0.136939386 | 0.0221817 | 6.1735297 | 6.6782e-10 |
| female:separated | 0.023302763 | 0.053211795 | 0.43792476 | 0.6614408 |
| female:nevermarried | 0.186853483 | 0.019942393 | 9.36966209 | 7.276511e-21 |
| female:hsd08 | 0.027810312 | 0.120914496 | 0.22999982 | 0.8180919 |
| female:hsd911 | -0.11933504 | 0.051879684 | -2.30022682 | 0.02143537 |
| female:hsg | -0.01288978 | 0.019223188 | -0.6705329 | 0.5025181 |
| female:cg | 0.010138553 | 0.018326505 | 0.553218 | 0.5801141 |
| female:ad | -0.030463745 | 0.021806103 | -1.39702838 | 0.162405 |


"package 'xtable' was built under R version 3.6.3"

% latex table generated in R 3.6.1 by xtable 1.8-4 package
% Fri Apr 29 09:48:36 2022
\begin{table}[ht]
\centering
\begin{tabular}{rrrr}
  \hline
 & Estimate. & Std. Error & Pr($>$$|$t$|$) \\ 
  \hline
female & -0.15 & 0.05 & 0.00 \\ 
  female:widowed & 0.14 & 0.09 & 0.13 \\ 
  female:divorced & 0.14 & 0.02 & 0.00 \\ 
  female:separated & 0.02 & 0.05 & 0.66 \\ 
  female:nevermarried & 0.19 & 0.02 & 0.00 \\ 
  female:hsd08 & 0.03 & 0.12 & 0.82 \\ 
  female:hsd911 & -0.12 & 0.05 & 0.02 \\ 
  female:hsg & -0.01 & 0.02 & 0.50 \\ 
  female:cg & 0.01 & 0.02 & 0.58 \\ 
  female:ad & -0.03 & 0.02 & 0.16 \\ 
  female:mw & -0.00 & 0.02 & 0.96 \\ 
  female:so & -0.01 & 0.02 & 0.67 \\ 
  female:we & -0.00 & 0.02 & 0.84 \\ 
  female:exp1 & 0.00 & 0.01 & 0.53 \\ 
  female:exp2 & -0.16 & 0.05 & 0.00 \\ 
  female:exp3 & 0.04 & 0.01 & 0.00 \\ 
   \hline
\end{tabular}
\end{table}


Now we estimate and plot confidence intervals, first the "pointwise" and then the joint confidence intervals.
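
As a rough illustration of what a pointwise interval is, the sketch below builds 95% normal-approximation intervals by hand from the first four estimates and standard errors in the summary table above. The names `est`, `se`, and `ci` are introduced here for illustration only; this does not reproduce the joint intervals, which account for testing several coefficients at once.

```r
# Estimates and standard errors copied from the first four rows of the summary table
est <- c(female = -0.154923281, `female:widowed` = 0.136095484,
         `female:divorced` = 0.136939386, `female:separated` = 0.023302763)
se <- c(0.050162447, 0.090662629, 0.0221817, 0.053211795)

# Pointwise 95% interval: estimate plus or minus the normal quantile times the SE
z <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)
round(ci, 3)

# Flag the coefficients whose pointwise interval excludes zero
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
excludes_zero
```

In `hdm`, the corresponding calls are expected to be `confint(effects.female, level = 0.95)` for pointwise intervals and the same call with `joint = TRUE` for joint intervals; check the package help for the exact arguments.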

In [11]:
result$coef

| | Estimate. | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| female | -0.154923281 | 0.050162447 | -3.08843149 | 0.002012161 |
| female:widowed | 0.136095484 | 0.090662629 | 1.50111997 | 0.1333245 |
| female:divorced | 0.136939386 | 0.0221817 | 6.1735297 | 6.6782e-10 |
| female:separated | 0.023302763 | 0.053211795 | 0.43792476 | 0.6614408 |
| female:nevermarried | 0.186853483 | 0.019942393 | 9.36966209 | 7.276511e-21 |
| female:hsd08 | 0.027810312 | 0.120914496 | 0.22999982 | 0.8180919 |
| female:hsd911 | -0.11933504 | 0.051879684 | -2.30022682 | 0.02143537 |
| female:hsg | -0.01288978 | 0.019223188 | -0.6705329 | 0.5025181 |
| female:cg | 0.010138553 | 0.018326505 | 0.553218 | 0.5801141 |
| female:ad | -0.030463745 | 0.021806103 | -1.39702838 | 0.162405 |
