<center><h1> EC485: In-Class Case Study</h1></center>

**Author(s):**
1. Belicia Rodriguez (belicia.rodriguez@emory.edu)

**Objectives**: This <ins>case study</ins> aims to
 1. Familiarize you with *real* requests in any entry-level data analyst job;
 2. Give you practice using *GitHub* to retrieve and submit computer code for *reference*, *version control*, and *future collaboration*.

**Instructions**:
 1. Please write down your R code and <ins>execute</ins> it in the cell below each question.
 
**Data Source**: [Introductory Econometrics: A Modern Approach](https://cran.r-project.org/web/packages/wooldridge/index.html) by Jeffrey Wooldridge

**Data Description**: 

```
CEOSAL2

salary    age       college   grad      comten    ceoten    sales     profits  
mktval    lsalary   lsales    lmktval   comtensq  ceotensq  profmarg  

  Obs:   177

  1. salary                   1990 compensation, $1000s
  2. age                      in years
  3. college                  =1 if attended college
  4. grad                     =1 if attended graduate school
  5. comten                   years with company
  6. ceoten                   years as ceo with company
  7. sales                    1990 firm sales, millions
  8. profits                  1990 profits, millions
  9. mktval                   market value, end 1990, mills.
 10. lsalary                  log(salary)
 11. lsales                   log(sales)
 12. lmktval                  log(mktval)
 13. comtensq                 comten^2
 14. ceotensq                 ceoten^2
 15. profmarg                 profits as % of sales
 ```

<center><h2> Questions</h2></center>

1. [5 points] Prepare your workspace and load the ```ceosal2``` data set from the ```wooldridge``` R package.

In [2]:
# download wooldridge package
if(!require(wooldridge)) install.packages('wooldridge')

# load ceosal2
data(ceosal2, package = 'wooldridge')

# view dataset
head(ceosal2)

| | salary | age | college | grad | comten | ceoten | sales | profits | mktval | lsalary | lsales | lmktval | comtensq | ceotensq | profmarg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` |
| 1 | 1161 | 49 | 1 | 1 | 9 | 2 | 6200 | 966 | 23200 | 7.057037 | 8.732305 | 10.051908 | 81 | 4 | 15.580646 |
| 2 | 600 | 43 | 1 | 1 | 10 | 10 | 283 | 48 | 1100 | 6.39693 | 5.645447 | 7.003066 | 100 | 100 | 16.96113 |
| 3 | 379 | 51 | 1 | 1 | 9 | 3 | 169 | 40 | 1100 | 5.937536 | 5.129899 | 7.003066 | 81 | 9 | 23.668638 |
| 4 | 651 | 55 | 1 | 0 | 22 | 22 | 1100 | -54 | 1000 | 6.478509 | 7.003066 | 6.907755 | 484 | 484 | -4.909091 |
| 5 | 497 | 44 | 1 | 1 | 8 | 6 | 351 | 28 | 387 | 6.20859 | 5.860786 | 5.958425 | 64 | 36 | 7.977208 |
| 6 | 1067 | 64 | 1 | 1 | 7 | 7 | 19000 | 614 | 3900 | 6.972606 | 9.852194 | 8.268732 | 49 | 49 | 3.231579 |


<span style="color:blue">Comment: I recommend you create a function that downloads and installs all the libraries you pass to it as input.</span>
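
A minimal sketch of such a helper is shown below. The name `load_packages` is hypothetical (it is not part of any package), and the sketch assumes a CRAN mirror is already configured so that `install.packages` can run without prompting.

```r
# Hypothetical helper: install (if missing) and attach each package in `pkgs`.
load_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE instead of erroring when a package is absent
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # character.only = TRUE lets library() accept the name stored in `pkg`
    library(pkg, character.only = TRUE)
  }
  # return the package names invisibly so the call can be assigned or piped
  invisible(pkgs)
}

# Demonstration with packages that ship with R, so nothing is downloaded
loaded <- load_packages(c("stats", "utils"))
```

For this case study the call might look like `load_packages(c("wooldridge", "lmtest"))`, extended with whatever other packages your solution needs.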

2. [10 points] Estimate the following model

$$
\begin{aligned}
\texttt{lsalary} = \beta_0+\beta_1\texttt{lsales}+\beta_2\texttt{lmktval}+\beta_3\texttt{profmarg}+\beta_4\texttt{comten}+\beta_5\texttt{comtensq}+\beta_6\texttt{ceoten}+\beta_7\texttt{ceotensq}+\beta_8\texttt{age}+\beta_9\texttt{college}+\beta_{10}\texttt{grad}+e,
\end{aligned}
$$

by the OLS estimator and report a 90% *heteroskedasticity-robust* (HC1) confidence interval for the average estimated elasticity of a CEO's salary with respect to firm size (measured by sales). **Hint**: Use the ```coefci``` command in the ```lmtest``` R package.

In [23]:
# turn warnings off
options(warn=-1)

# create the model
outcome <- "lsalary"
predictors <- c("lsales", "lmktval", "profmarg", "comten", "comtensq", "ceoten", "ceotensq", "age", "college", "grad") 
m <- as.formula(paste(outcome, paste(predictors, collapse=" + "), sep=" ~ "))

# estimate model using OLS estimator
library(estimatr)
reg <- lm(m, data=ceosal2)
round(coef(reg),6)

# load lmtest
library(lmtest)

# report HC1 confidence interval
round(coefci(reg),6)


| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | 3.547507 | 5.328791 |
| lsales | 0.106735 | 0.264957 |
| lmktval | 0.012217 | 0.206523 |
| profmarg | -0.006762 | 0.001507 |
| comten | -0.029612 | 0.017735 |
| comtensq | -0.000598 | 0.000431 |
| ceoten | 0.019617 | 0.076412 |
| ceotensq | -0.002096 | -0.00018 |
| age | -0.010121 | 0.010971 |
| college | -0.477628 | 0.434215 |


<span style="color:blue">Comment: Why are you using the ```estimatr``` library? You are not specifying which ```vcov=``` you want to use.</span>
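
As the column headers of the output show, calling ```coefci(reg)``` with its defaults returns a 95% interval based on the classical covariance matrix. The sketch below shows one way to request the 90% HC1 interval for ```lsales```. It assumes the ```sandwich``` package (not named in the question) is installed to supply the HC1 covariance through ```vcovHC```, and the object names ```V_hc1``` and ```ci_lsales``` are illustrative.

```r
library(wooldridge)  # provides the ceosal2 data
library(lmtest)      # provides coefci()
library(sandwich)    # assumption: used here for vcovHC(); not named in the question

data(ceosal2, package = "wooldridge")

# same specification as in the question
m <- lsalary ~ lsales + lmktval + profmarg + comten + comtensq +
  ceoten + ceotensq + age + college + grad
reg <- lm(m, data = ceosal2)

# heteroskedasticity-robust covariance matrix with the HC1 correction
V_hc1 <- vcovHC(reg, type = "HC1")

# 90% confidence interval for the lsales coefficient only
ci_lsales <- coefci(reg, parm = "lsales", level = 0.90, vcov. = V_hc1)
round(ci_lsales, 6)
```

Note that the argument is spelled ```vcov.``` (with a trailing dot) in ```coefci```, and that ```level``` must be set explicitly because the default is 0.95.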

3. Define $\widehat{\beta}_{1,(-i)}$ as the OLS estimator of the parameter $\beta_1$ in this model obtained by dropping the $i$-th observation from the sample, i.e., the leave-one-out estimator of $\beta_1$.

    a. [40 points] Calculate $\left\{\widehat{\beta}_{1,(-1)},\widehat{\beta}_{1,(-2)},\dots,\widehat{\beta}_{1,(-176)},\widehat{\beta}_{1,(-177)}\right\}$ and print the standard summary statistics for these 177 values.

<span style="color:blue">Comment: Your approach was the correct one: you needed to either write a loop or use matrix algebra. Your mistake was in the ```for()``` loop, which should be ```for(i in 1:nrow(X)){...}```. There was no need to make it descending or to loop over $n-1$ values.</span>

In [75]:
# obtain original residual in model
e_hat <- resid(reg)

# calculate leverage values
hii <- hatvalues(reg)

# calculate prediction errors
e_tilde <- e_hat/(1-hii)

# create the design matrix
X <- model.matrix(m, data=ceosal2)

# calculate beta_1 (lsales) without the ith observation
# create an empty vector
beta1_no_i <- rep(NA, nrow(X))

# create for loop to collect the betas
for(i in (nrow(X)-1)) {
    # counter for the last observation
    i_last <- 177 - i
    
    # erase the i_last observation from the X matrix
    # X_erase <- X[]
    
    # calculate the OLS without observation i
    reg_no_i <- coef(reg)-solve(t(X)%*%X)%*%X[i_last,]%*%e_tilde[i_last]
    
    # add beta1 to the collection
    beta1_no_i[i] <- reg_no_i[2]
}
beta1_no_i
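
A sketch of the loop running over all 177 rows is given below. It keeps the same leave-one-out identity used in the cell above, $\widehat{\beta}_{(-i)} = \widehat{\beta} - (X'X)^{-1}x_i\tilde{e}_i$ with $\tilde{e}_i = \widehat{e}_i/(1-h_{ii})$, and adds a brute-force refit without the first observation as a sanity check. The names ```XtX_inv``` and ```check_1``` are illustrative and not part of the original submission.

```r
library(wooldridge)  # provides the ceosal2 data
data(ceosal2, package = "wooldridge")

m <- lsalary ~ lsales + lmktval + profmarg + comten + comtensq +
  ceoten + ceotensq + age + college + grad
reg <- lm(m, data = ceosal2)

X <- model.matrix(reg)                        # design matrix, one row per CEO
e_tilde <- resid(reg) / (1 - hatvalues(reg))  # leave-one-out prediction errors
XtX_inv <- solve(crossprod(X))                # (X'X)^{-1}, computed once

# collect the leave-one-out estimate of beta_1 (lsales) for every observation
beta1_no_i <- rep(NA_real_, nrow(X))
for (i in 1:nrow(X)) {
  # change in the full coefficient vector when observation i is dropped
  delta <- drop(XtX_inv %*% X[i, ]) * unname(e_tilde[i])
  beta1_no_i[i] <- (coef(reg) - delta)[["lsales"]]
}

# standard summary statistics of the 177 values
summary(beta1_no_i)

# sanity check: refit the model without observation 1 and compare
check_1 <- coef(lm(m, data = ceosal2[-1, ]))[["lsales"]]
all.equal(beta1_no_i[1], check_1)
```

Computing ```XtX_inv``` once outside the loop avoids inverting the same matrix 177 times.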

   b. [10 points] Is the range of these 177 values contained in the 90% confidence interval you calculated above?

Your **written answer** goes here.
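
One way to back the written answer with a numerical check is sketched below. It assumes the ```sandwich``` package for the HC1 covariance and, for brevity, obtains the leave-one-out estimates by refitting the model 177 times rather than by matrix algebra. The names ```beta1_loo```, ```loo_range```, and ```inside``` are illustrative.

```r
library(wooldridge)  # provides the ceosal2 data
library(lmtest)      # provides coefci()
library(sandwich)    # assumption: used here for vcovHC()

data(ceosal2, package = "wooldridge")

m <- lsalary ~ lsales + lmktval + profmarg + comten + comtensq +
  ceoten + ceotensq + age + college + grad
reg <- lm(m, data = ceosal2)

# brute-force leave-one-out estimates of the lsales coefficient
beta1_loo <- sapply(seq_len(nrow(ceosal2)), function(i) {
  coef(lm(m, data = ceosal2[-i, ]))[["lsales"]]
})

# 90% HC1 confidence interval for the lsales coefficient
ci <- coefci(reg, parm = "lsales", level = 0.90,
             vcov. = vcovHC(reg, type = "HC1"))

# TRUE if every leave-one-out estimate lies inside the interval
loo_range <- range(beta1_loo)
inside <- ci[1, 1] <= loo_range[1] && loo_range[2] <= ci[1, 2]
inside
```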

4. [35 points] Calculate a 95% *heteroskedasticity-robust* (HC1) forecast interval for the average salary of a potential new CEO who is 40 years old, attended college but not graduate school, has worked for his/her current employer for 10 years, and has never been its CEO. The company had 500 million USD in sales in 1990, a 5% profit margin, and a market value of 400 million USD at the end of 1990.

In [73]:
# put the information about potential new CEO into a dataframe (for clarity)
new_ceo <- data.frame(age=40, college=1, grad=0, comten=10, ceoten=0, sales=500, mktval=400, profmarg=5)

# calculate conditional mean of x given new CEO parameters


<span style="color:blue">Comment: This is *partly* correct. The model uses variables in natural logarithms, such as ```lsales``` and ```lmktval```, as well as squared terms such as ```comtensq```, and these were not defined in your evaluation data frame.</span>
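
A sketch of how the evaluation data frame could be completed and the interval computed follows. Several choices here are assumptions rather than the official solution: the ```sandwich``` package supplies the HC1 covariance, the forecast variance is taken as the homoskedastic residual variance plus the HC1 estimation variance, the critical value comes from the $t$ distribution, and the interval is built for ```lsalary``` and then exponentiated to express it in $1000s.

```r
library(wooldridge)  # provides the ceosal2 data
library(sandwich)    # assumption: used here for vcovHC()

data(ceosal2, package = "wooldridge")

m <- lsalary ~ lsales + lmktval + profmarg + comten + comtensq +
  ceoten + ceotensq + age + college + grad
reg <- lm(m, data = ceosal2)

# raw characteristics of the potential new CEO
new_ceo <- data.frame(age = 40, college = 1, grad = 0, comten = 10,
                      ceoten = 0, sales = 500, mktval = 400, profmarg = 5)

# add the transformed regressors that the model actually uses
new_ceo$lsales   <- log(new_ceo$sales)
new_ceo$lmktval  <- log(new_ceo$mktval)
new_ceo$comtensq <- new_ceo$comten^2
new_ceo$ceotensq <- new_ceo$ceoten^2

# 1 x 11 regressor row in the same column order as coef(reg)
x0 <- model.matrix(delete.response(terms(reg)), data = new_ceo)

# point forecast of lsalary
fit0 <- drop(x0 %*% coef(reg))

# estimation uncertainty of the conditional mean under HC1
V_hc1 <- vcovHC(reg, type = "HC1")
se_mean <- sqrt(drop(x0 %*% V_hc1 %*% t(x0)))

# forecast standard error: add the residual variance (assumed choice)
s2 <- sum(resid(reg)^2) / df.residual(reg)
se_fore <- sqrt(s2 + se_mean^2)

# 95% interval on the log scale, then in $1000s
crit <- qt(0.975, df.residual(reg))
interval_log <- fit0 + c(-1, 1) * crit * se_fore
exp(interval_log)
```

If the question is read as asking for an interval on the conditional mean only, replace ```se_fore``` with ```se_mean``` in the last step.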

