# RMF Lab 3 week 06 Solutions

This content is authored by Maria Boutchkova for use in the University of Edinburgh Business School Research Methods in Finance course in Spring 2023.

Make sure to have covered the material preceding this notebook. This notebook goes with the PDF document lab3_instr.pdf.

This notebook covers more advanced commands and functions around:

* Dealing with outliers
* Scatterplots
* Running regressions robust to outliers

The first computational cell below (with In [ ] in front) contains the solution. Go over the command lines and make sure they make sense to you. Click inside the cell; it should become surrounded by a green rectangle. Press Esc and the rectangle will turn blue. Now press Shift+Enter to execute the cell and produce the results beneath it.
While a cell is being executed it shows an asterisk.

To remove all output in the notebook and start again, go to the Kernel tab above, select Restart and Clear Output.

In this notebook we use the functionality of Stata. If you want to explore its full documentation, see here: https://www.stata.com/bookstore/getting-started-windows/

The comment sign in Stata is *

Magics are programs provided by stata_kernel that enhance the experience of working with Stata in Jupyter. Magics start with %. 
The browse magic simply shows us the data in memory (%browse).

The help command allows you to pull Stata documentation inside the current notebook, while in regular desktop Stata it opens the Viewer window. Feel free to do this for every command you see used below.

## Task 1: Read in data and produce scatterplots

Import the data for this lab using the same command as before (import delimited). Examine the variables; see the PDF instructions for their definitions.

In [None]:
import delimited BoardGenderPanel.csv, varn(1) clear

In [None]:
%browse

Task 1.1 Drop the logged variables (those starting with ln) for the purposes of dealing with outliers. The command is drop, and the way we tell Stata “drop all variables starting with ln” is: drop ln*

In [None]:
drop ln*
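
To check which variables a wildcard pattern matches before dropping them, one option is to pass the same pattern to describe first. The sketch below uses Stata's built-in auto dataset rather than the lab data (an assumption made only so that the example is self-contained), and it is wrapped in preserve and restore so that the data currently in memory are put back afterwards.

```stata
* Illustration on the built-in auto dataset, not the lab data
* Set aside the data currently in memory
preserve
sysuse auto, clear
* Variables whose names start with m: make and mpg
describe m*
drop m*
* make and mpg are no longer listed
describe
* Bring the original data back
restore
```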

Task 1.2 Now we use a trick to keep only the observations that have no missing data in any of the variables in memory. We run a regression on all variables with the prefix quietly (so as not to clutter the screen with pages of output): qui reg roa - ceopay (all numeric variables from roa through ceopay, in the order in which they appear in the data). Then we tell Stata to keep only the observations that were used in that regression, which by construction have non-missing data in all variables: keep if e(sample)

In [None]:
qui reg roa - ceopay
keep if e(sample)
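
The effect of keep if e(sample) is easiest to see by counting observations before and after. This is a minimal sketch on Stata's built-in auto dataset (not the lab data), where the variable rep78 has some missing values; the choice of regressors is arbitrary and serves only to bring rep78 into the estimation sample.

```stata
* Illustration on the built-in auto dataset, not the lab data
preserve
sysuse auto, clear
* 74 observations in total; rep78 is missing for 5 of them
count
quietly regress price mpg rep78
* e(sample) equals 1 for the observations used in the last estimation
keep if e(sample)
* 69 complete observations remain
count
restore
```

Only numeric variables can enter the regression, so the trick says nothing about missing values in string variables.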

Task 1.3 We will employ the useful unab command, which expands an abbreviated variable list and stores the full list of names in a local macro: unab fin_vars: roa - rd_a. We then display the contents of the container fin_vars by typing: display "`fin_vars'"

In [None]:
unab fin_vars: roa - rd_a
di "`fin_vars'"
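
As a small illustration of what unab stores, the sketch below expands a variable range in Stata's built-in auto dataset (not the lab data). The macro name first3 is made up for this example.

```stata
* Illustration on the built-in auto dataset, not the lab data
preserve
sysuse auto, clear
* Expand the range price-rep78 into the full list of variable names
unab first3: price-rep78
* Displays: price mpg rep78
display "`first3'"
restore
```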

Task 1.4 Now we do the same for the board variables bgen - ceopay

In [None]:
unab bd_vars: bgen - ceopay
di "`bd_vars'"

Task 1.5 Produce scatter plots for all variables (financial and board-related) with respect to firm size, proxied by total assets (ta). To do this efficiently in a few command lines, we will use a local macro list utility, i.e. a utility for manipulating macro lists. First we repeat the declarations of fin_vars and bd_vars from tasks 1.3 and 1.4, because local macros exist only while a chunk of code is running and disappear once it has run. Then we save the variable name ta in a local macro called drop: local drop ta. Next we make a new container named vars_no_ta containing all financial variables except ta: local vars_no_ta: list fin_vars - drop. Last, we produce the scatterplots in two loops, one over each of the containers vars_no_ta and bd_vars. The loop over vars_no_ta is:
<br>
<code>foreach v of local vars_no_ta {</code>
<br>
<code>scatter ta `v', name(fin_`v', replace)</code>
<br>
<code>}</code>

In [None]:
unab fin_vars: roa - rd_a
unab bd_vars: bgen - ceopay
local drop ta
local vars_no_ta: list fin_vars - drop
di "`vars_no_ta'"

foreach v of local vars_no_ta {
scatter ta `v', name(fin_`v', replace)
}

foreach v of local bd_vars {
scatter ta `v', name(bd_`v', replace)
}
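
The macro list subtraction used above can be tried on its own, without any data in memory. In this sketch the list contents (a b c d) and the macro names all and kept are hypothetical, chosen only to show what the operator returns.

```stata
* Hypothetical list contents, chosen only for illustration
local all a b c d
local drop b d
* Keep the elements of all that are not in drop, preserving their order
local kept: list all - drop
* Displays: a c
display "`kept'"
```

Because local macros disappear once a cell has run, all four commands need to be executed in the same cell.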

## Task 2: Outlier robust regression

Task 2.1 Regenerate the two logged variables that we dropped (lnta and lnceopay), this time under the names lnTA and lnCEOPay.

In [None]:
gen lnTA = ln(ta)
gen lnCEOPay = ln(ceopay)
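
One thing worth keeping in mind when taking logs is that ln() returns a missing value for zero or negative inputs, so such observations silently drop out of later regressions. The sketch below shows this on three made-up values; whether the lab data contain any non-positive ta or ceopay is not something this example assumes.

```stata
* Made-up values, for illustration only
preserve
clear
set obs 3
* x takes the values 0, 1, 2
generate x = _n - 1
* ln(0) is undefined, so lnx is missing in the first observation
generate lnx = ln(x)
list
* Counts 1 missing value
count if missing(lnx)
restore
```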

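Task 2.2 Declare the two macros of financial and board-related variables as in Task 1. Remove ceopay from the bd_vars list in the same way we removed ta from fin_vars there, and remove both ta and tobinq from fin_vars: tobinq will be our dependent variable and has to come first in the regress command, while ta and ceopay enter in logs. Then run the outlier-robust regression (rreg), saving its weights with genwt(), and an OLS regression with standard errors clustered by firm (gvkey), storing both with eststo.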

In [None]:
unab fin_vars: roa - rd_a
unab bd_vars: bgen - ceopay
local drop ta tobinq
local fin_no_ta_tq: list fin_vars - drop
local drop ceopay
local bd_no_pay: list bd_vars - drop
eststo clear
qui eststo: rreg tobinq lnTA lnCEOPay `fin_no_ta_tq' `bd_no_pay', genwt(outlier_weight)
qui eststo: reg tobinq lnTA lnCEOPay `fin_no_ta_tq' `bd_no_pay', vce(cluster gvkey)
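
The genwt() option saves the final weight that rreg assigns to each observation: values close to 1 mean the observation is treated almost as in OLS, while low values mark observations that were downweighted as outliers. A minimal sketch of how such weights can be inspected, using Stata's built-in auto dataset and an arbitrary model (both are assumptions for illustration, not part of the lab):

```stata
* Illustration on the built-in auto dataset, not the lab data
preserve
sysuse auto, clear
* Robust regression; the final weights are saved in the new variable w
rreg price mpg weight, genwt(w)
summarize w
* Show the five observations with the smallest weights
sort w
list make price w in 1/5
restore
```

The same summarize and list steps could be applied to outlier_weight in the lab data.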

Task 2.3 Report the results of the two regressions next to each other so that we can compare them, as we did in lab2. The following options make the table more informative: se s(N r2_a, label("Obs" "Adj. R-sq") fmt(%9.0fc 3)) mtitles("Outlier-robust" "Heterosk-robust") star(* 0.1 ** 0.05 *** 0.01) varlabels(\_cons "Intercept"). The column titles follow the order in which the models were stored: rreg first, then reg.

In [None]:
esttab, se s(N r2_a, label("Obs" "Adj. R-sq") fmt(%9.0fc 3)) mtitles("Outlier-robust" "Heterosk-robust") star(* 0.1 ** 0.05 *** 0.01) varlabels(_cons "Intercept")
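
Column titles in esttab are matched to models by the order in which the models were stored with eststo. The sketch below shows this on Stata's built-in auto dataset with an arbitrary model (assumptions for illustration only). It also assumes that the community-contributed estout package, which provides eststo and esttab, is already installed.

```stata
* estout (eststo, esttab) is community-contributed: ssc install estout
* Illustration on the built-in auto dataset, not the lab data
preserve
sysuse auto, clear
eststo clear
* Stored first, so it takes the first column title
quietly eststo: rreg price mpg weight
* Stored second, so it takes the second column title
quietly eststo: regress price mpg weight, vce(robust)
esttab, se mtitles("Outlier-robust" "Heterosk-robust")
restore
```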