<h1>ECON 140R Class 01</h1>

This is a Jupyter notebook, written with R, and running on [https://datahub.berkeley.edu](https://datahub.berkeley.edu)

The "dad joke" is that this ECON 140 will be taught at least partially in R, and also it is the first letter of your instructor's first name.

Markdown is a superb tool for writing in clean text. It cannot write clear prose for you, but when you write clear prose, it will render it cleanly. And well, that is what we might call heaven on earth. Visually and pedagogically clear.

&nbsp;

Markdown lets us write math type too. Us oldsters and geeks might call it $\LaTeX$. Check this out:

$$y = \alpha + \beta x + \epsilon$$

This is a statement that $y$ is linear in $x$, with some error, $\epsilon$, to the tune of an intercept, $\alpha$, and a slope, $\beta$.

We can get fancy with terms, or we can stay simple. Do whatever feels right to you. These are all roughly equivalent: the partial derivative, $\partial y/\partial x = \beta$; the slope is $\beta$; the <i>effect</i> of $x$ on $y$ might be $\beta$ maybe; the association between $x$ and $y$ is $\beta$; etc.
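To make "the slope is $\beta$" concrete, here is a minimal sketch on simulated data. The values $\alpha = 1$ and $\beta = 2$ are made up for illustration, and the variable names are my own. It recovers the slope as the sample covariance of $x$ and $y$ divided by the sample variance of $x$.

```r
# Simulated data; alpha = 1 and beta = 2 are made-up values for illustration.
set.seed(140)
n <- 1000
x <- rnorm(n)
epsilon <- rnorm(n, sd = 0.5)
y <- 1 + 2 * x + epsilon

# Sample slope: covariance of x and y divided by the variance of x.
beta_hat <- cov(x, y) / var(x)

# Matching intercept: the fitted line passes through the point of means.
alpha_hat <- mean(y) - beta_hat * mean(x)

c(alpha_hat = alpha_hat, beta_hat = beta_hat)
```

With a sample this large, both estimates should land close to the values used to build the data.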

&nbsp;

Econometrics is all about:

* Thinking of the world in terms of equations like this

* Being clever or realistic about how to measure the "real" $\beta$, the causal one, in data

&nbsp;

The true magic of the Jupyter notebook comes with the interlacing of code and prose together. Let us jump ahead several classes and examine something we will later understand as <b>Omitted Variable Bias</b> using a useful repository of data from Jeffrey Wooldridge's excellent textbook, <i>Introductory Econometrics, a Modern Approach</i>. 

This appears as Example 9.3 on page 281 of the 6th edition, and it draws on a dataset provided by [Blackburn and Neumark (1992)](https://www-jstor-org.libproxy.berkeley.edu/stable/2118394) on monthly earnings and other characteristics among men in 1980.

(This particular example is unfortunately not found within Florian Heiss's excellent R version of this book at [http://www.urfie.net/read/index.html](http://www.urfie.net/read/index.html))

Helpfully, folks have dumped all of Wooldridge's public datasets into an R package for us to use. Here is code that sets that up. Highlight the code snippet with your mouse or trackpad, and hit <tt>SHIFT+ENTER</tt>.

In [1]:
install.packages('wooldridge')

Installing package into ‘/opt/r’
(as ‘lib’ is unspecified)



This command digs into that installed package and retrieves one of its datasets, <tt>wage2</tt>, for us to use:

In [2]:
data(wage2, package='wooldridge')

There are several ways of probing what it is that we just loaded. One convenient function to call is <tt>head()</tt>:

In [18]:
head(wage2)

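| | wage | hours | IQ | KWW | educ | exper | tenure | age | married | black | south | urban | sibs | brthord | meduc | feduc | lwage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` |
| 1 | 769 | 40 | 93 | 35 | 12 | 11 | 2 | 31 | 1 | 0 | 0 | 1 | 1 | 2 | 8 | 8 | 6.645091 |
| 2 | 808 | 50 | 119 | 41 | 18 | 11 | 16 | 37 | 1 | 0 | 0 | 1 | 1 | NA | 14 | 14 | 6.694562 |
| 3 | 825 | 40 | 108 | 46 | 14 | 11 | 9 | 33 | 1 | 0 | 0 | 1 | 1 | 2 | 14 | 14 | 6.715384 |
| 4 | 650 | 40 | 96 | 32 | 12 | 13 | 7 | 32 | 1 | 0 | 0 | 1 | 4 | 3 | 12 | 12 | 6.476973 |
| 5 | 562 | 40 | 74 | 27 | 11 | 14 | 5 | 34 | 1 | 0 | 0 | 1 | 10 | 6 | 6 | 11 | 6.331502 |
| 6 | 1400 | 40 | 116 | 43 | 16 | 14 | 2 | 35 | 1 | 1 | 0 | 1 | 1 | 2 | 8 | NA | 7.244227 |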
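Beyond <tt>head()</tt>, a few other base R functions are handy for probing a data frame. The sketch below is self-contained: it types in four columns from the six rows printed above as a stand-in called <tt>wage2_head</tt> (a name I made up), so it runs even without the package. The same calls should work on the full <tt>wage2</tt>.

```r
# A hand-typed copy of four columns from the six rows shown above.
# NA marks the blank brthord entry in row 2.
wage2_head <- data.frame(
  wage    = c(769L, 808L, 825L, 650L, 562L, 1400L),
  educ    = c(12L, 18L, 14L, 12L, 11L, 16L),
  IQ      = c(93L, 119L, 108L, 96L, 74L, 116L),
  brthord = c(2L, NA, 2L, 3L, 6L, 2L)
)

dim(wage2_head)             # number of rows and columns
names(wage2_head)           # column names
str(wage2_head)             # type and first few values of each column
summary(wage2_head$wage)    # min, quartiles, mean, and max of one column
colSums(is.na(wage2_head))  # count of missing values in each column
```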


&nbsp;

The variables have mnemonic names you can guess. Probably the strangest one is <tt>lwage</tt>, which appears at the far right of the results window (scroll right), and which is the <b>natural logarithm of the monthly wage</b>.
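You can check the logarithm claim yourself. This small sketch types in the <tt>wage</tt> and <tt>lwage</tt> values from the six rows printed above and compares them; in R, <tt>log()</tt> is the natural logarithm. The two should agree up to small rounding differences in the last printed digit.

```r
# wage and lwage copied from the six rows printed by head(wage2)
wage  <- c(769, 808, 825, 650, 562, 1400)
lwage <- c(6.645091, 6.694562, 6.715384, 6.476973, 6.331502, 7.244227)

log(wage)                     # natural log of the monthly wage

# Largest gap between our log(wage) and the stored lwage values
max(abs(log(wage) - lwage))
```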

As we will see, in R the <tt>lm()</tt> function conveniently fits linear models with several explanatory variables. The syntax takes getting used to, but to estimate this model:
$$y = \alpha + \beta x + \gamma z + \epsilon$$
we call this code:

<center><tt>lm(y ~ x + z)</tt></center>

Can you see the similarities?
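Here is a minimal sketch of that call on simulated data, so you can see it recover numbers you already know. The true values ($\alpha = 1$, $\beta = 2$, $\gamma = -3$) and the data frame name <tt>sim</tt> are made up for illustration.

```r
# Simulated data with known coefficients (made-up values for illustration).
set.seed(140)
n <- 2000
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)              # z is correlated with x
y <- 1 + 2 * x - 3 * z + rnorm(n)    # alpha = 1, beta = 2, gamma = -3
sim <- data.frame(y, x, z)

# The formula y ~ x + z mirrors the equation; lm() adds the intercept itself.
fit <- lm(y ~ x + z, data = sim)
coef(fit)
```

The estimates should sit close to 1, 2, and -3, though never exactly on them, because of the random error.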

Now bear with me. I am going to call <tt>lm()</tt> three times with different equations, and I am going to assign the output to new objects on the left-hand side of the "gets" operator, <tt><-</tt>.

In [11]:
# Short regression: log wage on education and controls, leaving out IQ
shortreg <- lm(lwage ~ educ + exper + tenure
               + married + south + urban + black, data = wage2)

In [12]:
# Long regression: the same controls plus IQ
longreg <- lm(lwage ~ educ + exper + tenure
              + married + south + urban + black + IQ, data = wage2)

In [13]:
# Auxiliary regression: IQ on the regressors of the short regression
auxreg <- lm(IQ ~ educ + exper + tenure
             + married + south + urban + black, data = wage2)

If you were to surround each of those three calls with parentheses, R would immediately print the result. Or you can wait and call <tt>summary()</tt>:

In [14]:
summary(shortreg)


Call:
lm(formula = lwage ~ educ + exper + tenure + married + south + 
    urban + black, data = wage2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.98069 -0.21996  0.00707  0.24288  1.22822 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.395497   0.113225  47.653  < 2e-16 ***
educ         0.065431   0.006250  10.468  < 2e-16 ***
exper        0.014043   0.003185   4.409 1.16e-05 ***
tenure       0.011747   0.002453   4.789 1.95e-06 ***
married      0.199417   0.039050   5.107 3.98e-07 ***
south       -0.090904   0.026249  -3.463 0.000558 ***
urban        0.183912   0.026958   6.822 1.62e-11 ***
black       -0.188350   0.037667  -5.000 6.84e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3655 on 927 degrees of freedom
Multiple R-squared:  0.2526,	Adjusted R-squared:  0.2469 
F-statistic: 44.75 on 7 and 927 DF,  p-value: < 2.2e-16
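One way to start deciphering this output: each <tt>t value</tt> is just the estimate divided by its standard error. For <tt>educ</tt> above, 0.065431 / 0.006250 is about 10.47, which matches the printed 10.468 up to rounding. The sketch below pulls the same pieces out of a fitted model in code. It uses a small simulated regression (made-up numbers) so that it runs on its own.

```r
# A small simulated regression, only to have a fitted model to inspect.
set.seed(140)
x <- rnorm(200)
y <- 1 + 0.5 * x + rnorm(200)
fit <- lm(y ~ x)

# The Coefficients table from summary(), as a numeric matrix
tab <- coef(summary(fit))
tab

# t value = Estimate / Std. Error, row by row
tab[, "Estimate"] / tab[, "Std. Error"]
tab[, "t value"]

summary(fit)$r.squared   # "Multiple R-squared" in the printout
summary(fit)$sigma       # "Residual standard error" in the printout
```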


&nbsp;

This and other output are what we are after. It takes some skill to decipher this, but you will learn to do it as effortlessly as I can. Look at the coefficient on <tt>educ</tt> in particular. Here is R spitting out just that number, in all its raw glory:

In [15]:
shortreg$coefficients["educ"]

In words, because the outcome is the log of the monthly wage, this coefficient times 100 is approximately the <i>percentage increase</i> in the monthly wage for each additional year (unit) of education. That's a return of about 6.5% per year of schooling, which is not too shabby.
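Reading a coefficient in a log wage regression as a percentage is an approximation that works well for small coefficients. A quick sketch comparing it with the exact figure, $100 \times (e^{\beta} - 1)$, using the <tt>educ</tt> coefficient printed in the summary above:

```r
beta_educ <- 0.065431    # coefficient on educ from summary(shortreg)

approx_pct <- 100 * beta_educ               # the quick reading
exact_pct  <- 100 * (exp(beta_educ) - 1)    # exact percentage change implied by the log model

c(approx = approx_pct, exact = exact_pct)
```

The two are close here, roughly 6.5% versus 6.8%. The gap grows as coefficients get larger.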

<center><h2>Kids, stay in school</h2></center>

We will revisit these data later.
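As a preview of why we ran a short, a long, and an auxiliary regression: the three are tied together by an exact identity. The short-regression slope equals the long-regression slope plus the coefficient on the omitted variable times the auxiliary slope. Here is a minimal sketch on simulated data. The variable <tt>ability</tt> and every number are made up for illustration; this is not the Wooldridge example itself.

```r
# Simulated data in which "ability" raises both schooling and wages.
set.seed(140)
n <- 1000
ability <- rnorm(n)                       # the variable we will omit
educ    <- 12 + 2 * ability + rnorm(n)    # schooling is correlated with ability
lwage   <- 5 + 0.05 * educ + 0.10 * ability + rnorm(n, sd = 0.3)
sim <- data.frame(lwage, educ, ability)

short <- lm(lwage ~ educ, data = sim)            # omits ability
long  <- lm(lwage ~ educ + ability, data = sim)  # includes ability
aux   <- lm(ability ~ educ, data = sim)          # omitted variable on educ

b_short <- coef(short)["educ"]
b_long  <- coef(long)["educ"]
gamma   <- coef(long)["ability"]   # coefficient on the omitted variable
delta   <- coef(aux)["educ"]       # auxiliary slope

# The short slope is rebuilt exactly from the long and auxiliary regressions.
c(short = unname(b_short), rebuilt = unname(b_long + gamma * delta))
```

In this simulation the short regression overstates the return to schooling, because <tt>educ</tt> picks up some of the payoff to <tt>ability</tt>. The same arithmetic should carry over to <tt>shortreg</tt>, <tt>longreg</tt>, and <tt>auxreg</tt>, with <tt>IQ</tt> as the omitted variable.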

As a reminder, <i>you should copy code in this class.</i> Copying code is often how we learn. But you need to copy and then tinker with it, to understand it.

In ECON 140, copy code. Do not copy ideas. Copy code, learn from it, alter it to do what you need, profit. Then look at what you have found, and <u>write about it in your own words</u>.

<div style="text-align: right"> <span style="font-family:Papyrus; ">And they lived happily ever after. The End.</span></div>