# ECON546 Lab03 2018 Spring

## Tutorial

https://www.youtube.com/watch?v=YMt5K68ZvjQ

http://davidbraudt.com/stata-videos/

https://www.youtube.com/watch?v=QaI_a_l2jqo

http://davidbraudt.com/wp-content/uploads/2016/11/Command-List-for-Fall-2015-Workshop.pdf

## Macro

### Macros
A macro is simply a name associated with some text. Macros can be local or global in scope.

Local macros have names of up to 31 characters and are known only in the **current context** (the console, a do file, or a program).

**Warning**: Evaluating a macro that doesn't exist is not an error; it just returns an empty string. So be careful to spell macro names correctly. 
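The two points above (scope, and silent expansion to an empty string) can be seen together in a small sketch meant to be run as a do-file. The program name show_scope and the macro names inside, outside and shared are hypothetical, chosen only for this illustration.

```stata
* "show_scope" is a hypothetical program name used only for this illustration
capture program drop show_scope
program define show_scope
    local inside "defined inside the program"
    display "inside the program:        [`inside']"
    display "caller's local, from here: [`outside']"
    display "global, from here:         [$shared]"
end

local outside "defined by the caller"
global shared "defined as a global"

show_scope
display "program's local, from out here: [`inside']"

macro drop shared    // remove the global once done
```

The brackets make empty expansions visible: a local macro that is not defined in the current context prints as [] without raising an error, while the global is visible both inside and outside the program.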


```stata
local age "age20to24 age25to29 age30to34 age35to39 age40to44 age45to49"
local controls `age' income education
```
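To check what a macro actually holds, it helps to display it. The following sketch repeats the two definitions and then prints the expanded list, counts its words, and shows what a misspelled name produces. The macro n and the misspelling control are introduced here only for illustration.

```stata
local age "age20to24 age25to29 age30to34 age35to39 age40to44 age45to49"
local controls `age' income education

* the six age dummies followed by income and education
display "`controls'"

* count the names stored in the macro
local n : word count `controls'
display "controls holds `n' names"

* typo: the name control was never defined, so this prints []
display "[`control']"
```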


ref:

http://data.princeton.edu/stata/programming.html


https://www.youtube.com/watch?v=Ovk0CiTxMRI

https://www.youtube.com/watch?v=WsqvGdqzknY

http://jearl.faculty.arizona.edu/sites/jearl.faculty.arizona.edu/files/Intro%20to%20loops%2C%20Year%202.pdf

https://www.ucl.ac.uk/pcph/research-groups-themes/thin-pub/research_presentations/Collier_looping_STATA

## For loop

### Looping Over Sequences of Numbers
The basic looping command takes the form

```stata
forvalues number = sequence {
    ... body of loop using `number' ...
}
```

Here forvalues is a keyword, number is the name of a local macro that will be set to each number in the sequence, and sequence is a range of values which can have the form

- min/max to indicate a sequence of numbers from min to max in steps of one, for example 1/3 yields 1, 2 and 3, or
- first(step)last which yields a sequence from first to last in steps of size step. For example 15(5)50 yields 15, 20, 25, 30, 35, 40, 45 and 50.

The opening left brace must be the last thing on the first line (other than comments), and the loop must be closed by a matching right brace on a line all by itself. The loop is executed once for each value in the sequence with your local macro number (or whatever you called it) holding the value.
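As a minimal sketch of the two range forms described above, the loops below only display the loop macro, so they can be run without any data in memory. The macro names i and age are arbitrary.

```stata
* min/max form: displays 1, 2 and 3
forvalues i = 1/3 {
    display "i = `i'"
}

* first(step)last form: displays 15, 20, ..., 50
forvalues age = 15(5)50 {
    display "age = `age'"
}
```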

### Looping Over Elements in a List

```stata
foreach item in a-list-of-things {
    ... body of loop using `item' ...
}
```
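The sketch below illustrates three common variants of foreach. The item names (fruit, outcomes, y, v) are made up for the example, and the last loop assumes it is acceptable to replace the data in memory with the lifeexp example dataset that ships with Stata.

```stata
* "in": the items are typed out; quotes keep a multi-word item together
foreach fruit in apple banana "passion fruit" {
    display "`fruit'"
}

* "of local": the items come from a local macro, named without quotes
local outcomes lexp gnppc
foreach y of local outcomes {
    display "`y'"
}

* "of varlist": the items are existing variables in the loaded data
sysuse lifeexp, clear
foreach v of varlist popgrowth lexp {
    summarize `v'
}
```

The `of local` and `of numlist` forms are the ones used in the lab cells further down.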



ref:

http://data.princeton.edu/stata/programming.html

https://www.youtube.com/watch?v=9S54_YLQ7WI

In [1]:
%matplotlib inline
import seaborn as sns
import pandas as pd
import statsmodels.formula.api as smf
import ipystata

IPyStata is loaded in batch mode.


In [2]:
%%stata -o life_df
sysuse lifeexp.dta
summarize


(Life expectancy, 1998)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      region |         68         1.5    .7431277          1          3
     country |          0
   popgrowth |         68    .9720588    .9311918        -.5          3
        lexp |         68    72.27941    4.715315         54         79
       gnppc |         63    8674.857    10634.68        370      39980
-------------+---------------------------------------------------------
   safewater |         40        76.1    17.89112         28        100


In [17]:
%%stata -o labor_df
clear
use "D:\onedrive\One drive\546-2018\lab\lab03\546lab3.dta"
capture log close
log using "D:\onedrive\One drive\546-2018\lab\lab03\546mylab3.log", replace
set seed 1234567
g age2=age^2 // creating the square of age





In [8]:
%%stata -d labor_df --graph 
graph twoway (scatter logwr educ) (lfit logwr educ)
graph save "D:\onedrive\One drive\546-2018\lab\lab03\graph1.gph", replace


(note: file D:\onedrive\One drive\546-2018\lab\lab03\graph1.gph not found)
(file D:\onedrive\One drive\546-2018\lab\lab03\graph1.gph saved)


In [10]:
%%stata -d labor_df

label variable inlf "Women’s participation =1 yes, =0 no"

label variable nclt18 "Number of children between 4 and 18"

label variable age "Age of woman"

notes logwr: this is the natural logarithm of the wage rate





In [11]:
%%stata -d labor_df
sum inlf if age<=30 | age>=50


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        inlf |        908    .5605727    .4965909          0          1


In [12]:
%%stata -d labor_df
g n3=0
replace n3=1 if nclt3 != 0
sum n3


(420 real changes made)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          n3 |      2,339    .1795639    .3839059          0          1


In [13]:
%%stata -d labor_df
g n4 = nclt3 != 0
sum n4


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          n4 |      2,339    .1795639    .3839059          0          1


In [15]:
%%stata -d labor_df
g x1=rnormal()
g x2=rnormal(50,5)
sum x1 x2


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          x1 |      2,339   -.0116636    .9845323  -3.134875   2.885228
          x2 |      2,339    49.94444    4.947548   31.70053   71.63751


In [18]:
%%stata -d labor_df
local variables nclt18 age age2 educ logwr
regress inlf nclt3 `variables'


      Source |       SS           df       MS      Number of obs   =     2,339
-------------+----------------------------------   F(6, 2332)      =  15633.50
       Model |  548.401176         6  91.4001959   Prob > F        =    0.0000
    Residual |  13.6338821     2,332  .005846433   R-squared       =    0.9757
-------------+----------------------------------   Adj R-squared   =    0.9757
       Total |  562.035058     2,338  .240391385   Root MSE        =    .07646

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       nclt3 |  -.0120077   .0041652    -2.88   0.004    -.0201756   -.0038398
      nclt18 |  -.0002429   .0016208    -0.15   0.881    -.0034213    .0029355
         age |  -.0050703   .0014611    -3.47   0.001    -.0079354   -.0022052
        age2 |    .000045   .0000182     2.47   0.

In [20]:
model = smf.ols(formula='inlf ~ nclt3 + nclt18 + age + age2 + educ + logwr',
                data=labor_df)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                   inlf   R-squared:                       0.976
Model:                            OLS   Adj. R-squared:                  0.976
Method:                 Least Squares   F-statistic:                 1.563e+04
Date:                Thu, 18 Jan 2018   Prob (F-statistic):               0.00
Time:                        20:45:58   Log-Likelihood:                 2698.1
No. Observations:                2339   AIC:                            -5382.
Df Residuals:                    2332   BIC:                            -5342.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2064      0.027      7.592      0.0

In [22]:
%%stata -d labor_df

local variables nclt18 age age2 educ logwr

// illustrating a forvalues loop

forvalues loop=0/2 {
    regress inlf `variables' if nclt3==`loop'
}




. forvalues loop=0/2 {
      Source |       SS           df       MS      Number of obs   =     1,919
-------------+----------------------------------   F(5, 1913)      =  15207.28
       Model |  453.372139         5  90.6744278   Prob > F        =    0.0000
    Residual |  11.4063914     1,913  .005962567   R-squared       =    0.9755
-------------+----------------------------------   Adj R-squared   =    0.9754
       Total |   464.77853     1,918  .242324573   Root MSE        =    .07722

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nclt18 |   .0000315   .0017932     0.02   0.986    -.0034854    .0035484
         age |  -.0054028   .0016798    -3.22   0.001    -.0086971   -.0021084
        age2 |   .0000492   .0000207     2.38   0.018     8.60e-06    .0000898
        educ |   -.010529  

### Creating Dummy Variables
This example creates dummy variables to represent age groups. Stata 11 introduced factor variables and Stata 13 improved the labeling of tables of estimates, drastically reducing the need to "roll your own" dummies, but the code remains instructive.

The loop below creates the dummy variables age20to24 through age45to49. The local macro bot takes the values from 20 to 45 in steps of 5 (that is, 20, 25, 30, 35, 40, and 45), which are the lower bounds of the age groups.

Inside the loop we create a local macro top to represent the upper bound of each age group, which equals the lower bound plus 4. The first time through the loop bot is 20, so top is 24. The equal sign tells Stata to evaluate the expression and store the result; without it, top would hold the unevaluated text 20 + 4.
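To see why the equal sign matters, this short sketch stores the same expression with and without it. The macro name text is hypothetical and used only for the comparison.

```stata
local bot 20

local top = `bot' + 4     // evaluated: top holds 24
local text `bot' + 4      // copied as text: text holds 20 + 4

display "top  = `top'"
display "text = `text'"
```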

In [27]:
%%stata -d labor_df
forvalues bot = 20(5)45 {
    local top = `bot' + 4
    gen age`bot'to`top' = age >= `bot' & age <= `top'
}

sum age20to24 age25to29 age30to34 age35to39 age40to44 age45to49



    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   age20to24 |      2,339    .0470286    .2117454          0          1
   age25to29 |      2,339     .148354    .3555266          0          1
   age30to34 |      2,339    .1748611    .3799294          0          1
   age35to39 |      2,339    .1911073    .3932575          0          1
   age40to44 |      2,339    .1564771    .3633848          0          1
-------------+---------------------------------------------------------
   age45to49 |      2,339    .1239846    .3296345          0          1


In [26]:
%%stata -d labor_df
local controls nclt18 age age2 educ logwr

// illustrating a foreach loop

foreach control of local controls {
    display "`control'"
}


. foreach control of local controls {
nclt18
age
age2
educ
logwr


In [28]:
%%stata 

// illustrating a foreach loop

foreach year of numlist 1980 1985 1995 {
    display "`year'"
}


. foreach year of numlist 1980 1985 1995 {
1980
1985
1995


In [23]:
%%stata -d labor_df
local variables nclt18 age age2 educ logwr

// illustrating a while loop

local k=1
while `k'<=2 {
    regress inlf `variables' if nclt3==`k'
    local k=`k'+1
}


. local k=1
      Source |       SS           df       MS      Number of obs   =       380
-------------+----------------------------------   F(5, 374)       =   3008.33
       Model |  82.1777607         5  16.4355521   Prob > F        =    0.0000
    Residual |  2.04329198       374  .005463348   R-squared       =    0.9757
-------------+----------------------------------   Adj R-squared   =    0.9754
       Total |  84.2210526       379  .222219136   Root MSE        =    .07391

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nclt18 |  -.0042745   .0045769    -0.93   0.351    -.0132742    .0047253
         age |  -.0031617   .0062261    -0.51   0.612    -.0154043    .0090808
        age2 |   .0000231   .0001015     0.23   0.820    -.0001765    .0002227
        educ |  -.0143779   .0011104  