Install randomize package 
ssc install randomize

# Miguel & Kremer

First example: Miguel & Kremer (ECTA, 2004)

Note: You can obtain the dataset and replication code from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28038. The file namelist.dta is needed for this exercise.

Remember to set your working directory with the "cd" command before running the cells below.

In [1]:
cd "~/Documents/Pitt/Year_2/TA - Econ 3080/Recitations/Recitation 3"

/Users/brunokomel/Documents/Pitt/Year_2/TA - Econ 3080/Recitations/Recitation 3


In [2]:
* Start with Namelist data
use namelist.dta, clear 

* Each school is a distinct data point, weighted by number of pupils
    keep if visit==981 
    collapse sex elg98 stdgap yrbirth wgrp* (count) np=pupid, by (sch98v1) 



(521,880 observations deleted)
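
The collapse step above turns pupil-level records into one row per school: collapse takes means by default, and (count) records how many nonmissing values feed each row, which can then serve as an analytic weight. A minimal sketch of the same pattern on Stata's built-in auto data (the dataset, the grouping by foreign, and the name n are assumptions for illustration, not part of the Miguel & Kremer files):

```stata
* Sketch on the built-in auto data, not the namelist data
sysuse auto, clear

* collapse defaults to (mean); (count) tallies nonmissing values per group
collapse price mpg (count) n=price, by(foreign)
list

* Weight each collapsed row by the number of underlying observations
summarize price mpg [aw=n]
```

With the weights applied, the summary means match the means of the original observation-level data, which is the reason for carrying np along in the school-level file.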



In [3]:
**** TABLE 1: PANEL A
bys wgrp: summ sex elg98 stdgap yrbirth [aw=np] //bysort treatment group, summarise these variables 

foreach var in sex elg98 stdgap yrbirth { 
    regress `var' wgrp1 wgrp2 [aw=np] 
} 





--------------------------------------------------------------------------------
-> wgrp = 1

    Variable |     Obs      Weight        Mean   Std. dev.       Min        Max
-------------+-----------------------------------------------------------------
         sex |      25       11639    .5330215   .0274975   .4649681        .58
       elg98 |      25       11639     .885924   .0247143   .8320313   .9418604
      stdgap |      25       11639   -1.972652   .2533067  -2.605882  -1.580786
     yrbirth |      25       11639    1986.192   .5346503   1985.256   1987.464

--------------------------------------------------------------------------------
-> wgrp = 2

    Variable |     Obs      Weight        Mean   Std. dev.       Min        Max
-------------+-----------------------------------------------------------------
         sex |      25       11995    .5095843   .1047281   .0200893   .5714286
       elg98 |      25       11995    .8919841    .024791   .8363096          1
      std

In [4]:
randomize, groups(3) generate(grp)

*Note: We can check the balance of this grp variable as follows: 
bys grp: summ sex elg98 stdgap yrbirth [aw=np] //bysort treatment group, summarise these variables 

gen grp1 = (grp == 1) //creating dummies for each group category
gen grp2 = (grp == 2)

foreach var in sex elg98 stdgap yrbirth { 
	regress `var' grp1 grp2 [aw=np] 
} 


Randomizing 75 records.
Error in the manova test.

Assignment results:

        grp |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       33.33       33.33
          2 |         25       33.33       66.67
          3 |         25       33.33      100.00
------------+-----------------------------------
      Total |         75      100.00

Review balance:

Multinomial logistic regression                         Number of obs =     75
                                                        LR chi2(0)    =   0.00
                                                        Prob > chi2   =      .
Log likelihood = -82.395922                             Pseudo R2     = 0.0000

------------------------------------------------------------------------------
         grp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
1            |  (base ou
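
The assignment above depends on the state of Stata's random-number generator, so it can change whenever other random-number commands run earlier in the session. A minimal sketch of a reproducible three-group assignment using only built-in commands follows. It is independent of the randomize package and is not a description of that package's internals; the seed value, the auto dataset, and the names u and grp_manual are assumptions for illustration.

```stata
* Sketch: reproducible three-group assignment with built-in commands only
sysuse auto, clear

set seed 12345                 // hypothetical seed; any fixed integer works
gen double u = runiform()      // one random draw per observation
sort u                         // put observations in random order

* Deal observations into groups 1, 2, 3 in turn
gen byte grp_manual = mod(_n - 1, 3) + 1
tabulate grp_manual
```

Rerunning the block from the set seed line reproduces the same assignment, which makes the balance checks that follow repeatable.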

In [5]:
// Another example: sysuse nlsw88 // 
clear
sysuse nlsw88.dta // another preloaded dataset (like auto.dta): the National Longitudinal Survey of Women, 1988 extract

gen black = (race == 2)

randomize, groups(2) generate(grp)
bysort grp: sum age black married collgrad 

randomize, groups(2) block(black) generate(grp_alt)
bysort grp_alt: sum age black married collgrad 



(NLSW, 1988 extract)


Randomizing 2246 records.
Error in the manova test.

Assignment results:

        grp |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      1,123       50.00       50.00
          2 |      1,123       50.00      100.00
------------+-----------------------------------
      Total |      2,246      100.00

Review balance:

Multinomial logistic regression                        Number of obs =   2,246
                                                       LR chi2(0)    =   -0.00
                                                       Prob > chi2   =       .
Log likelihood = -1556.8086                            Pseudo R2     = -0.0000

------------------------------------------------------------------------------
         grp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
1            |  (base outcome)
-------------+--
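
Blocking on black means that the assignment is carried out separately inside each value of black, so both groups end up with nearly the same share of Black respondents by construction. A minimal sketch of that idea with built-in commands only (this is not the randomize package's own algorithm; the seed and the names u and grp_block are assumptions for illustration):

```stata
* Sketch: two-group assignment blocked on a binary covariate
sysuse nlsw88, clear
gen byte black = (race == 2)

set seed 2024                  // hypothetical seed
gen double u = runiform()

* Within each block, order by the random draw and alternate groups
bysort black (u): gen byte grp_block = mod(_n - 1, 2) + 1

* Group sizes within each block differ by at most one
tabulate black grp_block
```

Comparing this table with a tabulation of black against an unblocked assignment shows what blocking buys: the unblocked split is balanced only on average, while the blocked split is balanced in every draw.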

# RAND Experiment

In [6]:
use rand_initial_sample_2.dta, clear


* Plan types:
/* 
	Plan type 1 = "Free plan"
	Plan type 2 = "Deductible plan"
	Plan type 3 = "Coinsurance plan"
	Plan type 4 = "Catastrophic plan" or "No Insurance"
*/


In [7]:
* Create means for catastrophic plan
matrix means_sd = J(11, 2, .)
local row = 1

foreach var of varlist female blackhisp age educper income1cpi hosp ghindx cholest systol mhi {
	summarize `var' if plantype == 4
	matrix means_sd[`row', 1] = r(mean)
	matrix means_sd[`row', 2] = r(sd)
	local row = `row'+1
}

count if plantype_4 == 1
matrix means_sd[11, 1] = r(N)

matrix rownames means_sd = female blackhisp age educper income1cpi hosp ghindx cholest systol mhi plantype
matrix list means_sd

#d ;
frmttable, statmat(means_sd) substat(1) varlabels sdec(4)
		   ctitle("", "Cata. mean") replace;
#d cr





    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      female |        759    .5599473    .4967206          0          1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   blackhisp |        600    .1716667    .3774051          0          1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         age |        759      32.361    12.92331         14         62

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     educper |        663    12.10483    2.881461          1         22

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+------------------------------------------

In [8]:
* Create regression output
* Column 2: Deductible plan compared to catastrophic plan
matrix deduct_diff = J(11, 2, .)
local row = 1

foreach var of varlist female blackhisp age educper income1cpi hosp ghindx cholest systol mhi {
	reg `var' plantype_1 plantype_2 plantype_3, cl(famid)
	matrix deduct_diff[`row', 1] = _b[plantype_2]
	matrix deduct_diff[`row', 2] = _se[plantype_2]
	local row = `row'+1
}
count if plantype_2 == 1
matrix deduct_diff[11, 1] = r(N)

#d ;
frmttable, statmat(deduct_diff) varlabels sdec(4)
		   ctitle("Deduct - cata.") substat(1) merge;
#d cr





Linear regression                               Number of obs     =      3,957
                                                F(3, 1982)        =       2.14
                                                Prob > F          =     0.0935
                                                R-squared         =     0.0007
                                                Root MSE          =     .49878

                              (Std. err. adjusted for 1,983 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
      female | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  plantype_1 |  -.0379396   .0149991    -2.53   0.012    -.0673552   -.0085239
  plantype_2 |  -.0230574   .0160171    -1.44   0.150    -.0544694    .0083546
  plantype_3 |  -.0247223    .015326    -1.61   0.107     -.054779    .0053345
       _con

                                                R-squared         =     0.0022
                                                Root MSE          =     42.872

                              (Std. err. adjusted for 1,176 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
     cholest | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  plantype_1 |  -5.246336   2.701373    -1.94   0.052    -10.54639    .0537178
  plantype_2 |  -1.420108    2.98409    -0.48   0.634    -7.274848    4.434632
  plantype_3 |  -1.931604   2.758279    -0.70   0.484    -7.343305    3.480097
       _cons |   207.3021   1.990889   104.13   0.000      203.396    211.2082
------------------------------------------------------------------------------

Linear regression                               Number of obs     =      2,292
              

In [9]:
* Column 3: Coinsurance plan compared to catastrophic plan
matrix coins_diff = J(11, 2, .)
local row = 1

foreach var of varlist female blackhisp age educper income1cpi hosp ghindx cholest systol mhi {
	reg `var' plantype_1 plantype_2 plantype_3, cl(famid)
	matrix coins_diff[`row', 1] = _b[plantype_3]
	matrix coins_diff[`row', 2] = _se[plantype_3]
	local row = `row'+1
}

count if plantype_3 == 1
matrix coins_diff[11, 1] = r(N)

#d ;
frmttable, statmat(coins_diff) varlabels sdec(4)
		   ctitle("Coins - cata") substat(1) merge;
#d cr





Linear regression                               Number of obs     =      3,957
                                                F(3, 1982)        =       2.14
                                                Prob > F          =     0.0935
                                                R-squared         =     0.0007
                                                Root MSE          =     .49878

                              (Std. err. adjusted for 1,983 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
      female | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  plantype_1 |  -.0379396   .0149991    -2.53   0.012    -.0673552   -.0085239
  plantype_2 |  -.0230574   .0160171    -1.44   0.150    -.0544694    .0083546
  plantype_3 |  -.0247223    .015326    -1.61   0.107     -.054779    .0053345
       _con

                                                R-squared         =     0.0022
                                                Root MSE          =     42.872

                              (Std. err. adjusted for 1,176 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
     cholest | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  plantype_1 |  -5.246336   2.701373    -1.94   0.052    -10.54639    .0537178
  plantype_2 |  -1.420108    2.98409    -0.48   0.634    -7.274848    4.434632
  plantype_3 |  -1.931604   2.758279    -0.70   0.484    -7.343305    3.480097
       _cons |   207.3021   1.990889   104.13   0.000      203.396    211.2082
------------------------------------------------------------------------------

Linear regression                               Number of obs     =      2,292
              

In [10]:
* Column 4: Free plan compared to catastrophic plan
matrix free_diff = J(11, 2, .)
local row = 1

foreach var of varlist female blackhisp age educper income1cpi hosp ghindx cholest systol mhi {
	reg `var' plantype_1 plantype_2 plantype_3, cl(famid)
	matrix free_diff[`row', 1] = _b[plantype_1]
	matrix free_diff[`row', 2] = _se[plantype_1]
	local row = `row'+1
}

count if plantype_1 == 1
matrix free_diff[11, 1] = r(N)

#d ;
frmttable, statmat(free_diff) varlabels sdec(4)
		   ctitle("Free - cata.") substat(1) merge;
#d cr






Linear regression                               Number of obs     =      3,957
                                                F(3, 1982)        =       2.14
                                                Prob > F          =     0.0935
                                                R-squared         =     0.0007
                                                Root MSE          =     .49878

                              (Std. err. adjusted for 1,983 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
      female | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  plantype_1 |  -.0379396   .0149991    -2.53   0.012    -.0673552   -.0085239
  plantype_2 |  -.0230574   .0160171    -1.44   0.150    -.0544694    .0083546
  plantype_3 |  -.0247223    .015326    -1.61   0.107     -.054779    .0053345
       _con

                                                R-squared         =     0.0022
                                                Root MSE          =     42.872

                              (Std. err. adjusted for 1,176 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
     cholest | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  plantype_1 |  -5.246336   2.701373    -1.94   0.052    -10.54639    .0537178
  plantype_2 |  -1.420108    2.98409    -0.48   0.634    -7.274848    4.434632
  plantype_3 |  -1.931604   2.758279    -0.70   0.484    -7.343305    3.480097
       _cons |   207.3021   1.990889   104.13   0.000      203.396    211.2082
------------------------------------------------------------------------------

Linear regression                               Number of obs     =      2,292
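
Cells In [8] to In [10] run the same set of regressions three times and keep a different coefficient each time, which is why their outputs are identical. As a possible alternative, a single pass can fill every comparison matrix at once. The sketch below shows the pattern on the built-in nlsw88 data; the dummies d_black and d_other are hypothetical stand-ins for the plan dummies, the outcome list is arbitrary, and clustering is omitted because this dataset has no family identifier.

```stata
* Sketch: one regression per outcome, several coefficients stored
sysuse nlsw88, clear
gen byte d_black = (race == 2)   // hypothetical stand-ins for plan dummies
gen byte d_other = (race == 3)   // omitted category: race == 1

matrix black_diff = J(3, 2, .)
matrix other_diff = J(3, 2, .)
local row = 1

foreach var of varlist age grade tenure {
    quietly regress `var' d_black d_other
    matrix black_diff[`row', 1] = _b[d_black]
    matrix black_diff[`row', 2] = _se[d_black]
    matrix other_diff[`row', 1] = _b[d_other]
    matrix other_diff[`row', 2] = _se[d_other]
    local row = `row' + 1
}

matrix rownames black_diff = age grade tenure
matrix rownames other_diff = age grade tenure
matrix list black_diff
matrix list other_diff
```

Each matrix can then be passed to the table command in turn, exactly as the separate cells do.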
              

In [11]:
* Column 5: Any insurance plan compared to catastrophic plan
matrix any_diff = J(11, 2, .)
local row = 1

foreach var of varlist female blackhisp age educper income1cpi hosp ghindx cholest systol mhi {
	reg `var' any_ins, cl(famid)
	matrix any_diff[`row', 1] = _b[any_ins]
	matrix any_diff[`row', 2] = _se[any_ins]
	local row = `row'+1
}

count if any_ins == 1
matrix any_diff[11, 1] = r(N)

#d ;
frmttable, statmat(any_diff) varlabels sdec(4)
		   ctitle("Any - cata.") substat(1) merge;
#d cr





Linear regression                               Number of obs     =      3,957
                                                F(1, 1982)        =       5.11
                                                Prob > F          =     0.0240
                                                R-squared         =     0.0005
                                                Root MSE          =     .49869

                              (Std. err. adjusted for 1,983 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
      female | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     any_ins |  -.0296158   .0131065    -2.26   0.024    -.0553198   -.0039118
       _cons |   .5599473   .0117898    47.49   0.000     .5368256     .583069
------------------------------------------------------------------------------

Linear reg

                                                Root MSE          =     16.578

                              (Std. err. adjusted for 1,194 clusters in famid)
------------------------------------------------------------------------------
             |               Robust
      systol | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     any_ins |   1.387623   .9006257     1.54   0.124    -.3793636     3.15461
       _cons |   122.3418   .8033961   152.28   0.000     120.7656     123.918
------------------------------------------------------------------------------

Linear regression                               Number of obs     =      3,817
                                                F(1, 1940)        =       1.07
                                                Prob > F          =     0.3004
                                                R-squared         =     0.0004
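
In a regression on a single dummy, the constant is the mean of the omitted group and the coefficient is the difference in means between the two groups. The output above is consistent with this: the constant in the female regression (.5599473) equals the catastrophic-plan mean of female reported in cell In [7]. A minimal check of the same identity on the built-in auto data (the dataset and the scalar names m0 and m1 are assumptions for illustration):

```stata
* Sketch: regression on a dummy reproduces the difference in means
sysuse auto, clear

quietly summarize price if foreign == 0
scalar m0 = r(mean)
quietly summarize price if foreign == 1
scalar m1 = r(mean)

quietly regress price foreign
display "Constant:    " _b[_cons]   "   Omitted-group mean:  " m0
display "Coefficient: " _b[foreign] "   Difference in means: " m1 - m0
```

The regression form is still worth using in the balance table because it delivers a standard error for the difference, and with the cl() option that standard error accounts for clustering.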
              

# Exercise: Recreate Panel A in Table I in Miguel & Kremer

In [12]:
use namelist.dta, clear 

keep if visit==981 
	collapse sex elg98 stdgap yrbirth wgrp* (count) np=pupid, by (sch98v1) 

label var sex "Male"
label var elg98 "Proportion girls <13 years, and all boys"
label var stdgap "Grade progression (= Grade - (Age - 6))"
label var yrbirth "Year of birth"



(521,880 observations deleted)







In [13]:
matrix drop _all
mata: mata clear

* Columns 1-3: weighted group means
forvalues g = 1/3 {
	matrix mean_dep_`g' = J(4,2,.)
	local i = 1

	foreach var of varlist sex elg98 stdgap yrbirth {
		summ `var' [aw=np] if wgrp == `g'
		matrix mean_dep_`g'[`i',1] = r(mean)
		local i = `i' + 1
	}

	matrix rownames mean_dep_`g' = sex elg98 stdgap yrbirth
	frmttable, statmat(mean_dep_`g') substat(1) ctitle("","Group `g'") varlabels merge
}





    Variable |     Obs      Weight        Mean   Std. dev.       Min        Max
-------------+-----------------------------------------------------------------
         sex |      25       11639    .5330215   .0274975   .4649681        .58

    Variable |     Obs      Weight        Mean   Std. dev.       Min        Max
-------------+-----------------------------------------------------------------
       elg98 |      25       11639     .885924   .0247143   .8320313   .9418604

    Variable |     Obs      Weight        Mean   Std. dev.       Min        Max
-------------+-----------------------------------------------------------------
      stdgap |      25       11639   -1.972652   .2533067  -2.605882  -1.580786

    Variable |     Obs      Weight        Mean   Std. dev.       Min        Max
-------------+-----------------------------------------------------------------
     yrbirth |      25       11639    1986.192   .5346503   1985.256   1987.464

                         -------

In [14]:
* Column 4

matrix control_diff_1 = J(4,2,.)
local row = 1

foreach var in sex elg98 stdgap yrbirth { 
	regress `var' wgrp1 wgrp2 [aw=np] 
	matrix control_diff_1[`row',1] = _b[wgrp1]
	matrix control_diff_1[`row',2] = _se[wgrp1]
	local row = `row' + 1
} 

matrix list control_diff_1

frmttable, statmat(control_diff_1) substat(1) ctitle("Group 1 - Group 3") merge




(sum of wgt is 34,792)

      Source |       SS           df       MS      Number of obs   =        75
-------------+----------------------------------   F(2, 72)        =      0.84
       Model |  .007012123         2  .003506062   Prob > F        =    0.4365
    Residual |  .300999144        72  .004180544   R-squared       =    0.0228
-------------+----------------------------------   Adj R-squared   =   -0.0044
       Total |  .308011268        74  .004162314   Root MSE        =    .06466

------------------------------------------------------------------------------
         sex | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       wgrp1 |   .0108639   .0184507     0.59   0.558    -.0259169    .0476447
       wgrp2 |  -.0125733   .0183162    -0.69   0.495     -.049086    .0239394
       _cons |   .5221577   .0131835    39.61   0.000     .4958767    .5484386
------------------------

In [15]:
* Column 5

matrix control_diff_2 = J(4,2,.)
local row = 1

foreach var in sex elg98 stdgap yrbirth { 
	regress `var' wgrp1 wgrp2 [aw=np] 
	matrix control_diff_2[`row',1] = _b[wgrp2]
	matrix control_diff_2[`row',2] = _se[wgrp2]
	local row = `row' + 1
} 

matrix list control_diff_2

frmttable, statmat(control_diff_2) substat(1) ctitle("Group 2 - Group 3")  merge




(sum of wgt is 34,792)

      Source |       SS           df       MS      Number of obs   =        75
-------------+----------------------------------   F(2, 72)        =      0.84
       Model |  .007012123         2  .003506062   Prob > F        =    0.4365
    Residual |  .300999144        72  .004180544   R-squared       =    0.0228
-------------+----------------------------------   Adj R-squared   =   -0.0044
       Total |  .308011268        74  .004162314   Root MSE        =    .06466

------------------------------------------------------------------------------
         sex | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       wgrp1 |   .0108639   .0184507     0.59   0.558    -.0259169    .0476447
       wgrp2 |  -.0125733   .0183162    -0.69   0.495     -.049086    .0239394
       _cons |   .5221577   .0131835    39.61   0.000     .4958767    .5484386
------------------------