<span style="font-size:0.75em;"> *This notebook is based on a project completed with Stephen Ayeni under the supervision of Professor Eduardo Souza-Rodrigues for a fourth-year undergraduate class in applied econometrics. We were tasked with choosing an economic question we believed to be important and using empirical techniques on publicly available data to attempt to answer it. For my part of the project I gathered, cleaned and analyzed the data, while Stephen Ayeni helped ensure the quality of the writing and prepared the project for submission. Professor Souza-Rodrigues outlined for us the standard of statistical analysis and reporting needed to bring the project close to the quality of an academic publication. Many thanks to Amanda Mckinley, who inspired the project and gave it its economic basis, and to Joseph Gorga, who provided continuous support, advice and guidance throughout.* </span>


<table>
    <tr>
        <td>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ee/UN_emblem_blue.svg/512px-UN_emblem_blue.svg.png" title="United Nations Logo" width="150" height="150"/>
    </td><td>
<img src="http://www.euro.who.int/__data/assets/image/0005/243581/who-europe-250.png" title="World Health Organization" width="150" height="150"/>
        </td><td>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/The_World_Bank_logo.svg/800px-The_World_Bank_logo.svg.png" title="World Bank" width="200" height="42"/>
        </td><td>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5c/Stata_Logo.svg/800px-Stata_Logo.svg.png" title="STATA" width="200" height="65"/>
        </td></tr>
        </table>

 <span style="color:Red"> Problem: </span> Unobserved Heterogeneity Creating Biased Estimates
===
 
 <span style="color:Green"> Solution: </span> Fixed Effects and First Differences Panel Methods
===

---

Christopher Holiday:
April 10, 2019

---

<par> Several problems can undermine causal inference from observational data. When standard statistical analysis rests on naive assumptions, the problem often takes the form of *omitted variable bias*. The standard linear model typically takes the form: </par>
$$Y =  \beta _{0} + \beta_{1}x_{1}+...+\beta _{k}x_{k} + \epsilon$$
<par> The parameter of interest is $\beta_{1}$, along with any other coefficients in a multivariate model. For the model to support a causal interpretation it must satisfy several conditions. For the estimates to be unbiased we require in particular that </par>
$$ COV(X,\epsilon) = 0 $$
<par> This means that the explanatory variables are uncorrelated with the unobserved factors collected in the error term. The easiest way to see what this means in practice is to apply the model to observational data and try to infer causality. </par>
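
<par> To make the condition concrete, consider a stylized case with one regressor $x$ and one omitted variable $z$ (both symbols, and the coefficient $\gamma$, are illustrative and not part of the dataset). If the true model includes $z$ but the regression leaves it out, $z$ ends up in the error term, and the estimated slope converges to the true slope plus a bias term: </par>

$$\begin{aligned}
\text{True model:}\quad & Y = \beta_0 + \beta_1 x + \gamma z + e, \qquad COV(x, e) = 0 \\
\text{Estimated model:}\quad & Y = \beta_0 + \beta_1 x + \epsilon, \qquad \epsilon = \gamma z + e \\
\text{Result:}\quad & \hat{\beta}_1 \xrightarrow{p} \beta_1 + \gamma\,\frac{COV(x, z)}{VAR(x)}
\end{aligned}$$

<par> The bias term vanishes only if $\gamma = 0$ or $COV(x, z) = 0$. Otherwise its sign is the sign of $\gamma$ times the sign of $COV(x, z)$. </par>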

<par> For the dataset below I used STATA to merge national-level data from the WHO, the World Bank and the UN. It includes several seemingly related variables of interest. </par>
* **Countryid:** The name of the country corresponding to each observation
* **Year:** The year in which the observation was made
* **MMR:** The maternal mortality ratio, calculated as
$$\text{MMR} = \frac{\text{number of maternal deaths in a year}}{\text{number of live births in a year}} \times 100{,}000$$
* **cpp:** The percentage of people who have access to any form of contraception
* **cppmod:** The percentage of people who have access to any form of *modern* contraception
* **GDPcap:** GDP per capita adjusted for purchasing power parity (PPP)
* **Docs:** The number of trained medical professionals per capita
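
<par> The merge itself is not shown in this notebook. As a rough sketch only, combining three country-year files in Stata could look like the following. The three file names are hypothetical placeholders, and the sketch assumes each file has exactly one row per `Countryid` and `Year`: </par>

```stata
* Hypothetical file names: the original source files are not part of this notebook.
use who_mmr.dta, clear

* Match rows on country and year. By default merge keeps unmatched rows too,
* so a variable missing from one source shows up as "." in the merged data.
merge 1:1 Countryid Year using worldbank_gdp.dta, nogenerate
merge 1:1 Countryid Year using un_contraception.dta, nogenerate
```

<par> Keeping unmatched rows is one plausible reason the merged data contains missing values, which Stata then drops from any regression that uses the affected variable. </par>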


In [6]:
use Maternalmortality19902015.dta

In [12]:
%head

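| | Countryid | Year | MMR | cpp | cppmod | GDPcap | Docs | yr1990 | yr1995 | yr2000 | yr2005 | yr2010 | yr2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | 2000 | 1100 | 5.3 | 3.6 | . | 12.4 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | Afghanistan | 2005 | 821 | 13.6 | 12.5 | 20.177277 | . | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Afghanistan | 2010 | 584 | 21.8 | 19.9 | 36.233788 | 34.299999 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | Albania | 2000 | 43 | 57.5 | 17.9 | 26.813463 | 99.099998 | 0 | 0 | 1 | 0 | 0 | 0 |
| 5 | Albania | 2005 | 30 | 60.1 | 25.0 | 58.729275 | 99.800003 | 0 | 0 | 0 | 1 | 0 | 0 |
| 6 | Algeria | 1995 | 192 | 56.9 | 49.4 | 142.19202 | . | 0 | 1 | 0 | 0 | 0 | 0 |
| 7 | Algeria | 2000 | 170 | 64.0 | 55.0 | 107.54095 | 92.599998 | 0 | 0 | 1 | 0 | 0 | 0 |
| 8 | Argentina | 2005 | 58 | 78.9 | 69.9 | 4714.0127 | 99.099998 | 0 | 0 | 0 | 1 | 0 | 0 |
| 9 | Armenia | 2000 | 40 | 60.5 | 22.3 | 4.2968049 | 96.800003 | 0 | 0 | 1 | 0 | 0 | 0 |
| 10 | Armenia | 2005 | 40 | 53.1 | 19.5 | 10.420856 | 97.800003 | 0 | 0 | 0 | 1 | 0 | 0 |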


In [8]:
summarize MMR cppmod GDPcap Docs


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         MMR |        167    325.6647    413.0887          3       2650
      cppmod |        164    36.65671    22.19743        1.7       82.2
      GDPcap |        161    6502.259    17298.38   .2848354   92364.49
        Docs |        143    72.01608    29.24905        5.6        100


<par> The relationship we may be interested in (and which I examined in my paper) is the one between the prevalence of contraception and the maternal mortality ratio. In theory, access to contraception could reduce the number of maternal deaths by giving a mother the ability to make decisions about her fertility. She can thereby mitigate the risks of giving birth at high parity, or at a time when she has contracted an illness that threatens her survival during childbirth. </par>

<par> To claim that such a causal relationship exists, that is, to reject the null hypothesis that the maternal mortality ratio is not affected by contraceptive prevalence, it is necessary to include other factors that may be correlated with both maternal mortality and the prevalence of modern contraceptive methods in a country. For this reason I have included GDP per capita adjusted for PPP and doctors per capita. We expect contraceptive prevalence to be positively related to GDP per capita and doctors per capita, and MMR to be negatively related to both. Our model then takes the form: </par>

$$MMR_{i}= \beta _{0}+\beta _{1}cppmod_{i}+\beta _{2}GDPcap_{i}+\beta _{3}Docs_{i}+\beta _{4}yr1995_{i}+ \dots +\beta _{8}yr2015_{i}+u_{i}, \quad u_{i}\sim N(0,\sigma ^{2})$$


<par> The model treats each country-year observation as independent of the others. To absorb changes common to all countries over time (which could otherwise produce a spurious relationship), we include an indicator variable for each year in which MMR data is collected (1990, 1995, 2000, 2005, 2010, 2015), leaving 1990 out as the base year. Provided the condition $COV(X,\epsilon) = 0$ above holds, we can treat OLS on this model as the best linear unbiased estimator if </par>

1. The sampled observations are independent.
2. There is no correlation between the error terms of different observations.
3. The observations are identically distributed.
4. The unavailability of data from certain countries in certain time periods is uncorrelated with our variables.
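
<par> The year indicators in this specification do not have to exist as separate columns. As a sketch, assuming `Year` takes only the six values listed above and the notebook's dataset is in memory, Stata's factor-variable notation `i.Year` builds the indicators on the fly and omits the lowest year (1990) as the base category, so it should reproduce a regression that uses `yr1995` through `yr2015`: </par>

```stata
* Assumes the dataset from this notebook is already in memory.
* i.Year expands to one indicator per year, with the lowest year (1990) as the omitted base.
* vce(cluster Countryid) is the current spelling of the older cluster(Countryid) option.
regress MMR cppmod GDPcap Docs i.Year, vce(cluster Countryid)
```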

In [40]:
reg MMR cppmod GDPcap Docs yr1995 yr2000 yr2005 yr2010 yr2015, cluster(Countryid)


Linear regression                               Number of obs     =        135
                                                F(8, 89)          =      13.63
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5720
                                                Root MSE          =     289.08

                             (Std. Err. adjusted for 90 clusters in Countryid)
------------------------------------------------------------------------------
             |               Robust
         MMR |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cppmod |  -6.136833   2.257897    -2.72   0.008    -10.62323   -1.650441
      GDPcap |  -.0005056   .0010317    -0.49   0.625    -.0025556    .0015445
      

<par> The results show that the relationship between modern contraceptive prevalence and the maternal mortality ratio is indeed negative and highly significant (p-value of 0.008). The model appears to let us reject, with a high level of confidence, the hypothesis that modern contraceptive prevalence has no effect on the maternal mortality ratio. In the presence of this evidence, why not develop large-scale programs that improve access to contraceptive methods? </par>

<par> There are certainly many programs in the world that do exactly this. However, these programs often focus on preventing the spread of HIV and other sexually transmitted infections, with less emphasis on maternal health outcomes, because many other factors are associated with fertility decisions. This raises the question of whether there is unobserved heterogeneity in our model or, in simple terms, whether aspects of a country that we *did not account for in our model* are related to both the maternal mortality ratio and the availability of modern methods of contraception. These aspects differ from country to country, and may include the level and types of healthcare available to women and the cultural barriers that limit access to and use of contraception in the first place. If so, our model suffers from omitted variable bias and its estimates are no longer reliable, because </par>

$$ COV(X,\epsilon) \neq 0 $$

<par> In this specific context, the **problem** is what to do with this seemingly vast amount of information that is related to our model but is unobservable. If we do not account for it, we may wrongly conclude that contraceptive prevalence has a negative effect on the maternal mortality ratio. </par>

<par> It turns out that we can represent this phenomenon mathematically in our model as follows: </par>

$$MMR_{i,t}= \beta _{0}+\beta _{1}cppmod_{i,t}+\beta _{2}GDPcap_{i,t}+\beta _{3}Docs_{i,t}+a_{i}+u_{i,t}, \quad u_{i,t}\sim N(0,\sigma ^{2})$$

<par> Here $a_{i}$ represents the unobservable, time-invariant characteristics of a country, which may be correlated with our variable of interest and our controls. Since $a_{i}$ is unobservable, our new approach requires that we both eliminate the term *and* do so verifiably. </par>

<par> Econometricians working with panel data structured like ours have come up with a very creative way to do exactly this without inflating the variance of the estimates unnecessarily. The method is known as fixed effects: for each member of the panel (in this case each country), the average of every variable over time is subtracted from each of its observations. </par>

$$(MMR_{i,t} - \overline{MMR}_{i}) = \beta_1(cppmod_{i,t} - \overline{cppmod}_{i}) + \beta_2(GDPcap_{i,t} - \overline{GDPcap}_{i}) + \beta_3(Docs_{i,t} - \overline{Docs}_{i}) + (u_{i,t} - \overline{u}_{i})$$
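
<par> To see why the subtraction removes $a_{i}$, average the model over the periods observed for country $i$. Because $a_{i}$ does not vary over time, its average is $a_{i}$ itself, so it cancels, together with the constant $\beta_0$, when the averaged equation is subtracted from the original. A sketch of the steps, writing $x_{i,t}$ as shorthand for any one regressor and $T_{i}$ for the number of periods observed for country $i$ (both are notation introduced here): </par>

$$\begin{aligned}
MMR_{i,t} &= \beta_0 + \beta_1 x_{i,t} + a_{i} + u_{i,t} \\
\overline{MMR}_{i} &= \beta_0 + \beta_1 \overline{x}_{i} + a_{i} + \overline{u}_{i}, \qquad \overline{x}_{i} = \frac{1}{T_{i}}\sum_{t} x_{i,t} \\
MMR_{i,t} - \overline{MMR}_{i} &= \beta_1 (x_{i,t} - \overline{x}_{i}) + (u_{i,t} - \overline{u}_{i})
\end{aligned}$$

<par> The first-differences approach named in the title removes $a_{i}$ in the same spirit, by subtracting the previous period rather than the country mean: </par>

$$MMR_{i,t} - MMR_{i,t-1} = \beta_1 (x_{i,t} - x_{i,t-1}) + (u_{i,t} - u_{i,t-1})$$

<par> With exactly two periods per country the two approaches give the same estimate of $\beta_1$. In either case a country observed in only one period contributes nothing to that estimate, since its demeaned regressor is zero and it has no difference to take. </par>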

<par> STATA, being a program first developed with econometricians as a target market, can do this with a single line of code (or two if you have not yet defined the data as a panel type). </par>


In [10]:
xtset Countryid Year
xtreg MMR cppmod GDPcap Docs yr1995 yr2000 yr2005 yr2010 yr2015, fe cluster(Countryid)


       panel variable:  Countryid (unbalanced)
        time variable:  Year, 1990 to 2015, but with gaps
                delta:  1 unit


Fixed-effects (within) regression               Number of obs     =        135
Group variable: Countryid                       Number of groups  =         90

R-sq:                                           Obs per group:
     within  = 0.4098                                         min =          1
     between = 0.5441                                         avg =        1.5
     overall = 0.5071                                         max =          5

                                                F(8,89)           =       5.37
corr(u_i, Xb)  = 0.4380                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 90 clusters in Countryid)
------------------------------------------------------------------------------
             |               Robust
         MMR |      Coef.   Std. Err.      t    

<par> In the updated model we still see a negative relationship between the prevalence of contraception in a country and its maternal mortality ratio, but the relationship is no longer statistically significant (p-value of 0.259). We cannot reject the null hypothesis that contraceptive prevalence has no effect on the maternal mortality ratio, so this model gives no evidence of such an effect. The fraction of variance due to the fixed effects is also very high (rho of 0.81024977). Had we not controlled for unobserved heterogeneity, we would have asserted a causal relationship that the data do not support. This is very important to take into account when analyzing observational data in a global health context, because developing solutions is often complex and requires understanding many micro-level factors that are region specific. </par>
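
<par> The contrast between pooled OLS and fixed effects can be reproduced on simulated data, where the true effect is known. The sketch below is not the project's data: every variable name and number in it is made up for illustration. It builds a panel in which an unobserved country effect `a` drives both the regressor `x` and the outcome `y`, with a true slope of 2 on `x`. Under these assumptions the omitted variable formula puts the pooled OLS slope at about $2 + 3 \times 0.8 / 1.64 \approx 3.46$ in large samples, while the fixed-effects slope should land close to 2: </par>

```stata
* Simulated panel; every name and number here is hypothetical.
clear
set seed 12345
set obs 200                              // 200 simulated "countries"
generate id = _n
generate a = rnormal()                   // unobserved, time-invariant effect a_i
expand 5                                 // 5 periods per country
bysort id: generate t = _n
generate x = 0.8*a + rnormal()           // regressor correlated with a_i
generate y = 1 + 2*x + 3*a + rnormal()   // true effect of x on y is 2

* Pooled OLS ignores a_i, so the slope on x absorbs part of its effect.
regress y x, vce(cluster id)
scalar b_pooled = _b[x]

* Fixed effects demeans within id, which removes a_i.
xtset id t
xtreg y x, fe vce(cluster id)
scalar b_fe = _b[x]

display "pooled OLS slope: " b_pooled "   fixed-effects slope: " b_fe
```

<par> In the simulation the pooled estimate overstates the effect because `a` moves `x` and `y` in the same direction. In real data the sign and size of the distortion are unknown, which is why the comparison between the two estimators is informative. </par>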

---

<par> This was part of the approach of an organization I worked at called <a href="https://charitysciencehealth.com"> Charity Science Health</a>. They used evidence-based information along with an understanding of local factors to improve health outcomes for newborn children. I assisted them in researching interventions to help prevent maternal death in areas of high mortality. Many of these solutions required a complex approach that took into account the existing health infrastructure, its quality and, importantly, the information that mothers and families had available to them. If you are interested in supporting an organization that directly helps improve maternal health outcomes, a well-studied, low-cost approach is vitamin supplementation. An organization we worked with that has been doing this for many years is <a href="https://vitaminangels.org">Vitamin Angels</a>. I highly recommend looking at their work for examples of effective interventions that improve maternal outcomes. </par>