In [None]:
import stata_setup
stata_setup.config("C:/Program Files/Stata17/", "mp")

## Effect of Traffic-Related Air Pollution on Attention in Primary School Children

This example uses a real-world dataset that includes children’s performance on a test of reaction time, levels of nitrogen dioxide (NO2) pollution, the children’s physical and socioeconomic characteristics, and some other environmental factors. The data were collected and analyzed by

<blockquote>
Sunyer, J., E. Suades-González, R. García-Esteban, I. Rivas, J. Pujol, M. Alvarez-Pedrerol, J. Forns, X. Querol, and X. Basagaña. 2017. Traffic-related air pollution and attention in primary school children: Short-term association. <em>Epidemiology</em> 28: 181–189.
</blockquote>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://doi.org/10.1097/EDE.0000000000000603">https://doi.org/10.1097/EDE.0000000000000603</a>

Our interest is in how levels of nitrogen dioxide in the classroom affect the children’s performance on the test, while adjusting for other factors. We will focus on two _outcomes_ from the Attention Network Test (ANT):

👉🏼 Reaction time (continuous)

👉🏼 Omissions (count)

In [None]:
%%stata
use ../Data/breathe, clear
describe

Our goal is to create two lists of control covariates, that is, independent variables. One list will contain the continuous control covariates and the other will contain the categorical control covariates. Why not just one list? Because we want each categorical variable to enter our model as a set of indicator variables, one for each level (distinct value) of the variable. To expand a categorical variable into indicator variables for its levels, we must prefix it with ```i.```, for example, ```i.grade```.
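To make the expansion concrete, here is a minimal Python sketch of what the ```i.``` prefix does conceptually. The helper ```expand_indicators``` and the sample ```grade``` values are hypothetical, introduced only for illustration. Stata performs this expansion internally and, by default, omits the lowest level as the base category, which the sketch imitates.

```python
def expand_indicators(values, name):
    """Build one 0/1 column per non-base level of a categorical variable.

    Mimics the idea behind Stata's i.varname: the lowest level is treated
    as the base category and gets no indicator column.
    """
    levels = sorted(set(values))
    return {
        f"{level}.{name}": [int(v == level) for v in values]
        for level in levels[1:]  # skip the base (lowest) level
    }


# Hypothetical grade values for five children (not taken from the dataset)
grade = [2, 3, 4, 3, 2]

indicators = expand_indicators(grade, "grade")
for column_name, column in indicators.items():
    print(column_name, column)
```

A continuous covariate, by contrast, enters the model as a single column, which is why the two kinds of covariates are kept in separate lists.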

In [None]:
%%stata
vl set

In [None]:
%%stata
display "$vlcategorical"

## Determinants of Wages

The log of married women’s wages (```lwage```) is modeled as a function of their experience (```exper```), the square of their experience, and their years of education (```educ```). Experience and its square are treated as _exogenous_ covariates; education is handled separately, as explained below.

<blockquote>
Mroz, T. A. 1987. The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. <em>Econometrica</em> 55: 765–799.
</blockquote>    

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://doi.org/10.2307/1911029">https://doi.org/10.2307/1911029</a>
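Written out, the wage model described above takes the following form. The coefficient symbols $\beta_0,\dots,\beta_3$ and the error term $\varepsilon$ are notation introduced here for illustration; they do not appear in the source.

$$
\text{lwage} = \beta_0 + \beta_1\,\text{exper} + \beta_2\,\text{exper}^2 + \beta_3\,\text{educ} + \varepsilon
$$

Unmeasured ability, if it matters for wages, is absorbed into $\varepsilon$.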

As is customary, education is treated as an endogenous variable. The reasoning is that we cannot measure innate ability, and ability is likely to influence both education level and income. Some disciplines refer to this as unobserved confounding rather than endogeneity. Either way, you cannot simply regress wages on education and experience and expect to recover the true effect of education on wages.

You need more information from variables that you presume are not affected by the woman’s unmeasured ability; let’s call them __instruments__. The instruments also must not belong in the model for wages themselves. We will use their mothers’ education (```motheduc```), their fathers’ education (```fatheduc```), and their husbands’ education (```huseduc```) as instruments for the woman’s education. The instruments are also required to be _exogenous_, but we will just call them instruments.
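The argument above can be checked with a small simulation. The sketch below is a minimal illustration in plain Python, not the estimator this notebook relies on: it uses a single simulated instrument and the simple ratio-of-covariances IV estimator rather than a multi-instrument fit. All coefficients, the sample size, and the data themselves are assumptions made up for the example; only the variable names echo the dataset.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(12345)
n = 50_000
true_effect = 0.10  # assumed effect of education on log wages (simulation only)

# Unobserved ability drives both education and wages.
ability = [random.gauss(0, 1) for _ in range(n)]
# Simulated instrument: related to education, independent of ability,
# and absent from the wage equation.
motheduc = [random.gauss(0, 1) for _ in range(n)]
educ = [0.5 * z + 0.8 * a + random.gauss(0, 1) for z, a in zip(motheduc, ability)]
lwage = [true_effect * e + 0.5 * a + random.gauss(0, 1) for e, a in zip(educ, ability)]

# Naive regression slope of lwage on educ: contaminated by ability.
ols_slope = cov(educ, lwage) / cov(educ, educ)
# Single-instrument IV estimate: uses only the variation in educ induced by motheduc.
iv_slope = cov(motheduc, lwage) / cov(motheduc, educ)

print(f"true effect: {true_effect:.3f}")
print(f"OLS slope:   {ols_slope:.3f}")
print(f"IV slope:    {iv_slope:.3f}")
```

In this setup the naive slope overstates the effect because education proxies for ability, while the instrument-based estimate lands close to the assumed value. That only holds because the simulated instrument satisfies both requirements by construction; with real data those requirements are assumptions.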

In [None]:
%%stata
use ../Data/mroz, clear
* Exogenous covariates
vl create exogbase = (exper age husage kidslt6 kidsge6 city)
* Instruments for the woman's education
vl create instbase = (motheduc fatheduc huseduc)

In [None]:
from pystata import stata
stata.run('describe')