# <div align="center">Dynamic Documents - Jupyter</div>
## <div align="center">Part 2 - Stata</div>
## <div align="center">Data Science Tools Workshop</div>

### <div align="center">Fabien Forge</div>

#### <div align="center">15/10/2021</div>

Install `stata-setup`

* You can either create a dedicated **[Kernel](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)** (see [stata_kernel](https://kylebarron.dev/stata_kernel/getting_started/), which works with any Stata version)

* Or call your favorite language from a Python session (see [IPyStata](https://github.com/TiesdeKok/ipystata) for Stata < 17 or [Stata-Setup](https://www.stata.com/new-in-stata/jupyter-notebooks/) for Stata 17 and later)

Import `stata_setup`

Set the path to your Stata installation
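A minimal sketch of this step, assuming a Stata 17 SE installation. `STATA_DIR` is a hypothetical path that you would replace with your own, and `configure_stata` is a helper introduced here only for illustration. It wraps `stata_setup.config(path, edition)` so that the cell reports a failure instead of raising when the package or the Stata directory is missing.

```python
import importlib.util
import os


def configure_stata(stata_dir, edition="se"):
    """Initialise Stata for this Python session; return True on success."""
    # stata_setup must be installed and the installation directory must exist.
    if importlib.util.find_spec("stata_setup") is None:
        return False
    if not os.path.isdir(stata_dir):
        return False
    import stata_setup
    try:
        # edition is one of "be", "se" or "mp"
        stata_setup.config(stata_dir, edition)
    except Exception:
        return False
    return True


# Hypothetical location; replace it with your own Stata directory.
STATA_DIR = "/usr/local/stata17"
print("Stata configured:", configure_stata(STATA_DIR))
```

If the call prints `False`, check that the path points at the folder containing your Stata executable and that the edition matches your licence.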

## Call Stata using magic commands

The stata magic is used to execute Stata commands in an IPython environment. In a notebook cell, we put Stata commands underneath the `%%stata` cell magic to direct the cell to call Stata. The following commands load the auto dataset and summarize the mpg variable. The Stata output is displayed underneath the cell.

Use the `%%stata` magic with the auto dataset to summarize `mpg`
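One possible answer, shown first as the notebook cell (in the comment) and then as a plain-Python sketch that sends the same two commands through `pystata`'s `stata.run()`. `run_in_stata` is a hypothetical helper, and the sketch assumes Stata has already been configured in this session; when it has not, the helper returns `False` instead of failing.

```python
# Notebook form of the same commands:
#   %%stata
#   sysuse auto, clear
#   summarize mpg

STATA_CODE = """
sysuse auto, clear
summarize mpg
"""


def run_in_stata(code):
    """Send code to Stata via pystata; return False if Stata is unavailable."""
    try:
        # This import fails unless Stata has been configured beforehand.
        from pystata import stata
    except Exception:
        return False
    stata.run(code)
    return True


ran = run_in_stata(STATA_CODE)
print("Executed in Stata:", ran)
```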

You can use other Stata commands in the same way, for example `ds` to list the variables in memory:

In [None]:
%%stata

ds

Stata's graphs can also be displayed in the IPython environment. Here we create a scatterplot of car mileage against price by using the `%stata` line magic.




Then plot `mpg` against `price` using the `%stata` line magic instead of the `%%stata` cell magic

Next, we load Python data into Stata, perform analyses in Stata, and then pass Stata returned results to Python for further analysis, using the Second National Health and Nutrition Examination Survey (NHANES II; McDowell et al. 1981).

NHANES II, a dataset concerning the health and nutritional status of adults and children in the US, contains 10,351 observations and 58 variables and is stored in a CSV file called nhanes2.csv. Among these variables are an indicator variable for hypertension (highbp) and the continuous variables age and weight.

We use the pandas function `read_csv()` to read the data from the .csv file into a pandas dataframe named `nhanes2`.

We load the dataframe into Stata by specifying the `-d` argument of the `%%stata` magic. Within Stata, we then fit a logistic regression model using age, weight, and their interaction as predictors of the probability of hypertension.

We also push Stata's estimation results displayed by `ereturn list`, including the coefficient vector `e(b)` and variance–covariance matrix `e(V)`, into a Python dictionary called `myeret` by specifying the `-eret` argument.

We can access `e(b)` and `e(V)` in Python by typing `myeret['e(b)']` and `myeret['e(V)']`, respectively. Both are stored as NumPy arrays.
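As an illustration of what can be done with these two entries, the sketch below derives standard errors (the square roots of the diagonal of `e(V)`) and z statistics. It assumes `e(b)` is a 1 x k row and `e(V)` is k x k. The helper names and the `toy` dictionary are hypothetical: the numbers are placeholders chosen for easy arithmetic, not output from the NHANES II model. Plain indexing is used, so the functions accept NumPy arrays as well as nested lists.

```python
import math


def standard_errors(eret):
    """Return the square root of each diagonal element of e(V)."""
    V = eret["e(V)"]
    return [math.sqrt(V[i][i]) for i in range(len(V))]


def z_statistics(eret):
    """Return coefficient / standard error for each entry of e(b)."""
    b = eret["e(b)"][0]  # e(b) is assumed to be a 1 x k row vector
    return [b[j] / se for j, se in enumerate(standard_errors(eret))]


# Toy placeholder values, NOT results from the NHANES II model.
toy = {"e(b)": [[1.0, -6.0]], "e(V)": [[4.0, 0.0], [0.0, 9.0]]}
print(standard_errors(toy))  # [2.0, 3.0]
print(z_statistics(toy))     # [0.5, -2.0]
```

With the real results you would call `standard_errors(myeret)` and `z_statistics(myeret)` instead.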

We use [margins](https://www.stata.com/manuals/rmargins.pdf) and [marginsplot](https://www.stata.com/manuals/rmarginsplot.pdf) to graph predictions over age, which more clearly illustrates the relationship between age and the probability of hypertension.

You can pass the dataset in Stata's memory to Python as a pandas dataframe using the `-doutd` argument

Show the head of `df`

You can also do this without the `%%stata` magic by using `pystata`

Also import `seaborn` and `numpy`

Plot the relationship

You can also pass a pandas dataframe to Stata in order to, say, run regressions

Then we use the get_ereturn() method to store the e() results returned by the logistic command in Python as a dictionary named myeret2 and display e(b) and e(V) within it.
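A minimal sketch of this round trip using the `pystata` functions `pdataframe_to_data()`, `run()` and `get_ereturn()`. The helper names are hypothetical, and the model specification `logistic highbp c.age##c.weight` is an assumption based on the description above (age, weight and their interaction). When Stata is not configured, the helpers return `None` rather than raising.

```python
def load_stata():
    """Return the pystata.stata module, or None if Stata is not configured."""
    try:
        from pystata import stata
    except Exception:
        return None
    return stata


def fit_hypertension_model(df):
    """Fit the logistic model in Stata and return its e() results as a dict."""
    stata = load_stata()
    if stata is None:
        return None
    # Copy the pandas dataframe into Stata, replacing any data in memory.
    stata.pdataframe_to_data(df, force=True)
    # Assumed specification: age, weight and their interaction.
    stata.run("logistic highbp c.age##c.weight")
    return stata.get_ereturn()


# Usage, once `nhanes2` has been read with pandas and Stata is configured:
#   myeret2 = fit_hypertension_model(nhanes2)
#   print(myeret2["e(b)"], myeret2["e(V)"])
```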

You can even run entire do-files

Or call a do-file by name

In [None]:
stata.run('''
do reg_nhanes2
''')