In [1]:
import pandas as pd

# Syntax

The online textbook [Coding for Economists](https://aeturrell.github.io/coding-for-economists/intro.html) has a great [section summarizing the differences in syntax](https://aeturrell.github.io/coding-for-economists/coming-from-stata.html) between Stata and Python. This chapter does not aim to replace the key material included there. Instead, I hope it can serve as a supplement for those transitioning from Stata to Python.

This chapter describes [`stata2python`](https://pypi.org/project/stata2python/), a Python package that takes Stata commands and returns the equivalent Python code.

## stata2python

### Installation and Import

To install `stata2python`, use the package installer [`pip`](https://pypi.org/project/pip/) by running the following command in your terminal.

    pip install --upgrade stata2python
    
After installing the package, you can import it into your Jupyter notebook. Alternatively, you can open a [Python shell](https://www.tutorialsteacher.com/python/python-interective-shell) (by typing `python3` in your terminal) and use it there. To import the package in your notebook or shell, the following syntax is encouraged.

    from stata2python import stata2python
   
We import the package below.

In [2]:
from stata2python import stata2python

We also import a sample NBA dataset that will be helpful for demonstrating the features of our package.

In [3]:
nba = pd.read_csv("data/nba.csv")
nba.head()

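| Unnamed: 0 | married | wage | exper | age | coll | games | minutes | guard | forward | center | points | rebounds | assists | draft | allstar | avgmin | black | children |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1002.5 | 4 | 27 | 4 | 77 | 2867 | 1 | 0 | 0 | 16 | 4 | 5 | 19.0 | 0 | 37.23 | 1 | 0 |
| 1 | 1 | 2030.0 | 5 | 28 | 4 | 78 | 2789 | 1 | 0 | 0 | 13 | 3 | 9 | 28.0 | 0 | 35.76 | 1 | 1 |
| 2 | 0 | 650.0 | 1 | 25 | 4 | 74 | 1149 | 0 | 0 | 1 | 6 | 3 | 0 | 19.0 | 0 | 15.53 | 1 | 0 |
| 3 | 0 | 2030.0 | 5 | 28 | 4 | 47 | 1178 | 0 | 1 | 0 | 7 | 5 | 2 | 1.0 | 0 | 25.06 | 1 | 0 |
| 4 | 0 | 755.0 | 3 | 24 | 4 | 82 | 2096 | 1 | 0 | 0 | 11 | 4 | 3 | 24.0 | 0 | 25.56 | 1 | 0 |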


### Usage

Currently, `stata2python` supports only the commands needed to teach an introductory course in econometrics. If you would like to contribute, feel free to create pull requests [here](https://github.com/rohanjha123/data-h195/tree/main/creating_package).

Importing the package via `from stata2python import stata2python` gives you access to a function named `stata2python`. Pass any Stata command to this function as a string. If the command is supported, the function outputs Python code equivalent to the Stata command.

Optionally, you may also specify the name of the DataFrame you're working with via the `df_name` parameter in `stata2python`. The default value for `df_name` is simply `df`. 

Below, we discuss all the features currently supported by `stata2python`, with example usages.

#### T-test

The `ttest` command generates the code for running [t-tests](https://www.jmp.com/en_us/statistics-knowledge-portal/t-test.html) in Python. Examples include:

In [4]:
stata2python("ttest wage, by(guard)")

import numpy as np
from scipy import stats
catvar_vals = np.unique(df['guard'])
if len(catvar_vals) != 2:
    raise ValueError(f'The categorical variable (guard) doesn\'t have 2 groups')
df_1 = df[df['guard'] == catvar_vals[0]]
df_2 = df[df['guard'] == catvar_vals[1]]
ttest = stats.ttest_ind(df_1['wage'], df_2['wage'], equal_var=True, nan_policy='propagate')
t_stat = ttest.statistic
p_val = ttest.pvalue
print(f'T-stat: {t_stat}, P-value: {p_val}')


In [5]:
stata2python("ttest wage, by(guard) unequal", "nba")

import numpy as np
from scipy import stats
catvar_vals = np.unique(nba['guard'])
if len(catvar_vals) != 2:
    raise ValueError(f'The categorical variable (guard) doesn\'t have 2 groups')
df_1 = nba[nba['guard'] == catvar_vals[0]]
df_2 = nba[nba['guard'] == catvar_vals[1]]
ttest = stats.ttest_ind(df_1['wage'], df_2['wage'], equal_var=False, nan_policy='propagate')
t_stat = ttest.statistic
p_val = ttest.pvalue
print(f'T-stat: {t_stat}, P-value: {p_val}')


Assuming you have all the required packages installed, you can copy and paste this code directly to see the output. For example,

In [8]:
import numpy as np
from scipy import stats
catvar_vals = np.unique(nba['guard'])
if len(catvar_vals) != 2:
    raise ValueError(f'The categorical variable (guard) doesn\'t have 2 groups')
df_1 = nba[nba['guard'] == catvar_vals[0]]
df_2 = nba[nba['guard'] == catvar_vals[1]]
ttest = stats.ttest_ind(df_1['wage'], df_2['wage'], equal_var=False, nan_policy='propagate')
t_stat = ttest.statistic
p_val = ttest.pvalue
print(f'T-stat: {t_stat}, P-value: {p_val}')

T-stat: 2.1432820571177977, P-value: 0.03299634994484977
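
The two generated snippets above differ only in the `equal_var` argument: the plain `ttest ..., by(...)` command maps to `equal_var=True`, while adding `unequal` maps to `equal_var=False`. As a rough sketch of what that flag changes, the textbook formulas for the two t statistics can be written with the standard library alone. The helper names `pooled_t` and `welch_t` and the sample values are made up for illustration, and this is not a description of how SciPy implements `ttest_ind` internally.

```python
import math
import statistics


def pooled_t(a, b):
    """t statistic assuming both groups share one variance (equal_var=True)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se


def welch_t(a, b):
    """t statistic letting each group keep its own variance (equal_var=False)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(v1 / n1 + v2 / n2)
    return (statistics.mean(a) - statistics.mean(b)) / se


# Made-up samples with unequal sizes and unequal spread.
group_0 = [1, 2, 3, 4]
group_1 = [2, 4, 6, 8, 10]

print(f"pooled: {pooled_t(group_0, group_1):.4f}")
print(f"welch:  {welch_t(group_0, group_1):.4f}")
```

When the two groups have the same number of observations, the two statistics coincide, and the tests differ only in the degrees of freedom used to compute the p-value.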


#### Gen

The `gen` command generates new columns. For example,

In [9]:
stata2python("gen degree = (coll >= 4)")

df['degree'] = df['coll'] >= 4


In [10]:
stata2python("gen productivity = points/(minutes/games)","nba")

nba['productivity'] = nba['points']/(nba['minutes']/nba['games'])
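
One detail worth checking when you run the generated `gen` code: a comparison such as `df['coll'] >= 4` produces a boolean column in pandas, whereas Stata stores the result as 0/1. The sketch below uses a small made-up DataFrame (the name `toy` and its values are assumptions for illustration, not the NBA data) to show the generated pattern and an optional cast to integers.

```python
import pandas as pd

# Toy data (made-up values, not the NBA dataset); the last row has a missing `coll`.
toy = pd.DataFrame({
    "coll": [4, 2, float("nan")],
    "points": [16, 13, 6],
    "minutes": [2867, 2789, 1149],
    "games": [77, 78, 74],
})

# Same pattern as the generated code: the comparison yields a boolean column.
toy["degree"] = toy["coll"] >= 4

# Optional cast to 0/1 integers, closer to what Stata stores.
toy["degree_int"] = toy["degree"].astype(int)

# Arithmetic expressions carry over element-wise.
toy["productivity"] = toy["points"] / (toy["minutes"] / toy["games"])

print(toy[["coll", "degree", "degree_int", "productivity"]])
```

Note that the row with a missing `coll` evaluates to `False` in pandas. Stata treats missing values as larger than any number, so the same expression would evaluate to 1 there; if your data has missing values, it may be worth handling them explicitly before comparing results.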
