---
execute:
  enabled: false
---

# PyStata Integration for MakeTables

This notebook demonstrates how to integrate Stata results into tables with MakeTables. To run it, you need a local Stata installation and a working [pystata](https://www.stata.com/python/pystata19/) setup.
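
Because `stata_setup.config` needs the installation directory, it can help to check a few candidate locations before calling it. The following is only a sketch: `first_existing_dir` is a hypothetical helper, and every path in `CANDIDATES` other than the Windows one used in this notebook is an assumption that you should adapt to your machine.

```python
import os


def first_existing_dir(candidates):
    """Return the first path in `candidates` that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


# Candidate locations; only the first one is taken from this notebook,
# the others are assumed examples and may not match your installation.
CANDIDATES = [
    "C:/Program Files/Stata18",
    "/usr/local/stata18",
    "/Applications/Stata",
]

stata_dir = first_existing_dir(CANDIDATES)
if stata_dir is None:
    print("No Stata directory found; edit CANDIDATES.")
else:
    # With Stata installed, the next step would be:
    # stata_setup.config(stata_dir, "mp")
    print(f"Found Stata directory: {stata_dir}")
```

The edition argument (`"mp"` in this notebook) still has to match your license.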

## Basic Usage

In [1]:
import stata_setup

# Adjust the path to your Stata installation
stata_setup.config("C:/Program Files/Stata18", "mp")

import pystata
import maketables as mt

# Run regression in Stata 
pystata.stata.run('''
    sysuse auto, clear
    regress mpg weight length foreign
''')

# Extract results and labels for MakeTables
result = mt.extract_current_stata_results()

# Create table
mt.ETable([result], caption="Regression Results from Stata")


  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      StataNow 18.5
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-782-8272        https://www.stata.com
                                   979-696-4600        service@stata.com

Stata license: Unlimited-user 4-core network, expiring 14 Dec 2025
Serial number: 501809302858
  Licensed to: Dirk Sliwka
               Universität zu Köln

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.

. 
.     sysuse auto, clear
(1978 automobile data)

.     regress mpg w

Regression Results from Stata

| | Mileage (mpg) |
|---|---|
| | (1) |
| coef | |
| Weight (lbs.) | -0.004** (0.002) |
| Length (in.) | -0.083 (0.055) |
| Car origin | -1.708 (1.067) |
| Intercept | 50.537*** (6.246) |
| stats | |
| Observations | 74 |
| R2 | 0.673 |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

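
The table note describes the cell format: a coefficient, significance stars, and the standard error in parentheses. A minimal sketch of that formatting rule is shown below. The function names `stars` and `format_cell` are hypothetical and not part of MakeTables, and the p-values in the usage lines are illustrative placeholders, not values reported by Stata.

```python
def stars(p):
    """Map a p-value to significance stars using the thresholds in the table note."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def format_cell(coef, se, p, digits=3):
    """Format one coefficient cell as 'Coefficient (Std. Error)' with stars."""
    return f"{coef:.{digits}f}{stars(p)} ({se:.{digits}f})"


# Placeholder p-values, chosen only to illustrate each star level.
intercept_cell = format_cell(50.537, 6.246, 0.0001)
weight_cell = format_cell(-0.004, 0.002, 0.008)
origin_cell = format_cell(-1.708, 1.067, 0.2)
print(intercept_cell, weight_cell, origin_cell, sep="\n")
```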



### `rstata()` Wrapper Function

The `rstata()` function combines Stata execution and result extraction.

In [11]:
# Run regression and auto-extract results in one step (quietly=True suppresses display of Stata output)
result = mt.rstata("regress mpg weight length foreign", quietly=True)

# Create table
mt.ETable([result], caption="Regression Results from Stata")

Regression Results from Stata

| | Mileage (mpg) |
|---|---|
| | (1) |
| coef | |
| Weight (lbs.) | -0.004** (0.002) |
| Length (in.) | -0.083 (0.055) |
| Car origin | -1.708 (1.067) |
| Intercept | 50.537*** (6.246) |
| stats | |
| Observations | 74 |
| R2 | 0.673 |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

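
Conceptually, a wrapper like `rstata()` chains the two steps from the Basic Usage section: run a command, then extract the stored results. The sketch below shows only that pattern and is not MakeTables' actual implementation. `run_and_extract` is a hypothetical name, and the two stub functions stand in for `pystata.stata.run` and `mt.extract_current_stata_results` so that the example runs without Stata.

```python
def run_and_extract(command, run, extract, quietly=False):
    """Run a Stata command with `run`, then return whatever `extract` yields."""
    run(command, quietly=quietly)
    return extract()


# Stubs standing in for the real Stata calls (assumptions for illustration).
calls = []


def fake_run(command, quietly=False):
    calls.append((command, quietly))


def fake_extract():
    # Pretend the "results" are just the last command that was run.
    return {"command": calls[-1][0]}


result = run_and_extract("regress mpg weight", fake_run, fake_extract, quietly=True)
print(result)
```

With Stata available, passing `pystata.stata.run` and `mt.extract_current_stata_results` in place of the stubs would follow the same two-step flow shown in Basic Usage.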



## Multiple Model Comparison


In [6]:
# Run multiple specifications with quietly=True for clean output
model1 = mt.rstata('regress mpg weight', quietly=True)
model2 = mt.rstata('regress mpg weight length', quietly=True)
model3 = mt.rstata('regress mpg weight length foreign', quietly=True)

# Create comparison table
mt.ETable([model1, model2, model3])

| | Mileage (mpg) | Mileage (mpg) | Mileage (mpg) |
|---|---|---|---|
| | (1) | (2) | (3) |
| coef | | | |
| Weight (lbs.) | -0.006*** (0.001) | -0.004* (0.002) | -0.004** (0.002) |
| Length (in.) | | -0.080 (0.055) | -0.083 (0.055) |
| Car origin | | | -1.708 (1.067) |
| Intercept | 39.440*** (1.614) | 47.885*** (6.088) | 50.537*** (6.246) |
| stats | | | |
| Observations | 74 | 74 | 74 |
| R2 | 0.652 | 0.661 | 0.673 |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

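
Nested specifications like the three above can also be generated programmatically instead of being typed out. The helper `nested_commands` below is a hypothetical convenience function, not part of MakeTables. It only builds the command strings, so it runs without Stata.

```python
def nested_commands(depvar, regressors, estimator="regress"):
    """Build one command per nested specification, adding one regressor at a time."""
    return [
        f"{estimator} {depvar} {' '.join(regressors[:k])}"
        for k in range(1, len(regressors) + 1)
    ]


commands = nested_commands("mpg", ["weight", "length", "foreign"])
for cmd in commands:
    print(cmd)

# With Stata available, the models could then be collected as in the cell above:
# models = [mt.rstata(cmd, quietly=True) for cmd in commands]
# mt.ETable(models)
```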



## Categorical Variables and Interactions

You can also use Stata's `i.` and `c.` operators to create dummy variables and interaction terms. The MakeTables Stata extractor also reads Stata value labels and converts Stata's coefficient names into the formulaic notation used by Python regression packages, so categorical variables and interaction terms are relabeled and formatted in the same way as results from Python models.

In [10]:
# Setup data with categorical variables
pystata.stata.run('''
    sysuse auto, clear
    
    // Create categorical variables for demonstration
    gen price_cat = 1 if price < 5000
    replace price_cat = 2 if price >= 5000 & price < 10000  
    replace price_cat = 3 if price >= 10000 & price != .
    label define price_lbl 1 "Low" 2 "Medium" 3 "High"
    label values price_cat price_lbl
    label variable price_cat "Price category"
    
''', quietly=True)

model1 = mt.rstata('regress mpg i.price_cat weight foreign', quietly=True)
model2 = mt.rstata('regress mpg c.weight##i.foreign i.price_cat', quietly=True)

# Create comparison table
mt.ETable([model1, model2], cat_template="{value}")





| | Mileage (mpg) | Mileage (mpg) |
|---|---|---|
| | (1) | (2) |
| coef | | |
| Medium | -0.641 (1.045) | -0.386 (1.013) |
| High | -0.085 (1.727) | 0.705 (1.694) |
| Weight (lbs.) | -0.006*** (0.001) | -0.006*** (0.001) |
| Car origin | -1.353 (1.343) | |
| Foreign | | 9.604* (4.560) |
| Foreign × Weight (lbs.) | | -0.005* (0.002) |
| Intercept | 41.422*** (2.809) | 40.121*** (2.757) |
| stats | | |
| Observations | 74 | 74 |
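
The name conversion described in this section can be sketched roughly as follows. This is not MakeTables' actual implementation: `stata_to_formulaic` and its `value_labels` argument are hypothetical, and the `C(var)[T.level]` target format is assumed here as a typical treatment-coding name in formulaic-style packages. The input names (`2.price_cat`, `0b.foreign`, `1.foreign#c.weight`, `_cons`) follow Stata's factor-variable notation for coefficient names.

```python
import re

# One term of a Stata coefficient name: "2.price_cat", "0b.foreign",
# "c.weight", "co.weight", or a plain variable name such as "weight".
_TERM = re.compile(
    r"^(?:(?P<level>\d+)(?P<flags>[bno]*)\.|(?P<cont>co|c|o)\.)?(?P<var>\w+)$"
)


def stata_to_formulaic(name, value_labels=None):
    """Translate a Stata coefficient name; return None for base/omitted terms."""
    value_labels = value_labels or {}
    if name == "_cons":
        return "Intercept"
    parts = []
    for term in name.split("#"):
        m = _TERM.match(term)
        if m is None:
            raise ValueError(f"Unrecognized term: {term!r}")
        var, level = m.group("var"), m.group("level")
        if level is None:
            if m.group("cont") and "o" in m.group("cont"):
                return None  # omitted continuous term
            parts.append(var)
            continue
        if "b" in m.group("flags") or "o" in m.group("flags"):
            return None  # base level or omitted level
        label = value_labels.get(var, {}).get(int(level), level)
        parts.append(f"C({var})[T.{label}]")
    return ":".join(parts)


labels = {"price_cat": {1: "Low", 2: "Medium", 3: "High"},
          "foreign": {0: "Domestic", 1: "Foreign"}}
for raw in ["weight", "2.price_cat", "1.foreign#c.weight", "0b.foreign", "_cons"]:
    print(raw, "->", stata_to_formulaic(raw, labels))
```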




## Combining results from different packages

This example runs the same regression specification in both Stata and PyFixest and shows the results side by side.

In [12]:
# Stata vs PyFixest Side-by-Side Comparison
import pandas as pd
import pyfixest as pf

# Load the current Stata dataset into a pandas DataFrame
df = pystata.stata.pdataframe_from_data()

# Apply the same value labels as defined in Stata
df['price_cat'] = df['price_cat'].map({1: 'Low', 2: 'Medium', 3: 'High'}).astype('category')
df['foreign'] = df['foreign'].map({0: 'Domestic', 1: 'Foreign'}).astype('category')

# Order the categories so that the reference group is picked correctly
df['price_cat'] = df['price_cat'].cat.reorder_categories(['Low', 'Medium', 'High'])
df['foreign'] = df['foreign'].cat.reorder_categories(['Domestic', 'Foreign'])

# Run regressions
pyfixest_result = pf.feols("mpg ~ i(price_cat)*weight", data=df)
stata_result = mt.rstata('regress mpg c.weight##i.price_cat', quietly=True, formulaic_names=True)

# Create comparison table
mt.ETable([stata_result, pyfixest_result], model_heads=["Stata (PyStata)", "PyFixest"])

| | Mileage (mpg) | Mileage (mpg) |
|---|---|---|
| | Stata (PyStata) | PyFixest |
| | (1) | (2) |
| coef | | |
| Weight (lbs.) | -0.007*** (0.001) | -0.007*** (0.001) |
| Price category=Medium | -5.139 (3.797) | -5.139 (3.797) |
| Price category=High | -20.317* (9.061) | -20.317* (9.061) |
| Price category=Medium × Weight (lbs.) | 0.001 (0.001) | 0.001 (0.001) |
| Price category=High × Weight (lbs.) | 0.005* (0.002) | 0.005* (0.002) |
| Intercept | 42.113*** (2.495) | 42.113*** (2.495) |
| stats | | |
| Observations | 74 | 74 |
| R2 | 0.684 | 0.684 |
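
Rather than comparing the two columns by eye, agreement between the packages can be checked numerically. The sketch below uses the rounded coefficients printed in the table above as input. `coefficients_agree` is a hypothetical helper, and the tolerance of `1e-3` is an assumption matched to the three displayed decimals.

```python
import math


def coefficients_agree(a, b, abs_tol=1e-3):
    """Return True if both dicts have the same keys and values within abs_tol."""
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], abs_tol=abs_tol) for k in a)


# Rounded estimates as displayed in the comparison table.
stata_coefs = {
    "Weight (lbs.)": -0.007,
    "Price category=Medium": -5.139,
    "Price category=High": -20.317,
    "Intercept": 42.113,
}
pyfixest_coefs = dict(stata_coefs)  # identical in the displayed table

same = coefficients_agree(stata_coefs, pyfixest_coefs)
print("Estimates agree:", same)
```

In practice you would pass the unrounded estimates from each package and choose a tolerance that reflects the numerical precision you expect.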


