# Mixed Python and Stata Analysis

This notebook demonstrates how to combine Python and Stata in the same workflow using magic commands.

In [1]:
# Import necessary Python packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import stata_setup

# Set up plotting style
sns.set_style("whitegrid")
%matplotlib inline

# Initialize Stata through pystata; this also registers the %%stata cell magic.
# The installation path and edition ("mp", "se", or "be") are assumptions:
# change them to match your local Stata installation.
stata_setup.config("/Applications/Stata", "se")

## Generate Data in Python

In [None]:
# Generate synthetic data
np.random.seed(42)
n = 1000

data = pd.DataFrame({
    'x1': np.random.normal(0, 1, n),
    'x2': np.random.normal(0, 1, n),
    'x3': np.random.uniform(0, 1, n),
    'group': np.random.choice(['A', 'B', 'C'], n)
})

# Create dependent variable with some relationship
data['y'] = (2 * data['x1'] - 1.5 * data['x2'] + 
             3 * data['x3'] + 
             data['group'].map({'A': 0, 'B': 1, 'C': 2}) +
             np.random.normal(0, 0.5, n))

print(data.head())
print(f"\nDataset shape: {data.shape}")
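For reference when reading the regression output later on, the cell above draws the data from the following model, so the true coefficients are known in advance:

$$
y_i = 2\,x_{1i} - 1.5\,x_{2i} + 3\,x_{3i} + \gamma_{g(i)} + \varepsilon_i,
\qquad \gamma_A = 0,\; \gamma_B = 1,\; \gamma_C = 2,
$$

where $x_{1i}, x_{2i} \sim N(0, 1)$, $x_{3i} \sim U(0, 1)$, the group $g(i) \in \{A, B, C\}$ is drawn with equal probabilities, and $\varepsilon_i \sim N(0, 0.5^2)$. The model has no constant term, so with group A as the base category the estimated intercept should be close to zero.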

## Python Analysis

In [None]:
# Basic descriptive statistics in Python
print("Descriptive Statistics (Python):")
print(data.describe())

# Correlation matrix
print("\nCorrelation Matrix:")
print(data[['y', 'x1', 'x2', 'x3']].corr())

In [None]:
# Visualization with Python
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Histogram of y
axes[0, 0].hist(data['y'], bins=30, edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Distribution of Y')
axes[0, 0].set_xlabel('Y')
axes[0, 0].set_ylabel('Frequency')

# Scatter plot
axes[0, 1].scatter(data['x1'], data['y'], alpha=0.5)
axes[0, 1].set_title('Y vs X1')
axes[0, 1].set_xlabel('X1')
axes[0, 1].set_ylabel('Y')

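# Box plot by group (seaborn avoids the extra "Boxplot grouped by group"
# figure title that DataFrame.boxplot(by=...) adds above all four panels)
sns.boxplot(data=data, x='group', y='y', order=['A', 'B', 'C'], ax=axes[1, 0])
axes[1, 0].set_title('Y by Group')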

# Correlation heatmap
corr = data[['y', 'x1', 'x2', 'x3']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0, ax=axes[1, 1])
axes[1, 1].set_title('Correlation Heatmap')

plt.tight_layout()
plt.show()

## Transfer Data to Stata

Now let's save the data and analyze it in Stata:

In [None]:
# Save data for Stata
data.to_stata('temp_data.dta', write_index=False)
print("Data saved to temp_data.dta")
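Before switching to Stata, it can be worth confirming that the `.dta` round trip preserves the columns. The following is a minimal sketch, assuming a small hypothetical frame with the same column types as `data` (floats plus a short string column). It writes into a `tempfile.TemporaryDirectory`, which deletes the file automatically on exit and is one alternative to the manual `os.remove` cleanup at the end of the notebook.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical small frame with the same kinds of columns as `data`
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'x1': rng.normal(size=5),
    'group': ['A', 'B', 'C', 'A', 'B'],
})

# Write to a throwaway directory that is removed when the block exits
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'temp_data.dta')
    df.to_stata(path, write_index=False)
    back = pd.read_stata(path)  # read fully into memory before cleanup

# Floats are stored as Stata doubles and strings as str#, so values survive intact
pd.testing.assert_frame_equal(df, back, check_dtype=False)
print(back.dtypes)
```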

In [None]:
%%stata
* Load the data created in Python
use temp_data.dta, clear

* Display basic information
describe
summarize

## Stata Regression Analysis

In [None]:
%%stata
* Encode string variable for regression
encode group, generate(group_num)

* Run regression
regress y x1 x2 x3 i.group_num

* Store results
estimates store model1
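`i.group_num` tells Stata to add an indicator variable for each level of `group_num` except the base level, which by default is the lowest code; `encode` assigns codes in alphabetical order, so group A is the base. The sketch below reproduces the coefficient vector of `regress y x1 x2 x3 i.group_num` with NumPy least squares, assuming the same seed and draw order as the generation cell. `ols_with_factor` is a hypothetical helper, and standard errors are omitted.

```python
import numpy as np
import pandas as pd

# Rebuild the notebook's synthetic data (same seed and draw order)
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'x1': np.random.normal(0, 1, n),
    'x2': np.random.normal(0, 1, n),
    'x3': np.random.uniform(0, 1, n),
    'group': np.random.choice(['A', 'B', 'C'], n)
})
data['y'] = (2 * data['x1'] - 1.5 * data['x2'] + 3 * data['x3']
             + data['group'].map({'A': 0, 'B': 1, 'C': 2})
             + np.random.normal(0, 0.5, n))


def ols_with_factor(df):
    """OLS of y on x1, x2, x3, indicators for groups B and C, and a constant.

    drop_first=True omits group A, mirroring Stata's default base level for i.group_num.
    """
    dummies = pd.get_dummies(df['group'], prefix='group', drop_first=True, dtype=float)
    X = pd.concat([df[['x1', 'x2', 'x3']], dummies], axis=1)
    X.insert(0, '_cons', 1.0)
    beta, *_ = np.linalg.lstsq(X.to_numpy(), df['y'].to_numpy(), rcond=None)
    return pd.Series(beta, index=X.columns)


coefs = ols_with_factor(data)
print(coefs.round(3))
```

The estimates should land close to the true values (2, -1.5 and 3 for the slopes, 1 and 2 for groups B and C, and 0 for the constant), and the point estimates should agree with Stata's coefficient table up to display rounding.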

In [None]:
%%stata
* Post-estimation diagnostics
estat vif
estat hettest
estat ovtest
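`estat vif` reports a variance inflation factor for each regressor, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing regressor $j$ on all the other regressors plus a constant. The sketch below recomputes this centered definition in NumPy for the same five regressors (`x1`, `x2`, `x3` and the B and C indicators). It illustrates the formula rather than Stata's implementation, and `vif_table` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Rebuild the regressors exactly as in the generation cell (y is not needed here)
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'x1': np.random.normal(0, 1, n),
    'x2': np.random.normal(0, 1, n),
    'x3': np.random.uniform(0, 1, n),
    'group': np.random.choice(['A', 'B', 'C'], n)
})


def vif_table(X):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns plus a constant."""
    out = {}
    for col in X.columns:
        target = X[col].to_numpy()
        design = np.column_stack([np.ones(len(X)), X.drop(columns=col).to_numpy()])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)


# Same regressors as `regress y x1 x2 x3 i.group_num` (group A omitted as base)
dummies = pd.get_dummies(data['group'], prefix='group', drop_first=True, dtype=float)
X = pd.concat([data[['x1', 'x2', 'x3']], dummies], axis=1)
vifs = vif_table(X)
print(vifs.round(3))
```

Because the x variables are drawn independently, their VIFs should sit near 1. The two group indicators are mechanically correlated (no observation can be in both B and C), so with three roughly equal-sized groups their VIFs come out around 4/3; that is a property of the encoding, not a collinearity problem.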

In [None]:
%%stata
* Predictive margins by group (x1, x2, x3 held at their means), then plot them
margins group_num, atmeans
marginsplot, title("Predicted Values by Group")
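`margins group_num, atmeans` computes a linear prediction for each group with `x1`, `x2` and `x3` fixed at their sample means; these are predicted values (predictive margins), not derivatives. The sketch below reproduces that calculation from an OLS fit and compares it with the default `margins group_num`, which instead averages the predictions over all observations. It assumes the same data as the notebook; `predict_for_group` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Rebuild the notebook's synthetic data (same seed and draw order)
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'x1': np.random.normal(0, 1, n),
    'x2': np.random.normal(0, 1, n),
    'x3': np.random.uniform(0, 1, n),
    'group': np.random.choice(['A', 'B', 'C'], n)
})
data['y'] = (2 * data['x1'] - 1.5 * data['x2'] + 3 * data['x3']
             + data['group'].map({'A': 0, 'B': 1, 'C': 2})
             + np.random.normal(0, 0.5, n))

# OLS fit equivalent to `regress y x1 x2 x3 i.group_num` (group A as base)
dummies = pd.get_dummies(data['group'], prefix='group', drop_first=True, dtype=float)
X = pd.concat([data[['x1', 'x2', 'x3']], dummies], axis=1)
X.insert(0, '_cons', 1.0)
beta = pd.Series(np.linalg.lstsq(X.to_numpy(), data['y'].to_numpy(), rcond=None)[0],
                 index=X.columns)


def predict_for_group(level, x_rows):
    """Linear predictions with every row assigned to `level`, given x1..x3 values."""
    design = x_rows[['x1', 'x2', 'x3']].copy()
    design['_cons'] = 1.0
    design['group_B'] = float(level == 'B')
    design['group_C'] = float(level == 'C')
    return design[beta.index].to_numpy() @ beta.to_numpy()


levels = ['A', 'B', 'C']
covariate_means = data[['x1', 'x2', 'x3']].mean().to_frame().T  # one-row frame

# atmeans: one prediction per group at the covariate means
at_means = pd.Series({lv: predict_for_group(lv, covariate_means)[0] for lv in levels})
# default margins: predict for every observation as if in group lv, then average
average_pred = pd.Series({lv: predict_for_group(lv, data).mean() for lv in levels})

print(pd.DataFrame({'atmeans': at_means, 'averaged': average_pred}).round(3))
```

Because this model is linear with no interactions, the two columns agree; with a nonlinear model such as `logit`, predictions at the means and averaged predictions generally differ.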

## Advanced Stata Analysis

In [None]:
%%stata
* Quantile regression
qreg y x1 x2 x3 i.group_num, quantile(0.25)
estimates store q25

qreg y x1 x2 x3 i.group_num, quantile(0.50)
estimates store q50

qreg y x1 x2 x3 i.group_num, quantile(0.75)
estimates store q75

* Compare results
estimates table model1 q25 q50 q75, b(%9.3f) se(%9.3f)
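Unlike `regress`, which minimizes squared residuals, `qreg` estimates the τ-th conditional quantile by minimizing the sum of check (pinball) losses $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$ over the residuals $u$. With only a constant in the model, the minimizer is the sample τ-quantile. The following sketch covers just that intercept-only case on simulated data (an assumption for illustration); it is not a substitute for the full problem with covariates that `qreg` solves, and the helper names are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    return u * (tau - (u < 0))


def intercept_only_qreg(y, tau):
    """Minimize sum(check_loss(y - q, tau)) over the scalar q.

    The objective is convex and piecewise linear with kinks at the observations,
    so a minimizer is always one of the data points; evaluate every candidate.
    """
    # Row i holds the residuals y_j - y_i for candidate q = y_i
    objective = check_loss(y[None, :] - y[:, None], tau).sum(axis=1)
    return y[np.argmin(objective)]


rng = np.random.default_rng(42)
y = rng.normal(size=101)

for tau in (0.25, 0.50, 0.75):
    q_hat = intercept_only_qreg(y, tau)
    # When n * tau is not an integer, the unique minimizer is order statistic ceil(n * tau)
    k = int(np.ceil(len(y) * tau))
    print(tau, round(float(q_hat), 4), q_hat == np.sort(y)[k - 1])
```

At τ = 0.5 this is the median, which is why `quantile(0.50)` is often called median (least absolute deviation) regression.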

## Back to Python for Final Visualization

In [None]:
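# Read the Stata file back into Python.
# The %%stata cells above did not save anything, so temp_data.dta still holds
# the data written from Python. Variables created in Stata (such as group_num,
# or predictions from `predict`) only come back, ready to visualize in Python,
# if you first run `save temp_data.dta, replace` in a %%stata cell.
data_processed = pd.read_stata('temp_data.dta')

print("Data successfully read back into Python")
print(f"Data shape after round trip: {data_processed.shape}")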
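If you do `save` the dataset from Stata after `encode`, `group_num` is stored as a numeric variable with value labels, and `pd.read_stata` converts value-labeled variables to a pandas `Categorical` by default (`convert_categoricals=True`). Since the sketch below cannot run Stata, it assumes a hypothetical file built with pandas itself, relying on `to_stata` writing `Categorical` columns as value-labeled integers.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a file saved from Stata after `encode group, generate(group_num)`
df = pd.DataFrame({
    'y': [1.0, 2.5, 3.0, 0.5],
    'group_num': pd.Categorical(['A', 'B', 'C', 'A']),
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'labeled.dta')
    df.to_stata(path, write_index=False)
    labeled = pd.read_stata(path)                          # value labels -> Categorical
    raw = pd.read_stata(path, convert_categoricals=False)  # integer codes stored in the file

print(labeled.dtypes)
print(raw['group_num'].tolist())
```

Passing `convert_categoricals=False` is useful when you want the numeric codes that Stata's `i.` notation works with rather than the labels.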

In [None]:
# Clean up temporary file
import os
if os.path.exists('temp_data.dta'):
    os.remove('temp_data.dta')
    print("Temporary file cleaned up")

## Summary

This notebook demonstrated how to:
1. Generate and manipulate data in Python
2. Transfer data between Python and Stata
3. Perform statistical analysis in Stata
4. Use Stata magic commands (%%stata) in Python notebooks
5. Combine the strengths of both languages in a single workflow

This approach allows you to leverage:
- Python's data manipulation and visualization libraries
- Stata's specialized econometric and statistical procedures
- The best of both worlds in a single, reproducible notebook