# Final Analysis: Prison Education and Recidivism

This notebook simulates post-release outcomes, informed by trends from the Bureau of Justice Statistics, and fits a logistic regression to the simulated data. The goal is to estimate how post-release employment, offense type, and time served relate to the likelihood of recidivism. Employment serves as a proxy for participation in prison education programs.


In [None]:
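from pathlib import Path

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns

# Create the folders that later cells save figures and data to
for folder in ("../outputs", "../data"):
    Path(folder).mkdir(parents=True, exist_ok=True)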

In [None]:
np.random.seed(42)
n = 1000

# Simulate predictors: two binary indicators and a continuous time served (0.5 to 10)
employed = np.random.binomial(1, 0.5, n)
violent_offense = np.random.binomial(1, 0.4, n)
time_served = np.round(np.random.uniform(0.5, 10, n), 1)

# Linear predictor on the log-odds scale (no intercept), mapped to a probability by the logistic function
linear_pred = 0.9 * violent_offense - 0.98 * employed + 0.05 * time_served
prob_recidivism = 1 / (1 + np.exp(-linear_pred))

# Simulate recidivism based on predicted probabilities
recidivism = np.random.binomial(1, prob_recidivism)

# Build DataFrame
df = pd.DataFrame({
    "Employed": employed,
    "Violent_Offense": violent_offense,
    "Time_Served": time_served,
    "Recidivism": recidivism
})

df.head()
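
To make the data-generating step concrete, the sketch below evaluates the same logistic formula for a few individual profiles using only the standard library. The helper name `recidivism_probability`, the profile labels, and the choice of `time_served = 5.0` are illustrative assumptions; the three coefficients are the ones used in the simulation cell above.

```python
import math

# Coefficients copied from the simulation cell (the simulation has no intercept).
COEF_VIOLENT = 0.9
COEF_EMPLOYED = -0.98
COEF_TIME_SERVED = 0.05


def recidivism_probability(employed, violent_offense, time_served):
    """Return the simulated probability of recidivism for a single profile."""
    linear_pred = (
        COEF_VIOLENT * violent_offense
        + COEF_EMPLOYED * employed
        + COEF_TIME_SERVED * time_served
    )
    # Logistic function: maps log-odds to a probability in (0, 1).
    return 1 / (1 + math.exp(-linear_pred))


# Hypothetical profiles: label -> (employed, violent_offense).
profiles = {
    "employed, non-violent": (1, 0),
    "unemployed, non-violent": (0, 0),
    "employed, violent": (1, 1),
    "unemployed, violent": (0, 1),
}

for label, (employed_flag, violent_flag) in profiles.items():
    p = recidivism_probability(employed_flag, violent_flag, time_served=5.0)
    print(f"{label}: {p:.3f}")
```

Holding the other inputs fixed, the employed profile always has the lower probability, because the employment coefficient is negative.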

In [None]:
X = sm.add_constant(df[["Employed", "Violent_Offense", "Time_Served"]])
y = df["Recidivism"]
model = sm.Logit(y, X).fit()
model.summary()
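
Logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to interpret. The following is a minimal sketch with a hypothetical helper, `to_odds_ratios`, applied to the coefficients assumed by the simulation rather than to fitted estimates (the fitted values are not reproduced here). The same helper should also accept `model.params`, since a pandas Series provides `.items()`.

```python
import math


def to_odds_ratios(coefficients):
    """Map each log-odds coefficient to its odds ratio, exp(coefficient)."""
    return {name: math.exp(value) for name, value in coefficients.items()}


# Coefficients assumed by the simulation step (not fitted estimates).
simulated_coefficients = {
    "Employed": -0.98,
    "Violent_Offense": 0.9,
    "Time_Served": 0.05,
}

for name, ratio in to_odds_ratios(simulated_coefficients).items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

An odds ratio below 1 means the predictor lowers the odds of recidivism, and a value above 1 means it raises them. For `Time_Served`, the ratio applies per one-unit increase.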

In [None]:
plt.figure(figsize=(6, 4))
# errorbar=None replaces the deprecated ci=None (requires seaborn >= 0.12)
sns.barplot(x="Violent_Offense", y="Employed", data=df, errorbar=None)
plt.xticks([0, 1], ["Non-violent", "Violent"])
plt.title("Employment Rate by Offense Type")
plt.ylabel("Employment Rate")
plt.tight_layout()
plt.savefig("../outputs/employment_by_offense.png")
plt.show()

In [None]:
coef_df = pd.DataFrame({
    "Variable": model.params.index,
    "Coefficient": model.params.values
})

plt.figure(figsize=(6, 4))
sns.barplot(x="Coefficient", y="Variable", data=coef_df, palette="Blues_d")
plt.axvline(0, color='black', linestyle='--')
plt.title("Logistic Regression Coefficients")
plt.tight_layout()
plt.savefig("../outputs/logistic_regression_coefficients.png")
plt.show()

In [None]:
df["Predicted_Prob"] = model.predict(X)

plt.figure(figsize=(6, 4))
sns.histplot(df[df["Employed"] == 1]["Predicted_Prob"], color="green", label="Employed", kde=True)
sns.histplot(df[df["Employed"] == 0]["Predicted_Prob"], color="red", label="Unemployed", kde=True)
plt.title("Predicted Probability of Recidivism")
plt.xlabel("Probability")
plt.legend()
plt.tight_layout()
plt.savefig("../outputs/recidivism_probability_histogram.png")
plt.show()


In [None]:
df.to_csv("../data/simulated_dataset.csv", index=False)


## Conclusion

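In this simulated dataset, the fitted logistic regression indicates that:
- **Employment after release** significantly lowers the odds of recidivism
- **Violent offenses** increase the likelihood of reoffending
- **Time served** is not statistically significant in this simulation

Because the data are simulated, these results reflect the effect sizes assumed in the data-generating step rather than new empirical evidence. They are consistent with prior research and with the case for expanding access to education and job-readiness programs for incarcerated individuals.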
