Main question: Do blue states have stricter crime policies than red states?
To measure this, we examine gubernatorial races that flipped parties and look at the effect on prison populations over a five-year period. Is there a noticeable difference in prison population trends between Democratic and Republican states?

The comparison between crime policies in blue states (typically Democratic) and red states (typically Republican) is complex and nuanced. Generally, blue states tend to implement more progressive policies, such as focusing on rehabilitation, reducing incarceration rates, and implementing stricter gun control laws. Red states, on the other hand, often emphasize tougher sentencing, increased police funding, and more lenient gun laws.

However, the effectiveness and impact of these policies can vary widely. For instance, some studies suggest that red states have higher overall violent crime rates, while blue states may have higher crime rates in urban areas. Additionally, the differences in crime rates can often be attributed to various social, economic, and demographic factors rather than the policies themselves.

The study has several **limitations** that must be acknowledged. Firstly, the analysis did not account for prisoners who died during the study period. Secondly, it overlooked prisoners who were transferred to facilities in other states. Additionally, individuals on parole were not considered in the evaluation. Lastly, the study did not fully address the impact of legislative changes and the time required for these laws to influence case outcomes. These factors collectively suggest that the findings may not fully capture the complexities of the prison system and its associated legal processes.

In [1]:
## Loading Libraries and Modules

# scikit-learn: barebones, but fast and reliable
from sklearn.linear_model import LogisticRegression 
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, classification_report, precision_score
from sklearn.tree import DecisionTreeClassifier

# statsmodels: pretty and good to use, great for interpretable ML
from statsmodels.formula.api import ols
from statsmodels.formula.api import logit
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Data processing
import pandas as pd
import numpy as np

# Plotting things:
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px


Load the data sets. Sources:
https://www.prisonpolicy.org/data/prison_pops_2019_2023_sources.html 
https://en.wikipedia.org/wiki/2022_United_States_gubernatorial_elections#cite_note-11 

In [2]:
ele_22 = pd.read_csv('2022_US_Gubernatorial_Elections.csv')
ppop = pd.read_csv('Prison_Data_2019-2023.csv')

Join the two tables on the state name, then look at the prison population and election result columns.

In [None]:
ele_22.head()


In [None]:
ppop.head()
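
Before joining, it can help to confirm that the state names line up in both tables, since an inner join silently drops any state that does not match. The sketch below is an illustration only: the two small frames are hypothetical stand-ins for `ele_22` and `ppop`, the trailing space in "Nevada " is an invented mismatch, and the simplified `Result` strings are placeholders.

```python
import pandas as pd

# Toy stand-ins for ele_22 and ppop (hypothetical rows, not the real files)
elections = pd.DataFrame({"State": ["Arizona", "Kansas", "Nevada "],
                          "Result": ["D (flip)", "D", "R (flip)"]})
prisons = pd.DataFrame({"State": ["Arizona", "Kansas", "Nevada"],
                        "2019": ["42,441", "10,177", "12,840"]})

# An outer merge with indicator=True labels each row by where its key was found
check = pd.merge(elections, prisons, on="State", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["State", "_merge"]]
print(unmatched)

# Stripping whitespace from the key resolves this particular mismatch
elections["State"] = elections["State"].str.strip()
prisons["State"] = prisons["State"].str.strip()

# validate raises an error if a state appears more than once on either side
merged = pd.merge(elections, prisons, on="State", how="inner", validate="one_to_one")
print(merged)
```

If `unmatched` is empty on the real data, the inner join keeps every state that appears in both files.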

In [3]:
# Performing an inner join on the 'State' column
r22 = pd.merge(ele_22, ppop, on='State', how='inner')

# Display the result
print(r22)


             State PVI[3]                   Incumbent[4] Last\nrace  \
0          Alabama   R+15                       Kay Ivey    59.5% R   
1           Alaska    R+8                  Mike Dunleavy    51.4% R   
2          Arizona    R+2      Doug Ducey (term-limited)    56.0% R   
3         Arkansas   R+16  Asa Hutchinson (term-limited)    65.3% R   
4       California   D+13                   Gavin Newsom    61.9% D   
5         Colorado    D+4                    Jared Polis    53.4% D   
6      Connecticut    D+7                     Ned Lamont    49.4% D   
7          Florida    R+3                   Ron DeSantis    49.6% R   
8          Georgia    R+3                     Brian Kemp    50.2% R   
9           Hawaii   D+14       David Ige (term-limited)    62.7% D   
10           Idaho   R+18                    Brad Little    59.8% R   
11        Illinois    D+7                 J. B. Pritzker    54.5% D   
12            Iowa    R+6                   Kim Reynolds    50.3% R   
13    

In [None]:
r22.head()

Looking at this data, we will drop most of the states and focus on those that RCP (RealClearPolitics, a right-wing media outlet) rated as toss-ups. Relying on right-wing media will favor states that are historically considered blue but have a chance to turn red.
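
One way to select toss-up states from the RCP ratings is sketched below. It assumes the RCP column header begins with "RCP" and contains real newline characters, as the printed headers suggest. The small frame is a stand-in for `r22` built from the ratings visible in the notebook output.

```python
import pandas as pd

# Small stand-in for r22, using RCP ratings visible in the notebook output
ratings = pd.DataFrame({
    "State": ["Arizona", "Kansas", "Maryland", "Massachusetts", "Nevada"],
    "RCP\nNov 2,\n2022[9]": ["Tossup", "Tossup", "Safe D (flip)",
                             "Safe D (flip)", "Tossup"],
})

# Find the RCP column by prefix so the date and footnote marker do not matter
rcp_col = next(c for c in ratings.columns if c.startswith("RCP"))

# Keep only the states RCP rated as toss-ups
tossups = ratings[ratings[rcp_col].str.strip() == "Tossup"]
print(tossups["State"].tolist())
```

This filter is narrower than the "(flip)" filter in the next cell, which also keeps states such as Maryland and Massachusetts that were rated safe but changed party.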

In [None]:
r22.info()

In [5]:
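# Keep rows where any column contains the literal text "(flip)"; na=False treats missing values as no match
df = r22[r22.apply(lambda r: r.str.contains(r'\(flip\)', na=False).any(), axis=1)]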
df.head()

Unnamed: 0,State,PVI[3],Incumbent[4],Last\nrace,"Cook\nOct 28,\n2022[5]","IE\nNov 3,\n2022[6]","Sabato\nNov 7,\n2022[7]","Politico\nNov 3,\n2022[8]","RCP\nNov 2,\n2022[9]","Fox\nNov 1,\n2022[10]",...,"ED\nNov 7,\n2022[12]",Result,2019,2020,2021,2022,2023,2023 Population source and notes,Source URL,Unnamed: 8
2,Arizona,R+2,Doug Ducey (term-limited),56.0% R,Tossup,Tossup,Lean R,Tossup,Tossup,Tossup,...,Lean R,Hobbs\n50.3% D (flip),42441,37794,33914,33865,34502,"Dec. 14, 2023 Count Sheet (Grand total)",https://corrections.az.gov/sites/default/files...,
13,Kansas,R+10,Laura Kelly,48.0% D,Tossup,Tossup,Lean R (flip),Tossup,Tossup,Tossup,...,Lean D,Kelly\n49.5% D,10177,8779,8521,8709,9005,"KS DOC Current populaton totals as of Dec. 13,...",https://www.doc.ks.gov/current_population_totals,
15,Maryland,D+14,Larry Hogan (term-limited),55.4% R,Solid D (flip),Likely D (flip),Safe D (flip),Solid D (flip),Safe D (flip),Solid D (flip),...,Safe D (flip),Moore\n64.5% D (flip),18595,15623,15134,15637,None located,,,
16,Massachusetts,D+15,Charlie Baker (retiring),66.6% R,Solid D (flip),Likely D (flip),Safe D (flip),Solid D (flip),Safe D (flip),Solid D (flip),...,Safe D (flip),Healey\n63.8% D (flip),8205,6762,6148,6001,6144,"MA DOC Weekly Inmate Count as of Dec. 4, 2023 ...",https://www.mass.gov/doc/weekly-inmate-count-1...,
20,Nevada,R+1,Steve Sisolak,49.4% D,Tossup,Tossup,Lean R (flip),Tossup,Tossup,Tossup,...,Lean R (flip),Lombardo\n48.8% R (flip),12840,11249,10202,10304,10535,NV DOC Monthly Statistical Abstracts Factsheet...,https://doc.nv.gov/uploadedFiles/docnvgov/cont...,


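If this worked, the first five rows are Arizona, Kansas, Maryland, Massachusetts, and Nevada. `head()` shows only five rows; the full subset printed further down also includes Wisconsin.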

In [6]:
# Each row is passed to describe as a Series, so use Series.describe
df.apply(pd.Series.describe, axis=1)

Unnamed: 0,count,unique,top,freq
2,20,14,Tossup,5
13,20,15,Tossup,5
15,18,13,Solid D (flip),4
16,20,15,Solid D (flip),4
20,20,14,Tossup,5
34,20,14,Tossup,6


In [7]:
# Define the columns you want to keep
columns_to_keep = ['State', 'Result', '2019', '2020', '2021', '2022', '2023']

# Subset to the desired columns and set 'State' as the index.
# Returning a new DataFrame avoids the SettingWithCopyWarning that an
# in-place change on a slice of df can trigger.
df_subset = df[columns_to_keep].set_index('State')

# Display the resulting DataFrame
print(df_subset.head())


                                 Result    2019    2020    2021    2022  \
State                                                                     
Arizona           Hobbs\n50.3% D (flip)  42,441  37,794  33,914  33,865   
Kansas                   Kelly\n49.5% D  10,177   8,779   8,521   8,709   
Maryland          Moore\n64.5% D (flip)  18,595  15,623  15,134  15,637   
Massachusetts    Healey\n63.8% D (flip)   8,205   6,762   6,148   6,001   
Nevada         Lombardo\n48.8% R (flip)  12,840  11,249  10,202  10,304   

                       2023  
State                        
Arizona              34,502  
Kansas                9,005  
Maryland       None located  
Massachusetts         6,144  
Nevada               10,535  


In [15]:
pd.set_option('display.max_columns', None)
print(df_subset)

                                 Result    2019    2020    2021    2022  \
State                                                                     
Arizona           Hobbs\n50.3% D (flip)  42,441  37,794  33,914  33,865   
Kansas                   Kelly\n49.5% D  10,177   8,779   8,521   8,709   
Maryland          Moore\n64.5% D (flip)  18,595  15,623  15,134  15,637   
Massachusetts    Healey\n63.8% D (flip)   8,205   6,762   6,148   6,001   
Nevada         Lombardo\n48.8% R (flip)  12,840  11,249  10,202  10,304   
Wisconsin                Evers\n51.2% D  23,956  20,298  20,202  20,873   

                       2023  
State                        
Arizona              34,502  
Kansas                9,005  
Maryland       None located  
Massachusetts         6,144  
Nevada               10,535  
Wisconsin            21,923  
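
To compare Democratic and Republican outcomes, the `Result` strings need to be split into separate fields. The following is a minimal sketch, assuming the winner's name and vote share are separated by a real newline (as the `\n` in the printed output suggests) and that the party is a single letter, D or R. The column names `winner`, `pct`, `party`, and `flipped` are introduced here for illustration.

```python
import pandas as pd

# Result strings copied from the subset printed above
results = pd.Series({
    "Arizona": "Hobbs\n50.3% D (flip)",
    "Kansas": "Kelly\n49.5% D",
    "Maryland": "Moore\n64.5% D (flip)",
    "Massachusetts": "Healey\n63.8% D (flip)",
    "Nevada": "Lombardo\n48.8% R (flip)",
    "Wisconsin": "Evers\n51.2% D",
}, name="Result")

# Named groups become the column names of the extracted DataFrame
pattern = r"(?P<winner>.+?)\s+(?P<pct>[\d.]+)% (?P<party>[DR])"
parsed = results.str.extract(pattern)
parsed["pct"] = parsed["pct"].astype(float)

# regex=False matches the parentheses literally
parsed["flipped"] = results.str.contains("(flip)", regex=False)
print(parsed)
```

With `party` and `flipped` as their own columns, the states can be grouped by winning party or by whether control changed.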


In [18]:
print(df_subset.columns)

Index(['Result', '2019', '2020', '2021', '2022', '2023'], dtype='object')
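
The year columns are stored as text ("42,441", "None located"), so they need converting before any arithmetic. The sketch below uses two rows copied from the output above. Treating "None located" as a missing value is an assumption about how that entry should be handled.

```python
import pandas as pd

year_cols = ["2019", "2020", "2021", "2022", "2023"]

# Two rows copied from the printed subset
pops = pd.DataFrame(
    [["42,441", "37,794", "33,914", "33,865", "34,502"],
     ["18,595", "15,623", "15,134", "15,637", "None located"]],
    index=["Arizona", "Maryland"], columns=year_cols)

# Remove thousands separators, then coerce anything non-numeric to NaN
numeric = pops.apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False),
                              errors="coerce"))
print(numeric)
```

Because `errors="coerce"` turns every unparseable entry into NaN, it is worth checking afterwards which cells became missing.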


In [20]:
# Work on a copy so later changes to dfs do not alter df_subset
dfs = df_subset.copy()
dfs.head()

State,Result,2019,2020,2021,2022,2023
Arizona,Hobbs\n50.3% D (flip),42441,37794,33914,33865,34502
Kansas,Kelly\n49.5% D,10177,8779,8521,8709,9005
Maryland,Moore\n64.5% D (flip),18595,15623,15134,15637,None located
Massachusetts,Healey\n63.8% D (flip),8205,6762,6148,6001,6144
Nevada,Lombardo\n48.8% R (flip),12840,11249,10202,10304,10535
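
With numeric populations, the five-year trend can be summarized as the percent change from 2019 to 2023 and then averaged by winning party. This is a rough sketch. The populations are copied from the outputs above, the `party` column is derived by hand from the `Result` strings, and Maryland's 2023 value is left missing because no count was located.

```python
import numpy as np
import pandas as pd

# 2019 and 2023 populations from the outputs above; party is the 2022 winner's
pops = pd.DataFrame({
    "party": ["D", "D", "D", "D", "R", "D"],
    "2019": [42441, 10177, 18595, 8205, 12840, 23956],
    "2023": [34502, 9005, np.nan, 6144, 10535, 21923],
}, index=["Arizona", "Kansas", "Maryland", "Massachusetts",
          "Nevada", "Wisconsin"])

# Percent change over the period; NaN propagates for Maryland
pops["pct_change"] = (pops["2023"] - pops["2019"]) / pops["2019"] * 100

# Mean change per party; count shows how many states back each mean
summary = pops.groupby("party")["pct_change"].agg(["mean", "count"])
print(pops["pct_change"].round(1))
print(summary.round(1))
```

Only one state in this subset elected a Republican, so the Republican mean rests on a single observation and should be read with caution.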
