# US Prison Analysis

This is an analysis of prison admissions across the United States. The data are broken down by state and by the racial background of the people admitted. The dataset also records releases, but those columns are dropped during cleaning and only admissions are analyzed. The data come from the Kaggle dataset shared by Konrad Banachewicz (2023), which can be found here: https://www.kaggle.com/datasets/konradb/prison-population-in-the-us

## Set-up

In [92]:
import pandas as pd
import numpy as np
import matplotlib as plotly  # note: this is matplotlib under the alias "plotly", not the Plotly library
import datetime as dt

In [93]:
## read_csv already returns a DataFrame, so no further conversion is needed.
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/US Prison Data/prison_data_revised.csv")

## Data Cleaning

In [94]:
## Correcting data types.

df["date"] = pd.to_datetime(df["date"])

In [95]:
## add columns for the month name and the year (both stored as strings).

df["month"] = df["date"].dt.strftime("%B")
df["year"] = df["date"].dt.strftime("%Y")
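
A minimal sketch of an alternative, run on a small hypothetical frame rather than the real dataset: `dt.year` gives integer years directly, and an ordered categorical keeps month names in calendar order instead of alphabetical order when sorting. The sample dates and the names `frame` and `month_order` are illustrative assumptions. The notebook itself stores `year` as a string and later compares it with `"2022"`, so adopting integer years would require changing that comparison too.

```python
import pandas as pd

# Hypothetical sample dates; the real values come from the "date" column.
frame = pd.DataFrame(
    {"date": pd.to_datetime(["2022-03-01", "2022-01-01", "2021-12-01"])}
)

# Integer year and full month name via the .dt accessor.
frame["year"] = frame["date"].dt.year
frame["month"] = frame["date"].dt.month_name()

# Calendar-ordered month names, used as categories so sorting follows the calendar.
month_order = pd.date_range("2000-01-01", periods=12, freq="MS").month_name()
frame["month"] = pd.Categorical(frame["month"], categories=month_order, ordered=True)

print(frame.sort_values("month"))
```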

In [96]:
## correcting data types for the rates: strip the "%" sign and convert to float.

admissions_rate = [name for name in df.columns if "rate" in name and "admissions" in name]

for x in admissions_rate:
  df[x] = df[x].str.replace("%", "", regex=False).astype("float")
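
If some rate cells are missing or malformed, a plain `astype("float")` would raise an error. The sketch below shows a more tolerant conversion with `pd.to_numeric(..., errors="coerce")`, which turns unparseable values into `NaN`. The sample frame and the helper name `percent_to_float` are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample; the real columns come from prison_data_revised.csv.
sample = pd.DataFrame({
    "admissions_white_rate": ["62.5%", "48.0%", None],
    "admissions_black_rate": ["20.1%", "n/a", "33.3%"],
})

rate_cols = [c for c in sample.columns if "admissions" in c and "rate" in c]


def percent_to_float(col: pd.Series) -> pd.Series:
    """Strip a trailing '%' and convert to float; unparseable values become NaN."""
    cleaned = col.str.strip().str.rstrip("%")
    return pd.to_numeric(cleaned, errors="coerce")


sample[rate_cols] = sample[rate_cols].apply(percent_to_float)
print(sample)
```

Counting the `NaN` values afterwards (for example with `sample[rate_cols].isna().sum()`) is a quick way to see how many cells could not be parsed.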

In [97]:
## creating the list of admissions-related columns (counts only, excluding the rate columns)

admissions_column = [column for column in df.columns if "admissions" in column and "rate" not in column]
admissions_column

['total_admissions',
 'admissions_white',
 'admissions_black',
 'admissions_hispanic',
 'admissions_amerind',
 'admissions_asian',
 'admissions_other']

In [98]:
## dropping columns regarding releases.

df_columns = df.columns

releases_columns = [column for column in df_columns if "releases" in column]
df = df.drop(columns=releases_columns)

## Data Analysis

In [99]:
df.describe()[["admissions_white","admissions_black","admissions_hispanic","admissions_amerind","admissions_asian","admissions_other"]]

statistic,admissions_white,admissions_black,admissions_hispanic,admissions_amerind,admissions_asian,admissions_other
count,1914.0,1914.0,1914.0,1914.0,1914.0,1914.0
mean,452.177116,192.448276,144.72675,19.3093,3.864159,9.731975
std,413.67884,268.8421,339.850524,24.6161,4.92965,21.344895
min,1.0,0.0,0.0,0.0,0.0,0.0
25%,217.0,36.0,0.0,1.0,0.0,0.0
50%,368.5,109.0,26.0,13.0,3.0,2.0
75%,458.0,264.0,96.0,29.0,6.0,13.0
max,2707.0,1918.0,2208.0,159.0,40.0,217.0


In [100]:
## Finding the total admissions for each racial background.
print("Total admissions per racial background.")

df[["admissions_white","admissions_black","admissions_hispanic","admissions_amerind","admissions_asian","admissions_other"]].sum().to_frame("total_admissions")

Total admissions per racial background.


column,total_admissions
admissions_white,865467
admissions_black,368346
admissions_hispanic,277007
admissions_amerind,36958
admissions_asian,7396
admissions_other,18627


In [101]:
## total admissions per year.

print("Total Admissions Per Year")

df.groupby("year")["total_admissions"].sum().to_frame()

Total Admissions Per Year


year,total_admissions
2000,8352
2001,8376
2002,8844
2003,8736
2004,9552
2005,9696
2006,10524
2007,10332
2008,25823
2009,24900


In [102]:
## total admissions per racial background per year.

print("Total admissions per racial background per year")

df.groupby("year").agg({
    "total_admissions":"sum",
    "admissions_white":"sum",
    "admissions_black":"sum",
    "admissions_hispanic":"sum",
    "admissions_amerind":"sum",
    "admissions_asian":"sum",
    "admissions_other":"sum"
})

Total admissions per racial background per year


year,total_admissions,admissions_white,admissions_black,admissions_hispanic,admissions_amerind,admissions_asian,admissions_other
2000,8352,3972,4008,0,312,48,0
2001,8376,4044,3948,0,312,48,0
2002,8844,4296,4164,0,312,48,0
2003,8736,4212,4128,0,288,60,12
2004,9552,4740,4368,0,348,60,12
2005,9696,4968,4272,0,372,60,24
2006,10524,5328,4656,0,456,60,12
2007,10332,5256,4548,0,420,72,0
2008,25823,16629,8303,205,450,78,134
2009,24900,16703,7386,167,449,63,108
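
When a frame also holds text columns such as `state` or `month`, aggregating every column and selecting the ones of interest afterwards may emit warnings or raise errors in some pandas versions. A minimal sketch of the safer pattern, selecting the numeric columns before aggregating, is shown below on a hypothetical toy frame (the values and the name `toy` are made up for illustration).

```python
import pandas as pd

# Hypothetical toy data with a text column alongside the counts.
toy = pd.DataFrame({
    "year": ["2021", "2021", "2022", "2022"],
    "state": ["A", "B", "A", "B"],
    "total_admissions": [10, 20, 30, 40],
    "admissions_white": [6, 8, 15, 25],
})

# Pick the count columns first, then aggregate only those.
count_cols = [c for c in toy.columns if "admissions" in c and "rate" not in c]
yearly = toy.groupby("year")[count_cols].sum()

print(yearly)
```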


In [103]:
## create a separate dataframe for data in 2022

df_2022 = df.loc[df["year"] == "2022"]

In [104]:
## Total admissions per state in 2022

print("Total admissions per state in 2022")

df_2022.groupby("state")["total_admissions"].sum().to_frame().sort_values("total_admissions",ascending=False)

Total admissions per state in 2022


state,total_admissions
Texas,26062
California,16689
Arizona,9713
Kentucky,9199
Illinois,7542
Wisconsin,4656
Idaho,4529
Colorado,4384
Kansas,3407
Washington,3270


In [105]:
## Total admissions per state in 2022 per racial background.

print("Total admissions per state in 2022 per racial background")

df_2022.groupby("state")[["total_admissions","admissions_white","admissions_black","admissions_hispanic","admissions_amerind","admissions_asian","admissions_other"]].sum().sort_values("total_admissions",ascending=False)

Total admissions per state in 2022 per racial background


state,total_admissions,admissions_white,admissions_black,admissions_hispanic,admissions_amerind,admissions_asian,admissions_other
Texas,26062,10156,7000,8771,0,0,135
California,16689,4206,3633,7946,0,0,904
Arizona,9713,3526,1348,3828,781,44,186
Kentucky,9199,7535,1420,104,8,13,119
Illinois,7542,2811,3954,738,6,9,24
Wisconsin,4656,2538,1669,0,392,55,2
Idaho,4529,3229,116,599,164,0,156
Colorado,4384,2212,579,1349,195,28,16
Kansas,3407,2384,814,0,174,33,2
Washington,3270,1901,470,551,205,113,30


In [106]:
## average admissions per record (row) for each state in 2022, by racial background.

print("Average admissions per record for each state in 2022, by racial background")

df_2022.groupby("state")[["total_admissions","admissions_white","admissions_black","admissions_hispanic","admissions_amerind","admissions_asian","admissions_other"]].mean().sort_values("total_admissions",ascending=False)

Average admissions per record for each state in 2022, by racial background


state,total_admissions,admissions_white,admissions_black,admissions_hispanic,admissions_amerind,admissions_asian,admissions_other
Texas,4343.666667,1692.666667,1166.666667,1461.833333,0.0,0.0,22.5
California,2384.142857,600.857143,519.0,1135.142857,0.0,0.0,129.142857
Kentucky,1314.142857,1076.428571,202.857143,14.857143,1.142857,1.857143,17.0
Illinois,1257.0,468.5,659.0,123.0,1.0,1.5,4.0
Arizona,1079.222222,391.777778,149.777778,425.333333,86.777778,4.888889,20.666667
Wisconsin,582.0,317.25,208.625,0.0,49.0,6.875,0.25
Idaho,566.125,403.625,14.5,74.875,20.5,0.0,19.5
Colorado,548.0,276.5,72.375,168.625,24.375,3.5,2.0
Washington,408.75,237.625,58.75,68.875,25.625,14.125,3.75
Kansas,378.555556,264.888889,90.444444,0.0,19.333333,3.666667,0.222222


In [107]:
## creating a table of total admissions and share of the total per racial background in 2022.


## finding the total admissions across all 2022 records.
total_admissions = df_2022["total_admissions"].sum()

## per-group totals, in the same order as the labels below.
race_columns = ["admissions_white","admissions_black","admissions_hispanic","admissions_amerind","admissions_asian","admissions_other"]
race_totals = df_2022[race_columns].sum()

## creating the dataframe.
percent_table = pd.DataFrame({
    "racial_background":["White","Black","Hispanic","American-Indian","Asian","Others"],
    "total_admissions":race_totals.to_numpy(),
    "total_percent":(race_totals / total_admissions * 100).round(2).to_numpy(),
})


## reformatting the total_percent column as a percentage string.
percent_table["total_percent"] = percent_table["total_percent"].astype(str) + "%"


## sort values from greatest admissions.
percent_table = percent_table.sort_values("total_admissions",ascending=False)
display(percent_table)

index,racial_background,total_admissions,total_percent
0,White,48807,48.07%
2,Hispanic,25004,24.63%
1,Black,22711,22.37%
3,American-Indian,2474,2.44%
5,Others,1688,1.66%
4,Asian,452,0.45%
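
The percentages in the table above add up to slightly less than 100%. The sketch below reproduces the shares from the figures printed in this notebook and shows the gap: the six group totals sum to fewer admissions than the 2022 `total_admissions` figure (101,535, as reported in the five-year table in this notebook). Why some admissions fall outside the six groups is not stated in the source, and the variable names here are illustrative.

```python
import pandas as pd

# 2022 group totals copied from the table above.
by_group = pd.Series(
    {
        "White": 48807,
        "Hispanic": 25004,
        "Black": 22711,
        "American-Indian": 2474,
        "Others": 1688,
        "Asian": 452,
    },
    name="total_admissions",
)
reported_total = 101535  # sum of total_admissions over the 2022 records

# Share of each group relative to the reported total, in percent.
share = (by_group / reported_total * 100).round(2)

# Admissions counted in the total but not in any of the six groups.
unclassified = reported_total - by_group.sum()

table = pd.DataFrame({"total_admissions": by_group, "total_percent": share})
print(table.sort_values("total_admissions", ascending=False))
print("Not attributed to any group:", unclassified)
```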


In [108]:
## average percentage make-up per racial background for each state in 2022 (mean of the per-record rates).

print("Average percentage make-up per racial background for each state in 2022")

df_2022.groupby("state").agg({
    "total_admissions":"sum",
    "admissions_white_rate":"mean",
    "admissions_black_rate":"mean",
    "admissions_hispanic_rate":"mean",
    "admissions_amerind_rate":"mean",
    "admissions_asian_rate":"mean",
    "admissions_other_rate":"mean",
}).sort_values("total_admissions",ascending=False)

Average percentage make-up per racial background for each state in 2022


state,total_admissions,admissions_white_rate,admissions_black_rate,admissions_hispanic_rate,admissions_amerind_rate,admissions_asian_rate,admissions_other_rate
Texas,26062,39.061667,26.833333,33.595,0.0,0.0,0.513333
California,16689,25.31,21.832857,47.358571,0.0,0.0,5.495714
Arizona,9713,36.335556,13.783333,39.411111,8.084444,0.458889,1.927778
Kentucky,9199,81.845714,15.5,1.117143,0.085714,0.138571,1.31
Illinois,7542,37.68,52.07,9.715,0.08,0.115,0.335
Wisconsin,4656,54.525,35.85625,0.0,8.39875,1.1775,0.04125
Idaho,4529,71.45875,2.5675,13.295,3.60875,0.0,3.45
Colorado,4384,50.46,13.22625,30.71,4.49,0.63375,0.36375
Kansas,3407,69.996667,23.84,0.0,5.12,0.978889,0.063333
Washington,3270,58.25375,14.35375,16.7175,6.3275,3.42,0.92875
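
Averaging the `*_rate` columns weights every record equally, regardless of how many admissions that record covers. A pooled share (group admissions divided by total admissions) can therefore differ from the mean of the rates. The sketch below illustrates the difference on hypothetical toy numbers, not on the real data.

```python
import pandas as pd

# Hypothetical toy records: state X has one small and one large record.
toy = pd.DataFrame({
    "state": ["X", "X", "Y", "Y"],
    "total_admissions": [100, 900, 50, 50],
    "admissions_black": [50, 90, 10, 20],
})
toy["admissions_black_rate"] = toy["admissions_black"] / toy["total_admissions"] * 100

# Unweighted: mean of the per-record rates.
mean_of_rates = toy.groupby("state")["admissions_black_rate"].mean()

# Pooled: total group admissions over total admissions.
sums = toy.groupby("state")[["admissions_black", "total_admissions"]].sum()
pooled_share = sums["admissions_black"] / sums["total_admissions"] * 100

print(pd.DataFrame({"mean_of_rates": mean_of_rates, "pooled_share": pooled_share}))
```

In this toy case the two measures agree for state Y, where both records are the same size, and diverge for state X, where the small record pulls the unweighted mean upward.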


## Specific Data Analytics

In [109]:
## For every 10 people admitted, roughly how many come from each racial background?
## Note: this uses the unweighted mean of the per-record rates, and rounding means the counts need not sum to 10.

tempdf = df[admissions_rate].mean().round(2).to_frame("admission_rate")
tempdf["per_10_people"] = (tempdf["admission_rate"] / 10).round().astype(int)
display(tempdf)

rate_column,admission_rate,per_10_people
admissions_white_rate,62.29,6
admissions_black_rate,20.1,2
admissions_hispanic_rate,11.35,1
admissions_amerind_rate,4.06,0
admissions_asian_rate,0.72,0
admissions_other_rate,1.17,0
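
The `per_10_people` column above sums to 9 rather than 10 because each value is rounded independently. One common remedy is the largest-remainder method: take the floor of each quota, then hand the leftover units to the entries with the largest fractional parts. This is a sketch of that swapped-in technique using the rates printed above, not the notebook's own method; the function name `largest_remainder` is an assumption.

```python
import numpy as np
import pandas as pd

# Mean admission rates copied from the table above.
rates = pd.Series({
    "admissions_white_rate": 62.29,
    "admissions_black_rate": 20.1,
    "admissions_hispanic_rate": 11.35,
    "admissions_amerind_rate": 4.06,
    "admissions_asian_rate": 0.72,
    "admissions_other_rate": 1.17,
})


def largest_remainder(shares: pd.Series, total: int = 10) -> pd.Series:
    """Allocate `total` whole units in proportion to `shares`."""
    quotas = shares / shares.sum() * total
    counts = np.floor(quotas).astype(int)
    leftover = total - int(counts.sum())
    # Give the remaining units to the largest fractional parts.
    top_up = (quotas - counts).sort_values(ascending=False).index[:leftover]
    counts.loc[top_up] += 1
    return counts


naive = (rates / 10).round().astype(int)  # independent rounding, as in the cell above
per_10 = largest_remainder(rates)

print(pd.DataFrame({"naive": naive, "largest_remainder": per_10}))
```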


In [110]:
## total admissions for the last 5 years.

print("Total admissions in 2022 were roughly half of their 2018 level, but the decline was not steady: admissions rose in 2019 and 2021.")

df["year"] = df["year"].astype(int)
tempdf = df.loc[df["year"]>=2018]
tempdf.groupby("year")[admissions_column].sum()

Total admissions in 2022 were roughly half of their 2018 level, but the decline was not steady: admissions rose in 2019 and 2021.


year,total_admissions,admissions_white,admissions_black,admissions_hispanic,admissions_amerind,admissions_asian,admissions_other
2018,199161,106382,46955,38368,4429,994,1657
2019,223463,110273,51090,52734,4453,945,3797
2020,123174,65537,25988,27255,3063,604,1879
2021,162059,77986,35621,40705,3268,654,2954
2022,101535,48807,22711,25004,2474,452,1688
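
To make the year-to-year movement explicit, the totals in the table above can be turned into percentage changes with `Series.pct_change`. The sketch below uses the `total_admissions` figures copied from that table; the variable names are illustrative.

```python
import pandas as pd

# Yearly total_admissions copied from the table above.
yearly_total = pd.Series(
    [199161, 223463, 123174, 162059, 101535],
    index=pd.Index([2018, 2019, 2020, 2021, 2022], name="year"),
    name="total_admissions",
)

# Percentage change relative to the previous year (NaN for the first year).
change = (yearly_total.pct_change() * 100).round(2)

summary = pd.DataFrame({"total_admissions": yearly_total, "pct_change": change})
print(summary)
```

The sign of `pct_change` alternates across these years, so the overall drop between 2018 and 2022 is not a steady decline.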


In [111]:
## average admission rates for the last 5 years.

print("Over the past five years, the average racial make-up of yearly admissions remained broadly stable: each group's share moved by only a few percentage points.")

df["year"] = df["year"].astype(int)
tempdf = df.loc[df["year"]>=2018]
tempdf.groupby("year")[admissions_rate].mean()

Over the past five years, the average racial make-up of yearly admissions remained broadly stable: each group's share moved by only a few percentage points.


year,admissions_white_rate,admissions_black_rate,admissions_hispanic_rate,admissions_amerind_rate,admissions_asian_rate,admissions_other_rate
2018,63.373186,18.214118,11.737941,4.341078,0.841814,1.025245
2019,60.909583,18.086991,13.712083,4.143657,0.797593,1.554398
2020,62.177037,17.776991,14.264352,4.592454,0.774352,1.833333
2021,59.947593,17.975046,14.445046,4.25625,0.753657,1.606389
2022,60.108077,17.676846,14.018154,5.264923,0.894923,1.217385


In [112]:
## standard deviation of the admission rates for the last 5 years.

print("This is supported by the standard deviations: over the past five years, each group's standard deviation mostly stayed within about two points of the previous year's; the largest jump was in the white rate in 2020.")

df["year"] = df["year"].astype(int)
tempdf = df.loc[df["year"]>=2018]
tempdf.groupby("year")[admissions_rate].std()

This is supported by the standard deviations: over the past five years, each group's standard deviation mostly stayed within about two points of the previous year's; the largest jump was in the white rate in 2020.


year,admissions_white_rate,admissions_black_rate,admissions_hispanic_rate,admissions_amerind_rate,admissions_asian_rate,admissions_other_rate
2018,15.38493,12.548433,11.739459,6.239734,1.064653,1.437803
2019,16.975245,12.005279,13.964297,6.37946,1.019722,2.030352
2020,21.40585,12.663421,14.082813,7.63611,1.185857,3.25909
2021,17.384949,11.707473,14.438939,6.308286,1.050448,2.093715
2022,17.13304,11.593389,14.797412,6.963408,1.172176,1.595969


In [113]:
## In which years was the admission rate for Black people highest?

print("The share of Black people in total admissions per year was significantly lower from 2015 to the present.")
df.groupby("year")["admissions_black_rate"].mean().to_frame().sort_values("admissions_black_rate",ascending=False)

The share of Black people in total admissions per year was significantly lower from 2015 to the present.


year,admissions_black_rate
2000,47.99
2003,47.25
2001,47.13
2002,47.08
2004,45.73
2006,44.24
2005,44.06
2007,44.02
2008,33.897917
2009,31.56125


In [114]:
## In which years was the admission rate for Asian people highest?

print("This trend is not seen in Asian admissions: their share has risen in recent years, although it remains below 1% and close to negligible.")
df.groupby("year")["admissions_asian_rate"].mean().to_frame().sort_values("admissions_asian_rate",ascending=False)

This trend is not seen in Asian admissions: their share has risen in recent years, although it remains below 1% and close to negligible.


year,admissions_asian_rate
2022,0.894923
2016,0.879235
2018,0.841814
2019,0.797593
2017,0.777304
2020,0.774352
2021,0.753657
2007,0.7
2003,0.69
2004,0.63


In [115]:
## average share of admissions of Black individuals for each state in 2022.

print("In 2022, more than half of Illinois admissions were of Black individuals.")

df_2022.groupby("state").agg({
    "total_admissions":"sum",
    "admissions_black_rate":"mean",
}).sort_values("admissions_black_rate",ascending=False)

In 2022, more than half of Illinois admissions were of Black individuals.


state,total_admissions,admissions_black_rate
Illinois,7542,52.07
Wisconsin,4656,35.85625
Texas,26062,26.833333
Nebraska,1708,23.85
Kansas,3407,23.84
Iowa,2497,22.7325
California,16689,21.832857
Kentucky,9199,15.5
Washington,3270,14.35375
Arizona,9713,13.783333


In [116]:
## average share of admissions of Hispanic individuals for each state in 2022.

print("In 2022, close to half of California's admissions were of Hispanic individuals, the highest share of any state, which may reflect the state's large Hispanic population.")

df_2022.groupby("state").agg({
    "total_admissions":"sum",
    "admissions_hispanic_rate":"mean",
}).sort_values("admissions_hispanic_rate",ascending=False)

In 2022, close to half of California's admissions were of Hispanic individuals, the highest share of any state, which may reflect the state's large Hispanic population.


state,total_admissions,admissions_hispanic_rate
California,16689,47.358571
Arizona,9713,39.411111
Texas,26062,33.595
Colorado,4384,30.71
Utah,2053,19.351429
Washington,3270,16.7175
Idaho,4529,13.295
Nebraska,1708,13.0225
Oregon,2947,10.17875
Illinois,7542,9.715


In [117]:
## average share of admissions of Asian individuals for each state in 2022.

print("In 2022, Utah had the largest share of admissions of Asian individuals (3.7%). However, this is still far lower than the average admission rate for Black individuals that year (about 17.7%).")

df_2022.groupby("state").agg({
    "total_admissions":"sum",
    "admissions_asian_rate":"mean",
}).sort_values("admissions_asian_rate",ascending=False)

In 2022, Utah had the largest share of admissions of Asian individuals (3.7%). However, this is still far lower than the average admission rate for Black individuals that year (about 17.7%).


state,total_admissions,admissions_asian_rate
Utah,2053,3.7
Washington,3270,3.42
Oregon,2947,1.36625
Wisconsin,4656,1.1775
Kansas,3407,0.978889
Maine,670,0.944444
Nebraska,1708,0.80625
Iowa,2497,0.69875
Colorado,4384,0.63375
Arizona,9713,0.458889


## Data Visualization