# Identifying Prison Conditions, by State

### From the UCLA COVID-19 Behind Bars dataset, identifying and cleaning the policies prisons have put in place to align with social distancing recommendations and to support prisoners' mental health during the resulting isolation.
### From the same dataset, identifying the pre-COVID-19 incarcerated population.
### From the Bureau of Justice Statistics, identifying prison capacity and incarcerated population (both as of 2018).
### From the Marshall Project, identifying COVID-19 cases, tests, and deaths among staff and incarcerated populations.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
import pandas as pd
import re
import prison_conditions_wrangle as pcw

### First, extracting the prison population of each state prior to any COVID-19 releases.

In [3]:
#Read in the data
population = pcw.import_clean_data('../data/may_19/ucla_0519_COVID19_related_prison_releases.csv')

population.head(5)

Unnamed: 0,state,facility,authorizing_agent,known_capacity_\n(if_available),population_prior_to_releases,overall_pop._reduction_/_\ntotal_number_of_releases,does_the_source_report_this_reduction_as_a_result_of_releases_only_or_a_mix_of_releases/policy_changes,does_the_source_give_a_more_detailed_breakdown_on_the_releases,breakdown_of_releases:\nparole_tech_violation_,breakdown_of_releases:\nlower_level/_non-violent_crimes,breakdown_of_releases:_\nshort_time_left_on_sentence,breakdown_of_releases:\nvulnerable_populations,breakdown_of_releases:_\nother_(please_explain),date,legal_filing,source(s),"additional_notes_(explaining_""other""_column,_defining_vulnerable_populations_in_this_instance,_etc.)",unnamed:_17,unnamed:_18,unnamed:_19
0,,,,,623877.0,24356,,,,,,,,NaT,,,,,,
1,Alabama,Statewide,Governor,,,Unknown,Releases only,Yes - See columns I-M,x,,,,x,2020-04-02,No,https://www.wsfa.com/2020/04/02/ivey-order-all...,County jail releases only (no prisons); focus ...,,,
2,Arkansas,Statewide,Parole Board/Gov,,,300,Releases only,No,,,,,x,2020-05-12,,https://www.nwahomepage.com/lifestyle/health/c...,First 300 released. 1234 approved by Parole Bo...,,,
3,California,Statewide,CDCR,,114000.0,3418,Releases only,Yes - See columns I-M,,x,x,,,2020-04-13,Yes,https://www.courtlistener.com/recap/gov.uscour...,This is a declaration opposing further relief ...,,,
4,Colorado,Statwide,Governor,,20000.0,3500,Mix,Yes - See columns I-M,,,x,x,,2020-04-23,No,https://www.cpr.org/2020/04/23/colorado-correc...,150 people released due to early release for m...,,,


In [4]:
population = population.dropna(subset=["state", "population_prior_to_releases"])
population = pcw.clean_str_cols(population, ["state"])
population = pcw.clean_numeric_cols(population, ["population_prior_to_releases"])

In [5]:
population.shape

(22, 20)

In [6]:
population.head()

Unnamed: 0,state,facility,authorizing_agent,known_capacity_\n(if_available),population_prior_to_releases,overall_pop._reduction_/_\ntotal_number_of_releases,does_the_source_report_this_reduction_as_a_result_of_releases_only_or_a_mix_of_releases/policy_changes,does_the_source_give_a_more_detailed_breakdown_on_the_releases,breakdown_of_releases:\nparole_tech_violation_,breakdown_of_releases:\nlower_level/_non-violent_crimes,breakdown_of_releases:_\nshort_time_left_on_sentence,breakdown_of_releases:\nvulnerable_populations,breakdown_of_releases:_\nother_(please_explain),date,legal_filing,source(s),"additional_notes_(explaining_""other""_column,_defining_vulnerable_populations_in_this_instance,_etc.)",unnamed:_17,unnamed:_18,unnamed:_19
3,california,Statewide,CDCR,,114000.0,3418,Releases only,Yes - See columns I-M,,x,x,,,2020-04-13,Yes,https://www.courtlistener.com/recap/gov.uscour...,This is a declaration opposing further relief ...,,,
4,colorado,Statwide,Governor,,20000.0,3500,Mix,Yes - See columns I-M,,,x,x,,2020-04-23,No,https://www.cpr.org/2020/04/23/colorado-correc...,150 people released due to early release for m...,,,
6,federal bop,Nationwide,AG,,174000.0,1440,Releases only,Yes - See columns I-M,,x,,x,,2020-04-22,No,https://www.politico.com/amp/news/2020/04/22/c...,Number as of 4.22.20; (As of 4/6/20: 886 had...,,,
9,hawaii,statewide,courts,,2189.0,823,Releases only,No,,,,,,2020-05-09,Yes,https://www.staradvertiser.com/2020/05/09/hawa...,This article reports a reduction of 832. Anoth...,afc,,
10,illinois,Decatur,,,,6,Releases only,Yes - See columns I-M,,,,,x,2020-03-29,,https://www.chicagotribune.com/coronavirus/ct-...,Mothers and newborns,,,


In [7]:
# Select only the necessary columns; drop any rows that lack a population
prison_pop = pcw.select_columns(population, features=["state", "population_prior_to_releases"])
prison_pop = prison_pop.dropna()

prison_pop

Unnamed: 0,state,population_prior_to_releases
3,california,114000.0
4,colorado,20000.0
6,federal bop,174000.0
9,hawaii,2189.0
11,illinois,37000.0
13,iowa,8519.0
14,kentucky,12240.0
16,maine,2240.0
17,maryland,19050.0
19,massachusetts,7697.0


The dataframe above shows each state's prison population prior to any COVID-19 releases. The UCLA dataset does not report this figure for every state, so some states are missing here.

### Second, extracting any social distancing policies in place in prisons, as well as mitigation policies intended to alleviate the effects of isolation on prisoners.

In [8]:
policies = pcw.import_clean_data('../data/may_19/ucla_0519_visitation_policy_by_state.csv')

policies.head(5)

Unnamed: 0,state,suspended_visitations,explicitly_allows_lawyer_access,compensatory_remote_access_(phone),compensatory_remote_access_(video),effective_date,length,source(s),"additional_notes_(related_activity_suspensions,_explanation_of_compensatory_access,_waivers,_etc.)"
0,Alabama,X,,x,,2020-03-13,30 days,https://www.waaytv.com/content/coronavirus-con...,"also suspended volunteer entry, medical co-pay..."
1,Alaska,X,,X,,2020-03-13,,https://doc.alaska.gov/covid-19,2 free 15-minute calls/week (effective 3/19/20...
2,Arizona,X,,X,,2020-03-13,30 days,https://corrections.az.gov/sites/default/files...,2 x 15 min. calls/wk in addition to normal pho...
3,Arkansas,X,,X,X,2020-03-16,21 days,https://adc.arkansas.gov/images/uploads/COVID_...,Price of phone calls reduced; no connect fee o...
4,California,X,X,X,,2020-03-14,,https://www.cdcr.ca.gov/covid19/,"Beginning March 27, staff and visitors enteri..."


In [9]:
policies = policies.dropna(subset=["state"])
policies = pcw.clean_str_cols(policies, ["state"])

In [10]:
preset_dummies = ["suspended_visitations", "explicitly_allows_lawyer_access", "compensatory_remote_access_(phone)", 
                "compensatory_remote_access_(video)"]
new_cols = ["no_visits", "lawyer_access", "phone_access", "video_access"]

policies = pcw.transform_dummy_cols(policies, preset_dummies, new_cols)

In [11]:
#Show the wide variety of policies in place.

for val in policies["additional_notes_(related_activity_suspensions,_explanation_of_compensatory_access,_waivers,_etc.)"].unique():
    print(val)

also suspended volunteer entry, medical co-pays; 3/18: announced compensatory free 15 min phone call once per week 
2 free 15-minute calls/week (effective 3/19/2020). 
2 x 15 min. calls/wk in addition to normal phone call policies/written letter policies. However, all legal and non-legal visitation is suspended (as of 3/18)
Price of phone calls reduced; no connect fee on telephone calls: 15 cents per minute (Div. of Correction & Div. of Community Correction facilities). Video visitation: $2.50 for 30-min visit (state prisons); 15 cents/min for video visits at community correction centers. Rates take effect 03/20/20 (until further notice). Marshall Project: "Legal visits may be granted on a case-by-case basis."
 Beginning March 27, staff and visitors entering CDCR state prisons and community correctional facilities will undergo an additional touchless temperature screening before entering the facility in addition to the ongoing verbal symptom screening put in place on March 14th. Any no

In [12]:
policies = pcw.encode_policies_str(policies, 
                      "additional_notes_(related_activity_suspensions,_explanation_of_compensatory_access,_waivers,_etc.)")

In [13]:
distance_policies = pcw.select_columns(policies)

In [14]:
distance_policies.head(5)

Unnamed: 0,state,effective_date,no_visits,lawyer_access,phone_access,video_access,no_volunteers,limiting_movement,screening,healthcare_support
0,alabama,2020-03-13,1,0,1,0,1,0,0,1
1,alaska,2020-03-13,1,0,1,0,0,0,0,0
2,arizona,2020-03-13,1,0,1,0,0,0,0,0
3,arkansas,2020-03-16,1,0,1,1,0,0,0,0
4,california,2020-03-14,1,1,1,0,0,1,1,0


The dataframe above shows the policies in place in a given state, and the effective date of those policies, based on the UCLA dataset. 
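
The `pcw` helpers are not shown in this notebook, so the following is only a rough sketch of how the 0/1 policy columns above could be derived. It assumes that an `x`/`X` marker means the policy applies and that flags such as `no_volunteers` come from keyword matches in the free-text notes. The helper names `markers_to_dummies` and `flag_keyword`, the `notes` column, and the toy rows are hypothetical and are not part of `prison_conditions_wrangle`.

```python
import pandas as pd

# Toy stand-in for the visitation policy sheet; rows are illustrative only.
toy = pd.DataFrame({
    "state": ["alabama", "alaska", "arkansas"],
    "suspended_visitations": ["X", "X", None],
    "compensatory_remote_access_(phone)": ["x", None, "X "],
    "notes": [
        "also suspended volunteer entry, medical co-pays",
        "2 free 15-minute calls/week",
        None,
    ],
})


def markers_to_dummies(frame, marker_cols, new_cols):
    """Hypothetical helper: a cell holding x or X (ignoring whitespace) becomes 1, anything else 0."""
    out = frame.copy()
    for old, new in zip(marker_cols, new_cols):
        marked = out[old].fillna("").astype(str).str.strip().str.lower().eq("x")
        out[new] = marked.astype(int)
    return out.drop(columns=marker_cols)


def flag_keyword(frame, text_col, pattern, new_col):
    """Hypothetical helper: 1 where the free-text notes match a regex, else 0."""
    out = frame.copy()
    hits = out[text_col].fillna("").str.contains(pattern, case=False, regex=True)
    out[new_col] = hits.astype(int)
    return out


toy = markers_to_dummies(
    toy,
    ["suspended_visitations", "compensatory_remote_access_(phone)"],
    ["no_visits", "phone_access"],
)
toy = flag_keyword(toy, "notes", r"volunteer", "no_volunteers")
print(toy[["state", "no_visits", "phone_access", "no_volunteers"]])
```

Keyword matching on free text is brittle (a note saying volunteers are still allowed would also match), so flags produced this way are worth spot-checking against the printed notes.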

### Third, extracting prison capacity and population (as of the end of 2018) from the Bureau of Justice Statistics

Note on the cell below: reading this CSV raised encoding errors. This [Stack Overflow post](https://stackoverflow.com/questions/54133455/importing-csv-using-pd-read-csv-invalid-start-byte-error) provided the solution used below: detect the file's encoding with `chardet`, then pass that encoding to `pd.read_csv`.

In [15]:
import chardet    
rawdata = open('../data/prison_capacity_2018_state.csv', 'rb').read()
result = chardet.detect(rawdata)
charenc = result['encoding']
print(charenc)

Windows-1252


In [16]:
#Read in the data; the file is formatted differently than the others, so we parse by hand
capacity = pd.read_csv('../data/prison_capacity_2018_state.csv', engine="python", header=11, skiprows=[12, 13], skipfooter=12,
                       encoding="Windows-1252")
capacity.columns = capacity.columns.str.lower()
capacity.columns = capacity.columns.str.replace(" ", "_")
capacity.rename(columns={'unnamed:_1': 'state'}, inplace=True)

capacity = pcw.select_columns(capacity, ["state", "rated", "operational", "custody_population"])

capacity.head(5)

Unnamed: 0,state,rated,operational,custody_population
0,Alabama/b,...,22176,20875
1,Alaska/c,4838,/,4235
2,Arizona/d,39714,41447,41937
3,Arkansas,16081,16120,15578
4,California,/,122302,117937


In [17]:
capacity = pcw.clean_str_cols(capacity, ["state"])
capacity = pcw.clean_numeric_cols(capacity, ["rated", "operational", "custody_population"])

In [18]:
# Operational capacity, the number of people a state's facilities can hold given staffing and services, is the
# default measure of capacity. Rated capacity is the number set by a rating official. The custody population is the
# number of people actually incarcerated. For details on get_cap_pct(), see the prison_conditions_wrangle module.
capacity = pcw.get_cap_pct(capacity, "operational", ["rated", "custody_population"])

In [19]:
prison_capacity = pcw.select_columns(capacity, ["state", "custody_population", "capacity", "pct_occup"])

In [20]:
prison_capacity.head()

Unnamed: 0,state,custody_population,capacity,pct_occup
0,alabama,20875,22176.0,0.941333
1,alaska,4235,4838.0,0.875362
2,arizona,41937,41447.0,1.011822
3,arkansas,15578,16120.0,0.966377
4,california,117937,122302.0,0.96431


The final dataframe shows the custody population, capacity, and percent occupancy by state, as of 2018.
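
The cleaning and capacity steps above rely on `pcw` functions whose source is not shown here. As a rough sketch, the same result can be reproduced in plain pandas on the first three raw rows. The sketch assumes that footnote markers such as `/b` are stripped from state names, that placeholders like `...` and `/` are treated as missing, and that capacity falls back from operational to rated to custody population. The fallback order is inferred from the arguments passed to `get_cap_pct` and from the output, not confirmed by the module.

```python
import pandas as pd

# First three raw rows of the BJS capacity table, as printed above.
raw = pd.DataFrame({
    "state": ["Alabama/b", "Alaska/c", "Arizona/d"],
    "rated": ["...", "4838", "39714"],
    "operational": ["22176", "/", "41447"],
    "custody_population": ["20875", "4235", "41937"],
})

clean = raw.copy()

# Drop trailing footnote markers such as "/b", then normalize case.
clean["state"] = (
    clean["state"].str.replace(r"/\w+$", "", regex=True).str.strip().str.lower()
)

num_cols = ["rated", "operational", "custody_population"]
for col in num_cols:
    # Placeholders like "..." and "/" are not numbers, so coerce them to NaN.
    clean[col] = pd.to_numeric(
        clean[col].str.replace(",", "", regex=False), errors="coerce"
    )

# Assumed fallback order: operational, then rated, then custody population.
clean["capacity"] = (
    clean["operational"]
    .fillna(clean["rated"])
    .fillna(clean["custody_population"])
)
clean["pct_occup"] = clean["custody_population"] / clean["capacity"]

print(clean[["state", "custody_population", "capacity", "pct_occup"]])
```

Under this fallback order, a state with neither capacity figure would get a `pct_occup` of exactly 1.0, which says nothing about crowding; such rows are better read as "capacity unknown".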

### Finally, merging all the dataframes together for a summary of prison conditions.

In [21]:
policies_and_pop = prison_capacity.merge(distance_policies, how="outer", on="state")
prison_conditions = prison_pop.merge(policies_and_pop, how="outer", on="state")
prison_conditions.rename(columns={"custody_population":"pop_2018","population_prior_to_releases":"pop_2020"}, inplace=True)

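# Where no 2020 population was reported, fall back to the 2018 custody population
prison_conditions["pop_2020"] = prison_conditions["pop_2020"].fillna(prison_conditions["pop_2018"])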

In [22]:
prison_conditions

Unnamed: 0,state,pop_2020,pop_2018,capacity,pct_occup,effective_date,no_visits,lawyer_access,phone_access,video_access,no_volunteers,limiting_movement,screening,healthcare_support
0,california,114000.0,117937.0,122302.0,0.96431,2020-03-14,1,1,1,0,0,1,1,0
1,colorado,20000.0,16086.0,14738.0,1.091464,2020-03-11,1,1,0,0,1,1,0,0
2,federal bop,174000.0,,,,2020-03-13,1,0,1,1,1,1,0,0
3,hawaii,2189.0,3527.0,3527.0,1.0,2020-03-13,1,1,0,0,0,0,0,0
4,illinois,37000.0,39392.0,51329.0,0.767441,2020-03-14,1,1,1,1,0,0,0,0
5,iowa,8519.0,8559.0,6934.0,1.234352,2020-03-14,1,0,0,0,1,0,0,0
6,kentucky,12240.0,12290.0,12784.0,0.961358,2020-03-14,1,1,1,0,0,0,0,0
7,maine,2240.0,2384.0,2591.0,0.920108,2020-03-14,1,0,0,0,0,0,0,0
8,maryland,19050.0,19180.0,21072.0,0.910213,2020-03-12,1,1,0,0,1,0,0,0
9,massachusetts,7697.0,8454.0,10208.0,0.828174,2020-03-12,1,1,0,0,0,0,0,0
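
Because the merges above are outer joins on a cleaned `state` string, any spelling mismatch between sources silently produces a row with missing values instead of an error. One possible check, sketched here on a toy subset of the frames above, uses pandas' `indicator=True` option to list the states that appear in only one source. The frame names `pop_2020` and `cap_2018` are illustrative.

```python
import pandas as pd

# Toy subsets of the two sources; "federal bop" has no 2018 capacity row,
# and "alaska" has no reported pre-release population.
pop_2020 = pd.DataFrame({
    "state": ["california", "federal bop"],
    "population_prior_to_releases": [114000.0, 174000.0],
})
cap_2018 = pd.DataFrame({
    "state": ["california", "alaska"],
    "custody_population": [117937, 4235],
})

# indicator=True adds a "_merge" column: "both", "left_only", or "right_only".
merged = pop_2020.merge(cap_2018, how="outer", on="state", indicator=True)
merged = merged.rename(columns={
    "custody_population": "pop_2018",
    "population_prior_to_releases": "pop_2020",
})

# Fall back to the 2018 figure where no 2020 population was reported.
merged["pop_2020"] = merged["pop_2020"].fillna(merged["pop_2018"])

# States present in only one of the two sources.
unmatched = merged.loc[merged["_merge"] != "both", ["state", "_merge"]]
print(unmatched)
```

Here `federal bop` comes back as `left_only`, which matches the empty 2018 columns in the federal row of the summary table.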


## Merging Prison Conditions Data with Marshall Project data

The Marshall Project compiles data on the prevalence of coronavirus infection in prisons across the country. The data is downloadable from their [GitHub repository](https://github.com/themarshallproject/COVID_prison_data).

In [23]:
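# Define the column groups and intended dtypes, then download the Marshall data.
# Note: marshall_dtypes is defined for reference but is not passed to read_csv below.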
demographics = ['state', 'pop_2020', 'pop_2018', 'capacity', 'pct_occup']
policies = ['no_visits', 'lawyer_access', 'phone_access', 'video_access', 'no_volunteers', 'limiting_movement', 'screening',
                'healthcare_support']

marshall_dtypes = {'name': str,
                   'total_staff_cases': 'Int64',
                   'total_prisoner_cases': 'Int64',
                   'total_staff_deaths': 'Int64',
                   'total_prisoner_deaths': 'Int64',
                   'as_of_date': str}

marshall = pd.read_csv('https://raw.githubusercontent.com/themarshallproject/COVID_prison_data/master/data/covid_prison_cases.csv')

prison_conditions.replace('federal bop', 'federal', inplace=True)

marshall['as_of_date'] = pd.to_datetime(marshall['as_of_date'],
                                            format='%Y-%m-%d')
marshall['lower_name'] = marshall['name'].str.lower()

marshall.sort_values(by='as_of_date', inplace=True)

Once the Marshall Project data is loaded, populate each state's prison policies. Looping through each state, identify the rows whose reported date falls after the date that state's COVID-related policies took effect, and update the policy columns in those rows with the state's prison conditions. NA values for tests, cases, and deaths are back-filled and then forward-filled within each state.
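
As a toy illustration of the two steps just described, the sketch below does the same thing without an explicit loop: a grouped back-fill/forward-fill, followed by a date comparison that switches the policy flags on. It is an alternative formulation under stated assumptions, not the notebook's own code. The frames `reports` and `conditions` and all of their values are made up for illustration.

```python
import pandas as pd

# Made-up weekly reports for two states, with gaps in the case counts.
reports = pd.DataFrame({
    "lower_name": ["delaware"] * 3 + ["maine"] * 3,
    "as_of_date": pd.to_datetime(["2020-03-10", "2020-03-24", "2020-04-01"] * 2),
    "total_prisoner_cases": [None, 0, 2, None, None, 1],
})

# Made-up policy table: one row per state with an effective date and 0/1 flags.
conditions = pd.DataFrame({
    "state": ["delaware", "maine"],
    "effective_date": pd.to_datetime(["2020-03-12", "2020-03-14"]),
    "no_visits": [1, 1],
    "phone_access": [1, 0],
})
policy_cols = ["no_visits", "phone_access"]

reports = reports.sort_values(["lower_name", "as_of_date"])

# Back-fill, then forward-fill, within each state so counts never leak across states.
reports["total_prisoner_cases"] = (
    reports.groupby("lower_name")["total_prisoner_cases"]
    .transform(lambda s: s.bfill().ffill())
)

# Attach each state's policies, then keep a flag only on reports dated
# strictly after the effective date (multiplying by 0 clears earlier rows).
merged = reports.merge(conditions, left_on="lower_name", right_on="state", how="left")
in_effect = merged["as_of_date"] > merged["effective_date"]
merged[policy_cols] = merged[policy_cols].mul(in_effect.astype(int), axis=0)

print(merged[["lower_name", "as_of_date", "total_prisoner_cases"] + policy_cols])
```

Note that back-filling first means an early missing value takes the next reported value, so counts for the earliest dates can be overstated. Forward-filling alone and leaving leading gaps empty would be the more conservative choice.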

In [24]:
blank_policies = {k: 0 for k in policies}

df = marshall.merge(prison_conditions[demographics], left_on='lower_name',
                    right_on='state')
df = df.assign(**blank_policies)

for state in list(marshall['lower_name'].unique()):
    state_filter = df['lower_name'] == state
    date_filter = df['as_of_date'] > \
                    (prison_conditions[prison_conditions['state'] == state] \
                    ['effective_date'].values[0])
    for col in marshall.select_dtypes(include='number').columns.to_list():
        # Back-fill, then forward-fill, missing counts within this state only
        df.loc[state_filter, col] = df.loc[state_filter, col].bfill().ffill()

    policies_state = prison_conditions.loc[prison_conditions['state'] == \
                        state, policies].reset_index(drop=True).iloc[0] \
                        .to_dict()

    df.loc[state_filter & date_filter] = df.loc[state_filter & date_filter] \
                                            .replace(to_replace=blank_policies,
                                                    value=policies_state)

df.drop(columns=['lower_name', 'state'], inplace=True)

In [27]:
pd.set_option('display.max_columns', None)

print(df.shape)
df.head(15)

(510, 26)


Unnamed: 0,name,abbreviation,staff_tests,staff_tests_with_multiples,prisoner_tests,prisoner_tests_with_multiples,total_staff_cases,total_prisoner_cases,staff_recovered,prisoners_recovered,total_staff_deaths,total_prisoner_deaths,as_of_date,notes,pop_2020,pop_2018,capacity,pct_occup,no_visits,lawyer_access,phone_access,video_access,no_volunteers,limiting_movement,screening,healthcare_support
0,Delaware,DE,301.0,,4.0,,0.0,0.0,53.0,132.0,0.0,0.0,2020-03-24,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
1,Delaware,DE,301.0,,5.0,,0.0,0.0,53.0,132.0,0.0,0.0,2020-04-01,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
2,Delaware,DE,301.0,,136.0,,6.0,2.0,53.0,132.0,0.0,0.0,2020-04-08,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
3,Delaware,DE,301.0,,136.0,,18.0,13.0,53.0,132.0,0.0,0.0,2020-04-15,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
4,Delaware,DE,301.0,,136.0,,26.0,41.0,53.0,132.0,0.0,1.0,2020-04-22,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
5,Delaware,DE,301.0,,200.0,,38.0,67.0,53.0,132.0,0.0,3.0,2020-04-28,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
6,Delaware,DE,301.0,,267.0,,64.0,125.0,53.0,132.0,0.0,3.0,2020-05-07,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
7,Delaware,DE,656.0,,282.0,,85.0,139.0,53.0,132.0,0.0,6.0,2020-05-13,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
8,Delaware,DE,656.0,,282.0,,88.0,141.0,53.0,132.0,0.0,6.0,2020-05-20,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
9,Delaware,DE,771.0,,282.0,,88.0,148.0,53.0,132.0,0.0,7.0,2020-05-29,,5582.0,5582.0,5566.0,1.002875,1,1,1,0,0,0,0,0
