### NYC Inmates

###### This notebook analyses and visualises inmate data collected by New York City prisons in order to answer questions such as whether there is a relationship between an inmate's mental health and violence, and whether prisons need to improve their mental health facilities.

###### 1. Data Cleaning
###### 2. Exploratory Data Analysis
###### 3. Data Visualisation


In [14]:
#reading in the data
import pandas as pd
from scipy import stats
import plotly.graph_objects as go  # figure classes such as go.Pie and go.Bar live in this submodule
df = pd.read_csv('daily-inmates-in-custody.csv')


In [8]:
# checking missing values
df.isnull().sum()

INMATEID                 0
ADMITTED_DT              0
DISCHARGED_DT         7151
CUSTODY_LEVEL          156
BRADH                    0
RACE                    24
GENDER                  24
AGE                     13
INMATE_STATUS_CODE       0
SEALED                   0
SRG_FLG                  0
TOP_CHARGE             872
INFRACTION               0
dtype: int64
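A missing-value count is easier to interpret as a share of all rows, since a column such as DISCHARGED_DT with 7151 missing values may be entirely empty. The sketch below shows one way to find such columns; the small `sample` frame is made up for illustration and only stands in for the real data.

```python
import pandas as pd

# Toy frame standing in for the inmate data; the values are made up.
sample = pd.DataFrame({
    "INMATEID": [1, 2, 3],
    "DISCHARGED_DT": [None, None, None],
    "AGE": [54.0, None, 31.0],
})

# Fraction of missing values per column (1.0 means the column is empty).
missing_share = sample.isnull().mean()

# Columns in which every value is missing.
empty_columns = missing_share[missing_share == 1.0].index.tolist()
print(empty_columns)
```

On the real data, the same two lines could be run on `df` instead of `sample`.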

In [9]:
# Expand the single-letter race codes. Restricting the replacement to the
# RACE column leaves identical letters in other columns untouched.
df['RACE'] = df['RACE'].replace({
    'W': 'White',
    'B': 'Black',
    'O': 'Other Pacific Islander',
    'A': 'Asian',
    'I': 'Indian',
})

In [10]:
# Count inmates per race across the whole dataset
race = df['RACE'].value_counts()
rlabels = race.index.tolist()
rvalues = race.tolist()
# Count inmates per race among those flagged SRG_FLG == 'Y' with a recorded race
srg_race = df[(df['SRG_FLG'] == 'Y') & df['RACE'].notna()]
race = srg_race['RACE'].value_counts()
ralabels = race.index.tolist()
ravalues = race.tolist()


###### All the values in the DISCHARGED_DT column are missing so I am dropping the column.

In [11]:
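# dropna returns a new DataFrame, so this line displays the result without changing df;
# assign it back (df = df.dropna(axis=1, how='all')) to drop the column permanently.
df.dropna(axis=1,how='all')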

Unnamed: 0,INMATEID,ADMITTED_DT,CUSTODY_LEVEL,BRADH,RACE,GENDER,AGE,INMATE_STATUS_CODE,SEALED,SRG_FLG,TOP_CHARGE,INFRACTION
0,152258,2018-08-24T01:46:33.000,MIN,Y,White,M,54.0,DE,N,N,140.25,N
1,20124341,2018-06-13T00:59:55.000,MAX,Y,White,M,23.0,DE,N,N,125.25,Y
2,155323,2019-02-05T12:04:19.000,MAX,N,Black,M,31.0,CS,N,Y,105.05,Y
3,118754,2019-02-24T18:22:23.000,MAX,N,Other Pacific Islander,M,28.0,CS,N,N,,Y
4,20203998,2019-05-02T01:35:22.000,MAX,Y,Black,M,27.0,DE,N,N,125.25,Y
5,20200982,2019-08-09T18:03:42.000,MIN,N,Other Pacific Islander,M,57.0,CSP,N,N,160.05,N
6,20100177,2019-06-29T12:49:17.000,MIN,N,Other Pacific Islander,M,48.0,CS,N,N,155.25,N
7,20006004,2019-05-23T13:19:00.000,MED,N,Black,M,34.0,DEP,N,N,155.30,N
8,20173572,2019-08-13T11:19:00.000,MED,N,Black,M,19.0,DE,N,N,160.15,N
9,49115,2019-02-12T20:02:00.000,MIN,N,Other Pacific Islander,M,57.0,CS,N,N,110-140.20,N


###### The null hypothesis states that there is no relationship between an inmate being under mental observation and being potentially violent.
###### The alternative hypothesis states that there is a relationship between an inmate being under mental observation and being potentially violent.

### Significance Test

In [19]:
from scipy import stats

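# Pearson correlation between two 0/1 flags (equivalent to the phi coefficient)
r = stats.pearsonr((df['BRADH'] == 'Y').astype(int), (df['INFRACTION'] == 'Y').astype(int))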
r

(0.17296678612605065, 3.823568226592532e-49)

###### A p-value of 3.8e-49 is far below the usual 0.05 threshold, so I reject the null hypothesis in favour of the alternative: there is a statistically significant association between being under mental observation and having an infraction. The correlation itself is weak (r ≈ 0.17), so the test shows that a relationship exists, not that it is strong or causal.
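Because both variables are yes/no flags, a chi-square test of independence on a 2x2 contingency table is a common cross-check for the correlation above. The following is a minimal sketch, not the analysis used in this notebook; the `sample` frame contains made-up flags rather than real inmate records.

```python
import pandas as pd
from scipy import stats

# Made-up flags standing in for BRADH and INFRACTION; not real inmate data.
sample = pd.DataFrame({
    "BRADH":      ["Y"] * 40 + ["N"] * 60,
    "INFRACTION": ["Y"] * 25 + ["N"] * 15 + ["Y"] * 15 + ["N"] * 45,
})

# 2x2 contingency table of observed counts.
table = pd.crosstab(sample["BRADH"], sample["INFRACTION"])

# Chi-square test of independence between the two flags.
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
```

Replacing `sample` with `df` would run the same test on the real columns, assuming both contain only 'Y' and 'N'.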

##### A count of inmates who are under mental observation and have an infraction

In [16]:
mental_obs_custody = df[(df.BRADH == 'Y') & (df.INFRACTION == 'Y')]
mental_obs_custody.count()

INMATEID              1313
ADMITTED_DT           1313
DISCHARGED_DT            0
CUSTODY_LEVEL         1313
BRADH                 1313
RACE                  1312
GENDER                1312
AGE                   1313
INMATE_STATUS_CODE    1313
SEALED                1313
SRG_FLG               1313
TOP_CHARGE            1219
INFRACTION            1313
dtype: int64

###### This is a count of all inmates who are under mental observation and have an infraction, i.e. are potentially harmful: 1313 inmates.
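A raw count does not show whether infractions are more common among inmates under mental observation than among the rest, so comparing the infraction rate per group can help. The sketch below uses a made-up `sample` frame with the same column names; the numbers are illustrative only.

```python
import pandas as pd

# Made-up flags with the notebook's column names; not real inmate data.
sample = pd.DataFrame({
    "BRADH":      ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N"],
    "INFRACTION": ["Y", "Y", "N", "N", "Y", "N", "N", "N", "N", "N"],
})

# Share of inmates with an infraction within each BRADH group.
infraction_rate = (sample["INFRACTION"] == "Y").groupby(sample["BRADH"]).mean()
print(infraction_rate)
```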

In [None]:
# Custody levels of inmates under mental observation who also have an infraction
mental_custody = mental_obs_custody['CUSTODY_LEVEL'].value_counts()
label = mental_custody.index.tolist()
value = mental_custody.tolist()



##### Distribution of custody levels of inmates under mental observation

In [None]:
# Inmates under mental observation with a recorded custody level
mental = df[(df['BRADH'] == 'Y') & df['CUSTODY_LEVEL'].notna()]
mental_custody_level = mental['CUSTODY_LEVEL'].value_counts()
label = mental_custody_level.index.tolist()
value = mental_custody_level.tolist()



##### A count of inmates under mental observation

In [None]:
# Count inmates by mental observation flag (Y/N)
mental_obs = df['BRADH'].value_counts()
labels = mental_obs.index.tolist()
values = mental_obs.tolist()



##### Distribution of inmates across all custody levels

In [None]:
# Count inmates per custody level
custodyl = df['CUSTODY_LEVEL'].value_counts()
labels = custodyl.index.tolist()
values = custodyl.tolist()


##### Distribution of races

In [None]:
# Count inmates per race
race = df['RACE'].value_counts()
labels = race.index.tolist()
values = race.tolist()


#### Inmates who are affiliated with a gang

In [None]:
# Count inmates by gang affiliation flag (Y/N)
gang = df['SRG_FLG'].value_counts()
labels = gang.index.tolist()
values = gang.tolist()


#### Distribution of inmates' ages

In [None]:
# Count inmates per age, ordered by age rather than by frequency
age = df['AGE'].value_counts().sort_index()
labels = age.index.tolist()
values = age.tolist()
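Counting every distinct age produces many small categories, so grouping ages into bands often gives a clearer picture of the distribution. The sketch below uses `pd.cut` for this; the ten ages are those shown in the table preview earlier in the notebook, while the band edges and names are assumptions chosen for illustration.

```python
import pandas as pd

# The ten ages from the table preview above.
ages = pd.Series([54.0, 23.0, 31.0, 28.0, 27.0, 57.0, 48.0, 34.0, 19.0, 57.0], name="AGE")

# Hypothetical age bands; each band includes its lower edge and excludes its upper edge.
bins = [0, 25, 35, 45, 55, 120]
group_names = ["under 25", "25-34", "35-44", "45-54", "55+"]

age_groups = pd.cut(ages, bins=bins, labels=group_names, right=False)

# Count per band, kept in band order so a chart reads from youngest to oldest.
counts = age_groups.value_counts().sort_index()
labels = counts.index.tolist()
values = counts.tolist()
print(dict(zip(labels, values)))
```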


#### Inmates with infractions

In [None]:
# Count inmates by infraction flag (Y/N)
infract = df['INFRACTION'].value_counts()
labels = infract.index.tolist()
values = infract.tolist()


#### Gender

In [None]:
# Count inmates per gender
gender = df['GENDER'].value_counts()
label = gender.index.tolist()
value = gender.tolist()

