# Critical Incidents in Youth Mental Health Facilities in Utah

Story: Utah Found Few Rule Violations for Years<br /> 
Data Reporter: Will Craft, wcraft@apmreports.org

An APM Reports analysis of critical incident reports filed by youth mental health facilities in Utah found that the state...

### Data Overview

The raw data was obtained through a records request made by APM Reports. Dozens of PDF files were scraped into the following spreadsheet: "cleaned_critical_incidents.csv"

The spreadsheet contains every critical incident report filed by youth mental health facilities around Utah. Each row records the facility, the investigation name, a summary of the incident, the conclusion, and a timeline: the date the incident occurred, the date it was reported, and the dates the investigation started and was finalized.

With that information, we used pandas to count investigations by facility, count investigations by year, and count investigations concerning sexual abuse or misconduct by year.


In [46]:
import pandas as pd

In [47]:
import altair as alt

In [48]:
from collections import OrderedDict

In [49]:
alt.renderers.enable('default')

RendererRegistry.enable('default')

In [261]:
df = pd.read_csv("/Users/josemartinez/Desktop/cleaned_critical_incidents.csv")

In [262]:
df.columns

Index(['facility_name', 'investigation_name', 'start_date', 'incident_date',
       'reported_date', 'finalized_date', 'summary', 'conclusion', 'file_path',
       'start_date_year', 'incident_date_year', 'reported_date_year',
       'finalized_date_year'],
      dtype='object')

## Here I start figuring out sexual misconduct and abuse cases

In [263]:
df.summary.count()

273

In [264]:
df.investigation_name.str.contains('sexual',case=False,regex=False).sum()

36

In [265]:
(36/273)*100

13.186813186813188
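
The percentage above is typed in by hand from the two previous outputs. Below is a minimal sketch of computing it straight from the data instead. The `demo` frame and its values are invented for illustration and are not the real spreadsheet. `str.contains` returns NaN for missing names and `.sum()` skips those, so `na=False` only makes that behavior explicit.

```python
import pandas as pd

# Hypothetical stand-in for the real spreadsheet (values invented for illustration).
demo = pd.DataFrame({
    "investigation_name": ["Sexual misconduct", "Runaway", None, "sexual abuse allegation"],
    "summary": ["a", "b", "c", "d"],
})

# Flag names that mention "sexual"; rows with a missing name count as False.
is_sexual = demo["investigation_name"].str.contains(
    "sexual", case=False, regex=False, na=False
)

# Same denominator as the notebook: the number of non-null summaries.
total = demo["summary"].count()
share = is_sexual.sum() / total * 100

print(f"{is_sexual.sum()} of {total} investigations ({share:.1f}%)")
```

Rows with a missing investigation name still sit in the denominator, so the share is a lower bound if any of those rows were in fact about sexual misconduct.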

In [266]:
df.investigation_name.isna().sum()

11

In [267]:
sa=df[['investigation_name','start_date_year']]

In [269]:
sa1 = sa[(sa['start_date_year']== 2020)].investigation_name.str.contains('sexual',case=False,regex=False).sum()

In [270]:
print(f'''
     Investigations Related to Sexual Misconduct or Abuse in 2020 = {sa[(sa['start_date_year']== 2020)].investigation_name.str.contains('sexual',case=False,regex=False).sum()}
     Investigations Related to Sexual Misconduct or Abuse in 2019 = {sa[(sa['start_date_year']== 2019)].investigation_name.str.contains('sexual',case=False,regex=False).sum()}
     Investigations Related to Sexual Misconduct or Abuse in 2018 = {sa[(sa['start_date_year']== 2018)].investigation_name.str.contains('sexual',case=False,regex=False).sum()}
     Investigations Related to Sexual Misconduct or Abuse in 2017 = {sa[(sa['start_date_year']== 2017)].investigation_name.str.contains('sexual',case=False,regex=False).sum()}''')


     Investigations Related to Sexual Misconduct or Abuse in 2020 = 14
     Investigations Related to Sexual Misconduct or Abuse in 2019 = 19
     Investigations Related to Sexual Misconduct or Abuse in 2018 = 1
     Investigations Related to Sexual Misconduct or Abuse in 2017 = 0
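
The per-year tally above repeats the same filter four times. A shorter way to get the same kind of table is to group the keyword flag by year. This is only a sketch: the `demo` frame is made up, and only the column names match the spreadsheet.

```python
import pandas as pd

# Hypothetical rows (invented for illustration); column names follow the spreadsheet.
demo = pd.DataFrame({
    "investigation_name": ["Sexual misconduct", "Runaway", "sexual abuse", None, "Runaway", "Restraint"],
    "start_date_year": [2020.0, 2020.0, 2019.0, 2019.0, 2018.0, None],
})

# True where the name mentions "sexual"; missing names count as False.
is_sexual = demo["investigation_name"].str.contains(
    "sexual", case=False, regex=False, na=False
)

# Sum the flags within each start year. Years with no matches show up as 0,
# and rows with a missing start year are dropped by groupby.
per_year = is_sexual.groupby(demo["start_date_year"]).sum()

print(per_year)
```

Grouping the boolean flag, rather than filtering first, keeps years with zero matches in the result.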


## Here I start figuring out the number of incidents by facility

In [274]:
df.facility_name.value_counts()

provo_canyon_school_provo_canyon_springville___critical_incidents             15
provo_canyon_school_provo_canyon_school___critical_incidents                  14
youth_health_associates_draper_ranch_yha_draper_ranch___critical_incidents    10
three_points_center_three_points_center___critical_incidents                  10
synergy_youth_treatment_cornish_synergy_cornish___critical_incidents           9
                                                                              ..
live_for_life_cypress_live_for_life_cypress___critical_incidents               1
compass_academy_compass___critical_incidents                                   1
maple_lake_academy_boys_home_maple_lake_boys___critical_incidents              1
canyon_river_ranch_canyon_river___critical_incidents                           1
daniels_academy_heber_house_heber_house___critical_incidents                   1
Name: facility_name, Length: 93, dtype: int64

In [61]:
df1 = df.facility_name.value_counts().rename_axis('Facilities').reset_index(name='Incidents')

In [62]:
df1.shape

(93, 2)

In [115]:
data = df1[:5]

In [116]:
data1=data.copy()

In [117]:
data1['Facilities'] = data1['Facilities'].str.replace('__critical_incidents', '')

In [118]:
data1['Facilities']=data1['Facilities'].str.replace('_',' ')

In [119]:
data1['Facilities']=data1['Facilities'].str.title()

In [121]:
data1['Facilities']=data1['Facilities'].str.replace("Provo Canyon School",'',1)

In [123]:
data1['Facilities']=data1['Facilities'].str.replace('Three Points Center','',1)

In [125]:
data1['Facilities']=data1['Facilities'].str.replace('Synergy Cornish','',1)

In [127]:
data1['Facilities']=data1['Facilities'].str.replace('Yha Draper Ranch','',1)

In [128]:
data1

| | Facilities | Incidents |
|---|---|---|
| 0 | Provo Canyon Springville | 15 |
| 1 | Provo Canyon School | 14 |
| 2 | Youth Health Associates Draper Ranch | 10 |
| 3 | Three Points Center | 10 |
| 4 | Synergy Youth Treatment Cornish | 9 |
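
The name cleanup above takes seven cells. As a rough sketch, the same steps can be wrapped in one helper. `tidy_facility_names` and its `redundant` argument are hypothetical names I made up here; the list of phrases to drop still has to be picked by hand, just like in the cells above.

```python
from typing import Sequence

import pandas as pd

SUFFIX = "___critical_incidents"  # suffix seen on every facility_name value


def tidy_facility_names(names: pd.Series, redundant: Sequence[str]) -> pd.Series:
    """Strip the suffix, swap underscores for spaces, title-case the result,
    then drop the first occurrence of each hand-picked redundant phrase."""
    out = (
        names.str.replace(SUFFIX, "", regex=False)
        .str.replace("_", " ", regex=False)
        .str.title()
    )
    for phrase in redundant:
        out = out.str.replace(phrase, "", n=1, regex=False)
    # Collapse any doubled spaces left behind and trim the ends.
    return out.str.replace(r"\s+", " ", regex=True).str.strip()


raw = pd.Series([
    "provo_canyon_school_provo_canyon_springville___critical_incidents",
    "provo_canyon_school_provo_canyon_school___critical_incidents",
    "youth_health_associates_draper_ranch_yha_draper_ranch___critical_incidents",
])
clean = tidy_facility_names(raw, redundant=["Provo Canyon School", "Yha Draper Ranch"])
print(clean.tolist())
```

Unlike the cells above, the helper also trims the leading and trailing spaces that the replacements leave behind.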


In [171]:
alt.Chart(data1, title="Critical Incidents by Facility").mark_bar().encode(
    x=alt.X('Facilities', axis=alt.Axis(labelAngle=0), sort='y'),
    y='Incidents',
).properties(
    width=1000,
    height=500,
)

## Here I start figuring out the number of incidents per year at Provo Canyon School only

In [180]:
count = df[['facility_name','start_date_year']]

In [289]:
prov = count[count['facility_name']=='provo_canyon_school_provo_canyon_school___critical_incidents']

In [494]:
prov1=prov.copy()

In [495]:
prov1['facility_name']=prov1['facility_name'].str.replace('__critical_incidents', '')

In [496]:
prov1['facility_name']=prov1['facility_name'].str.replace('_',' ')

In [497]:
prov1['facility_name']=prov1['facility_name'].str.title()

In [498]:
prov1['facility_name']=prov1['facility_name'].str.replace("Provo Canyon School",'',1)

In [499]:
prov2 = prov1.start_date_year.value_counts().rename_axis('Year').reset_index(name='Incidents')

In [500]:
prov2.columns

Index(['Year', 'Incidents'], dtype='object')

In [502]:
prov2['Year'] = prov2['Year'].astype(str)

In [504]:
# regex=False treats '.' as a literal dot rather than the regex wildcard
prov2['Year'] = prov2['Year'].str.replace('.', '', n=1, regex=False)


In [506]:
prov2['Year'] = prov2['Year'].str[:-1]

In [507]:
prov2

| | Year | Incidents |
|---|---|---|
| 0 | 2019 | 7 |
| 1 | 2017 | 3 |
| 2 | 2020 | 2 |
| 3 | 2018 | 2 |
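
The year column is stored as a float (`2019.0`) because some rows have no start date, which is why the cells above strip the dot and the trailing zero as strings. A possibly simpler route, sketched here on made-up values, is to cast the counted years to integers before turning them into labels. `value_counts` drops missing years by default, so the integer cast is safe at that point.

```python
import pandas as pd

# Hypothetical start years (invented for illustration), including one missing value.
years = pd.Series([2019.0, 2019.0, 2017.0, None, 2020.0], name="start_date_year")

# Count rows per year; NaN is excluded by value_counts.
by_year = years.value_counts().rename_axis("Year").reset_index(name="Incidents")

# 2019.0 -> 2019 -> "2019", with no string surgery.
by_year["Year"] = by_year["Year"].astype(int).astype(str)

# Sort chronologically so a bar chart reads left to right.
by_year = by_year.sort_values("Year").reset_index(drop=True)

print(by_year)
```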


In [525]:
alt.Chart(prov2, title='Provo Incidents by Year').mark_bar().encode(
    x=alt.X('Year', axis=alt.Axis(labelAngle=0)),
    y='Incidents',
).properties(
    width=500,
    height=500,
)

## Here I figure out the number of incidents per year across all facilities

In [69]:
dfstart=df['start_date_year']

In [70]:
dfstart.value_counts()

2019.0    126
2020.0    112
2018.0     21
2017.0      9
Name: start_date_year, dtype: int64

In [27]:
df.reported_date_year.isna().sum()

70

In [28]:
df.start_date_year.isna().sum()

5

In [29]:
df.finalized_date_year.isna().sum()

3

In [30]:
print(f'''
     Reported Date Year NA = {df.reported_date_year.isna().sum()}
     Start Date Year NA = {df.start_date_year.isna().sum()}
     Finalized Date Year NA = {df.finalized_date_year.isna().sum()}
     Incident Date Year NA = {df.incident_date_year.isna().sum()}''')


     Reported Date Year NA = 70
     Start Date Year NA = 5
     Finalized Date Year NA = 3
     Incident Date Year NA = 72
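
The missing-value counts above can also be pulled in one call by selecting every `*_date_year` column at once. A minimal sketch on an invented frame follows; only the column names come from the spreadsheet.

```python
import pandas as pd

# Hypothetical rows (invented for illustration); column names follow the spreadsheet.
demo = pd.DataFrame({
    "start_date_year": [2019.0, None, 2020.0],
    "incident_date_year": [None, None, 2020.0],
    "reported_date_year": [2019.0, 2019.0, None],
    "finalized_date_year": [2019.0, 2020.0, 2020.0],
})

# Pick up every year column by its naming pattern.
year_cols = [col for col in demo.columns if col.endswith("_date_year")]

# One missing-value count per column.
missing = demo[year_cols].isna().sum()

print(missing)
```

This kind of check helps decide which date to group by. In the real data `start_date_year` has far fewer gaps than `incident_date_year` or `reported_date_year`, which is presumably why the per-year counts here use the start date.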
