In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yaml

We reviewed four different sources of records relating to segregation placements at the Northwest ICE Processing Center/Northwest Detention Center (NWIPC/NWDC). These include two internal datasets created by GEO Group or its employees, and two installments of data from ICE’s Segregation Review Management System (SRMS). We detail each of these sources in turn below.

The NWDC segregation datasets created by GEO Group were obtained from ICE via our ongoing FOIA lawsuit; we believe this is the first time that such internal records have been made publicly available. The first of these, with the original filename `Sep_1_2013_to_March_31_2020_SMU_geotrack_report_Redacted.pdf`, is described by US DOJ attorneys for ICE as follows:

> “The GEOtrack report that was provided to Plaintiffs runs from September 1, 2013 to March 31, 2020.  That report not only reports all placements into segregation, but it also tracks movement.  This means that if an individual is placed into one particular unit then simply moves to a different unit, it is tracked in that report (if an individual is moved from H unit cell 101 to H unit cell 102, it would reflect the move as a new placement on the report).”

We refer to this report here by the shorthand "SMU" for "Special Management Unit".

The second internal dataset, with the original filename `15_16_17_18_19_20_RHU_admission_Redacted.xlsx`, is described by US DOJ attorneys for ICE as follows:

> “The [RHU] spreadsheet runs from January 2015 to May 28, 2020 and was created by and for a lieutenant within the facility once he took over the segregation lieutenant duties. The spreadsheet is updated once a detainee departs segregation. The subjects who are included on this list, therefore, are those who were placed into segregation and have already been released from segregation. It does not include those individuals who are currently in segregation.”

We refer to this report here by the shorthand "RHU" for "Restricted Housing Unit". (US DOJ attorneys for ICE specified that the terms “Special Management Unit” and “Restricted Housing Unit” are interchangeable and identify the same locations.)

The Segregation Review Management System (SRMS) datasets are maintained by ICE based on segregation placements reported under ICE's national guidelines, including placements for longer than 14 days (or 14 days during a 21-day period) and involving people with “special vulnerabilities.” UWCHR received two installments of SRMS data for NWDC: the first, released in 2019, covers the period from 2013-05-13 to 2018-05-14; the second, released in 2020, covers the period from 2013-09-03 to 2020-03-16.
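
The "14 days during a 21-day period" criterion can be read in more than one way. As a rough sketch only, assuming it means at least 14 days in segregation (consecutive or not) within any window of 21 calendar days, and assuming all stays for one person could be grouped together (which the datasets reviewed here do not directly support), a check might look like the following. The function name `meets_14_in_21` and the example dates are hypothetical and are not drawn from the records.

```python
import pandas as pd


def meets_14_in_21(stays, days_required=14, window_days=21):
    """Return True if one person's stays cover at least `days_required`
    calendar days within any window of `window_days` consecutive days.

    Assumption: first and last day of each stay both count as days in segregation.
    """
    # Expand every stay into the set of calendar days it covers
    days = set()
    for start, end in zip(stays['placement_date'], stays['release_date']):
        days.update(pd.date_range(start, end, freq='D'))
    if not days:
        return False
    # Daily 0/1 indicator from the first to the last day in segregation
    idx = pd.date_range(min(days), max(days), freq='D')
    in_seg = pd.Series([1 if day in days else 0 for day in idx], index=idx)
    # Rolling count of segregation days over each run of `window_days` days
    rolling_days = in_seg.rolling(window_days, min_periods=1).sum()
    return bool(rolling_days.max() >= days_required)


# Two 7-day stays close together: neither exceeds 14 days on its own
close_stays = pd.DataFrame({
    'placement_date': pd.to_datetime(['2020-01-01', '2020-01-10']),
    'release_date': pd.to_datetime(['2020-01-07', '2020-01-16']),
})
# The same two stays a month apart
distant_stays = pd.DataFrame({
    'placement_date': pd.to_datetime(['2020-01-01', '2020-02-01']),
    'release_date': pd.to_datetime(['2020-01-07', '2020-02-07']),
})
close_result = meets_14_in_21(close_stays)
distant_result = meets_14_in_21(distant_stays)
```

Under this reading, the two nearby stays together meet the threshold while the two distant stays do not, even though no single stay is longer than 14 days.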

The SRMS is the only available source of data for national-level analysis of segregation placements and comparisons between various detention facilities. Two investigative journalism and advocacy organizations, the International Consortium of Investigative Journalists (ICIJ) and the Project On Government Oversight (POGO), have released national SRMS datasets covering different time periods.

A close review of these various datasets gives us an overview of solitary confinement practices at the NWDC, but also raises further questions about the consistency of record-keeping and reporting. While the datasets are not directly comparable and lack unique or consistent identifiers that would allow de-duplication across datasets, there are some characteristics we would expect if record-keeping and reporting were consistent:

- Between the GEO-created datasets for segregation placements at NWDC, SMU would be expected to show more and shorter placements than RHU, because it tracks specific placement locations within the NWDC; however, this would depend on the frequency of transfers of detained people within SMU/RHU.

- SRMS should contain approximately the same number of long placements as the SMU and RHU datasets, based on the requirement to report stays longer than 14 days, plus any shorter placements involving populations with “special vulnerabilities.”
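
To illustrate the first expectation: because a move between cells appears as a new placement in the SMU report, one continuous stay can be split into several shorter records. The sketch below shows how such records could be collapsed into continuous stays if a stable per-person identifier were available. The `person_id` column, the function name `collapse_moves`, and the example rows are hypothetical; the other column names follow the SMU file loaded below.

```python
import pandas as pd


def collapse_moves(records, max_gap_days=0):
    """Merge one person's records whose dates touch or overlap into single stays."""
    records = records.sort_values(['person_id', 'assigned_date']).copy()
    # Latest removal date among this person's earlier records (NaT for the first)
    prev_end = records.groupby('person_id')['removed_date'].transform(
        lambda s: s.cummax().shift()
    )
    gap = (records['assigned_date'] - prev_end).dt.days
    # A new stay starts at a person's first record or after a gap in dates
    new_stay = prev_end.isna() | (gap > max_gap_days)
    records['stay_id'] = new_stay.cumsum()
    return (
        records.groupby(['person_id', 'stay_id'])
        .agg(
            assigned_date=('assigned_date', 'min'),
            removed_date=('removed_date', 'max'),
            n_records=('housing', 'size'),
        )
        .reset_index()
    )


# Illustrative rows only: person A moves cells on 2020-01-05, then returns in March
demo = pd.DataFrame({
    'person_id': ['A', 'A', 'A', 'B'],
    'housing': ['H-101', 'H-102', 'H-101', 'H-103'],
    'assigned_date': pd.to_datetime(['2020-01-01', '2020-01-05', '2020-03-01', '2020-02-01']),
    'removed_date': pd.to_datetime(['2020-01-05', '2020-01-20', '2020-03-03', '2020-02-02']),
})
stays = collapse_moves(demo)
```

In this toy input, four records collapse to three stays, and the first stay for person A spans both cell assignments. How often this happens in the real data depends on how frequently people are moved within SMU/RHU.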

# SMU/RHU data

In [2]:
smu = pd.read_csv('../input/smu.csv.gz', sep='|', quotechar='"', compression='gzip')
smu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3433 entries, 0 to 3432
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   citizenship    3433 non-null   object
 1   housing        3433 non-null   object
 2   assigned_dt    3433 non-null   object
 3   removed_dt     3433 non-null   object
 4   days_in_seg    3433 non-null   int64 
 5   assigned_date  3433 non-null   object
 6   assigned_hour  3433 non-null   object
 7   removed_date   3433 non-null   object
 8   removed_hour   3433 non-null   object
 9   hashid         3433 non-null   object
dtypes: int64(1), object(9)
memory usage: 268.3+ KB


In [3]:
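assert pd.to_datetime(smu['assigned_dt'], errors='coerce').isnull().sum() == 0
smu['assigned_dt'] = pd.to_datetime(smu['assigned_dt'])
assert pd.to_datetime(smu['removed_dt'], errors='coerce').isnull().sum() == 0
smu['removed_dt'] = pd.to_datetime(smu['removed_dt'])
# Later cells use smu['days_calc'] and smu['long_stay']; define them here with the
# first-day-inclusive date difference also used for smu_ex in the join section.
smu['days_calc'] = (pd.to_datetime(smu['removed_date']) - pd.to_datetime(smu['assigned_date'])) / np.timedelta64(1, 'D') + 1
smu['long_stay'] = smu['days_calc'] > 14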
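
Later cells refer to an `rhu` dataframe with `date_in`, `date_out` and `total_days` columns, but no cell above loads it. The following is a minimal sketch of how it might be loaded, assuming the RHU spreadsheet has been exported in the same pipe-delimited format as the SMU file. The function name `load_rhu`, the path `../input/rhu.csv.gz` mentioned in the comment, and the demo rows are assumptions, not part of the original notebook.

```python
import io

import pandas as pd


def load_rhu(source):
    """Read a pipe-delimited RHU export and parse its date columns."""
    rhu = pd.read_csv(source, sep='|', quotechar='"')
    for col in ['date_in', 'date_out']:
        parsed = pd.to_datetime(rhu[col], errors='coerce')
        # Fail loudly if any non-empty value could not be parsed as a date
        assert parsed.isnull().sum() == rhu[col].isnull().sum(), f'unparseable dates in {col}'
        rhu[col] = parsed
    return rhu


# In the notebook this might be: rhu = load_rhu('../input/rhu.csv.gz')  (hypothetical path)
# Tiny in-memory stand-in for the real file (illustrative values only)
demo_file = io.StringIO(
    'date_in|date_out|total_days\n'
    '2015-01-03|2015-01-10|8\n'
    '2015-02-01|2015-02-20|20\n'
)
rhu_demo = load_rhu(demo_file)
# Annual placement counts; 'YS' (year start) is equivalent to the 'AS' alias used elsewhere here
rhu_demo_annual = rhu_demo.set_index('date_in').groupby(pd.Grouper(freq='YS')).size()
```

With `rhu` loaded this way, `rhu_annual` and `rhu_monthly` could be built in the same manner as the SRMS counts further down.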

# SRMS data

In [4]:
srms_1 = pd.read_csv('../input/srms-1.csv.gz', sep='|', compression='gzip')
srms_2 = pd.read_csv('../input/srms-2.csv.gz', sep='|', compression='gzip')

In [5]:
srms_1.columns

Index(['tracking_number', 'gender', 'country_of_citizenship', 'facility_aor',
       'facility', 'facility:detloc', 'report_type', 'placement_date',
       'placement_reason', 'release_date', 'length_of_stay',
       'disciplinary_infraction', 'sanction_length', 'attorney_of_record',
       'attorney_notification', 'detainee_request',
       'compliance_with_detention_standards', 'mental_illness',
       'serious_medical_illness', 'serious_disability', 'ever_smi',
       'non_compliance_detail', 'special_criteria', 'id', 'current_review',
       'created', 'lgbti', 'item_type', 'placement_reason_type',
       'admin_or_disciplinary', 'detloc', 'hashid'],
      dtype='object')

In [6]:
srms_1['placement_date'] = pd.to_datetime(srms_1['placement_date'])
srms_1['release_date'] = pd.to_datetime(srms_1['release_date'])
srms_2['placement_date'] = pd.to_datetime(srms_2['placement_date'])
srms_2['release_date'] = pd.to_datetime(srms_2['release_date'])

In [7]:
srms_1['days_calc'] = (srms_1['release_date'] - srms_1['placement_date']) / np.timedelta64(1, 'D')
srms_2['days_calc'] = (srms_2['release_date'] - srms_2['placement_date']) / np.timedelta64(1, 'D')

There are no SRMS records with placement and release on the same day (the minimum calculated stay is 1 day). This suggests that SRMS stay lengths should be compared against a first-day-inclusive calculation of stay length for SMU/RHU records.
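
As a small worked example of the difference between the two calculations, the sketch below computes stay length both as elapsed days and as first-day-inclusive days. The dates are made up for illustration and the column names `days_elapsed` and `days_inclusive` are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative stays only: one same-day placement and one two-week placement
stays = pd.DataFrame({
    'placement_date': pd.to_datetime(['2020-01-01', '2020-01-01']),
    'release_date': pd.to_datetime(['2020-01-01', '2020-01-15']),
})

elapsed = (stays['release_date'] - stays['placement_date']) / np.timedelta64(1, 'D')
stays['days_elapsed'] = elapsed        # same-day placement counts as 0 days
stays['days_inclusive'] = elapsed + 1  # same-day placement counts as 1 day

# The choice matters at the 14-day threshold used for "long stays"
stays['long_elapsed'] = stays['days_elapsed'] > 14
stays['long_inclusive'] = stays['days_inclusive'] > 14
```

Here the second stay is 14 elapsed days but 15 inclusive days, so it only counts as longer than 14 days under the first-day-inclusive calculation.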

In [8]:
srms_1['days_calc'].describe()

count    357.000000
mean      59.605042
std       88.263162
min        1.000000
25%       19.000000
50%       30.000000
75%       62.000000
max      781.000000
Name: days_calc, dtype: float64

In [9]:
srms_2['days_calc'].describe()

count    453.000000
mean      59.410596
std       77.013067
min        1.000000
25%       20.000000
50%       31.000000
75%       68.000000
max      691.000000
Name: days_calc, dtype: float64

In [10]:
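srms_1_annual = srms_1.set_index('placement_date').groupby(pd.Grouper(freq='AS'))['tracking_number'].count()
srms_2_annual = srms_2.set_index('placement_date').groupby(pd.Grouper(freq='AS'))['tracking_number'].count()
srms_1_monthly = srms_1.set_index('placement_date').groupby(pd.Grouper(freq='M'))['tracking_number'].count()
srms_2_monthly = srms_2.set_index('placement_date').groupby(pd.Grouper(freq='M'))['tracking_number'].count()
# SMU counterparts used in the comparisons below (every SMU row has a non-null hashid)
smu_annual = smu.set_index('assigned_dt').groupby(pd.Grouper(freq='AS'))['hashid'].count()
smu_monthly = smu.set_index('assigned_dt').groupby(pd.Grouper(freq='M'))['hashid'].count()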

## Comparison of placement counts

In [11]:
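# `rhu` is never loaded in the cells above, so rhu_annual (built like srms_1_annual) is undefined here
data = pd.concat([rhu_annual, smu_annual, srms_1_annual, srms_2_annual], axis=1)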

NameError: name 'rhu_annual' is not defined

In [None]:
data.columns = ['RHU', 'SMU', 'SRMS 1', 'SRMS 2']

In [None]:
data.plot(kind='bar')

In [None]:
data

In [None]:
data = pd.concat([rhu_monthly, smu_monthly, srms_1_monthly, srms_2_monthly], axis=1)
data.columns = ['RHU', 'SMU', 'SRMS 1', 'SRMS 2']
data.plot()

## Comparison of average placement length over time

In [None]:
smu_mean = smu.set_index('assigned_dt').groupby([pd.Grouper(freq='Q')])['days_calc'].mean()

In [None]:
rhu_mean = rhu.set_index('date_in').groupby([pd.Grouper(freq='Q')])['total_days'].mean()

In [None]:
srms_1_mean = srms_1.set_index('placement_date').groupby([pd.Grouper(freq='Q')])['days_calc'].mean()

In [None]:
srms_2_mean = srms_2.set_index('placement_date').groupby([pd.Grouper(freq='Q')])['days_calc'].mean()

In [None]:
data = pd.concat([rhu_mean, smu_mean, srms_1_mean, srms_2_mean], axis=1)
data.columns = ['RHU', 'SMU', 'SRMS 1', 'SRMS 2']
data.plot()

In [None]:
data

## Comparing "long stays"

Solitary placements longer than 14 days should be reported to SRMS. Yet the GEO records consistently show more long stays than are reflected in SRMS.
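
The comparison that follows amounts to counting, per year, the placements longer than 14 days in each dataset and looking at the difference. A minimal sketch on made-up rows is shown here; the helper `long_stays_per_year` and the demo frames are hypothetical, and the numbers are not drawn from the records.

```python
import pandas as pd


def long_stays_per_year(df, date_col, length_col, threshold=14):
    """Count stays longer than `threshold` days, grouped by placement year."""
    long_stay = df[length_col] > threshold
    return long_stay.groupby(df[date_col].dt.year).sum()


# Illustrative rows only
geo_demo = pd.DataFrame({
    'date_in': pd.to_datetime(['2019-01-05', '2019-06-01', '2020-01-10']),
    'total_days': [15, 3, 40],
})
srms_demo = pd.DataFrame({
    'placement_date': pd.to_datetime(['2019-01-05', '2020-01-10']),
    'days_calc': [14.0, 39.0],
})

comparison = pd.concat(
    [
        long_stays_per_year(geo_demo, 'date_in', 'total_days'),
        long_stays_per_year(srms_demo, 'placement_date', 'days_calc'),
    ],
    axis=1,
    keys=['GEO demo', 'SRMS demo'],
)
# Positive values mean more long stays in the facility records than in SRMS
comparison['difference'] = comparison['GEO demo'] - comparison['SRMS demo']
```

In the toy data the 2019 difference arises purely from a stay counted as 15 days in one source and 14 in the other, which is one reason the day-counting convention needs to match before the datasets are compared.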

In [None]:
rhu['long_stay'] = rhu['total_days'] > 14

In [None]:
srms_1['long_stay'] = srms_1['days_calc'] > 14
srms_2['long_stay'] = srms_2['days_calc'] > 14

In [None]:
rhu_long = rhu.set_index('date_in').groupby(pd.Grouper(freq='AS'))['long_stay'].sum()

In [None]:
smu_long = smu.set_index('assigned_dt').groupby(pd.Grouper(freq='AS'))['long_stay'].sum()

In [None]:
srms_1_long = srms_1.set_index('placement_date').groupby(pd.Grouper(freq='AS'))['long_stay'].sum()
srms_2_long = srms_2.set_index('placement_date').groupby(pd.Grouper(freq='AS'))['long_stay'].sum()

In [None]:
data = pd.concat([rhu_long, smu_long, srms_1_long, srms_2_long], axis=1)
data.columns = ['RHU', 'SMU', 'SRMS 1', 'SRMS 2']
data.plot(kind='bar')

In [None]:
data

### Comparison of only long RHU/SMU stays versus all SRMS stays

In [None]:
data = pd.concat([rhu_long, smu_long, srms_1_annual, srms_2_annual], axis=1)
data.columns = ['RHU long', 'SMU long', 'SRMS 1 all', 'SRMS 2 all']
data.plot(kind='bar')

In [None]:
data

In [None]:
rhu['long_stay'].sum() / len(rhu)

In [None]:
smu['long_stay'].sum() / len(smu)

In [None]:
srms_1['long_stay'].sum() / len(srms_1)

In [None]:
srms_2['long_stay'].sum() / len(srms_2)

In [None]:
len(rhu)

In [None]:
len(srms_2)

# Join datasets

In [None]:
smu.columns

In [None]:
smu_ex = smu[['assigned_date', 'removed_date']].copy()
smu_ex = smu_ex.rename({'assigned_date': 'placement_date', 'removed_date': 'release_date'}, axis=1) 
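# Assign the columns directly so they take a datetime dtype (setting via .loc[:, col] can leave them as object)
smu_ex['placement_date'] = pd.to_datetime(smu_ex['placement_date'])
smu_ex['release_date'] = pd.to_datetime(smu_ex['release_date'])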
smu_ex['dataset'] = 'SMU'
# First day inclusive:
smu_ex['days_calc'] = (smu_ex['release_date'] - smu_ex['placement_date']) / np.timedelta64(1, 'D') + 1

In [None]:
rhu.columns

In [None]:
rhu_ex = rhu[['date_in', 'date_out']].copy()
rhu_ex = rhu_ex.rename({'date_in': 'placement_date', 'date_out': 'release_date'}, axis=1)
rhu_ex['dataset'] = 'RHU'
# First day inclusive:
rhu_ex['days_calc'] = (rhu_ex['release_date'] - rhu_ex['placement_date']) / np.timedelta64(1, 'D') + 1

In [None]:
srms_1.columns

In [None]:
srms_1_ex = srms_1[['placement_date', 'release_date']].copy()
srms_1_ex['dataset'] = 'SRMS 1'
# SRMS datasets are already first day inclusive:
srms_1_ex['days_calc'] = (srms_1_ex['release_date'] - srms_1_ex['placement_date']) / np.timedelta64(1, 'D')

srms_2_ex = srms_2[['placement_date', 'release_date']].copy()
srms_2_ex['dataset'] = 'SRMS 2'
# SRMS datasets are already first day inclusive:
srms_2_ex['days_calc'] = (srms_2_ex['release_date'] - srms_2_ex['placement_date']) / np.timedelta64(1, 'D')

In [None]:
df = pd.concat([smu_ex, rhu_ex, srms_1_ex, srms_2_ex], axis=0)

In [None]:
df = df.dropna()

In [None]:
# df.loc[:,'placement_date'] = pd.to_datetime(df['placement_date'])
# df.loc[:,'release_date'] = pd.to_datetime(df['release_date'])

In [None]:
# df['days_calc'] = (df['release_date'] - df['placement_date']) / np.timedelta64(1, 'D')

In [None]:
df['long_stay'] = df['days_calc'] > 14

In [None]:
table = pd.DataFrame()

In [None]:
table['total'] = df.groupby(['dataset'])['placement_date'].count()
table['min_date'] = df.groupby(['dataset'])['placement_date'].min()
table['max_date'] = df.groupby(['dataset'])['placement_date'].max()
table['avg_length'] = df.groupby(['dataset'])['days_calc'].mean()
table['med_length'] = df.groupby(['dataset'])['days_calc'].median()
table['min_length'] = df.groupby(['dataset'])['days_calc'].min()
table['max_length'] = df.groupby(['dataset'])['days_calc'].max()
# table['total_days'] = df.groupby(['dataset'])['days_calc'].sum()
table['total_long'] = df.groupby(['dataset'])['long_stay'].sum()
# table['pct_long'] = (df.groupby(['dataset'])['long_stay'].sum()) / (df.groupby(['dataset'])['placement_date'].count())
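
As an alternative to repeating the `groupby` for each column, the same per-dataset summary could be built in one pass with named aggregation. This is a sketch on made-up rows, not a replacement for the cell above; the `demo` frame and its values are illustrative, and `pct_long` mirrors the commented-out column.

```python
import pandas as pd

# Illustrative rows only, shaped like the combined `df`
demo = pd.DataFrame({
    'dataset': ['SMU', 'SMU', 'RHU'],
    'placement_date': pd.to_datetime(['2019-03-01', '2019-07-15', '2019-05-20']),
    'days_calc': [3.0, 20.0, 16.0],
})
demo['long_stay'] = demo['days_calc'] > 14

summary = demo.groupby('dataset').agg(
    total=('placement_date', 'count'),
    min_date=('placement_date', 'min'),
    max_date=('placement_date', 'max'),
    avg_length=('days_calc', 'mean'),
    med_length=('days_calc', 'median'),
    min_length=('days_calc', 'min'),
    max_length=('days_calc', 'max'),
    total_long=('long_stay', 'sum'),
)
# Share of placements longer than 14 days in each dataset
summary['pct_long'] = summary['total_long'] / summary['total']
```

Named aggregation keeps the column definitions in one place, so the summary cannot drift out of sync if the grouping key changes.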

In [None]:
table

In [None]:
table.to_csv('../output/dataset_description.csv')

In [None]:
d = df[df['dataset'] != 'SRMS 1']
d = d[d['long_stay']]
# Sort by date first: slicing a date range is only reliable on a sorted DatetimeIndex
d = d.set_index('placement_date').sort_index().loc['2015-01-03':'2020-03-16']

In [None]:
g = d.groupby([pd.Grouper(freq='AS'), 'dataset'])['long_stay'].sum().unstack()

In [None]:
g.plot(kind='bar')

In [None]:
g

In [None]:
d = df[df['dataset'] != 'SRMS 1']
d = d[d['long_stay']]
# Sort by date first: slicing a date range is only reliable on a sorted DatetimeIndex
d = d.set_index('placement_date').sort_index().loc['2015-01-03':'2020-03-16']

In [None]:
g = d.groupby([pd.Grouper(freq='AS'), 'dataset'])['days_calc'].sum().unstack()

In [None]:
g.plot(kind='bar')

In [None]:
g.sum()

In [None]:
df[df['dataset'] == 'SRMS 2'].set_index('placement_date').sort_index().loc['2015-01-03':'2020-03-16']['days_calc'].sum()