# Task-2 Data Wrangling

The team wants to understand whether the “Apprehension Site Landmark” column could provide clues about potential partnerships between local and state law enforcement and ICE.

Use this column to help identify which **jail** or **prison** individuals have been taken to, where available. Add:
* one column to the dataset indicating the name of the facility
* another showing the county where that facility is located

## 0. Set up

In [1]:
from pathlib import Path
from datetime import datetime

import numpy as np
import pandas as pd
import re

import process_data

In [2]:
arrests_filename = 'arrests-0923-0625.csv'
cwd = Path.cwd()
root = cwd.parent
data = root / "data"

In [62]:
arrests_df = process_data.read_arrests_data(data/arrests_filename)

## 1 - Apprehension Site Landmark - data exploration

#### Quick scan of data to understand different formats in the field:

In [4]:
arrests_df['apprehension_site_landmark'].value_counts().head(20)

apprehension_site_landmark
DALLAS COUNTY GENERAL AREA                             11103
MTG GENERAL AREA, NON-SPECIFIC                          9009
NDD - 26 FEDERAL PLAZA NY, NY                           5803
HARRIS COUNTY JAIL, HOUSTON, TX                         4713
LOS ANGELES COUNTY GENERAL AREA, NON-SPECIFIC           4387
HLG GENERAL AREA, NON-SPECIFIC                          3535
ATLANTA, GA                                             3442
SNA GENERAL AREA, NON-SPECIFIC                          3357
AUS GENERAL AREA, NON-SPECIFIC                          2869
FUGITIVE OPERATIONS MA                                  2827
CAP - MARICOPA COUNTY SHERIFFS OFFICE JAIL              2732
MIRAMAR ICE/ERO SUB-OFFICE                              2465
WAS GENERAL AREA, NON-SPECIFIC                          2359
MIAMI DADE COUNTY JAIL TURNER GUILFORD KNIGHT (TGK)     2334
EDN GENERAL AREA, NON-SPECIFIC                          2227
ICE ERO NEWARK                                          21

Many entries are `GENERAL AREA, NON-SPECIFIC`. These are not relevant to this task because they tell us nothing about partnerships, jails, or prisons. Filtering them out gives a better look at the kinds of values in this field:

In [14]:
arrests_df[~arrests_df['apprehension_site_landmark'].fillna('').str.contains('GENERAL AREA, NON-SPECIFIC')]['apprehension_site_landmark'].value_counts().iloc[50:70]

apprehension_site_landmark
BENTON COUNTY JAIL 287(G)                              498
LIMESTONE COUNTY DETENTION CENTER, GROESBECK, TEXAS    493
FUGITIVE OPERATIONS NY STATE                           488
FTM-LEE COUNTY JAIL                                    475
FUGITIVE OPERATIONS CA STATE                           458
TRAVIS COUNTY JAIL, AUSTIN, TEXAS - TX2270000          433
TAM-POLK COUNTY JAIL                                   432
STUART-MARTIN COUNTY JAIL, FLORIDA                     416
FEDERAL PRISON LOMPOC MEDIUM                           407
UNION COUNTY JAIL                                      404
FEDERAL PRISON LOMPOC FCI II                           403
GWINNETT COUNTY JAIL                                   402
MCAT AZ STATE                                          401
WCD GENERAL AREA                                       400
HUDSON COUNTY JAIL                                     398
FAYETTE COUNTY CORRECTIONS, KY                         396
US PENITENTIARY THOMSON      

In [13]:
arrests_df[arrests_df['apprehension_site_landmark'].fillna('').str.contains('PRISON')]['apprehension_site_landmark'].value_counts().head(20)

apprehension_site_landmark
FEDERAL PRISON LOMPOC MEDIUM                                    407
FEDERAL PRISON LOMPOC FCI II                                    403
AVENAL STATE PRISON                                             300
PA STATE PRISON                                                 151
IRONWOOD STATE PRISON BLYTHE, CA                                111
US MEDICAL CENTER FOR FEDERAL PRISONERS, MISSOURI               106
PLEASANT VALLEY STATE PRISON                                     99
VALLEY STATE PRISON                                              86
CORCORAN STATE PRISON                                            84
HIGH DESERT STATE PRISON (NDOC)                                  78
CAP-DAUPHIN COUNTY PRISON PA                                     74
LIVINGSTON PARISH PRISON                                         69
ASCENSION PARISH PRISON                                          67
EAST BATON ROUGE PARISH PRISON                                   67
CHESTER COUNTY PRISON

In [12]:
arrests_df[arrests_df['apprehension_site_landmark'].fillna('').str.contains('JAIL')]['apprehension_site_landmark'].value_counts().iloc[50:70]

apprehension_site_landmark
HAMILTON COUNTY JAIL, TN                   206
WEBER COUNTY JAIL - UT                     201
MIDDLESEX COUNTY JAIL                      201
STUART-SAINT LUCIE COUNTY JAIL, FLORIDA    199
ADAMS COUNTY JAIL                          195
DAVIS COUNTY JAIL - UT                     195
HALL COUNTY JAIL - 287(G)                  194
HAYS COUNTY JAIL, SAN MARCOS, TEXAS        193
SUMNER COUNTY JAIL, TN                     184
CAP - JEFFERSON COUNTY JAIL, AL STATE      178
ESCAMBIA COUNTY JAIL                       168
FT BEND CO JAIL, RICHMOND, TX              167
CAP - MADISON COUNTY JAIL, AL STATE        164
OAKLAND COUNTY JAIL, PONTIAC, MI           163
LUBBOCK COUNTY JAIL                        162
BOONE COUNTY JAIL, KY                      161
DENTON COUNTY JAIL                         160
GALVESTON CO JAIL, GALVESTON, TX           156
BERGEN COUNTY JAIL                         153
ORL - BREVARD COUNTY JAIL FL STATE         153
Name: count, dtype: int64

#### Observations:

* The state is often missing alongside the county. Counties in different states can share a name, so this needs to be kept in mind
* `ROCKINGHAM/HARRISONBURG REGIONAL JAIL` - the `/` is a bit annoying and may need cleaning before extracting jail names
* Some entries have a code before the jail name, e.g. `ORL - MARION COUNTY JAIL FLORIDA STATE` - do we want to capture these too? `CAP` means `Criminal Alien Program`, so other codes could be names of programs too
* Some state prisons give only a state abbreviation rather than the full name, e.g. `PA STATE PRISON` instead of Pennsylvania
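One way to handle the leading codes noted above would be to split them into their own value rather than discard them. This is a minimal sketch, assuming a code is one to four capital letters followed by a hyphen; that length cutoff is a guess and has not been checked against the full dataset. `split_prefix_code` is a hypothetical helper, not part of `process_data`.

```python
import re
from typing import Optional, Tuple

# Assumption: a program/office code is 1-4 capital letters followed by a hyphen,
# e.g. "CAP - ", "ORL - ", "TAM-". Longer prefixes such as "STUART-" are left alone.
PREFIX_CODE = re.compile(r"^(?P<code>[A-Z]{1,4})\s*-\s*(?P<rest>.+)$")


def split_prefix_code(landmark: str) -> Tuple[Optional[str], str]:
    """Return (code, remainder); code is None when no short prefix is found."""
    m = PREFIX_CODE.match(landmark)
    if m is None:
        return None, landmark
    return m.group("code"), m.group("rest")


examples = [
    "ORL - BREVARD COUNTY JAIL FL STATE",
    "TAM-POLK COUNTY JAIL",
    "HALL COUNTY JAIL - 287(G)",
    "STUART-MARTIN COUNTY JAIL, FLORIDA",
]
for text in examples:
    print(split_prefix_code(text))
```

Keeping the code as a separate value means it could later be checked against a list of known program names, if one becomes available.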

#### What about other names for prisons/jails?

In [18]:
arrests_df[arrests_df['apprehension_method']=='CAP STATE INCARCERATION']['apprehension_site_landmark'].value_counts().iloc[100:150]

apprehension_site_landmark
CAP - ORANGE COUNTY JAIL NY STATE                              16
CLEVELAND ERO OFFICE                                           16
OMAHA CORRECTIONAL CENTER, NE                                  16
NCCI GARDNER                                                   16
HARTFORD, CT                                                   15
FL WOMENS RECEPTION CENTER                                     15
S.C.I GARDEN STATE, NEW JERSEY                                 15
SBD GENERAL AREA, NON-SPECIFIC                                 15
PROVIDENCE COUNTY COURT                                        14
SNJ GENERAL AREA, NON-SPECIFIC                                 14
MASSACHUSETTS TREATMENT CENTER                                 14
KDOC, HUTCHINSON CORRECTIONAL  FACILITY, HUTCHINSON, KANSAS    14
MLN GENERAL AREA, NON-SPECIFIC                                 13
OMAHA NE NON-FUGITIVE ARREST                                   13
LOS ANGELES COUNTY GENERAL AREA, NON-SPECIFIC    

In [None]:
arrests_df[arrests_df['apprehension_method']=='CAP STATE INCARCERATION']['apprehension_site_landmark'].value_counts().iloc[50:100]

**Notes:**
* "CORRECTIONAL"/"CORRECTIONS" in a lot of names
* Ones that will be trickier are e.g.:
  * "CSP FOLSOM" - CSP = "California State Prison" - I can extract the obvious ones I can find, but that might bias the results towards the most common facilities
  * "ADOC SAFFORD" - ADOC = "Arizona Department of Corrections"
* Prisons rarely seem to include county info
* There is a mix of federal and state prisons here though - I think federal prisons are out of scope because the focus is on state and local involvement? If so, **how do we distinguish federal from state facilities?**
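One possible starting point for the federal/state question is a keyword check on the landmark text. The sketch below is illustrative only: the marker list is an assumption and is certainly not exhaustive (FCI = Federal Correctional Institution, USP = United States Penitentiary), and `is_federal` is a hypothetical helper.

```python
import re

# Assumption: any of these markers indicates a federal facility.
# The list is a starting point, not a complete inventory.
FEDERAL_MARKERS = re.compile(r"\b(?:FEDERAL|FCI|USP|US PENITENTIARY)\b")


def is_federal(landmark: str) -> bool:
    """True when the landmark text contains one of the federal markers."""
    return FEDERAL_MARKERS.search(landmark) is not None


for text in [
    "FEDERAL PRISON LOMPOC FCI II",
    "US PENITENTIARY THOMSON",
    "AVENAL STATE PRISON",
    "PA STATE PRISON",
]:
    print(text, "->", is_federal(text))
```

Anything this flags would still need a manual check, since a facility name alone does not always reveal who operates it.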

## 2 - Working out how to extract data 

#### 1. County:

In [64]:
county_expression = r"((?:\w+\s){0,3}\bCOUNTY)"

(expression worked out [here](https://regex101.com/r/OXnqFf/1))

In [20]:
prog = re.compile(county_expression)
result = prog.match('HALL COUNTY JAIL - 287(G)')

In [23]:
result.group(0)

'HALL COUNTY'
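A caveat on the test above: `match` only succeeds when the pattern matches at the very start of the string, so it works for `HALL COUNTY JAIL` but not for landmarks with a prefix. `Series.str.extract` takes the first match found anywhere in the string, which corresponds to `search`. A small sketch of the difference, using a landmark format that appears in the data:

```python
import re

county_expression = r"((?:\w+\s){0,3}\bCOUNTY)"
prog = re.compile(county_expression)

landmark = "TAM-PINELLAS COUNTY JAIL"

# match() only succeeds if the pattern matches at position 0.
anchored = prog.match(landmark)
# search() scans forward and returns the first match anywhere in the string.
scanned = prog.search(landmark)

print(anchored)          # None
print(scanned.group(0))  # PINELLAS COUNTY
```

When testing an expression by hand, `search` is therefore the closer stand-in for what the extraction step will do.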

**But** - it looks like "COUNTY" is sometimes abbreviated to "CO", e.g. `FT BEND CO JAIL`, which is "Fort Bend County Jail"

Will see how common this is:

In [32]:
arrests_df[arrests_df['apprehension_site_landmark'].fillna('').str.contains(' CO ')]['apprehension_site_landmark'].value_counts()

apprehension_site_landmark
N DIST TX LUBBOCK DIV LUBBOCK CO NON CRIM         242
FT BEND CO JAIL, RICHMOND, TX                     167
GALVESTON CO JAIL, GALVESTON, TX                  156
BRAZORIA CO JAIL, ANGLETON, TX                    124
MINNEHAHA CO JAIL, SIOUX FALLS, SD                102
                                                 ... 
N DIST TX LUBBOCK DIV LYNN CO CRIM AT LARGE         1
N DIST TX AMARILLO DIV CARSON CO CRIM AT LARGE      1
HUGHES CO JAIL, PIERRE, SD                          1
N DIST TX AMARILLO DIV PARMER CO CRIM AT LARGE      1
N DIST TX LUBBOCK DIV LYNN CO P&P AT LARGE          1
Name: count, Length: 148, dtype: int64


OK, so this is definitely something to account for, although some of these don't seem to be jails/prisons.

It looks like it's fine to replace 'CO' with 'COUNTY', and then treat them the same as the other 'COUNTY' entries:

In [29]:
arrests_df[
    (arrests_df['apprehension_site_landmark'].fillna('').str.contains(' CO ')) & 
    (arrests_df['apprehension_site_landmark'].fillna('').str.contains('JAIL'))
    ]['apprehension_site_landmark'].value_counts()

apprehension_site_landmark
FT BEND CO JAIL, RICHMOND, TX          167
GALVESTON CO JAIL, GALVESTON, TX       156
BRAZORIA CO JAIL, ANGLETON, TX         124
MINNEHAHA CO JAIL, SIOUX FALLS, SD     102
BRAZOS CO JAIL, BRYAN, TX              100
BROOKINGS CO JAIL, BROOKINGS, SD        33
DAVISON CO JAIL, MITCHELL, SD           28
JEFFERSON CO JAIL, BEAUMONT TX          27
CHAMBERS CO JAIL, ANAHUAC, TX           12
LAKE CO JAIL, MADISON, SD                9
FAYETTE CO JAIL, LA GRANGE, TX           7
BROWN CO JAIL, ABERDEEN, SD              5
NOBLES CO JAIL, WORTHINGTON, MN          5
BURLESON CO JAIL, CALDWELL, TX           5
COTTONWOOD CO JAIL, WINDOM, MN           4
ROBERTS CO JAIL, SISSETON, SD            3
DEWITT CO JAIL, CUERO, TX                3
MINER CO JAIL, HOWARD, SD                3
JEFFERSON CO JAIL, BEAUMONT, TX          2
CALHOUN CO JAIL, PORT LAVACA, TX         2
FAULK CO JAIL, FAULKTON, SD              2
TRIPP CO JAIL, WINNER, SD                1
HUGHES CO JAIL, PIERRE, SD 

In [63]:
arrests_df['apprehension_site_landmark'] = arrests_df['apprehension_site_landmark'].str.replace(' CO ', ' COUNTY ')
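The replacement above rewrites every ` CO ` surrounded by spaces, including entries that are not jails. A more conservative variant would expand `CO` only when it directly precedes `JAIL`. The sketch below assumes that is the only context of interest; `expand_co` is a hypothetical helper, and the `DENVER, CO` string is an invented example of a state code that should stay untouched.

```python
import re

# Expand CO to COUNTY only when the next word is JAIL.
# The lookahead leaves "JAIL" itself in place.
CO_BEFORE_JAIL = re.compile(r"\bCO(?=\s+JAIL\b)")


def expand_co(landmark: str) -> str:
    """Replace the abbreviation CO with COUNTY in '... CO JAIL' names."""
    return CO_BEFORE_JAIL.sub("COUNTY", landmark)


for text in [
    "FT BEND CO JAIL, RICHMOND, TX",
    "HARRIS COUNTY JAIL, HOUSTON, TX",
    "DENVER, CO",  # invented example: a trailing state code is not changed
]:
    print(expand_co(text))
```

The trade-off is that this variant leaves entries like `LUBBOCK CO NON CRIM` unchanged, which only matters if non-jail landmarks are brought into scope later.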

#### 2. Jail

The location information always seems to come before "JAIL", so the same expression structure can be used:

In [65]:
jail_expression = r"((?:\w+\s){0,3}\bJAIL)"

In [75]:
prog = re.compile(jail_expression)
result = prog.match('HALL COUNTY JAIL - 287(G)')

In [83]:
result.group(0)

'HALL COUNTY JAIL'

#### 3. Prison

Note - limitations and assumptions with this to explore in future work:
* Assumption that federal prisons are outside the scope of this project?
* Losing any location information that happens after "PRISON" - this could be improved in future work, but this appears to catch the majority of the cases

In [24]:
arrests_df[(arrests_df['apprehension_site_landmark'].fillna('').str.contains('PRISON|CORRECTION')) & 
    ~(arrests_df['apprehension_site_landmark'].fillna('').str.contains('FEDERAL'))]

Unnamed: 0,apprehension_date,apprehension_state,apprehension_aor,final_program,apprehension_method,apprehension_criminality,case_status,case_category,departed_date,departure_country,final_order_yes_no,birth_year,citizenship_country,gender,apprehension_site_landmark,unique_identifier
14,2024-07-19 03:00:00,PENNSYLVANIA,PHILADELPHIA AREA OF RESPONSIBILITY,FUGITIVE OPERATIONS,CAP LOCAL INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2024-09-06,HONDURAS,YES,2003,HONDURAS,MALE,BUTLER COUNTY PRISON,0001ffaa13721ade141dceb83a6f45813bd59c9a
15,2024-06-17 07:07:00,NEW YORK,BUFFALO AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP STATE INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2024-08-23,HONDURAS,YES,1991,HONDURAS,MALE,WENDE CORRECTIONAL FACILITY,00025d4c106dd33de639055bc8af31d438002e45
26,2024-06-27 05:41:00,CALIFORNIA,LOS ANGELES AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,LOCATED,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[11] ADMINISTRATIVE DEPORTATION / REMOVAL,2024-07-26,EL SALVADOR,YES,1986,EL SALVADOR,MALE,"IRONWOOD STATE PRISON BLYTHE, CA",00063873021bc619937ae28bb10e88e5a3557089
45,2024-12-06 06:10:00,PENNSYLVANIA,PHILADELPHIA AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,3 OTHER IMMIGRATION VIOLATOR,8-EXCLUDED/REMOVED - INADMISSIBILITY,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-04-01,ECUADOR,YES,1989,ECUADOR,MALE,LUZERNE COUNTY CORRECTIONAL FACILITY,000aad14058363fee2d30cced9da00bbd8925121
121,2025-01-21 06:27:00,DELAWARE,PHILADELPHIA AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP STATE INCARCERATION,1 CONVICTED CRIMINAL,ACTIVE,[16] REINSTATED FINAL ORDER,NaT,,NO,1982,DOMINICAN REPUBLIC,MALE,HARRY R. YOUNG CORRECTIONAL INSTITUTION,001edcde2527cdb8ed6fcd7689fbed2c79261ad4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
263090,2025-06-16 03:45:00,WASHINGTON,SEATTLE AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP STATE INCARCERATION,1 CONVICTED CRIMINAL,ACTIVE,[8B] EXCLUDABLE / INADMISSIBLE - UNDER ADJUDIC...,NaT,,NO,1996,EL SALVADOR,MALE,COYOTE RIDGE CORRECTIONS,ffe73529b0487406d2f69a05933623e7e2c8011f
263102,2024-01-09 08:45:00,CALIFORNIA,SAN FRANCISCO AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP STATE INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2024-01-17,MEXICO,YES,1993,MEXICO,MALE,PLEASANT VALLEY STATE PRISON,ffea4fa17c9a1a369bbeca4741f24a59793212e0
263152,2025-06-25 09:03:00,SOUTH CAROLINA,ATLANTA AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP STATE INCARCERATION,1 CONVICTED CRIMINAL,ACTIVE,[8B] EXCLUDABLE / INADMISSIBLE - UNDER ADJUDIC...,NaT,,NO,1997,MEXICO,MALE,SOUTH CAROLINA DEPARTMENT OF CORRECTIONS,fff3a717a3afdee2b3ed7b6a2f1053a66e5548f8
263192,2024-05-03 07:05:00,NEW YORK,BUFFALO AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP STATE INCARCERATION,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[3] DEPORTABLE - ADMINISTRATIVELY FINAL ORDER,2025-01-07,DOMINICAN REPUBLIC,YES,1996,DOMINICAN REPUBLIC,UNKNOWN,WENDE CORRECTIONAL FACILITY,fffe36c630d9ee3f0fac159e662ab8e1f979c81a


## 3 - Extracting the relevant information

In [119]:
arrests_df['county'] = arrests_df['apprehension_site_landmark'].str.extract(county_expression, expand=False)

**NOTE** - this is a bit of a clunky way to do it. If I get time I will come back to it; otherwise it is something that could be improved in the next stage of this work

In [106]:
arrests_df['jail'] = arrests_df['apprehension_site_landmark'].str.extract(jail_expression, expand=False)

In [108]:
arrests_df['facility'] = np.where(
    ~arrests_df['jail'].isna(), arrests_df['jail'], np.where(
            (arrests_df['apprehension_site_landmark'].fillna('').str.contains('PRISON|CORRECTION')) & 
            ~(arrests_df['apprehension_site_landmark'].fillna('').str.contains('FEDERAL')), arrests_df['apprehension_site_landmark'], np.nan))

In [120]:
# only want counties associated with a relevant facility
arrests_df['county'] = np.where(~arrests_df['facility'].isna(), arrests_df['county'], np.nan) 
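The nested `np.where` calls above could be folded into one helper built on `Series.where`, which may be easier to read and test. This is a sketch under the same rules as the cells above (jail match first, then non-federal `PRISON|CORRECTION` landmarks); `extract_facility_and_county` is a hypothetical name, and the sample rows are taken from landmark values shown earlier.

```python
import pandas as pd

county_expression = r"((?:\w+\s){0,3}\bCOUNTY)"
jail_expression = r"((?:\w+\s){0,3}\bJAIL)"


def extract_facility_and_county(landmarks: pd.Series) -> pd.DataFrame:
    """Return 'facility' and 'county' columns derived from landmark text."""
    text = landmarks.fillna("")

    jail = text.str.extract(jail_expression, expand=False)
    county = text.str.extract(county_expression, expand=False)

    # Non-federal prisons / correctional facilities keep the full landmark text.
    is_prison = text.str.contains("PRISON|CORRECTION") & ~text.str.contains("FEDERAL")

    # Prefer the jail match; otherwise fall back to the landmark for prisons.
    facility = jail.where(jail.notna(), landmarks.where(is_prison))

    # Only keep counties attached to a relevant facility.
    county = county.where(facility.notna())

    return pd.DataFrame({"facility": facility, "county": county})


sample = pd.Series([
    "HARRIS COUNTY JAIL, HOUSTON, TX",
    "AVENAL STATE PRISON",
    "FEDERAL PRISON LOMPOC MEDIUM",
    "DALLAS COUNTY GENERAL AREA",
    None,
])
result = extract_facility_and_county(sample)
print(result)
```

Because the helper takes a Series and returns a new DataFrame, it can be run on a handful of landmark strings without touching `arrests_df`.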

In [110]:
arrests_df[['apprehension_site_landmark','facility','county']].drop_duplicates().head(20)

Unnamed: 0,apprehension_site_landmark,facility,county
0,"HBG GENERAL AREA, NON-SPECIFIC",,
1,"HARRIS COUNTY JAIL, HOUSTON, TX",HARRIS COUNTY JAIL,HARRIS COUNTY
2,"FORT DIX EAST, NEW JERSEY",,
3,"SPM GENERAL AREA, NON-SPECIFIC",,
4,MIAMI DADE COUNTY JAIL TURNER GUILFORD KNIGHT ...,MIAMI DADE COUNTY JAIL,MIAMI DADE COUNTY
5,TAM-PINELLAS COUNTY JAIL,PINELLAS COUNTY JAIL,PINELLAS COUNTY
6,"MTG GENERAL AREA, NON-SPECIFIC",,
7,"SFR GENERAL AREA, NON-SPECIFIC",,
8,DALLAS COUNTY GENERAL AREA,,
10,"VEN GENERAL AREA, NON-SPECIFIC",,


#### Extracting city, state with county, if it exists:

Would also be good to capture city, state info with the county if it exists, because e.g. there is a `KENT COUNTY JAIL` in Grand Rapids, MI, and in Maryland

First, checking whether we need to capture this information...

In [27]:
arrests_df.groupby('county')['apprehension_state'].nunique().sort_values()

county
MONTOUR COUNTY        0
AVERY COUNTY          0
WASHITA COUNTY        0
DEAF SMITH COUNTY     0
MARIN COUNTY          0
                     ..
LEE COUNTY            9
MADISON COUNTY        9
JEFFERSON COUNTY     10
FRANKLIN COUNTY      11
WASHINGTON COUNTY    14
Name: apprehension_state, Length: 899, dtype: int64

In [29]:
arrests_df[arrests_df['apprehension_site_landmark'].fillna('').str.contains('WASHINGTON COUNTY')]['apprehension_site_landmark'].unique()

array(['WASHINGTON COUNTY JAIL',
       'WASHINGTON COUNTY CORRECTIONAL FACILITY',
       'WASHINGTON COUNTY JAIL, UT',
       'WASHINGTON COUNTY SHERIFF, ILLINOIS',
       'WASHINGTON COUNTY JAIL, TN', 'WASHINGTON COUNTY JAIL, MN',
       'A - WASHINGTON COUNTY JAIL', 'WASHINGTON COUNTY JAIL ID',
       'A - WASHINGTON COUNTY GENERAL AREA',
       'WASHINGTON COUNTY PROBATION AND PAROLE',
       'WASHINGTON COUNTY FUGITIVE OPERATIONS',
       'WASHINGTON COUNTY DETENTION CENTER', 'WASHINGTON COUNTY MD',
       'WASHINGTON COUNTY JAIL, IA', 'WASHINGTON COUNTY, MS',
       'WASHINGTON COUNTY', 'A-WASHINGTON COUNTY JAIL',
       'WASHINGTON COUNTY GENERAL AREA ID', 'WASHINGTON COUNTY JAIL, NE'],
      dtype=object)

So here we can see there are multiple county jails called "Washington County Jail", in Utah, Tennessee, Idaho, Minnesota, etc. - so we do need to make sure we extract the state information if it is there.

Note - data limitation - sometimes it just says "WASHINGTON COUNTY JAIL", without the relevant state information, so that could refer to many different places
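Where the landmark carries no state, one fallback would be to pair the county with the `apprehension_state` column. This is only a sketch on toy rows, and it rests on an unverified assumption that the facility is in the same state as the recorded apprehension. The `county_state` column name is hypothetical.

```python
import pandas as pd

# Toy rows standing in for arrests_df; the values are illustrative only.
toy = pd.DataFrame({
    "county": ["WASHINGTON COUNTY", "WASHINGTON COUNTY", None],
    "apprehension_state": ["UTAH", "TENNESSEE", "TEXAS"],
})

# Assumption: the facility is in the same state as the recorded apprehension.
# str.cat leaves the result missing when either side is missing.
toy["county_state"] = toy["county"].str.cat(toy["apprehension_state"], sep=", ")

print(toy["county_state"].tolist())
```

A combined key like this would keep same-named counties apart in a `groupby`, but any row where the two sources disagree on the state would need a closer look.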

In [70]:
city_state_expression = r"(?:,|PRISON|JAIL)(\s+\w.*$)"

(regex worked out [here](https://regex101.com/r/LHiqEk/1))

In [55]:
re.search(city_state_expression, 'KENT COUNTY JAIL GRAND RAPIDS, MI').group(1)

' GRAND RAPIDS, MI'

In [98]:
arrests_df['other_facility_loc_info'] = arrests_df['apprehension_site_landmark'].str.extract(city_state_expression)

In [111]:
arrests_df['other_facility_loc_info'] = np.where(
                        ~arrests_df['facility'].isna(), arrests_df['other_facility_loc_info'], np.nan)

It probably makes sense to combine the location info with the county name for ease of use in analysis:

In [121]:
arrests_df['county'] = arrests_df['county'].fillna('') + arrests_df['other_facility_loc_info'].fillna('')

In [122]:
arrests_df[['apprehension_site_landmark','county','facility']].drop_duplicates().iloc[50:100]

Unnamed: 0,apprehension_site_landmark,county,facility
61,"STUART-OKEECHOBEE COUNTY JAIL, FLORIDA",OKEECHOBEE COUNTY FLORIDA,OKEECHOBEE COUNTY JAIL
62,"BASTROP COUNTY JAIL, BASTROP, TEXAS","BASTROP COUNTY BASTROP, TEXAS",BASTROP COUNTY JAIL
63,FUGITIVE OPERATIONS AR ARKANSAS,,
64,"MECKLENBURG COUNTY, NC",,
65,"TAL GENERAL AREA, NON-SPECIFIC",,
66,"LOS ANGELES COUNTY GENERAL AREA, NON-SPECIFIC",,
69,N DIST TX LUBBOCK DIV LUBBOCK COUNTY NON CRIM,,
71,"TEXAS DEPT OF CRIMINAL JUSTICE, WALKER CO",,
72,FEDERAL PRISON LOMPOC MEDIUM,,
74,"SEVIER COUNTY JAIL, TN",SEVIER COUNTY TN,SEVIER COUNTY JAIL


In [88]:
# drop() returns a new DataFrame, so assign the result back before saving
arrests_df = arrests_df.drop(columns=['jail', 'other_facility_loc_info'])

arrests_df.to_csv(data/'arrests_with_facility_county.csv', index=False)

## Limitations and Next Steps:


#### Limitations:
The Apprehension Site Landmark field uses many different naming formats, and cleaning all of the abbreviations and variants would take more detailed work. I think my approach currently captures the majority of cases, but it misses a number of facilities - e.g. I have not yet dealt with abbreviations, such as CSP == CALIFORNIA STATE PRISON.


#### Next steps for this work
* Add in additional facilities of interest: for example, are we also interested in police departments or sheriff's offices?
* Deal with abbreviations:
   * Expand state abbreviations (e.g. TN -> TENNESSEE)
   * Identify abbreviated prison and jail names (e.g. capture CSP FOLSOM as CALIFORNIA STATE PRISON FOLSOM)
* There is other potentially important information in Apprehension Site Landmark that could be extracted to add to the investigation:
   * Potentially informative codes, e.g. CAP, or TX1080000 which [seems to refer to specific Texas arresting agency](https://www.dps.texas.gov/administration/crime_records/docs/cjis/arrestingAgencyORIs.xls)
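The abbreviation step listed above could start from small lookup tables. The sketch below is hypothetical: the dictionaries hold only a few entries mentioned in this notebook and would need to be filled out, and `expand_abbreviations` assumes a facility abbreviation is the first word and a state code is a two-letter word at the very end.

```python
import re

# Illustrative lookup tables only; a real run would need the full lists.
STATE_ABBREVIATIONS = {"TN": "TENNESSEE", "PA": "PENNSYLVANIA", "UT": "UTAH"}
FACILITY_ABBREVIATIONS = {
    "CSP": "CALIFORNIA STATE PRISON",
    "ADOC": "ARIZONA DEPARTMENT OF CORRECTIONS",
}

# A standalone two-letter word at the end of the string, e.g. ", TN" or " - UT".
TRAILING_STATE = re.compile(r"\b([A-Z]{2})$")


def expand_abbreviations(landmark: str) -> str:
    """Expand a leading facility abbreviation and a trailing state code."""
    first, _, rest = landmark.partition(" ")
    if first in FACILITY_ABBREVIATIONS:
        landmark = f"{FACILITY_ABBREVIATIONS[first]} {rest}".strip()
    # Unknown two-letter endings are returned unchanged.
    return TRAILING_STATE.sub(
        lambda m: STATE_ABBREVIATIONS.get(m.group(1), m.group(1)), landmark
    )


for text in ["CSP FOLSOM", "ADOC SAFFORD", "SEVIER COUNTY JAIL, TN", "WEBER COUNTY JAIL - UT"]:
    print(expand_abbreviations(text))
```

Leading state codes such as `PA STATE PRISON` are not handled here and would need a separate rule.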
