# Task 4 - Analysis

## Question 1
From Task 3 Step 1, analyze how the national background of those arrested changed before and after the start of the second Trump administration, broken down by ICE Area of Responsibility (AOR).


### 0- Setup

In [1]:
from pathlib import Path
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
import altair as alt

import process_data

In [2]:
pd.set_option("display.max_rows", 300)
pd.set_option("display.max_columns", 200)

In [3]:
arrests_filename = 'arrests_with_facility_county.csv'
cwd = Path.cwd()
root = cwd.parent
data = root / "data"

In [13]:
arrests_df = pd.read_csv(data/arrests_filename, parse_dates=['apprehension_date','departed_date'])

In [14]:
arrests_df.head()

Unnamed: 0,apprehension_date,apprehension_state,apprehension_aor,final_program,apprehension_method,apprehension_criminality,case_status,case_category,departed_date,departure_country,final_order_yes_no,birth_year,citizenship_country,gender,apprehension_site_landmark,unique_identifier,county,jail,prison,facility,other_facility_loc_info
0,2024-08-07 09:43:00,VIRGINIA,WASHINGTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,NON-CUSTODIAL ARREST,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2024-08-19,HONDURAS,YES,1981,HONDURAS,MALE,"HBG GENERAL AREA, NON-SPECIFIC",0000b34edd657d516c02b13a7c352d62d0effcb6,,,,,
1,2024-10-19 08:33:00,TEXAS,HOUSTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[16] REINSTATED FINAL ORDER,2024-10-22,MEXICO,YES,1984,MEXICO,MALE,"HARRIS COUNTY JAIL, HOUSTON, TX",0000ba6e459998a6046d185d82cf4349de1479d0,"HARRIS COUNTY HOUSTON, TX",HARRIS COUNTY JAIL,,HARRIS COUNTY JAIL,"HOUSTON, TX"
2,2025-04-15 10:08:00,NEW JERSEY,NEWARK AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP FEDERAL INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2025-06-10,DOMINICAN REPUBLIC,YES,1988,DOMINICAN REPUBLIC,MALE,"FORT DIX EAST, NEW JERSEY",0000c3d23fb0e444864559575900d410c4e8490f,,,,,
3,2025-06-03 09:20:00,MINNESOTA,ST. PAUL AREA OF RESPONSIBILITY,FUGITIVE OPERATIONS,NON-CUSTODIAL ARREST,3 OTHER IMMIGRATION VIOLATOR,ACTIVE,[8G] EXPEDITED REMOVAL - CREDIBLE FEAR REFERRAL,NaT,,YES,1985,COLOMBIA,FEMALE,"SPM GENERAL AREA, NON-SPECIFIC",0000d3dbf8033b5f209f6547ffee5b84feb4f599,,,,,
4,2025-01-21 05:41:00,,MIAMI AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,2 PENDING CRIMINAL CHARGES,3-VOLUNTARY DEPARTURE CONFIRMED,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-02-01,MEXICO,YES,1983,MEXICO,MALE,MIAMI DADE COUNTY JAIL TURNER GUILFORD KNIGHT ...,000104d730bf021326c6dc0deb3dd575304136b5,,MIAMI DADE COUNTY JAIL,,MIAMI DADE COUNTY JAIL,


**Note** - saving df as csv (from task 2) has also messed up times in the datetime - must be something in the excel format. Can look to change in future if we need times

### 1- Data exploration and processing

**Key info:**
* Trump was inaugerated on 20th January 2025

**Dates approach:**
* Will likely want to look at month totals to smooth out -> create month var
* More data pre Trump, so will want to do some analysis looking at same number of days/months pre and post (for overall numbers/proportions etc.)

**National background approach:**
* Have the number of citizenship countries changed?
* What about the specific countries - what has changed about them?
   * Add in continent?
   * Any other country meta data to add in?
* Is this different depending on ICE Area of Responsibility (AOR)?

##### ICE Area of Responsibilitiy (AOR)

In [7]:
arrests_df['apprehension_aor'].value_counts(dropna=False)

apprehension_aor
MIAMI AREA OF RESPONSIBILITY             26925
NEW ORLEANS AREA OF RESPONSIBILITY       23598
DALLAS AREA OF RESPONSIBILITY            23185
HOUSTON AREA OF RESPONSIBILITY           21785
CHICAGO AREA OF RESPONSIBILITY           18370
ATLANTA AREA OF RESPONSIBILITY           16596
SAN ANTONIO AREA OF RESPONSIBILITY       15161
HARLINGEN AREA OF RESPONSIBILITY         11607
LOS ANGELES AREA OF RESPONSIBILITY       11575
NEWARK AREA OF RESPONSIBILITY             8506
PHOENIX AREA OF RESPONSIBILITY            8351
NEW YORK CITY AREA OF RESPONSIBILITY      8101
SALT LAKE CITY AREA OF RESPONSIBILITY     7664
BOSTON AREA OF RESPONSIBILITY             7148
WASHINGTON AREA OF RESPONSIBILITY         6955
PHILADELPHIA AREA OF RESPONSIBILITY       6240
ST. PAUL AREA OF RESPONSIBILITY           6124
NaN                                       5903
DETROIT AREA OF RESPONSIBILITY            5400
SAN FRANCISCO AREA OF RESPONSIBILITY      5277
DENVER AREA OF RESPONSIBILITY             4

In [7]:
arrests_df['apprehension_aor'].fillna('MISSING', inplace = True) 
# done because I don't want to lose these from the analysis, there may be a reason that AOR is missing

##### Citizenship country

In [9]:
arrests_df['citizenship_country'].isna().sum()

0

In [10]:
arrests_df['citizenship_country'].value_counts().reset_index().head(10)

Unnamed: 0,citizenship_country,count
0,MEXICO,101036
1,GUATEMALA,32638
2,HONDURAS,29628
3,VENEZUELA,15238
4,NICARAGUA,14688
5,EL SALVADOR,12041
6,COLOMBIA,9943
7,ECUADOR,9339
8,CUBA,6205
9,DOMINICAN REPUBLIC,5065


##### Adding continent info

**Note** this is a clunky, quick way to do this just to see whether it is something to include in the analysis. In the next steps of this analysis it would be good to bring in additional country information, and think about what geographic area is interesting (e.g. instead of contintent we could use slightly corser geographic areas that are meaningful, such as "Western Asia", "Central America" etc.)

In [8]:
continent_dict = {
    'NORTH AND CENTRAL AMERICA': [
        'TURKS AND CAICOS ISLANDS','TRINIDAD AND TOBAGO','ST. VINCENT-GRENADINES', 'ST. LUCIA','ST. KITTS-NEVIS',
        'SINT EUSTATIUS', 'SINT MAARTEN(DUTCH)', 'PANAMA','NICARAGUA','NETHERLANDS ANTILLES', 'MONTSERRAT','MEXICO',
        'JAMAICA','HAITI', 'HONDURAS','GRENADA', 'GUADELOUPE', 'GUATEMALA','CURACAO','CANADA', 'ANGUILLA','ANTIGUA-BARBUDA',
        'BAHAMAS', 'BARBADOS', 'BELIZE','BOLIVIA','COSTA RICA','CUBA','DOMINICA', 'DOMINICAN REPUBLIC','EL SALVADOR','BRITISH VIRGIN ISLANDS'
    ],
     'SOUTH AMERICA': [
        'VENEZUELA','URUGUAY','SURINAME','PERU','PARAGUAY','GUYANA','FRENCH GUIANA','ECUADOR','COLOMBIA', 'CHILE','BRAZIL','ARGENTINA'
    ],
     'EUROPE': [
        'YUGOSLAVIA','USSR','UNITED KINGDOM','UKRAINE','SWITZERLAND','SWEDEN','SPAIN','SLOVENIA','SLOVAKIA', 'SERBIA AND MONTENEGRO','SERBIA',
        'RUSSIA','ROMANIA','PORTUGAL','POLAND', 'NORWAY','NORTH MACEDONIA','NETHERLANDS','MONTENEGRO','MOLDOVA','MALTA','LITHUANIA','LATVIA',
        'KOSOVO','ITALY','IRELAND','ICELAND','HUNGARY', 'GREECE','GERMANY','FRANCE','FINLAND','ESTONIA','DENMARK','CZECHOSLOVAKIA','CZECH REPUBLIC',
        'CYPRUS','CROATIA','BULGARIA','BOSNIA-HERZEGOVINA','BELGIUM','BELARUS','AUSTRIA','ALBANIA', 'ANDORRA'
    ],
     'AFRICA': [
        'ZIMBABWE','ZAMBIA','UGANDA','TUNISIA','TOGO','TANZANIA','SUDAN','SOUTH SUDAN','SOUTH AFRICA','SOMALIA','SIERRA LEONE','SENEGAL',
        'SAO TOME AND PRINCIPE','RWANDA','NIGERIA','NIGER', 'NAMIBIA','MOZAMBIQUE','MOROCCO','MAURITIUS','MAURITANIA','MALI','LIBYA',
        'LIBERIA','KENYA','IVORY COAST','GUINEA', 'GUINEA-BISSAU','GHANA','GAMBIA','GABON','EGYPT','ETHIOPIA','ESWATINI','ERITREA','EQUATORIAL GUINEA',
        'DJIBOUTI','DEM REP OF THE CONGO','CONGO','CHAD','CENTRAL AFRICAN REPUBLIC','CAPE VERDE','CAMEROON','BURUNDI','BURKINA FASO','BOTSWANA',
        'BENIN','ANGOLA','ALGERIA'],
     'ASIA': [
        'YEMEN','VIETNAM','UZBEKISTAN','UNITED ARAB EMIRATES','TURKMENISTAN','TURKIYE','THAILAND','TAJIKISTAN','TAIWAN','SYRIA','SRI LANKA',
        'SOUTH KOREA','SAUDI ARABIA','PHILIPPINES','PAKISTAN','OMAN','NEPAL','MONGOLIA', 'MALAYSIA','MALAWI','LEBANON','LAOS','KYRGYZSTAN','KUWAIT',
        'KOREA','KAZAKHSTAN','JORDAN','JAPAN','ISRAEL','IRAQ','IRAN','INDONESIA', 'INDIA','HONG KONG', 'GEORGIA','EAST TIMOR','CHINA, PEOPLES REPUBLIC OF',
        'CAMBODIA','BURMA','BRUNEI','BHUTAN','BANGLADESH','BAHRAIN','AZERBAIJAN','ARMENIA', 'AFGHANISTAN'
    ],
     'OCEANIA': [
        'TONGA','SAMOA','PAPUA NEW GUINEA','PALAU','NEW ZEALAND','MICRONESIA, FEDERATED STATES OF','MARSHALL ISLANDS','FRENCH POLYNESIA',
        'FIJI', 'AUSTRALIA'
    ]}

In [9]:
country_continent_lookup = {}

for cont in continent_dict.keys():
    for coun in continent_dict[cont]:
        country_continent_lookup[coun] = cont
        

In [26]:
arrests_df['citizenship_continent'] = arrests_df['citizenship_country'].map(country_continent_lookup)

##### Date vars


In [11]:
trump_inaugaration_date = datetime.strptime('2025-01-20','%Y-%m-%d').date()

In [347]:
arrests_df['trump_bool'] = np.where(
    arrests_df['apprehension_date'].dt.date >= trump_inaugaration_date, "trump", "pre_trump")

In [17]:
arrests_df['apprehension_month_year'] = arrests_df['apprehension_date'].dt.to_period('M')
arrests_df['apprehension_day'] = arrests_df['apprehension_date'].dt.date

**Note** - the below is a rough approach, taking the exact same number of days pre and post the trump inaugeration. In future steps of this research I would check this makes sense, and whether we need to take into account seasonality, day of week, holidays etc. 

In [349]:
number_days_trump_administration = (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days
start_date = trump_inaugaration_date - timedelta(number_days_trump_administration)

eq_days_pre_post_trump = arrests_df[arrests_df['apprehension_day'] >= start_date]

In [337]:
# check equal:

(trump_inaugaration_date - start_date).days == (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days

True

In [350]:
eq_days_pre_post_trump['trump_bool'].value_counts()

trump_bool
trump        111567
pre_trump     44648
Name: count, dtype: int64

### 2- Analysis

In [351]:
def get_summary_table(df, groupby_col='citizenship_country'):
    num_arrests_by_cc = df.groupby('trump_bool')[groupby_col].value_counts().reset_index().rename(columns={0:'number_arrests'})
    
    pivot_num_by_cc = num_arrests_by_cc.pivot(index=groupby_col, values='count', columns='trump_bool')

    pivot_num_by_cc['fact_increase'] = pivot_num_by_cc['trump'] / pivot_num_by_cc['pre_trump']
    pivot_num_by_cc['num_increase'] = pivot_num_by_cc['trump'] - pivot_num_by_cc['pre_trump']

    for c in ['pre_trump','trump']:
        pivot_num_by_cc[f'{c}_perc_arrests'] = (pivot_num_by_cc[c] / pivot_num_by_cc[c].sum()) * 100

    return pivot_num_by_cc

#### AOR:

First looking to see which AORs have had the largest jump in numbers of arrests

In [352]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='fact_increase', ascending=False)

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
SAN DIEGO AREA OF RESPONSIBILITY,241,1303,5.406639,1062,0.545126,1.173641
BOSTON AREA OF RESPONSIBILITY,879,3957,4.501706,3078,1.988238,3.564158
WASHINGTON AREA OF RESPONSIBILITY,1023,4068,3.97654,3045,2.313956,3.664139
DENVER AREA OF RESPONSIBILITY,661,2242,3.391831,1581,1.495137,2.01942
BUFFALO AREA OF RESPONSIBILITY,316,1057,3.344937,741,0.71477,0.952064
DETROIT AREA OF RESPONSIBILITY,763,2503,3.280472,1740,1.725854,2.254508
PHILADELPHIA AREA OF RESPONSIBILITY,1004,3214,3.201195,2210,2.270979,2.894922
HQ AREA OF RESPONSIBILITY,7,22,3.142857,15,0.015834,0.019816
MIAMI AREA OF RESPONSIBILITY,4291,13345,3.109998,9054,9.705949,12.02014
ATLANTA AREA OF RESPONSIBILITY,2696,8330,3.089763,5634,6.098168,7.503017


In [353]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='num_increase', ascending=False)

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
MIAMI AREA OF RESPONSIBILITY,4291,13345,3.109998,9054,9.705949,12.02014
ATLANTA AREA OF RESPONSIBILITY,2696,8330,3.089763,5634,6.098168,7.503017
DALLAS AREA OF RESPONSIBILITY,4273,9159,2.143459,4886,9.665234,8.249716
NEW ORLEANS AREA OF RESPONSIBILITY,4725,9494,2.009312,4769,10.687627,8.551458
HOUSTON AREA OF RESPONSIBILITY,4425,8419,1.902599,3994,10.009048,7.583182
SAN ANTONIO AREA OF RESPONSIBILITY,2479,5946,2.398548,3467,5.607329,5.355695
LOS ANGELES AREA OF RESPONSIBILITY,1705,5106,2.994721,3401,3.856594,4.599088
CHICAGO AREA OF RESPONSIBILITY,3385,6657,1.966617,3272,7.656639,5.996109
BOSTON AREA OF RESPONSIBILITY,879,3957,4.501706,3078,1.988238,3.564158
WASHINGTON AREA OF RESPONSIBILITY,1023,4068,3.97654,3045,2.313956,3.664139


**Observations**
* Interesting to see the differnce in AORs with the largest increase in number of arrests vs proportion of arrests - from eyeballing, it looks like mainly cities are the ones with largest prop increase (San Diego, Boston, Washington (need to check if this means DC), Denver, Buffalo (NY), Detroit, Philly
* And then ones with the largest actual increase are ones which already had quite high numbers - Miami, Atlanta, Dallas, New Orleans, Houston, San Antonio, LA, Chicago -> NB, some of this is probably to do with population (e.g. LA and Chicago); would be good to add in populations (if mapping between county to aor is available)

##### Citizenship country

In [354]:
eq_days_pre_post_trump.groupby('trump_bool')['citizenship_country'].nunique()

trump_bool
pre_trump    168
trump        181
Name: citizenship_country, dtype: int64

In [358]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_country').sort_values(by='num_increase', ascending=False)

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
citizenship_country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
MEXICO,19400.0,41786.0,2.153918,22386.0,43.450994,37.453727
GUATEMALA,5370.0,15627.0,2.910056,10257.0,12.027414,14.00683
HONDURAS,5332.0,12877.0,2.415041,7545.0,11.942304,11.541943
VENEZUELA,1628.0,7983.0,4.903563,6355.0,3.6463,7.155342
EL SALVADOR,2088.0,5367.0,2.570402,3279.0,4.676581,4.810562
COLOMBIA,1447.0,3548.0,2.45197,2101.0,3.240907,3.180152
CUBA,737.0,2834.0,3.845319,2097.0,1.65069,2.540178
NICARAGUA,1986.0,3811.0,1.918933,1825.0,4.448128,3.415885
ECUADOR,1309.0,2963.0,2.26356,1654.0,2.931822,2.655803
BRAZIL,564.0,1627.0,2.884752,1063.0,1.263214,1.458317


In [359]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_continent')

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
citizenship_continent,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AFRICA,559,1738,3.109123,1179,1.252016,1.557808
ASIA,1181,4294,3.635902,3113,2.645135,3.848808
EUROPE,578,1396,2.415225,818,1.294571,1.251266
NORTH AND CENTRAL AMERICA,36597,86135,2.353608,49538,81.967837,77.204729
OCEANIA,85,152,1.788235,67,0.190378,0.136241
SOUTH AMERICA,5648,17852,3.160765,12204,12.650063,16.001147


In [360]:
aor_summary_country = eq_days_pre_post_trump.groupby('apprehension_aor').apply(get_summary_table)

In [361]:
aor_summary_country.sort_values(by='num_increase', ascending=False).head(20)

Unnamed: 0_level_0,trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ATLANTA AREA OF RESPONSIBILITY,MEXICO,1210.0,3356.0,2.773554,2146.0,44.881306,40.288115
MIAMI AREA OF RESPONSIBILITY,GUATEMALA,836.0,2875.0,3.438995,2039.0,19.482638,21.543649
DALLAS AREA OF RESPONSIBILITY,MEXICO,2545.0,4550.0,1.787819,2005.0,59.560028,49.677912
HARLINGEN AREA OF RESPONSIBILITY,MEXICO,1406.0,3196.0,2.273115,1790.0,65.243619,81.885729
MIAMI AREA OF RESPONSIBILITY,MEXICO,1025.0,2753.0,2.685854,1728.0,23.887206,20.629449
LOS ANGELES AREA OF RESPONSIBILITY,MEXICO,928.0,2573.0,2.772629,1645.0,54.428152,50.391696
NEW ORLEANS AREA OF RESPONSIBILITY,MEXICO,1944.0,3542.0,1.822016,1598.0,41.142857,37.307773
HOUSTON AREA OF RESPONSIBILITY,MEXICO,2148.0,3636.0,1.692737,1488.0,48.542373,43.188027
CHICAGO AREA OF RESPONSIBILITY,MEXICO,1437.0,2709.0,1.885177,1272.0,42.451994,40.694006
PHOENIX AREA OF RESPONSIBILITY,MEXICO,1285.0,2454.0,1.909728,1169.0,76.261128,69.479049


In [362]:
aor_summary_country.sort_values(by='fact_increase', ascending=False).head(20)

Unnamed: 0_level_0,trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
WASHINGTON AREA OF RESPONSIBILITY,INDIA,1.0,36.0,36.0,35.0,0.097752,0.884956
NEW YORK CITY AREA OF RESPONSIBILITY,HAITI,1.0,36.0,36.0,35.0,0.124224,1.483924
WASHINGTON AREA OF RESPONSIBILITY,AFGHANISTAN,1.0,31.0,31.0,30.0,0.097752,0.762045
WASHINGTON AREA OF RESPONSIBILITY,EGYPT,1.0,30.0,30.0,29.0,0.097752,0.737463
WASHINGTON AREA OF RESPONSIBILITY,CUBA,2.0,53.0,26.5,51.0,0.195503,1.302852
SEATTLE AREA OF RESPONSIBILITY,PERU,1.0,24.0,24.0,23.0,0.228833,1.848998
LOS ANGELES AREA OF RESPONSIBILITY,IRAN,3.0,68.0,22.666667,65.0,0.175953,1.331767
WASHINGTON AREA OF RESPONSIBILITY,"CHINA, PEOPLES REPUBLIC OF",1.0,21.0,21.0,20.0,0.097752,0.516224
DETROIT AREA OF RESPONSIBILITY,SENEGAL,1.0,21.0,21.0,20.0,0.131062,0.838993
SAN FRANCISCO AREA OF RESPONSIBILITY,VENEZUELA,3.0,62.0,20.666667,59.0,0.344037,2.859779


In [363]:
aor_summary_country.groupby(level=0, group_keys=False)['num_increase'].nlargest(5)

apprehension_aor                       citizenship_country       
ATLANTA AREA OF RESPONSIBILITY         MEXICO                        2146.0
                                       GUATEMALA                     1009.0
                                       HONDURAS                       783.0
                                       VENEZUELA                      584.0
                                       COLOMBIA                       230.0
BALTIMORE AREA OF RESPONSIBILITY       GUATEMALA                      273.0
                                       EL SALVADOR                    225.0
                                       HONDURAS                       147.0
                                       MEXICO                         137.0
                                       VENEZUELA                       59.0
BOSTON AREA OF RESPONSIBILITY          BRAZIL                         690.0
                                       GUATEMALA                      616.0
                      

Creating a plot to look at the country changes for each AOR:

In [374]:
# so the countries in plots are in the same order, for ease of comparison:
country_order = list(aor_summary_country.reset_index().groupby('citizenship_country')['trump'].sum().sort_values(ascending=False).index)

def country_pre_post_plot(aor_str, interactive=False):
    aor_country_plot_df = aor_summary_country[['pre_trump','trump']].stack().reset_index().rename(
        columns={0:'num_arrests'})
    
    chart = alt.layer(
            data=aor_country_plot_df[aor_country_plot_df['apprehension_aor']==aor_str]
        )
    
    chart += alt.Chart().mark_line(color='#9E9EA3').encode(
            x=alt.X('num_arrests:Q'),
            y=alt.Y('citizenship_country:N', sort=country_order),
            detail='citizenship_country:N',
        )
        # Add points for life expectancy in 1955 & 2000
    chart += alt.Chart().mark_point(
            size=100,
            opacity=0.5,
            filled=True
        ).encode(
            x=alt.X('num_arrests:Q', title="Number arrests"),
            y=alt.Y('citizenship_country:N', title="", sort=country_order),
            color=alt.Color('trump_bool',
                scale=alt.Scale(
                    domain=['pre_trump','trump'],
                    range=['#2a6ca8', '#bf1515']
                )
            )
        ).properties(width=800, title=aor_str)
    if interactive:
        return chart.interactive()
    return chart
    
    # use chart.interactive() if you want to look into what is happening for the countries with smaller n's. as you can then scroll in

In [368]:
aor_summary_country.stack().reset_index().rename(
        columns={0:'num_arrests'})

Unnamed: 0,apprehension_aor,citizenship_country,trump_bool,num_arrests
0,ATLANTA AREA OF RESPONSIBILITY,AFGHANISTAN,trump,1.000000
1,ATLANTA AREA OF RESPONSIBILITY,AFGHANISTAN,trump_perc_arrests,0.012005
2,ATLANTA AREA OF RESPONSIBILITY,ALBANIA,trump,1.000000
3,ATLANTA AREA OF RESPONSIBILITY,ALBANIA,trump_perc_arrests,0.012005
4,ATLANTA AREA OF RESPONSIBILITY,ANGOLA,trump,1.000000
...,...,...,...,...
8633,WASHINGTON AREA OF RESPONSIBILITY,VENEZUELA,trump_perc_arrests,8.210423
8634,WASHINGTON AREA OF RESPONSIBILITY,VIETNAM,trump,23.000000
8635,WASHINGTON AREA OF RESPONSIBILITY,VIETNAM,trump_perc_arrests,0.565388
8636,WASHINGTON AREA OF RESPONSIBILITY,YEMEN,trump,1.000000


In [375]:
country_pre_post_plot('NEW YORK CITY AREA OF RESPONSIBILITY', True)

In [376]:
country_pre_post_plot('BOSTON AREA OF RESPONSIBILITY')

In [377]:
country_pre_post_plot('ATLANTA AREA OF RESPONSIBILITY', interactive=True)

In [396]:
country_pre_post_plot('DALLAS AREA OF RESPONSIBILITY', interactive=True)

##### Observations:

* Seems like there has been a big increase across most countries, but in particular there are some countries which did not have many arrests at all before that have had huge proportional increases.
* Looks like the AORs with the biggest prop increases for these countries with lower numbers of arrests overall are the same AORs which saw the biggest overall proportion increase -> could this be an indication that these areas are arresting people from more countries?

In [380]:
num_countries_pre_post_by_aor = eq_days_pre_post_trump.groupby(['apprehension_aor', 'trump_bool'])['citizenship_country'].nunique().unstack()
num_countries_pre_post_by_aor.columns = [str(c) for c in num_countries_pre_post_by_aor.columns]
num_countries_pre_post_by_aor['delta'] = num_countries_pre_post_by_aor['trump'] - num_countries_pre_post_by_aor['pre_trump']

In [381]:
num_countries_pre_post_by_aor.sort_values(by='delta', ascending=False)

Unnamed: 0_level_0,pre_trump,trump,delta
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
NEW YORK CITY AREA OF RESPONSIBILITY,50,88,38
HOUSTON AREA OF RESPONSIBILITY,47,84,37
ATLANTA AREA OF RESPONSIBILITY,60,96,36
BOSTON AREA OF RESPONSIBILITY,62,98,36
DALLAS AREA OF RESPONSIBILITY,64,99,35
MIAMI AREA OF RESPONSIBILITY,85,119,34
BUFFALO AREA OF RESPONSIBILITY,44,78,34
BALTIMORE AREA OF RESPONSIBILITY,40,71,31
DENVER AREA OF RESPONSIBILITY,29,58,29
SEATTLE AREA OF RESPONSIBILITY,40,69,29


**What does this mean** In trump's administration the number of countries people are being arrested from has increased in all AORs - this lends to idea of "indiscriminately"

#### Change over time (instead of just pre post)

Overall stats of pre and post trump give us some indications of patterns, but we can't see the true picture of what is happening. Next I am going to look at change over time.

In [382]:
def aor_month_cc(df, top_countries=None, aor_str=None, n=5):
    if aor_str is None:
        aor_str = df['apprehension_aor'].unique().item()
    aor_group = df[df['apprehension_aor'] == aor_str]
    aor_group_cc = aor_group[['apprehension_month_year','citizenship_country']].value_counts().reset_index().rename(columns={'count':'num_arrests'})
    if top_countries is None: # done so I can feed in top countries if I want them to be consisten across aors, rather than unique to the top for that aor
        top_countries = aor_group_cc.groupby('citizenship_country')['num_arrests'].sum().sort_values(ascending=False).head(n).index
    aor_group_cc['summary_country'] = np.where(
            aor_group_cc['citizenship_country'].isin(top_countries), aor_group_cc['citizenship_country'], 'OTHER COUNTRY')
    return aor_group_cc

In [383]:
COUNTRY_ORDER = ['EL SALVADOR','GUATEMALA','HONDURAS','MEXICO','VENEZUELA','OTHER COUNTRY']

def plot_area_chart(df,aor_str,top_countries=None):
    aor_summary = aor_month_cc(df, top_countries, aor_str)
    aor_summary['apprehension_month_year'] = aor_summary['apprehension_month_year'].dt.strftime('%Y-%m')
    return alt.Chart(aor_summary, title=aor_str).mark_bar().encode(
        x='apprehension_month_year:O',
        y='sum(num_arrests)',
        color=alt.Color('summary_country', sort=COUNTRY_ORDER)).properties(width=800, height=400)
    

In [385]:
top_countries_overall = lambda n: arrests_df[arrests_df['trump_bool']=='trump']['citizenship_country'].value_counts().head(n).index

In [386]:
plot_area_chart(arrests_df, 'WASHINGTON AREA OF RESPONSIBILITY', top_countries_overall(5))

In [391]:
plot_area_chart(arrests_df, 'NEW YORK CITY AREA OF RESPONSIBILITY', top_countries_overall(5)) 

In [395]:
plot_area_chart(arrests_df, 'DALLAS AREA OF RESPONSIBILITY', top_countries_overall(5)) 

In [401]:
plot_area_chart(arrests_df, 'BOSTON AREA OF RESPONSIBILITY', top_countries_overall(5))

Other ways to look at this would be:
* Number of citizenship countries by month by AOR

In [64]:
num_countries_over_time = arrests_df.groupby(['apprehension_aor','apprehension_month_year'])['citizenship_country'].nunique().reset_index()

num_countries_over_time['apprehension_month_year'] = num_countries_over_time['apprehension_month_year'].dt.strftime('%Y-%m')

In [65]:
num_countries_over_time['trump'] = '2025-01'

In [331]:
chart = alt.Chart(num_countries_over_time).mark_line().encode(
    x=alt.X('apprehension_month_year:O', title=None, axis=alt.Axis( grid = False, values=['2025-01'])),
    y=alt.Y('citizenship_country', title='Number of nationalities'),
    color=alt.Color('apprehension_aor', legend=None),
    tooltip=['apprehension_aor']
).properties(
    width=200,
    height=200,
).facet(
    facet=alt.Facet('apprehension_aor', title=None),
    columns=4,
        title={
        'text':'Some AORs have seen huge jumps in number of nationalities they are arresting under Trump',
        'subtitle':['Change in monthly number of unique citizenship countries for each ICE AOR', '']}
).resolve_axis(
    x='independent')

chart.save("aor_num_nationalities_monthly.svg")

chart

#### Next steps for this analysis:

* Check pre-post equal number of days makes sense and adjust if needed to account for holidays, weekends, seasonality etc.
* Significance tests:
  * If the number of citizenships is an area we want to go down, we can identify the areas that have seen a significant shift post Trump (rather than just gradually following the pre trump trend of increasing) - Regression Discontinuity or Interrupted Time Series (probably Interrupted time series because close to the inaugeration could be some overlaps of cases in progress etc. (need to think about how quickly we would expect to see the change, and what a reasonable comparison is)
  * Which country increases in number of arrests are significant, and for which AORs? - paired t-tests

## Question 2
Select one question from Task 3 Steps 2-4 and complete the analysis.

**Question:** are the AORs that have seen the big spike in "other countries" also the places that we are seeing local enforcement partnerships (ie this looks like "rounding up immigrants indiscriminantly")

In [266]:
arrests_df['apprehension_method'].value_counts()

apprehension_method
CAP LOCAL INCARCERATION                        112087
NON-CUSTODIAL ARREST                            56902
LOCATED                                         31394
CAP FEDERAL INCARCERATION                       23408
CAP STATE INCARCERATION                         10417
OTHER EFFORTS                                    9014
ERO REPROCESSED ARREST                           8875
287(G) PROGRAM                                   6313
PROBATION AND PAROLE                             3833
LAW ENFORCEMENT AGENCY RESPONSE UNIT              845
OTHER TASK FORCE                                  505
PATROL BORDER                                     477
OTHER AGENCY (TURNED OVER TO INS)                 444
WORKSITE ENFORCEMENT                              252
INSPECTIONS                                       127
ANTI-SMUGGLING                                     84
TRAFFIC CHECK                                      62
ORGANIZED CRIME DRUG ENFORCEMENT TASK FORCE        52
PATROL I

In [408]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_method').sort_values(by='num_increase', ascending=False)

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_method,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
CAP LOCAL INCARCERATION,23345.0,45886.0,1.96556,22541.0,52.286777,41.128649
NON-CUSTODIAL ARREST,6309.0,27071.0,4.290854,20762.0,14.130532,24.264343
LOCATED,3221.0,20455.0,6.350512,17234.0,7.214209,18.334274
CAP FEDERAL INCARCERATION,5049.0,6684.0,1.323827,1635.0,11.308457,5.991019
OTHER EFFORTS,1298.0,2870.0,2.211094,1572.0,2.907185,2.572445
287(G) PROGRAM,1432.0,2308.0,1.611732,876.0,3.207311,2.068712
LAW ENFORCEMENT AGENCY RESPONSE UNIT,105.0,580.0,5.52381,475.0,0.235173,0.519867
OTHER TASK FORCE,8.0,451.0,56.375,443.0,0.017918,0.404241
CAP STATE INCARCERATION,2353.0,2718.0,1.155121,365.0,5.270113,2.436204
WORKSITE ENFORCEMENT,8.0,230.0,28.75,222.0,0.017918,0.206154


In [412]:
arrests_df[arrests_df['apprehension_method']=='LOCATED'][['apprehension_site_landmark','county','facility']].drop_duplicates().head(20)

Unnamed: 0,apprehension_site_landmark,county,facility
9,DALLAS COUNTY GENERAL AREA,,
11,ATD MA STATE,,
17,STUART,,
21,"FUG OPS - LOS ANGELES, CA STATE",,
26,"IRONWOOD STATE PRISON BLYTHE, CA",,IRONWOOD STATE PRISON
40,"SLC GENERAL AREA, NON-SPECIFIC",,
78,SEATTLE FUG OPS,,
80,ERO OFFICE DELEGATES DRIVE,,
100,"MIA GENERAL AREA, NON-SPECIFIC",,
103,"EDN GENERAL AREA, NON-SPECIFIC",,


So there has been a huge increase in CAP LOCAL INCARCERATION, NON-CUSTODIAL ARREST, and LOCATED since trump. Looks like the key options that are relevant for this investigation are "CAP LOCAL INCARCERATION" and "LOCATED" (note, LOCATED seems very broad, but some instances are at prisons or jails. 

In [447]:
get_summary_table(eq_days_pre_post_trump[eq_days_pre_post_trump['apprehension_method']=='CAP LOCAL INCARCERATION'], 'apprehension_aor').sort_values(
    by='num_increase', ascending=False).rename(columns={'pre_trump':'pre_trump_CAP_LOCAL','trump':'post_trump_CAP_LOCAL'})

trump_bool,pre_trump_CAP_LOCAL,post_trump_CAP_LOCAL,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
MIAMI AREA OF RESPONSIBILITY,2106.0,5866.0,2.785375,3760.0,9.028165,12.793056
DALLAS AREA OF RESPONSIBILITY,2584.0,5372.0,2.078947,2788.0,11.077292,11.7157
ATLANTA AREA OF RESPONSIBILITY,1389.0,3961.0,2.851692,2572.0,5.954473,8.638475
NEW ORLEANS AREA OF RESPONSIBILITY,3362.0,5747.0,1.709399,2385.0,14.412483,12.533531
HOUSTON AREA OF RESPONSIBILITY,3038.0,4990.0,1.642528,1952.0,13.023535,10.882603
CHICAGO AREA OF RESPONSIBILITY,1426.0,3341.0,2.342917,1915.0,6.113088,7.286328
SAN ANTONIO AREA OF RESPONSIBILITY,1983.0,3343.0,1.68583,1360.0,8.500879,7.29069
SALT LAKE CITY AREA OF RESPONSIBILITY,771.0,2053.0,2.662776,1282.0,3.305183,4.477352
ST. PAUL AREA OF RESPONSIBILITY,400.0,1306.0,3.265,906.0,1.714751,2.848232
DETROIT AREA OF RESPONSIBILITY,417.0,1160.0,2.781775,743.0,1.787628,2.529824


In [426]:
eq_days_pre_post_trump[eq_days_pre_post_trump['apprehension_aor'] == 'CHICAGO AREA OF RESPONSIBILITY'][['apprehension_method','trump_bool']].value_counts().unstack()

trump_bool,pre_trump,trump
apprehension_method,Unnamed: 1_level_1,Unnamed: 2_level_1
287(G) PROGRAM,1.0,13.0
ANTI-SMUGGLING,1.0,
BOAT PATROL,,1.0
CAP FEDERAL INCARCERATION,308.0,429.0
CAP LOCAL INCARCERATION,1426.0,3341.0
CAP STATE INCARCERATION,143.0,172.0
ERO REPROCESSED ARREST,2.0,9.0
INSPECTIONS,,2.0
LAW ENFORCEMENT AGENCY RESPONSE UNIT,,8.0
LOCATED,226.0,866.0


In [437]:
eq_days_pre_post_trump[eq_days_pre_post_trump['apprehension_site_landmark'].fillna('').str.contains('POLICE| PD')][[
    'apprehension_aor','apprehension_method','final_program','apprehension_site_landmark']].drop_duplicates()

Unnamed: 0,apprehension_aor,apprehension_method,final_program,apprehension_site_landmark
56,ST. PAUL AREA OF RESPONSIBILITY,CAP LOCAL INCARCERATION,ERO CRIMINAL ALIEN PROGRAM,"BLOOMINGTON POLICE DEPARTMENT, MN"
149,HOUSTON AREA OF RESPONSIBILITY,LOCATED,FUGITIVE OPERATIONS,"FORT CAVAZOS POLICE DEPARTMENT, FORT CAVAZOS, ..."
332,NEWARK AREA OF RESPONSIBILITY,LOCATED,FUGITIVE OPERATIONS,PASSAIC PD
465,BOSTON AREA OF RESPONSIBILITY,LOCATED,FUGITIVE OPERATIONS,REVERE POLICE DEPT
643,NEW ORLEANS AREA OF RESPONSIBILITY,CAP LOCAL INCARCERATION,ERO CRIMINAL ALIEN PROGRAM,CAP - BARTLETT POLICE DEPARTMENT TN STATE
...,...,...,...,...
259775,NEWARK AREA OF RESPONSIBILITY,LOCATED,FUGITIVE OPERATIONS,MT OLIVE TWP PD
260829,NEWARK AREA OF RESPONSIBILITY,LOCATED,FUGITIVE OPERATIONS,ORANGE TWP PD
261112,CHICAGO AREA OF RESPONSIBILITY,CAP LOCAL INCARCERATION,FUGITIVE OPERATIONS,"ST CHARLES CITY POLICE DEPARTMENT, MISSOURI"
261744,PHOENIX AREA OF RESPONSIBILITY,LAW ENFORCEMENT AGENCY RESPONSE UNIT,ERO CRIMINAL ALIEN PROGRAM,LEAR - FT. MCDOWELL POLICE DEPARTMENT


In [446]:
arrests_df[arrests_df['trump_bool']=='trump'].groupby(['county', 'apprehension_aor']).size().sort_values(ascending=False).head(50)

county                                    apprehension_aor                     
HARRIS COUNTY HOUSTON, TX                 HOUSTON AREA OF RESPONSIBILITY           1630
HIDALGO COUNTY EDINBURG, TXN - TX1080000  HARLINGEN AREA OF RESPONSIBILITY          255
DAVIDSON COUNTY TN                        NEW ORLEANS AREA OF RESPONSIBILITY        245
TRAVIS COUNTY TEXAS                       SAN ANTONIO AREA OF RESPONSIBILITY        241
TRAVIS COUNTY AUSTIN, TEXAS - TX2270000   SAN ANTONIO AREA OF RESPONSIBILITY        178
MARTIN COUNTY FLORIDA                     MIAMI AREA OF RESPONSIBILITY              177
RUTHERFORD COUNTY TN                      NEW ORLEANS AREA OF RESPONSIBILITY        151
WILLIAMSON COUNTY GEORGETOWN, TEXAS       SAN ANTONIO AREA OF RESPONSIBILITY        129
HAMILTON COUNTY TN                        NEW ORLEANS AREA OF RESPONSIBILITY        124
PALM BEACH COUNTY FLORIDA                 MIAMI AREA OF RESPONSIBILITY              113
LEE COUNTY AL STATE                     