# Task 4 - Analysis

## Question 1
From Task 3 Step 1, analyze how the national background of those arrested changed before and after the start of the second Trump administration, broken down by ICE Area of Responsibility (AOR).


### 0- Setup

In [1]:
from pathlib import Path
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
import altair as alt

import process_data

In [2]:
pd.set_option("display.max_rows", 300)
pd.set_option("display.max_columns", 200)

In [34]:
arrests_filename = 'arrests_with_facility_county.csv'
cwd = Path.cwd()
root = cwd.parent
data = root / "data"
figures = root / "figures"

In [4]:
arrests_df = pd.read_csv(data/arrests_filename, parse_dates=['apprehension_date','departed_date'])

In [14]:
arrests_df.head()

Unnamed: 0,apprehension_date,apprehension_state,apprehension_aor,final_program,apprehension_method,apprehension_criminality,case_status,case_category,departed_date,departure_country,final_order_yes_no,birth_year,citizenship_country,gender,apprehension_site_landmark,unique_identifier,county,jail,prison,facility,other_facility_loc_info
0,2024-08-07 09:43:00,VIRGINIA,WASHINGTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,NON-CUSTODIAL ARREST,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2024-08-19,HONDURAS,YES,1981,HONDURAS,MALE,"HBG GENERAL AREA, NON-SPECIFIC",0000b34edd657d516c02b13a7c352d62d0effcb6,,,,,
1,2024-10-19 08:33:00,TEXAS,HOUSTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[16] REINSTATED FINAL ORDER,2024-10-22,MEXICO,YES,1984,MEXICO,MALE,"HARRIS COUNTY JAIL, HOUSTON, TX",0000ba6e459998a6046d185d82cf4349de1479d0,"HARRIS COUNTY HOUSTON, TX",HARRIS COUNTY JAIL,,HARRIS COUNTY JAIL,"HOUSTON, TX"
2,2025-04-15 10:08:00,NEW JERSEY,NEWARK AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP FEDERAL INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2025-06-10,DOMINICAN REPUBLIC,YES,1988,DOMINICAN REPUBLIC,MALE,"FORT DIX EAST, NEW JERSEY",0000c3d23fb0e444864559575900d410c4e8490f,,,,,
3,2025-06-03 09:20:00,MINNESOTA,ST. PAUL AREA OF RESPONSIBILITY,FUGITIVE OPERATIONS,NON-CUSTODIAL ARREST,3 OTHER IMMIGRATION VIOLATOR,ACTIVE,[8G] EXPEDITED REMOVAL - CREDIBLE FEAR REFERRAL,NaT,,YES,1985,COLOMBIA,FEMALE,"SPM GENERAL AREA, NON-SPECIFIC",0000d3dbf8033b5f209f6547ffee5b84feb4f599,,,,,
4,2025-01-21 05:41:00,,MIAMI AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,2 PENDING CRIMINAL CHARGES,3-VOLUNTARY DEPARTURE CONFIRMED,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-02-01,MEXICO,YES,1983,MEXICO,MALE,MIAMI DADE COUNTY JAIL TURNER GUILFORD KNIGHT ...,000104d730bf021326c6dc0deb3dd575304136b5,,MIAMI DADE COUNTY JAIL,,MIAMI DADE COUNTY JAIL,


**Note** - saving df as csv (from task 2) has also messed up times in the datetime - must be something in the excel format. Can look to change in future if we need times

### 1- Data exploration and processing

**Key info:**
* Trump was inaugerated on 20th January 2025

**Dates approach:**
* Will likely want to look at month totals to smooth out -> create month var
* More data pre Trump, so will want to do some analysis looking at same number of days/months pre and post (for overall numbers/proportions etc.)

**National background approach:**
* Have the number of citizenship countries changed?
* What about the specific countries - what has changed about them?
   * Add in continent?
   * Any other country meta data to add in?
* Is this different depending on ICE Area of Responsibility (AOR)?

##### ICE Area of Responsibilitiy (AOR)

In [7]:
arrests_df['apprehension_aor'].value_counts(dropna=False)

apprehension_aor
MIAMI AREA OF RESPONSIBILITY             26925
NEW ORLEANS AREA OF RESPONSIBILITY       23598
DALLAS AREA OF RESPONSIBILITY            23185
HOUSTON AREA OF RESPONSIBILITY           21785
CHICAGO AREA OF RESPONSIBILITY           18370
ATLANTA AREA OF RESPONSIBILITY           16596
SAN ANTONIO AREA OF RESPONSIBILITY       15161
HARLINGEN AREA OF RESPONSIBILITY         11607
LOS ANGELES AREA OF RESPONSIBILITY       11575
NEWARK AREA OF RESPONSIBILITY             8506
PHOENIX AREA OF RESPONSIBILITY            8351
NEW YORK CITY AREA OF RESPONSIBILITY      8101
SALT LAKE CITY AREA OF RESPONSIBILITY     7664
BOSTON AREA OF RESPONSIBILITY             7148
WASHINGTON AREA OF RESPONSIBILITY         6955
PHILADELPHIA AREA OF RESPONSIBILITY       6240
ST. PAUL AREA OF RESPONSIBILITY           6124
NaN                                       5903
DETROIT AREA OF RESPONSIBILITY            5400
SAN FRANCISCO AREA OF RESPONSIBILITY      5277
DENVER AREA OF RESPONSIBILITY             4

In [5]:
arrests_df['apprehension_aor'].fillna('MISSING', inplace = True) 
# done because I don't want to lose these from the analysis, there may be a reason that AOR is missing

##### Citizenship country

In [9]:
arrests_df['citizenship_country'].isna().sum()

0

In [10]:
arrests_df['citizenship_country'].value_counts().reset_index().head(10)

Unnamed: 0,citizenship_country,count
0,MEXICO,101036
1,GUATEMALA,32638
2,HONDURAS,29628
3,VENEZUELA,15238
4,NICARAGUA,14688
5,EL SALVADOR,12041
6,COLOMBIA,9943
7,ECUADOR,9339
8,CUBA,6205
9,DOMINICAN REPUBLIC,5065


##### Adding continent info

**Note** this is a clunky, quick way to do this just to see whether it is something to include in the analysis. In the next steps of this analysis it would be good to bring in additional country information, and think about what geographic area is interesting (e.g. instead of contintent we could use slightly corser geographic areas that are meaningful, such as "Western Asia", "Central America" etc.)

In [6]:
continent_dict = {
    'NORTH AND CENTRAL AMERICA': [
        'TURKS AND CAICOS ISLANDS','TRINIDAD AND TOBAGO','ST. VINCENT-GRENADINES', 'ST. LUCIA','ST. KITTS-NEVIS',
        'SINT EUSTATIUS', 'SINT MAARTEN(DUTCH)', 'PANAMA','NICARAGUA','NETHERLANDS ANTILLES', 'MONTSERRAT','MEXICO',
        'JAMAICA','HAITI', 'HONDURAS','GRENADA', 'GUADELOUPE', 'GUATEMALA','CURACAO','CANADA', 'ANGUILLA','ANTIGUA-BARBUDA',
        'BAHAMAS', 'BARBADOS', 'BELIZE','BOLIVIA','COSTA RICA','CUBA','DOMINICA', 'DOMINICAN REPUBLIC','EL SALVADOR','BRITISH VIRGIN ISLANDS'
    ],
     'SOUTH AMERICA': [
        'VENEZUELA','URUGUAY','SURINAME','PERU','PARAGUAY','GUYANA','FRENCH GUIANA','ECUADOR','COLOMBIA', 'CHILE','BRAZIL','ARGENTINA'
    ],
     'EUROPE': [
        'YUGOSLAVIA','USSR','UNITED KINGDOM','UKRAINE','SWITZERLAND','SWEDEN','SPAIN','SLOVENIA','SLOVAKIA', 'SERBIA AND MONTENEGRO','SERBIA',
        'RUSSIA','ROMANIA','PORTUGAL','POLAND', 'NORWAY','NORTH MACEDONIA','NETHERLANDS','MONTENEGRO','MOLDOVA','MALTA','LITHUANIA','LATVIA',
        'KOSOVO','ITALY','IRELAND','ICELAND','HUNGARY', 'GREECE','GERMANY','FRANCE','FINLAND','ESTONIA','DENMARK','CZECHOSLOVAKIA','CZECH REPUBLIC',
        'CYPRUS','CROATIA','BULGARIA','BOSNIA-HERZEGOVINA','BELGIUM','BELARUS','AUSTRIA','ALBANIA', 'ANDORRA'
    ],
     'AFRICA': [
        'ZIMBABWE','ZAMBIA','UGANDA','TUNISIA','TOGO','TANZANIA','SUDAN','SOUTH SUDAN','SOUTH AFRICA','SOMALIA','SIERRA LEONE','SENEGAL',
        'SAO TOME AND PRINCIPE','RWANDA','NIGERIA','NIGER', 'NAMIBIA','MOZAMBIQUE','MOROCCO','MAURITIUS','MAURITANIA','MALI','LIBYA',
        'LIBERIA','KENYA','IVORY COAST','GUINEA', 'GUINEA-BISSAU','GHANA','GAMBIA','GABON','EGYPT','ETHIOPIA','ESWATINI','ERITREA','EQUATORIAL GUINEA',
        'DJIBOUTI','DEM REP OF THE CONGO','CONGO','CHAD','CENTRAL AFRICAN REPUBLIC','CAPE VERDE','CAMEROON','BURUNDI','BURKINA FASO','BOTSWANA',
        'BENIN','ANGOLA','ALGERIA'],
     'ASIA': [
        'YEMEN','VIETNAM','UZBEKISTAN','UNITED ARAB EMIRATES','TURKMENISTAN','TURKIYE','THAILAND','TAJIKISTAN','TAIWAN','SYRIA','SRI LANKA',
        'SOUTH KOREA','SAUDI ARABIA','PHILIPPINES','PAKISTAN','OMAN','NEPAL','MONGOLIA', 'MALAYSIA','MALAWI','LEBANON','LAOS','KYRGYZSTAN','KUWAIT',
        'KOREA','KAZAKHSTAN','JORDAN','JAPAN','ISRAEL','IRAQ','IRAN','INDONESIA', 'INDIA','HONG KONG', 'GEORGIA','EAST TIMOR','CHINA, PEOPLES REPUBLIC OF',
        'CAMBODIA','BURMA','BRUNEI','BHUTAN','BANGLADESH','BAHRAIN','AZERBAIJAN','ARMENIA', 'AFGHANISTAN'
    ],
     'OCEANIA': [
        'TONGA','SAMOA','PAPUA NEW GUINEA','PALAU','NEW ZEALAND','MICRONESIA, FEDERATED STATES OF','MARSHALL ISLANDS','FRENCH POLYNESIA',
        'FIJI', 'AUSTRALIA'
    ]}

In [7]:
country_continent_lookup = {}

for cont in continent_dict.keys():
    for coun in continent_dict[cont]:
        country_continent_lookup[coun] = cont
        

In [8]:
arrests_df['citizenship_continent'] = arrests_df['citizenship_country'].map(country_continent_lookup)

##### Date vars


In [9]:
trump_inaugaration_date = datetime.strptime('2025-01-20','%Y-%m-%d').date()

In [10]:
arrests_df['trump_bool'] = np.where(
    arrests_df['apprehension_date'].dt.date >= trump_inaugaration_date, "trump", "pre_trump")

In [11]:
arrests_df['apprehension_month_year'] = arrests_df['apprehension_date'].dt.to_period('M')
arrests_df['apprehension_day'] = arrests_df['apprehension_date'].dt.date

**Note** - the below is a rough approach, taking the exact same number of days pre and post the trump inaugeration. In future steps of this research I would check this makes sense, and whether we need to take into account seasonality, day of week, holidays etc. 

In [61]:
number_days_trump_administration = (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days
start_date = trump_inaugaration_date - timedelta(number_days_trump_administration)

eq_days_pre_post_trump = arrests_df[arrests_df['apprehension_day'] >= start_date].copy()

In [13]:
# check equal:

(trump_inaugaration_date - start_date).days == (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days

True

In [14]:
eq_days_pre_post_trump['trump_bool'].value_counts()

trump_bool
trump        111567
pre_trump     44648
Name: count, dtype: int64

### 2- Analysis

In [15]:
def get_summary_table(df, groupby_col='citizenship_country'):
    num_arrests_by_cc = df.groupby('trump_bool')[groupby_col].value_counts().reset_index().rename(columns={0:'number_arrests'})
    
    pivot_num_by_cc = num_arrests_by_cc.pivot(index=groupby_col, values='count', columns='trump_bool')

    pivot_num_by_cc['fact_increase'] = pivot_num_by_cc['trump'] / pivot_num_by_cc['pre_trump']
    pivot_num_by_cc['num_increase'] = pivot_num_by_cc['trump'] - pivot_num_by_cc['pre_trump']

    for c in ['pre_trump','trump']:
        pivot_num_by_cc[f'{c}_perc_arrests'] = (pivot_num_by_cc[c] / pivot_num_by_cc[c].sum()) * 100

    return pivot_num_by_cc

#### AOR:

First looking to see which AORs have had the largest jump in numbers of arrests

In [352]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='fact_increase', ascending=False)

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
SAN DIEGO AREA OF RESPONSIBILITY,241,1303,5.406639,1062,0.545126,1.173641
BOSTON AREA OF RESPONSIBILITY,879,3957,4.501706,3078,1.988238,3.564158
WASHINGTON AREA OF RESPONSIBILITY,1023,4068,3.97654,3045,2.313956,3.664139
DENVER AREA OF RESPONSIBILITY,661,2242,3.391831,1581,1.495137,2.01942
BUFFALO AREA OF RESPONSIBILITY,316,1057,3.344937,741,0.71477,0.952064
DETROIT AREA OF RESPONSIBILITY,763,2503,3.280472,1740,1.725854,2.254508
PHILADELPHIA AREA OF RESPONSIBILITY,1004,3214,3.201195,2210,2.270979,2.894922
HQ AREA OF RESPONSIBILITY,7,22,3.142857,15,0.015834,0.019816
MIAMI AREA OF RESPONSIBILITY,4291,13345,3.109998,9054,9.705949,12.02014
ATLANTA AREA OF RESPONSIBILITY,2696,8330,3.089763,5634,6.098168,7.503017


In [353]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='num_increase', ascending=False)

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
MIAMI AREA OF RESPONSIBILITY,4291,13345,3.109998,9054,9.705949,12.02014
ATLANTA AREA OF RESPONSIBILITY,2696,8330,3.089763,5634,6.098168,7.503017
DALLAS AREA OF RESPONSIBILITY,4273,9159,2.143459,4886,9.665234,8.249716
NEW ORLEANS AREA OF RESPONSIBILITY,4725,9494,2.009312,4769,10.687627,8.551458
HOUSTON AREA OF RESPONSIBILITY,4425,8419,1.902599,3994,10.009048,7.583182
SAN ANTONIO AREA OF RESPONSIBILITY,2479,5946,2.398548,3467,5.607329,5.355695
LOS ANGELES AREA OF RESPONSIBILITY,1705,5106,2.994721,3401,3.856594,4.599088
CHICAGO AREA OF RESPONSIBILITY,3385,6657,1.966617,3272,7.656639,5.996109
BOSTON AREA OF RESPONSIBILITY,879,3957,4.501706,3078,1.988238,3.564158
WASHINGTON AREA OF RESPONSIBILITY,1023,4068,3.97654,3045,2.313956,3.664139


**Observations**
* Interesting to see the differnce in AORs with the largest increase in number of arrests vs proportion of arrests - from eyeballing, it looks like mainly cities are the ones with largest prop increase (San Diego, Boston, Washington (need to check if this means DC), Denver, Buffalo (NY), Detroit, Philly
* And then ones with the largest actual increase are ones which already had quite high numbers - Miami, Atlanta, Dallas, New Orleans, Houston, San Antonio, LA, Chicago -> NB, some of this is probably to do with population (e.g. LA and Chicago); would be good to add in populations (if mapping between county to aor is available)

##### Citizenship country

In [16]:
eq_days_pre_post_trump.groupby('trump_bool')['citizenship_country'].nunique()

trump_bool
pre_trump    168
trump        181
Name: citizenship_country, dtype: int64

In [358]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_country').sort_values(by='num_increase', ascending=False)

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
citizenship_country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
MEXICO,19400.0,41786.0,2.153918,22386.0,43.450994,37.453727
GUATEMALA,5370.0,15627.0,2.910056,10257.0,12.027414,14.00683
HONDURAS,5332.0,12877.0,2.415041,7545.0,11.942304,11.541943
VENEZUELA,1628.0,7983.0,4.903563,6355.0,3.6463,7.155342
EL SALVADOR,2088.0,5367.0,2.570402,3279.0,4.676581,4.810562
COLOMBIA,1447.0,3548.0,2.45197,2101.0,3.240907,3.180152
CUBA,737.0,2834.0,3.845319,2097.0,1.65069,2.540178
NICARAGUA,1986.0,3811.0,1.918933,1825.0,4.448128,3.415885
ECUADOR,1309.0,2963.0,2.26356,1654.0,2.931822,2.655803
BRAZIL,564.0,1627.0,2.884752,1063.0,1.263214,1.458317


In [359]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_continent')

trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
citizenship_continent,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AFRICA,559,1738,3.109123,1179,1.252016,1.557808
ASIA,1181,4294,3.635902,3113,2.645135,3.848808
EUROPE,578,1396,2.415225,818,1.294571,1.251266
NORTH AND CENTRAL AMERICA,36597,86135,2.353608,49538,81.967837,77.204729
OCEANIA,85,152,1.788235,67,0.190378,0.136241
SOUTH AMERICA,5648,17852,3.160765,12204,12.650063,16.001147


In [17]:
aor_summary_country = eq_days_pre_post_trump.groupby('apprehension_aor').apply(get_summary_table)

In [361]:
aor_summary_country.sort_values(by='num_increase', ascending=False).head(20)

Unnamed: 0_level_0,trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ATLANTA AREA OF RESPONSIBILITY,MEXICO,1210.0,3356.0,2.773554,2146.0,44.881306,40.288115
MIAMI AREA OF RESPONSIBILITY,GUATEMALA,836.0,2875.0,3.438995,2039.0,19.482638,21.543649
DALLAS AREA OF RESPONSIBILITY,MEXICO,2545.0,4550.0,1.787819,2005.0,59.560028,49.677912
HARLINGEN AREA OF RESPONSIBILITY,MEXICO,1406.0,3196.0,2.273115,1790.0,65.243619,81.885729
MIAMI AREA OF RESPONSIBILITY,MEXICO,1025.0,2753.0,2.685854,1728.0,23.887206,20.629449
LOS ANGELES AREA OF RESPONSIBILITY,MEXICO,928.0,2573.0,2.772629,1645.0,54.428152,50.391696
NEW ORLEANS AREA OF RESPONSIBILITY,MEXICO,1944.0,3542.0,1.822016,1598.0,41.142857,37.307773
HOUSTON AREA OF RESPONSIBILITY,MEXICO,2148.0,3636.0,1.692737,1488.0,48.542373,43.188027
CHICAGO AREA OF RESPONSIBILITY,MEXICO,1437.0,2709.0,1.885177,1272.0,42.451994,40.694006
PHOENIX AREA OF RESPONSIBILITY,MEXICO,1285.0,2454.0,1.909728,1169.0,76.261128,69.479049


In [362]:
aor_summary_country.sort_values(by='fact_increase', ascending=False).head(20)

Unnamed: 0_level_0,trump_bool,pre_trump,trump,fact_increase,num_increase,pre_trump_perc_arrests,trump_perc_arrests
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
WASHINGTON AREA OF RESPONSIBILITY,INDIA,1.0,36.0,36.0,35.0,0.097752,0.884956
NEW YORK CITY AREA OF RESPONSIBILITY,HAITI,1.0,36.0,36.0,35.0,0.124224,1.483924
WASHINGTON AREA OF RESPONSIBILITY,AFGHANISTAN,1.0,31.0,31.0,30.0,0.097752,0.762045
WASHINGTON AREA OF RESPONSIBILITY,EGYPT,1.0,30.0,30.0,29.0,0.097752,0.737463
WASHINGTON AREA OF RESPONSIBILITY,CUBA,2.0,53.0,26.5,51.0,0.195503,1.302852
SEATTLE AREA OF RESPONSIBILITY,PERU,1.0,24.0,24.0,23.0,0.228833,1.848998
LOS ANGELES AREA OF RESPONSIBILITY,IRAN,3.0,68.0,22.666667,65.0,0.175953,1.331767
WASHINGTON AREA OF RESPONSIBILITY,"CHINA, PEOPLES REPUBLIC OF",1.0,21.0,21.0,20.0,0.097752,0.516224
DETROIT AREA OF RESPONSIBILITY,SENEGAL,1.0,21.0,21.0,20.0,0.131062,0.838993
SAN FRANCISCO AREA OF RESPONSIBILITY,VENEZUELA,3.0,62.0,20.666667,59.0,0.344037,2.859779


In [363]:
aor_summary_country.groupby(level=0, group_keys=False)['num_increase'].nlargest(5)

apprehension_aor                       citizenship_country       
ATLANTA AREA OF RESPONSIBILITY         MEXICO                        2146.0
                                       GUATEMALA                     1009.0
                                       HONDURAS                       783.0
                                       VENEZUELA                      584.0
                                       COLOMBIA                       230.0
BALTIMORE AREA OF RESPONSIBILITY       GUATEMALA                      273.0
                                       EL SALVADOR                    225.0
                                       HONDURAS                       147.0
                                       MEXICO                         137.0
                                       VENEZUELA                       59.0
BOSTON AREA OF RESPONSIBILITY          BRAZIL                         690.0
                                       GUATEMALA                      616.0
                      

Creating a plot to look at the country changes for each AOR:

In [35]:
# so the countries in plots are in the same order, for ease of comparison:
country_order = list(aor_summary_country.reset_index().groupby('citizenship_country')['trump'].sum().sort_values(ascending=False).index)

def country_pre_post_plot(aor_str, interactive=False, save=False):
    aor_country_plot_df = aor_summary_country[['pre_trump','trump']].stack().reset_index().rename(
        columns={0:'num_arrests'})
    
    chart = alt.layer(
            data=aor_country_plot_df[aor_country_plot_df['apprehension_aor']==aor_str]
        )
    
    chart += alt.Chart().mark_line(color='#9E9EA3').encode(
            x=alt.X('num_arrests:Q'),
            y=alt.Y('citizenship_country:N', sort=country_order),
            detail='citizenship_country:N',
        )
        # Add points for life expectancy in 1955 & 2000
    chart += alt.Chart().mark_point(
            size=100,
            opacity=0.5,
            filled=True
        ).encode(
            x=alt.X('num_arrests:Q', title="Number arrests"),
            y=alt.Y('citizenship_country:N', title="", sort=country_order),
            color=alt.Color('trump_bool',
                scale=alt.Scale(
                    domain=['pre_trump','trump'],
                    range=['#2a6ca8', '#bf1515']
                )
            )
        ).properties(width=800, title=aor_str)
    if save:
        chart.save(f'{aor_str}_country_arrest_change.png')
    
    if interactive:
        return chart.interactive()
    return chart
    
    # use chart.interactive() if you want to look into what is happening for the countries with smaller n's. as you can then scroll in

In [36]:
country_pre_post_plot('NEW YORK CITY AREA OF RESPONSIBILITY', save=True)

In [46]:
country_pre_post_plot('WASHINGTON AREA OF RESPONSIBILITY', save=True)

In [37]:
country_pre_post_plot('BOSTON AREA OF RESPONSIBILITY', save=True)

In [233]:
country_pre_post_plot('ATLANTA AREA OF RESPONSIBILITY', interactive=True)

In [45]:
country_pre_post_plot('DALLAS AREA OF RESPONSIBILITY', save=True)

##### Observations:

* Seems like there has been a big increase across most countries, but in particular there are some countries which did not have many arrests at all before that have had huge proportional increases.
* Looks like the AORs with the biggest prop increases for these countries with lower numbers of arrests overall are the same AORs which saw the biggest overall proportion increase -> could this be an indication that these areas are arresting people from more countries?

In [22]:
num_countries_pre_post_by_aor = eq_days_pre_post_trump.groupby(['apprehension_aor', 'trump_bool'])['citizenship_country'].nunique().unstack()
num_countries_pre_post_by_aor.columns = [str(c) for c in num_countries_pre_post_by_aor.columns]
num_countries_pre_post_by_aor['delta'] = num_countries_pre_post_by_aor['trump'] - num_countries_pre_post_by_aor['pre_trump']

In [381]:
num_countries_pre_post_by_aor.sort_values(by='delta', ascending=False)

Unnamed: 0_level_0,pre_trump,trump,delta
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
NEW YORK CITY AREA OF RESPONSIBILITY,50,88,38
HOUSTON AREA OF RESPONSIBILITY,47,84,37
ATLANTA AREA OF RESPONSIBILITY,60,96,36
BOSTON AREA OF RESPONSIBILITY,62,98,36
DALLAS AREA OF RESPONSIBILITY,64,99,35
MIAMI AREA OF RESPONSIBILITY,85,119,34
BUFFALO AREA OF RESPONSIBILITY,44,78,34
BALTIMORE AREA OF RESPONSIBILITY,40,71,31
DENVER AREA OF RESPONSIBILITY,29,58,29
SEATTLE AREA OF RESPONSIBILITY,40,69,29


**What does this mean** In trump's administration the number of countries people are being arrested from has increased in all AORs - this lends to idea of "indiscriminately"

#### Change over time (instead of just pre post)

Overall stats of pre and post trump give us some indications of patterns, but we can't see the true picture of what is happening. Next I am going to look at change over time.

In [23]:
def aor_month_cc(df, top_countries=None, aor_str=None, n=5):
    if aor_str is None:
        aor_str = df['apprehension_aor'].unique().item()
    aor_group = df[df['apprehension_aor'] == aor_str]
    aor_group_cc = aor_group[['apprehension_month_year','citizenship_country']].value_counts().reset_index().rename(columns={'count':'num_arrests'})
    if top_countries is None: # done so I can feed in top countries if I want them to be consisten across aors, rather than unique to the top for that aor
        top_countries = aor_group_cc.groupby('citizenship_country')['num_arrests'].sum().sort_values(ascending=False).head(n).index
    aor_group_cc['summary_country'] = np.where(
            aor_group_cc['citizenship_country'].isin(top_countries), aor_group_cc['citizenship_country'], 'OTHER COUNTRY')
    return aor_group_cc

In [44]:
COUNTRY_ORDER = ['EL SALVADOR','GUATEMALA','HONDURAS','MEXICO','VENEZUELA','OTHER COUNTRY']

def plot_area_chart(df,aor_str, top_countries=None, save=False):
    aor_summary = aor_month_cc(df, top_countries, aor_str)
    aor_summary['apprehension_month_year'] = aor_summary['apprehension_month_year'].dt.strftime('%Y-%m')
    chart = alt.Chart(aor_summary, title=aor_str).mark_bar().encode(
        x='apprehension_month_year:O',
        y='sum(num_arrests)',
        color=alt.Color('summary_country', sort=COUNTRY_ORDER)).properties(width=800, height=400)
    if save:
        chart.save(figures/f'{aor_str}_monthly_country_arrests.png')
    return chart
    

In [25]:
top_countries_overall = lambda n: arrests_df[arrests_df['trump_bool']=='trump']['citizenship_country'].value_counts().head(n).index

In [40]:
plot_area_chart(arrests_df, 'WASHINGTON AREA OF RESPONSIBILITY', top_countries_overall(5), save=True)

In [241]:
plot_area_chart(arrests_df, 'NEW YORK CITY AREA OF RESPONSIBILITY', top_countries_overall(5)) 

In [41]:
plot_area_chart(arrests_df, 'DALLAS AREA OF RESPONSIBILITY', top_countries_overall(5), save=True) 

In [243]:
plot_area_chart(arrests_df, 'BOSTON AREA OF RESPONSIBILITY', top_countries_overall(5))

Other ways to look at this would be:
* Number of citizenship countries by month by AOR

In [27]:
num_countries_over_time = arrests_df.groupby(['apprehension_aor','apprehension_month_year'])['citizenship_country'].nunique().reset_index()

num_countries_over_time['apprehension_month_year'] = num_countries_over_time['apprehension_month_year'].dt.strftime('%Y-%m')

In [43]:
chart = alt.Chart(num_countries_over_time).mark_line().encode(
    x=alt.X('apprehension_month_year:O', title=None, axis=alt.Axis( grid = False, values=['2025-01'])),
    y=alt.Y('citizenship_country', title='Number of nationalities'),
    color=alt.Color('apprehension_aor', legend=None),
    tooltip=['apprehension_aor']
).properties(
    width=200,
    height=200,
).facet(
    facet=alt.Facet('apprehension_aor', title=None),
    columns=4,
        title={
        'text':'Some AORs have had huge jumps in number of nationalities they are arresting under Trump',
        'subtitle':['Change in monthly number of unique citizenship countries for each ICE AOR', '']}
).resolve_axis(
    x='independent')

chart.save(figures/"aor_num_nationalities_monthly.svg")

chart

### Next steps:

* Check pre-post equal number of days makes sense and adjust if needed to account for holidays, weekends, seasonality etc.
* Significance tests:
  * Number of nationalities: if the number of nationalities is an area we want to focus on, we can identify the areas that have seen a significant shift post Trump (rather than just gradually following the pre trump trend of increasing) through Regression Discontinuity or Interrupted Time Series (probably Interrupted time series because close to the inaugeration could be some overlaps of cases in progress etc. (need to think about how quickly we would expect to see the change, and what a reasonable comparison is)
  * Which country increases in number of arrests are significant, and for which AORs? - paired t-tests
* There is a lot more that could be done in this analysis - I have only really scratched the surface. Particular areas I would be interested in focusing on:
  * Bringing in data on estimated numbers of people of each nationality by state - in addition to overall numbers of nationalities increasing, there are some really large increases in numbers of people from particular nationalities being arrested (e.g. Brazillians in Boston AOR). I would like to know whether these big increases are due to targeted programs, or whether they are simply a reflection of the number of people of each nationality living in that area.
  * Looking into more detail for specific AORs - while some AORs seem to have had a big increase in arrests (and number of nationalities) since January (e.g. Washington AOR), others have seen big jumps in particular months (e.g. Boston in May; New York in June). I would like to talk to the team on the ground to understand more about whether this is related to any particular program (e.g. could these represent times when the Local police forces are being coopted?) - there is a lot more detail to go into here .


## Question 2
Select one question from Task 3 Steps 2-4 and complete the analysis.

**Question:** 
Is there evidence that ICE are accelerating deportations?
1. Has the proportion of those arrested that are deported increased over time? 
2. For those that are deported, has the average length of time between arrest data and deportation date been decreasing over time?


In [50]:
arrests_df['departed_date']

0        2024-08-19
1        2024-10-22
2        2025-06-10
3               NaT
4        2025-02-01
            ...    
265180          NaT
265181          NaT
265182          NaT
265183          NaT
265184          NaT
Name: departed_date, Length: 265185, dtype: datetime64[ns]

#### Proportion of arrests leading to a deportation:

In [62]:
eq_days_pre_post_trump['deported_bool'] = np.where(eq_days_pre_post_trump['departure_country'].isna(), False, True)

In [59]:
eq_days_pre_post_trump[['deported_bool', 'trump_bool']].value_counts().unstack()

trump_bool,pre_trump,trump
deported_bool,Unnamed: 1_level_1,Unnamed: 2_level_1
False,14765,57732
True,29883,53835


In [64]:
eq_days_pre_post_trump['departed_month'] = eq_days_pre_post_trump['departed_date'].dt.to_period('M').astype(str)

In [72]:
month_prop_deported = eq_days_pre_post_trump[['apprehension_month_year', 'deported_bool']].value_counts().reset_index()
month_prop_deported['apprehension_month_year'] = month_prop_deported['apprehension_month_year'].astype(str)

In [75]:
alt.Chart(month_prop_deported).mark_bar().encode(
    x='apprehension_month_year:O',
    y=alt.Y('count').stack("normalize"),
    color='deported_bool').properties(width=400, height=400)

**Note** - there is a clear time period factor that needs taking into account here - it makes sense that May and June have lower proportions of deportations as deportations can take more than a month from arrest

In [76]:
alt.Chart(month_prop_deported).mark_bar().encode(
    x='apprehension_month_year:O',
    y=alt.Y('count'),
    color='deported_bool').properties(width=400, height=400)

**Conclusion:**

The data actually shows the opposite of my initial assertion, as proportion of arrests that lead to deportation has decreased since the start of the Trump administration. 

This makes sense, as the overall number of arrests has increased (and if "indiscriminate" arrests are happening, then there are likely a smaller proportion that actually have a reason for arrest and deportation. 

Limitations:
* Time period bias - this could be due to the fact that the deportations have not happed **yet** for recent arrests

#### Is the time between arrest to deportation date decreasing?

In [86]:
eq_days_pre_post_trump['days_arrest_to_deportation'] = (eq_days_pre_post_trump['departed_date'] - eq_days_pre_post_trump['apprehension_date']).dt.days

In [91]:
eq_days_pre_post_trump.groupby('trump_bool').agg({'days_arrest_to_deportation': 'mean'})

Unnamed: 0_level_0,days_arrest_to_deportation
trump_bool,Unnamed: 1_level_1
pre_trump,50.675879
trump,26.396211


In [100]:
arrests_df['deported_bool'] = np.where(arrests_df['departure_country'].isna(), False, True)
arrests_df['departed_month'] = arrests_df['departed_date'].dt.to_period('M').astype(str)
arrests_df['days_arrest_to_deportation'] = (arrests_df['departed_date'] - arrests_df['apprehension_date']).dt.days

In [121]:
deported_df = arrests_df[arrests_df['deported_bool']==True]

In [128]:
month_time_to_departed = deported_df.groupby('apprehension_month_year').agg({'days_arrest_to_deportation':'mean'}).reset_index()
month_time_to_departed['apprehension_month_year'] = month_time_to_departed['apprehension_month_year'].astype(str)

In [194]:
plot_timeframe_df = deported_df[
    (deported_df['apprehension_date'].dt.date >= datetime.strptime('2024-06-01', '%Y-%m-%d').date()) & 
    (deported_df['apprehension_date'].dt.date < datetime.strptime('2025-06-01', '%Y-%m-%d').date())]

In [196]:
day_time_to_departed = plot_timeframe_df.groupby('apprehension_day').agg({'days_arrest_to_deportation':'mean'})
day_time_to_departed_rolling_average = day_time_to_departed.rolling(window=7).mean().reset_index().rename(columns={'days_arrest_to_deportation':'rolling_average'})
day_time_to_departed_rolling_average['apprehension_day'] = day_time_to_departed_rolling_average['apprehension_day'].astype(str)


In [198]:
chart = alt.Chart(day_time_to_departed_rolling_average).mark_line().encode(
        x=alt.X('apprehension_day:T', axis=alt.Axis(labelAngle=-45, format="%B-%Y"), title='Apprehension Date'),
        y=alt.Y('rolling_average', title='Number of days between arrest to deportation date', scale=alt.Scale(domain=[0,70]))
    ).properties(
        width=1000,
        height=400,
        title={
            'text':'There has been a huge acceleration in speed of deportation since the beginning of the Trump administration',
            'subtitle':'7 day rolling average of number of days, shown by the date of arrest'})

chart.save(figures/"time_from_arrests_to_deportation.png")

chart

In [281]:

chart = alt.layer(
            data=aor_pre_post_num_days
        )
    
chart += alt.Chart().mark_line(color='#9E9EA3').encode(
            x=alt.X('days_arrest_to_deportation:Q'),
            y=alt.Y('apprehension_aor:N'),
            detail='apprehension_aor:N',
        )

chart += alt.Chart().mark_point(
            size=100,
            opacity=0.5,
            filled=True
        ).encode(
            x=alt.X('days_arrest_to_deportation:Q'),
            y=alt.Y('apprehension_aor:N', title="").sort('x'),
            color=alt.Color('trump_bool',
                scale=alt.Scale(
                    domain=['pre_trump','trump'],
                    range=['#2a6ca8', '#bf1515']
                )
            )
        ).properties(
            width=800, 
            title={
                'text':'All AORs (appart from LA) have seen an increase in speed of deportation', 
                'subtitle':'Change in time between arrest to deportation before and since Trump administration',
                })

chart.save(figures/'aor_change_num_days_arrest_to_deportation.png')

chart

In [280]:
aor_pre_post_num_days = eq_days_pre_post_trump.groupby(['apprehension_aor', 'trump_bool']).agg({'days_arrest_to_deportation':'mean'}).reset_index()
aor_pre_post_num_arrests = eq_days_pre_post_trump[eq_days_pre_post_trump['deported_bool']==True].groupby(['apprehension_aor', 'trump_bool']).agg({'days_arrest_to_deportation':'count'}).reset_index()

chart = alt.Chart(aor_pre_post_num_arrests).mark_bar(opacity=0.6).encode(
    x=alt.X('days_arrest_to_deportation', title='Number of deportations'),
    y=alt.Y('apprehension_aor'),
    color='trump_bool').properties(
        title = {
            'text': 'All AORs have had more deportations during the Trump administration than in the same time before',
            'subtitle':['Number of deportations by AOR, split by number during Trump administration, and before', '']})

chart.save(figures/'aor_number_deportations.png')

chart

### Limitations and next steps:

#### Limitations
* Bias of time since arrest - we might only be seeing the deportations that are happening quickly, and it could be that more of the arrests will lead to deportations that will happen in a longer time period (but are not shown in the data yet). From the final chart we can see that there are still more deportations happening under Trump, but as a proportion of total arrests this has decreased, which could be due to more indiscriminate arrests, or it could be because we will see more deportations for these arrests but they are taking longer.


#### Next steps:
Again, I am only really scratching the surface of this analysis. Things I'd like to look into:
* Does nationality have any impact on time to deportation?
* Are arrests that are done in conjunction with local and state law enforcement more or less likely to result in a deportation?

