# Task 4 - Analysis

## Question 1
From Task 3 Step 1, analyze how the national background of those arrested changed before and after the start of the second Trump administration, broken down by ICE Area of Responsibility (AOR).


### 0- Setup

In [1]:
from pathlib import Path
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
import altair as alt

import process_data

In [2]:
pd.set_option("display.max_rows", 300)
pd.set_option("display.max_columns", 200)

In [3]:
arrests_filename = 'arrests_with_facility_county.csv'
cwd = Path.cwd()
root = cwd.parent
data = root / "data"

In [13]:
arrests_df = pd.read_csv(data/arrests_filename, parse_dates=['apprehension_date','departed_date'])

In [14]:
arrests_df.head()

Unnamed: 0,apprehension_date,apprehension_state,apprehension_aor,final_program,apprehension_method,apprehension_criminality,case_status,case_category,departed_date,departure_country,final_order_yes_no,birth_year,citizenship_country,gender,apprehension_site_landmark,unique_identifier,county,jail,prison,facility,other_facility_loc_info
0,2024-08-07 09:43:00,VIRGINIA,WASHINGTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,NON-CUSTODIAL ARREST,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2024-08-19,HONDURAS,YES,1981,HONDURAS,MALE,"HBG GENERAL AREA, NON-SPECIFIC",0000b34edd657d516c02b13a7c352d62d0effcb6,,,,,
1,2024-10-19 08:33:00,TEXAS,HOUSTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[16] REINSTATED FINAL ORDER,2024-10-22,MEXICO,YES,1984,MEXICO,MALE,"HARRIS COUNTY JAIL, HOUSTON, TX",0000ba6e459998a6046d185d82cf4349de1479d0,"HARRIS COUNTY HOUSTON, TX",HARRIS COUNTY JAIL,,HARRIS COUNTY JAIL,"HOUSTON, TX"
2,2025-04-15 10:08:00,NEW JERSEY,NEWARK AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP FEDERAL INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2025-06-10,DOMINICAN REPUBLIC,YES,1988,DOMINICAN REPUBLIC,MALE,"FORT DIX EAST, NEW JERSEY",0000c3d23fb0e444864559575900d410c4e8490f,,,,,
3,2025-06-03 09:20:00,MINNESOTA,ST. PAUL AREA OF RESPONSIBILITY,FUGITIVE OPERATIONS,NON-CUSTODIAL ARREST,3 OTHER IMMIGRATION VIOLATOR,ACTIVE,[8G] EXPEDITED REMOVAL - CREDIBLE FEAR REFERRAL,NaT,,YES,1985,COLOMBIA,FEMALE,"SPM GENERAL AREA, NON-SPECIFIC",0000d3dbf8033b5f209f6547ffee5b84feb4f599,,,,,
4,2025-01-21 05:41:00,,MIAMI AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,2 PENDING CRIMINAL CHARGES,3-VOLUNTARY DEPARTURE CONFIRMED,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-02-01,MEXICO,YES,1983,MEXICO,MALE,MIAMI DADE COUNTY JAIL TURNER GUILFORD KNIGHT ...,000104d730bf021326c6dc0deb3dd575304136b5,,MIAMI DADE COUNTY JAIL,,MIAMI DADE COUNTY JAIL,


**Note** - saving df as csv (from task 2) has also messed up times in the datetime - must be something in the excel format. Can look to change in future if we need times

### 1- Data exploration and processing

**Key info:**
* Trump was inaugerated on 20th January 2025

**Dates approach:**
* Will likely want to look at month totals to smooth out -> create month var
* More data pre Trump, so will want to do some analysis looking at same number of days/months pre and post (for overall numbers/proportions etc.)

**National background approach:**
* Have the number of citizenship countries changed?
* What about the specific countries - what has changed about them?
   * Add in continent?
   * Any other country meta data to add in?
* Is this different depending on ICE Area of Responsibility (AOR)?

##### ICE Area of Responsibilitiy (AOR)

In [7]:
arrests_df['apprehension_aor'].value_counts(dropna=False)

apprehension_aor
MIAMI AREA OF RESPONSIBILITY             26925
NEW ORLEANS AREA OF RESPONSIBILITY       23598
DALLAS AREA OF RESPONSIBILITY            23185
HOUSTON AREA OF RESPONSIBILITY           21785
CHICAGO AREA OF RESPONSIBILITY           18370
ATLANTA AREA OF RESPONSIBILITY           16596
SAN ANTONIO AREA OF RESPONSIBILITY       15161
HARLINGEN AREA OF RESPONSIBILITY         11607
LOS ANGELES AREA OF RESPONSIBILITY       11575
NEWARK AREA OF RESPONSIBILITY             8506
PHOENIX AREA OF RESPONSIBILITY            8351
NEW YORK CITY AREA OF RESPONSIBILITY      8101
SALT LAKE CITY AREA OF RESPONSIBILITY     7664
BOSTON AREA OF RESPONSIBILITY             7148
WASHINGTON AREA OF RESPONSIBILITY         6955
PHILADELPHIA AREA OF RESPONSIBILITY       6240
ST. PAUL AREA OF RESPONSIBILITY           6124
NaN                                       5903
DETROIT AREA OF RESPONSIBILITY            5400
SAN FRANCISCO AREA OF RESPONSIBILITY      5277
DENVER AREA OF RESPONSIBILITY             4

In [7]:
arrests_df['apprehension_aor'].fillna('MISSING', inplace = True) 
# done because I don't want to lose these from the analysis, there may be a reason that AOR is missing

##### Citizenship country

In [9]:
arrests_df['citizenship_country'].isna().sum()

0

In [10]:
arrests_df['citizenship_country'].value_counts().reset_index().head(10)

Unnamed: 0,citizenship_country,count
0,MEXICO,101036
1,GUATEMALA,32638
2,HONDURAS,29628
3,VENEZUELA,15238
4,NICARAGUA,14688
5,EL SALVADOR,12041
6,COLOMBIA,9943
7,ECUADOR,9339
8,CUBA,6205
9,DOMINICAN REPUBLIC,5065


##### Adding continent info

**Note** this is a clunky, quick way to do this just to see whether it is something to include in the analysis. In the next steps of this analysis it would be good to bring in additional country information, and think about what geographic area is interesting (e.g. instead of contintent we could use slightly corser geographic areas that are meaningful, such as "Western Asia", "Central America" etc.)

In [8]:
continent_dict = {
    'NORTH AND CENTRAL AMERICA': [
        'TURKS AND CAICOS ISLANDS','TRINIDAD AND TOBAGO','ST. VINCENT-GRENADINES', 'ST. LUCIA','ST. KITTS-NEVIS',
        'SINT EUSTATIUS', 'SINT MAARTEN(DUTCH)', 'PANAMA','NICARAGUA','NETHERLANDS ANTILLES', 'MONTSERRAT','MEXICO',
        'JAMAICA','HAITI', 'HONDURAS','GRENADA', 'GUADELOUPE', 'GUATEMALA','CURACAO','CANADA', 'ANGUILLA','ANTIGUA-BARBUDA',
        'BAHAMAS', 'BARBADOS', 'BELIZE','BOLIVIA','COSTA RICA','CUBA','DOMINICA', 'DOMINICAN REPUBLIC','EL SALVADOR','BRITISH VIRGIN ISLANDS'
    ],
     'SOUTH AMERICA': [
        'VENEZUELA','URUGUAY','SURINAME','PERU','PARAGUAY','GUYANA','FRENCH GUIANA','ECUADOR','COLOMBIA', 'CHILE','BRAZIL','ARGENTINA'
    ],
     'EUROPE': [
        'YUGOSLAVIA','USSR','UNITED KINGDOM','UKRAINE','SWITZERLAND','SWEDEN','SPAIN','SLOVENIA','SLOVAKIA', 'SERBIA AND MONTENEGRO','SERBIA',
        'RUSSIA','ROMANIA','PORTUGAL','POLAND', 'NORWAY','NORTH MACEDONIA','NETHERLANDS','MONTENEGRO','MOLDOVA','MALTA','LITHUANIA','LATVIA',
        'KOSOVO','ITALY','IRELAND','ICELAND','HUNGARY', 'GREECE','GERMANY','FRANCE','FINLAND','ESTONIA','DENMARK','CZECHOSLOVAKIA','CZECH REPUBLIC',
        'CYPRUS','CROATIA','BULGARIA','BOSNIA-HERZEGOVINA','BELGIUM','BELARUS','AUSTRIA','ALBANIA', 'ANDORRA'
    ],
     'AFRICA': [
        'ZIMBABWE','ZAMBIA','UGANDA','TUNISIA','TOGO','TANZANIA','SUDAN','SOUTH SUDAN','SOUTH AFRICA','SOMALIA','SIERRA LEONE','SENEGAL',
        'SAO TOME AND PRINCIPE','RWANDA','NIGERIA','NIGER', 'NAMIBIA','MOZAMBIQUE','MOROCCO','MAURITIUS','MAURITANIA','MALI','LIBYA',
        'LIBERIA','KENYA','IVORY COAST','GUINEA', 'GUINEA-BISSAU','GHANA','GAMBIA','GABON','EGYPT','ETHIOPIA','ESWATINI','ERITREA','EQUATORIAL GUINEA',
        'DJIBOUTI','DEM REP OF THE CONGO','CONGO','CHAD','CENTRAL AFRICAN REPUBLIC','CAPE VERDE','CAMEROON','BURUNDI','BURKINA FASO','BOTSWANA',
        'BENIN','ANGOLA','ALGERIA'],
     'ASIA': [
        'YEMEN','VIETNAM','UZBEKISTAN','UNITED ARAB EMIRATES','TURKMENISTAN','TURKIYE','THAILAND','TAJIKISTAN','TAIWAN','SYRIA','SRI LANKA',
        'SOUTH KOREA','SAUDI ARABIA','PHILIPPINES','PAKISTAN','OMAN','NEPAL','MONGOLIA', 'MALAYSIA','MALAWI','LEBANON','LAOS','KYRGYZSTAN','KUWAIT',
        'KOREA','KAZAKHSTAN','JORDAN','JAPAN','ISRAEL','IRAQ','IRAN','INDONESIA', 'INDIA','HONG KONG', 'GEORGIA','EAST TIMOR','CHINA, PEOPLES REPUBLIC OF',
        'CAMBODIA','BURMA','BRUNEI','BHUTAN','BANGLADESH','BAHRAIN','AZERBAIJAN','ARMENIA', 'AFGHANISTAN'
    ],
     'OCEANIA': [
        'TONGA','SAMOA','PAPUA NEW GUINEA','PALAU','NEW ZEALAND','MICRONESIA, FEDERATED STATES OF','MARSHALL ISLANDS','FRENCH POLYNESIA',
        'FIJI', 'AUSTRALIA'
    ]}

In [9]:
country_continent_lookup = {}

for cont in continent_dict.keys():
    for coun in continent_dict[cont]:
        country_continent_lookup[coun] = cont
        

In [26]:
arrests_df['citizenship_continent'] = arrests_df['citizenship_country'].map(country_continent_lookup)

##### Date vars


In [11]:
trump_inaugaration_date = datetime.strptime('2025-01-20','%Y-%m-%d').date()

In [16]:
arrests_df['trump'] = np.where(
    arrests_df['apprehension_date'].dt.date >= trump_inaugaration_date, True, False)

In [17]:
arrests_df['apprehension_month_year'] = arrests_df['apprehension_date'].dt.to_period('M')
arrests_df['apprehension_day'] = arrests_df['apprehension_date'].dt.date

**Note** - the below is a rough approach, taking the exact same number of days pre and post the trump inaugeration. In future steps of this research I would check this makes sense, and whether we need to take into account seasonality, day of week, holidays etc. 

In [27]:
number_days_trump_administration = (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days
start_date = trump_inaugaration_date - timedelta(number_days_trump_administration)

eq_days_pre_post_trump = arrests_df[arrests_df['apprehension_day'] >= start_date]

In [19]:
# check equal:

(trump_inaugaration_date - start_date).days == (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days

True

In [20]:
eq_days_pre_post_trump['trump'].value_counts()

trump
True     111567
False     44648
Name: count, dtype: int64

### 2- Analysis

In [78]:
def get_summary_table(df, groupby_col='citizenship_country'):
    num_arrests_by_cc = df.groupby('trump')[groupby_col].value_counts().reset_index().rename(columns={0:'number_arrests'})
    
    pivot_num_by_cc = num_arrests_by_cc.pivot(index=groupby_col, values='count', columns='trump')
    pivot_num_by_cc.columns = [str(x) for x in pivot_num_by_cc.columns]

    for c in pivot_num_by_cc.columns:
        pivot_num_by_cc[f'{c}_perc_arrests'] = (pivot_num_by_cc[c] / pivot_num_by_cc[c].sum()) * 100

    pivot_num_by_cc['fact_increase'] = pivot_num_by_cc['True'] / pivot_num_by_cc['False']
    pivot_num_by_cc['num_increase'] = pivot_num_by_cc['True'] - pivot_num_by_cc['False']

    return pivot_num_by_cc

#### AOR:

First looking to see which AORs have had the largest jump in numbers of arrests

In [77]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='fact_increase', ascending=False)

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,fact_increase,num_increase
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
SAN DIEGO AREA OF RESPONSIBILITY,241,1303,0.545126,1.173641,5.406639,1062
BOSTON AREA OF RESPONSIBILITY,879,3957,1.988238,3.564158,4.501706,3078
WASHINGTON AREA OF RESPONSIBILITY,1023,4068,2.313956,3.664139,3.97654,3045
DENVER AREA OF RESPONSIBILITY,661,2242,1.495137,2.01942,3.391831,1581
BUFFALO AREA OF RESPONSIBILITY,316,1057,0.71477,0.952064,3.344937,741
DETROIT AREA OF RESPONSIBILITY,763,2503,1.725854,2.254508,3.280472,1740
PHILADELPHIA AREA OF RESPONSIBILITY,1004,3214,2.270979,2.894922,3.201195,2210
HQ AREA OF RESPONSIBILITY,7,22,0.015834,0.019816,3.142857,15
MIAMI AREA OF RESPONSIBILITY,4291,13345,9.705949,12.02014,3.109998,9054
ATLANTA AREA OF RESPONSIBILITY,2696,8330,6.098168,7.503017,3.089763,5634


In [23]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='num_increase', ascending=False)

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
MIAMI AREA OF RESPONSIBILITY,4291,13345,9.705949,12.02014,3.109998,9054
ATLANTA AREA OF RESPONSIBILITY,2696,8330,6.098168,7.503017,3.089763,5634
DALLAS AREA OF RESPONSIBILITY,4273,9159,9.665234,8.249716,2.143459,4886
NEW ORLEANS AREA OF RESPONSIBILITY,4725,9494,10.687627,8.551458,2.009312,4769
HOUSTON AREA OF RESPONSIBILITY,4425,8419,10.009048,7.583182,1.902599,3994
SAN ANTONIO AREA OF RESPONSIBILITY,2479,5946,5.607329,5.355695,2.398548,3467
LOS ANGELES AREA OF RESPONSIBILITY,1705,5106,3.856594,4.599088,2.994721,3401
CHICAGO AREA OF RESPONSIBILITY,3385,6657,7.656639,5.996109,1.966617,3272
BOSTON AREA OF RESPONSIBILITY,879,3957,1.988238,3.564158,4.501706,3078
WASHINGTON AREA OF RESPONSIBILITY,1023,4068,2.313956,3.664139,3.97654,3045


**Observations**
* Interesting to see the differnce in AORs with the largest increase in number of arrests vs proportion of arrests - from eyeballing, it looks like mainly cities are the ones with largest prop increase (San Diego, Boston, Washington (need to check if this means DC), Denver, Buffalo (NY), Detroit, Philly
* And then ones with the largest actual increase are ones which already had quite high numbers - Miami, Atlanta, Dallas, New Orleans, Houston, San Antonio, LA, Chicago -> NB, some of this is probably to do with population (e.g. LA and Chicago); would be good to add in populations (if mapping between county to aor is available)

##### Citizenship country

In [24]:
eq_days_pre_post_trump.groupby('trump')['citizenship_country'].nunique()

trump
False    168
True     181
Name: citizenship_country, dtype: int64

In [24]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_country')

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
citizenship_country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AFGHANISTAN,33.0,103.0,0.0739,0.092302,3.121212,70.0
ALBANIA,12.0,28.0,0.026873,0.025092,2.333333,16.0
ALGERIA,7.0,12.0,0.015676,0.010754,1.714286,5.0
ANDORRA,,1.0,,0.000896,,
ANGOLA,23.0,53.0,0.051506,0.047495,2.304348,30.0
ANGUILLA,,1.0,,0.000896,,
ANTIGUA-BARBUDA,1.0,4.0,0.002239,0.003585,4.0,3.0
ARGENTINA,22.0,76.0,0.049267,0.068106,3.454545,54.0
ARMENIA,18.0,45.0,0.040309,0.040326,2.5,27.0
AUSTRALIA,4.0,14.0,0.008958,0.012546,3.5,10.0


In [28]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_continent')

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
citizenship_continent,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AFRICA,559,1738,1.252016,1.557808,3.109123,1179
ASIA,1181,4294,2.645135,3.848808,3.635902,3113
EUROPE,578,1396,1.294571,1.251266,2.415225,818
NORTH AND CENTRAL AMERICA,36597,86135,81.967837,77.204729,2.353608,49538
OCEANIA,85,152,0.190378,0.136241,1.788235,67
SOUTH AMERICA,5648,17852,12.650063,16.001147,3.160765,12204


In [176]:
aor_summary_country = eq_days_pre_post_trump.groupby('apprehension_aor').apply(get_summary_table)

In [27]:
aor_summary_country.sort_values(by='num_increase', ascending=False).head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ATLANTA AREA OF RESPONSIBILITY,MEXICO,1210.0,3356.0,44.831419,40.288115,2.773554,2146.0
MIAMI AREA OF RESPONSIBILITY,GUATEMALA,836.0,2880.0,19.482638,21.568187,3.444976,2044.0
DALLAS AREA OF RESPONSIBILITY,MEXICO,2545.0,4550.0,59.560028,49.677912,1.787819,2005.0
HARLINGEN AREA OF RESPONSIBILITY,MEXICO,1406.0,3196.0,65.243619,81.885729,2.273115,1790.0
MIAMI AREA OF RESPONSIBILITY,MEXICO,1025.0,2753.0,23.887206,20.61709,2.685854,1728.0
LOS ANGELES AREA OF RESPONSIBILITY,MEXICO,928.0,2577.0,54.428152,50.400939,2.77694,1649.0
NEW ORLEANS AREA OF RESPONSIBILITY,MEXICO,1944.0,3542.0,41.142857,37.307773,1.822016,1598.0
HOUSTON AREA OF RESPONSIBILITY,MEXICO,2149.0,3637.0,48.553999,43.194774,1.692415,1488.0
CHICAGO AREA OF RESPONSIBILITY,MEXICO,1437.0,2709.0,42.451994,40.687894,1.885177,1272.0
PHOENIX AREA OF RESPONSIBILITY,MEXICO,1286.0,2454.0,76.275208,69.479049,1.908243,1168.0


In [177]:
aor_summary_country.sort_values(by='fact_increase', ascending=False).head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,False,True,False_perc_arrests,True_perc_arrests,fact_increase,num_increase
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
WASHINGTON AREA OF RESPONSIBILITY,INDIA,1.0,36.0,0.097752,0.884956,36.0,35.0
NEW YORK CITY AREA OF RESPONSIBILITY,HAITI,1.0,36.0,0.124224,1.483924,36.0,35.0
WASHINGTON AREA OF RESPONSIBILITY,AFGHANISTAN,1.0,31.0,0.097752,0.762045,31.0,30.0
WASHINGTON AREA OF RESPONSIBILITY,EGYPT,1.0,30.0,0.097752,0.737463,30.0,29.0
WASHINGTON AREA OF RESPONSIBILITY,CUBA,2.0,53.0,0.195503,1.302852,26.5,51.0
SEATTLE AREA OF RESPONSIBILITY,PERU,1.0,24.0,0.228833,1.848998,24.0,23.0
LOS ANGELES AREA OF RESPONSIBILITY,IRAN,3.0,68.0,0.175953,1.331767,22.666667,65.0
WASHINGTON AREA OF RESPONSIBILITY,"CHINA, PEOPLES REPUBLIC OF",1.0,21.0,0.097752,0.516224,21.0,20.0
DETROIT AREA OF RESPONSIBILITY,SENEGAL,1.0,21.0,0.131062,0.838993,21.0,20.0
SAN FRANCISCO AREA OF RESPONSIBILITY,VENEZUELA,3.0,62.0,0.344037,2.859779,20.666667,59.0


In [212]:
aor_summary_country.groupby(level=0, group_keys=False)['num_increase'].nlargest(5)

apprehension_aor                       citizenship_country       
ATLANTA AREA OF RESPONSIBILITY         MEXICO                        2146.0
                                       GUATEMALA                     1009.0
                                       HONDURAS                       783.0
                                       VENEZUELA                      584.0
                                       COLOMBIA                       230.0
BALTIMORE AREA OF RESPONSIBILITY       GUATEMALA                      273.0
                                       EL SALVADOR                    225.0
                                       HONDURAS                       147.0
                                       MEXICO                         137.0
                                       VENEZUELA                       59.0
BOSTON AREA OF RESPONSIBILITY          BRAZIL                         690.0
                                       GUATEMALA                      616.0
                      

In [None]:
aor_summary_country['True'].quantile(0.99)

Creating a plot to look at the country changes for each AOR:

In [241]:
# so the countries in plots are in the same order, for ease of comparison:
country_order = list(aor_summary_country.reset_index().groupby('citizenship_country')['True'].sum().sort_values(ascending=False).index)

def country_pre_post_plot(aor_str):
    aor_country_plot_df = aor_summary_country.rename(columns={'False':'pre_trump','True':'post_trump'})[['pre_trump','post_trump']].stack().reset_index().rename(
        columns={'level_2':'pre_post',0:'num_arrests'})
    
    chart = alt.layer(
            data=aor_country_plot_df[aor_country_plot_df['apprehension_aor']==aor_str]
        )
    
    chart += alt.Chart().mark_line(color='#9E9EA3').encode(
            x=alt.X('num_arrests:Q'),
            y=alt.Y('citizenship_country:N', sort=country_order),
            detail='citizenship_country:N',
        )
        # Add points for life expectancy in 1955 & 2000
    chart += alt.Chart().mark_point(
            size=100,
            opacity=0.5,
            filled=True
        ).encode(
            x=alt.X('num_arrests:Q', title="Number arrests"),
            y=alt.Y('citizenship_country:N', title="", sort=country_order),
            color=alt.Color('pre_post',
                scale=alt.Scale(
                    domain=['pre_trump','post_trump'],
                    range=['#2a6ca8', '#bf1515']
                )
            )
        ).properties(width=800, title=aor_str)
    return chart
    # return chart.interactive() 
    
    # use chart.interactive() if you want to look into what is happening for the countries with smaller n's. as you can then scroll in

In [242]:
country_pre_post_plot('NEW YORK CITY AREA OF RESPONSIBILITY')

In [243]:
country_pre_post_plot('BOSTON AREA OF RESPONSIBILITY')

In [238]:
country_pre_post_plot('ATLANTA AREA OF RESPONSIBILITY')

##### Observations:

* Seems like there has been a big increase across most countries, but in particular there are some countries which did not have many arrests at all before that have had huge proportional increases.
* Looks like the AORs with the biggest prop increases for these countries with lower numbers of arrests overall are the same AORs which saw the biggest overall proportion increase -> could this be an indication that these areas are arresting people from more countries?

In [34]:
num_countries_pre_post_by_aor = eq_days_pre_post_trump.groupby(['apprehension_aor', 'trump'])['citizenship_country'].nunique().unstack()
num_countries_pre_post_by_aor.columns = [str(c) for c in num_countries_pre_post_by_aor.columns]

In [35]:
num_countries_pre_post_by_aor['delta'] = num_countries_pre_post_by_aor['True'] - num_countries_pre_post_by_aor['False']

In [50]:
num_countries_pre_post_by_aor.sort_values(by='delta', ascending=False)

Unnamed: 0_level_0,False,True,delta
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
NEW YORK CITY AREA OF RESPONSIBILITY,50,88,38
HOUSTON AREA OF RESPONSIBILITY,47,84,37
ATLANTA AREA OF RESPONSIBILITY,60,96,36
BOSTON AREA OF RESPONSIBILITY,62,98,36
DALLAS AREA OF RESPONSIBILITY,64,99,35
MIAMI AREA OF RESPONSIBILITY,85,119,34
BUFFALO AREA OF RESPONSIBILITY,44,78,34
BALTIMORE AREA OF RESPONSIBILITY,40,71,31
DENVER AREA OF RESPONSIBILITY,29,58,29
SEATTLE AREA OF RESPONSIBILITY,40,69,29


**What does this mean** In trump's administration the number of countries people are being arrested from has increased in all AORs - this lends to idea of "indiscriminately"

#### Change over time (instead of just pre post)

Overall stats of pre and post trump give us some indications of patterns, but we can't see the true picture of what is happening. Next I am going to look at change over time.

In [37]:
def aor_month_cc(df, top_countries=None, aor_str=None, n=5):
    if aor_str is None:
        aor_str = df['apprehension_aor'].unique().item()
    aor_group = df[df['apprehension_aor'] == aor_str]
    aor_group_cc = aor_group[['apprehension_month_year','citizenship_country']].value_counts().reset_index().rename(columns={'count':'num_arrests'})
    if top_countries is None: # done so I can feed in top countries if I want them to be consisten across aors, rather than unique to the top for that aor
        top_countries = aor_group_cc.groupby('citizenship_country')['num_arrests'].sum().sort_values(ascending=False).head(n).index
    aor_group_cc['summary_country'] = np.where(
            aor_group_cc['citizenship_country'].isin(top_countries), aor_group_cc['citizenship_country'], 'OTHER COUNTRY')
    return aor_group_cc

In [54]:
COUNTRY_ORDER = ['EL SALVADOR','GUATEMALA','HONDURAS','MEXICO','VENEZUELA','OTHER COUNTRY']

def plot_area_chart(df,aor_str,top_countries=None):
    aor_summary = aor_month_cc(df, top_countries, aor_str)
    aor_summary['apprehension_month_year'] = aor_summary['apprehension_month_year'].dt.strftime('%Y-%m')
    return alt.Chart(aor_summary, title=aor_str).mark_bar().encode(
        x='apprehension_month_year:O',
        y='sum(num_arrests)',
        color=alt.Color('summary_country', sort=COUNTRY_ORDER)).properties(width=800, height=400)
    

In [39]:
top_countries_overall = lambda n: arrests_df[arrests_df['trump']==True]['citizenship_country'].value_counts().head(n).index

In [55]:
plot_area_chart(arrests_df, 'WASHINGTON AREA OF RESPONSIBILITY', top_countries_overall(5))

In [57]:
plot_area_chart(arrests_df, 'NEW YORK CITY AREA OF RESPONSIBILITY', top_countries_overall(5)) 

In [59]:
plot_area_chart(arrests_df, 'HOUSTON AREA OF RESPONSIBILITY', top_countries_overall(5)) 

In [60]:
plot_area_chart(arrests_df, 'BOSTON AREA OF RESPONSIBILITY', top_countries_overall(5))

Other ways to look at this would be:
* Number of citizenship countries by month by AOR

In [64]:
num_countries_over_time = arrests_df.groupby(['apprehension_aor','apprehension_month_year'])['citizenship_country'].nunique().reset_index()

num_countries_over_time['apprehension_month_year'] = num_countries_over_time['apprehension_month_year'].dt.strftime('%Y-%m')

In [65]:
num_countries_over_time['trump'] = '2025-01'

In [66]:
alt.Chart(num_countries_over_time).mark_line().encode(
    x=alt.X('apprehension_month_year:O', axis=alt.Axis( grid = False, values=['2025-01'])),
    y='citizenship_country',
    color=alt.Color('apprehension_aor', legend=None),
    tooltip=['apprehension_aor']).properties(width=200,height=200).facet(
                    facet='apprehension_aor',
                    columns=4)
    

#### Next steps for this analysis:

* Check pre-post equal number of days makes sense and adjust if needed to account for holidays, weekends, seasonality etc.
* Significance tests:
  * If the number of citizenships is an area we want to go down, we can identify the areas that have seen a significant shift post Trump (rather than just gradually following the pre trump trend of increasing) - Regression Discontinuity or Interrupted Time Series (probably Interrupted time series because close to the inaugeration could be some overlaps of cases in progress etc.
  * Which country increases in number of arrests are significant, and for which AORs? - paired t-tests

## Question 2
Select one question from Task 3 Steps 2-4 and complete the analysis.

**Question:** are the AORs that have seen the big spike in "other countries" also the places that we are seeing local enforcement partnerships (ie this looks like "rounding up immigrants indiscriminantly")

In [266]:
arrests_df['apprehension_method'].value_counts()

apprehension_method
CAP LOCAL INCARCERATION                        112087
NON-CUSTODIAL ARREST                            56902
LOCATED                                         31394
CAP FEDERAL INCARCERATION                       23408
CAP STATE INCARCERATION                         10417
OTHER EFFORTS                                    9014
ERO REPROCESSED ARREST                           8875
287(G) PROGRAM                                   6313
PROBATION AND PAROLE                             3833
LAW ENFORCEMENT AGENCY RESPONSE UNIT              845
OTHER TASK FORCE                                  505
PATROL BORDER                                     477
OTHER AGENCY (TURNED OVER TO INS)                 444
WORKSITE ENFORCEMENT                              252
INSPECTIONS                                       127
ANTI-SMUGGLING                                     84
TRAFFIC CHECK                                      62
ORGANIZED CRIME DRUG ENFORCEMENT TASK FORCE        52
PATROL I

In [279]:
eq_days_pre_post_trump[['apprehension_method', 'trump']].value_counts().unstack().sort_values(by=True, ascending=False)

trump,False,True
apprehension_method,Unnamed: 1_level_1,Unnamed: 2_level_1
CAP LOCAL INCARCERATION,23345.0,45886.0
NON-CUSTODIAL ARREST,6309.0,27071.0
LOCATED,3221.0,20455.0
CAP FEDERAL INCARCERATION,5049.0,6684.0
OTHER EFFORTS,1298.0,2870.0
CAP STATE INCARCERATION,2353.0,2718.0
287(G) PROGRAM,1432.0,2308.0
PROBATION AND PAROLE,768.0,962.0
ERO REPROCESSED ARREST,627.0,785.0
LAW ENFORCEMENT AGENCY RESPONSE UNIT,105.0,580.0


So there has been a huge increase in CAP LOCAL INCARCERATION since trump

In [294]:
eq_days_pre_post_trumpeq_days_pre_post_trump[eq_days_pre_post_trump['apprehension_method']=='CAP LOCAL INCARCERATION'].groupby(
    'apprehension_aor')[['trump']].value_counts().unstack().rename(columns={False:'pre_trump_CAP_LOCAL',True:'post_trump_CAP_LOCAL'})



trump,pre_trump_CAP_LOCAL,post_trump_CAP_LOCAL
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1
ATLANTA AREA OF RESPONSIBILITY,1389.0,3961.0
BALTIMORE AREA OF RESPONSIBILITY,178.0,312.0
BOSTON AREA OF RESPONSIBILITY,191.0,166.0
BUFFALO AREA OF RESPONSIBILITY,20.0,47.0
CHICAGO AREA OF RESPONSIBILITY,1426.0,3341.0
DALLAS AREA OF RESPONSIBILITY,2584.0,5372.0
DENVER AREA OF RESPONSIBILITY,279.0,721.0
DETROIT AREA OF RESPONSIBILITY,417.0,1160.0
EL PASO AREA OF RESPONSIBILITY,312.0,572.0
HARLINGEN AREA OF RESPONSIBILITY,1612.0,1412.0


In [298]:
get_summary_table(eq_days_pre_post_trump[eq_days_pre_post_trump['apprehension_method']=='CAP LOCAL INCARCERATION'], groupby_col='apprehension_aor').sort_values(by='num_increase')

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,fact_increase,num_increase
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
HARLINGEN AREA OF RESPONSIBILITY,1612.0,1412.0,6.910447,3.079406,0.875931,-200.0
BOSTON AREA OF RESPONSIBILITY,191.0,166.0,0.818794,0.362026,0.86911,-25.0
BUFFALO AREA OF RESPONSIBILITY,20.0,47.0,0.085738,0.102501,2.35,27.0
SEATTLE AREA OF RESPONSIBILITY,22.0,55.0,0.094311,0.119949,2.5,33.0
SAN DIEGO AREA OF RESPONSIBILITY,16.0,56.0,0.06859,0.122129,3.5,40.0
NEW YORK CITY AREA OF RESPONSIBILITY,19.0,76.0,0.081451,0.165747,4.0,57.0
BALTIMORE AREA OF RESPONSIBILITY,178.0,312.0,0.763064,0.680435,1.752809,134.0
SAN FRANCISCO AREA OF RESPONSIBILITY,127.0,275.0,0.544433,0.599743,2.165354,148.0
NEWARK AREA OF RESPONSIBILITY,838.0,1013.0,3.592404,2.209234,1.208831,175.0
LOS ANGELES AREA OF RESPONSIBILITY,98.0,295.0,0.420114,0.64336,3.010204,197.0


In [252]:
arrests_df.head(50)

Unnamed: 0,apprehension_date,apprehension_state,apprehension_aor,final_program,apprehension_method,apprehension_criminality,case_status,case_category,departed_date,departure_country,final_order_yes_no,birth_year,citizenship_country,gender,apprehension_site_landmark,unique_identifier,county,jail,prison,facility,other_facility_loc_info,trump,apprehension_month_year,apprehension_day,citizenship_continent
0,2024-08-07 09:43:00,VIRGINIA,WASHINGTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,NON-CUSTODIAL ARREST,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2024-08-19,HONDURAS,YES,1981,HONDURAS,MALE,"HBG GENERAL AREA, NON-SPECIFIC",0000b34edd657d516c02b13a7c352d62d0effcb6,,,,,,False,2024-08,2024-08-07,NORTH AND CENTRAL AMERICA
1,2024-10-19 08:33:00,TEXAS,HOUSTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[16] REINSTATED FINAL ORDER,2024-10-22,MEXICO,YES,1984,MEXICO,MALE,"HARRIS COUNTY JAIL, HOUSTON, TX",0000ba6e459998a6046d185d82cf4349de1479d0,"HARRIS COUNTY HOUSTON, TX",HARRIS COUNTY JAIL,,HARRIS COUNTY JAIL,"HOUSTON, TX",False,2024-10,2024-10-19,NORTH AND CENTRAL AMERICA
2,2025-04-15 10:08:00,NEW JERSEY,NEWARK AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP FEDERAL INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2025-06-10,DOMINICAN REPUBLIC,YES,1988,DOMINICAN REPUBLIC,MALE,"FORT DIX EAST, NEW JERSEY",0000c3d23fb0e444864559575900d410c4e8490f,,,,,,True,2025-04,2025-04-15,NORTH AND CENTRAL AMERICA
3,2025-06-03 09:20:00,MINNESOTA,ST. PAUL AREA OF RESPONSIBILITY,FUGITIVE OPERATIONS,NON-CUSTODIAL ARREST,3 OTHER IMMIGRATION VIOLATOR,ACTIVE,[8G] EXPEDITED REMOVAL - CREDIBLE FEAR REFERRAL,NaT,,YES,1985,COLOMBIA,FEMALE,"SPM GENERAL AREA, NON-SPECIFIC",0000d3dbf8033b5f209f6547ffee5b84feb4f599,,,,,,True,2025-06,2025-06-03,SOUTH AMERICA
4,2025-01-21 05:41:00,,MIAMI AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,2 PENDING CRIMINAL CHARGES,3-VOLUNTARY DEPARTURE CONFIRMED,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-02-01,MEXICO,YES,1983,MEXICO,MALE,MIAMI DADE COUNTY JAIL TURNER GUILFORD KNIGHT ...,000104d730bf021326c6dc0deb3dd575304136b5,,MIAMI DADE COUNTY JAIL,,MIAMI DADE COUNTY JAIL,,True,2025-01,2025-01-21,NORTH AND CENTRAL AMERICA
5,2025-01-09 10:32:00,FLORIDA,MIAMI AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,2 PENDING CRIMINAL CHARGES,8-EXCLUDED/REMOVED - INADMISSIBILITY,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-02-19,MEXICO,YES,1982,MEXICO,MALE,TAM-PINELLAS COUNTY JAIL,00011b4e29ae4488b3d8271fe4f456fba18a4a8b,,PINELLAS COUNTY JAIL,,PINELLAS COUNTY JAIL,,False,2025-01,2025-01-09,NORTH AND CENTRAL AMERICA
6,2024-08-13 03:15:00,TEXAS,HOUSTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[16] REINSTATED FINAL ORDER,2024-08-15,MEXICO,YES,1989,MEXICO,MALE,"MTG GENERAL AREA, NON-SPECIFIC",000183e148002809968169256cadf6e64c932881,,,,,,False,2024-08,2024-08-13,NORTH AND CENTRAL AMERICA
7,2025-06-17 01:16:00,CALIFORNIA,SAN FRANCISCO AREA OF RESPONSIBILITY,DETAINED DOCKET CONTROL,NON-CUSTODIAL ARREST,3 OTHER IMMIGRATION VIOLATOR,ACTIVE,[8G] EXPEDITED REMOVAL - CREDIBLE FEAR REFERRAL,NaT,,NO,1994,COLOMBIA,MALE,"SFR GENERAL AREA, NON-SPECIFIC",00018ec9f8dc868818be675ab36d7a545072fff4,,,,,,True,2025-06,2025-06-17,SOUTH AMERICA
8,2024-05-02 09:53:00,,DALLAS AREA OF RESPONSIBILITY,ALTERNATIVES TO DETENTION,CAP LOCAL INCARCERATION,3 OTHER IMMIGRATION VIOLATOR,8-EXCLUDED/REMOVED - INADMISSIBILITY,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-03-15,EL SALVADOR,YES,2005,VENEZUELA,MALE,DALLAS COUNTY GENERAL AREA,00019dde2673785c0559b93654448acd84cba48b,,,,,,False,2024-05,2024-05-02,SOUTH AMERICA
9,2025-01-26 01:46:00,TEXAS,DALLAS AREA OF RESPONSIBILITY,ALTERNATIVES TO DETENTION,LOCATED,3 OTHER IMMIGRATION VIOLATOR,8-EXCLUDED/REMOVED - INADMISSIBILITY,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-03-15,EL SALVADOR,YES,2005,VENEZUELA,MALE,DALLAS COUNTY GENERAL AREA,00019dde2673785c0559b93654448acd84cba48b,,,,,,True,2025-01,2025-01-26,SOUTH AMERICA


In [247]:
arrests_df[['apprehension_aor','facility']].value_counts(dropna=False).tail(50)

apprehension_aor                       facility               
CHICAGO AREA OF RESPONSIBILITY         PIERCE COUNTY JAIL         1
ST. PAUL AREA OF RESPONSIBILITY        CHICKASAW COUNTY JAIL      1
                                       CHARLES MIX COUNTY JAIL    1
CHICAGO AREA OF RESPONSIBILITY         POWELL COUNTY JAIL         1
                                       PUTNAME COUNTY JAIL        1
ST. PAUL AREA OF RESPONSIBILITY        CARLTON COUNTY JAIL        1
CHICAGO AREA OF RESPONSIBILITY         VILAS COUNTY JAIL          1
ST. PAUL AREA OF RESPONSIBILITY        BREMER COUNTY JAIL         1
DALLAS AREA OF RESPONSIBILITY          ARCHER COUNTY JAIL         1
                                       CHOCTAW COUNTY JAIL        1
ATLANTA AREA OF RESPONSIBILITY         BEAUFORT COUNTY JAIL       1
                                       BEN HILL COUNTY JAIL       1
SALT LAKE CITY AREA OF RESPONSIBILITY  CHURCHILL COUNTY JAIL      1
DENVER AREA OF RESPONSIBILITY          LINCOLN COUNTY