# Task 4 - Analysis

## Question 1
From Task 3 Step 1, analyze how the national background of those arrested changed before and after the start of the second Trump administration, broken down by ICE Area of Responsibility (AOR).


### 0- Setup

In [1]:
from pathlib import Path
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
import altair as alt

import process_data

In [2]:
pd.set_option("display.max_rows", 300)
pd.set_option("display.max_columns", 200)

In [3]:
arrests_filename = 'arrests-0923-0625.xlsx'
cwd = Path.cwd()
root = cwd.parent
data = root / "data"

In [4]:
arrests_df = process_data.read_arrests_data(data/arrests_filename)

In [5]:
arrests_df.head()

Unnamed: 0,apprehension_date,apprehension_state,apprehension_aor,final_program,apprehension_method,apprehension_criminality,case_status,case_category,departed_date,departure_country,final_order_yes_no,birth_year,citizenship_country,gender,apprehension_site_landmark,unique_identifier
0,2024-08-07 09:43:00,VIRGINIA,WASHINGTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,NON-CUSTODIAL ARREST,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2024-08-19,HONDURAS,YES,1981,HONDURAS,MALE,"HBG GENERAL AREA, NON-SPECIFIC",0000b34edd657d516c02b13a7c352d62d0effcb6
1,2024-10-19 20:33:00,TEXAS,HOUSTON AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,1 CONVICTED CRIMINAL,6-DEPORTED/REMOVED - DEPORTABILITY,[16] REINSTATED FINAL ORDER,2024-10-22,MEXICO,YES,1984,MEXICO,MALE,"HARRIS COUNTY JAIL, HOUSTON, TX",0000ba6e459998a6046d185d82cf4349de1479d0
2,2025-04-15 10:08:21,NEW JERSEY,NEWARK AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP FEDERAL INCARCERATION,1 CONVICTED CRIMINAL,8-EXCLUDED/REMOVED - INADMISSIBILITY,[16] REINSTATED FINAL ORDER,2025-06-10,DOMINICAN REPUBLIC,YES,1988,DOMINICAN REPUBLIC,MALE,"FORT DIX EAST, NEW JERSEY",0000c3d23fb0e444864559575900d410c4e8490f
3,2025-06-03 09:20:00,MINNESOTA,ST. PAUL AREA OF RESPONSIBILITY,FUGITIVE OPERATIONS,NON-CUSTODIAL ARREST,3 OTHER IMMIGRATION VIOLATOR,ACTIVE,[8G] EXPEDITED REMOVAL - CREDIBLE FEAR REFERRAL,NaT,,YES,1985,COLOMBIA,FEMALE,"SPM GENERAL AREA, NON-SPECIFIC",0000d3dbf8033b5f209f6547ffee5b84feb4f599
4,2025-01-21 17:41:00,,MIAMI AREA OF RESPONSIBILITY,ERO CRIMINAL ALIEN PROGRAM,CAP LOCAL INCARCERATION,2 PENDING CRIMINAL CHARGES,3-VOLUNTARY DEPARTURE CONFIRMED,[8C] EXCLUDABLE / INADMISSIBLE - ADMINISTRATIV...,2025-02-01,MEXICO,YES,1983,MEXICO,MALE,MIAMI DADE COUNTY JAIL TURNER GUILFORD KNIGHT ...,000104d730bf021326c6dc0deb3dd575304136b5


### 1- Data exploration and processing

**Key info:**
* Trump was inaugerated on 20th January 2025

**Dates approach:**
* Will likely want to look at month totals to smooth out -> create month var
* More data pre Trump, so will want to do some analysis looking at same number of days/months pre and post (for overall numbers/proportions etc.)

**National background approach:**
* Have the number of citizenship countries changed?
* What about the specific countries - what has changed about them?
   * Add in continent?
   * Any other country meta data to add in?
* Is this different depending on ICE Area of Responsibility (AOR)?

##### ICE Area of Responsibilitiy (AOR)

In [6]:
arrests_df['apprehension_aor'].isna().sum()

5903

In [7]:
arrests_df['apprehension_aor'].value_counts(dropna=False)

apprehension_aor
MIAMI AREA OF RESPONSIBILITY             26925
NEW ORLEANS AREA OF RESPONSIBILITY       23598
DALLAS AREA OF RESPONSIBILITY            23185
HOUSTON AREA OF RESPONSIBILITY           21785
CHICAGO AREA OF RESPONSIBILITY           18370
ATLANTA AREA OF RESPONSIBILITY           16596
SAN ANTONIO AREA OF RESPONSIBILITY       15161
HARLINGEN AREA OF RESPONSIBILITY         11607
LOS ANGELES AREA OF RESPONSIBILITY       11575
NEWARK AREA OF RESPONSIBILITY             8506
PHOENIX AREA OF RESPONSIBILITY            8351
NEW YORK CITY AREA OF RESPONSIBILITY      8101
SALT LAKE CITY AREA OF RESPONSIBILITY     7664
BOSTON AREA OF RESPONSIBILITY             7148
WASHINGTON AREA OF RESPONSIBILITY         6955
PHILADELPHIA AREA OF RESPONSIBILITY       6240
ST. PAUL AREA OF RESPONSIBILITY           6124
NaN                                       5903
DETROIT AREA OF RESPONSIBILITY            5400
SAN FRANCISCO AREA OF RESPONSIBILITY      5277
DENVER AREA OF RESPONSIBILITY             4

In [8]:
arrests_df['apprehension_aor'].fillna('MISSING', inplace = True) 
# done because I don't want to lose these from the analysis, there may be a reason that AOR is missing

##### Citizenship country

In [9]:
arrests_df['citizenship_country'].isna().sum()

0

In [10]:
arrests_df['citizenship_country'].value_counts().reset_index().head(10)

Unnamed: 0,citizenship_country,count
0,MEXICO,101036
1,GUATEMALA,32638
2,HONDURAS,29628
3,VENEZUELA,15238
4,NICARAGUA,14688
5,EL SALVADOR,12041
6,COLOMBIA,9943
7,ECUADOR,9339
8,CUBA,6205
9,DOMINICAN REPUBLIC,5065


##### Adding continent info

**Note** this is a clunky, quick way to do this just to see whether it is something to include in the analysis. In the next steps of this analysis it would be good to bring in additional country information, and think about what geographic area is interesting (e.g. instead of contintent we could use slightly corser geographic areas that are meaningful, such as "Western Asia", "Central America" etc.)

In [11]:
continent_dict = {
    'NORTH AND CENTRAL AMERICA': ['TURKS AND CAICOS ISLANDS','TRINIDAD AND TOBAGO','ST. VINCENT-GRENADINES', 'ST. LUCIA','ST. KITTS-NEVIS',
                              'SINT EUSTATIUS', 'SINT MAARTEN(DUTCH)', 'PANAMA','NICARAGUA','NETHERLANDS ANTILLES', 'MONTSERRAT','MEXICO',
                                'JAMAICA','HAITI', 'HONDURAS','GRENADA', 'GUADELOUPE', 'GUATEMALA','CURACAO','CANADA', 'ANGUILLA','ANTIGUA-BARBUDA',
                                  'BAHAMAS', 'BARBADOS', 'BELIZE','BOLIVIA','COSTA RICA','CUBA','DOMINICA', 'DOMINICAN REPUBLIC','EL SALVADOR','BRITISH VIRGIN ISLANDS'],
     'SOUTH AMERICA': ['VENEZUELA','URUGUAY','SURINAME','PERU','PARAGUAY','GUYANA','FRENCH GUIANA','ECUADOR','COLOMBIA', 'CHILE','BRAZIL','ARGENTINA'],
     'EUROPE': ['YUGOSLAVIA','USSR','UNITED KINGDOM','UKRAINE','SWITZERLAND','SWEDEN','SPAIN','SLOVENIA','SLOVAKIA', 'SERBIA AND MONTENEGRO','SERBIA',
               'RUSSIA','ROMANIA','PORTUGAL','POLAND', 'NORWAY','NORTH MACEDONIA','NETHERLANDS','MONTENEGRO','MOLDOVA','MALTA','LITHUANIA','LATVIA',
               'KOSOVO','ITALY','IRELAND','ICELAND','HUNGARY', 'GREECE','GERMANY','FRANCE','FINLAND','ESTONIA','DENMARK','CZECHOSLOVAKIA','CZECH REPUBLIC',
               'CYPRUS','CROATIA','BULGARIA','BOSNIA-HERZEGOVINA','BELGIUM','BELARUS','AUSTRIA','ALBANIA', 'ANDORRA'],
     'AFRICA': ['ZIMBABWE','ZAMBIA','UGANDA','TUNISIA','TOGO','TANZANIA','SUDAN','SOUTH SUDAN','SOUTH AFRICA','SOMALIA','SIERRA LEONE','SENEGAL',
               'SAO TOME AND PRINCIPE','RWANDA','NIGERIA','NIGER', 'NAMIBIA','MOZAMBIQUE','MOROCCO','MAURITIUS','MAURITANIA','MALI','LIBYA',
               'LIBERIA','KENYA','IVORY COAST','GUINEA', 'GUINEA-BISSAU','GHANA','GAMBIA','GABON','EGYPT','ETHIOPIA','ESWATINI','ERITREA','EQUATORIAL GUINEA',
               'DJIBOUTI','DEM REP OF THE CONGO','CONGO','CHAD','CENTRAL AFRICAN REPUBLIC','CAPE VERDE','CAMEROON','BURUNDI','BURKINA FASO','BOTSWANA',
               'BENIN','ANGOLA','ALGERIA'],
     'ASIA': ['YEMEN','VIETNAM','UZBEKISTAN','UNITED ARAB EMIRATES','TURKMENISTAN','TURKIYE','THAILAND','TAJIKISTAN','TAIWAN','SYRIA','SRI LANKA',
         'SOUTH KOREA','SAUDI ARABIA','PHILIPPINES','PAKISTAN','OMAN','NEPAL','MONGOLIA', 'MALAYSIA','MALAWI','LEBANON','LAOS','KYRGYZSTAN','KUWAIT',
         'KOREA','KAZAKHSTAN','JORDAN','JAPAN','ISRAEL','IRAQ','IRAN','INDONESIA', 'INDIA','HONG KONG', 'GEORGIA','EAST TIMOR','CHINA, PEOPLES REPUBLIC OF',
             'CAMBODIA','BURMA','BRUNEI','BHUTAN','BANGLADESH','BAHRAIN','AZERBAIJAN','ARMENIA', 'AFGHANISTAN'],
     'OCEANIA': ['TONGA','SAMOA','PAPUA NEW GUINEA','PALAU','NEW ZEALAND','MICRONESIA, FEDERATED STATES OF','MARSHALL ISLANDS','FRENCH POLYNESIA',
                'FIJI', 'AUSTRALIA']}

In [12]:
country_continent_lookup = {}

for cont in continent_dict.keys():
    for coun in continent_dict[cont]:
        country_continent_lookup[coun] = cont
        

In [13]:
arrests_df['citizenship_continent'] = arrests_df['citizenship_country'].map(country_continent_lookup)

##### Date vars


In [14]:
trump_inaugaration_date = datetime.strptime('2025-01-20','%Y-%m-%d').date()

In [15]:
arrests_df['trump'] = np.where(
    arrests_df['apprehension_date'].dt.date >= trump_inaugaration_date, True, False)

In [16]:
arrests_df['apprehension_month_year'] = arrests_df['apprehension_date'].dt.to_period('M')
arrests_df['apprehension_day'] = arrests_df['apprehension_date'].dt.date

**Note** - the below is a rough approach, taking the exact same number of days pre and post the trump inaugeration. In future steps of this research I would check this makes sense, and whether we need to take into account seasonality, day of week, holidays etc. 

In [17]:
number_days_trump_administration = (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days
start_date = trump_inaugaration_date - timedelta(number_days_trump_administration)

eq_days_pre_post_trump = arrests_df[arrests_df['apprehension_day'] >= start_date]


In [18]:
# check equal:

(trump_inaugaration_date - start_date).days == (arrests_df['apprehension_day'].max() - trump_inaugaration_date).days

True

In [19]:
eq_days_pre_post_trump['trump'].value_counts()

trump
True     111590
False     44655
Name: count, dtype: int64

### 2- Analysis

In [20]:
def get_summary_table(df, groupby_col='citizenship_country'):
    num_arrests_by_cc = df.groupby('trump')[groupby_col].value_counts().reset_index().rename(columns={0:'number_arrests'})
    
    pivot_num_by_cc = num_arrests_by_cc.pivot(index=groupby_col, values='count', columns='trump')
    pivot_num_by_cc.columns = [str(x) for x in pivot_num_by_cc.columns]

    for c in pivot_num_by_cc.columns:
        pivot_num_by_cc[f'{c}_perc_arrests'] = (pivot_num_by_cc[c] / pivot_num_by_cc[c].sum()) * 100

    pivot_num_by_cc['prop_increase'] = pivot_num_by_cc['True'] / pivot_num_by_cc['False']
    pivot_num_by_cc['num_increase'] = pivot_num_by_cc['True'] - pivot_num_by_cc['False']

    return pivot_num_by_cc

#### AOR:

First looking to see which AORs have had the largest jump in numbers of arrests

In [21]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='prop_increase', ascending=False)

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
SAN DIEGO AREA OF RESPONSIBILITY,241,1303,0.539693,1.167667,5.406639,1062
BOSTON AREA OF RESPONSIBILITY,879,3957,1.968425,3.546017,4.501706,3078
WASHINGTON AREA OF RESPONSIBILITY,1023,4069,2.290897,3.646384,3.977517,3046
DENVER AREA OF RESPONSIBILITY,661,2243,1.480237,2.010037,3.393343,1582
BUFFALO AREA OF RESPONSIBILITY,317,1058,0.709887,0.948114,3.337539,741
DETROIT AREA OF RESPONSIBILITY,763,2503,1.708655,2.243033,3.280472,1740
PHILADELPHIA AREA OF RESPONSIBILITY,1004,3214,2.248348,2.880186,3.201195,2210
HQ AREA OF RESPONSIBILITY,7,22,0.015676,0.019715,3.142857,15
MIAMI AREA OF RESPONSIBILITY,4291,13353,9.609226,11.966126,3.111862,9062
ATLANTA AREA OF RESPONSIBILITY,2699,8330,6.044116,7.464827,3.086328,5631


In [22]:
get_summary_table(eq_days_pre_post_trump, 'apprehension_aor').sort_values(by='num_increase', ascending=False)

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
MIAMI AREA OF RESPONSIBILITY,4291,13353,9.609226,11.966126,3.111862,9062
ATLANTA AREA OF RESPONSIBILITY,2699,8330,6.044116,7.464827,3.086328,5631
DALLAS AREA OF RESPONSIBILITY,4273,9159,9.568917,8.207725,2.143459,4886
NEW ORLEANS AREA OF RESPONSIBILITY,4725,9494,10.581122,8.507931,2.009312,4769
HOUSTON AREA OF RESPONSIBILITY,4426,8420,9.911544,7.545479,1.902395,3994
SAN ANTONIO AREA OF RESPONSIBILITY,2479,5946,5.55145,5.328434,2.398548,3467
LOS ANGELES AREA OF RESPONSIBILITY,1705,5113,3.818161,4.581952,2.998827,3408
CHICAGO AREA OF RESPONSIBILITY,3385,6658,7.580338,5.966484,1.966913,3273
BOSTON AREA OF RESPONSIBILITY,879,3957,1.968425,3.546017,4.501706,3078
WASHINGTON AREA OF RESPONSIBILITY,1023,4069,2.290897,3.646384,3.977517,3046


**Observations**
* Interesting to see the differnce in AORs with the largest increase in number of arrests vs proportion of arrests - from eyeballing, it looks like mainly cities are the ones with largest prop increase (San Diego, Boston, Washington (need to check if this means DC), Denver, Buffalo (NY), Detroit, Philly
* And then ones with the largest actual increase are ones which already had quite high numbers - Miami, Atlanta, Dallas, New Orleans, Houston, San Antonio, LA, Chicago -> NB, some of this is probably to do with population (e.g. LA and Chicago); would be good to add in populations (if mapping between county to aor is available)

##### Citizenship country

In [23]:
eq_days_pre_post_trump.groupby('trump')['citizenship_country'].nunique()

trump
False    168
True     181
Name: citizenship_country, dtype: int64

In [24]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_country')

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
citizenship_country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AFGHANISTAN,33.0,103.0,0.0739,0.092302,3.121212,70.0
ALBANIA,12.0,28.0,0.026873,0.025092,2.333333,16.0
ALGERIA,7.0,12.0,0.015676,0.010754,1.714286,5.0
ANDORRA,,1.0,,0.000896,,
ANGOLA,23.0,53.0,0.051506,0.047495,2.304348,30.0
ANGUILLA,,1.0,,0.000896,,
ANTIGUA-BARBUDA,1.0,4.0,0.002239,0.003585,4.0,3.0
ARGENTINA,22.0,76.0,0.049267,0.068106,3.454545,54.0
ARMENIA,18.0,45.0,0.040309,0.040326,2.5,27.0
AUSTRALIA,4.0,14.0,0.008958,0.012546,3.5,10.0


In [25]:
get_summary_table(eq_days_pre_post_trump, 'citizenship_continent')

Unnamed: 0_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
citizenship_continent,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AFRICA,560,1738,1.254059,1.557487,3.103571,1178
ASIA,1181,4295,2.644721,3.848911,3.636749,3114
EUROPE,578,1396,1.294368,1.251008,2.415225,818
NORTH AND CENTRAL AMERICA,36601,86157,81.963946,77.208531,2.353952,49556
OCEANIA,85,152,0.190348,0.136213,1.788235,67
SOUTH AMERICA,5650,17852,12.652559,15.997849,3.159646,12202


In [26]:
aor_summary_country = eq_days_pre_post_trump.groupby('apprehension_aor').apply(get_summary_table)

In [27]:
aor_summary_country.sort_values(by='num_increase', ascending=False).head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ATLANTA AREA OF RESPONSIBILITY,MEXICO,1210.0,3356.0,44.831419,40.288115,2.773554,2146.0
MIAMI AREA OF RESPONSIBILITY,GUATEMALA,836.0,2880.0,19.482638,21.568187,3.444976,2044.0
DALLAS AREA OF RESPONSIBILITY,MEXICO,2545.0,4550.0,59.560028,49.677912,1.787819,2005.0
HARLINGEN AREA OF RESPONSIBILITY,MEXICO,1406.0,3196.0,65.243619,81.885729,2.273115,1790.0
MIAMI AREA OF RESPONSIBILITY,MEXICO,1025.0,2753.0,23.887206,20.61709,2.685854,1728.0
LOS ANGELES AREA OF RESPONSIBILITY,MEXICO,928.0,2577.0,54.428152,50.400939,2.77694,1649.0
NEW ORLEANS AREA OF RESPONSIBILITY,MEXICO,1944.0,3542.0,41.142857,37.307773,1.822016,1598.0
HOUSTON AREA OF RESPONSIBILITY,MEXICO,2149.0,3637.0,48.553999,43.194774,1.692415,1488.0
CHICAGO AREA OF RESPONSIBILITY,MEXICO,1437.0,2709.0,42.451994,40.687894,1.885177,1272.0
PHOENIX AREA OF RESPONSIBILITY,MEXICO,1286.0,2454.0,76.275208,69.479049,1.908243,1168.0


In [28]:
aor_summary_country[aor_summary_country['True']>50].sort_values(by='prop_increase', ascending=False).head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
apprehension_aor,citizenship_country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
WASHINGTON AREA OF RESPONSIBILITY,CUBA,2.0,53.0,0.195503,1.302531,26.5,51.0
LOS ANGELES AREA OF RESPONSIBILITY,IRAN,3.0,68.0,0.175953,1.329943,22.666667,65.0
SAN FRANCISCO AREA OF RESPONSIBILITY,VENEZUELA,3.0,62.0,0.344037,2.85846,20.666667,59.0
SALT LAKE CITY AREA OF RESPONSIBILITY,CUBA,3.0,57.0,0.23753,1.64076,19.0,54.0
LOS ANGELES AREA OF RESPONSIBILITY,"CHINA, PEOPLES REPUBLIC OF",10.0,179.0,0.58651,3.50088,17.9,169.0
SAN FRANCISCO AREA OF RESPONSIBILITY,INDIA,9.0,144.0,1.03211,6.639004,16.0,135.0
SAN DIEGO AREA OF RESPONSIBILITY,VENEZUELA,6.0,80.0,2.489627,6.139678,13.333333,74.0
SAN ANTONIO AREA OF RESPONSIBILITY,CUBA,29.0,345.0,1.169827,5.80222,11.896552,316.0
WASHINGTON AREA OF RESPONSIBILITY,BOLIVIA,16.0,180.0,1.564027,4.423691,11.25,164.0
NEW YORK CITY AREA OF RESPONSIBILITY,SENEGAL,5.0,56.0,0.621118,2.307375,11.2,51.0


In [29]:
aor_summary_continent = eq_days_pre_post_trump.groupby('apprehension_aor').apply(get_summary_table,'citizenship_continent')

In [30]:
aor_summary_continent

Unnamed: 0_level_0,Unnamed: 1_level_0,False,True,False_perc_arrests,True_perc_arrests,prop_increase,num_increase
apprehension_aor,citizenship_continent,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ATLANTA AREA OF RESPONSIBILITY,AFRICA,36.0,109.0,1.333827,1.308523,3.027778,73.0
ATLANTA AREA OF RESPONSIBILITY,ASIA,42.0,160.0,1.556132,1.920768,3.809524,118.0
ATLANTA AREA OF RESPONSIBILITY,EUROPE,17.0,85.0,0.629863,1.020408,5.0,68.0
ATLANTA AREA OF RESPONSIBILITY,NORTH AND CENTRAL AMERICA,2300.0,6698.0,85.216747,80.408163,2.912174,4398.0
ATLANTA AREA OF RESPONSIBILITY,OCEANIA,,3.0,,0.036014,,
ATLANTA AREA OF RESPONSIBILITY,SOUTH AMERICA,304.0,1275.0,11.263431,15.306122,4.194079,971.0
BALTIMORE AREA OF RESPONSIBILITY,AFRICA,34.0,69.0,4.906205,3.833333,2.029412,35.0
BALTIMORE AREA OF RESPONSIBILITY,ASIA,11.0,74.0,1.587302,4.111111,6.727273,63.0
BALTIMORE AREA OF RESPONSIBILITY,EUROPE,12.0,32.0,1.731602,1.777778,2.666667,20.0
BALTIMORE AREA OF RESPONSIBILITY,NORTH AND CENTRAL AMERICA,580.0,1410.0,83.694084,78.333333,2.431034,830.0


In [200]:
num_countries_pre_post_by_aor = eq_days_pre_post_trump.groupby(['apprehension_aor', 'trump'])['citizenship_country'].nunique().unstack()
num_countries_pre_post_by_aor.columns = [str(c) for c in num_countries_pre_post_by_aor.columns]

In [201]:
num_countries_pre_post_by_aor['delta'] = num_countries_pre_post_by_aor['True'] - num_countries_pre_post_by_aor['False']

In [202]:
num_countries_pre_post_by_aor

Unnamed: 0_level_0,False,True,delta
apprehension_aor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ATLANTA AREA OF RESPONSIBILITY,60,96,36
BALTIMORE AREA OF RESPONSIBILITY,40,71,31
BOSTON AREA OF RESPONSIBILITY,62,98,36
BUFFALO AREA OF RESPONSIBILITY,44,78,34
CHICAGO AREA OF RESPONSIBILITY,74,96,22
DALLAS AREA OF RESPONSIBILITY,64,99,35
DENVER AREA OF RESPONSIBILITY,29,58,29
DETROIT AREA OF RESPONSIBILITY,53,81,28
EL PASO AREA OF RESPONSIBILITY,30,40,10
HARLINGEN AREA OF RESPONSIBILITY,25,35,10


In trump's administration the number of countries people are being arrested from has increased in all AORs - this lends to idea of "indiscriminately"

#### Change over time (instead of just pre post)

Overall stats of pre and post trump give us some indications of patterns, but we can't see the true picture of what is happening. Next I am going to look at change over time.

In [154]:
def aor_month_cc(df, top_countries=None, aor_str=None, n=5):
    if aor_str is None:
        aor_str = df['apprehension_aor'].unique().item()
    aor_group = df[df['apprehension_aor'] == aor_str]
    aor_group_cc = aor_group[['apprehension_month_year','citizenship_country']].value_counts().reset_index().rename(columns={'count':'num_arrests'})
    if top_countries is None: # done so I can feed in top countries if I want them to be consisten across aors, rather than unique to the top for that aor
        top_countries = aor_group_cc.groupby('citizenship_country')['num_arrests'].sum().sort_values(ascending=False).head(n).index
    aor_group_cc['summary_country'] = np.where(
            aor_group_cc['citizenship_country'].isin(top_countries), aor_group_cc['citizenship_country'], 'OTHER COUNTRY')
    return aor_group_cc

In [174]:
def plot_area_chart(df,aor_str,top_countries=None):
    aor_summary = aor_month_cc(df, top_countries, aor_str)
    aor_summary['apprehension_month_year'] = aor_summary['apprehension_month_year'].dt.strftime('%Y-%m')
    return alt.Chart(aor_summary, title=aor_str).mark_bar().encode(
        x='apprehension_month_year:O',
        y='sum(num_arrests)',
        color=alt.Color('summary_country', sort=['OTHER COUNTRY'])).properties(width=800, height=400)
    

In [162]:
top_countries_overall = lambda n: arrests_df[arrests_df['trump']==True]['citizenship_country'].value_counts().head(n).index

In [177]:
plot_area_chart(arrests_df, 'WASHINGTON AREA OF RESPONSIBILITY', top_countries_overall(5))

Other ways to look at this would be:
* Number of citizenship countries by month by AOR

In [182]:
num_countries_over_time = arrests_df.groupby(['apprehension_aor','apprehension_month_year'])['citizenship_country'].nunique().reset_index()

num_countries_over_time['apprehension_month_year'] = num_countries_over_time['apprehension_month_year'].dt.strftime('%Y-%m')

In [208]:
num_countries_over_time['trump'] = '2025-01'

In [221]:
alt.Chart(num_countries_over_time).mark_line().encode(
    x=alt.X('apprehension_month_year:O', axis=alt.Axis( grid = False, values=['2025-01'])),
    y='citizenship_country',
    color=alt.Color('apprehension_aor', legend=None),
    tooltip=['apprehension_aor']).properties(width=200,height=200).facet(
                    facet='apprehension_aor',
                    columns=4)
    

## Question 2
Select one question from Task 3 Steps 2-4 and complete the analysis.

**Question:** are the AORs that have seen the big spike in "other countries" also the places that we are seeing local enforcement partnerships (ie this looks like "rounding up immigrants indiscriminantly")