# **Crises Collide:** is there a correlation between rising temperatures and migrant deaths in the Sonoran Desert of Arizona?
## **_Katie Sawyer_**
### December 19, 2022

***

**→** In this notebook, I will be comparing the number of migrant deaths on the Southern Arizona border with the increase in average monthly temperatures in the area due to climate change. I will be using climate data from the National Centers for Environmental Information, operated by the National Oceanic and Atmospheric Administration (NOAA), recorded at their Ajo, Arizona substation from 2010 to 2022, and an open database by Humane Borders mapping migrant deaths. I will need to rename columns so the datasets match, remove unnecessary data and columns, and isolate variables to get the comparison I am looking for.

### _The goal of this project is to inform readers of this issue through visualizations of data in my article. In tandem, I hope to find support for my hypothesis: that increased temperatures in the Sonoran Desert due to climate change correlates with the number of migrant deaths due to exposure._

 _Citations for the datasets are listed at the end of this notebook._

In [None]:
import pandas as pd 

In [None]:
dfExposure = pd.read_csv("ExposureMigrantDeaths.csv")

In [None]:
dfAllDeaths = pd.read_csv("AllMigrantDeaths.csv")

**_First, I'm going to trim the reporting date to the year-month format (YYYY-MM) used by my other dataset, 'Ajo Temps', and rename the column to 'DATE', so that I can later merge the two._**

In [None]:
dfExposure['Reporting Date'] = dfExposure['Reporting Date'].apply(lambda x: "-".join(x.split('-')[:-1]))
dfExposure = dfExposure.rename(columns={'Reporting Date':'DATE'})
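**→** As a sketch of an alternative, assuming the raw 'Reporting Date' strings look like `YYYY-MM-DD` (the split on `-` above implies year-month-day order), the same year-month key can be built by parsing the dates first. Parsing with `errors="coerce"` turns malformed dates into missing values instead of silently producing a wrong key. The rows below are made-up examples.

```python
import pandas as pd

# Made-up rows; real 'Reporting Date' values are ASSUMED to be "YYYY-MM-DD" strings.
df = pd.DataFrame({"Reporting Date": ["2012-07-14", "2019-06-03", "2021-08-30"]})

# Parse to datetimes (malformed values become NaT instead of raising),
# then format as year-month to match the NOAA 'DATE' column.
df["DATE"] = pd.to_datetime(df["Reporting Date"], errors="coerce").dt.strftime("%Y-%m")
df = df.drop(columns="Reporting Date")
print(df["DATE"].tolist())
```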

In [None]:
dfExposure.head()

In [None]:
dfExposure.columns

**_Now I need to remove the deaths whose cause of death (COD) was hypothermia and keep the heat-related deaths, as these are the most relevant to my hypothesis._**

In [None]:
dfHyper = dfExposure[dfExposure['OME Determined COD'] != "HYPOTHERMIA"]
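**→** One caveat with `!=`: it keeps rows whose cause is missing (NaN) and any variant spelling such as "Hypothermia". A minimal sketch of a stricter filter, using made-up rows (the real labels in 'OME Determined COD' may differ):

```python
import pandas as pd

# Made-up rows; the real cause-of-death labels may be spelled differently.
df = pd.DataFrame({
    "ML Number": ["A1", "A2", "A3", "A4"],
    "OME Determined COD": ["HYPERTHERMIA", "HYPOTHERMIA", "Hypothermia ", None],
})

# Normalize surrounding whitespace and case so variants compare equal.
cod = df["OME Determined COD"].str.strip().str.upper()

# Keep rows whose cause is known AND is not hypothermia.
df_hyper = df[cod.notna() & (cod != "HYPOTHERMIA")]
print(df_hyper["ML Number"].tolist())
```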

In [None]:
dfHyper

***

**_Great! To better illustrate my data for my story, I'd now like to create an interactive map that shows where the exposure deaths happened. I will import geopandas and plotly for this._**

**→** The following steps are attributed to my friend, Jona, who helped me write the code needed to create maps like these.

***

In [None]:
%pip install geopandas plotly

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point

import plotly.express as px

**_That worked! Before creating my map, though, I'd like to check for missing values in case any of my columns contain NaN data._**

In [None]:
dfHyper.isna().sum()
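**→** If the counts above show missing coordinates, those deaths cannot be placed on the map. A minimal sketch, on made-up rows, of dropping them explicitly so the number of unmapped points is known rather than silently lost:

```python
import numpy as np
import pandas as pd

# Made-up records; two of them are missing one coordinate.
df = pd.DataFrame({
    "Latitude": [32.1, np.nan, 31.9],
    "Longitude": [-112.8, -112.5, np.nan],
    "Cause of Death": ["HYPERTHERMIA", "HYPERTHERMIA", "HYPERTHERMIA"],
})

# Missing values per column, then keep only rows that have both coordinates.
print(df.isna().sum())
df_mappable = df.dropna(subset=["Latitude", "Longitude"])
print(f"{len(df) - len(df_mappable)} record(s) cannot be mapped")
```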

**_Alright, now let's double-check the unique values in the 'Cause of Death' column using .unique()._**

In [None]:
dfHyper['Cause of Death'].unique()

**_No problems there! I'm ready to move on. I already created my dataset that includes only the variables I am interested in (heat-exposure related deaths), so now I can create my scatter mapbox using dfHyper._**

In [None]:
fig = px.scatter_mapbox(dfHyper, 
                        lat="Latitude", 
                        lon="Longitude",
                        color='Cause of Death',
                        zoom=8, 
                        height=800,
                        width=800)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

**_That was cool. Now I'm going to save that using the camera icon and include it in my story to give my readers a visualization of the crisis._**

**→** _Before moving on (and just out of curiosity), I'm also going to make a plotly `scatter_mapbox` map with all of the causes of death._

In [None]:
fig = px.scatter_mapbox(dfAllDeaths, 
                        lat="Latitude", 
                        lon="Longitude",
                        color='Cause of Death',
                        zoom=8, 
                        height=800,
                        width=800)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

***

**_Next on the task list, I want to combine the data so I can create a scatterplot. To do that, I need to isolate the columns I plan to use and then merge the datasets on the 'DATE' column I created earlier._**

In [None]:
dfTemps = pd.read_csv("AjoTemps1914-2022.csv", usecols=['DATE', 'TAVG', 'TMIN', 'TMAX'])

In [None]:
dfTemps

In [None]:
df_merged = dfHyper.merge(dfTemps, on=['DATE'])
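**→** `merge` defaults to an inner join, so any death whose month has no row in the temperature file simply disappears from `df_merged`. A quick check, sketched here on made-up data, is a left join with `indicator=True`, which labels each row as matched (`both`) or unmatched (`left_only`):

```python
import pandas as pd

# Made-up data: the third death falls in a month missing from the temperature table.
deaths = pd.DataFrame({"ML Number": ["A1", "A2", "A3"],
                       "DATE": ["2015-06", "2015-07", "2015-08"]})
temps = pd.DataFrame({"DATE": ["2015-06", "2015-07"], "TAVG": [90.1, 92.4]})

# how="left" keeps every death; indicator=True adds a '_merge' column
# recording whether each row found a matching month.
check = deaths.merge(temps, on="DATE", how="left", indicator=True)
print(check["_merge"].value_counts())

# Deaths that an inner merge would have dropped.
unmatched = check[check["_merge"] == "left_only"]
print(unmatched["ML Number"].tolist())
```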

In [None]:
df_merged

**_Alright! Now I have a merged file that I can use later to show the correlations between my data. But first, I want to show the general upward trend in temperatures._**

In [None]:
dfTemps

In [None]:
import seaborn as sns

In [None]:
dfTemps.info()

**_I have so much data that I first need to narrow it down to get an accurate scatterplot: only the summer months (April through September), and only the years that line up with my migrant deaths dataset (2010 through 2021)._**

In [None]:
# Parse the 'YYYY-MM' strings once, then pull out month and year as numbers
dfTemps['dt'] = pd.to_datetime(dfTemps['DATE'])
dfTemps['month'] = dfTemps['dt'].dt.month
dfTemps['year'] = dfTemps['dt'].dt.year
summer_months = list(range(4, 10))  # April (4) through September (9)

In [None]:
# Keep April through September, from 2010 through 2021 (both ends inclusive)
df_summer = dfTemps[dfTemps['month'].isin(summer_months) & dfTemps['year'].between(2010, 2021)]

In [None]:
df_summer.to_csv('df_summer.csv')

In [None]:
df_summer.plot.scatter(x='year',y='TMAX')

**_Ok, that's kind of ugly. I'm going to try a different approach to illustrate the increase in temperatures over time._**

In [None]:
sns.regplot(x='year', y='TMAX',data=df_summer)
plt.ylabel('Max Temperature')

**_That's better. The line helps show the increase better than just the scatter points._** 

**→** Notice that I'm using the monthly maximum temperature (TMAX) as opposed to the monthly average (TAVG) here. It makes sense to show the high temperatures in this chart because these extremes are indicative of global warming. However, when I combine the datasets below, I need to use the average monthly temperatures to get an accurate read on whether the two are correlated at all.
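**→** The regression line can also be summarized as a number. A minimal sketch, assuming `df_summer`-style columns named 'year' and 'TMAX' (the values below are invented for illustration), that fits the same kind of least-squares line seaborn's `regplot` draws and reports its slope:

```python
import numpy as np
import pandas as pd

# Invented values standing in for df_summer's 'year' and 'TMAX' columns.
df = pd.DataFrame({"year": [2010, 2011, 2012, 2013],
                   "TMAX": [100.0, 100.5, 101.0, 101.5]})

# Least-squares fit of TMAX = slope * year + intercept (degree-1 polynomial).
# np.polyfit returns coefficients from highest power down: [slope, intercept].
slope, intercept = np.polyfit(df["year"], df["TMAX"], deg=1)
print(f"Trend: {slope:.2f} degrees per year")
```

Run on the real `df_summer`, the slope would be the warming trend per year for the April to September months, in whatever temperature units the NOAA file uses.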

***

**_Ok, so here we are: the visualization we've been waiting for._** 

**→** Now I will combine the data into one plot to hopefully illustrate a correlation between an increase in temperatures and the rise in migrant deaths.

In [None]:
df_merged = dfHyper.merge(dfTemps, on=['DATE'])
df_merged = df_merged[['ML Number', 'DATE', 'TAVG']]
df_merged = df_merged.dropna()

In [None]:
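# Count the unique death records (ML Numbers) in each month
df_deaths = df_merged.groupby('DATE')['ML Number'].nunique().reset_index()
# Attach that month's average temperature (selecting 'TAVG' first avoids averaging non-numeric columns)
df_deaths = df_deaths.merge(dfTemps.groupby('DATE')['TAVG'].mean().reset_index(), on=['DATE'])
df_deaths.dropna(inplace=True)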
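**→** Because `df_merged` comes from an inner merge with the death records, months with zero recorded deaths never appear in `df_deaths`, so the regression only sees months where at least one death happened. A sketch of one way to keep those months with a count of 0, using made-up data and a hypothetical column name `deaths`:

```python
import pandas as pd

# Made-up inputs: one row per death, and one row per month of temperature data.
deaths = pd.DataFrame({"ML Number": ["A1", "A2", "A3"],
                       "DATE": ["2015-06", "2015-06", "2015-08"]})
temps = pd.DataFrame({"DATE": ["2015-06", "2015-07", "2015-08"],
                      "TAVG": [90.1, 92.4, 91.0]})

# Unique death records per month, as a two-column frame (DATE, deaths).
counts = deaths.groupby("DATE")["ML Number"].nunique().reset_index(name="deaths")

# Start from every temperature month; months with no deaths get NaN
# from the left join, which is filled with 0.
monthly = temps.merge(counts, on="DATE", how="left")
monthly["deaths"] = monthly["deaths"].fillna(0).astype(int)
print(monthly)
```

Whether zero-death months belong in the analysis is a judgment call, but including them tests the hypothesis against every month of temperature data, not only the deadly ones.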

In [None]:
df_deaths

In [None]:
df_deaths_long = df_deaths.melt(id_vars='DATE')

In [None]:
# Create the figure first, so the plot (and the saved image) uses this size
plt.figure(figsize=(10,10))
sns.regplot(x='TAVG', y='ML Number', data=df_deaths)
plt.xlabel('Average Temperature')
plt.ylabel('Migrant Deaths')
plt.savefig('test.jpg', dpi=300)

**_& there she is!_**

**→** While this chart does not prove a causal relationship between rising temperatures and an increase in migrant deaths, it does suggest a positive correlation between them. It must be noted that other factors, such as an increase in migration overall, could be confounding variables that this analysis does not account for. This could produce a spurious relationship in my data.
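**→** The chart suggests a correlation but does not put a number on it. A minimal sketch of computing one with pandas, on invented values standing in for `df_deaths`: Pearson's r measures how linear the relationship is, and Spearman's rho (computed here as Pearson's r on the ranks) only asks whether deaths rise as temperature rises. Neither coefficient rules out the confounding variables mentioned above.

```python
import pandas as pd

# Invented monthly values standing in for df_deaths' 'TAVG' and 'ML Number' columns.
df = pd.DataFrame({"TAVG": [70.0, 80.0, 90.0, 95.0],
                   "ML Number": [2, 4, 7, 9]})

# Pearson's r: strength of the linear relationship, from -1 to 1.
pearson = df["TAVG"].corr(df["ML Number"])

# Spearman's rho: Pearson's r computed on the ranks, so it only
# measures whether the relationship is monotonic.
spearman = df["TAVG"].rank().corr(df["ML Number"].rank())
print(round(pearson, 3), round(spearman, 3))
```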

***

**_And speaking of confounding factors, this reminds me to check how often deaths are listed as "unknown" or simply "skeletal remains." To do this, I'll use the dataset with all causes of death listed for migrants._**

In [None]:
dfAllDeaths['OME Determined COD'].value_counts()

**_This is really interesting. The most common cause of death listed is hyperthermia, which is unsurprising at this point. However, the next four largest categories are all some form of undetermined cause of death. So, while hyperthermia is the leading cause of death for migrants according to this data, its true share of deaths could be considerably larger or smaller if these other migrants had determined causes of death._**
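**→** To put a size on that uncertainty, the counts can be turned into shares. A sketch on made-up labels; the keywords used to flag undetermined causes ("UNDETERMINED", "SKELETAL") are assumptions based on the categories mentioned above and should be checked against the real `value_counts()` output:

```python
import pandas as pd

# Made-up labels; the real category names in 'OME Determined COD' may differ.
cod = pd.Series(["HYPERTHERMIA", "HYPERTHERMIA", "UNDETERMINED",
                 "SKELETAL REMAINS", "HYPOTHERMIA"], name="OME Determined COD")

# Share of each cause of death instead of raw counts.
print(cod.value_counts(normalize=True))

# Flag records with no determined cause by (assumed) keywords; na=False
# treats missing labels as "not flagged".
undetermined = cod.str.contains("UNDETERMINED|SKELETAL", case=False, na=False)
print(f"{undetermined.mean():.0%} of records have no determined cause")
```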

***

**_Finally, I decided to incorporate the code given to me by Freddy to see if I could plot the two datasets together. This proved difficult, and I was unable to figure out how to add a second y-axis (a third axis overall) so that deaths and temperatures each get their own scale._**

In [None]:
df_merged['Year'] = df_merged['DATE'].map(lambda x: x.split('-')[0])

In [None]:
df_merged = df_merged[['ML Number', 'Year', 'TAVG']]

In [None]:
# Count unique deaths per year, then attach the mean TAVG across that year's death records
df_year = df_merged.groupby('Year')['ML Number'].nunique().reset_index()
df_year = df_year.merge(df_merged.groupby('Year')['TAVG'].mean().reset_index(), on=['Year'])
df_year.dropna(inplace=True)

In [None]:
df_year

In [None]:
df_year_long = df_year.melt(id_vars='Year')

In [None]:
df_year_long

In [None]:
sns.lineplot(x='Year', y='value', hue='variable', data=df_year_long)

**_There should be a second y-axis here that separates temperature measurements from the number of migrant deaths. That being said, while this chart can't be used in an article because it would be hard for readers to interpret, I think it still supports my initial hypothesis. It must be noted that the temperature line averages TAVG over every death record in each year (any month with a recorded death, weighted by the number of deaths that month), not just the summer months, when most migrant deaths happen._**
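**→** What this chart needs is a secondary y-axis. A minimal sketch using matplotlib's `Axes.twinx()`, which adds a second y-axis sharing the same x-axis, so deaths and temperature each get their own scale. The yearly values below are invented placeholders for `df_year`:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Invented placeholders for df_year's 'Year', 'ML Number', and 'TAVG' columns.
df_year = pd.DataFrame({"Year": ["2010", "2011", "2012"],
                        "ML Number": [40, 55, 48],
                        "TAVG": [72.1, 72.9, 73.4]})

fig, ax_deaths = plt.subplots(figsize=(8, 5))

# Left y-axis: number of migrant deaths per year.
ax_deaths.plot(df_year["Year"], df_year["ML Number"], color="tab:red", marker="o")
ax_deaths.set_xlabel("Year")
ax_deaths.set_ylabel("Migrant deaths", color="tab:red")

# twinx() creates a second y-axis on the right that shares the x-axis.
ax_temp = ax_deaths.twinx()
ax_temp.plot(df_year["Year"], df_year["TAVG"], color="tab:blue", marker="s")
ax_temp.set_ylabel("Average temperature", color="tab:blue")

fig.tight_layout()
```

On the real data, using the notebook's `df_year` instead of the placeholder frame and adding something like `fig.savefig("deaths_vs_temp.png", dpi=300)` (a hypothetical filename) would produce an image for the article.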

***

# **FINDINGS & CITATIONS**

### In this notebook, I set out to visualize my datasets for my readership, and to find out whether there is a correlation between an increase in migrant deaths and an increase in hot temperatures due to climate change. 

**→** Using Plotly, I was able to create an interactive, color-coded map that showed where each migrant lost their life. I was also able to isolate the variables for exposure and plot those points. Then, to investigate my hypothesis, I merged the data on the date column to see whether months with high temperatures coincided with spikes in migrant deaths. The regression plot of average monthly temperature against monthly migrant deaths suggests that they do. However, as noted above, confounding variables and the large number of deaths without a determined cause prevent me from establishing a solid relationship between these two variables. Despite this, I believe these findings contribute to my story and paint a more vivid picture for my readers. 

### Citations

Humane Borders. (n.d.). *Arizona OpenGIS Initiative for Deceased Migrants* [Dataset]. https://humaneborders.info/app/map.asp 

National Oceanic and Atmospheric Administration, National Centers for Environmental Information. (n.d.). *Local Climatological Data: AJO 29 S, AZ US* [Dataset]. https://www.ncei.noaa.gov/cdo-web/datasets/LCD/stations/WBAN:53168/detail 