# **Missing Migrants - An Analysis on the Dangers of Seeking Asylum**

Crises across the world have always sparked mass migrations of refugees seeking asylum. Yet although many countries have refugee asylum programs, a widespread reluctance to accept refugees persists. With no safe home to return to and neighboring countries resistant to taking them in, many migrants take their chances on one of two options:

1. Apply through a country's refugee program, where acceptance isn't guaranteed (and is unlikely in many cases)
2. Migrate illegally, risking their safety for the possibility of a new start

According to the [United Nations Refugee Agency](https://www.unrefugees.org/refugee-facts/statistics/), 79.5 million individuals were forcibly displaced worldwide by the end of 2019 as a result of persecution, conflict, violence or human rights violations. This trend isn't slowing: the figure is 8.7 million higher than the previous year, and new displacement remains very high. One person becomes displaced roughly every 3 seconds, less than the time it takes to read this sentence. That's 20 people who are newly displaced every minute. In 2019, there were over 30,000 new displacements each day.

## Data Scraping

The challenge with studying refugee migration is that official data is often lacking, because many refugees are forced to seek asylum illegally or outside the mainstream process. To help close this gap, the Missing Migrants Project was started after 368 migrants died in two shipwrecks near the Italian island of Lampedusa in 2013. The project tracks the deaths of migrants, including refugees and asylum-seekers, who have died or gone missing along mixed migration routes worldwide.

The following CSV dataset was posted as open source data on [Kaggle](https://www.kaggle.com/snocco/missing-migrants-project): 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Using the link above, we can download the CSV to our local machine. With the file saved in the same directory as our notebook, we can use pandas to read it in, print its shape, and display the first five rows.

In [7]:
migrantData = pd.read_csv('MissingMigrants-Global-2019-12-31_correct.csv')
print(migrantData.shape)
migrantData.head()

(5987, 20)


| | Web ID | Region of Incident | Reported Date | Reported Year | Reported Month | Number Dead | Minimum Estimated Number of Missing | Total Dead and Missing | Number of Survivors | Number of Females | Number of Males | Number of Children | Cause of Death | Location Description | Information Source | Location Coordinates | Migration Route | URL | UNSD Geographical Grouping | Source Quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52673 | Mediterranean | December 30, 2019 | 2019 | Dec | 1.0 | NaN | 1 | 11.0 | NaN | NaN | NaN | Hypothermia | Unspecififed location off the coast of Algeria | El Watan | 35.568972356329, -1.289773129748 | Western Mediterranean | https://bit.ly/2FqQHo4 | Uncategorized | 1 |
| 1 | 52666 | Mediterranean | December 30, 2019 | 2019 | Dec | 1.0 | NaN | 1 | NaN | NaN | 1.0 | NaN | Presumed drowning | Recoverd on Calamorcarro Beach, Ceuta | El Foro de Ceuta | 35.912383552874, -5.357673338898 | Western Mediterranean | https://bit.ly/39yKRyF | Uncategorized | 1 |
| 2 | 52663 | East Asia | December 27, 2019 | 2019 | Dec | 5.0 | NaN | 5 | NaN | NaN | 3.0 | NaN | Unknown | Bodies found on boat near Sado Island, Niigata... | Japan Times, Kyodo News, AFP | 38.154018233313, 138.086032653130 | NaN | http://bit.ly/2sCnBz1, http://bit.ly/2sEra83, ... | Eastern Asia | 3 |
| 3 | 52662 | Middle East | December 26, 2019 | 2019 | Dec | 7.0 | NaN | 7 | 64.0 | NaN | NaN | NaN | Drowning | Van lake near Adilcevaz, Bitlis, Turkey | EFE, BBC, ARYnews | 38.777228612085, 42.739257582031 | NaN | http://bit.ly/2ZG2Y19, http://bit.ly/2MLamDf, ... | Western Asia | 3 |
| 4 | 52661 | Middle East | December 24, 2019 | 2019 | Dec | 12.0 | NaN | 12 | NaN | NaN | NaN | NaN | Air strike | Al-Raqw market in Saada, Yemen | UN Humanitarian Coordinator in Yemen, Qatar Tr... | 17.245364805636, 43.239093360326 | NaN | http://bit.ly/2FjolvD, http://bit.ly/2sD42GR, ... | Western Asia | 4 |


Now that we have imported our dataset into the notebook, let's take a look at what it contains. The dataset has 5,987 observations and 20 variables. Some key variables that stand out are the region where each incident occurred, the reported date, the number of individuals dead or missing, and the cause of death.
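As a rough sketch of how those key variables could be pulled out for a first look, the snippet below selects a handful of columns and parses the date strings. The column names are copied from the output above, but the choice of `KEY_COLUMNS`, the helper `key_view`, and the assumed date format (`"December 30, 2019"`) are illustrative assumptions, not part of the original analysis. A small stand-in frame is used so the example runs without the CSV.

```python
import pandas as pd

# Column names copied from the head() output; which ones count as "key" is an assumption
KEY_COLUMNS = [
    "Region of Incident",
    "Reported Date",
    "Number Dead",
    "Minimum Estimated Number of Missing",
    "Total Dead and Missing",
    "Cause of Death",
]


def key_view(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the key columns, with 'Reported Date' parsed to datetime."""
    view = df[KEY_COLUMNS].copy()
    # Dates are assumed to look like "December 30, 2019"; anything else becomes NaT
    view["Reported Date"] = pd.to_datetime(
        view["Reported Date"], format="%B %d, %Y", errors="coerce"
    )
    return view


# Stand-in frame holding values from the first two rows shown above
sample_rows = pd.DataFrame({
    "Region of Incident": ["Mediterranean", "Mediterranean"],
    "Reported Date": ["December 30, 2019", "December 30, 2019"],
    "Number Dead": [1.0, 1.0],
    "Minimum Estimated Number of Missing": [float("nan"), float("nan")],
    "Total Dead and Missing": [1, 1],
    "Cause of Death": ["Hypothermia", "Presumed drowning"],
})

print(key_view(sample_rows).dtypes)
```

With the full dataset loaded, the same helper could be called as `key_view(migrantData)`, provided every listed column is present.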

Note the large number of NaN values in our dataset. As discussed earlier, the nature of refugee migration makes it difficult to find consistent datasets without missing information. Without a standardized method of tracking refugee migration, there are often gaps in the data. These gaps will pose a challenge in our analysis if left unaddressed, so we will deal with them in the next phase: Data Processing.
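One way to quantify these gaps is to count the missing values in each column. The sketch below does so with `DataFrame.isna()`; the helper name `missing_summary` and the small stand-in frame (built from a few columns of the five rows shown earlier) are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, most-missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False, kind="stable")


# Stand-in frame using four columns from the five rows shown in the head() output
sample_counts = pd.DataFrame({
    "Number Dead": [1.0, 1.0, 5.0, 7.0, 12.0],
    "Minimum Estimated Number of Missing": [np.nan] * 5,
    "Number of Survivors": [11.0, np.nan, np.nan, 64.0, np.nan],
    "Number of Males": [np.nan, 1.0, 3.0, np.nan, np.nan],
})

print(missing_summary(sample_counts))
```

Running `missing_summary(migrantData)` on the full dataset would show which columns are too sparse to rely on and which only need a few gaps filled.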

## Data Processing

Let's begin by understanding what is in our dataset. This will determine how we can use the information to hypothesize relationships in the data, where we need to fill in gaps, and which variables are important and which are not needed.

First we are going to take a look at where a majority of 