# Syrian Immigration Flows

### by Siobhán K Cronin

The UN Refugee Agency (UNHCR) is a "global organisation dedicated to saving lives, protecting rights and building a better future for refugees, forcibly displaced communities and stateless people." In response to the continuing Syrian emergency, the UNHCR has appealed this year for $8 billion USD in funding to aid Syrian refugees. Understanding the asylum-seeking rates of Syrian refugees in their countries of residence would help the UNHCR understand how best to allocate these resources, and that is the focus of this EDA.

In [None]:
%load_ext autoreload
%autoreload 1

import numpy as np
import pandas as pd
import seaborn as sns
import datetime
import matplotlib.pyplot as plt
from IPython.display import display 

plt.style.use('ggplot')

%matplotlib inline

## Data cleaning

In [None]:
seekers = pd.read_csv('../input/asylum_seekers.csv')

In [None]:
# Fill missing values with consistent value
seekers = seekers.fillna(value = "")

In [None]:
# Replace all '*' with ''
seekers = seekers.replace(['*'], ['']) 
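As an aside, the two cleaning steps above could in principle be folded into the read itself. The sketch below uses a small in-memory CSV with made-up values (not the real `asylum_seekers.csv`) and assumes that `*` is the only placeholder that needs to be treated as missing.

```python
import io
import pandas as pd

# Toy CSV standing in for the real file; the values are illustrative only
raw = io.StringIO("Year,Applied during year\n2016,*\n2016,25\n")

# na_values tells read_csv to parse '*' as NaN, so the column is numeric from the start
toy_seekers = pd.read_csv(raw, na_values=['*'])
```

With this approach the column arrives as floats with `NaN` where `*` appeared, so a later `pd.to_numeric` pass would only be a safeguard.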

In [None]:
# Standardize column names
labels = ['year', 
          'country_of_residence', 
          'country_of_origin', 
          'rsd_type', 
          'total_pending_at_year_start', 
          'total_pending_year_start_UNHCR_assisted', 
          'applied_during_year',
          'decisions_recognized',
          'decisions_other', 
          'rejected',
          'otherwise_closed',
          'total_decisions',
          'total_pending_at_year_end', 
          'total_pending_year_end_UNHCR_assisted']

In [None]:
seekers.columns = labels

In [None]:
# Convert 'applied_during_year' to numeric 
seekers['applied_during_year'] = pd.to_numeric(seekers['applied_during_year'], errors='coerce')

In [None]:
# Convert all numeric columns to floats
numeric_cols = ['total_pending_at_year_start',
                'total_pending_year_start_UNHCR_assisted',
                'applied_during_year',
                'decisions_recognized',
                'decisions_other',
                'rejected',
                'otherwise_closed',
                'total_decisions',
                'total_pending_at_year_end',
                'total_pending_year_end_UNHCR_assisted']

for col in numeric_cols:
    # Non-numeric entries (such as the empty strings set above) become NaN
    seekers[col] = pd.to_numeric(seekers[col], errors='coerce').astype('float')

In [None]:
# Format 'rsd_type' as list of strings
seekers.rsd_type = [[x[:1], x[4:]] for x in seekers.rsd_type]

In [None]:
seekers[0:3]

In [None]:
# Replace lengthy country names for clarity 
seekers = seekers.replace(['Syrian Arab Rep.'],['Syria'])
seekers = seekers.replace(['Serbia and Kosovo (S/RES/1244 (1999))'],['Serbia/Kosovo'])
seekers = seekers.replace(['Venezuela (Bolivarian Republic of)'],['Venezuela'])

## Comparing Country of Origin Application Rates 

In [None]:
# How many unique countries of residence are there?
len(seekers.country_of_residence.unique())

In [None]:
# How many unique countries of origin are there?
len(seekers.country_of_origin.unique())

In [None]:
# Filter seekers by year
seekers_2016 = seekers.query('year == 2016')
seekers_2010 = seekers.query('year == 2010')

In [None]:
# Top 10 countries of origin in 2016
sums_2016 = seekers_2016.groupby(['country_of_origin'])[['applied_during_year']].aggregate('sum')
top_10_countries_of_origin_2016 = sums_2016.applied_during_year.sort_values(ascending=False)[:10]
chart_2016 = top_10_countries_of_origin_2016.plot.barh(
    figsize = [16, 8], 
    fontsize = 14, 
    title = '2016 Applications - Top 10 Countries of Origin', 
    color = 'blue')
chart_2016.set_ylabel('')
chart_2016

In [None]:
# Top 10 countries of origin in 2010

sums_2010 = seekers_2010.groupby(['country_of_origin'])[['applied_during_year']].aggregate('sum')
top_10_countries_of_origin_2010 = sums_2010.applied_during_year.sort_values(ascending=False)[:10]
chart_2010 = top_10_countries_of_origin_2010.plot.barh(
    figsize = [16, 8],
    fontsize = 14,
    title = '2010 Applications - Top 10 Countries of Origin')
chart_2010.set_ylabel('')
chart_2010

Since application trends changed in several ways between 2010 and 2016, it would be interesting to plot each country's changes over those six years. Let's start by selecting a handful of countries of interest.

We can measure the biggest increases (shades of red) and the biggest decreases (shades of blue).
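One way to measure those increases and decreases is sketched below. The helper `application_changes` and the toy rows are hypothetical and only illustrate the calculation. The column names follow the `labels` list above, and a country with no rows in a given year is assumed to have zero applications that year.

```python
import pandas as pd

def application_changes(df, start_year=2010, end_year=2016):
    """Change in total applications per country of origin between two years."""
    totals = (df[df['year'].isin([start_year, end_year])]
              .groupby(['country_of_origin', 'year'])['applied_during_year']
              .sum()
              .unstack('year', fill_value=0))
    # Assumption: a year with no rows for a country counts as zero applications
    totals = totals.reindex(columns=[start_year, end_year], fill_value=0)
    return (totals[end_year] - totals[start_year]).sort_values(ascending=False)

# Toy data for illustration only (not real UNHCR figures)
toy = pd.DataFrame({
    'year': [2010, 2010, 2016, 2016, 2016],
    'country_of_origin': ['Syria', 'Serbia/Kosovo', 'Syria', 'Serbia/Kosovo', 'Syria'],
    'applied_during_year': [100.0, 500.0, 900.0, 200.0, 50.0],
})
changes = application_changes(toy)
```

On the real data, `application_changes(seekers).head(5)` would list the largest increases and `.tail(5)` the largest decreases, which could then be colored red and blue respectively.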

## Syrian Refugee Destinations

In [None]:
# Create Syrian dataframe
syrian = seekers.query("country_of_origin == 'Syria'")

In [None]:
# Number of records
len(syrian)

In [None]:
# Number of countries of residence
len(syrian.country_of_residence.unique())

In [None]:
# Count records (rows) per country of residence, not applicants
country_counts = syrian['country_of_residence'].value_counts()
top_10_countries_of_residence = country_counts.sort_values(ascending=False)[:10]
top_10 = top_10_countries_of_residence.plot.barh(figsize =(12,12))
top_10.set_title('Top 10 Countries of Residence for Syrian Asylum Seekers (by number of records)')
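Note that `value_counts` ranks countries by how many rows they contribute, not by how many people applied. As a hedged alternative, the sketch below sums `applied_during_year` per country of residence. The helper name and the toy figures are hypothetical and only show how the two rankings can differ.

```python
import pandas as pd

def top_residence_by_applications(df, n=10):
    """Rank countries of residence by total applications rather than row count."""
    return (df.groupby('country_of_residence')['applied_during_year']
              .sum()
              .nlargest(n))

# Toy data for illustration only (not real UNHCR figures)
toy_syrian = pd.DataFrame({
    'country_of_residence': ['Germany', 'Germany', 'Cyprus', 'Sweden', 'Cyprus', 'Cyprus'],
    'applied_during_year': [1000.0, 2000.0, 10.0, 500.0, 20.0, 30.0],
})

by_rows = toy_syrian['country_of_residence'].value_counts()           # Cyprus has the most rows
by_applications = top_residence_by_applications(toy_syrian, n=3)      # Germany has the most applications
```

On the real data this would be called as `top_residence_by_applications(syrian)`. Comparing it with the row-count ranking shows whether a country ranks highly because of many applicants or simply many records.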

## Resettlement Timecourse

I'm curious to know how Syrian resettlement to the top three nations (Germany, Cyprus, and the Netherlands) has changed over time. Is it possible to see antecedents to the Syrian crisis in asylum-seeker counts in advance of the war's outbreak in 2012?

In [None]:
resettlement = pd.read_csv('../input/time_series.csv')

In [None]:
labels1 = ['year', 
          'country_of_residence', 
          'country_of_origin', 
          'population_type',
          'population_count']

In [None]:
resettlement.columns = labels1

In [None]:
syrian_resettlement = resettlement.query("country_of_origin == 'Syrian Arab Rep.'")

In [None]:
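def select_country(country):
    # Asylum-seeker rows for Syrians residing in the given country;
    # '@country' refers to the function argument inside the query string
    return syrian_resettlement.query(
        "country_of_residence == @country and population_type == 'Asylum-seekers'")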

In [None]:
germany = syrian_resettlement.query("country_of_residence == 'Germany' and population_type == 'Asylum-seekers'")
cyprus = syrian_resettlement.query("country_of_residence == 'Cyprus' and population_type == 'Asylum-seekers'")
netherlands = syrian_resettlement.query("country_of_residence == 'Netherlands' and population_type == 'Asylum-seekers'")
ireland = syrian_resettlement.query("country_of_residence == 'Ireland' and population_type == 'Asylum-seekers'")
ukraine = syrian_resettlement.query("country_of_residence == 'Ukraine' and population_type == 'Asylum-seekers'")
sweden = syrian_resettlement.query("country_of_residence == 'Sweden' and population_type == 'Asylum-seekers'")
france = syrian_resettlement.query("country_of_residence == 'France' and population_type == 'Asylum-seekers'")
belgium = syrian_resettlement.query("country_of_residence == 'Belgium' and population_type == 'Asylum-seekers'")
norway = syrian_resettlement.query("country_of_residence == 'Norway' and population_type == 'Asylum-seekers'")
denmark = syrian_resettlement.query("country_of_residence == 'Denmark' and population_type == 'Asylum-seekers'")

In [None]:
fig = plt.figure(figsize=(12,8))
ax = plt.axes()

plt.plot(germany.year, germany.population_count, label = 'Germany')
plt.plot(cyprus.year, cyprus.population_count, label = 'Cyprus')
plt.plot(ireland.year, ireland.population_count, label = 'Ireland')
plt.plot(ukraine.year, ukraine.population_count, label = 'Ukraine')
plt.plot(sweden.year, sweden.population_count, label = 'Sweden')
plt.plot(france.year, france.population_count, label = 'France')
plt.plot(belgium.year, belgium.population_count, label = 'Belgium')
plt.plot(norway.year, norway.population_count, label = 'Norway')
plt.plot(denmark.year, denmark.population_count, label = 'Denmark')

plt.title("Asylum-Seeker Counts by Country (2000-2016)")
ax.legend(frameon=False)

While it can be tricky to spot trends for each country in the chart above given Germany's data (where asylum-seeker counts exceeded 1 million in 2016, thereby stretching the upper bound of this chart's range), we can still pick out steady increases starting before 2012 in both Germany and Sweden. Holding these two aside for a moment, can we observe any other patterns in the remaining seven nations?
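Besides setting the two largest series aside, another option is to reshape the data into one column per country and plot it on a log-scaled axis. The sketch below covers the reshaping step only. The helper `asylum_seekers_wide` and the toy rows are hypothetical, the column names follow `labels1`, and it assumes `population_count` may contain non-numeric placeholders such as `*`, which are coerced to `NaN`.

```python
import pandas as pd

def asylum_seekers_wide(df, countries):
    """Return a year-by-country table of asylum-seeker counts."""
    subset = df[(df['population_type'] == 'Asylum-seekers')
                & df['country_of_residence'].isin(countries)].copy()
    # Assumption: non-numeric placeholders should be treated as missing
    subset['population_count'] = pd.to_numeric(subset['population_count'], errors='coerce')
    return (subset.groupby(['year', 'country_of_residence'])['population_count']
                  .sum(min_count=1)   # keep NaN when a group has no valid values
                  .unstack('country_of_residence'))

# Toy data for illustration only (not real UNHCR figures)
toy = pd.DataFrame({
    'year': [2010, 2010, 2011, 2011],
    'country_of_residence': ['Germany', 'Cyprus', 'Germany', 'Cyprus'],
    'population_type': ['Asylum-seekers'] * 4,
    'population_count': ['1000', '20', '5000', '*'],
})
wide = asylum_seekers_wide(toy, ['Germany', 'Cyprus'])
```

Calling `wide.plot(logy=True, figsize=(12, 8))` on such a table draws every country on a logarithmic y-axis, so a very large series compresses the smaller ones less than it does on a linear axis.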

In [None]:
fig = plt.figure(figsize=(12,8))
ax = plt.axes()

plt.plot(cyprus.year, cyprus.population_count, label = 'Cyprus')
plt.plot(ireland.year, ireland.population_count, label = 'Ireland')
plt.plot(ukraine.year, ukraine.population_count, label = 'Ukraine')
plt.plot(france.year, france.population_count, label = 'France')
plt.plot(belgium.year, belgium.population_count, label = 'Belgium')
plt.plot(norway.year, norway.population_count, label = 'Norway')
plt.plot(denmark.year, denmark.population_count, label = 'Denmark')

plt.title("Asylum-Seeker Counts by Country (2000-2016)")
ax.legend(frameon=False)

Now the story is more complex than it first appeared. We might point out the increase in asylum-seeker counts for several nations starting around 2012. Yet, given Cyprus' geographic proximity to Syria (an offshore neighbor), I think it is interesting to point out the asylum-seeker counts between 2004 and 2010, which seem to anticipate the overall spikes we observe in 2012. If we look at the history of the region, we observe unrest for many years in advance of the current Syrian emergency, and this increase in asylum seeking in Cyprus in the mid-2000s may very well relate to some of that activity.

## More Questions 

* What is the age spread of refugees?
* How do asylum-seeker counts relate to total refugee counts? Can we infer that asylum-seeker tabulations are an effective indicator of refugee populations? If so, at what ratio?
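As a starting point for the second question, the ratio could be computed per year from the time series data. This is only a sketch: the helper name, the toy figures, and especially the `'Refugees'` label are assumptions, so the actual values of `population_type` should be checked first (for example with `resettlement.population_type.unique()`).

```python
import pandas as pd

def seeker_to_refugee_ratio(df, seekers_label='Asylum-seekers', refugees_label='Refugees'):
    """Yearly ratio of asylum-seeker counts to refugee counts."""
    counts = df.copy()
    # Assumption: non-numeric placeholders should be treated as missing
    counts['population_count'] = pd.to_numeric(counts['population_count'], errors='coerce')
    wide = (counts.groupby(['year', 'population_type'])['population_count']
                  .sum(min_count=1)
                  .unstack('population_type'))
    return wide[seekers_label] / wide[refugees_label]

# Toy data for illustration only (not real UNHCR figures)
toy = pd.DataFrame({
    'year': [2015, 2015, 2016, 2016],
    'population_type': ['Asylum-seekers', 'Refugees', 'Asylum-seekers', 'Refugees'],
    'population_count': [50, 200, 100, 200],
})
ratio = seeker_to_refugee_ratio(toy)
```

A ratio that stays roughly constant across years would support using asylum-seeker counts as an indicator, while a ratio that drifts would argue against it.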

## References

* UNHCR website: http://www.unhcr.org/en-us/about-us.html