# **A data-driven analysis of the Refugee Crisis**

Niffyn Trouw - 14625970<br>
Max Kowalchuk - 14032031<br>
Thomas van Zuilen - 14746646

# Introduction

The refugee crisis has been an ongoing topic for decades, causing divides amongst many people across all corners of the world. The refugee crisis entails the forced displacement of large groups of people. It is a complex humanitarian challenge, often caused by both historical as well as current geopolitical conflicts and divisions.

Perspectives on the refugee crisis differ widely: on its causes, on its effects, and on the right way to tackle it.

A common disagreement concerns the asylum procedure. One perspective is that the asylum procedure is effective, at least in some countries. One argument for this view is that some countries house a much larger share of refugees than others. Of course, there are many more nuanced stances and statistics. To assess the effectiveness of the asylum procedure, it is essential to look at the number of asylum applications received and the corresponding processing rates within a specific timeframe.

The (generally) opposing stance is that asylum systems need substantial adjustment to help solve the problem, or that they even make it worse. In many countries, seeking asylum, or rather the process of being granted it, can be absurdly long and convoluted. This can cause the queue of applicants to grow, worsening the crisis. To explore possible reasons for this, the total number of refugees worldwide and the countries that receive them should be examined, as some countries may receive far more refugees than others. The distribution process can also be inefficient when it requires refugees to travel needlessly far from their country of origin.

# Dataset and preprocessing

**Dataset**

For this data story we use mostly refugee data from the UN High Commissioner for Refugees (UNHCR). This is data on uprooted populations and asylum processing. It can be downloaded here: [Refugee Data](https://www.kaggle.com/datasets/unitednations/refugee-data).

The dataset consists of six CSV files covering many aspects of the refugee crisis for many different countries, such as the movement of displaced persons, the number of asylum seekers per month, progress through the refugee system, and the countries of origin and destination. It also includes data on the outcomes of and responses to asylum petitions in different countries.

We also use open data from the Centraal Bureau voor de Statistiek (CBS) on asylum applications and family reunification in the Netherlands, broken down by nationality, sex and age. It can be downloaded here: [CBS regarding asylum applications](https://opendata.cbs.nl/statline/#/CBS/nl/dataset83102ned/table).

This table contains the numbers of submitted asylum applications and following family members per month, quarter and year, broken down by the nationality, gender and age group of the asylum seeker or following family member. In addition to the total number of requests, first and subsequent requests are listed separately. As of 2016, first requests include relocation. Figures are available from January 2013 and concern asylum seekers who have been formally identified and registered.


**Preprocessing**

Using the data from the "asylum_seekers_monthly.csv" file, we created an overview of the countries of asylum and their monthly numbers of asylum seekers. By summing over all countries of origin, we obtained the total number of incoming asylum seekers per country of asylum. This approach shows the size of the refugee flow into each country, without distinguishing individual countries of origin.
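The grouping step described above can be sketched on a toy table. This is a minimal illustration, not the notebook's actual preprocessing: the column names (`asylum_country`, `origin`, `applications`) and all values are assumptions made up for the example, and the output layout (`Year`, `Month`, then one column per country) is inferred from how the plotting cells index the processed file.

```python
import pandas as pd

# Toy stand-in for asylum_seekers_monthly.csv.
# Column names and values are hypothetical, chosen for illustration only.
monthly = pd.DataFrame({
    "asylum_country": ["Germany", "Germany", "France", "France", "Germany"],
    "origin": ["Syria", "Iraq", "Syria", "Eritrea", "Syria"],
    "Year": [2015, 2015, 2015, 2015, 2015],
    "Month": [1, 1, 1, 1, 2],
    "applications": [100, 40, 30, 20, 120],
})

# Sum over all countries of origin, then spread the countries of asylum
# into columns so that each row is one (Year, Month) pair.
per_country = (
    monthly.groupby(["Year", "Month", "asylum_country"])["applications"]
    .sum()
    .unstack("asylum_country", fill_value=0)
    .reset_index()
)
per_country.columns.name = None

print(per_country)
```

Months in which a country received no applications show up as 0 rather than as missing values, which keeps later sums and plots simple.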

Using the "asylum_seekers.csv" file, we aggregated data by country of origin and country of asylum. We looked at three key variables: the number of new asylum applications per year, the total number of pending applications at the beginning of the year, and the total number of pending applications at the end of the year. This information gives us insight into the dynamics of asylum applications and the situation of asylum seekers over an entire year.
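A comparable sketch for the yearly file, again under stated assumptions: the column names (`pending_start`, `applied`, `pending_end`) and the numbers are invented for illustration and do not come from the UNHCR data. It shows one way to collapse countries of origin and then derive worldwide yearly totals.

```python
import pandas as pd

# Toy stand-in for asylum_seekers.csv; names and values are hypothetical.
yearly = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2016, 2016],
    "asylum_country": ["Netherlands", "Netherlands", "Germany", "Netherlands", "Germany"],
    "origin": ["Syria", "Eritrea", "Syria", "Syria", "Syria"],
    "pending_start": [10, 5, 50, 12, 80],
    "applied": [40, 20, 300, 30, 200],
    "pending_end": [8, 4, 80, 6, 60],
})

measures = ["pending_start", "applied", "pending_end"]

# Collapse countries of origin: one row per year and country of asylum.
per_asylum = yearly.groupby(["Year", "asylum_country"])[measures].sum()

# Worldwide totals per year, summed over all countries of asylum.
per_year = per_asylum.groupby(level="Year").sum()

print(per_asylum)
print(per_year)
```

Keeping the per-country table and the worldwide totals as separate frames makes it easy to plot either a single country or the global picture from the same aggregation.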

In [2]:
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import io
import matplotlib.pyplot as plt
import numpy as np

### Asylum applications across all countries 2000-2017

In [4]:
# Read the CSV file into a DataFrame
apypc_df = pd.read_csv("data/applied_pending_yearly_per_country.csv")

# Create the figure
fig = go.Figure()

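# Add traces for each line
# Match 'end-year' rather than 'end': the pattern 'end' also matches the word 'pending',
# so it would pick up any other pending-count columns as well
fig.add_trace(go.Scatter(x=apypc_df["Year"], y=apypc_df.filter(regex='end-year').sum(axis=1), name="Pending end of year"))
fig.add_trace(go.Scatter(x=apypc_df["Year"], y=apypc_df.filter(regex='Applied').sum(axis=1), name="Applied during year"))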

# Set labels and titles
fig.update_layout(
    xaxis_title="Year",
    yaxis_title="Number of asylum seekers",
    title="Asylum applications across all countries 2000-2017"
)

# Set x-axis tick labels
fig.update_xaxes(
    tickmode="array",
    tickvals=[2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016],
    ticktext=[str(year) for year in range(2000, 2017, 2)]
)

# Set y-axis tick labels
fig.update_yaxes(tickvals=list(range(0, 10000001, 2000000)))

# Add text
txt = "The total number of recorded asylum applications summed over all countries between 2000 and 2017, as well as the number of applications still pending, i.e. not processed, at the end of the year."
fig.add_annotation(
    text=txt,
    xref="paper", yref="paper",
    x=0.5, y=-0.1,
    showarrow=False,
    font=dict(size=10),
    width=600
)

# Display the plot
fig.show()


The period of 2014-2015 witnessed a notable surge in asylum requests worldwide, primarily attributed to military conflicts in the Middle East, notably the Syrian civil war. Consequently, there was a substantial influx of refugees into European countries, as depicted in the graph below. This data was collected by the UN High Commissioner for Refugees (UNHCR) and is stored in the 'applied_pending_yearly_per_country.csv' file.

Implementing stricter rules on follow-up travel is essential for effective immigration management. The data reveals that nearly three-quarters of follow-up travelers arriving in the Netherlands originate from the same country. This observation highlights the necessity of addressing the underlying issue directly at its source. This should be done on a country-by-country basis, particularly in relation to those nations that have the majority of asylum seekers.

By establishing stricter regulations on follow-up travel, authorities can gain better control over the influx of migrants and implement more targeted measures. Focusing efforts on specific countries allows for a more tailored and efficient approach to addressing the root causes of migration. It enables policymakers to concentrate resources, support, and assistance in regions that require immediate attention such as war-torn areas or nations suffering from economic difficulties.

Implementing stricter rules on follow-up travel can help prevent abuses of the asylum system, such as "asylum shopping," where individuals exploit multiple countries' systems to increase their chances of gaining refugee status. By targeting countries that contribute to the majority of asylum seekers, policymakers can work collaboratively with international partners to address the underlying issues, promote stability, and provide sustainable solutions to alleviate the root causes of displacement.

### Asylum seekers in Europe 2015-2017

In [5]:
# Read the CSV file into a DataFrame
pmpc_df = pd.read_csv("data/processed_monthly_per_country.csv", sep=",", low_memory=False)

# Rename the column for the United Kingdom
pmpc_df = pmpc_df.rename(columns={"United Kingdom of Great Britain and Northern Ireland": "United Kingdom"})

# Remove unwanted columns (non-European countries and smaller European countries)
columns_to_drop = ['Canada', 'Liechtenstein', 'Rep. of Korea', 'New Zealand', 'USA (EOIR)', 'USA (INS/DHS)',
                   'Japan', 'The former Yugoslav Rep. of Macedonia', 'Serbia and Kosovo: S/RES/1244 (1999)',
                   'Albania', 'Lithuania', 'Montenegro', 'Bosnia and Herzegovina', 'Estonia', 'Slovakia',
                   'Cyprus', 'Latvia', 'Malta', 'Croatia', 'Slovenia', 'Romania', 'Ireland', 'Australia']
pmpc_df = pmpc_df.drop(columns_to_drop, axis=1)

# Save the filtered dataset for reuse
pmpc_df.to_csv("asielzoekers_dataset.csv", index=False)
# Create trace for each year
trace = [
    go.Bar(
        name='2015',
        x=pmpc_df.columns[2:],
        y=pmpc_df[pmpc_df.Year == 2015].iloc[:, 2:].sum(),
        marker_color='rgb(102,194,165)'
    ),
    go.Bar(
        name='2016',
        x=pmpc_df.columns[2:],
        y=pmpc_df[pmpc_df.Year == 2016].iloc[:, 2:].sum(),
        marker_color='rgb(252,141,98)'
    ),
    go.Bar(
        name='2017',
        x=pmpc_df.columns[2:],
        y=pmpc_df[pmpc_df.Year == 2017].iloc[:, 2:].sum(),
        marker_color='rgb(141,160,203)'
    )
]

# Set layout for the plot
layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category'
    )
)

# Create the figure
fig = go.Figure(data=trace, layout=layout)

# Update layout and display the plot
fig.update_layout(
    title='<b>Asylum seekers in Europe 2015-2017</b>',
    yaxis=dict(
        title='Number of new asylum seekers'
    ),
    xaxis=dict(
        # 'titlefont' is deprecated in Plotly; set the font via the title dict instead
        title=dict(
            text='Country<br><sup>The total number of new asylum seekers in European countries between 2015 and 2017.</sup>',
            font=dict(size=20)
        )
    ),
    height=500
)

# Show the plot
fig.show()

The graph presented above shows a significant divergence among European nations in the influx of asylum seekers. Various factors contribute to this contrast, including the size of the countries, their appeal to refugees, and the demand for a younger workforce, as seen in the data for Germany. Collaboration between nations is necessary to effectively manage the issues caused by the refugee crisis. By pooling resources, sharing best practices, and coordinating efforts, European countries can collectively provide more support to displaced individuals seeking asylum. This collaboration should go beyond mere burden-sharing: countries must also enforce the same policies and strategies.

Better cooperation can create a more equal distribution of responsibilities among countries, ensuring that the burden of hosting and assisting refugees is shared more fairly. It also enables the pooling of expertise and resources to improve reception and integration processes, thereby facilitating the successful integration of refugees into European communities. Countries should also collaborate to tackle the root issues. By working together, countries can contribute to stability and peacebuilding in unsafe regions. This support can focus on sustainable development, fostering good governance, and promoting economic opportunities in countries of origin to reduce the drivers of displacement.

### The Netherlands, origins of initial asylum seekers, April 2023

In [6]:
# Read the CSV file into a DataFrame
anngl_df = pd.read_csv("data/Asielverzoeken_en_nareizigers__nationaliteit__geslacht_en_leeftijd_18062023_133350.csv", low_memory=False, sep=';')

# Define color palette for the pie chart
colors = ['#4C78A8', '#F58518', '#EECA3B', '#54A24B', '#E45756', '#72B7B2']

# Create the pie chart using Plotly Express
# [1:] skips the first matching row (presumably the all-nationalities total)
fig = px.pie(
    anngl_df[anngl_df["Onderwerp"] == "Eerste asielverzoeken (personen)"][1:],
    values='2023 april*',
    names='Nationaliteit',
    color_discrete_sequence=colors,
    hole=0.6,
    height=600,
    title="<b>The Netherlands, origins of initial asylum seekers, April 2023</b>"
)

# Update chart layout
fig.update(layout_showlegend=False)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(
    height=600,
    annotations=[
        dict(
            text='The ratio of the origins of initial asylum seekers arriving in the Netherlands during April 2023.',
            x=0.5,
            y=-0.1,
            font_size=15,
            showarrow=False
        ),
        dict(
            text='These are people who travel first and are not following family, for example.',
            x=0.5,
            y=-0.15,
            font_size=15,
            showarrow=False
        )
    ]
)

The pie chart above shows the origins of current asylum seekers in the Netherlands. Remarkably, only four countries account for about 50 percent of the total number of asylum seekers. This finding underscores the need for a pragmatic approach to the problem, one that emphasizes addressing the root causes of migration rather than focusing only on destination countries.
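A concentration figure such as "four countries account for about half" can be checked directly from the counts behind a pie chart. The sketch below is illustrative only: `top_n_share` is a hypothetical helper, and the nationality labels and counts are invented rather than taken from the CBS table.

```python
import pandas as pd


def top_n_share(counts: pd.Series, n: int) -> float:
    """Return the fraction of the total contributed by the n largest entries."""
    total = counts.sum()
    if total == 0:
        return 0.0
    return float(counts.nlargest(n).sum() / total)


# Made-up counts per nationality, for illustration only.
example = pd.Series({"A": 400, "B": 300, "C": 200, "D": 100, "E": 500, "F": 500})

print(f"Share of the four largest groups: {top_n_share(example, 4):.0%}")
```

Applied to the filtered CBS rows, the same helper would take the `'2023 april*'` column indexed by `'Nationaliteit'`, provided the values are numeric and the total row is excluded.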

By focusing on the root causes, we can provide a more sustainable solution and encourage people to stay in their home countries through infrastructure development and job creation.

### The Netherlands, origins of follow-up travelers, April 2023

In [7]:
# Define the colors for the pie chart
colors = ['#4C78A8', '#F58518', '#EECA3B', '#54A24B', '#E45756', '#72B7B2']

# Filter the dataframe and create a pie chart
fig = px.pie(
    anngl_df[anngl_df["Onderwerp"] == "Nareizigers (personen)"][1:],
    values='2023 april*',
    names='Nationaliteit',
    color_discrete_sequence=colors,
    hole=0.8,
    height=600,
    title="<b>The Netherlands, origins of follow-up travelers, April 2023</b>"
)

# Hide the legend
fig.update(layout_showlegend=False)

# Set the text position and information to show in the chart
fig.update_traces(textposition='inside', textinfo='percent+label')

# Set the height and add annotations to the chart
fig.update_layout(
    height=600,
    annotations=[
        dict(
            text='The ratio of the origins of follow-up travelers arriving in the Netherlands during April 2023.',
            x=0.5, y=-0.1, font_size=15, showarrow=False
        ),
        dict(
            text='These are people who do not travel first, and are often following family.',
            x=0.5, y=-0.15, font_size=15, showarrow=False
        )
    ]
)

There should be stricter rules on follow-up travel. As we can see, almost three quarters of the follow-up travelers who arrive in the Netherlands come from the same country. This also supports the idea of tackling the issue at its roots, or more specifically per country, for the countries that account for the majority of asylum seekers.

### Yearly asylum seekers in the Netherlands

In [8]:
# Read the CSV file into a DataFrame
pmpc_df = pd.read_csv("data/processed_monthly_per_country.csv", sep=",", low_memory=False)

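# Sum the monthly figures per year; taking x and y from the same grouped
# Series keeps the years and totals aligned even if the rows are not sorted
yearly_nl = pmpc_df.groupby('Year')['Netherlands'].sum()

# Create a bar chart using Plotly graph objects
trace = go.Bar(
    x = yearly_nl.index,
    y = yearly_nl.values
)
fig = go.Figure(trace)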

# Update chart layout
fig.update_layout(
    title='<b>Yearly asylum seekers in the Netherlands</b>',
    yaxis=dict(
        title='Number of new asylum seekers',
    ),
    xaxis=dict(
        # 'titlefont' is deprecated in Plotly; set the font via the title dict instead
        title=dict(
            text='Year<br><sup>The total number of new asylum seekers arriving in the Netherlands between 1999 and 2017.</sup>',
            font=dict(size=20)
        )
    ),
    height=400,
)
# Display the figure
fig.show()

In [9]:
# Read the data from the CSV file
pmpc_df = pd.read_csv("data/processed_monthly_per_country.csv", sep=",", low_memory=False)

# Remember the country columns before adding helper columns
# (assumes the first two columns are 'Year' and 'Month', as in the cells above)
country_cols = pmpc_df.columns[2:]

# Combine 'Year' and 'Month' columns into a single 'Year-Month' column
pmpc_df["Year-Month"] = pmpc_df["Year"].astype(str) + "-" + pmpc_df["Month"].astype(str)

# Reshape the DataFrame from wide to long format
melted_df = pd.melt(pmpc_df, id_vars=["Year-Month"], value_vars=country_cols, var_name="Country", value_name="Refugees")

# Bubble sizes must be numeric, non-missing and non-negative
melted_df["Refugees"] = pd.to_numeric(melted_df["Refugees"], errors="coerce").fillna(0).clip(lower=0)

# Bubble map with animation; bubble area scales with the number of refugees
fig = px.scatter_geo(melted_df, locations="Country", locationmode="country names",
                     hover_name="Country", size="Refugees", color="Refugees", size_max=40,
                     projection="natural earth", title="Number of refugees per country and month",
                     animation_frame="Year-Month")

# Set layout options
fig.update_layout(height=600, margin=dict(r=0, l=0, b=0))

# Show the plot
fig.show()

### Annual influx of refugees to the Netherlands

In [15]:
# Read the data from the CSV file
asi_df = pd.read_csv("data/asylum_seekers_info.csv")

asi_df = asi_df.drop(['Year', 'Month'], axis=1)
# Create a parallel categories diagram using Plotly Express
fig = px.parallel_categories(asi_df, dimensions=['Country','Gender','Age'], color_continuous_scale=px.colors.qualitative.Dark24)

# Format the chart
fig.update_layout(title="Asylum seekers per country")
fig.update_xaxes(title="Characteristics")
fig.update_yaxes(title="Count")

# Display the chart
fig.show()


### Asylum applications in Netherlands 2000-2017

In [None]:
# Create a figure
fig = go.Figure()

# Add traces for each line, using the 'Year' column on the x-axis
# (the DataFrame index is just the row number, not the year)
fig.add_trace(go.Scatter(x=apypc_df["Year"], y=apypc_df["('pending end-year', 'Netherlands')"], name="pending end-year"))
fig.add_trace(go.Scatter(x=apypc_df["Year"], y=apypc_df["('Applied during year', 'Netherlands')"], name="Applied during year"))

# Set x-axis and y-axis labels
fig.update_layout(xaxis_title="Year", yaxis_title="Number of asylum seekers")

# Set the title
fig.update_layout(title="Asylum applications in Netherlands 2000-2017")

# Set x-axis tick labels: one tick per year
fig.update_layout(
    xaxis=dict(
        tickmode='array',
        tickvals=list(apypc_df["Year"]),
        ticktext=[str(year) for year in apypc_df["Year"]]
    )
)

# Show the legend
fig.update_layout(showlegend=True)

# Display the plot
fig.show()

The figure above illustrates the efficiency of the asylum system in the Netherlands. From 2000 to 2006, there was a significant decrease in the number of pending end-of-year requests, despite a high influx of new requests throughout the year. This indicates that a large number of requests were successfully processed during this period. The subsequent years also demonstrate that the Netherlands consistently managed to keep the number of pending requests at the end of the year relatively low, even when faced with a surge in applicants in 2015. 

However, it is important to note that this graph alone can be misleading when assessing the overall efficiency of the Dutch asylum system. Compared to other countries such as Germany, France, Italy, and Greece, as shown in previous graphs, the Netherlands has received a relatively small number of applicants overall. The perceived efficiency of the Dutch asylum system may therefore be partly attributable to the lower volume of requests it has had to handle.
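One way to make such a comparison less dependent on volume is to relate the year-end backlog to the number of applications received in that year. The sketch below is only an illustration of that idea: the `backlog_ratio` metric is our own assumption rather than an established measure, and the countries and numbers are invented, not taken from the UNHCR data.

```python
import pandas as pd

# Made-up figures for two hypothetical countries; not real data.
backlog_df = pd.DataFrame({
    "country": ["Small", "Large"],
    "applied": [20_000, 400_000],
    "pending_end": [5_000, 100_000],
})

# Backlog ratio: applications still pending at year end,
# relative to the applications received during that year.
backlog_df["backlog_ratio"] = backlog_df["pending_end"] / backlog_df["applied"]

print(backlog_df)
```

In this toy case the larger country has twenty times as many pending applications, yet both countries leave the same fraction of their yearly applications unprocessed, which is the distinction an absolute count hides.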