# Canada Refugee Statistics Exploratory Data Analysis

In this project we perform **Exploratory Data Analysis (EDA)** on **UNHCR Refugee Statistics** for **Canada** from **2012-2022**.


_Original dataset can be downloaded from the [unhcr.org](https://www.unhcr.org/refugee-statistics/download/?url=8tIY7I) website._

## Analysis Questions

1. From which countries has Canada admitted the highest number of refugees?
2. What is the total number of resettled refugees in Canada per year?
3. What are the countries of origin for the majority of asylum claims made in Canada?
4. What is the total number of asylum claims made in Canada every year?
5. What are the general trends in refugee and asylum statistics from 2012-2022?

## Important Distinction: Refugees vs. Asylum Seekers

**The 1951 Refugee Convention defines a refugee as:** “A person who, owing to a well-founded fear of being persecuted for reasons of race, religion, nationality, membership of a particular social group or political opinion, is outside the country of his nationality and is unable or, owing to such fear, unwilling to avail himself of the protection of that country”.
> - In this data set, UNHCR-Refugees refers to people who have been resettled to Canada and arrive in the country as permanent residents.
    
**An asylum seeker (or claimant) on the other hand is defined as** someone who is seeking international protection but has not yet been granted refugee status. 
> - In this data set, asylum-seeker refers to someone who has arrived in Canada as a visitor, worker, student, etc., through official or unofficial ports of entry, and who applied for protection from within Canada (after arrival). 

_For more information and definitions, visit the [unhcr.ca](https://www.unhcr.ca/about-us/frequently-asked-questions/#:~:text=An%20asylum%2Dseeker%20is%20someone,yet%20been%20granted%20refugee%20status.) website._

## Notebook Content

1. **Step 1:** Install + Import Necessary Libraries
2. **Step 2:** Reading, Exploring and Preparing Data
3. **Step 3:** Exploratory Data Analysis + Visualization
4. **2012-2022 Canada Refugee Statistics EDA Results Summary**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Analyzing Questions

In [3]:
df = pd.read_csv('can-stats-2012-22.csv')
df.sample(10)

Unnamed: 0,Year,Country-of-origin,UNHCR-refugees,Asylum-seekers,total-count
577,2015,Guinea-Bissau,15,6,21
2,2012,Algeria,513,284,797
383,2014,Colombia,16428,274,16702
350,2014,Algeria,346,53,399
998,2017,Spain,6,37,43
1817,2022,Pakistan,2775,2661,5436
215,2013,Djibouti,289,114,403
962,2017,Malaysia,32,15,47
114,2012,Netherlands (Kingdom of the),34,23,57
1806,2022,Malta,0,5,5


In [4]:
df = pd.read_csv('can-stats-2012-22.csv')
df.head()

Unnamed: 0,Year,Country-of-origin,UNHCR-refugees,Asylum-seekers,total-count
0,2012,Afghanistan,2609,411,3020
1,2012,Albania,1764,579,2343
2,2012,Algeria,513,284,797
3,2012,Angola,753,21,774
4,2012,Antigua and Barbuda,40,30,70
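Before aggregating, it can help to confirm how the columns relate to each other. The sketch below checks that `total-count` equals `UNHCR-refugees` plus `Asylum-seekers`. It uses only the five rows shown by `df.head()` above, retyped by hand into a hypothetical `sample` frame, so it says nothing about the rest of the file; the same expression could be run on the full `df`.

```python
import pandas as pd

# The five rows displayed by df.head(), retyped by hand (not the full CSV).
sample = pd.DataFrame({
    "Year": [2012, 2012, 2012, 2012, 2012],
    "Country-of-origin": ["Afghanistan", "Albania", "Algeria", "Angola",
                          "Antigua and Barbuda"],
    "UNHCR-refugees": [2609, 1764, 513, 753, 40],
    "Asylum-seekers": [411, 579, 284, 21, 30],
    "total-count": [3020, 2343, 797, 774, 70],
})

# Row-wise sum of the two component columns.
component_sum = sample["UNHCR-refugees"] + sample["Asylum-seekers"]

# True only if every row's total-count matches the sum of its components.
totals_match = component_sum.eq(sample["total-count"]).all()
print(totals_match)
```

If the check holds on the full data, `total-count` mixes resettled refugees and asylum seekers, which matters when a question asks about only one of the two groups.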


In [6]:
# From which countries has Canada admitted the highest number of refugees?
df['Country-of-origin']

0                              Afghanistan
1                                  Albania
2                                  Algeria
3                                   Angola
4                      Antigua and Barbuda
                       ...                
1865    Venezuela (Bolivarian Republic of)
1866                        Western Sahara
1867                                 Yemen
1868                                Zambia
1869                              Zimbabwe
Name: Country-of-origin, Length: 1870, dtype: object

In [7]:
df['total-count']

0       3020
1       2343
2        797
3        774
4         70
        ... 
1865    3601
1866       5
1867     902
1868      48
1869     871
Name: total-count, Length: 1870, dtype: int64

In [48]:
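# From which countries has Canada admitted the highest number of refugees?
# Note: .max() is applied to each column independently, so the two values in the
# output do not come from the same row ('Zimbabwe' is only the alphabetically last name).
df[['total-count', 'Country-of-origin']].max()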

total-count             75294
Country-of-origin    Zimbabwe
dtype: object
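The output above pairs the largest `total-count` with the alphabetically last country name, which are two unrelated column-wise maxima. One way to retrieve the actual row holding the largest value is `idxmax`. The sketch below illustrates the difference on the last five rows of the dataset as displayed earlier, retyped into a hypothetical `tail` frame; it is an illustration, not the result for the full data.

```python
import pandas as pd

# Last five rows as shown in the Series outputs above (rows 1865-1869).
tail = pd.DataFrame({
    "Country-of-origin": ["Venezuela (Bolivarian Republic of)", "Western Sahara",
                          "Yemen", "Zambia", "Zimbabwe"],
    "total-count": [3601, 5, 902, 48, 871],
})

# Column-wise maxima: each column is reduced on its own.
columnwise = tail[["total-count", "Country-of-origin"]].max()

# Row that actually contains the largest total-count.
top_row = tail.loc[tail["total-count"].idxmax()]

print(columnwise)
print(top_row)
```

Here `columnwise` combines a count from one row with a name from another, while `top_row` keeps the count and its country together.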

In [52]:
df['total-count'].groupby(by = df['Country-of-origin']).max().sort_values(ascending = False)

Country-of-origin
Ukraine           75294
Mexico            23916
Unknown           20032
China             19978
Nigeria           19468
                  ...  
Kiribati              5
Luxembourg            5
Western Sahara        5
Cabo Verde            5
Bermuda               5
Name: total-count, Length: 184, dtype: int64

In [61]:
top_5_country = df['total-count'].groupby(by=df['Country-of-origin']).sum().sort_values(ascending=False)

In [62]:
top_5_country

Country-of-origin
Colombia          138891
China             120112
Nigeria           112174
Haiti             111864
Mexico             98218
                   ...  
Kiribati              15
Cabo Verde            10
Luxembourg             5
Bermuda                5
Western Sahara         5
Name: total-count, Length: 184, dtype: int64
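Since `total-count` appears to combine resettled refugees and asylum seekers, a ranking for the first analysis question may be closer to its wording if it sums the `UNHCR-refugees` column instead. The following is a minimal sketch under that assumption, using a handful of rows copied from the outputs above into a hypothetical `rows` frame; the ranking it prints reflects only those rows.

```python
import pandas as pd

# A few rows copied from the df.sample() and df.head() outputs above.
rows = pd.DataFrame({
    "Year": [2012, 2012, 2012, 2014, 2014, 2022],
    "Country-of-origin": ["Afghanistan", "Albania", "Algeria",
                          "Algeria", "Colombia", "Pakistan"],
    "UNHCR-refugees": [2609, 1764, 513, 346, 16428, 2775],
})

# Sum resettled refugees per country across years, then keep the largest three.
top3 = (
    rows.groupby("Country-of-origin")["UNHCR-refugees"]
    .sum()
    .nlargest(3)
)
print(top3)
```

On the full `df`, replacing `nlargest(3)` with `nlargest(5)` would give a top-five list in one step, without a separate sort and slice.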

In [64]:
# From which countries has Canada admitted the highest number of refugees?
# .max() keeps each country's largest single-year total-count.
# reset_index() converts the Series to a DataFrame so seaborn can find both columns by name.
top_5_country = df['total-count'].groupby(by=df['Country-of-origin']).max().sort_values(ascending=False)[0:5].reset_index()

# Highest Number of Refugees
plt.figure(figsize=(10,7))
sns.barplot(x='Country-of-origin', y='total-count', data=top_5_country)
plt.title("Highest Number of Refugees")
plt.xlabel("Country-of-origin")
plt.ylabel("total-count")
plt.xticks(rotation=45)
plt.show()

In [12]:
# Total number of resettled refugees across all years in the dataset
df['UNHCR-refugees'].sum()

1407060

In [13]:
# What is the total number of resettled refugees in Canada per year?
# Group by year rather than scaling the overall total by a constant.
df.groupby('Year')['UNHCR-refugees'].sum()

In [14]:
# What are the countries of origin for the majority of asylum claims made in Canada?
df[['Country-of-origin', 'Asylum-seekers']]

Unnamed: 0,Country-of-origin,Asylum-seekers
0,Afghanistan,411
1,Albania,579
2,Algeria,284
3,Angola,21
4,Antigua and Barbuda,30
...,...,...
1865,Venezuela (Bolivarian Republic of),2104
1866,Western Sahara,0
1867,Yemen,452
1868,Zambia,25


In [19]:
# Total number of asylum claims across all years in the dataset
df['Asylum-seekers'].sum()

604271

In [20]:
# What is the total number of asylum claims made in Canada every year?
# Group by year rather than scaling the overall total by a constant.
df.groupby('Year')['Asylum-seekers'].sum()

In [27]:
df[['Year', 'UNHCR-refugees', 'Asylum-seekers']]

Unnamed: 0,Year,UNHCR-refugees,Asylum-seekers
0,2012,2609,411
1,2012,1764,579
2,2012,513,284
3,2012,753,21
4,2012,40,30
...,...,...,...
1865,2022,1497,2104
1866,2022,5,0
1867,2022,450,452
1868,2022,23,25
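For the trend question, the three columns above can be collapsed to one row per year and then differenced. This is a rough sketch using only the 2012 and 2022 rows visible in the output (minus the final row that is cut off), copied into a hypothetical `rows` frame. The numbers therefore cover a small slice of each year and should not be read as actual yearly totals.

```python
import pandas as pd

# Visible rows from the table above: five from 2012, four from 2022.
rows = pd.DataFrame({
    "Year": [2012, 2012, 2012, 2012, 2012, 2022, 2022, 2022, 2022],
    "UNHCR-refugees": [2609, 1764, 513, 753, 40, 1497, 5, 450, 23],
    "Asylum-seekers": [411, 579, 284, 21, 30, 2104, 0, 452, 25],
})

# One row per year, summing both measures.
yearly = rows.groupby("Year")[["UNHCR-refugees", "Asylum-seekers"]].sum()

# Change relative to the previous row (the previous year when all years are present).
change = yearly.diff()

print(yearly)
print(change)
```

On the full `df`, `yearly` would have one row for each year from 2012 to 2022, and `yearly.plot()` would draw both series as lines for a quick visual comparison.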
