# Final Project - Refugee Data Visualization

Ben Heinze, Elizabeth Pauley

CSCI-491 - Data Visualization

3 May 2024

### Dataset
* [UNHCR Refugee Dataset link](https://github.com/rfordatascience/tidytuesday/tree/master/data/2023/2023-08-22)

### Helpful Information
- [What is a Host Community Member?](https://www.unhcr.org/us/publications/unhcr-ngo-toolkit-practical-cooperation-resettlement-community-outreach-outreach-0)
- [What is Statelessness?](https://www.unhcr.org/ibelong/about-statelessness/)

### Dataset Description

| Variable          | Class     | Description                                                     |
|-------------------|-----------|-----------------------------------------------------------------|
| year              | int64     | The year.                                                       |
| coo_name          | character | Country of origin name.                                         |
| coo               | character | Country of origin UNHCR code.                                   |
| coo_iso           | character | Country of origin ISO code.                                     |
| coa_name          | character | Country of asylum name.                                         |
| coa               | character | Country of asylum UNHCR code.                                   |
| coa_iso           | character | Country of asylum ISO code.                                     |
| refugees          | int64     | The number of refugees from COO in COA.                         |
| asylum_seekers    | int64     | The number of asylum-seekers from COO in COA.                   |
| returned_refugees | int64     | The number of refugees returned to COO from COA.                |
| idps              | int64     | The number of internally displaced persons.                     |
| returned_idps     | int64     | The number of returned internally displaced persons.            |
| stateless         | int64     | The number of stateless persons.                                |
| ooc               | int64     | The number of others of concern to UNHCR.                       |
| oip               | float     | The number of other people in need of international protection. |
| hst               | float     | The number of host community members.                           |

### Questions we Hope to Answer
1. Is there a correlation between the country of origin of refugees and the countries where they seek asylum?
2. Which countries have the highest proportion of returned-refugees to refugees?
3. PCA (TBD)
4. Clustering (TBD)
5. Statelessness (TBD)

## Setup

---

In [1]:
import numpy as np
import pandas as pd

data = pd.read_csv("data/population.csv")

# Removes redundant columns (country of origin/asylum codes)
data = data.drop(columns=['coo', 'coo_iso', 'coa', 'coa_iso'])

# Numerical data only (drops the country name columns)
num_data = data.drop(columns=['coo_name', 'coa_name'])

## Preprocessing

---

Out of the entire dataset, only two columns have missing values: __oip__ and __hst__. They contain only 100 and 5964 data points respectively out of 64809 rows, so most of each column is missing and we will not apply a filling technique. __hst__ was only captured in 2022 and part of 2021. __Learn more about oip's nulls; choose whether to keep or drop these columns.__
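The missing-value counts above could be checked with a short helper like the one below. This is only a sketch: `summarize_missing` is a hypothetical name, and the `toy` frame is invented stand-in data, not rows from the UNHCR file.

```python
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return null/non-null counts for every column that has at least one null."""
    summary = pd.DataFrame({
        "non_null": df.notna().sum(),
        "null": df.isna().sum(),
    })
    # Share of rows that are missing in each column
    summary["fraction_null"] = summary["null"] / len(df)
    return summary[summary["null"] > 0]


# Invented stand-in data with the same column names as the dataset
toy = pd.DataFrame({
    "year": [2021, 2022, 2022],
    "oip": [None, None, 5.0],
    "hst": [None, 10.0, 20.0],
})

missing = summarize_missing(toy)
print(missing)
```

Calling `summarize_missing(data)` on the real frame should list only __oip__ and __hst__ if the description above holds.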

## Question 1

---

Is there a correlation between the country of origin of refugees and the countries where they seek asylum? Analyze this for both refugees and asylum_seekers, then compare and contrast.
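One possible way to compare the two populations is to sum each origin-asylum pair with `groupby` and keep the largest destinations per origin. The sketch below assumes the column names from the dataset description; `top_destinations` is a hypothetical helper and the `toy` rows are invented for illustration.

```python
import pandas as pd


def top_destinations(df: pd.DataFrame, year: int, column: str, n: int = 3) -> pd.DataFrame:
    """For each country of origin, return the n asylum countries with the largest total in `column`."""
    sub = df[df["year"] == year]
    # Total people per (origin, asylum) pair
    flows = sub.groupby(["coo_name", "coa_name"], as_index=False)[column].sum()
    # Drop pairs with no people, then rank destinations within each origin
    flows = flows[flows[column] > 0]
    flows = flows.sort_values(["coo_name", column], ascending=[True, False])
    return flows.groupby("coo_name").head(n).reset_index(drop=True)


# Invented stand-in rows; "A", "B", "X", "Y", "Z" are placeholder country names
toy = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011],
    "coo_name": ["A", "A", "B", "A"],
    "coa_name": ["X", "Y", "X", "Z"],
    "refugees": [10, 50, 5, 100],
    "asylum_seekers": [1, 0, 7, 100],
})

top_refugees = top_destinations(toy, 2010, "refugees", n=1)
top_asylum = top_destinations(toy, 2010, "asylum_seekers", n=1)
print(top_refugees)
print(top_asylum)
```

Running the helper once with `"refugees"` and once with `"asylum_seekers"` gives two tables with the same shape, which makes the compare-and-contrast step easier.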

In [4]:
# Builds a nested dictionary {COO: {COA: asylum-seeker count}} for one year
year = 2010
subframe = data[data['year'] == year]   # Gets dataframe of specific year

d_coo = {}
for index, series in subframe.iterrows():
    coo = series['coo_name']
    coa = series['coa_name']
    asylum_seekers = series['asylum_seekers']

    # Skips origin-asylum pairs with no asylum-seekers
    if asylum_seekers == 0:
        continue

    # Each origin gets its own inner dictionary of asylum countries
    d_coa = d_coo.setdefault(coo, {})

    # Counts the number of asylum_seekers from COO to COA
    d_coa[coa] = d_coa.get(coa, 0) + asylum_seekers

for key, value in d_coo.items():
    print(f"{key}: {value}")


## Question 2

---

Which countries have the highest proportion of returned-refugees to refugees?
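The proportion itself could be computed by summing both columns per country of origin and dividing. The sketch below assumes "proportion" means returned refugees divided by refugees within the same year, which is one reading of the question; `returned_ratio` is a hypothetical helper and the `toy` rows are invented.

```python
import pandas as pd


def returned_ratio(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Return per-origin totals and the returned-to-refugee ratio, highest ratio first."""
    sub = df[df["year"] == year]
    totals = sub.groupby("coo_name")[["refugees", "returned_refugees"]].sum()
    # Drop origins with zero refugees to avoid dividing by zero
    totals = totals[totals["refugees"] > 0]
    totals["returned_ratio"] = totals["returned_refugees"] / totals["refugees"]
    return totals.sort_values("returned_ratio", ascending=False)


# Invented stand-in rows; "A", "B", "C" are placeholder country names
toy = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010],
    "coo_name": ["A", "A", "B", "C"],
    "refugees": [100, 100, 50, 0],
    "returned_refugees": [10, 30, 25, 5],
})

ratios = returned_ratio(toy, 2010)
print(ratios)
```

Countries with very few refugees can produce extreme ratios, so a minimum-refugee threshold may be worth adding before ranking.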

In [3]:
year = 2010
subframe = data[data['year'] == year]   #Gets dataframe of specific year

# Gets total refugees who left from their country of origin; country:number
totalRefugeesFromCOO = {}
for index, series in subframe.iterrows(): 
    coo_name = series['coo_name']
    refugees = series['refugees']

    if coo_name not in totalRefugeesFromCOO:
        totalRefugeesFromCOO[coo_name] = refugees
    else:
        totalRefugeesFromCOO[coo_name] += refugees
print(f"Total Refugees from Country of Origin:\n{totalRefugeesFromCOO}")

# Gets total refugees returned to their country of origin; country:number
totalReturnedRefugeesFromCOO = {}
for index, series in subframe.iterrows(): 
    coo_name = series['coo_name']
    returnedRefugees = series['returned_refugees']

    if coo_name not in totalReturnedRefugeesFromCOO:
        totalReturnedRefugeesFromCOO[coo_name] = returnedRefugees
    else:
        totalReturnedRefugeesFromCOO[coo_name] += returnedRefugees
print(f"\n\nTotal Refugees Returned to Country of Origin:\n{totalReturnedRefugeesFromCOO}")


Total Refugees from Country of Origin:
{'Afghanistan': 3054699, 'Iran (Islamic Rep. of)': 68785, 'Iraq': 1683576, 'Pakistan': 39979, 'Egypt': 6903, 'China': 184601, 'Palestinian': 93312, 'Serbia and Kosovo: S/RES/1244 (1999)': 183284, 'Türkiye': 146785, 'Angola': 134851, 'Benin': 438, 'Chad': 53713, 'Cameroon': 14952, 'Congo': 20682, 'Dem. Rep. of the Congo': 476691, 'Guinea': 11978, "Cote d'Ivoire": 41747, 'Liberia': 70133, 'Libya': 2297, 'Niger': 794, 'Nigeria': 15645, 'Somalia': 770141, 'Sudan': 387265, 'Western Sahara': 116411, 'Burundi': 84053, 'Central African Rep.': 164902, 'Eritrea': 222457, 'Ethiopia': 68838, 'Guinea-Bissau': 1117, 'Mauritania': 37721, 'Rwanda': 115519, 'Senegal': 16255, 'Sierra Leone': 11261, 'United Rep. of Tanzania': 1135, 'Unknown': 167115, 'Algeria': 6665, 'Djibouti': 556, 'Kazakhstan': 3631, 'Mali': 3659, 'Russian Federation': 111944, 'Saudi Arabia': 659, 'Syrian Arab Rep.': 18451, 'Tajikistan': 573, 'Turkmenistan': 723, 'Tunisia': 2159, 'Uganda': 6421, 