# 7. Refugees Dataset

Link to the data source - https://www.kaggle.com/datasets/unitednations/refugee-data/data

##### Omri Yanay
##### Talya Bachmann
##### Shir Nina Saban


# <b> </b> <b style='color:#F09454'>Contents</b>
#  <b id="Top"> </b>

___
#  1.<b id="introductions"> </b> <b style='color:black'>Introduction </b> <b> &  Basic </b> <b style='color:#F09454'>EDA. </b>  

### <b> Importing </b> <b style='color:#F09454'>Libraries.</b>

In [1]:
# !pip install geopandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import geopandas as gpd
import seaborn as sns
from IPython.display import Image, display
import plotly.express as px  # similar to seaborn
import plotly.graph_objects as go # more complex and customizable

# Refugees_project

<a id='top'></a>



1. [Read-data](#section1)
  
2. [File 2 - Entry request](#section2)
    
3. [File 3 - Age range of refugees](#section3)
    
4. [File 4 - Population type](#section4)


<div>
  <img src="https://github.com/OmriYanay/OmriYanay/blob/main/New%20Project/Refugees-header.png?raw=true" height="800" />
</div>


In [2]:
url1 = 'https://raw.githubusercontent.com/OmriYanay/OmriYanay/main/New%20Project/asylum_seekers.csv'
##(2000-2016) Yearly progress of asylum-seekers through the refugee status determination process, with data on UNHCR assistance.
url2 = 'https://raw.githubusercontent.com/OmriYanay/new-project/main/demographics.csv'
##(2001-2016) Information on refugees according to residence territory, broken down by regional level, age, and gender demographics. Warning: may be incomplete or otherwise conflict with persons_of_concern.csv and time_series.csv.
url3 = 'https://raw.githubusercontent.com/OmriYanay/new-project/main/asylum_seekers_monthly.csv'
## Monthly counts of asylum applications, by country of asylum and country of origin.
url4 = 'https://raw.githubusercontent.com/OmriYanay/new-project/main/time_series.csv'
##(1951-2016) Yearly population statistics on refugee movement changes from an origin to a destination.

<a id='section1'></a>
### 1. Read data

In [None]:
refugees_df1 = pd.read_csv(url1, low_memory=False)
refugees_df2 = pd.read_csv(url2)
refugees_df3 = pd.read_csv(url3, low_memory=False)
refugees_df4 = pd.read_csv(url4, low_memory=False)

In [None]:
refugees_df3.head()

In [None]:
refugees_df3.rename(columns={'Country / territory of asylum/residence': 'Country'}, inplace=True)
refugees_df3.head()

In [None]:
refugees_df3['Origin'].unique()

##### We added an iso_alpha column, filled by the map_country_to_iso function, so that the data can be displayed on the map.

In [None]:
origin_iso_mapping ={
    'Afghanistan': 'AFG',
    'Albania': 'ALB',
    'Algeria': 'DZA',
    'Andorra': 'AND',
    'Angola': 'AGO',
    'Antigua and Barbuda': 'ATG',
    'Argentina': 'ARG',
    'Armenia': 'ARM',
    'Australia': 'AUS',
    'Austria': 'AUT',
    'Azerbaijan': 'AZE',
    'Bahamas': 'BHS',
    'Bahrain': 'BHR',
    'Bangladesh': 'BGD',
    'Barbados': 'BRB',
    'Belarus': 'BLR',
    'Belgium': 'BEL',
    'Belize': 'BLZ',
    'Benin': 'BEN',
    'Bhutan': 'BTN',
    'Bolivia (Plurinational State of)': 'BOL',
    'Bosnia': 'BIH',
    'Botswana': 'BWA',
    'Brazil': 'BRA',
    'Brunei Darussalam': 'BRN',
    'Bulgaria': 'BGR',
    'Burkina Faso': 'BFA',
    'Burundi': 'BDI',
    'Cabo Verde': 'CPV',
    'Cambodia': 'KHM',
    'Cameroon': 'CMR',
    'Canada': 'CAN',
    'Central African Rep.': 'CAF',
    'Chad': 'TCD',
    'Chile': 'CHL',
    'China': 'CHN',
    'China, Hong Kong SAR': 'HKG',
    'China, Macao SAR': 'MAC',
    'Colombia': 'COL',
    'Comoros': 'COM',
    'Congo': 'COG',
    'Costa Rica': 'CRI',
    'Croatia': 'HRV',
    'Cuba': 'CUB',
    'Cyprus': 'CYP',
    'Czech Rep.': 'CZE',
    "Côte d'Ivoire": 'CIV',
    "Dem. People's Rep. of Korea": 'PRK',
    'Dem. Rep. of the Congo': 'COD',
    'Denmark': 'DNK',
    'Djibouti': 'DJI',
    'Dominica': 'DMA',
    'Dominican Rep.': 'DOM',
    'Ecuador': 'ECU',
    'Egypt': 'EGY',
    'El Salvador': 'SLV',
    'Equatorial Guinea': 'GNQ',
    'Eritrea': 'ERI',
    'Estonia': 'EST',
    'Ethiopia': 'ETH',
    'Fiji': 'FJI',
    'Finland': 'FIN',
    'France': 'FRA',
    'Gabon': 'GAB',
    'Gambia': 'GMB',
    'Georgia': 'GEO',
    'Germany': 'DEU',
    'Ghana': 'GHA',
    'Greece': 'GRC',
    'Grenada': 'GRD',
    'Guatemala': 'GTM',
    'Guinea': 'GIN',
    'Guinea-Bissau': 'GNB',
    'Guyana': 'GUY',
    'Haiti': 'HTI',
    'Honduras': 'HND',
    'Hungary': 'HUN',
    'India': 'IND',
    'Indonesia': 'IDN',
    'Iran (Islamic Rep. of)': 'IRN',
    'Iraq': 'IRQ',
    'Ireland': 'IRL',
    'Israel': 'ISR',
    'Italy': 'ITA',
    'Jamaica': 'JAM',
    'Japan': 'JPN',
    'Jordan': 'JOR',
    'Kazakhstan': 'KAZ',
    'Kenya': 'KEN',
    'Kuwait': 'KWT',
    'Kyrgyzstan': 'KGZ',
    "Lao People's Dem. Rep.": 'LAO',
    'Latvia': 'LVA',
    'Lebanon': 'LBN',
    'Lesotho': 'LSO',
    'Liberia': 'LBR',
    'Libya': 'LBY',
    'Liechtenstein': 'LIE',
    'Lithuania': 'LTU',
    'Luxembourg': 'LUX',
    'Madagascar': 'MDG',
    'Malawi': 'MWI',
    'Malaysia': 'MYS',
    'Maldives': 'MDV',
    'Mali': 'MLI',
    'Malta': 'MLT',
    'Martinique': 'MTQ',
    'Mauritania': 'MRT',
    'Mauritius': 'MUS',
    'Mexico': 'MEX',
    'Mongolia': 'MNG',
    'Montenegro': 'MNE',
    'Morocco': 'MAR',
    'Mozambique': 'MOZ',
    'Myanmar': 'MMR',
    'Namibia': 'NAM',
    'Nauru': 'NRU',
    'Nepal': 'NPL',
    'Netherlands': 'NLD',
    'New Caledonia': 'NCL',
    'New Zealand': 'NZL',
    'Nicaragua': 'NIC',
    'Niger': 'NER',
    'Nigeria': 'NGA',
    'Norway': 'NOR',
    'Oman': 'OMN',
    'Pakistan': 'PAK',
    'Palestinian': 'PSE',
    'Panama': 'PAN',
    'Papua New Guinea': 'PNG',
    'Paraguay': 'PRY',
    'Peru': 'PER',
    'Philippines': 'PHL',
    'Poland': 'POL',
    'Portugal': 'PRT',
    'Qatar': 'QAT',
    'Rep. of Korea': 'KOR',
    'Rep. of Moldova': 'MDA',
    'Romania': 'ROU',
    'Russian Federation': 'RUS',
    'Rwanda': 'RWA',
    'Saint Kitts and Nevis': 'KNA',
    'Saint Lucia': 'LCA',
    'Saint Vincent and the Grenadines': 'VCT',
    'Samoa': 'WSM',
    'San Marino': 'SMR',
    'Saudi Arabia': 'SAU',
    'Senegal': 'SEN',
    'Serbia': 'SRB',
    'Seychelles': 'SYC',
    'Sierra Leone': 'SLE',
    'Singapore': 'SGP',
    'Slovakia': 'SVK',
    'Slovenia': 'SVN',
    'Solomon Islands': 'SLB',
    'Somalia': 'SOM',
    'South Africa': 'ZAF',
    'South Sudan': 'SSD',
    'Spain': 'ESP',
    'Sri Lanka': 'LKA',
    'Stateless': '',
    'Sudan': 'SDN',
    'Suriname': 'SUR',
    'Swaziland': 'SWZ',
    'Sweden': 'SWE',
    'Switzerland': 'CHE',
    'Syrian Arab Rep.': 'SYR',
    'Tajikistan': 'TJK',
    'Thailand': 'THA',
    'The former Yugoslav Rep. of Macedonia': 'MKD',
    'Tibetan': '',
    'Timor-Leste': 'TLS',
    'Togo': 'TGO',
    'Tonga': 'TON',
    'Trinidad and Tobago': 'TTO',
    'Tunisia': 'TUN',
    'Turkey': 'TUR',
    'Turkmenistan': 'TKM',
    'Uganda': 'UGA',
    'Ukraine': 'UKR',
    'United Arab Emirates': 'ARE',
    'United Kingdom': 'GBR',
    'United Rep. of Tanzania': 'TZA',
    'United States of America': 'USA',
    'Uruguay': 'URY',
    'Uzbekistan': 'UZB',
    'Various/unknown': '',
    'Venezuela (Bolivarian Republic of)': 'VEN',
    'Viet Nam': 'VNM',
    'Western Sahara': 'ESH',
    'Yemen': 'YEM',
    'Zambia': 'ZMB',
    'Zimbabwe': 'ZWE'
}


# Function to map country names to ISO alpha-3 codes
def map_country_to_iso(country):
    return origin_iso_mapping.get(country, '')

# Add a new column 'iso_alpha' with ISO alpha-3 codes 
refugees_df3['iso_alpha'] = refugees_df3['Origin'].apply(map_country_to_iso)

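# Save a local copy: url3 is an HTTP address and cannot be written to.
# The output filename is an arbitrary choice.
refugees_df3.to_csv('asylum_seekers_monthly_iso.csv', index=False)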
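
Any origin name that is missing from the dictionary receives an empty code and silently disappears from the map. The sketch below lists such names so they can be reviewed. The toy frame, the shortened mapping, and the helper name `find_unmapped` are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

# Toy stand-ins for refugees_df3 and origin_iso_mapping (illustrative only)
toy_df = pd.DataFrame({'Origin': ['Afghanistan', 'Iraq', 'Atlantis', 'Stateless', 'Iraq']})
toy_mapping = {'Afghanistan': 'AFG', 'Iraq': 'IRQ', 'Stateless': ''}


def find_unmapped(names, mapping):
    """Return the sorted unique names that have no non-empty ISO code."""
    codes = names.map(mapping)            # names absent from the mapping become NaN
    missing = codes.isna() | (codes == '')
    return sorted(names[missing].unique())


unmapped = find_unmapped(toy_df['Origin'], toy_mapping)
print(unmapped)
```

Running the same helper on `refugees_df3['Origin']` with `origin_iso_mapping` would show which names need a dictionary entry or a rename.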


In [None]:
refugees_df3['Value'] = pd.to_numeric(refugees_df3['Value'], errors='coerce')
group_Origin = refugees_df3.groupby(['Year', 'Origin','iso_alpha'])[['Value']].sum().reset_index()
group_Origin.head()

In [None]:
gapminder_filtered = group_Origin[group_Origin['Year'] <= 2016]  ## 2017 is excluded because its data is incomplete
fig = px.choropleth(gapminder_filtered, 
                    color="Value", 
                    locations="iso_alpha", 
                    hover_name="Origin", 
                    animation_frame="Year",
                    height=600)
fig.update_layout(
    font = dict(
            size = 14
            ),    
    title={
        'text': "Origin of refugees",
        'y':0.95,
        'x':0.5
        },
)
# Show the map
fig.show()

##### The map shows, year by year, the countries from which the most refugees originated.

* ###  We decided to focus on 2015 because the values rise sharply in that year. A small group of countries accounts for most of the refugees leaving.
* ###  According to the data, Iran, Syria, Afghanistan, Iraq, and Pakistan are the countries of origin with the highest numbers. These countries have experienced wars, ethnic conflicts, and shifts in power. The figures are consistent with the view that instability and insecurity in these countries prompt residents to seek safer havens.
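
A ranking like the one described above can be checked directly instead of being read off the map. The following is a minimal sketch on made-up numbers; the frame `toy_group` only mimics the shape of `group_Origin`, and the helper name `top_origins` is hypothetical.

```python
import pandas as pd

# Toy yearly totals shaped like group_Origin (values are made up)
toy_group = pd.DataFrame({
    'Year':   [2014, 2015, 2015, 2015, 2015],
    'Origin': ['A', 'A', 'B', 'C', 'D'],
    'Value':  [10, 50, 300, 120, 5],
})


def top_origins(df, year, n=3):
    """Return the n origins with the largest Value in the given year."""
    one_year = df[df['Year'] == year]
    return one_year.nlargest(n, 'Value')[['Origin', 'Value']].reset_index(drop=True)


top_2015 = top_origins(toy_group, 2015)
print(top_2015)
```

Calling `top_origins(group_Origin, 2015, n=5)` on the real table would list the five leading origins for that year.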

In [None]:
refugees_df3['Country'] = refugees_df3['Country'].replace({'Serbia and Kosovo: S/RES/1244 (1999)': 'Serbia', 'Bosnia and Herzegovina': 'Bosnia', 'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom', })
refugees_df3['Country'].unique() 

##### We shortened several country names because the originals were long and confusing.

In [None]:
country_iso_mapping = {
    'Australia': 'AUS',
    'Austria': 'AUT',
    'Belgium': 'BEL',
    'Bulgaria': 'BGR',
    'Canada': 'CAN',
    'Czech Rep.': 'CZE',
    'Denmark': 'DNK',
    'Finland': 'FIN',
    'France': 'FRA',
    'Germany': 'DEU',
    'Greece': 'GRC',
    'Hungary': 'HUN',
    'Ireland': 'IRL',
    'Liechtenstein': 'LIE',
    'Luxembourg': 'LUX',
    'Netherlands': 'NLD',
    'Norway': 'NOR',
    'Poland': 'POL',
    'Portugal': 'PRT',
    'Rep. of Korea': 'KOR',
    'Romania': 'ROU',
    'Slovakia': 'SVK',
    'Slovenia': 'SVN',
    'Spain': 'ESP',
    'Sweden': 'SWE',
    'Switzerland': 'CHE',
    'Turkey': 'TUR',
    'United Kingdom': 'GBR',
    'USA (EOIR)': 'USA',  # Assuming USA for both 'USA (EOIR)' and 'USA (INS/DHS)'
    'New Zealand': 'NZL',
    'USA (INS/DHS)': 'USA',
    'Cyprus': 'CYP',
    'Iceland': 'ISL',
    'Japan': 'JPN',
    'Croatia': 'HRV',  # ISO alpha-3 code for Croatia is HRV
    'Estonia': 'EST',
    'Latvia': 'LVA',
    'Malta': 'MLT',
    'Serbia': 'SRB',
    'Lithuania': 'LTU',
    'Albania': 'ALB',
    'Montenegro': 'MNE',
    'The former Yugoslav Rep. of Macedonia': 'MKD',
    'Bosnia': 'BIH',  # ISO alpha-3 code for Bosnia and Herzegovina is BIH
    'Italy': 'ITA'
}

# Function to map country names to ISO alpha-3 codes
def map_country_to_iso(country):
    return country_iso_mapping.get(country, '')

# Add a new column 'iso_alpha' with ISO alpha-3 codes
refugees_df3['iso_alpha'] = refugees_df3['Country'].apply(map_country_to_iso)

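# Save a local copy: url3 is an HTTP address and cannot be written to.
# The output filename is an arbitrary choice.
refugees_df3.to_csv('asylum_seekers_monthly_iso.csv', index=False)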


In [None]:
refugees_df3['Value'] = pd.to_numeric(refugees_df3['Value'], errors='coerce')
group_Country = refugees_df3.groupby(['Year', 'Country','iso_alpha'])[['Value']].sum().reset_index()

In [None]:
gapminder_filtered = group_Country[group_Country['Year'] <= 2016]
fig = px.choropleth(gapminder_filtered, 
                    color="Value", 
                    locations="iso_alpha", 
                    hover_name="Country", 
                    animation_frame="Year",
                    height=600)
fig.update_layout(
    font = dict(
            size = 14
            ),    
    title={
        'text': "Destination country of refugees",
        'y':0.95,
        'x':0.5
        },
)
# Show the map
fig.show()

##### The map shows, year by year, the countries in which the most refugees arrived.

##### After looking at the countries refugees come from, we show the countries where the most refugees arrive.
##### As can be seen on the map, the top five countries receiving refugees are Germany, Hungary, Sweden, Turkey, and Austria.
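
Two labels in the country mapping, 'USA (EOIR)' and 'USA (INS/DHS)', share the code USA, so a table grouped by country name holds two rows for the same map region in a given year. One possible way to handle this, sketched here on made-up numbers, is to sum by ISO code before plotting. The frame `toy_country` is an illustrative stand-in for `group_Country`.

```python
import pandas as pd

# Toy rows shaped like group_Country (values are made up)
toy_country = pd.DataFrame({
    'Year':      [2015, 2015, 2015],
    'Country':   ['USA (EOIR)', 'USA (INS/DHS)', 'Germany'],
    'iso_alpha': ['USA', 'USA', 'DEU'],
    'Value':     [100, 250, 400],
})

# Sum all rows that share a year and an ISO code so each country is drawn once
per_iso = toy_country.groupby(['Year', 'iso_alpha'], as_index=False)['Value'].sum()
print(per_iso)
```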

<a id='section2'></a>
### 2. File 2 - Entry request

#### This file covers the process of submitting an entry request to the refugees' destination country.

In [None]:
refugees_df1.head()

RSD (Refugee Status Determination) is the legal or administrative process by which governments or UNHCR determine whether a person seeking international protection is considered a refugee under international, regional, or national law.

In [None]:
refugees_df1.shape

In [None]:
refugees_df1.rename(columns={'Country / territory of asylum/residence': 'Country'}, inplace=True)
refugees_df1.head()

In [None]:
# Treat the '*' placeholder and missing values as '0' in every count column.
# Column names are kept exactly as they are used in the file (including 'Tota').
count_cols = [
    'Tota pending start-year', 'Total pending end-year', 'Total decisions',
    'of which UNHCR-assisted(end-year)', 'of which UNHCR-assisted(start-year)',
    'Applied during year', 'decisions_recognized', 'decisions_other',
    'Rejected', 'Otherwise closed',
]
# Assigning the result back avoids calling inplace methods on a column selection
for col in count_cols:
    refugees_df1[col] = refugees_df1[col].replace('*', '0').fillna('0')
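
Replacing `*` with zero hides the difference between a true zero and a value that was masked or missing. As an alternative sketch, `pd.to_numeric` with `errors='coerce'` converts the column in one step and lets a flag column record which entries were not real numbers. The toy frame and the column name `Rejected_masked` are assumptions made for illustration.

```python
import pandas as pd

# Toy column mixing numbers, the '*' placeholder, and a missing value
toy = pd.DataFrame({'Rejected': ['12', '*', None, '7']})

# errors='coerce' turns anything non-numeric (here '*') into NaN
as_number = pd.to_numeric(toy['Rejected'], errors='coerce')

# Keep a flag so masked or missing entries can still be told apart from true zeros
toy['Rejected_masked'] = as_number.isna()
toy['Rejected'] = as_number.fillna(0).astype(int)
print(toy)
```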

In [None]:
refugees_df1.isnull().sum()

In [None]:
refugees_df1['Total decisions'] = pd.to_numeric(refugees_df1['Total decisions'], errors='coerce')

sum_df1D = refugees_df1.groupby('Country')['Total decisions'].sum().reset_index()

sum_df1D.head()

In [None]:
refugees_df1['Rejected'] = pd.to_numeric(refugees_df1['Rejected'], errors='coerce')

sum_df1R = refugees_df1.groupby('Country')['Rejected'].sum().reset_index()

sum_df1R.head()

In [None]:
filtered_df1 = sum_df1R[sum_df1R['Rejected'] > 50000]  # keep countries with more than 50,000 rejections
filtered_df1

<a id='section2.1'></a>
### 2.1 Merging tables

In [None]:
merged_df1 = pd.merge(filtered_df1, sum_df1D, on='Country', how='left')
merged_df1['Percentage'] = (merged_df1['Rejected'] / merged_df1['Total decisions']) * 100
merged_df1

In [None]:
# Plotting
fig, ax = plt.subplots(figsize=(10, 6))

# Plot bar chart for total decisions
ax.bar(merged_df1['Country'], merged_df1['Total decisions'], color='teal', label='Total decisions')

# Plot bar chart for rejected requests, drawn over the total bars
ax.bar(merged_df1['Country'], merged_df1['Rejected'], color='navy', label='Rejected')

# Set labels and title
ax.set_xlabel('Country')
ax.set_ylabel('Number of Decisions')
ax.set_title('Total Decisions and Rejections by Country (More Than 50,000 Rejections)')
ax.legend()

plt.xticks(rotation=85)  # Rotate x-labels for better visibility

plt.tight_layout()
plt.show()

##### The graph above shows which countries have a more lenient refugee acceptance policy and which countries reject a large number of requests.
 
##### Some countries lead in the number of refugees seeking to enter, but not all of them lead in the number they approve. We therefore examined the percentage of requests each country approves and rejects, in order to better understand which country has a more lenient refugee acceptance policy.
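
The percentage comparison described above can be sketched as follows. The numbers are made up, the frame `toy_totals` only mimics a merge of `sum_df1R` and `sum_df1D`, and the column name `Rejected %` is an assumption. The zero-denominator guard is an added precaution, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy per-country totals shaped like the merge of sum_df1R and sum_df1D (made-up numbers)
toy_totals = pd.DataFrame({
    'Country': ['X', 'Y', 'Z'],
    'Rejected': [50, 30, 0],
    'Total decisions': [200, 40, 0],
})

# Replace a zero denominator with NaN so the division yields NaN instead of inf
denominator = toy_totals['Total decisions'].replace(0, np.nan)
toy_totals['Rejected %'] = (toy_totals['Rejected'] / denominator * 100).round(1)

# Sort so the highest rejection rates come first; NaN rows go last
ranked = toy_totals.sort_values('Rejected %', ascending=False, na_position='last')
print(ranked)
```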
 

In [None]:
Top5_Req = sum_df1R[sum_df1R['Rejected'] > 450000] 
merged_Top5_Req = pd.merge(Top5_Req, sum_df1D, on='Country', how='left')
merged_Top5_Req['Percentage'] = (merged_Top5_Req['Rejected'] / merged_Top5_Req['Total decisions']) * 100

In [None]:
# Work on a copy of the merged table
df = merged_Top5_Req.copy()

# Calculate the rejected percentage for each country
df['Percentage'] = (df['Rejected'] / df['Total decisions']) * 100

# Split the table into groups of 5 countries, one figure row per group
groups = [df.iloc[i:i+5] for i in range(0, len(df), 5)]

# Draw one pie per country.
# 'Accepted' here is every decision that was not a rejection.
for group in groups:
    plt.figure(figsize=(15, 5))
    # enumerate gives the subplot position regardless of the DataFrame index
    for position, (_, row) in enumerate(group.iterrows(), start=1):
        plt.subplot(1, len(group), position)
        plt.pie([row['Rejected'], row['Total decisions'] - row['Rejected']], labels=['Rejected', 'Accepted'], autopct='%1.1f%%')
        plt.title(row['Country'])
    plt.show()



* ###  Among the five countries selected above (more than 450,000 rejections each), the United States and Germany appear to have a more lenient approval policy than France, the United Kingdom, and South Africa, which also receive a large number of applications.

<a id='section3'></a>
### 3. File 3 - Age range of refugees

#### We wanted to examine the age range of refugees across all countries in order to analyze the refugee situation more accurately.

In [None]:
refugees_df2.head()

In [None]:
# Drop the two 5-17 columns, which overlap with the 5-11 and 12-17 columns
refugees_df2.drop(columns=['Male 5-17', 'Female 5-17'], inplace=True)

In [None]:
refugees_df2.isnull().sum()

In [None]:
# Treat the '*' placeholder and missing values as 0 in every count column.
# Assigning the result back avoids calling inplace methods on a column selection.
demo_cols = [
    'Female 0-4', 'Female 5-11', 'Female 12-17', 'Female 18-59', 'Female 60+',
    'F: Unknown', 'F: Total',
    'Male 0-4', 'Male 5-11', 'Male 12-17', 'Male 18-59', 'Male 60+',
    'M: Unknown', 'M: Total',
]
for col in demo_cols:
    refugees_df2[col] = refugees_df2[col].replace('*', 0).fillna(0)


In [None]:
refugees_df2.dtypes

In [None]:
# Convert the columns to an integer type
int_cols = [
    'Female 0-4', 'Female 5-11', 'Female 12-17', 'Female 18-59', 'Female 60+',
    'F: Total',
    'Male 0-4', 'Male 5-11', 'Male 12-17', 'Male 18-59', 'Male 60+',
]
for col in int_cols:
    refugees_df2[col] = refugees_df2[col].astype(int)


In [None]:
refugees_df2.dtypes

In [None]:
# Switching 'M: Total' dtype to numeric
refugees_df2['M: Total'] = pd.to_numeric(refugees_df2['M: Total'], errors='coerce')

# Calculation of the percentage of refugees in each age range
percent_women_0_4 = (refugees_df2['Female 0-4'] / refugees_df2['F: Total']) * 100
percent_women_5_11 = (refugees_df2['Female 5-11'] / refugees_df2['F: Total']) * 100
percent_women_12_17 = (refugees_df2['Female 12-17'] / refugees_df2['F: Total']) * 100
percent_women_18_59 = (refugees_df2['Female 18-59'] / refugees_df2['F: Total']) * 100
percent_women_60_plus = (refugees_df2['Female 60+'] / refugees_df2['F: Total']) * 100

percent_men_0_4 = (refugees_df2['Male 0-4'] / refugees_df2['M: Total']) * 100
percent_men_5_11 = (refugees_df2['Male 5-11'] / refugees_df2['M: Total']) * 100
percent_men_12_17 = (refugees_df2['Male 12-17'] / refugees_df2['M: Total']) * 100
percent_men_18_59 = (refugees_df2['Male 18-59'] / refugees_df2['M: Total']) * 100
percent_men_60_plus = (refugees_df2['Male 60+'] / refugees_df2['M: Total']) * 100

percent_women = [percent_women_0_4.mean(), percent_women_5_11.mean(), 
                 percent_women_12_17.mean(), percent_women_18_59.mean(), percent_women_60_plus.mean()]
percent_men = [percent_men_0_4.mean(), percent_men_5_11.mean(), 
               percent_men_12_17.mean(), percent_men_18_59.mean(), percent_men_60_plus.mean()]

# Creating a list of labels for the X-axis
age_ranges = ['0-4', '5-11', '12-17', '18-59', '60+']

# Space on the x-axis for each group of columns
bar_positions_women = np.arange(len(age_ranges))
bar_positions_men = [pos + 0.4 for pos in bar_positions_women]

bar_width = 0.4
plt.figure(figsize=(10, 6))

plt.bar(bar_positions_women, percent_women, color='Teal', label='Women', width=bar_width)
plt.bar(bar_positions_men, percent_men, color='Skyblue', label='Men', width=bar_width)

plt.title('Percentage of Refugees by Age Range and Gender')
plt.xlabel('Age Range')
plt.ylabel('Percentage')
plt.xticks([pos + bar_width / 2 for pos in bar_positions_women], age_ranges)
plt.legend()

plt.show()
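
The chart above averages row-level percentages, so a row with ten people weighs as much as a row with a thousand. A pooled percentage, which sums the counts before dividing, is one alternative. The sketch below contrasts the two on made-up counts; the frame `toy_demo` is an illustrative stand-in for `refugees_df2`.

```python
import pandas as pd

# Toy rows shaped like refugees_df2 (made-up counts): one small row, one large row
toy_demo = pd.DataFrame({
    'Female 0-4':   [5, 100],
    'Female 18-59': [5, 900],
    'F: Total':     [10, 1000],
})

# Mean of row percentages: every row counts equally, whatever its size
row_mean = (toy_demo['Female 0-4'] / toy_demo['F: Total'] * 100).mean()

# Pooled percentage: sum the counts first, so large rows weigh more
pooled = toy_demo['Female 0-4'].sum() / toy_demo['F: Total'].sum() * 100

print(round(row_mean, 1), round(pooled, 1))
```

Which of the two is appropriate depends on the question: the row mean describes a typical record, while the pooled figure describes the population as a whole.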

<a id='section4'></a>
### 4. File 4 - Population type

#### The next dataset focuses on the population types of the people recorded, such as refugees and asylum seekers.

In [None]:
refugees_df4.head()

In [None]:
df15 = refugees_df4[refugees_df4['Year'] == 2015].copy()  # copy so that later column assignments do not act on a slice

In [None]:
# Total value for each population type in 2015
df15['Value'] = pd.to_numeric(df15['Value'], errors='coerce')
population_summary= df15.groupby('Population type')[['Value']].sum().reset_index()
population_summary

In [None]:
df = df15.copy()
# Keep the 2015 rows for returnees, sorted from the highest value to the lowest
filtered_df = df[(df['Year'] == 2015) & (df['Population type'] == 'Returnees')].sort_values(by='Value', ascending=False)

In [None]:
filtered_df['Value'] = filtered_df['Value'].fillna(0)
filtered_df['Value'] = filtered_df['Value'].astype(int)

# Find the 10 countries with the highest values in the 'Value' column
top_Origin = filtered_df.nlargest(10, 'Value')

plt.figure(figsize=(10, 6))
plt.bar(top_Origin['Origin'], top_Origin['Value'], color='Maroon')
plt.title('Top 10 Countries with Highest Number of Returnees')
plt.xlabel('Origin')
plt.ylabel('Number of Returnees')
plt.xticks(rotation=45)
plt.show()


##### The graph displays the 10 countries with the highest number of returning refugees.

* ### Many of these are developing countries that also have a significant number of refugees leaving, and we saw earlier that conditions in them often do not allow citizens a decent standard of living. We can therefore speculate that conditions have improved and that efforts to strengthen the economic and social situation of the local population influenced refugees' decisions to return.
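
One way to look at the relationship described above is to place returnees next to refugees for each origin. The following is a minimal sketch on made-up numbers: the frame `toy_2015` mimics the shape of `df15`, the population-type labels are placeholders that may differ from the labels in the file, and the ratio column name is an assumption.

```python
import pandas as pd

# Toy rows shaped like df15; the population-type labels are placeholders
toy_2015 = pd.DataFrame({
    'Origin': ['A', 'A', 'B', 'B'],
    'Population type': ['Refugees', 'Returnees', 'Refugees', 'Returnees'],
    'Value': [1000, 100, 200, 100],
})

# One row per origin, one column per population type
wide = toy_2015.pivot_table(index='Origin', columns='Population type',
                            values='Value', aggfunc='sum', fill_value=0)

# Returnees per 100 refugees from the same origin
wide['Returnees per 100 refugees'] = wide['Returnees'] / wide['Refugees'] * 100
print(wide)
```

A ratio of this kind only describes the counts for one year; it does not show why people returned.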

#### During this work we used AI software, such as ChatGPT.