# Final Project - Explainer

## Authors - Group 48

- Bartosz Ziolkowski, s230080
- Kristoffer Plehn, s203777

## 1. Motivation

We used two datasets from [Statistikbanken.dk](https://www.statistikbanken.dk/):
- Immigration by sex, age, country of origin and citizenship (1980-2023)      
- Emigration by sex, age, country of destination and citizenship (1980-2023)

We chose them because they record the number of people migrating to and from Denmark, broken down by sex, age, country, and citizenship, which is what we wanted to analyze. In addition, they come from an official source.

Our goal was to give the end user an overview of immigration to Denmark and emigration from Denmark between 2015 and 2023, restricted to individuals aged 0-80 and to EU/EEA countries.

## 2. Basic Stats

The immigration dataset has a total size of 494 KB and 10,045 rows. The emigration dataset has a total size of 491 KB and also 10,045 rows.

Both datasets have 13 columns: `OriginCountry`/`Destination`, `Citizenship`, `Sex`, `Age`, `2015`, `2016`, `2017`, `2018`, `2019`, `2020`, `2021`, `2022`, `2023`. 

The Statistikbanken web interface lets the user choose which columns and grouping values to download, e.g. specific countries, years, citizenships, ages, and sexes. We selected the period from 2015 to 2023, EU/EEA countries, and migrants aged 0-80.
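
The size, row count, and column names reported above can be checked with a few lines of pandas. The following is a minimal sketch rather than the code originally used: the helper name `describe_csv` is hypothetical, and the semicolon separator is assumed from the loading code in section 3.

```python
import os

import pandas as pd


def describe_csv(path, sep=";"):
    """Return file size (KB), row count, and column names for a CSV file.

    Hypothetical helper for illustration; sep=";" mirrors the separator
    used when the datasets are loaded in section 3.
    """
    frame = pd.read_csv(path, sep=sep)
    return {
        "size_kb": os.path.getsize(path) / 1024,
        "rows": len(frame),
        "columns": list(frame.columns),
    }
```

Calling `describe_csv('Immigration_2015-2023.csv')` with the file in the working directory would return the three values for the immigration dataset; the same call with the emigration file covers the second dataset.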

## 3. Data Analysis

### 3.1 Code on the initial overview on migration

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

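# thousands=',' matches the loading code in section 3.3 and keeps the year columns numeric
immigration_data = pd.read_csv('Immigration_2015-2023.csv', sep=';', thousands=',')
emigration_data = pd.read_csv('Emigration_2015-2023.csv', sep=';', thousands=',')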

# Select only the years' columns
immigration_years = immigration_data.columns[4:]
emigration_years = emigration_data.columns[4:]

# Extract data for selected years
immigration_data = immigration_data[immigration_years]
emigration_data = emigration_data[emigration_years]

# Calculate total immigration and emigration for each year
immigration_total = immigration_data.sum()
emigration_total = emigration_data.sum()

# Calculate migration balance 
migration_balance = immigration_total - emigration_total

fig, axs = plt.subplots(1, 3, figsize=(20, 6)) 

axs[0].plot(immigration_years, immigration_total, color='blue', marker='o', linestyle='-')
axs[0].set_title('Immigration to Denmark', fontsize=18)
axs[0].set_xlabel('Year')
axs[0].set_ylabel('Number of People')

axs[1].plot(emigration_years, emigration_total, color='red', marker='o', linestyle='-')
axs[1].set_title('Emigration from Denmark', fontsize=18)
axs[1].set_xlabel('Year')

axs[2].plot(immigration_years, migration_balance, color='green', marker='o', linestyle='-')
axs[2].set_title('Migration Balance (Immigration - Emigration)', fontsize=18)
axs[2].set_xlabel('Year')

# Adjust y-axis ticks for emigration chart
emigration_start = 25000
emigration_tick_interval = 1000
emigration_ticks = [emigration_start + i * emigration_tick_interval for i in range(8)]
axs[1].set_yticks(emigration_ticks)

# Adjust y-axis ticks for migration balance chart
balance_start = 10000
balance_tick_interval = 2000
balance_ticks = [balance_start + i * balance_tick_interval for i in range(8)]
axs[2].set_yticks(balance_ticks)

plt.tight_layout()
plt.show()
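
The tick positions above are hard-coded (starting at 25,000 and 10,000), so they only fit while the yearly totals stay inside those ranges. As a hedged alternative, the sketch below derives evenly spaced ticks from the data itself. The helper `ticks_covering` is hypothetical and not part of the original notebook, and the totals are made up.

```python
import math


def ticks_covering(values, interval):
    """Return evenly spaced ticks that cover min(values)..max(values).

    Hypothetical helper: the first tick is the largest multiple of
    `interval` not above the minimum, the last tick is the smallest
    multiple of `interval` not below the maximum.
    """
    start = math.floor(min(values) / interval) * interval
    stop = math.ceil(max(values) / interval) * interval
    count = int(round((stop - start) / interval)) + 1
    return [start + i * interval for i in range(count)]


# Made-up yearly totals, for illustration only
example_totals = [25400, 27150, 31900]
print(ticks_covering(example_totals, 1000))
# prints [25000, 26000, 27000, 28000, 29000, 30000, 31000, 32000]
```

In the plotting cell this could stand in for the hard-coded lists, for example `axs[1].set_yticks(ticks_covering(emigration_total, 1000))`.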

### 3.2 Code on the top 15 countries on migration

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

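# thousands=',' matches the loading code in section 3.3 and keeps the year columns numeric
immigration_data = pd.read_csv('Immigration_2015-2023.csv', sep=';', thousands=',')
emigration_data = pd.read_csv('Emigration_2015-2023.csv', sep=';', thousands=',')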

# Select only numeric columns for both immigration and emigration data
numeric_columns_immigration = immigration_data.select_dtypes(include='number')
numeric_columns_emigration = emigration_data.select_dtypes(include='number')

# Calculate total immigration and emigration for each country
total_immigration_by_country = numeric_columns_immigration.groupby(immigration_data['OriginCountry']).sum().sum(axis=1)
total_emigration_by_country = numeric_columns_emigration.groupby(emigration_data['Destination']).sum().sum(axis=1)

# Select the top 15 countries with the highest immigration and emigration numbers
top_15_countries_immigration = total_immigration_by_country.nlargest(15)
top_15_countries_emigration = total_emigration_by_country.nlargest(15)

# Define colors for charts
colors = [
    '#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
    '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf',
    '#1a55FF', '#2aFF3A', '#FF5733', '#33FFF6', '#8E44AD'
]

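# Calculate net migration (immigration minus emigration) from the full per-country totals.
# Subtracting the two top-15 Series directly would give NaN for any country that appears
# in only one of the lists, because pandas aligns the values on the country name.
# Note: despite the variable name, this is a difference, not a ratio.
top_countries = top_15_countries_immigration.index.union(top_15_countries_emigration.index)
migration_ratio = total_immigration_by_country.sub(total_emigration_by_country, fill_value=0).loc[top_countries]

# Sort net migration in descending order
migration_ratio_sorted = migration_ratio.sort_values(ascending=False)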

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 8))

ax1.pie(top_15_countries_immigration, autopct='%1.1f%%', colors=colors, textprops={'color': 'white', 'weight': 'bold'})
ax1.set_title('Top 15 Countries by Total Immigration to Denmark', loc='center', fontsize=18)

ax2.pie(top_15_countries_emigration, autopct='%1.1f%%', colors=colors, textprops={'color': 'white', 'weight': 'bold'})
ax2.set_title('Top 15 Countries by Total Emigration from Denmark', loc='center', fontsize=18)

ax1.legend(top_15_countries_immigration.index, loc='upper left', bbox_to_anchor=(-0.1, 1), ncol=1)
ax2.legend(top_15_countries_emigration.index, loc='upper left', bbox_to_anchor=(-0.1, 1), ncol=1)

ax3.bar(migration_ratio_sorted.index, migration_ratio_sorted, color=colors)
ax3.set_title('Migration Balance (Immigration - Emigration)', fontsize=18)
ax3.set_xlabel('Country', fontsize=14)
ax3.set_ylabel('Number of People', fontsize=14)
ax3.tick_params(axis='x', rotation=45)
ax3.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()
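
When two pandas Series are subtracted, values are matched by index label, and a label present in only one Series yields `NaN`. This is relevant to the top-15 comparison above, because the immigration and emigration lists need not contain the same countries. The sketch below uses made-up numbers and a hypothetical helper, `net_migration`, to show one way to take the difference from the full per-country totals before selecting the countries to display.

```python
import pandas as pd


def net_migration(immigration, emigration, countries):
    """Immigration minus emigration for the given countries.

    Hypothetical helper. `immigration` and `emigration` are Series of
    totals indexed by country; a country missing from one of them is
    treated as 0 there via fill_value=0.
    """
    balance = immigration.sub(emigration, fill_value=0)
    return balance.reindex(countries).sort_values(ascending=False)


# Made-up totals, for illustration only
imm = pd.Series({"Germany": 900, "Poland": 800, "Romania": 700, "Spain": 100})
emi = pd.Series({"Germany": 850, "Sweden": 600, "Poland": 500, "Romania": 50})

top_imm = imm.nlargest(3)  # Germany, Poland, Romania
top_emi = emi.nlargest(3)  # Germany, Sweden, Poland

# Naive subtraction of the two top lists: Romania and Sweden become NaN
naive = top_imm - top_emi

# Subtracting the full totals first keeps every selected country defined
countries = top_imm.index.union(top_emi.index)
fixed = net_migration(imm, emi, countries)
```

Here `naive` contains two missing values, while `fixed` has a defined balance for all four countries, including the negative balance for Sweden.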

### 3.3 Code on the Bokeh plot on migration

In [None]:
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.models import ColumnDataSource, CustomJS, RadioButtonGroup, FixedTicker, Legend, LegendItem
from bokeh.layouts import column
from bokeh.palettes import Spectral11

# Extend the 11-color Spectral palette to 15 colors, one per plotted country
Spectral15 = tuple(Spectral11) + ('#6a51a3', '#807dba', '#9e9ac8', '#000000')

emigration_data = pd.read_csv("Emigration_2015-2023.csv", delimiter=';', thousands=',')
immigration_data = pd.read_csv("Immigration_2015-2023.csv", delimiter=';', thousands=',')

# Reshape data for easy plotting
emigration_data = emigration_data.melt(id_vars=['Destination', 'Citizenship', 'Sex', 'Age'], 
                                         var_name='Year', value_name='Emigrants')
immigration_data = immigration_data.melt(id_vars=['OriginCountry', 'Citizenship', 'Sex', 'Age'], 
                                         var_name='Year', value_name='Immigrants')

# Convert 'Year' column to integer type
emigration_data['Year'] = emigration_data['Year'].astype(int)
immigration_data['Year'] = immigration_data['Year'].astype(int)

# Sum only the value column per country; a bare .sum() would also try to aggregate the text columns
total_immigration_by_country = immigration_data.groupby('OriginCountry')['Immigrants'].sum()
total_emigration_by_country = emigration_data.groupby('Destination')['Emigrants'].sum()

top_15_countries_immigration = total_immigration_by_country.nlargest(15).index
top_15_countries_emigration = total_emigration_by_country.nlargest(15).index

# Filter data for top 15 countries
immigration_data = immigration_data[immigration_data['OriginCountry'].isin(top_15_countries_immigration)]
emigration_data = emigration_data[emigration_data['Destination'].isin(top_15_countries_emigration)]

# Pivot data for easy plotting
immigration_data = immigration_data.pivot_table(index='Year', columns='OriginCountry', values='Immigrants', aggfunc='sum').fillna(0)
emigration_data = emigration_data.pivot_table(index='Year', columns='Destination', values='Emigrants', aggfunc='sum').fillna(0)

# Create ColumnDataSource for both immigration and emigration data
imm_src = ColumnDataSource(immigration_data)
emg_src = ColumnDataSource(emigration_data)

# Create figures for plotting
imm_chart = figure(width=1200, x_axis_label="Year", y_axis_label="Number of Immigrants",
                   title="Immigration to Denmark by Country (2015-2023)", x_range=(2015, 2023))
emg_chart = figure(width=1200, x_axis_label="Year", y_axis_label="Number of Emigrants",
                   title="Emigration from Denmark by Country (2015-2023)", x_range=(2015, 2023), visible=False)

imm_legend = Legend(items=[])
emg_legend = Legend(items=[])

# Plot lines for each country in immigration data
for i, country in enumerate(top_15_countries_immigration):
    line = imm_chart.line(x='Year', y=country, source=imm_src, line_width=2,
                          color=Spectral15[i])
    imm_legend.items.append(LegendItem(label=str(country), renderers=[line]))

# Plot lines for each country in emigration data
for i, country in enumerate(top_15_countries_emigration):
    line = emg_chart.line(x='Year', y=country, source=emg_src, line_width=2,
                          color=Spectral15[i])
    emg_legend.items.append(LegendItem(label=str(country), renderers=[line]))


imm_legend.click_policy = "hide"  
imm_legend.location = "top_left"
emg_legend.click_policy = "hide"
emg_legend.location = "top_right"


imm_chart.add_layout(imm_legend, 'right')
emg_chart.add_layout(emg_legend, 'right')

# Set fixed tickers for x-axis
years = list(range(2015, 2024)) 
imm_chart.xaxis.ticker = FixedTicker(ticks=years)
emg_chart.xaxis.ticker = FixedTicker(ticks=years)

# Define callback function for radio button group
callback = CustomJS(args=dict(imm_chart=imm_chart, emg_chart=emg_chart), code="""
    if (cb_obj.active == 0) {
        emg_chart.visible = false;
        imm_chart.visible = true;
    } else {
        imm_chart.visible = false;
        emg_chart.visible = true;
    }
""")

# Create radio button group for selecting immigration or emigration chart
radio_button_group = RadioButtonGroup(labels=["Immigration", "Emigration"], active=0)
radio_button_group.js_on_change('active', callback)

# Arrange plots and radio button group in a column layout
layout = column(radio_button_group, imm_chart, emg_chart)

output_file("ImmigrationEmigrationByCountry.html")
show(layout)
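
The reshaping steps above (wide to long with `melt`, then back to one column per country with `pivot_table`) can be easier to follow on a tiny table. The sketch below uses made-up numbers and only two year columns. The column names follow the immigration dataset; the values, including the format of the age labels, are assumptions for illustration.

```python
import pandas as pd

# Made-up rows in the same wide layout as the source files
# (the age label format is assumed, not taken from the data)
wide = pd.DataFrame({
    "OriginCountry": ["Germany", "Germany", "Poland"],
    "Citizenship": ["Denmark", "Germany", "Poland"],
    "Sex": ["Men", "Women", "Men"],
    "Age": ["20 years", "30 years", "25 years"],
    "2015": [10, 5, 7],
    "2016": [12, 6, 9],
})

# Wide to long: one row per (country, citizenship, sex, age, year)
tidy = wide.melt(
    id_vars=["OriginCountry", "Citizenship", "Sex", "Age"],
    var_name="Year",
    value_name="Immigrants",
)
tidy["Year"] = tidy["Year"].astype(int)

# Long to plot-ready: one row per year, one column per country
per_country = tidy.pivot_table(
    index="Year", columns="OriginCountry", values="Immigrants", aggfunc="sum"
).fillna(0)

print(per_country)
```

The resulting table has the years as its index and one column per country, which is the shape a `ColumnDataSource` needs so that each line can refer to its country column by name.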

## 4. Genre

We chose to write the data story in Magazine Style. 

### Visual Narrative: 

#### Visual structuring
* Consistent Visual Platform: Every visualization uses the same clear layout, with axis labels, a title, and a legend, so the structure stays similar throughout the post.

#### Highlighting
* Feature Distinction: Each country's data is drawn in a different color, and readers can switch between the immigration and emigration charts.
* Zooming: The interactive charts can be zoomed to gain further insights from the visualizations.

### Narrative Structure: 

#### Ordering
* Random Access: The post can be explored in any order, so the viewer is free to decide how to access the visualizations.
  
#### Interactivity
* Hover Highlighting / Details: Hover highlighting and details are implemented in the charts to improve the viewer experience.
* Filtering / Selection / Search: Filtering and selection are provided through the buttons and the legend's click policy (clicking a legend entry hides that country's line), which let users switch between immigration and emigration data and compare individual countries.
* Very Limited Interactivity: The interactivity is limited to hovering and switching between two views of the data (immigration and emigration) using buttons.
* Tacit Tutorial: The buttons serve as a tacit tutorial, intuitively showing users how to interact with the visualizations.
* Stimulating Default Views: The default view is stimulating because it presents immigration data, from which users can switch to emigration data for comparison. The hover highlighting in the charts also prompts them to delve deeper into the data.

#### Messaging
* Captions / Headlines: Captions or headlines are included to provide an overview of the visualizations.
* Summary / Synthesis: The visualizations summarize the immigration and emigration data, and we explicitly synthesize the findings and relate them to the outside world.

## 5. Visualizations

### 5.1 Visualization on the initial overview on migration

This visualization consists of three line plots. The first two give the reader an overview of immigration and emigration trends over the period 2015-2023, while the third, the migration balance, shows the overall migration situation in Denmark. In our opinion, line plots suit a general audience here because plotting the number of people per year makes the development over time easy to see.

### 5.2 Visualization on the top 15 countries on migration

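This visualization consists of two pie charts and one bar chart. The pie charts show how total immigration to Denmark and total emigration from Denmark over 2015-2023 are distributed among the 15 countries with the highest totals. The bar chart shows the difference between immigration and emigration for these countries, sorted in descending order.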

### 5.3 Visualization on the Bokeh plot on migration

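This visualization is an interactive Bokeh line chart with one line per country for the 15 countries with the highest totals over 2015-2023. A radio button group switches between the immigration and emigration views, and clicking a legend entry hides or shows the corresponding country's line.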

## 6. Discussion



Our visualization project aimed to explore migration patterns and trends in Denmark. The chosen visualizations proved effective in conveying the intended message and revealing patterns in the data. However, to engage the viewer fully, the plots could have been more interactive, letting each data point be explored. We still find the level of complexity suitable for the story and sufficient to uncover the patterns and trends in the dataset. Although we looked at the outside world to understand the overall migration trends, this could have been explored further and might have uncovered additional insights. Furthermore, the findings from the two datasets could motivate the use of other datasets, for example economic indicators for the countries that contribute most, to better understand the migration trends.

Additionally, although the visualization techniques conveyed the story effectively, there is room to add more interactive and playful elements. More messaging or visual structuring tools could also have been used to allow more dynamic insights. This would let viewers engage with the data more actively and customize the viewing experience to their preferences. Furthermore, guiding viewers through the narrative more deliberately would help ensure that the key insights are communicated clearly.

## 7. Contributions

### 7.1 Explainer Notebook

|   | s203777 | s230080 |
|---|--------|---------|
| Motivation | Reviewer | Main Contributor |
| Basic Stats | Reviewer | Main Contributor |
| Data Analysis | Reviewer | Main Contributor |
| Genre | Main Contributor | Reviewer |
| Visualizations | Main Contributor | Reviewer |
| Discussion | Main Contributor | Reviewer |

### 7.2 Blog Post

|   | s203777 | s230080 |
|---|--------|---------|
| Motivation | Reviewer | Main Contributor |
| Basic Stats | Reviewer | Main Contributor |
| Data Analysis | Reviewer | Main Contributor |
| Genre | Main Contributor | Reviewer |
| Visualizations | Main Contributor | Reviewer |
| Discussion | Main Contributor | Reviewer |