## 1. Introduction

In this project, we explored the **Henley Passport Index** dataset from the TidyTuesday 2025-09-09 release.  
The index ranks passports worldwide based on the number of destinations their holders can enter without obtaining a visa in advance. This includes visa-on-arrival, electronic travel authorization (eTA), and visitor permits. The higher the number, the stronger the passport.

Our question of interest is:  
**How has global visa-free access evolved from 2015 to 2025, and what patterns can we observe across regions in terms of outbound travel freedom and inbound openness?**

We were also interested in whether countries with powerful passports tend to reciprocate by being welcoming to visitors. In other words, do countries that enjoy high travel freedom also grant it?

The dataset is composed of two tables:
- `rank_by_year.csv`: each country’s yearly ranking, visa-free count, and region.
- `country_lists.csv`: detailed passport–destination relationships, listing the visa requirement for each pair of countries.

By integrating these two tables, we analyzed both **passport strength** (how many places a country’s citizens can go) and **welcoming score** (how open a country is to others).  
This combination provides a fuller picture of global mobility dynamics over the decade.

In [1]:
# -----------------------------
# Data manipulation
# -----------------------------
import pandas as pd
import numpy as np
import json
import ast

# -----------------------------
# Plotting
# -----------------------------
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

import requests
from io import StringIO

## 2. Data Cleaning and Summary

We first loaded both datasets directly from the TidyTuesday GitHub repository using `requests` and `pandas`.

In [2]:
# URL for rank_by_year.csv
url_rank = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-09/rank_by_year.csv"

# Fetch CSV via requests (disable SSL verification temporarily)
response = requests.get(url_rank, verify = False)
response.raise_for_status()  # stop early if the download failed
csv_data = StringIO(response.text)

# Read into pandas DataFrame.
# Namibia's two-letter country code is "NA", so keep_default_na = False
# stops pandas from parsing it as a missing value.
rank_by_year = pd.read_csv(csv_data, keep_default_na = False)

# Quick check
print("Rank by Year:")
print(rank_by_year)



Rank by Year:
     code                country       region  rank  visa_free_count  year
0      AF            Afghanistan         ASIA   116               26  2021
1      AF            Afghanistan         ASIA   106               26  2020
2      AF            Afghanistan         ASIA   106               30  2018
3      AF            Afghanistan         ASIA   104               24  2017
4      AF            Afghanistan         ASIA   104               25  2016
...   ...                    ...          ...   ...              ...   ...
3945   PS  Palestinian Territory  MIDDLE EAST   102               37  2019
3946   PS  Palestinian Territory  MIDDLE EAST   105               37  2022
3947   PS  Palestinian Territory  MIDDLE EAST   103               38  2023
3948   PS  Palestinian Territory  MIDDLE EAST    98               40  2024
3949   PS  Palestinian Territory  MIDDLE EAST    93               39  2025

[3950 rows x 6 columns]


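There were **no missing (`NaN`) values**, and the data provided a clear year-by-year ranking for each country. However, the check below also shows 447 zeros in `visa_free_count`, which we treat as missing during cleaning.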

In [3]:

# Count NaN (missing) values per column
print("Missing (NaN) values per column:")
print(rank_by_year.isna().sum())

# Count total NaN values
total_missing = rank_by_year.isna().sum().sum()
print(f"\nTotal missing (NaN) values in the DataFrame: {total_missing}")

# Count zero values per column
print("\nZero values per column:")
print((rank_by_year == 0).sum())

# Count total zero values
total_zeros = (rank_by_year == 0).sum().sum()
print(f"\nTotal zero values in the DataFrame: {total_zeros}")


Missing (NaN) values per column:
code               0
country            0
region             0
rank               0
visa_free_count    0
year               0
dtype: int64

Total missing (NaN) values in the DataFrame: 0

Zero values per column:
code                 0
country              0
region               0
rank                 0
visa_free_count    447
year                 0
dtype: int64

Total zero values in the DataFrame: 447


### Cleaning `rank_by_year.csv`
- Replaced zeros with `NaN` and dropped the affected rows to remove incomplete records.  
- For each country, we applied an interpolation of `rank` and `visa_free_count`, limited to small gaps of up to two values, using the `groupby().apply(lambda x: x.interpolate())` approach. Because the incomplete rows are dropped first, this step acts as a safeguard and does not re-create the dropped years.  
- The result was a clean time-series dataset covering 2015–2025, suitable for trend analysis.
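
To illustrate what a per-country interpolation with a gap limit does when the gaps are still present in the data, here is a minimal sketch on a toy frame. The data, the names `toy` and `visa_free_filled`, and the use of `transform` are assumptions made for illustration; this is not the cleaning cell itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: country A has a 1-year gap, country B a 5-year gap.
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 7,
    "year": [2015, 2016, 2017, 2018] + list(range(2015, 2022)),
    "visa_free_count": [50, np.nan, 54, 56,
                        100, np.nan, np.nan, np.nan, np.nan, np.nan, 112],
})

# Sort so that interpolation follows chronological order within each country.
toy = toy.sort_values(["country", "year"]).reset_index(drop=True)

# Linear interpolation per country with the same limit settings as the notebook.
toy["visa_free_filled"] = (
    toy.groupby("country")["visa_free_count"]
    .transform(lambda s: s.interpolate(limit=2, limit_direction="both"))
)
print(toy)
```

In this toy case the single missing year for A is filled with 52, while the middle year of B's five-year gap stays `NaN`: with `limit_direction="both"`, `limit=2` fills at most two values inward from each side of a gap.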

In [4]:

# Replace all 0s with NaN so we can drop them together
rank_by_year = rank_by_year.replace(0, np.nan)

# Drop all rows that have any NaN (including those converted from 0)
rank_by_year = rank_by_year.dropna()

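# Interpolate numeric columns per country (only if the missing portion is small <=2).
# Note: rows containing NaN were already dropped above, so no gaps remain at this point;
# this step is a safeguard and leaves the values unchanged.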
for col in ["rank", "visa_free_count"]:
    if col in rank_by_year.columns:
        rank_by_year[col] = (
            rank_by_year.groupby("country")[col]
            .apply(lambda x: x.interpolate(limit = 2, limit_direction = "both"))
            .reset_index(level = 0, drop = True)
        )

# Confirm cleanup
print(f"Remaining NaN values: {rank_by_year.isna().sum().sum()}")
print(f"Remaining zero values: {(rank_by_year == 0).sum().sum()}")
print(rank_by_year.head())

Remaining NaN values: 0
Remaining zero values: 0
  code      country region  rank  visa_free_count  year
0   AF  Afghanistan   ASIA   116             26.0  2021
1   AF  Afghanistan   ASIA   106             26.0  2020
2   AF  Afghanistan   ASIA   106             30.0  2018
3   AF  Afghanistan   ASIA   104             24.0  2017
4   AF  Afghanistan   ASIA   104             25.0  2016


### Processing `country_lists.csv`
This table stores nested lists of visa relationships as JSON strings in columns such as `visa_required`, `visa_on_arrival`, and `visa_free_access`.  

In [5]:
# URL for country_lists.csv
url_country = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-09/country_lists.csv"

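# Fetch CSV via requests (disable SSL verification temporarily)
response = requests.get(url_country, verify = False)
response.raise_for_status()  # stop early if the download failed
csv_data = StringIO(response.text)

# Read into pandas DataFrame.
# As with rank_by_year, keep the country code "NA" (Namibia) as a string;
# only empty cells are treated as missing.
country_lists = pd.read_csv(csv_data, keep_default_na = False, na_values = [""])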

# Quick check
print("Country Lists:")
country_lists.head()



Country Lists:


Unnamed: 0,code,country,visa_required,visa_online,visa_on_arrival,visa_free_access,electronic_travel_authorisation
0,PS,Palestinian Territory,"[[{""code"":""AF"",""name"":""Afghanistan""},{""code"":""...","[[{""code"":""AG"",""name"":""Antigua and Barbuda""},{...","[[{""code"":""BD"",""name"":""Bangladesh""},{""code"":""B...","[[{""code"":""BO"",""name"":""Bolivia""},{""code"":""CK"",...","[[{""code"":""LK"",""name"":""Sri Lanka""},{""code"":""KE..."
1,AD,Andorra,"[[{""code"":""AF"",""name"":""Afghanistan""},{""code"":""...","[[{""code"":""AO"",""name"":""Angola""},{""code"":""AZ"",""...","[[{""code"":""BH"",""name"":""Bahrain""},{""code"":""BD"",...","[[{""code"":""JP"",""name"":""Japan""},{""code"":""AL"",""n...","[[{""code"":""AU"",""name"":""Australia""},{""code"":""CA..."
2,VA,Vatican City,"[[{""code"":""AF"",""name"":""Afghanistan""},{""code"":""...","[[{""code"":""AZ"",""name"":""Azerbaijan""},{""code"":""B...","[[{""code"":""BH"",""name"":""Bahrain""},{""code"":""BD"",...","[[{""code"":""AL"",""name"":""Albania""},{""code"":""AD"",...","[[{""code"":""AU"",""name"":""Australia""},{""code"":""CA..."
3,SM,San Marino,"[[{""code"":""AF"",""name"":""Afghanistan""},{""code"":""...","[[{""code"":""AZ"",""name"":""Azerbaijan""},{""code"":""B...","[[{""code"":""BH"",""name"":""Bahrain""},{""code"":""BD"",...","[[{""code"":""JP"",""name"":""Japan""},{""code"":""AL"",""n...","[[{""code"":""AU"",""name"":""Australia""},{""code"":""CA..."
4,MC,Monaco,"[[{""code"":""AF"",""name"":""Afghanistan""},{""code"":""...","[[{""code"":""AZ"",""name"":""Azerbaijan""},{""code"":""B...","[[{""code"":""BH"",""name"":""Bahrain""},{""code"":""BD"",...","[[{""code"":""JP"",""name"":""Japan""},{""code"":""AL"",""n...","[[{""code"":""AU"",""name"":""Australia""},{""code"":""CA..."


In [6]:
json_cols = [
    "visa_required",
    "visa_online",
    "visa_on_arrival",
    "visa_free_access",
    "electronic_travel_authorisation"
]

def clean_json_field(text):
    """Convert the messy stringified JSON fields into clean Python lists of dicts."""
    if pd.isna(text):
        return []
    try:
        data = json.loads(text)
        # Many fields are [[{...}]] — unwrap the extra list
        if isinstance(data, list) and len(data) == 1 and isinstance(data[0], list):
            data = data[0]
        return data
    except (ValueError, TypeError):
        # Malformed JSON or a non-string value: treat as an empty list
        return []

for col in json_cols:
    country_lists[col] = country_lists[col].apply(clean_json_field)

# Quick check
print("Cleaned Country Lists:")
country_lists.head()


Cleaned Country Lists:


Unnamed: 0,code,country,visa_required,visa_online,visa_on_arrival,visa_free_access,electronic_travel_authorisation
0,PS,Palestinian Territory,"[{'code': 'AF', 'name': 'Afghanistan'}, {'code...","[{'code': 'AG', 'name': 'Antigua and Barbuda'}...","[{'code': 'BD', 'name': 'Bangladesh'}, {'code'...","[{'code': 'BO', 'name': 'Bolivia'}, {'code': '...","[{'code': 'LK', 'name': 'Sri Lanka'}, {'code':..."
1,AD,Andorra,"[{'code': 'AF', 'name': 'Afghanistan'}, {'code...","[{'code': 'AO', 'name': 'Angola'}, {'code': 'A...","[{'code': 'BH', 'name': 'Bahrain'}, {'code': '...","[{'code': 'JP', 'name': 'Japan'}, {'code': 'AL...","[{'code': 'AU', 'name': 'Australia'}, {'code':..."
2,VA,Vatican City,"[{'code': 'AF', 'name': 'Afghanistan'}, {'code...","[{'code': 'AZ', 'name': 'Azerbaijan'}, {'code'...","[{'code': 'BH', 'name': 'Bahrain'}, {'code': '...","[{'code': 'AL', 'name': 'Albania'}, {'code': '...","[{'code': 'AU', 'name': 'Australia'}, {'code':..."
3,SM,San Marino,"[{'code': 'AF', 'name': 'Afghanistan'}, {'code...","[{'code': 'AZ', 'name': 'Azerbaijan'}, {'code'...","[{'code': 'BH', 'name': 'Bahrain'}, {'code': '...","[{'code': 'JP', 'name': 'Japan'}, {'code': 'AL...","[{'code': 'AU', 'name': 'Australia'}, {'code':..."
4,MC,Monaco,"[{'code': 'AF', 'name': 'Afghanistan'}, {'code...","[{'code': 'AZ', 'name': 'Azerbaijan'}, {'code'...","[{'code': 'BH', 'name': 'Bahrain'}, {'code': '...","[{'code': 'JP', 'name': 'Japan'}, {'code': 'AL...","[{'code': 'AU', 'name': 'Australia'}, {'code':..."


To make the data usable, we:
1. **Exploded** each visa-type column into a long format (one row per passport–destination pair).  
2. Extracted destination codes and names.  
3. Concatenated all visa types into a single dataframe `flat_df`.  
4. Dropped duplicate and missing entries to get clean passport–destination relationships.
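
As a small illustration of the reshaping steps listed above, the sketch below parses and explodes a two-row stand-in for `country_lists`. The codes, names, and the variables `toy_lists` and `toy_long` are hypothetical; only the `[[{...}]]` nesting mirrors the real table.

```python
import json
import pandas as pd

# Hypothetical two-row stand-in with the same [[{...}]] nesting as country_lists.
toy_lists = pd.DataFrame({
    "code": ["XX", "YY"],
    "visa_free_access": [
        '[[{"code":"YY","name":"Yland"},{"code":"ZZ","name":"Zland"}]]',
        '[[{"code":"XX","name":"Xland"}]]',
    ],
})

# Parse each JSON string and unwrap the outer single-element list.
toy_lists["visa_free_access"] = toy_lists["visa_free_access"].apply(
    lambda s: json.loads(s)[0]
)

# Explode to one row per (passport, destination) pair.
toy_long = toy_lists.explode("visa_free_access", ignore_index=True)

# Pull the destination code and name out of each dict.
toy_long["to_code"] = toy_long["visa_free_access"].apply(lambda d: d["code"])
toy_long["to_name"] = toy_long["visa_free_access"].apply(lambda d: d["name"])
toy_long = toy_long.drop(columns="visa_free_access")
print(toy_long)
```

The two input rows become three passport–destination rows, which is the long format the later steps rely on.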

In [7]:
visa_cols = [
    "visa_required",
    "visa_online",
    "visa_on_arrival",
    "visa_free_access",
    "electronic_travel_authorisation"
]

flat_frames = []

for col in visa_cols:
    temp = country_lists[["code", "country", col]].explode(col)
    temp = temp.dropna(subset = [col])
    temp["visa_type"] = col
    temp["to_code"] = temp[col].apply(lambda x: x.get("code") if isinstance(x, dict) else None)
    temp["to_name"] = temp[col].apply(lambda x: x.get("name") if isinstance(x, dict) else None)
    temp = temp.drop(columns = [col])
    flat_frames.append(temp)

# Combine all into one DataFrame
flat_df = pd.concat(flat_frames, ignore_index = True)

# Remove rows missing target country codes
flat_df = flat_df.dropna(subset = ["to_code"])

# Drop duplicates if any
flat_df = flat_df.drop_duplicates(subset = ["code", "to_code", "visa_type"]).reset_index(drop = True)

# Rename columns for clarity
flat_df.rename(columns = {
    "code": "from_code",
    "country": "from_country"
}, inplace=True)


# Quick check
print("Flattened Visa Data:")
flat_df.head()

Flattened Visa Data:


Unnamed: 0,from_code,from_country,visa_type,to_code,to_name
0,PS,Palestinian Territory,visa_required,AF,Afghanistan
1,PS,Palestinian Territory,visa_required,DZ,Algeria
2,PS,Palestinian Territory,visa_required,AD,Andorra
3,PS,Palestinian Territory,visa_required,AO,Angola
4,PS,Palestinian Territory,visa_required,AI,Anguilla


In [8]:
visa_summary = flat_df.groupby("visa_type").size().reset_index(name = "count")
visa_summary

Unnamed: 0,visa_type,count
0,electronic_travel_authorisation,1382
1,visa_free_access,15066
2,visa_on_arrival,5316
3,visa_online,5817
4,visa_required,17392


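Across all passport–destination pairs, `visa_required` (17,392) is the largest category, ahead of `visa_free_access` (15,066). This summary gives a first snapshot of global mobility inequality and sets up later visualizations that focus on how these patterns evolve over time.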

In [9]:
visa_by_country = (
    flat_df.groupby(["from_country", "visa_type"])
    .size()
    .reset_index(name = "destination_count")
    .sort_values(["from_country", "visa_type"])
)

visa_pivot = visa_by_country.pivot(
    index = "from_country",
    columns = "visa_type",
    values = "destination_count"
).fillna(0).astype(int)

visa_pivot = visa_pivot.reset_index()

print("Visa Counts by Country:")
visa_pivot.head()

Visa Counts by Country:


visa_type,from_country,electronic_travel_authorisation,visa_free_access,visa_on_arrival,visa_online,visa_required
0,Afghanistan,3,6,16,43,158
1,Albania,6,88,29,29,74
2,Algeria,2,26,27,39,132
3,Andorra,16,120,35,23,32
4,Angola,2,26,20,37,141


### Regional Summary Statistics (2015–2025)

To provide an overview of global mobility patterns, we grouped the `rank_by_year` dataset by **year** and **region** to compute key summary statistics:

- **Average visa-free count (`avg_visa_free`)** – mean number of destinations accessible without a visa.  
- **Median visa-free count (`med_visa_free`)** – midpoint value for each region, showing the typical passport strength.  
- **Maximum visa-free count (`max_visa_free`)** – the region’s strongest passport in a given year.  
- **Top country (`top_country`)** – the country achieving the highest visa-free access within its region.

From the early results, we observed clear regional differences:  
Europe led with an average of around 165 visa-free destinations in 2025, while Africa trailed at roughly 62. The top passports in each region were **Denmark (Europe)**, **Canada (Americas)**, **Barbados (Caribbean)**, **New Zealand (Oceania)**, **Singapore (Asia)**, **UAE (Middle East)**, and **Seychelles (Africa)**.

These statistics establish a quantitative baseline for the decade-long analysis that follows, highlighting the persistent mobility gap between regions.
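
One way to compute these four statistics, including the top country per region, is to combine a grouped aggregation with `idxmax`. The sketch below runs on a hypothetical four-row frame (`toy_rank`, with placeholder countries A to D) and is an alternative formulation, not the cell used in this notebook.

```python
import pandas as pd

# Hypothetical toy subset with the same columns as rank_by_year.
toy_rank = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "region": ["EUROPE", "EUROPE", "ASIA", "ASIA"],
    "year": [2025, 2025, 2025, 2025],
    "visa_free_count": [189, 180, 193, 64],
})

grouped = toy_rank.groupby(["year", "region"])["visa_free_count"]

# Mean, median, and maximum per (year, region).
summary = grouped.agg(
    avg_visa_free="mean",
    med_visa_free="median",
    max_visa_free="max",
)

# idxmax returns the row label of each group's maximum; use it to look up the country.
# Both results are ordered by the sorted group keys, so they line up positionally.
summary["top_country"] = toy_rank.loc[grouped.idxmax(), "country"].to_numpy()
summary = summary.reset_index()
print(summary)
```

Looking up the country through `idxmax` in one step avoids storing a row index in `top_country` and converting it to a name afterwards.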


In [10]:
# Group by year and region
region_stats_yearly = rank_by_year.groupby(["year", "region"]).agg(
    avg_visa_free = ("visa_free_count", "mean"),      # average visa-free count
    med_visa_free = ("visa_free_count", "median"),    # median visa-free count
    max_visa_free = ("visa_free_count", "max"),       # strongest passport in the region
    top_country = ("country", lambda x: rank_by_year.loc[x.index, "visa_free_count"].idxmax())
).sort_values(['year', 'avg_visa_free'], ascending=[True, False]).reset_index()

# Fix top_country to show the country name
region_stats_yearly["top_country"] = region_stats_yearly["top_country"].apply(lambda idx: rank_by_year.loc[idx, "country"])

region_stats_yearly[region_stats_yearly['year'] == 2025].head(7)

Unnamed: 0,year,region,avg_visa_free,med_visa_free,max_visa_free,top_country
119,2025,EUROPE,165.265306,183.0,189.0,Denmark
120,2025,AMERICAS,136.545455,139.0,184.0,Canada
121,2025,CARIBBEAN,126.230769,147.0,163.0,Barbados
122,2025,OCEANIA,124.357143,125.0,187.0,New Zealand
123,2025,ASIA,86.0,64.0,193.0,Singapore
124,2025,MIDDLE EAST,77.733333,67.0,184.0,United Arab Emirates
125,2025,AFRICA,62.259259,58.5,156.0,Seychelles


## 3(1). Visualization 1 – Global Changes in Visa-Free Access (2015–2025)

We created an **animated choropleth map** to visualize how visa-free access changed worldwide between 2015 and 2025.  
Each frame represents a year, and countries are shaded according to their total number of visa-free destinations.

- **Variables used**
  - `country` (location, `locationmode="country names"`)
  - `year` (animation frame)
  - `visa_free_count` (color scale)
- **Why this visualization**
  - Our question asks “how has access evolved” and “where”; a choropleth encodes **geography** directly and the **animation** isolates year-to-year change without clutter. The continuous color scale supports quick, pre-attentive comparison across hundreds of countries in each frame.

## 4(1). Discussion 1

### Key Insights
- **Global improvement:** Many countries gradually darken over time, for example China, Russia, Ukraine, and countries in North America. This reflects broader travel liberalization.
- **Persistent inequality:** Europe, East Asia, and North America remain dominant. Africa and South Asia show slower progress, staying lighter in color.  
- **Sudden changes:** The biggest jump occurred between 2017 and 2018.
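
A visual impression such as "the biggest jump" can be cross-checked numerically by comparing the global mean visa-free count from one year to the next. The sketch below uses hypothetical toy numbers (`toy_rank`), so its output says nothing about the real data; it only shows the shape of the check.

```python
import pandas as pd

# Hypothetical toy data in the same shape as rank_by_year.
toy_rank = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2016, 2017, 2018, 2016, 2017, 2018],
    "visa_free_count": [100, 101, 110, 40, 42, 52],
})

# Global mean per year, then the change relative to the previous year.
yearly_mean = toy_rank.groupby("year")["visa_free_count"].mean()
yoy_change = yearly_mean.diff().dropna()

# Year whose mean rose the most compared with the year before.
biggest_jump_year = int(yoy_change.idxmax())
print(yoy_change)
print(f"Largest increase: {biggest_jump_year - 1} to {biggest_jump_year}")
```

Applied to the cleaned `rank_by_year` table, the same three lines would indicate whether the largest year-over-year increase falls where the animation suggests.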

**Notable Countries**
- **Singapore** remains very dark from 2018 onward and is still No. 1 in 2025 with 193 visa-free destinations (Henley & Partners, 2025). This matches the high end of our color scale. 
- **United Arab Emirates** clearly darkens across frames. In 2025 it reaches the global top 10 and offers 184 destinations, a dramatic rise over the decade (Nair, 2025).
- **China** shifts from lighter to visibly darker tones, consistent with press coverage noting increased visa-waiver agreements and a ranking uptick in 2025.
- **Ukraine** also darkened noticeably and climbed the global rankings, reaching 147 visa-free destinations in 2025.
- **United States** appears relatively flat and ends up lighter than several European and Asian peers. News coverage and Henley press notes document the U.S. slipping out of the top 10 in 2025, tied at roughly 180 destinations (Henley & Partners, 2025). 

**Why does the data appear the way it does?**

**1. Top regions remain dark because diplomacy compounds.**  
Countries already possessing strong networks of visa waivers continued to expand or maintain them.  
- **Europe** stays dark throughout due to the long-standing Schengen framework and intra-European mobility agreements that guarantee reciprocal access among members.  
- **East Asia**, particularly **Japan** and **South Korea**, remain consistently high performers.  
- **Singapore** remains the darkest country from 2018 onward, ranked No. 1 in 2025 with access to 193 destinations — the highest in Henley’s July 2025 release.  
These regions benefit from entrenched diplomatic infrastructure and administrative capacity that reinforce passport strength over time.  

**2. Some countries darken rapidly due to deliberate policy shifts.**  
The **United Arab Emirates** is the clearest case. It darkens dramatically between 2015 and 2025, climbing into the global top 10 with access to 184 destinations.  
This mirrors the UAE’s decade-long visa diplomacy campaign, where reciprocal visa-waiver agreements were strategically negotiated to boost travel freedom and tourism competitiveness (Nair, 2025).

**3. 2017–2018 marks a global inflection point — the fastest collective rise in mobility.**  
The single biggest jump visible in the animation occurs between **2017 and 2018**, when several countries’ passports darken simultaneously.  
According to the official Henley 2018 press release, **China**, **Russia**, **Georgia**, and **Ukraine** were among the most improved nations, gaining the largest number of new visa-free or visa-on-arrival destinations compared with 2017.  
This acceleration coincides with a surge in **bilateral visa-waiver treaties** and the global rollout of **electronic visa systems (e-visas and eTAs)** that reduced administrative barriers (Nesheim, 2018).  

- **China**, for instance, expanded its visa-waiver and visa-on-arrival arrangements during this window, particularly with ASEAN members and select European states. The Henley 2018 and 2019 reports describe China’s “largest one-year jump” in its ranking history, driven by new mutual agreements and growing tourism diplomacy.  
- **India** also benefitted during this period, adding multiple e-visa partnerships and new visa-on-arrival destinations across Asia and Africa.  
- Many **developing countries** joined global e-visa networks or relaxed entry policies for select nationalities, boosting overall counts even without full reciprocal treaties.

This 2017–2018 acceleration explains why the map darkens sharply in many regions at once. It reflects the industry-wide adoption of electronic travel authorization systems, post-2016 tourism recovery, and the geopolitical push for connectivity following international summits and trade expansion.

**4. Eastern Europe’s surge stems from EU visa-liberalization.**  
The map shows **Ukraine** darkening sharply after 2017. Henley cited it among the **biggest climbers globally** that year, with Ukraine up **14 ranks**, directly following the **EU’s 2017 visa-free regime** for Ukrainian citizens (Henley & Partners, 2018).  
This policy allowed Ukrainians and Georgians to enter the Schengen Area without visas, dramatically expanding their global mobility and aligning their trajectories with other Eastern European states (Visas: Council Adopts Regulation on Visa Liberalisation for Ukrainian Citizens, 2017).

**5. Why these trends persist.**  
- Regions with **institutionalized frameworks** (Europe, parts of East Asia) exhibit stability and consistently dark colors due to predictable reciprocity and governance capacity.  
- Regions relying on **ad-hoc bilateralism** (Africa, South Asia) advance more slowly, producing visible gaps and lighter tones.  
- Rapid climbers (UAE, China, Ukraine) demonstrate how targeted diplomacy and economic motives can translate into tangible mobility gains within a decade.  


Therefore, this visualization demonstrates that while average travel freedom has improved globally, large regional disparities remain.

In [11]:
# Sort and filter data
rank_filtered = rank_by_year.sort_values(by = ["country", "year"], ascending = True)
filtered_data = rank_filtered[(rank_filtered["year"] >= 2015) & (rank_filtered["year"] <= 2025)]

# Create animated choropleth
fig = px.choropleth(
    filtered_data,
    locations = "country",
    locationmode = "country names",
    color = "visa_free_count",
    animation_frame = "year",
    color_continuous_scale = [
        "#fff5f0", "#fcbba1", "#fc9272", "#fb6a4a",
        "#ef3b2c", "#cb181d", "#a50f15", "#67000d"
    ],  # vivid red scale for clearer contrast
    range_color = (filtered_data["visa_free_count"].min(), filtered_data["visa_free_count"].max()),
    projection = "natural earth",
    title = "Global Visa-Free Access Evolution (2015–2025)",
    hover_name = "country",
    hover_data = {"visa_free_count": True, "year": True}
)

# Layout customization
fig.update_layout(
    geo = dict(
        resolution = 50,
        showframe = False,
        showcoastlines = True,
        coastlinecolor = "gray",
        landcolor = "rgb(240,240,240)",
        projection_type = "natural earth"
    ),
    width = 1000,
    height = 500,
    margin = dict(l = 0, r = 0, t = 60, b = 0),
    coloraxis_colorbar = dict(
        title = dict(
            text = "Visa-Free<br>Destinations",
            font = dict(size = 14, family = "Arial Black")
        ),
        tickfont = dict(size = 12)
    ),
    title = dict(
        font = dict(size = 26, family = "Arial Black"),
        x = 0.5
    )
)

# Smooth animation transitions
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800   # ms per frame
fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 500
fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["easing"] = "cubic-in-out"

fig.show()



In [12]:
# Filter to 2015–2025 only
rank_filtered = rank_by_year[(rank_by_year["year"] >= 2015) & (rank_by_year["year"] <= 2025)]
years = sorted(rank_filtered["year"].unique())
regions = rank_filtered["region"].unique()

# Assign consistent colors per region
colors = px.colors.qualitative.Plotly[:len(regions)]
region_colors = dict(zip(regions, colors))

# Initialize figure
fig = go.Figure()

# Box plots per region per year
for year in years:
    df_year = rank_filtered[rank_filtered["year"] == year]
    for region in regions:
        df_region = df_year[df_year["region"] == region]
        fig.add_trace(go.Box(
            y = df_region["visa_free_count"],
            name = region,
            marker = dict(color = region_colors[region]),
            line = dict(color = region_colors[region], width = 2),
            boxmean = "sd",
            fillcolor = "rgba(0,0,0,0)",
            customdata = df_region["country"],
            hovertemplate = "<b>Region:</b> "+region+
                            "<br><b>Country:</b> %{customdata}"+
                            "<br><b>Visa-Free:</b> %{y}<extra></extra>",
            visible = True if year == years[0] else False
        ))

# Bar plots per region per year (initially hidden)
for year in years:
    df_year = rank_filtered[rank_filtered["year"] == year]
    for region in regions:
        df_region = df_year[df_year["region"] == region].sort_values("visa_free_count", ascending = False)
        fig.add_trace(go.Bar(
            x = df_region["country"],
            y = df_region["visa_free_count"],
            marker_color = region_colors[region],
            name = f"{region} - {year}",
            hovertemplate = "Country: %{x}<br>Visa-Free: %{y}<extra></extra>",
            visible=False  # initially hidden
        ))

# Slider steps
steps = []
box_traces_per_year = len(regions)
bar_traces_per_year = len(regions)

for i, year in enumerate(years):
    visibility = [False] * len(fig.data)
    start = i * box_traces_per_year
    end = start + box_traces_per_year
    for j in range(start, end):
        visibility[j] = True
    steps.append(dict(
        method = "update",
        label = str(year),
        args = [{"visible": visibility},
                {"title": f"Visa-Free Access Distribution by Region — {year}",
                 "yaxis": {"title": "Visa-Free Destinations", "range": [0, 210],
                           "showgrid": True, "gridcolor": "lightgrey", "dtick": 10},
                 "xaxis": {"tickangle": 0, "showgrid": True, "gridcolor": "lightgrey"}}]
    ))

sliders = [dict(
    active = 0,
    currentvalue = {"prefix": "Year: "},
    pad = {"t": 70},
    steps = steps
)]

# Buttons to switch between views
buttons = []

# Box plot button
buttons.append(dict(
    label = "📦 Box Plot",
    method = "update",
    args = [{"visible": [True] * box_traces_per_year + [False] * (len(fig.data) - box_traces_per_year)},
            {"title": f"Visa-Free Access Distribution by Region — {years[0]}",
             "yaxis": {"title": "Visa-Free Destinations", "range": [0, 210],
                       "showgrid": True, "gridcolor": "lightgrey", "dtick": 10},
             "xaxis": {"tickangle": 0, "showgrid": True, "gridcolor": "lightgrey"}}]
))

# Region bar plot buttons (show 2025 data by default)
year_2025_index = years.index(2025)  # find index of 2025
for r_idx, region in enumerate(regions):
    visibility = [False] * len(fig.data)
    # Calculate bar trace index for 2025
    bar_start = len(years) * box_traces_per_year + year_2025_index * bar_traces_per_year + r_idx
    if bar_start < len(fig.data):
        visibility[bar_start] = True
    buttons.append(dict(
        label = f"🏳️ {region}",
        method = "update",
        args = [{"visible": visibility},
                {"title": f"Visa-Free Access by Country — {region} (2025)",
                 "yaxis": {"title": "Visa-Free Destinations", "range": [0, 210],
                           "showgrid": True, "gridcolor": "lightgrey", "dtick": 10},
                 "xaxis": {"tickangle": -45, "showgrid": True, "gridcolor": "lightgrey"}}]
    ))

# Final layout
fig.update_layout(
    sliders = sliders,
    updatemenus = [dict(
        type = "dropdown",
        showactive = True,
        buttons = buttons,
        x = 1.02,
        xanchor = "left",
        y = 1.15,
        yanchor = "top"
    )],
    width = 1600,
    height = 900,
    template = "plotly_white",
    plot_bgcolor = "rgba(245,245,245,1)",
    paper_bgcolor = 'white',
    margin = dict(l = 80, r = 40, t = 100, b = 180),
    xaxis = dict(
        showgrid = True,
        gridcolor = "lightgrey",
        tickangle = 0,
        tickfont = dict(size = 10)
    ),
    yaxis = dict(
        showgrid = True,
        gridcolor = "lightgrey",
        tick0 = 0,
        dtick = 10,
        range = [0, 210]
    )
)

fig.show()

In [13]:
# Create new dataframe for Welcoming Score
visa_dest = pd.DataFrame({"country": country_lists["country"], "visa_free_destination": 0})

# Compute Welcoming Score
for visa_req, country in zip(flat_df["visa_type"], flat_df["to_name"]):
    if visa_req == "visa_free_access" or visa_req == "electronic_travel_authorisation":
        visa_dest.loc[visa_dest["country"] == country, "visa_free_destination"] += 1

# Merge with strength of passport
visa_dest2 = visa_dest.merge(rank_by_year, on = "country")
visa_dest2 = visa_dest2[["country", "region", "visa_free_destination", "visa_free_count", "year"]]
visa_dest2 = visa_dest2[visa_dest2["year"] == 2025]

# Plot
fig = px.scatter(visa_dest2,
           x = "visa_free_destination",
           y = "visa_free_count",
           labels = {"visa_free_destination" : "Welcoming Score", "visa_free_count" : "Strength of Passport"},
           hover_name = "country",
           color = "region",
           facet_col = "region",
           title = "Strength of Passport vs Welcoming Score (2025)")

fig.update_layout(width=1600, height=900)
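
The welcoming score in the cell above is built with a row-by-row loop. A vectorized sketch of the same counting idea is shown below on hypothetical toy pairs (`toy_flat`); the column names mirror `flat_df`, but the data and the `welcoming` variable are illustrative assumptions.

```python
import pandas as pd

# Hypothetical long-format pairs, mirroring the columns of flat_df.
toy_flat = pd.DataFrame({
    "from_country": ["A", "A", "B", "C", "C"],
    "to_name": ["B", "C", "C", "A", "B"],
    "visa_type": [
        "visa_free_access",
        "visa_required",
        "electronic_travel_authorisation",
        "visa_on_arrival",
        "visa_free_access",
    ],
})

# Same categories that the loop counts towards the welcoming score.
open_types = ["visa_free_access", "electronic_travel_authorisation"]

# For each destination, count how many passports it admits under those categories.
welcoming = (
    toy_flat.loc[toy_flat["visa_type"].isin(open_types), "to_name"]
    .value_counts()
    .rename_axis("country")
    .reset_index(name="visa_free_destination")
)
print(welcoming)
```

Unlike the loop, which starts every country at zero, this version omits destinations with no qualifying rows (country A here), so a merge or reindex against the full country list would be needed to keep them.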

**4. Overall synthesis**  
From 2015–2025, global mobility improved, but **inequality in movement remained structural**. A handful of states increased access quickly (UAE, China, Ukraine), the top tier consolidated (Singapore, Japan, leading EU states), and some established powers stagnated (U.S.). Our three views jointly show:  
1) Where the map actually darkened.
2) Why outbound ≠ inbound.
3) How distributions within regions evolved and why the gap hasn’t closed.

## 5. Teamwork

We divided the responsibilities as follows:
- Lu Yu cleaned the data and plotted the box plot.
- Ryan plotted the choropleth map and the scatter plot.
- Darwin wrote up the introduction, data cleaning, and choropleth map sections.
- Ilyas wrote up the box plot and scatter plot sections.
- All of us collaborated on analysis, interpretation, and proofreading.

## 6. References

Dong, X. (2018). 2018 China sees most powerful passport in years. Cgtn.com. https://news.cgtn.com/news/3d3d774e324d444e31457a6333566d54/index.html

Henley & Partners. (2018, January 9). Germany secures the top spot in the 2018 Henley Passport Index. Henley & Partners. https://www.henleyglobal.com/newsroom/press-releases/germany-secures-the-top-spot-in-the-2018-henley-passport-index

Henley & Partners. (2025, July 22). Asian Nations Dominate Passport Power Ranking as US and UK Continue to Decline. Henley & Partners. https://www.henleyglobal.com/newsroom/press-releases/henley-global-mobility-report-july-2025

Nair, D. (2025, October 16). UAE passport beats US and Canada to rank eighth worldwide. The National. https://www.thenationalnews.com/business/money/2025/10/16/uae-passport-strength-us/

Nesheim, C. H. (2018, January 9). Release: Germany Takes Top Spot in 2018 Henley & Partners Passport Index: UK, China, Russia Improving - IMI Daily. IMI Daily. https://www.imidaily.com/press-release/release-germany-takes-top-spot-2018-henley-partners-passport-index-ukchina-russia-improving/

Visas: Council adopts regulation on visa liberalisation for Ukrainian citizens. (2017). Europa.eu; European Council. https://www.consilium.europa.eu/en/press/press-releases/2017/05/11/visa-liberalisation-ukraine/

