In [3]:
# Import libraries
import pandas as pd
import folium
from folium.plugins import Fullscreen
import json

# Load dataset
df = pd.read_csv('/content/sample_data/final_merged_times_cleaned.csv')
df['country_y'] = df['country_y'].str.strip()

# Aggregate metrics by country
agg_df = df.groupby('country_y').agg({
    'total_score': 'mean',
    'teaching': 'mean',
    'research': 'mean',
    'citations': 'mean',
    'education_expenditure_gdp': 'mean',
    'tertiary_education_percent': 'mean'
}).reset_index()

agg_df.columns = [
    'name', 'avg_total_score', 'avg_teaching', 'avg_research',
    'avg_citations', 'avg_expenditure_gdp', 'avg_tertiary_percent'
]

# Load GeoJSON
with open('/content/sample_data/countries.geo.json') as f:
    geo_data = json.load(f)

# Add the aggregated metrics and a flag URL to each GeoJSON feature
data_dict = agg_df.set_index('name').to_dict(orient='index')
for feature in geo_data['features']:
    country = feature['properties'].get('name')
    values = data_dict.get(country)
    if values:
        for k, v in values.items():
            feature['properties'][k] = round(v, 2) if isinstance(v, float) else v
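        # NOTE: the first two letters of the name are only a rough stand-in for
        # an ISO 3166-1 alpha-2 code and give the wrong flag for many countries
        # (e.g. "Germany" -> "ge", which is Georgia's code, not "de").
        iso_code = country[:2].lower()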
        feature['properties']['flag'] = f"https://flagcdn.com/48x36/{iso_code}.png"
    else:
        for col in agg_df.columns[1:]:
            feature['properties'][col] = None
        feature['properties']['flag'] = ""

# Create the folium map
m = folium.Map(
    location=[20, 0],
    zoom_start=2,
    tiles='cartodb positron',
    no_wrap=True,
    max_bounds=True
)

# Add the choropleth layer, colored by average citations score
folium.Choropleth(
    geo_data=geo_data,
    data=agg_df,
    columns=['name', 'avg_citations'],
    key_on='feature.properties.name',
    fill_color='YlGnBu',
    fill_opacity=0.7,
    line_opacity=0,
    nan_fill_color='lightgray',
    legend_name='Average Citations'
).add_to(m)

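# Add a transparent GeoJson overlay carrying the tooltip and popup for all metrics
# (the 'flag' property is attached to each feature but is not listed in the fields below)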
folium.GeoJson(
    geo_data,
    name='Details',
    tooltip=folium.GeoJsonTooltip(
        fields=[
            'name', 'avg_total_score', 'avg_teaching', 'avg_research',
            'avg_citations', 'avg_expenditure_gdp', 'avg_tertiary_percent'
        ],
        aliases=[
            'Country', 'Total Score', 'Teaching', 'Research',
            'Citations', 'Edu Spend (%GDP)', 'Tertiary Edu (%)'
        ],
        localize=True,
        sticky=True,
        labels=True
    ),
    popup=folium.GeoJsonPopup(
        fields=[
            'name', 'avg_total_score', 'avg_teaching', 'avg_research',
            'avg_citations', 'avg_expenditure_gdp', 'avg_tertiary_percent'
        ],
        aliases=[
            'Country', 'Total Score', 'Teaching', 'Research',
            'Citations', 'Edu Spend (%GDP)', 'Tertiary Edu (%)'
        ],
        localize=True,
        labels=True,
        style="font-size: 13px;"
    ),
    style_function=lambda feature: {
        'fillOpacity': 0,
        'color': 'transparent',
        'weight': 0
    }
).add_to(m)

# Enable fullscreen toggle
Fullscreen().add_to(m)

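# Save the map, then offer the HTML file for download (Colab only)
m.save('/content/university_map_final_enhanced.html')

from google.colab import files
files.download("/content/university_map_final_enhanced.html")

# Display the map inline; a bare expression renders only as the last line of a cell
m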




## Geospatial Analysis Discussion

### Overview
In this geospatial analysis, we visualized global university rankings and educational indicators using a **folium choropleth map** enriched with country-level metadata. We used a combination of university ranking scores and education-related spending/attainment data aggregated per country.

### What Does the Map Show?

- **Color intensity** represents the **average citations score** for each country.
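- **Popups and tooltips** include:
  - Average total score
  - Teaching, research, and citations scores
  - Educational expenditure (% of GDP)
  - Tertiary education attainment percentage
- A **flag URL** is also attached to each country's properties for easy recognition, although the tooltip and popup field lists do not currently display it.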

Countries with missing data are filled in **light gray**, clearly separating them from data-rich regions.
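
A gray country can mean either that the dataset has no rows for it or that its name is spelled differently in the CSV and in the GeoJSON. The following is a minimal sketch for telling the two cases apart. The helper `unmatched_names` and the toy inputs are hypothetical and not part of the original notebook; the only assumption carried over is that each GeoJSON feature stores its country name under `properties.name`.

```python
def unmatched_names(data_names, geojson):
    """Return (names only in the data, names only in the GeoJSON), both sorted."""
    geo_names = {
        feature["properties"].get("name")
        for feature in geojson["features"]
    }
    geo_names.discard(None)  # ignore features that have no name at all
    data_names = set(data_names)
    return sorted(data_names - geo_names), sorted(geo_names - data_names)


# Toy inputs for illustration only (not taken from the real dataset).
toy_geojson = {
    "features": [
        {"properties": {"name": "United States of America"}},
        {"properties": {"name": "Germany"}},
        {"properties": {"name": "Chad"}},
    ]
}
toy_data_names = ["United States", "Germany"]

only_in_data, only_in_geo = unmatched_names(toy_data_names, toy_geojson)
print(only_in_data)  # ['United States']
print(only_in_geo)   # ['Chad', 'United States of America']
```

In the notebook this could be called as `unmatched_names(agg_df['name'], geo_data)`. Names that appear only in the data point to spelling mismatches that a rename would fix, while names that appear only in the GeoJSON are countries with no data at all.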

### Research Questions Addressed

**Q1: Are better-ranked universities concentrated in countries with higher education spending or attainment?**

- Reading the choropleth alongside the popups suggests a pattern: countries with **higher education expenditure and tertiary attainment** (e.g., USA, UK, Germany) often also have **higher citation scores**.
- This is consistent with the hypothesis that **stronger education infrastructure** goes along with better university performance in global rankings, though a map alone cannot establish causation.
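
The visual pattern could be checked numerically by correlating the per-country averages. The sketch below assumes a frame shaped like `agg_df` with the column names used in the notebook. The helper `citation_correlations` is hypothetical, and the numbers in `toy` are invented purely so the example runs; they are not results from the dataset.

```python
import pandas as pd

# Toy values for illustration only; they are NOT results from the dataset.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "avg_citations": [40.0, 55.0, 60.0, 72.0, 90.0],
    "avg_expenditure_gdp": [3.1, 4.0, 3.8, 5.2, 5.9],
    "avg_tertiary_percent": [20.0, 31.0, 28.0, 40.0, 47.0],
})


def citation_correlations(frame):
    """Pearson correlation of avg_citations with the two education indicators."""
    cols = ["avg_citations", "avg_expenditure_gdp", "avg_tertiary_percent"]
    # Drop countries with any missing indicator so every pair uses the same rows.
    corr = frame[cols].dropna().corr()
    # Keep only the citations column and remove its self-correlation.
    return corr["avg_citations"].drop("avg_citations")


result = citation_correlations(toy)
print(result)
```

Calling `citation_correlations(agg_df)` in the notebook would give one coefficient per indicator. A coefficient near zero would weaken the reading above, and even a strong one describes association across countries rather than cause.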

### New Questions Raised

- Why are many African and smaller countries missing from the dataset? Can this be improved by enriching the source data?
- Could adding **year-wise filters** reveal changes in ranking trends over time?
- Would it be valuable to separate **public vs private institutions** in the analysis?

### Limitations

- Some countries are missing due to **mismatched country names** or **lack of data**.
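- Flag URLs are built from the first two letters of the country name, which is not a real ISO 3166-1 alpha-2 code, so many flags will be wrong (e.g., "Germany" yields "ge", Georgia's code, rather than "de").
- The choropleth colors only one metric at a time (citations), although the other metrics are available in the popups.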
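
One way to address the flag limitation is an explicit lookup from country name to ISO 3166-1 alpha-2 code instead of slicing the name. This is a sketch under stated assumptions: the `ISO_ALPHA2` table is a small hand-written sample rather than a complete list, `flag_url` is a hypothetical helper, and the URL pattern is the one already used in the notebook.

```python
# Partial, hand-written sample mapping; a real run would need the full table,
# keyed by the exact country names used in the GeoJSON file.
ISO_ALPHA2 = {
    "Germany": "de",
    "United Kingdom": "gb",
    "United States of America": "us",
    "Japan": "jp",
}


def flag_url(country_name, size="48x36"):
    """Build a flag image URL, or return "" when the country code is unknown."""
    code = ISO_ALPHA2.get(country_name)
    if code is None:
        return ""
    return f"https://flagcdn.com/{size}/{code}.png"


print(flag_url("Germany"))   # https://flagcdn.com/48x36/de.png
print(flag_url("Atlantis"))  # prints an empty line
```

Returning an empty string for unknown names matches how the notebook already handles countries without data, so the rest of the feature-enrichment loop would not need to change.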

### Summary

This choropleth map effectively combines **quantitative university metrics** with **geographic insights**, making it easier to spot global disparities in higher education. It not only supports the original research goals but also opens up avenues for deeper temporal and categorical analysis.

