---
## 04_Scoring_Composite
---

In [1]:
# Imports

import os
import numpy as np
import pandas as pd

# Visualization (optional, for score distributions & checks)
import matplotlib.pyplot as plt
import seaborn as sns

# Scaling / normalization
from sklearn.preprocessing import MinMaxScaler

# Display settings
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1000)

In [2]:
# Directory setup & check

base_dir      = "../data"
norm_dir      = os.path.join(base_dir, "normalized")
score_dir     = os.path.join(base_dir, "scored")
composite_dir = os.path.join(base_dir, "composite")

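# Make sure the output folders exist before anything is written to them
os.makedirs(score_dir, exist_ok=True)
os.makedirs(composite_dir, exist_ok=True)

# Quick check: list normalized files (inputs for scoring)
print("Normalized datasets available:")
for f in sorted(os.listdir(norm_dir)):
    print(" -", f)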

Normalized datasets available:
 - electricity_normalized.csv
 - gdp_ppp_normalized.csv
 - gov_effect_normalized.csv
 - internet_normalized.csv
 - literacy_normalized.csv
 - mobile_normalized.csv
 - researchers_normalized.csv
 - rnd_gdp_normalized.csv
 - tertiary_normalized.csv


In [5]:
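# Load scoring module (with path fix)

# Path fix, assuming `src` sits next to `data`, one level above this notebook
import sys
sys.path.insert(0, os.path.abspath(".."))

from src.scoring import compute_scores, weights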

print("Scoring module loaded successfully.")
print("Default weights being used:")
print(weights)

Scoring module loaded successfully.
Default weights being used:
{'literacy': 0.2, 'tertiary': 0.2, 'electricity': 0.15, 'internet': 0.1, 'mobile': 0.1, 'gov_effect': 0.25}
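
The printed weights cover six indicators, while nine normalized files are loaded. A quick sanity check, sketched below, confirms that the weights sum to 1 and shows which indicators carry no weight. The `indicators` list is typed in by hand from the file listing above; it is not read from `src.scoring`.

```python
# Weights as printed by the notebook above.
weights = {
    "literacy": 0.2, "tertiary": 0.2, "electricity": 0.15,
    "internet": 0.1, "mobile": 0.1, "gov_effect": 0.25,
}

# Indicator names copied by hand from the normalized file listing.
indicators = [
    "electricity", "gdp_ppp", "gov_effect", "internet", "literacy",
    "mobile", "researchers", "rnd_gdp", "tertiary",
]

total = sum(weights.values())
unweighted = [name for name in indicators if name not in weights]
unknown = [name for name in weights if name not in indicators]

print(f"Sum of weights: {total:.2f}")
print("Indicators without a weight:", unweighted)
print("Weights without a matching file:", unknown)
```

Under these assumptions `gdp_ppp`, `researchers`, and `rnd_gdp` enter the equal-weight score but not the weighted one.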


In [6]:
# Run scoring (equal + weighted)

# Collect normalized DataFrames
norm_dfs = []
for fname in sorted(os.listdir(norm_dir)):  # sorted for a reproducible load order
    if fname.endswith("_normalized.csv"):
        df = pd.read_csv(os.path.join(norm_dir, fname))
        norm_dfs.append(df)

print(f"Loaded {len(norm_dfs)} normalized datasets for scoring.")

# Equal-weight scores
equal_scores = compute_scores(norm_dfs, weights=None)
equal_out = os.path.join(score_dir, "equal_scores.csv")
equal_scores.to_csv(equal_out, index=False)
print(f"✔ Equal-weight scores saved to {equal_out}")

# Weighted scores
weighted_scores = compute_scores(norm_dfs, weights=weights)
weighted_out = os.path.join(score_dir, "weighted_scores.csv")
weighted_scores.to_csv(weighted_out, index=False)
print(f"✔ Weighted scores saved to {weighted_out}")

Loaded 9 normalized datasets for scoring.
✔ Equal-weight scores saved to ../data\scored\equal_scores.csv
✔ Weighted scores saved to ../data\scored\weighted_scores.csv


In [7]:
# Preview scored outputs

print("Equal-weight scores (sample):")
display(equal_scores.head())

print("\nWeighted scores (sample):")
display(weighted_scores.head())

Equal-weight scores (sample):


Indicator,Country Name,Country Code,electricity,gdp_ppp,gov_effect,internet,literacy,mobile,researchers,rnd_gdp,tertiary,AI_Readiness_Score
0,Afghanistan,AFG,58.042657,1.082799,19.060242,7.238882,25.750502,12.950465,,,3.161234,18.183826
1,Africa Eastern and Southern,AFE,32.181506,1.778221,,,61.084013,15.146055,,10.208343,3.540216,20.656392
2,Africa Western and Central,AFW,42.039038,2.000922,,,45.943121,19.180076,,2.439173,4.077507,19.279973
3,Albania,ALB,99.813711,5.550425,44.327369,32.59368,97.437496,28.405637,0.74496,1.919003,21.185499,36.88642
4,Algeria,DZA,99.128681,7.367193,38.476788,21.692335,71.743774,27.251417,2.444925,3.860621,17.613081,32.175424



Weighted scores (sample):


Indicator,Country Name,Country Code,electricity,gdp_ppp,gov_effect,internet,literacy,mobile,researchers,rnd_gdp,tertiary,AI_Readiness_Score
0,Afghanistan,AFG,58.042657,1.082799,19.060242,7.238882,25.750502,12.950465,,,3.161234,21.272741
1,Africa Eastern and Southern,AFE,32.181506,1.778221,,,61.084013,15.146055,,10.208343,3.540216,
2,Africa Western and Central,AFW,42.039038,2.000922,,,45.943121,19.180076,,2.439173,4.077507,
3,Albania,ALB,99.813711,5.550425,44.327369,32.59368,97.437496,28.405637,0.74496,1.919003,21.185499,55.87843
4,Algeria,DZA,99.128681,7.367193,38.476788,21.692335,71.743774,27.251417,2.444925,3.860621,17.613081,47.254245
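
The previewed rows are enough to back out how the two scores appear to be computed. The sketch below is inferred from the printed values and is not the actual `src.scoring.compute_scores` implementation. It assumes the equal-weight score is the mean of whichever indicators are present, and the weighted score is the weighted sum of the six weighted indicators, left missing if any of them is missing. The helper names `equal_score` and `weighted_score` are hypothetical.

```python
import math

# Weights as printed by the notebook.
weights = {
    "literacy": 0.2, "tertiary": 0.2, "electricity": 0.15,
    "internet": 0.1, "mobile": 0.1, "gov_effect": 0.25,
}

# Afghanistan's normalized indicators, copied from the preview above
# (researchers and rnd_gdp are missing, represented here as None).
afghanistan = {
    "electricity": 58.042657, "gdp_ppp": 1.082799, "gov_effect": 19.060242,
    "internet": 7.238882, "literacy": 25.750502, "mobile": 12.950465,
    "researchers": None, "rnd_gdp": None, "tertiary": 3.161234,
}


def equal_score(row):
    """Mean of the indicators that are present (hypothetical helper)."""
    present = [v for v in row.values() if v is not None]
    return sum(present) / len(present) if present else math.nan


def weighted_score(row, weights):
    """Weighted sum; NaN if any weighted indicator is missing (hypothetical helper)."""
    if any(row.get(name) is None for name in weights):
        return math.nan
    return sum(row[name] * w for name, w in weights.items())


print(round(equal_score(afghanistan), 6))              # 18.183826
print(round(weighted_score(afghanistan, weights), 6))  # 21.272741
```

Both values match the `AI_Readiness_Score` shown for Afghanistan in the two previews, which is consistent with the assumed formulas but does not prove that the module computes them this way.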


In [8]:
# Mark baseline scoring complete

print("✔ Baseline scoring complete.")
print("Equal-weight and weighted baseline scores are saved in '../data/scored/'.")
print("These represent the unanalyzed baseline AI readiness scores.")
print("Next step → Composite Index: refining scores, adding ranking, and preparing for visualization & clustering.")

✔ Baseline scoring complete.
Equal-weight and weighted baseline scores are saved in '../data/scored/'.
These represent the unanalyzed baseline AI readiness scores.
Next step → Composite Index: refining scores, adding ranking, and preparing for visualization & clustering.


---
## Composite Section
---

In [9]:
# Setup for composite step

# Composite directory 
print("Composite directory:", composite_dir)

# Bring in scored datasets
equal_scores = pd.read_csv(os.path.join(score_dir, "equal_scores.csv"))
weighted_scores = pd.read_csv(os.path.join(score_dir, "weighted_scores.csv"))

print("✔ Scored datasets loaded for composite analysis.")
print("Equal scores shape:", equal_scores.shape)
print("Weighted scores shape:", weighted_scores.shape)

Composite directory: ../data\composite
✔ Scored datasets loaded for composite analysis.
Equal scores shape: (275, 12)
Weighted scores shape: (275, 12)


In [11]:
# Add ranking and save composite baselines (handling NaN properly)

# Dense ranking: tied scores share a rank and the next rank follows without a gap.
# na_option="keep" leaves entities without a score unranked (Rank = NaN).
equal_scores["Rank"] = (
    equal_scores["AI_Readiness_Score"]
    .rank(method="dense", ascending=False, na_option="keep")
)

weighted_scores["Rank"] = (
    weighted_scores["AI_Readiness_Score"]
    .rank(method="dense", ascending=False, na_option="keep")
)

# Sort by rank; unranked rows are placed at the end
equal_scores = equal_scores.sort_values("Rank", na_position="last")
weighted_scores = weighted_scores.sort_values("Rank", na_position="last")

# Save to composite folder
equal_out = os.path.join(composite_dir, "equal_composite.csv")
weighted_out = os.path.join(composite_dir, "weighted_composite.csv")

equal_scores.to_csv(equal_out, index=False)
weighted_scores.to_csv(weighted_out, index=False)

print("✔ Ranked composite baselines saved:")
print(" -", equal_out)
print(" -", weighted_out)

✔ Ranked composite baselines saved:
 - ../data\composite\equal_composite.csv
 - ../data\composite\weighted_composite.csv
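
Dense ranking explains why 275 scored entities can end at a rank below 275: tied scores share a rank, and the next rank follows without a gap. The sketch below uses made-up toy scores (not project data) to contrast `method="dense"` with `method="min"` and to show how `na_option="keep"` treats a missing score.

```python
import numpy as np
import pandas as pd

# Toy scores (not project data): one tie and one missing value.
scores = pd.Series([100.0, 100.0, 99.9, np.nan, 75.4], name="AI_Readiness_Score")

# Dense: ties share a rank, the next distinct score gets the next integer.
dense = scores.rank(method="dense", ascending=False, na_option="keep")

# Min: ties share the lowest rank, and the following rank skips ahead.
minimum = scores.rank(method="min", ascending=False, na_option="keep")

print(pd.DataFrame({"score": scores, "dense": dense, "min": minimum}))
```

With dense ranking the third entry is ranked 2, whereas `method="min"` would rank it 3. The missing score stays unranked in both cases.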


In [12]:
# Preview top 10 ranked countries

print("Top 10 (Equal-weight composite):")
display(equal_scores[["Country Name", "Country Code", "AI_Readiness_Score", "Rank"]].head(10))

print("\nTop 10 (Weighted composite):")
display(weighted_scores[["Country Name", "Country Code", "AI_Readiness_Score", "Rank"]].head(10))

Top 10 (Equal-weight composite):


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score,Rank
45,Channel Islands,CHI,100.0,1.0
119,Isle of Man,IMN,100.0,1.0
235,St. Martin (French part),MAF,99.937904,2.0
124,"Jersey, Channel Islands",JEY,75.370652,3.0
8,Anguilla,AIA,74.540926,4.0
145,Liechtenstein,LIE,73.835153,5.0
245,"Taiwan, China",TWN,73.023626,6.0
86,French Guiana,GUF,70.953707,7.0
208,Reunion,REU,69.715357,8.0
159,Martinique,MTQ,66.50519,9.0



Top 10 (Weighted composite):


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score,Rank
220,Singapore,SGP,78.647585,1.0
130,"Korea, Rep.",KOR,71.737309,2.0
151,"Macao SAR, China",MAC,70.936909,3.0
213,San Marino,SMR,70.403869,4.0
260,United Arab Emirates,ARE,70.342009,5.0
231,Spain,ESP,69.445378,6.0
73,Estonia,EST,69.313656,7.0
223,Slovenia,SVN,68.835449,8.0
146,Lithuania,LTU,67.950449,9.0
41,Cayman Islands,CYM,67.822958,10.0


In [13]:
# Preview the last 10 rows of each table (unranked rows sort last, so a tail can show only NaN scores)

print("Bottom 10 (Equal-weight composite):")
display(equal_scores[["Country Name", "Country Code", "AI_Readiness_Score", "Rank"]].tail(10))

print("\nBottom 10 (Weighted composite):")
display(weighted_scores[["Country Name", "Country Code", "AI_Readiness_Score", "Rank"]].tail(10))

Bottom 10 (Equal-weight composite):


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score,Rank
50,"Congo, Dem. Rep.",COD,13.445432,263.0
5,American Samoa,ASM,13.318485,264.0
156,Mali,MLI,12.867081,265.0
75,Ethiopia,ETH,12.683341,266.0
35,Burundi,BDI,11.08969,267.0
34,Burkina Faso,BFA,10.4417,268.0
42,Central African Republic,CAF,10.16673,269.0
183,Niger,NER,10.078455,270.0
230,South Sudan,SSD,8.588627,271.0
44,Chad,TCD,7.216514,272.0



Bottom 10 (Weighted composite):


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score,Rank
242,Sweden,SWE,,
243,Switzerland,CHE,,
245,"Taiwan, China",TWN,,
256,Turks and Caicos Islands,TCA,,
257,Tuvalu,TUV,,
261,United Kingdom,GBR,,
262,United States,USA,,
263,Upper middle income,UMC,,
269,Virgin Islands (U.S.),VIR,,
271,World,WLD,,


In [15]:
# Split ranked vs unranked countries

# Equal-weight split
equal_ranked = equal_scores[equal_scores["Rank"].notna()][
    ["Country Name", "Country Code", "AI_Readiness_Score", "Rank"]
]
equal_unranked = equal_scores[equal_scores["Rank"].isna()][
    ["Country Name", "Country Code", "AI_Readiness_Score"]
]

# Weighted split
weighted_ranked = weighted_scores[weighted_scores["Rank"].notna()][
    ["Country Name", "Country Code", "AI_Readiness_Score", "Rank"]
]
weighted_unranked = weighted_scores[weighted_scores["Rank"].isna()][
    ["Country Name", "Country Code", "AI_Readiness_Score"]
]

print(f"Equal-weight: {len(equal_ranked)} ranked, {len(equal_unranked)} unranked")
display(equal_ranked.head(5))
display(equal_unranked.head(5))

print(f"\nWeighted: {len(weighted_ranked)} ranked, {len(weighted_unranked)} unranked")
display(weighted_ranked.head(5))
display(weighted_unranked.head(5))

Equal-weight: 275 ranked, 0 unranked


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score,Rank
45,Channel Islands,CHI,100.0,1.0
119,Isle of Man,IMN,100.0,1.0
235,St. Martin (French part),MAF,99.937904,2.0
124,"Jersey, Channel Islands",JEY,75.370652,3.0
8,Anguilla,AIA,74.540926,4.0


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score



Weighted: 155 ranked, 120 unranked


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score,Rank
220,Singapore,SGP,78.647585,1.0
130,"Korea, Rep.",KOR,71.737309,2.0
151,"Macao SAR, China",MAC,70.936909,3.0
213,San Marino,SMR,70.403869,4.0
260,United Arab Emirates,ARE,70.342009,5.0


Unnamed: 0,Country Name,Country Code,AI_Readiness_Score
1,Africa Eastern and Southern,AFE,
2,Africa Western and Central,AFW,
5,American Samoa,ASM,
6,Andorra,AND,
8,Anguilla,AIA,


---
## Summary
---

# Scoring & Composite – Summary


### What We Did
- Combined the normalized indicators into two baseline scores: equal-weight (consistent with a mean over whichever indicators an entity reports) and weighted (using the six indicators in `weights`).  
- Saved baseline outputs in `../data/scored/`.  
- Added dense rankings to both the equal-weight and weighted scores.  
- Split entities into ranked vs. unranked for transparency.  
- Saved final composites in `../data/composite/`.  

### Key Discoveries
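- Equal-weight: all 275 entities received a score, because the score averages whichever indicators are available (inclusive baseline).  
- Weighted: 155 entities are ranked; the other 120 have no weighted score because at least one weighted indicator is missing.  
- The entity list mixes countries and territories with regional and income aggregates (e.g. World, Upper middle income), so "entities" is not the same as "countries".  
- Equal-weight bottom 10 includes fragile states (Chad, South Sudan, Niger, Central African Republic).  
- Weighted scoring leaves several high-income countries unranked (United States, United Kingdom, Sweden, Switzerland) because of missing literacy/tertiary or governance data.  
- Small, infrastructure-rich territories float to the top under equal-weight scoring: Channel Islands and Isle of Man both score 100.0. Because the score is a mean over available indicators, an entity that reports only a few indicators, all near the maximum, can outrank countries with full coverage.  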
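
A coverage check makes these findings easier to audit. The sketch below uses three rows copied from the score preview and counts how many indicators feed each equal-weight score. It also lists which weighted indicators are missing per entity. The names `coverage` and `missing_weighted` are illustrative and do not come from the project code.

```python
import numpy as np
import pandas as pd

# Weights as printed by the notebook.
weights = {
    "literacy": 0.2, "tertiary": 0.2, "electricity": 0.15,
    "internet": 0.1, "mobile": 0.1, "gov_effect": 0.25,
}

# Three rows copied from the score preview earlier in the notebook.
rows = pd.DataFrame(
    {
        "Country Code": ["AFG", "AFE", "ALB"],
        "electricity": [58.042657, 32.181506, 99.813711],
        "gdp_ppp": [1.082799, 1.778221, 5.550425],
        "gov_effect": [19.060242, np.nan, 44.327369],
        "internet": [7.238882, np.nan, 32.59368],
        "literacy": [25.750502, 61.084013, 97.437496],
        "mobile": [12.950465, 15.146055, 28.405637],
        "researchers": [np.nan, np.nan, 0.74496],
        "rnd_gdp": [np.nan, 10.208343, 1.919003],
        "tertiary": [3.161234, 3.540216, 21.185499],
    }
).set_index("Country Code")

# Number of non-missing indicators behind each equal-weight score.
coverage = rows.notna().sum(axis=1)

# Weighted indicators that are missing; a non-empty list means no weighted score.
missing_weighted = {
    code: [col for col in weights if pd.isna(rows.loc[code, col])]
    for code in rows.index
}

print(coverage)
print(missing_weighted)
```

Here AFE (Africa Eastern and Southern) is scored on six indicators and lacks `internet` and `gov_effect`, which matches its empty weighted score in the preview. Running the same check on the full tables would show how many indicators stand behind each top-ranked equal-weight score.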

### Next Steps
- Move into Notebook 05: Visualization & Clustering.  
- Update `README.md` with:
  - Folder purposes (`raw`, `clean`, `normalized`, `scored`, `composite`).  
  - Note on ranked vs. unranked split in composites.  