# Global EV Transition Dashboard

This notebook provides a multi-layered exploration of the global shift toward electric mobility.  
Using three coordinated visual analytics tools, you can examine:

- **Where** EV adoption and charging infrastructure are strongest  
- **How** countries compare on transition pressure and infrastructure adequacy  
- **Why** specific patterns exist, revealed through clustering, correlations, and trend models  

The goal is to allow you to move seamlessly from **high-level geography**,  
to **current-year diagnostics**,  
to **deep structural explanations**.

Use the interactive widgets to explore countries, compare patterns, and reveal the dynamics behind the global EV transition.


In [10]:
# ---- Minimal EV scatter prep (Region×Mode; exact column names) ----
import pandas as pd
import re
from Isea.scatter import ScatterBrush

df = pd.read_csv("Global_EV_clean.csv")

# Helper: pivot (region,mode,year) → (region,mode) with <name>__FYYYY columns
def pivot_param(data, parameter, powertrain=None, name="Val"):
    q = data[data["parameter"] == parameter]
    if powertrain is not None:
        q = q[q["powertrain"] == powertrain]
    q = q[q["mode"] != "EV"]                                 # drop generic mode
    if q.empty:
        return pd.DataFrame(columns=["region","mode"])
    g = q.groupby(["region","mode","year"], as_index=False)["value"].sum()
    p = g.pivot(index=["region","mode"], columns="year", values="value").sort_index(axis=1)
    p.columns = [f"{name}__F{int(y)}" for y in p.columns]
    return p.reset_index()

# ChargingStations: sum fast+slow per region-year, replicate to all non-EV modes, then pivot
cp = df[(df["parameter"] == "EV charging points") & (df["powertrain"].isin(["fast","slow"]))]
rgy = (cp.groupby(["region","year"], as_index=False)["value"].sum()
         .rename(columns={"value":"ChargingStations"}))
rgy["year"] = rgy["year"].astype(int)
base_modes = df.loc[df["mode"] != "EV", ["region","mode"]].drop_duplicates()
rep = base_modes.merge(rgy, on="region", how="left")
cs = (rep.pivot_table(index=["region","mode"], columns="year", values="ChargingStations", aggfunc="first")
         .sort_index(axis=1)
         .reset_index())
cs.columns = [f"ChargingStations__F{int(c)}" if c not in ("region","mode") else c for c in cs.columns]

# Metrics for X/Y
cfg = {
    "StockBEV":   ("EV stock",       "BEV"),
    "StockFCEV":  ("EV stock",       "FCEV"),
    "StockPHEV":  ("EV stock",       "PHEV"),
    "SalesBEV":   ("EV sales",       "BEV"),
    "SalesFCEV":  ("EV sales",       "FCEV"),
    "SalesPHEV":  ("EV sales",       "PHEV"),
    "SalesShare": ("EV sales share", "EV"),
    "StockShare": ("EV stock share", "EV"),
}
blocks = [pivot_param(df, p, pt, name=k) for k,(p,pt) in cfg.items()] + [cs]

# Merge on (region, mode)
wide = base_modes.copy()
for b in blocks:
    if not b.empty:
        wide = wide.merge(b, on=["region","mode"], how="outer")

# Year range + initialize bare columns for latest year (widget expects this)
yrs = sorted({int(m.group(1)) for c in wide.columns for m in [re.search(r"__F(\d{4})$", str(c))] if m})
yearMin, yearMax = (min(yrs), max(yrs)) if yrs else (None, None)

xyVars = list(cfg.keys()) + ["ChargingStations"]
if yearMax is not None:
    for v in xyVars:
        col = f"{v}__F{yearMax}"
        wide[v] = pd.to_numeric(wide[col], errors="coerce").fillna(0.0) if col in wide.columns else 0.0

# Minimal id/label; color by mode
wide["id"] = wide["region"] + "|" + wide["mode"]
wide["label"] = wide["region"] + " • " + wide["mode"]

wide

Unnamed: 0,region,mode,StockBEV__F2010,StockBEV__F2011,StockBEV__F2012,StockBEV__F2013,StockBEV__F2014,StockBEV__F2015,StockBEV__F2016,StockBEV__F2017,...,StockFCEV,StockPHEV,SalesBEV,SalesFCEV,SalesPHEV,SalesShare,StockShare,ChargingStations,id,label
0,Australia,Cars,,49.0,220.0,410.0,780.0,1500.0,2200.0,3400.0,...,65.0,31000.0,87000.0,6.0,11000.0,12.0,1.20,2760.0,Australia|Cars,Australia • Cars
1,Austria,Cars,350.0,990.0,1400.0,2100.0,3400.0,5000.0,9100.0,15000.0,...,89.0,61000.0,48000.0,9.0,17000.0,26.0,4.40,17500.0,Austria|Cars,Austria • Cars
2,Belgium,Buses,3.0,3.0,3.0,3.0,7.0,7.0,7.0,10.0,...,0.0,220.0,340.0,0.0,110.0,62.0,6.00,44000.0,Belgium|Buses,Belgium • Buses
3,Belgium,Cars,61.0,320.0,830.0,1200.0,2200.0,3300.0,5200.0,7500.0,...,99.0,280000.0,93000.0,9.0,100000.0,41.0,8.20,44000.0,Belgium|Cars,Belgium • Cars
4,Belgium,Trucks,2.0,6.0,7.0,7.0,11.0,12.0,13.0,14.0,...,0.0,8.0,160.0,0.0,8.0,1.7,0.17,44000.0,Belgium|Trucks,Belgium • Trucks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
107,United Arab Emirates,Cars,,,,,,,,,...,0.0,0.0,23000.0,0.0,5900.0,13.0,0.00,0.0,United Arab Emirates|Cars,United Arab Emirates • Cars
108,United Kingdom,Buses,80.0,88.0,100.0,120.0,170.0,200.0,260.0,300.0,...,180.0,220.0,1300.0,43.0,0.0,17.0,4.50,53000.0,United Kingdom|Buses,United Kingdom • Buses
109,United Kingdom,Cars,1500.0,2600.0,4100.0,6200.0,12000.0,21000.0,30000.0,42000.0,...,260.0,600000.0,310000.0,25.0,140000.0,24.0,5.00,53000.0,United Kingdom|Cars,United Kingdom • Cars
110,United Kingdom,Trucks,940.0,860.0,700.0,570.0,500.0,410.0,400.0,350.0,...,0.0,0.0,1400.0,0.0,0.0,2.7,0.31,53000.0,United Kingdom|Trucks,United Kingdom • Trucks
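
The wide table encodes the year in each column name as `<metric>__FYYYY`, while the bare metric name holds the latest year. As a minimal sketch of how that convention can be unpacked (the column list is a hypothetical stand-in for `wide.columns`, and `split_year_column` is an illustrative helper, not part of the notebook):

```python
import re

# Hypothetical stand-in for wide.columns; the real table has many more entries.
columns = ["region", "mode", "StockBEV__F2010", "StockBEV__F2023",
           "SalesShare__F2015", "StockBEV", "id", "label"]

def split_year_column(name):
    """Return (metric, year) for '<metric>__FYYYY' names, else None."""
    m = re.fullmatch(r"(.+)__F(\d{4})", name)
    return (m.group(1), int(m.group(2))) if m else None

# Keep only the year-suffixed columns and derive the covered year range.
parsed = [p for p in map(split_year_column, columns) if p]
years = sorted({year for _, year in parsed})
year_min, year_max = years[0], years[-1]
print(parsed)
print(year_min, year_max)
```

Columns without the suffix (`region`, `id`, the bare `StockBEV`) are ignored, which mirrors how the preparation cell detects `yearMin` and `yearMax`.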


In [11]:
import pandas as pd
import numpy as np
import re

# === Load data
ev2 = pd.read_csv("Global_EV_clean.csv")
ev2["year"] = ev2["year"].astype(int)

# --- Helper: pivot (region, mode, year) into <name>__FYYYY columns
def pivot_param2(data2, parameter2, powertrain2=None, name2="Val"):
    q2 = data2[data2["parameter"] == parameter2]
    if powertrain2 is not None:
        q2 = q2[q2["powertrain"] == powertrain2]
    q2 = q2[q2["mode"] != "EV"]  # drop the generic mode
    if q2.empty:
        return pd.DataFrame(columns=["Country","mode"])
    g2 = q2.groupby(["region","mode","year"], as_index=False)["value"].sum()
    p2 = g2.pivot(index=["region","mode"], columns="year", values="value").sort_index(axis=1)
    p2.columns = [f"{name2}__F{int(y)}" for y in p2.columns]
    p2 = p2.reset_index().rename(columns={"region":"Country"})
    return p2

# --- ChargingStations: fast+slow per region-year, replicated to every mode != EV
cp2 = ev2[(ev2["parameter"] == "EV charging points") & (ev2["powertrain"].isin(["fast","slow"]))]

rgy2 = (cp2.groupby(["region","year"], as_index=False)["value"].sum()
          .rename(columns={"region":"Country","value":"ChargingStations"}))

base_modes2 = ev2.loc[ev2["mode"] != "EV", ["region","mode"]].drop_duplicates().rename(columns={"region":"Country"})

rep2 = base_modes2.merge(rgy2, on=["Country"], how="left")

cs2 = (rep2.pivot_table(index=["Country","mode"], columns="year", values="ChargingStations", aggfunc="first")
          .sort_index(axis=1).reset_index())
cs2.columns = [f"ChargingStations__F{int(c)}" if c not in ("Country","mode") else c for c in cs2.columns]

# --- Metric × powertrain blocks
cfg2 = {
    "StockBEV":   ("EV stock",       "BEV"),
    "StockPHEV":  ("EV stock",       "PHEV"),
    "StockFCEV":  ("EV stock",       "FCEV"),
    "SalesBEV":   ("EV sales",       "BEV"),
    "SalesPHEV":  ("EV sales",       "PHEV"),
    "SalesFCEV":  ("EV sales",       "FCEV"),
    "StockShare": ("EV stock share", "EV"),
    # "SalesShare": ("EV sales share", "EV"),
}
blocks2 = [pivot_param2(ev2, p, pt, name2=k) for k,(p,pt) in cfg2.items()] + [cs2]

# --- Merge to wide format (one row per Country×mode)
wide2 = base_modes2.copy()
for b2 in blocks2:
    if not b2.empty:
        wide2 = wide2.merge(b2, on=["Country","mode"], how="outer")

# --- energy_quad works at country level: aggregate over modes
value_cols2 = [c for c in wide2.columns if re.search(r"__F\d{4}$", str(c))]
wide_ev2 = (wide2.groupby("Country", as_index=False)[value_cols2].sum(min_count=1))

# --- Detected years
YEARS_ALL2 = sorted({int(m.group(1)) for c in value_cols2 for m in [re.search(r"__F(\d{4})$", c)] if m})
print(f"[OK2] wide_ev2 ready: {len(wide_ev2)} countries | years {min(YEARS_ALL2)}–{max(YEARS_ALL2)} | cols={len(wide_ev2.columns)}")


[OK2] wide_ev2 ready: 49 countries | years 2010–2023 | cols=113


## 1. Global Adoption & Infrastructure: WorldMapLineChart

The world map provides a geographic overview of electric vehicle adoption and charging infrastructure.  
You can switch between several metrics, such as:

- **EV Stock Share** — proportion of the national fleet that is electric  
- **EV Sales Share** — proportion of new vehicle sales that are electric  
- **Charging infrastructure quantity**  

Clicking a country displays its **entire historical trajectory** in the time-series line chart.

This view helps identify:

- Early adoption clusters (Nordics, China, etc.)
- Fast risers in recent years
- Regions where charging infrastructure is lagging adoption
- Outliers that deviate from their neighbors

Use this as the **global entry point** before zooming in on specific patterns.

**Insights:**
- Looking at stock share, for example, there are always some clear leaders, but these shift over time (EU, US → China).
- Which vehicle type has the highest stock share differs between countries, but changes little over time. For example, in the Netherlands buses have shown the most rapid growth, reflecting the effect of evident state and regional company policies.
- From the moment it enters the dataset in 2014, China dominates in the absolute number of installed charging stations.


In [12]:
import ipywidgets as widgets
from Isea.worldmaplinechart import WorldMapLineChart

# All metrics the user can choose from
metrics = [
    "StockBEV","StockFCEV","StockPHEV",
    "SalesBEV","SalesFCEV","SalesPHEV",
    "SalesShare","StockShare",
    "ChargingStations"
]

metric_dropdown = widgets.Dropdown(
    options=metrics,
    value="StockShare",
    description="Metric:",
    layout=widgets.Layout(width="300px")
)

# Create the visualization widget
w_world = WorldMapLineChart(
    df=wide,
    metric=metric_dropdown.value,
    region_col="region",
    label_col="label",
    id_col="id",
    title="World EV Map",
    subtitle="Hover, click, compare countries over time."
)

# Callback: when dropdown changes → update widget
def on_metric_change(change):
    new_metric = change["new"]
    w_world.set_metric(new_metric)

metric_dropdown.observe(on_metric_change, names="value")

widgets.VBox([metric_dropdown, w_world])


VBox(children=(Dropdown(description='Metric:', index=7, layout=Layout(width='300px'), options=('StockBEV', 'St…

## 2. Transition Dynamics: ScatterBrush Diagnostic View

This scatterplot reveals the *current behavior* of countries in the EV transition.
The scatter includes:
- Zooming to identify a specific subgroup based on characteristics in a specific year
- Selection to follow these countries / car types
- Hiding specific car types by selecting the legend
- True geometric diagonal (Recommended)
- Optional 0–100 scaling (Recommended)
- Exploring year to year behaviour with axis lock (Recommended)
- **Using the py <-> js communication**, select a specific subset of points for a focused analysis in a new graph.

Two analytic modes are highlighted:

### **A. Transition Pressure Mode**
- **X-axis:** EV Stock Share (%)
- **Y-axis:** EV Sales Share (%)

Interpretation:
- **Above diagonal (Y > X):** Sales are more electric than the fleet → electric fleet growing faster than conventional.
- **Below diagonal (Y < X):** Sales are less electric → electric fleet growing slower than conventional.
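
The distance from the diagonal can also be computed directly. The sketch below uses made-up share values (not taken from the dataset) and a hypothetical helper `diagonal_gap`; a positive gap means the sales share exceeds the stock share, so the point lies above the diagonal.

```python
# Illustrative values only; in the notebook these would come from the
# StockShare / SalesShare columns of `wide` for a single year.
rows = [
    {"label": "A • Cars",  "StockShare": 20.0, "SalesShare": 80.0},
    {"label": "B • Cars",  "StockShare": 5.0,  "SalesShare": 4.0},
    {"label": "C • Buses", "StockShare": 2.0,  "SalesShare": 12.0},
]

def diagonal_gap(row):
    """Vertical distance from the x=y diagonal (sales share minus stock share)."""
    return row["SalesShare"] - row["StockShare"]

# Rank points by how far above the diagonal they sit.
ranked = sorted(rows, key=diagonal_gap, reverse=True)
for r in ranked:
    side = "above" if diagonal_gap(r) > 0 else "on/below"
    print(f"{r['label']}: gap={diagonal_gap(r):+.1f} ({side} diagonal)")
```

Sorting by this gap gives the same ordering one would read off the plot by eye once the axes are set to 0-100 and the diagonal is shown.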

**Analytical questions, methods and answers:**
- **Q1:** In which countries do car sales show the fastest relative growth in a given year (e.g. 2021)? Is the situation different for buses? 
- **M1:** Move to 2021, press the 0-100, x=y, and lock-zoom buttons, then identify the countries with the largest y distance from the x=y diagonal.
- **A1:** Top 5 car countries: Norway, Iceland, Sweden, Denmark, Finland. Top 5 bus countries: China, Netherlands, Finland, Sweden, Belgium. <br>
- **Q2:** Were these countries always the quickest growers (cars and buses)? 
- **M2:** Select these countries and move the slider. Use zoom to inspect the smaller percentages, but make sure the axes are locked. 
- **A2:** No. Going back in time shows that, for example, the Netherlands was an early adopter of EV cars but had dropped out of the top 5 by 2021. Some of the selected countries also sat below the x=y line (electric fleet growing slower than conventional) for several years, while many other countries performed better. In 2023, buses in Norway and Switzerland showed higher relative growth in bus-fleet electrification, whereas the other countries slowed down. <br>
- **Q3:** How did the sales/stock share of vans and buses in countries with a stock share of 0.1% or more in 2014 (the earliest adopters) change over time? Are there visible patterns in vans vs. buses within this subgroup?
- **M3:** Go to 2014. Zoom to this specific area of the graph. Use the legend filter to show only vans and buses. Select the relevant points and scroll down to the next cell. Inspect how countries move over time, and make sure to lock the zoom at the last year's level. The x=y button can also be useful here. 
- **A3:** In the earliest years (<2014), vans were the quickest growers and buses were not electrifying overall (below the diagonal). Vans were overtaken by buses in China in 2014, in the Netherlands in 2015, and later in the other countries. While all countries kept electrifying their bus and van fleets, buses ultimately took the lead as the fastest-electrifying mode. This is likely due to government policies, which would be worth examining in a deeper causal analysis. It could also be interesting to find other countries that had such policies in place and compare them with the earlier adopters. Furthermore, using the slider in the first scatter shows that the earliest adopters were also overtaken by other countries with regard to buses and vans. 

### **B. Infrastructure Alignment Mode**
- **X-axis:** ChargingStations 
- **Y-axis:** StockBEV  

Interpretation (ratio = StockBEV / ChargingStations, i.e. BEVs per public charging point):
- **High ratio (>3):** Publicly available infrastructure is lagging behind the EV adoption rate
- **Moderate ratio (1-3):** Infrastructure is keeping pace   
- **Low ratio (<1):** Policy-driven overbuilding, or adoption is not solely determined by available infrastructure 
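
The three bands above can be expressed as a small classifier. This is a sketch under one assumption: the ratio is taken as StockBEV divided by ChargingStations (BEVs per public charging point), which is how the high/low wording reads. `alignment_band` is a hypothetical helper, not a function from the notebook.

```python
def alignment_band(stock_bev, charging_stations):
    """Classify a point by its StockBEV / ChargingStations ratio.

    Assumption: ratio = BEVs per public charging point.
    Returns "high", "moderate", "low", or None when the ratio is undefined.
    """
    if not charging_stations:
        return None  # no charging points recorded, ratio undefined
    ratio = stock_bev / charging_stations
    if ratio > 3:
        return "high"      # infrastructure lagging adoption
    if ratio >= 1:
        return "moderate"  # infrastructure keeping pace
    return "low"           # more chargers than BEVs

# Toy inputs chosen only to hit each band.
for stock, chargers in [(400, 100), (200, 100), (50, 100), (10, 0)]:
    print(stock, chargers, alignment_band(stock, chargers))
```

Rows with zero or missing charging points are returned as `None` rather than forced into a band, since the preparation cell fills missing values with 0.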

**Analytical questions, methods and answers:**
- **Q1:** How do the ratios compare between the different car types and move over time?
- **M1:** Select x=y and scroll through the years. Use the legend filters to focus on a specific car type.
- **A1:** Over the years, cars almost always have a moderate to high ratio. This makes sense, as there is no need for one (or more) publicly available charger per car. Countries with (close to) low ratios for EV cars indicate strong overbuilding of infrastructure to stimulate BEV adoption. Naturally, buses will always have a lower ratio, since they are not charged using public infrastructure anyway. Vans have moderate ratios in the early years, but after 2014 almost all of them fall below the x=y line, indicating that, over time, available infrastructure becomes less of a bottleneck for electrification. The same conclusion can be drawn for trucks; further research is needed to find which bottleneck determines their adoption rate. <br>

### Interaction
**Select countries here to feed the Advanced Insights Panel below.**  
This creates a powerful diagnostic→explanation workflow.
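
As a rough sketch of how a selection is turned into the rows that get analyzed: the notebook's cells read the `keys` list from the widget's selection dict and fall back to all data when it is empty. The rows and the `rows_for_selection` helper below are hypothetical stand-ins that work on plain dicts instead of the widget and `wide`.

```python
# Hypothetical rows standing in for records of `wide`.
rows = [
    {"id": "Norway|Cars", "region": "Norway", "mode": "Cars"},
    {"id": "China|Buses", "region": "China", "mode": "Buses"},
    {"id": "Belgium|Trucks", "region": "Belgium", "mode": "Trucks"},
]

def rows_for_selection(rows, selection):
    """Return the selected rows, or all rows when nothing is selected."""
    keys = selection.get("keys", []) if isinstance(selection, dict) else []
    if not keys:
        return list(rows)  # empty selection: analyze everything
    wanted = set(keys)
    return [r for r in rows if r["id"] in wanted]

# A selection payload carrying two ids, and an empty one.
picked = rows_for_selection(rows, {"keys": ["Norway|Cars", "China|Buses"]})
everything = rows_for_selection(rows, {"keys": []})
print([r["id"] for r in picked])
print(len(everything))
```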


In [13]:
# Scatter
w = ScatterBrush(
    wide,
    x="StockShare",   # defaults; switch in UI
    y="SalesShare",
    key="id",
    label="label",
    color="mode",
    legend=True,
    legendPosition="right",
    xyVars=xyVars,
    yearMin=yearMin,
    yearMax=yearMax,
    width=1080,
    height=520,
    panel_position="right",
    panel_width=340,
)
w

ScatterBrush(data=[{'region': 'Australia', 'mode': 'Cars', 'StockBEV__F2010': None, 'StockBEV__F2011': 49.0, '…

In [14]:
# --- Minimal linked view: second scatter mirrors selection + axes (options-based) ---
from Isea.scatter import ScatterBrush

def _get_axes(widget):
    opts = getattr(widget, "options", {}) or {}
    return opts.get("x"), opts.get("y")

x0, y0 = _get_axes(w)

# empty second widget; same config as the first
w_link = ScatterBrush(
    wide.iloc[0:0].to_dict("records"),
    x=x0, y=y0,
    key="id", label="label", color="mode",
    legend=True, legendPosition="right",
    xyVars=xyVars, yearMin=yearMin, yearMax=yearMax,
    width=1080, height=420, panel_position="right", panel_width=340,
)
display(w_link)

def _sync(*_):
    # current axes from the first widget
    x, y = _get_axes(w)
    # current selection
    sel = w.selection if isinstance(w.selection, dict) else {}
    keys = sel.get("keys", []) or []
    sub = wide[wide["id"].isin(keys)] if keys else wide.iloc[0:0]

    # push data
    w_link.data = sub.to_dict("records")
    # push axes via options (keep other options intact)
    w_link.options = {**(w_link.options or {}), "x": x, "y": y}
    # optional: clear selection inside the second chart each update
    w_link.selection = {"type": None, "keys": [], "rows": [], "epoch": 0}

# react to selection and option (axis) changes
w.observe(lambda ch: _sync(), names="selection")
w.observe(lambda ch: _sync(), names="options")

# initial paint
_sync()


ScatterBrush(options={'x': 'StockShare', 'y': 'SalesShare', 'label': 'label', 'color': 'mode', 'key': 'id', 'w…

In [15]:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

def analyze_clustering_quality(df_numeric, kmeans_model, scaler=None):
    """
    Computes validation metrics for the clustering.
    
    Returns a dict with:
    - silhouette_avg: mean score (range -1 to 1; > 0.5 is good)
    - davies_bouldin: separation index (lower is better; < 1 is excellent)
    - cluster_sizes: size of each cluster
    - inertia: sum of squared intra-cluster distances
    - n_clusters: number of clusters found
    """
    # Silhouette score (range -1 to 1; higher is better)
    silhouette_avg = silhouette_score(df_numeric, kmeans_model.labels_)
    
    # Davies-Bouldin index (lower is better; < 1 is excellent)
    davies_bouldin = davies_bouldin_score(df_numeric, kmeans_model.labels_)
    
    unique, counts = np.unique(kmeans_model.labels_, return_counts=True)
    cluster_sizes = dict(zip(unique, counts))
    
    inertia = kmeans_model.inertia_
    
    return {
        'silhouette_avg': silhouette_avg,
        'davies_bouldin': davies_bouldin,
        'cluster_sizes': cluster_sizes,
        'inertia': inertia,
        'n_clusters': len(unique)
    }


def calculate_elbow_curve(df_numeric, k_range=range(2, 8)):
    """
    Computes inertia and silhouette score for different values of k.
    Useful for visualizing the "elbow" (elbow method).
    
    Returns:
    - Dictionary with 'inertias' and 'silhouette_scores', each keyed by k
    """
    inertias = {}
    silhouette_scores_dict = {}
    
    for k in k_range:
        kmeans_temp = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeans_temp.fit(df_numeric)
        inertias[k] = kmeans_temp.inertia_
        silhouette_scores_dict[k] = silhouette_score(df_numeric, kmeans_temp.labels_)
    
    return {
        'inertias': inertias,
        'silhouette_scores': silhouette_scores_dict
    }


def characterize_cluster(cluster_id, X_df, latest_cols, cluster_col='Cluster'):
    """
    Generates a textual description of what characterizes each cluster.
    
    Parameters:
    - cluster_id: cluster ID
    - X_df: DataFrame with the data (must have a 'Cluster' column)
    - latest_cols: list of FULL COLUMN NAMES (e.g. 'Bio (MW)__F2023')
    - cluster_col: name of the column holding the cluster assignments
    
    Example output:
    "Cluster 0: SOLAR-DOMINANT (Solar=450 MW, 65% of total)
              Low Hydro (15 MW, 2% of total)
              Small scale (Total=690 MW)"
    """
    cluster_data = X_df[X_df[cluster_col] == cluster_id][latest_cols]
    
    means = cluster_data.mean()
    total_capacity = means.sum()
    
    percentages = (means / total_capacity * 100) if total_capacity > 0 else means * 0
    
    sorted_techs = means.sort_values(ascending=False)
    
    description = f"Cluster {cluster_id}:\n"
    for col in sorted_techs.index:
        # Derive the clean name from the same column so labels match the sorted values
        tech_clean = col.split('__F')[0].replace('(MW)', '').replace('(GWh)', '').strip()
        pct = percentages[col]
        val = means[col]
        description += f"  • {tech_clean}: {val:.0f} MW ({pct:.1f}%)\n"
    
    description += f"  Total Capacity: {total_capacity:.0f} MW\n"
    
    top_col = sorted_techs.index[0]
    top_tech = top_col.split('__F')[0].replace('(MW)', '').replace('(GWh)', '').strip()
    top_pct = percentages[top_col]
    
    if top_pct > 60:
        description += f"  → TYPE: {top_tech.upper()}-DOMINATED"
    elif top_pct > 40:
        description += f"  → TYPE: {top_tech.upper()}-MAJORITY"
    else:
        description += f"  → TYPE: MIXED/BALANCED"
    
    return description

## 3. Advanced Insights Panel (Clustering • Regression • Correlation)

This panel explains **why** countries appear where they do in the scatter.
It runs the analysis on your scatter selection for the selected year. The insights below are based on selecting all countries, but any subset can be used. 

#### **Clustering**
Groups countries into adoption profiles:
- Leaders  
- Fast risers  
- Infrastructure laggers  
- Early-stage adopters  

**Insights:**
The model found 4 clusters (Silhouette Score: 0.656; Davies-Bouldin Index: 0.658):

0. **Early stage & infrastructure laggards:** lowest stock share (1.4%), moderate sales share (9.0%) and very limited infrastructure (avg. 41k)
1. **Early stage & infrastructure laggards:** similar shares to cluster 0 (2.1% and 9.5%), but higher absolute numbers (stock, sales and infrastructure). Infrastructure is still lagging in relative terms. 
2. **Fast risers:** high stock share (9.8%) and very high sales share (46%), with infrastructure not keeping up with the absolute BEV stock
3. **Leaders:** similar stock share (9.9%), but sales are not growing as quickly (21.6%). A very high amount of infrastructure relative to the absolute BEV stock, indicating that there is likely a stronger policy push for EV adoption. 
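
To make the silhouette score quoted above more concrete: for each point it compares the mean distance to its own cluster (`a`) with the mean distance to the nearest other cluster (`b`) and reports `(b - a) / max(a, b)`. The notebook relies on scikit-learn's `silhouette_score`; the sketch below is a plain-Python illustration of the same definition on toy 1-D points, not the library's implementation, and `silhouette_avg` is a hypothetical helper.

```python
from math import dist

def silhouette_avg(points, labels):
    """Mean silhouette coefficient for points (tuples) with cluster labels.

    Assumes at least two clusters and that not all points coincide.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]  # toy data, two obvious groups
good = silhouette_avg(points, [0, 0, 1, 1])  # groups match the labels
bad = silhouette_avg(points, [0, 1, 0, 1])   # labels cut across the groups
print(round(good, 3), round(bad, 3))
```

A labeling that follows the natural groups scores close to 1, while one that cuts across them goes negative, which is why values above roughly 0.5 are read as reasonably separated clusters.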

#### **Regression Trend Analysis**
Shows long-term growth trajectories (e.g., BEV sales or stock).  
Useful for understanding:
- Whether current behavior is consistent with the past  
- Whether the electric vehicle share is growing faster or slower than that of conventional vehicles

**Insights:**
The plot shows how BEV sales in China, the USA, Germany, France and the UK have developed over time. 
The USA and China clearly stand out in terms of recent SalesBEV growth. The USA displays a relatively early peak in 2018.
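
The trend lines behind these insights are ordinary least-squares fits of the metric against the year. As a minimal sketch of that calculation (the notebook itself uses scikit-learn's `LinearRegression`; `fit_trend` and the toy series are illustrative assumptions, not dataset values):

```python
def fit_trend(years, values):
    """Ordinary least-squares line through (year, value) pairs.

    Returns (slope, intercept, r2). Assumes at least two distinct years
    and values that are not all identical.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - residual sum of squares / total sum of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return slope, intercept, 1 - ss_res / ss_tot

# Toy series growing by 200 units per year.
years = [2019, 2020, 2021, 2022, 2023]
sales = [100, 300, 500, 700, 900]
slope, intercept, r2 = fit_trend(years, sales)
next_year = intercept + slope * (years[-1] + 1)
print(f"growth={slope:.0f} units/year, R2={r2:.2f}, forecast={next_year:.0f}")
```

A straight line is a coarse model for sales that grow faster than linearly, so a low R² is a hint to treat the extrapolated values with care.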

#### **Correlation Analysis**
Reveals structural relationships:
- Does infrastructure growth track BEV growth?  
- How tightly are stock share and sales share coupled?  
- Are PHEVs behaving fundamentally differently?  

**Insights:**
The correlation matrix mainly shows consistent correlations between the absolute values and the shares of stock and sales. At first sight this makes sense, since stock is the cumulative result of sales. However, sales in a given year need not be related to the stock a country started that year with; the analysis shows that, in practice, they are. 
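
Each cell of the correlation matrix is a Pearson coefficient. The sketch below computes one by hand on toy values and applies the same reporting thresholds as the analysis cell (above 0.7 reported as strong positive, below -0.5 as negative). `pearson_r` and `describe` are hypothetical helpers; the notebook itself uses pandas' `DataFrame.corr`.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def describe(r):
    """Mirror the thresholds used when printing correlation insights."""
    if r > 0.7:
        return "strong positive"
    if r < -0.5:
        return "negative"
    return "not reported"

# Toy values only: stock rises with sales, a third series falls.
sales = [1.0, 2.0, 3.0, 4.0]
stock = [2.0, 4.0, 6.0, 8.0]
falling = [8.0, 6.0, 4.0, 2.0]
print(describe(pearson_r(sales, stock)), describe(pearson_r(sales, falling)))
```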

In [20]:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import silhouette_score, davies_bouldin_score, r2_score
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, HTML
import ipywidgets as widgets

# Assumes `wide` and the ScatterBrush widget `w` from the cells above are already defined.

# ========== 1) Clustering Functions for EV Adoption Profiles ==========

def get_latest_year_data(df_wide):
    """Extracts data for the most recent year from the DataFrame."""
    ev_cols = [c for c in df_wide.columns if '__F' in c]
    if not ev_cols:
        return None, []
    
    latest_year = max([int(c.split('__F')[1]) for c in ev_cols])
    
    # Select only relevant metrics for EV clustering
    metrics_to_cluster = ['StockBEV', 'SalesBEV', 'StockShare', 'SalesShare', 'ChargingStations']
    relevant_cols = [f"{metric}__F{latest_year}" for metric in metrics_to_cluster if f"{metric}__F{latest_year}" in df_wide.columns]
    
    df_latest = df_wide[['id', 'region', 'mode'] + relevant_cols].copy()
    df_latest.rename(columns={col: col.split('__F')[0] for col in relevant_cols}, inplace=True)
    
    return df_latest, relevant_cols

def cluster_regions_by_ev_profile(df_latest, n_clusters=4):
    """
    Clusters regions/modes based on their EV adoption profile.
    """
    if df_latest is None or len(df_latest) < n_clusters:
        print(f"Not enough data to cluster (at least {n_clusters} data points required).")
        return None, None, None

    features = [col for col in ['StockBEV', 'SalesBEV', 'StockShare', 'SalesShare', 'ChargingStations'] if col in df_latest.columns]
    if not features:
        print("No feature columns found for clustering.")
        return None, None, None
        
    X_numeric = df_latest[features].fillna(0)
    
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_numeric)
    
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    df_latest['Cluster'] = kmeans.fit_predict(X_scaled)
    
    # Clustering quality analysis
    quality_metrics = {
        'silhouette_avg': silhouette_score(X_scaled, kmeans.labels_),
        'davies_bouldin': davies_bouldin_score(X_scaled, kmeans.labels_),
        'inertia': kmeans.inertia_
    }
    
    cluster_stats = df_latest.groupby('Cluster')[features].mean()
    
    return df_latest[['id', 'region', 'mode', 'Cluster']], cluster_stats, quality_metrics

def characterize_ev_cluster(cluster_id, cluster_stats):
    """Generates a textual description for each EV cluster."""
    stats = cluster_stats.loc[cluster_id]
    description = f"**Cluster {cluster_id}:**\n"
    
    # Logic to describe the cluster
    if stats.get('StockShare', 0) > 5 and stats.get('ChargingStations', 0) > 50000:
        description += "  - **Profile:** Established Leaders\n"
        description += "  - **Characteristics:** High BEV penetration, robust sales, and excellent charging infrastructure.\n"
    elif stats.get('SalesShare', 0) > 10 and stats.get('StockShare', 0) < 5:
        description += "  - **Profile:** Fast-Emerging Markets\n"
        description += "  - **Characteristics:** Very rapid sales growth, indicating an acceleration in adoption.\n"
    elif stats.get('SalesBEV', 0) > 10000 and stats.get('ChargingStations', 0) < 10000:
        description += "  - **Profile:** Growth with Infrastructure Lag\n"
        description += "  - **Characteristics:** Significant BEV sales, but charging infrastructure appears to be lagging.\n"
    else:
        description += "  - **Profile:** Laggards or Early Stage\n"
        description += "  - **Characteristics:** Low EV penetration and sales, with limited infrastructure.\n"

    description += f"  - *Key Metrics (average): Stock Share: {stats.get('StockShare', 0):.2f}%, Sales Share: {stats.get('SalesShare', 0):.2f}%, Charging Stations: {stats.get('ChargingStations', 0):.0f}*\n"
    return description

# ========== 2) Linear Regression Function for Adoption Trends (CORRECTED) ==========

def predict_ev_trends(df_wide, selection_ids, metric='SalesBEV', top_n=5):
    """
    Performs linear regression to predict trends for a given EV metric.
    """
    df_selected = df_wide[df_wide['id'].isin(selection_ids)]
    if df_selected.empty:
        print("No selections to analyze.")
        return None

    # Match only the year-suffixed columns of this exact metric (e.g. 'SalesBEV__F2020')
    metric_cols = sorted([c for c in df_selected.columns if str(c).startswith(f"{metric}__F")])
    if not metric_cols:
        print(f"No data found for the metric '{metric}'.")
        return None

    years = np.array([int(c.split('__F')[1]) for c in metric_cols])
    
    latest_col = metric_cols[-1]
    top_selections = df_selected.nlargest(top_n, latest_col)
    
    results = []
    
    plt.figure(figsize=(10, 6))
    colors = plt.cm.viridis(np.linspace(0, 1, len(top_selections)))
    
    for idx, (i, row) in enumerate(top_selections.iterrows()):
        values = pd.to_numeric(row[metric_cols], errors='coerce').values
        
        # Filter years with valid data (at least 3 data points required for regression)
        mask = ~np.isnan(values) & (values > 0)
        if mask.sum() < 3:
            continue
            
        X = years[mask].reshape(-1, 1)
        y = values[mask]
        
        model = LinearRegression()
        model.fit(X, y)
        
        future_years = np.arange(years.max() + 1, years.max() + 4).reshape(-1, 1)
        predictions = model.predict(future_years)
        
        r2 = r2_score(y, model.predict(X))
        
        results.append({
            'label': row['label'],
            'growth_rate': model.coef_[0],
            'r2': r2,
            'prediction_next_year': predictions[0]
        })
        
        plt.plot(X.flatten(), y, 'o-', color=colors[idx], label=f"{row['label']} (R²={r2:.2f})")
        plt.plot(future_years.flatten(), predictions, '--', color=colors[idx], alpha=0.7)

    plt.xlabel('Year', fontsize=12)
    plt.ylabel(metric, fontsize=12)
    plt.title(f'Trend and Forecast for {metric}', fontsize=14, fontweight='bold')
    plt.grid(alpha=0.4)
    plt.axvline(years.max(), color='grey', linestyle=':', label='Last Year')
    # Build the legend after axvline so the 'Last Year' marker is included
    plt.legend(loc='best', fontsize=9)
    plt.show()

    print(f"\nRegression Analysis for '{metric}':")
    for r in sorted(results, key=lambda x: x['growth_rate'], reverse=True):
        print(f"  - **{r['label']}**: Growth of **{r['growth_rate']:.0f} units/year**. "
              f"Prediction for next year: {r['prediction_next_year']:.0f} units.")
    
    return pd.DataFrame(results)

# ========== 3) Correlation Analysis Function ==========

def analyze_ev_correlations(df_latest):
    """
    Analyzes and visualizes the correlation between key EV metrics.
    """
    if df_latest is None:
        return

    features = [col for col in ['StockBEV', 'SalesBEV', 'StockShare', 'SalesShare', 'ChargingStations'] if col in df_latest.columns]
    if not features:
        print("No features found for correlation analysis.")
        return

    corr_matrix = df_latest[features].corr()
    
    plt.figure(figsize=(8, 6))
    sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0, vmin=-1, vmax=1)
    plt.title('Correlation Matrix of EV Metrics', fontsize=14, fontweight='bold')
    plt.show()
    
    print("\nCorrelation Insights:")
    # Filter to avoid duplicate insights
    checked_pairs = set()
    for col1 in corr_matrix.columns:
        for col2 in corr_matrix.columns:
            if col1 != col2 and frozenset([col1, col2]) not in checked_pairs:
                correlation_value = corr_matrix.loc[col1, col2]
                checked_pairs.add(frozenset([col1, col2]))
                if correlation_value > 0.7:
                    print(f"  - Strong positive correlation between **{col1}** and **{col2}** ({correlation_value:.2f}). "
                          "This suggests that as one grows, so does the other.")
                elif correlation_value < -0.5:
                    print(f"  - Negative correlation between **{col1}** and **{col2}** ({correlation_value:.2f}).")


# ========== Control Panel to Run the Analysis ==========

button = widgets.Button(
    description='Run EV Adoption Analysis',
    button_style='success',
    tooltip='Click to analyze the scatter plot selections',
    icon='car',
    layout=widgets.Layout(width='300px', height='40px')
)

output_ml = widgets.Output()

def run_full_analysis(b):
    with output_ml:
        output_ml.clear_output()
        
        # Use the selection from the 'w' widget
        try:
            selection_keys = w.selection.get('keys', []) if isinstance(w.selection, dict) else []
            if not selection_keys:
                print("INFO: No region/mode selected. Running analysis on all data.")
                df_selected = wide
            else:
                df_selected = wide[wide['id'].isin(selection_keys)]
        except NameError:
            print("ERROR: The visualization widget 'w' was not found. Please ensure the previous cell has been executed.")
            return

        print(f"\n{'='*80}")
        print(f"  EV ADOPTION ANALYSIS FOR {len(df_selected)} SELECTED REGIONS/MODES")
        print(f"{'='*80}\n")
        
        df_latest, _ = get_latest_year_data(df_selected)

        # 1) Clustering
        print("\n" + "="*80)
        print(" 1. CLUSTERING ANALYSIS: ADOPTION PROFILES")
        print("="*80)
        # Use at most 4 clusters, and always fewer clusters than data points
        n_rows = len(df_latest) if df_latest is not None else 0
        num_clusters = min(4, n_rows - 1) if n_rows > 1 else 1
        if num_clusters < 2:
            print("Not enough data points to form clusters.")
            clusters_df, cluster_stats, quality_metrics = None, None, None
        else:
            clusters_df, cluster_stats, quality_metrics = cluster_regions_by_ev_profile(df_latest, n_clusters=num_clusters)
        
        if clusters_df is not None:
            print("\n**Clustering Quality:**")
            print(f"  - Silhouette Score: {quality_metrics['silhouette_avg']:.3f} (higher is better, >0.5 is good)")
            print(f"  - Davies-Bouldin Index: {quality_metrics['davies_bouldin']:.3f} (lower is better, <1.0 is excellent)\n")

            print("**Cluster Characterization:**")
            for cluster_id in sorted(cluster_stats.index):
                print(characterize_ev_cluster(cluster_id, cluster_stats))
                
            plt.figure(figsize=(12, 7))
            sns.heatmap(cluster_stats, annot=True, fmt=".1f", cmap="viridis")
            plt.title("Average Characteristics by EV Adoption Cluster", fontsize=14, fontweight='bold')
            plt.show()

        # 2) Regression
        print("\n" + "="*80)
        print(" 2. REGRESSION ANALYSIS: BEV SALES ADOPTION SPEED")
        print("="*80)
        predict_ev_trends(df_selected, df_selected['id'].tolist(), metric='SalesBEV', top_n=min(5, len(df_selected)))

        # 3) Correlations
        print("\n" + "="*80)
        print(" 3. CORRELATION ANALYSIS: ADOPTION DRIVERS")
        print("="*80)
        analyze_ev_correlations(df_latest)

        print(f"\n{'='*80}")
        print("  Analysis complete.")
        print(f"{'='*80}\n")

button.on_click(run_full_analysis)

display(widgets.VBox([
    widgets.HTML("<h3>EV Adoption Analysis Panel</h3>"),
    widgets.HTML("<p>Select regions/modes in the scatter plot above, then click the button to generate the analysis.</p>"),
    button,
    output_ml
]))

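The insight loop in `analyze_ev_correlations` visits each unordered pair of metrics once by tracking `frozenset` pairs. A minimal sketch of the same idea using `itertools.combinations` is shown here. The function name `correlation_insights` and the `corr_at` lookup are hypothetical, the thresholds mirror the ones in the cell above, and the toy values are for illustration only, not results from the dataset.

```python
from itertools import combinations

def correlation_insights(columns, corr_at, strong_pos=0.7, strong_neg=-0.5):
    """Return one insight string per unordered column pair that crosses a threshold.

    `corr_at(a, b)` is a hypothetical lookup callable; with a pandas
    correlation matrix it could be `lambda a, b: corr_matrix.loc[a, b]`.
    """
    insights = []
    # combinations() yields each unordered pair exactly once, so no
    # bookkeeping set is needed to skip (b, a) after (a, b).
    for a, b in combinations(columns, 2):
        r = corr_at(a, b)
        if r > strong_pos:
            insights.append(f"Strong positive correlation between {a} and {b} ({r:.2f})")
        elif r < strong_neg:
            insights.append(f"Negative correlation between {a} and {b} ({r:.2f})")
    return insights

# Toy values for illustration only (not results from the dataset).
toy = {
    ("StockBEV", "SalesBEV"): 0.95,
    ("StockBEV", "StockShare"): 0.10,
    ("SalesBEV", "StockShare"): -0.60,
}
cols = ["StockBEV", "SalesBEV", "StockShare"]
lines = correlation_insights(cols, lambda a, b: toy[(a, b)])
for line in lines:
    print("  -", line)
```

Returning the strings instead of printing inside the loop keeps the threshold logic easy to check separately from the widget output.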

## 4. Energy Quad - Parallel lines, table and technology presence insight
- BEV adoption shows consistent, accelerating growth in leading markets: cumulative BEV stock rises faster than that of any other technology, which means sales keep growing quickly.
- PHEV stock is still increasing in absolute terms, but its share of total EVs is flattening or declining, confirming its role as a transitional option.
- Mode-specific charts show that passenger cars are the main force driving the global EV transition, while hydrogen vehicles (FCEVs) remain very small in number and have not yet reached meaningful adoption in most markets. Korea is the one clear outlier, leading the world by far in hydrogen vehicle stock (and only in FCEVs).
- Playing with the selection tool shows that every other top country by absolute BEV stock has a lower stock / charging ratio than China.
- The number of line crossings between StockBEV and StockPHEV also indicates a somewhat negative correlation between the two variables. A country with a higher BEV stock has likely transitioned past PHEVs, which supports the earlier conclusions.
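The link between line crossings and correlation can be made concrete. Between two parallel axes oriented the same way, the lines of two records cross when their order differs on the two axes, so crossings correspond to discordant pairs in Kendall's rank correlation. The sketch below counts crossings and derives Kendall's tau-a from them. The function names are hypothetical and the numbers are toy values, not taken from the dataset.

```python
from itertools import combinations

def count_crossings(x, y):
    """Count pairs of records whose lines cross between two parallel axes."""
    crossings = concordant = 0
    for i, j in combinations(range(len(x)), 2):
        # product < 0: the two records swap order between the axes (a crossing)
        # product > 0: same order on both axes; product == 0: a tie, ignored
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s < 0:
            crossings += 1
        elif s > 0:
            concordant += 1
    return crossings, concordant

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    crossings, concordant = count_crossings(x, y)
    n_pairs = len(x) * (len(x) - 1) // 2
    if n_pairs == 0:
        return 0.0  # fewer than two records: no pairs to compare
    return (concordant - crossings) / n_pairs

# Toy values for illustration only (not taken from the dataset).
stock_bev = [900, 400, 250, 100]
stock_phev = [150, 300, 200, 50]
crossings, concordant = count_crossings(stock_bev, stock_phev)
tau = kendall_tau_a(stock_bev, stock_phev)
print(crossings, concordant, round(tau, 3))
```

A tau near -1 means almost every pair of lines crosses, a tau near +1 means almost none do, so a visual impression of "many crossings" can be checked against a number.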

Watching the graph year by year, we can see that countries are steadily migrating toward BEVs.  
This becomes evident as cumulative BEV stock grows faster than that of any other technology, while PHEV shares begin to decline as BEVs become the main technology in each country, and FCEVs remain minimal.


In [27]:
import numpy as np
import pandas as pd
from Isea import ParallelEnergy  # your class that uses energy_quad.js

# -------- visual parameters ----------
dims_all2 = (
    "StockBEV","StockPHEV","StockFCEV",
    "ChargingStations",
    # "SalesShare",
)
YEARS2 = [y for y in YEARS_ALL2 if y >= 2010] or YEARS_ALL2
latest2 = max(YEARS2)
topN2 = 60  

# -------- top countries by StockBEV in the most recent year ----------
col_latest2 = f"StockBEV__F{latest2}"
wide_top2 = (wide_ev2.assign(**{col_latest2: pd.to_numeric(wide_ev2.get(col_latest2), errors="coerce").fillna(0.0)})
                        .sort_values(col_latest2, ascending=False)
                        .head(topN2)
                        .reset_index(drop=True))

# -------- PACK builder for energy_quad.js ----------
def build_energy_pack2(wide_in2, dims2, years2, key_col2="Country"):
    records2 = []
    for _, row2 in wide_in2.iterrows():
        rec2 = {"label": row2[key_col2]}  # <- energy_quad looks up r.label
        has_data2 = False
        for d2 in dims2:
            series2 = []
            for y2 in years2:
                v2 = row2.get(f"{d2}__F{y2}", 0.0)
                v2 = 0.0 if (pd.isna(v2) or not np.isfinite(v2)) else float(v2)
                if v2 != 0.0:
                    has_data2 = True
                series2.append(v2)
            rec2[d2] = series2
        if has_data2:
            records2.append(rec2)
    return {"years": [f"F{y}" for y in years2], "dims": list(dims2), "records": records2}

pack2 = build_energy_pack2(wide_top2, dims_all2, YEARS2)

# -------- Minimal widget initialization (dummy) ----------
dummy2 = pd.DataFrame({
    "Country": ["__init__"] * len(dims_all2),
    "Technology_std": list(dims_all2),
    f"F{latest2}": [0.0] * len(dims_all2),
})

w2 = ParallelEnergy(
    dummy2, [f"F{latest2}"],
    tech_col="Technology_std",   # different from label_col
    label_col="Country",
    dims=dims_all2,
    year_start=latest2,
    width=1280,
    reorder=True, normalize=False, log_axes=False
)

# -------- Options for energy_quad.js ----------
w2.options = {
    **w2.options,
    "title": f"EV landscape — {latest2}",
    "unit": "",
    "right_share": 0.36, "right_width": 520,
    "left_height": 540, "table_height": 230,
    "reorder": True, "normalize": False, "log_axes": False,
}

# Inject the pack and render
w2.data = pack2
w2


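The pack built above stores one series per dimension for each country, which makes it possible to read technology shares year by year without the widget. The sketch below assumes a pack with the same `years` / `dims` / `records` layout. The helper name `tech_share_by_year` is hypothetical and the toy pack contains made-up values. Because the builder stores missing values as `0.0`, the shares treat missing data as zero stock.

```python
def tech_share_by_year(pack, tech, techs=("StockBEV", "StockPHEV", "StockFCEV")):
    """Share of `tech` in the summed stock of `techs`, per year, across all records."""
    shares = {}
    for k, year in enumerate(pack["years"]):
        total = sum(rec[t][k] for rec in pack["records"] for t in techs)
        part = sum(rec[tech][k] for rec in pack["records"])
        # None marks years with no recorded stock, avoiding a division by zero
        shares[year] = part / total if total > 0 else None
    return shares

# Toy pack for illustration only (hypothetical values).
toy_pack = {
    "years": ["F2020", "F2021"],
    "dims": ["StockBEV", "StockPHEV", "StockFCEV", "ChargingStations"],
    "records": [
        {"label": "A", "StockBEV": [60.0, 150.0], "StockPHEV": [40.0, 50.0],
         "StockFCEV": [0.0, 0.0], "ChargingStations": [10.0, 20.0]},
        {"label": "B", "StockBEV": [20.0, 50.0], "StockPHEV": [80.0, 150.0],
         "StockFCEV": [0.0, 0.0], "ChargingStations": [5.0, 8.0]},
    ],
}
bev = tech_share_by_year(toy_pack, "StockBEV")
phev = tech_share_by_year(toy_pack, "StockPHEV")
print(bev, phev)
```

In the toy pack, PHEV stock rises in absolute terms in both records while its share falls, which is the pattern described in the section above.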

# Conclusion
Across the full notebook, a clear picture emerges of how the EV transition unfolds globally and why countries diverge so sharply in their trajectories. The map and scatter views show that leadership in electrification is fluid: early European front-runners are later overtaken by China, and even across vehicle modes (cars, vans, buses) the hierarchy shifts over time. By combining sales share, fleet share and infrastructure metrics, the diagnostic scatter makes visible which countries are genuinely accelerating and which are simply coasting on past gains. The interaction modes reveal that vans once led early growth while buses lagged, but targeted national policies flipped this pattern and pushed buses into the fastest-growing segment in several markets. Infrastructure alignment further exposes where public charging lags behind electric fleet expansion, and where governments have been overbuilding as a strategic push. These insights only emerge when sales, stock and chargers are analysed together year by year.

The advanced analytics panel reinforces these patterns by showing that countries cluster into up to four adoption profiles, ranging from early-stage laggards to true leaders with aggressive infrastructure deployment. Regression and correlation analyses confirm that BEVs are now the dominant long-term path, that PHEVs function as a transitional technology with a declining share, and that infrastructure growth does not always track BEV scaling evenly across markets. The EnergyQuad ties all of this together: viewing parallel technology development over time makes the structural transition unmistakable. BEV stock grows exponentially, PHEVs plateau, and FCEVs remain marginal except in a few specialised cases. When you watch the lines shift year by year, you see a global system steadily converging on one outcome, full BEV dominance, while each country takes its own detours, bottlenecks and accelerations along the way.