
# World Bank Data Visualization Playground

This notebook shows how to fetch data using **`wbdata`** (World Bank Data) and visualize it with multiple Python libraries:

- Matplotlib
- Seaborn
- Plotly (Graph Objects & Express)
- Altair
- Bokeh

It also provides starter app files for **Streamlit** and **Dash** for quick interactive dashboards, plus a brief note on vendor BI tools.



## 1) Environment setup

Run the following cell to install dependencies (safe to re-run). If you're on a managed environment (e.g., Codespaces, Colab), you may already have some of these.


In [None]:

%pip -q install wbdata pandas numpy matplotlib seaborn plotly altair bokeh jinja2 --upgrade



## 2) Imports & Configuration


In [None]:

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import wbdata as wb
import datetime as dt

import matplotlib.pyplot as plt
import seaborn as sns

import plotly.graph_objects as go
import plotly.express as px

import altair as alt

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import HoverTool
output_notebook()

# pandas display
pd.set_option("display.max_rows", 8)
pd.set_option("display.max_columns", None)



## 3) Fetch World Bank data with `wbdata`

We'll pull **GDP per capita (current US$)** (`NY.GDP.PCAP.CD`) for a handful of countries from **2000 through the most recent available year**.


In [None]:

# Choose a few countries by ISO codes
countries = ["USA", "GBR", "DEU", "IND", "CHN", "BRA", "ZAF"]

# Indicator: GDP per capita (current US$)
indicator = {"NY.GDP.PCAP.CD": "gdp_per_capita_usd"}

# Date range (2000 -> today)
data_date = (dt.datetime(2000, 1, 1), dt.datetime.today())

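# Fetch (keyword names follow wbdata >= 1.0; earlier releases used data_date= and convert_date=)
df_raw = wb.get_dataframe(indicator, country=countries, date=data_date, parse_dates=True)

# get_dataframe labels rows with country names, so build a name -> ISO3 lookup
name_to_iso3 = {c["name"]: c["id"] for c in wb.get_countries(country_id=countries)}

# Tidy up the multi-index (country, date) and store ISO3 codes in "country"
# so that later filters such as df["country"] == "USA" match
df = df_raw.reset_index()
df["country"] = df["country"].replace(name_to_iso3)
df = df.sort_values(["country", "date"])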

# Peek
df.head()
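
To make the shape of the fetched data concrete without a network call, here is a minimal sketch on a hand-built frame. It assumes a `(country, date)` MultiIndex whose country level holds display names rather than ISO codes, as `wb.get_dataframe` returns; the sample values and the `name_to_iso3` dictionary are invented for illustration.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by the fetch step (values are invented)
index = pd.MultiIndex.from_product(
    [["Germany", "United States"], pd.to_datetime(["2021-01-01", "2022-01-01"])],
    names=["country", "date"],
)
raw = pd.DataFrame(
    {"gdp_per_capita_usd": [51000.0, 48700.0, 70200.0, 76300.0]}, index=index
)

# Hypothetical lookup from display name to ISO3 code
name_to_iso3 = {"Germany": "DEU", "United States": "USA"}

tidy = raw.reset_index()                               # index levels become columns
tidy["country"] = tidy["country"].map(name_to_iso3)    # names -> ISO3 codes
tidy = tidy.sort_values(["country", "date"]).reset_index(drop=True)
print(tidy)
```

The result is one row per country and year, which is the long format the plotting cells in this notebook expect.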



## 4) Quick EDA


In [None]:

summary = (
    df.groupby("country")["gdp_per_capita_usd"]
    .agg(["count", "min", "median", "mean", "max"])
    .round(2)
    .sort_values("mean", ascending=False)
)
summary
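
Summary statistics on levels say little about growth. As one possible extension, the sketch below computes a compound annual growth rate (CAGR) per country between the first and last non-null observation. The `sample` frame and the `cagr` helper are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Invented sample in the same long format as df (country, date, value)
sample = pd.DataFrame({
    "country": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "date": pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"] * 2),
    "gdp_per_capita_usd": [100.0, 110.0, 121.0, 200.0, 200.0, None],
})

def cagr(group: pd.DataFrame) -> float:
    """CAGR between the first and last non-null observation of one country."""
    valid = group.dropna(subset=["gdp_per_capita_usd"]).sort_values("date")
    if len(valid) < 2:
        return float("nan")  # not enough data to measure growth
    years = valid["date"].iloc[-1].year - valid["date"].iloc[0].year
    ratio = valid["gdp_per_capita_usd"].iloc[-1] / valid["gdp_per_capita_usd"].iloc[0]
    return ratio ** (1 / years) - 1

growth = {name: cagr(group) for name, group in sample.groupby("country")}
print(growth)
```

Dropping nulls first matters because the most recent year is often present in the index but not yet populated.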



## 5) Matplotlib line chart


In [None]:

plt.figure(figsize=(10,6))
for c, sub in df.groupby("country"):  # one line per country present in df
    plt.plot(sub["date"], sub["gdp_per_capita_usd"], label=c)
plt.title("GDP per capita (current US$)")
plt.xlabel("Year")
plt.ylabel("USD")
plt.legend()
plt.grid(True)
plt.show()



## 6) Seaborn lineplot + facet


In [None]:

sns.lineplot(data=df, x="date", y="gdp_per_capita_usd", hue="country")
plt.title("GDP per capita by Country")
plt.xlabel("Year")
plt.ylabel("USD")
plt.show()

# FacetGrid example (small multiples)
g = sns.FacetGrid(df, col="country", col_wrap=3, sharey=False, height=3)
g.map_dataframe(sns.lineplot, x="date", y="gdp_per_capita_usd")
g.set_titles("{col_name}")
for ax in g.axes.flatten():
    ax.set_xlabel("Year")
    ax.set_ylabel("USD")
plt.show()



## 7) Plotly (Graph Objects) interactive line


In [None]:

fig = go.Figure()
for c, sub in df.groupby("country"):  # one trace per country present in df
    fig.add_trace(go.Scatter(x=sub["date"], y=sub["gdp_per_capita_usd"], mode="lines", name=c))
fig.update_layout(
    title="GDP per capita (current US$)",
    xaxis_title="Year",
    yaxis_title="USD",
    hovermode="x unified"
)
fig.show()



## 8) Plotly Express quick chart


In [None]:

fig = px.line(df, x="date", y="gdp_per_capita_usd", color="country",
              title="GDP per capita (current US$) — Plotly Express")
fig.show()



## 9) Altair


In [None]:

alt.data_transformers.disable_max_rows()
chart = (
    alt.Chart(df)
    .mark_line()
    .encode(
        x="date:T",
        y=alt.Y("gdp_per_capita_usd:Q", title="USD"),
        color="country:N",
        tooltip=["country", alt.Tooltip("date:T", title="Year"), alt.Tooltip("gdp_per_capita_usd:Q", format=",.0f", title="USD")]
    )
    .properties(width=700, height=400, title="GDP per capita (current US$) — Altair")
    .interactive()
)
chart



## 10) Bokeh


In [None]:

p = figure(title="GDP per capita (current US$) - Bokeh", x_axis_type="datetime", width=800, height=400)

# A single hover tool: the %Y date format only works when the field is declared as datetime
p.add_tools(HoverTool(
    tooltips=[("Country", "$name"), ("Year", "@x{%Y}"), ("USD", "@y{0,0}")],
    formatters={"@x": "datetime"},
    mode="vline",
))

for c, sub in df.groupby("country"):  # one line per country present in df
    p.line(x=sub["date"], y=sub["gdp_per_capita_usd"], legend_label=c, name=c)

p.xaxis.axis_label = "Year"
p.yaxis.axis_label = "USD"
p.legend.click_policy = "hide"
show(p)



## 11) Note on vendor BI tools (Power BI, Tableau, etc.)

For production dashboards, or wherever enterprise governance, data modeling, and distribution matter most, consider a vendor BI solution:

- **Power BI**: tight Microsoft 365/Teams integration, robust DAX modeling, row-level security, and enterprise deployment pipelines.
- **Tableau**: powerful visual analytics and storytelling, a strong viz grammar, and mature Server/Cloud options for sharing.
- **Looker/Looker Studio**: semantic modeling (LookML), Google Cloud integration, and lightweight reporting.

Python notebooks remain excellent for rapid prototyping, data science exploration, and custom analytics that go beyond the built-in visuals these tools offer.
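
A common hand-off from a notebook to one of these tools is a flat CSV file. The sketch below shows one way to do that, assuming the long format used in this notebook; the sample values are invented, and the in-memory buffer stands in for a real file path.

```python
import io
import pandas as pd

# Invented sample in the notebook's long format
sample = pd.DataFrame({
    "country": ["USA", "USA"],
    "date": pd.to_datetime(["2021-01-01", "2022-01-01"]),
    "gdp_per_capita_usd": [70219.47, 76329.58],
})

# Annual data: export a plain integer year instead of a timestamp
export = sample.assign(year=sample["date"].dt.year).drop(columns="date")

buffer = io.StringIO()            # replace with a file path to write to disk
export.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```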



## 12) Appendix — Change indicators or countries

- To use a different indicator, replace `NY.GDP.PCAP.CD` with any World Bank indicator code (e.g., `SP.POP.TOTL` for population).
- To change countries, update the `countries = [...]` list with ISO3 codes.
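
The indicator dictionary can also hold several entries, in which case each indicator becomes its own column. The sketch below assumes that layout and reshapes it into one row per country, date, and indicator, which suits faceted or color-coded charts. The `wide` frame is an invented stand-in, not fetched data.

```python
import pandas as pd

# Hypothetical two-indicator request: keys are World Bank codes, values are column names
indicators = {"NY.GDP.PCAP.CD": "gdp_per_capita_usd", "SP.POP.TOTL": "population"}

# Invented stand-in for the tidy frame such a request would produce
wide = pd.DataFrame({
    "country": ["IND", "IND"],
    "date": pd.to_datetime(["2021-01-01", "2022-01-01"]),
    "gdp_per_capita_usd": [2250.0, 2390.0],
    "population": [1.41e9, 1.42e9],
})

# One row per (country, date, indicator)
long = wide.melt(
    id_vars=["country", "date"],
    value_vars=list(indicators.values()),
    var_name="indicator",
    value_name="value",
)
print(long)
```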


In [None]:

# Example: switch to population and re-run the EDA/plot cells
# Note: later cells reference the "gdp_per_capita_usd" column and GDP titles, so update those too
# indicator = {"SP.POP.TOTL": "population"}
# df_raw = wb.get_dataframe(indicator, country=countries, date=data_date, parse_dates=True)
# df = df_raw.reset_index()
# df["country"] = df["country"].replace({c["name"]: c["id"] for c in wb.get_countries(country_id=countries)})
# df = df.sort_values(["country", "date"])
# df.head()



## 13) Optional extras — more visualization libraries

Run this once if you want the additional approaches below.


In [None]:

%pip -q install hvplot holoviews panel datashader geopandas folium networkx pyvis plotnine --upgrade



## 14) hvPlot / HoloViews (interactive, declarative)

`hvPlot` gives you a **high-level, pandas-like** plotting API on top of **HoloViews**, with Bokeh, Matplotlib, or Plotly as the rendering backend.


In [None]:

import hvplot.pandas  # registers .hvplot accessor
import holoviews as hv
hv.extension('bokeh')

# Interactive line using hvPlot
(df
 .set_index('date')
 .pivot(columns='country', values='gdp_per_capita_usd')
 .hvplot.line(width=800, height=400, legend='top_left', title='GDP per capita — hvPlot'))



## 15) Panel mini-dashboard (HoloViz)

A lightweight dashboard alternative to Streamlit/Dash that works with Bokeh/Plotly/Altair/HoloViews.


In [None]:

import panel as pn
pn.extension('plotly')

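# Default to the first three entries so the initial value is always a valid option
country_options = sorted(df['country'].unique().tolist())
countries_select = pn.widgets.MultiChoice(name='Countries', value=country_options[:3], options=country_options)

# pn.bind in the last line wires this function to the widget, so no @pn.depends decorator is needed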
def panel_plot(countries):
    sub = df[df['country'].isin(countries)]
    return px.line(sub, x='date', y='gdp_per_capita_usd', color='country', title='GDP per capita — Panel + Plotly')

pn.Row(countries_select, pn.bind(panel_plot, countries_select))



## 16) Folium (interactive maps)

Choropleth map of the **latest GDP per capita** by country using `folium` + a public world GeoJSON.


In [None]:

import json
import urllib.request

import folium

# Latest non-null observation per country (the newest year is often not published yet)
latest = (df.dropna(subset=['gdp_per_capita_usd'])
            .sort_values('date')
            .groupby('country')
            .tail(1))
# The GeoJSON below is keyed by ISO3 code, so df['country'] must hold ISO3 codes to match
latest = latest[['country','gdp_per_capita_usd']].rename(columns={'country':'ISO_A3','gdp_per_capita_usd':'value'})

# Load world boundaries. TLS certificate verification stays enabled;
# if the download fails with a certificate error, fix the local CA bundle rather than disabling checks.
url = "https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json"
with urllib.request.urlopen(url) as response:
    world_geo = json.loads(response.read().decode())

m = folium.Map(location=[20,0], zoom_start=2, tiles="cartodbpositron")
folium.Choropleth(
    geo_data=world_geo,
    data=latest,
    columns=['ISO_A3','value'],
    key_on='feature.id',
    fill_opacity=0.8, line_opacity=0.2,
    legend_name='GDP per capita (current US$) — latest',
).add_to(m)
m
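
One caveat with "latest": the most recent year can be present in the data but still empty for some countries. A minimal sketch of taking the last non-null observation per country, on invented data, might look like this:

```python
import numpy as np
import pandas as pd

# Invented sample: AAA has no 2023 value yet
sample = pd.DataFrame({
    "country": ["AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(["2022-01-01", "2023-01-01"] * 2),
    "gdp_per_capita_usd": [10.0, np.nan, 20.0, 25.0],
})

latest_valid = (
    sample.dropna(subset=["gdp_per_capita_usd"])   # discard unpublished years
    .sort_values("date")
    .groupby("country")
    .tail(1)                                       # last remaining row per country
    .set_index("country")
)
print(latest_valid)
```

Here AAA falls back to its 2022 value, so the years shown on a map built this way can differ between countries.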



## 17) Network visualization (NetworkX + PyVis)

Build a simple **similarity network**: connect countries whose GDP per capita **correlation** exceeds a threshold.


In [None]:

import networkx as nx
from pyvis.network import Network

# Pivot to wide for correlation
wide = (df.pivot_table(index='date', columns='country', values='gdp_per_capita_usd'))
corr = wide.corr(min_periods=10)

G = nx.Graph()
for c in corr.columns:
    G.add_node(c)
threshold = 0.95
for i in corr.columns:
    for j in corr.columns:
        if i < j and corr.loc[i,j] >= threshold:
            G.add_edge(i,j, weight=float(corr.loc[i,j]))

net = Network(height="500px", width="100%", notebook=True)
net.from_nx(G)
net.show("network.html")
from IPython.display import IFrame
IFrame(src="network.html", width="100%", height=520)
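
Series that share a long-run trend tend to show level correlations close to 1 even when their year-to-year movements are unrelated, which can make almost every pair clear a 0.95 threshold. As a sketch of one alternative (an assumption, not what the cell above does), the example below compares the correlation of levels with the correlation of year-over-year changes on two invented series.

```python
import numpy as np
import pandas as pd

t = np.arange(24)
# Two invented series that share an upward trend but wiggle on different schedules
series = pd.DataFrame({
    "AAA": 100 + 10 * t + np.where(t % 2 == 0, 1.0, -1.0),
    "BBB": 50 + 5 * t + np.where((t // 2) % 2 == 0, 1.0, -1.0),
})

level_corr = series.corr().loc["AAA", "BBB"]           # dominated by the shared trend
change_corr = series.diff().corr().loc["AAA", "BBB"]   # year-over-year changes only
print(round(level_corr, 3), round(change_corr, 3))
```

If the goal is to link countries whose economies move together, building `corr` from `wide.diff()` or `wide.pct_change()` may give a more informative network.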



## 18) Plotnine (ggplot-style)

A grammar-of-graphics API similar to R's ggplot2.


In [None]:

from plotnine import ggplot, aes, geom_line, facet_wrap, labs, theme_minimal

p = (ggplot(df, aes('date','gdp_per_capita_usd', color='country'))
     + geom_line()
     + labs(title='GDP per capita — Plotnine', x='Year', y='USD')
     + theme_minimal())
p



## 19) Datashader (very large data)

For millions of points or lines, use **Datashader** to rasterize efficiently. The example below simulates a large time series for demonstration.


In [None]:

import datashader as ds
import datashader.transfer_functions as tf

# Simulate a large time series: one million one-minute steps of a random walk
# (that many daily steps would run past the latest date pandas can represent)
n = 1_000_000
rng = pd.date_range('2000-01-01', periods=n, freq='min')
vals = np.cumsum(np.random.randn(n))
# Carry time as integer nanoseconds so it matches the integer x_range below
big = pd.DataFrame({'t': rng.astype('int64'), 'value': vals})

canvas = ds.Canvas(x_range=(big['t'].min(), big['t'].max()), y_range=(big['value'].min(), big['value'].max()), plot_width=800, plot_height=300)
agg = canvas.line(big, 't', 'value')
tf.shade(agg)
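
To give a feel for what rasterizing means here, the sketch below bins one million points into an 800 x 300 grid of counts with plain NumPy. It is a conceptual stand-in only, not Datashader's implementation, and it counts points rather than drawing connected line segments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = np.arange(n, dtype=float)             # stand-in for a time axis
y = np.cumsum(rng.standard_normal(n))     # random walk

# Count how many points land in each cell of an 800 x 300 pixel grid
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(800, 300))
print(counts.shape, int(counts.sum()))
```

The grid has a fixed size no matter how many points go in, which is why the rendered image stays cheap to display as the data grows.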



## 20) Notes on other approaches (at a glance)

- **Geospatial**: GeoPandas + Cartopy (projection-aware), Kepler.gl (Jupyter widget), Deck.gl via `pydeck`.
- **3D & scientific**: PyVista, Mayavi, VTK for volumetric/mesh data; Plotly 3D for quick surfaces.
- **Interactive widgets**: `ipywidgets`, `ipympl`, `ipycytoscape` for rich Jupyter interactivity.
- **Notebook → app**: Voilà (turn notebooks into apps), **Shiny for Python** (a Python version of R's Shiny), **Gradio** (fast ML demos).
- **Reporting**: `nbconvert`/`papermill`, and static site builds with Quarto.

### Google visualization approaches
- **Google Charts (Visualization API)**: Rich interactive charts for the web; you can serve data from Python (e.g., with `gviz_api`) and render in HTML/JS.
- **Looker Studio (formerly Data Studio)**: Free browser-based dashboards connecting to Sheets, BigQuery, and more.
- **Looker (LookML)**: Enterprise semantic layer + governed dashboards; strong integration with BigQuery.
- **Google Sheets**: Built-in charts; use **Connected Sheets** for BigQuery-scale analysis without SQL.
- **Colab**: Good default table/plot experiences; supports rendering Plotly/Altair/Bokeh; easy to share notebooks with viewers in Google Drive.
