# Session 5: Mapping Data Centers

**Goal:** Map and analyze the geographic distribution of data centers.

In this notebook, we will:
1.  Load a real dataset of data center locations.
2.  Convert raw coordinates (Lat/Lon) into spatial geometries using DuckDB.
3.  Explore patterns in Northern Virginia and across the US.

## 1. Setup and Data Loading

We use `ibis` to interact with DuckDB. DuckDB's `spatial` extension adds a geometry type and `ST_*` functions (such as `ST_Point`), which we need to turn coordinates into points.

### Action Item 1: Setup Environment

> **Prompt your Agent:**
> "Import ibis, pandas, anymap, and geopandas. Connect to a local DuckDB instance and ensure the spatial extension is installed and loaded."

In [7]:
try:
    import anymap
except ModuleNotFoundError:
    import sys
    import subprocess
    try:
        import ensurepip
        ensurepip.bootstrap()
    except Exception:
        pass
    subprocess.check_call([sys.executable, "-m", "pip", "install", "anymap"])
    import anymap

import ibis
import pandas as pd
import geopandas as gpd

ibis.options.interactive = True

con = ibis.duckdb.connect()
con.raw_sql("INSTALL spatial")
con.raw_sql("LOAD spatial")

con

<ibis.backends.duckdb.Backend at 0x1f329cafc20>

### Action Item 2: Load Data

We need to load the data center locations from a public CSV file.

**URL:** `https://s3-west.nrp-nautilus.io/public-datacenters/data_centers.csv`

> **Prompt your Agent:**
> "Load the CSV file from the URL provided into an Ibis table. Inspect the first few rows and check the column names."

In [8]:
url = "https://s3-west.nrp-nautilus.io/public-datacenters/data_centers.csv"
data_centers = con.sql(f"SELECT * FROM read_csv_auto('{url}')")

(data_centers.head(5), data_centers.columns)

| provider | region_name | type | metro | country | latitude | lon… |
|---|---|---|---|---|---|---|
| string | string | string | string | string | string | flo… |
| Huawei Cloud | LA Buenos Aires | Cloud Region | Buenos Aires | Argentina | -… | … |

*(output truncated)*
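The preview types `latitude` as `string`, so it is worth checking which values would not survive a cast to float before building geometries. The sketch below is a library-free illustration of that check on a small inline sample. The sample rows and the helper names (`is_float`, `non_numeric_values`) are hypothetical, and the column names are assumed from the preview and the prompts in this notebook.

```python
import csv
import io

# Hypothetical sample: column names are assumed, values are made up.
SAMPLE = """provider,country,latitude,longitude
Example Cloud,United States,39.04,-77.49
Example Cloud,United States,unknown,-121.89
"""


def is_float(text):
    """Return True if `text` can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def non_numeric_values(csv_text, column):
    """Return the values in `column` that cannot be parsed as floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if not is_float(row[column])]


bad_latitudes = non_numeric_values(SAMPLE, "latitude")
bad_longitudes = non_numeric_values(SAMPLE, "longitude")
print("latitude:", bad_latitudes)
print("longitude:", bad_longitudes)
```

Any value reported here would need to be cleaned or dropped before the column is cast to a numeric type.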

## 2. Creating Spatial Points

Latitude and longitude are just numbers (in the preview above, `latitude` was even read as a string). To do spatial analysis, we need to convert them into **geometries** (points).
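As a minimal, library-free sketch of what "converting to a point" means, the snippet below casts raw coordinate values to floats and wraps them in a GeoJSON `Point`. The helper name `to_point` and the sample row are hypothetical. The thing to notice is the coordinate order: like `ST_Point(x, y)`, GeoJSON puts longitude (x) before latitude (y).

```python
import json


def to_point(longitude, latitude):
    """Build a GeoJSON Point from raw coordinate values (strings or numbers)."""
    lon = float(longitude)  # cast, since a CSV reader may deliver text
    lat = float(latitude)
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range: lon={lon}, lat={lat}")
    # Longitude (x) comes first, latitude (y) second.
    return {"type": "Point", "coordinates": [lon, lat]}


# Hypothetical sample row, not taken from the dataset.
row = {"latitude": "39.0438", "longitude": -77.4874}
point = to_point(row["longitude"], row["latitude"])
print(json.dumps(point))
```

The range check catches values that cannot be coordinates at all, but it cannot detect every swapped pair, because a latitude is also a valid longitude.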

### Action Item 3: Create Geometries

> **Prompt your Agent:**
> "Create a new column `geom` by converting the `longitude` and `latitude` columns into points using the appropriate spatial function. Make sure to cast latitude/longitude to float if necessary. Filter the data to only include data centers in the 'United States'."

In [9]:
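us_data_centers = (
    data_centers
    .mutate(
        # Points are built as (x, y), i.e. (longitude, latitude).
        # Cast first, because a coordinate column may have been read as text.
        geom=data_centers.longitude.cast("float").point(
            data_centers.latitude.cast("float")
        )
    )
    # Keep only rows whose country is exactly "United States".
    .filter(data_centers.country == "United States")
)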

us_data_centers.head(5)

## 3. Visualize with AnyMap

`anymap` allows us to visualize thousands of points interactively.

### Action Item 4: Interactive Map

> **Prompt your Agent:**
> "Convert the Ibis table to a GeoDataFrame. Use `anymap` to create an interactive map of the data center locations. Save the map as '01-data_centers.html' and display it."

In [10]:
from IPython.display import HTML, display

us_gdf = us_data_centers.execute()
us_gdf = gpd.GeoDataFrame(us_gdf, geometry="geom", crs="EPSG:4326")

def _zones_to_count(value):
    if value is None:
        return 1
    if isinstance(value, (list, tuple)):
        return max(len(value), 1)
    if isinstance(value, str):
        cleaned = [item for item in value.split(",") if item.strip()]
        return max(len(cleaned), 1)
    try:
        return max(int(value), 1)
    except Exception:
        return 1

us_gdf["size_value"] = us_gdf["zones"].apply(_zones_to_count)

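# Keep the five most frequent providers and group everything else as "Other".
# Note: a top-five provider that is missing from `provider_colors` below
# falls through to the grey default in the match expression.
top_providers = us_gdf["provider"].value_counts().head(5).index.tolist()
us_gdf["provider_group"] = us_gdf["provider"].where(us_gdf["provider"].isin(top_providers), "Other")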

provider_colors = {
    "Amazon Web Services": "#003262",
    "Microsoft Azure": "#FDB515",
    "Google Cloud": "#3B7EA1",
    "Oracle Cloud": "#C4820E",
    "IBM Cloud": "#6CACE4",
    "Other": "#B3B3B3",
}

color_match = ["match", ["get", "provider_group"]]
for provider, color in provider_colors.items():
    color_match.extend([provider, color])
color_match.append("#B3B3B3")

m = anymap.Map(center=[39, -98], zoom=4, height="600px")
m.add_vector(
    us_gdf,
    layer_type="circle",
    paint={
        "circle-radius": ["interpolate", ["linear"], ["get", "size_value"], 1, 3, 3, 6, 6, 9],
        "circle-color": color_match,
        "circle-opacity": 0.85,
    },
    name="us_data_centers",
)

m.to_html("01-data_centers.html", title="US Data Centers")

legend_items = "".join(
    f"<div><span style='display:inline-block;width:12px;height:12px;background:{color};margin-right:6px;'></span>{provider}</div>"
    for provider, color in provider_colors.items()
)

display(
    HTML(
        f"""
        <div style='font-family: Arial; font-size: 12px; line-height: 1.4;'>
          <div style='font-weight: 600; margin-bottom: 4px;'>Owner (top providers)</div>
          {legend_items}
        </div>
        """
    )
)

m

<anymap.maplibre.MapLibreMap object at 0x000001F329F94320>
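The `paint` dictionary in the map cell above uses two MapLibre style expressions: `interpolate` for the circle radius and `match` for the circle color. The sketch below emulates both in plain Python to show what values they produce. The function names `interpolate_linear` and `match` are illustrative stand-ins, not part of `anymap` or MapLibre, and only a subset of the color table is reproduced.

```python
def interpolate_linear(value, stops):
    """Emulate ["interpolate", ["linear"], input, x0, y0, x1, y1, ...].

    `stops` is a list of (input, output) pairs sorted by input.
    Values outside the stop range are clamped to the nearest stop's output.
    """
    if value <= stops[0][0]:
        return float(stops[0][1])
    if value >= stops[-1][0]:
        return float(stops[-1][1])
    for (x0, y0), (x1, y1) in zip(stops, stops[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)


def match(value, cases, default):
    """Emulate ["match", input, label0, out0, ..., default]."""
    return cases.get(value, default)


# Same radius stops as the map cell, and a subset of its colors.
radius_stops = [(1, 3), (3, 6), (6, 9)]
colors = {"Amazon Web Services": "#003262", "Other": "#B3B3B3"}

for zones in (1, 2, 4, 10):
    print(zones, "->", interpolate_linear(zones, radius_stops))

print(match("Amazon Web Services", colors, "#B3B3B3"))
print(match("Some Unlisted Provider", colors, "#B3B3B3"))
```

With these stops, a location with 2 zones gets a radius of 4.5 pixels, and anything with 6 or more zones is capped at 9. Any `provider_group` value without an entry in the color table is drawn in the grey default.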