# Client-side Lonboard in WebAssembly

[Pyodide](https://pyodide.org/) is a project to compile Python to WebAssembly for use in the browser. [JupyterLite](https://jupyterlite.readthedocs.io/en/stable/) is a version of JupyterLab that uses Pyodide, so it runs entirely in the browser. You're likely reading this notebook from inside JupyterLite right now!

As of v0.10, Lonboard works in Pyodide! This notebook is a port of the [DataFilterExtension notebook](https://developmentseed.org/lonboard/latest/examples/data-filter-extension/) to run in Pyodide. It uses [arro3](https://github.com/kylebarron/arro3) and [geoarrow-rust](https://geoarrow.org/geoarrow-rs/python/latest/) instead of pandas, GeoPandas, and pyarrow. These dependencies work in Pyodide and are more memory efficient.

This notebook still loads quite a lot of data. It worked on my laptop, but it could crash your browser tab.

## Dependencies

Any non-pure-Python libraries need to be installed from special wheels instead of directly from PyPI. Some of Lonboard's dependencies (e.g. `arro3`) and related libraries (`geoarrow.rust`) have portions of compiled code.

As of September 4, 2024, the wheels below are not yet included in the Pyodide distribution, so we install them from specific URLs in an S3 bucket we control. Once the next Pyodide release ships, these URLs will no longer be necessary.

In [1]:
import micropip

In [2]:
deps = [
    "https://ds-wheels.s3.amazonaws.com/arro3_core-0.3.0-cp312-cp312-emscripten_3_1_58_wasm32.whl",
    "https://ds-wheels.s3.amazonaws.com/arro3_compute-0.3.0-cp312-cp312-emscripten_3_1_58_wasm32.whl",
    "https://ds-wheels.s3.amazonaws.com/arro3_io-0.3.0-cp312-cp312-emscripten_3_1_58_wasm32.whl",
    "https://ds-wheels.s3.amazonaws.com/geoarrow_rust_core-0.3.0b1-cp38-abi3-emscripten_3_1_58_wasm32.whl",
    "palettable",
    "matplotlib",
    "lonboard==0.10.0b2"
]
await micropip.install(deps)
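
The tags at the end of each wheel filename are what mark it as a Pyodide build. As a rough illustration, the helper below (`parse_wheel_filename` is a hypothetical name, not part of micropip) splits a wheel URL into its standard components using plain string operations. It assumes the five-part form with no optional build tag.

```python
def parse_wheel_filename(url: str) -> dict:
    """Split a wheel URL or filename into its name, version, and tags.

    Assumes the form name-version-python-abi-platform.whl,
    i.e. no optional build tag is present.
    """
    filename = url.rsplit("/", 1)[-1]
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_filename(
    "https://ds-wheels.s3.amazonaws.com/"
    "arro3_core-0.3.0-cp312-cp312-emscripten_3_1_58_wasm32.whl"
)
# A wheel compiled for Pyodide targets Emscripten on 32-bit WebAssembly.
platform = info["platform_tag"]
is_wasm_wheel = platform.startswith("emscripten") and platform.endswith("wasm32")
```

Pure-Python wheels carry the `none-any` ABI and platform tags instead, so they need no Pyodide-specific build.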

## Imports

In [3]:
from io import BytesIO
from pathlib import Path

import numpy as np
import requests
from arro3.core import ChunkedArray, fixed_size_list_array
from arro3.io import read_parquet
from geoarrow.rust.core import ChunkedGeometryArray, centroid, from_wkt
from ipywidgets import FloatRangeSlider, jsdlink
from lonboard import Map, ScatterplotLayer
from lonboard.colormap import apply_continuous_cmap
from lonboard.controls import MultiRangeSlider
from lonboard.layer_extension import DataFilterExtension
from palettable.colorbrewer.diverging import BrBG_10
from pyodide.http import pyfetch

## Fetch data

Similarly to the upstream example, we'll use Parquet data directly from S3. It's not currently possible to compile async Rust-Python code for use in Pyodide, so we need to fetch the entire Parquet file content first, and then parse it into an Arrow table:

In [4]:
url = "https://ookla-open-data.s3.us-west-2.amazonaws.com/parquet/performance/type=mobile/year=2019/quarter=1/2019-01-01_performance_mobile_tiles.parquet"
local_path = Path("data-filter-extension.parquet")
r = requests.get(url)

Then parse this into an Arrow table using arro3:

In [5]:
table = read_parquet(BytesIO(r.content)).read_all()
table

arro3.core.Table
-----------
quadkey: Utf8
tile: Utf8
avg_d_kbps: Int64
avg_u_kbps: Int64
avg_lat_ms: Int64
tests: Int64
devices: Int64

In Pyodide, we're very memory constrained, and when working with large data, we should delete old references as soon as we no longer need them:

In [6]:
del r
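
To see why `del` helps, here is a small standalone sketch that uses a stand-in `Payload` class (hypothetical, in place of the response object). It relies on CPython's reference counting, which Pyodide inherits: once the last strong reference goes away, the object is freed immediately.

```python
import weakref


class Payload:
    """Stand-in for a large object such as a downloaded response."""

    def __init__(self, size: int) -> None:
        self.content = bytes(size)


payload = Payload(1_000_000)
# A weak reference lets us observe the object without keeping it alive.
watcher = weakref.ref(payload)
alive_before = watcher() is not None

del payload  # drop the only strong reference
alive_after = watcher() is not None
```

If another variable still points at the same object, `del` removes only the name and the memory stays in use.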

Parse the Well-Known Text (WKT) geometry in the `"tile"` column into GeoArrow geometries.

Note: the list comprehension below is a workaround. `from_wkt` will be updated to accept chunked input directly.

In [7]:
geometry = ChunkedArray([from_wkt(chunk) for chunk in table["tile"].chunks])

Then compute the centroid of each of these geometries:

In [8]:
centroids = centroid(geometry)
del geometry
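
For intuition about what these two steps produce, here is a pure-Python sketch that parses one simple WKT polygon and computes its area-weighted centroid with the shoelace formula. It is an illustration only, not how geoarrow-rust implements `from_wkt` or `centroid`. The helper names and the sample polygon are made up, and the parser handles only a single-ring `POLYGON` with no holes.

```python
def parse_simple_polygon(wkt: str) -> list[tuple[float, float]]:
    """Parse 'POLYGON((x y, x y, ...))' with a single ring into (x, y) pairs."""
    body = wkt.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is supported in this sketch")
    inner = body[body.index("((") + 2 : body.rindex("))")]
    ring = []
    for pair in inner.split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring


def polygon_centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Area-weighted centroid of a closed ring (first point == last point)."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The centroid formula divides by 6 * area, which equals 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


ring = parse_simple_polygon("POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))")
cx, cy = polygon_centroid(ring)
```

Each tile polygon collapses to a single point this way, which is what lets the data be drawn as a scatterplot.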

Now create a new table with our geometry column and with specific attribute columns:

In [9]:
geo_table = table.select(["avg_d_kbps", "avg_u_kbps", "avg_lat_ms"]).append_column("geometry", centroids)
del centroids, table
# Keep the table as the last expression so the notebook displays it.
geo_table

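Now we create a `DataFilterExtension`, just like the upstream notebook. We set `filter_size=3` because we will filter on three columns at once: download speed, upload speed, and latency.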

In [10]:
filter_extension = DataFilterExtension(filter_size=3)

In [11]:
avg_d_kbps = geo_table["avg_d_kbps"].to_numpy()
avg_u_kbps = geo_table["avg_u_kbps"].to_numpy()
avg_lat_ms = geo_table["avg_lat_ms"].to_numpy()

In [12]:
min_bound = 5000
max_bound = 50000
normalized_download_speed = (avg_d_kbps - min_bound) / (max_bound - min_bound)
fill_color = apply_continuous_cmap(normalized_download_speed, BrBG_10)
radius = normalized_download_speed * 200
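
The normalization above is a min-max scaling: `min_bound` maps to 0 and `max_bound` maps to 1. Speeds outside those bounds land outside the 0 to 1 range, so the scaled value can be negative or greater than 1. The dependency-free sketch below uses made-up sample values and a hypothetical `normalize` helper to show the effect, with an optional clamp. Whether you need the clamp depends on how the colormap function treats out-of-range input, which this notebook does not show.

```python
def normalize(values, min_bound, max_bound, clamp=False):
    """Min-max scale values so min_bound maps to 0.0 and max_bound to 1.0."""
    span = max_bound - min_bound
    scaled = [(v - min_bound) / span for v in values]
    if clamp:
        # Force every result into the closed interval [0.0, 1.0].
        scaled = [min(1.0, max(0.0, s)) for s in scaled]
    return scaled


speeds_kbps = [2_500, 5_000, 27_500, 50_000, 95_000]  # made-up sample values
raw = normalize(speeds_kbps, 5_000, 50_000)
clamped = normalize(speeds_kbps, 5_000, 50_000, clamp=True)
```

With NumPy arrays, `np.clip(normalized_download_speed, 0, 1)` applies the same clamp.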

In [13]:
filter_values = np.column_stack(
    [avg_d_kbps, avg_u_kbps, avg_lat_ms]
)
initial_filter_range = [
    [10_000, 50_000],
    [1000, 10_000],
    [0, 100],
]
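
Conceptually, each row carries three filter values and is drawn only when every value falls inside its corresponding range. The real filtering runs on the GPU in deck.gl. The pure-Python sketch below only mimics that logic for a few made-up rows, using a hypothetical `passes_filter` helper and assuming both bounds are inclusive.

```python
def passes_filter(row, ranges):
    """Return True when every value lies within its (low, high) range."""
    return all(low <= value <= high for value, (low, high) in zip(row, ranges))


ranges = [
    [10_000, 50_000],  # download, kbps
    [1000, 10_000],    # upload, kbps
    [0, 100],          # latency, ms
]
rows = [
    (25_000, 5_000, 40),   # all three values in range
    (8_000, 5_000, 40),    # download below its lower bound
    (25_000, 5_000, 250),  # latency above its upper bound
]
visible = [passes_filter(row, ranges) for row in rows]
```

A single out-of-range value is enough to hide a point, so narrowing any one slider can only reduce the number of points shown.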

We create a `ScatterplotLayer` with the filter extension applied:

In [14]:
layer = ScatterplotLayer(
    table=geo_table,
    extensions=[filter_extension],
    get_fill_color=fill_color,
    get_radius=radius,
    get_filter_value=filter_values,
    filter_range=initial_filter_range,
    radius_units="meters",
    radius_min_pixels=0.9,
)
m = Map(layer)
m

Map(custom_attribution='', layers=(ScatterplotLayer(extensions=(DataFilterExtension(filter_size=3),), filter_r…

Now we create our sliders to manage filter state between Python and JavaScript:

In [15]:
download_slider = FloatRangeSlider(value=initial_filter_range[0], min=0, max=70_000, step=0.1, description="Download: ")
upload_slider = FloatRangeSlider(value=initial_filter_range[1], min=0, max=50_000, step=1, description="Upload: ")
latency_slider = FloatRangeSlider(value=initial_filter_range[2], min=0, max=500, step=1, description="Latency: ")
multi_slider = MultiRangeSlider([download_slider, upload_slider, latency_slider])
jsdlink((multi_slider, "value"), (layer, "filter_range"))
multi_slider

MultiRangeSlider(children=(FloatRangeSlider(value=(10000.0, 50000.0), description='Download: ', max=70000.0), …