document Mosaic #1012

Merged
merged 6 commits into from Mar 7, 2024
82 changes: 82 additions & 0 deletions docs/lib/mosaic.md
@@ -0,0 +1,82 @@
---
sql:
trips: nyc-taxi.parquet
---

# Mosaic

[Mosaic](https://uwdata.github.io/mosaic/) is a system for linking data visualizations, tables, and input widgets, all backed by a database (DuckDB) for scalable processing. With Mosaic, you can interactively visualize and explore millions, and even billions, of data points.

The example below shows the pickup and dropoff points of 1 million taxi rides in New York City on Jan 1–3, 2010. The dataset is stored in an 8MB [Apache Parquet](./arrow#apache-parquet) file, generated with a data loader.

${maps}

${histogram}

The map views for pickups and dropoffs are coordinated. You can also select an interval in the histogram to filter the maps. _What spatial patterns can you find?_
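
One way to picture this kind of coordination is as cross-filtering: each view publishes a predicate for its current selection, and each view is filtered by every predicate except its own. The block below is a minimal in-memory sketch of that idea only. The `crossfilterSketch` helper, the view names, and the sample rows are hypothetical, and this is not how Mosaic implements it (Mosaic resolves selections into database queries).

```javascript
// Hypothetical in-memory sketch of cross-filtering; not Mosaic's implementation.
function crossfilterSketch() {
  const clauses = new Map(); // source view -> predicate (row => boolean)
  return {
    // A view publishes the predicate for its current brush, or null to clear it.
    update(source, predicate) {
      if (predicate == null) clauses.delete(source);
      else clauses.set(source, predicate);
    },
    // Rows visible to a view: every clause must pass except the view's own.
    rowsFor(view, rows) {
      const active = [...clauses]
        .filter(([source]) => source !== view)
        .map(([, predicate]) => predicate);
      return rows.filter((row) => active.every((predicate) => predicate(row)));
    }
  };
}

// Made-up rows that reuse the column names of the trips table.
const sampleRides = [
  {time: 1.5, px: 100, dx: 900},
  {time: 9.0, px: 200, dx: 800},
  {time: 9.5, px: 300, dx: 100}
];

const sketch = crossfilterSketch();
sketch.update("histogram", (d) => d.time >= 8 && d.time < 10); // brush 8:00 to 10:00
sketch.update("pickups", (d) => d.px < 250); // brush the left part of the pickup map

// The pickup map ignores its own brush, so only the histogram clause applies.
const visibleInPickups = sketch.rowsFor("pickups", sampleRides);
// The dropoff map has no brush of its own, so both clauses apply.
const visibleInDropoffs = sketch.rowsFor("dropoffs", sampleRides);
```

Skipping a view's own clause is what lets a brushed map keep showing its full context while the other views narrow down.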

---

The code below creates three views, coordinated by Mosaic’s [crossfilter](https://uwdata.github.io/mosaic/api/core/selection.html#selection-crossfilter) helper.

```js echo
// Create a shared filter
const $filter = vg.Selection.crossfilter();

// Create the maps
const defaultAttributes = [
  vg.width(335),
  vg.height(550),
  vg.margin(0),
  vg.xAxis(null),
  vg.yAxis(null),
  vg.xDomain([297000, 297000 + 28.36 * 335]),
  vg.yDomain([57900, 57900 + 28.36 * 550]), // ensure aspect ratio of 1
  vg.colorScale("symlog")
];

const maps = vg.hconcat(
  vg.plot(
    vg.raster(vg.from("trips", {filterBy: $filter}), {x: "px", y: "py", imageRendering: "pixelated"}),
    vg.intervalXY({as: $filter}),
    vg.text([{label: "Taxi Pickups"}], {
      dx: 10,
      dy: 10,
      text: "label",
      fill: "black",
      fontSize: "1.2em",
      frameAnchor: "top-left"
    }),
    ...defaultAttributes,
    vg.colorScheme("blues")
  ),
  vg.hspace(10),
  vg.plot(
    vg.raster(vg.from("trips", {filterBy: $filter}), {x: "dx", y: "dy", imageRendering: "pixelated"}),
    vg.intervalXY({as: $filter}),
    vg.text([{label: "Taxi Dropoffs"}], {
      dx: 10,
      dy: 10,
      text: "label",
      fill: "black",
      fontSize: "1.2em",
      frameAnchor: "top-left"
    }),
    ...defaultAttributes,
    vg.colorScheme("oranges")
  )
);

// Create the histogram
const histogram = vg.plot(
  vg.rectY(vg.from("trips"), {x: vg.bin("time"), y: vg.count(), fill: "steelblue", inset: 0.5}),
  vg.intervalX({as: $filter}),
  vg.yTickFormat("s"),
  vg.xLabel("Pickup Hour"),
  vg.yLabel("Number of Rides"),
  vg.width(680),
  vg.height(100)
);
```
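
The `xDomain` and `yDomain` values above both span 28.36 data units per pixel, which is what keeps the aspect ratio at 1 once the margins are set to 0. As a small sketch of that arithmetic, the hypothetical `squareDomains` helper below (not part of Mosaic) derives both domains from an origin, a scale, and the plot size, then checks that each axis covers the same extent per pixel.

```javascript
// Hypothetical helper: derive x and y domains that share one scale
// (data units per pixel), so shapes are not stretched on either axis.
function squareDomains({x0, y0, unitsPerPixel, width, height}) {
  return {
    xDomain: [x0, x0 + unitsPerPixel * width],
    yDomain: [y0, y0 + unitsPerPixel * height]
  };
}

// Same numbers as the map attributes above.
const mapDomains = squareDomains({x0: 297000, y0: 57900, unitsPerPixel: 28.36, width: 335, height: 550});

// Extent covered by one pixel on each axis; equal values mean an aspect ratio of 1.
const xUnitsPerPixel = (mapDomains.xDomain[1] - mapDomains.xDomain[0]) / 335;
const yUnitsPerPixel = (mapDomains.yDomain[1] - mapDomains.yDomain[0]) / 550;
```

Changing the plot width or height then only requires changing one number, rather than editing both domains by hand.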

For more Mosaic examples, see the [Mosaic + Framework](https://uwdata.github.io/mosaic-framework-example/) website.
Binary file added docs/lib/nyc-taxi.parquet
Binary file not shown.
25 changes: 25 additions & 0 deletions docs/lib/nyc-taxi.parquet.sh
@@ -0,0 +1,25 @@
duckdb :memory: << EOF

@mbostock (Member), Mar 7, 2024:

Should this have a shebang? One of these probably…

Suggested change:
- duckdb :memory: << EOF
+ #!/usr/bin/env bash
+ duckdb :memory: << EOF

Suggested change:
- duckdb :memory: << EOF
+ #!/bin/sh
+ duckdb :memory: << EOF

Contributor Author:

I don't think so; aren't shebangs only used when a file is run directly as an executable?

@mbostock (Member), Mar 7, 2024:

Ah, so even if you have a shebang, it’s ignored because we explicitly run it with sh? If it’s not ignored, we should add one just to document our expectations around how the script is interpreted.

Contributor Author:

Yes, sh ignores the shebang (to the shell it is just a comment line) and interprets the rest of the code directly:

❯ chmod +x ./test.py
❯ cat ./test.py
#! /usr/bin/env python3
print("hello, world")
❯ bash ./test.py
./test.py: line 2: syntax error near unexpected token `"hello, world"'
./test.py: line 2: `print("hello, world")'
❯ ./test.py
hello, world

Member:

Thanks for educating me. 😄

-- Load spatial extension
INSTALL spatial; LOAD spatial;

-- Project coordinates, following the example at https://github.com/duckdb/duckdb_spatial
CREATE TEMP TABLE rides AS SELECT
  pickup_datetime::TIMESTAMP AS datetime,
  ST_Transform(ST_Point(pickup_latitude, pickup_longitude), 'EPSG:4326', 'EPSG:32118') AS pick,
  ST_Transform(ST_Point(dropoff_latitude, dropoff_longitude), 'EPSG:4326', 'EPSG:32118') AS drop
FROM 'https://uwdata.github.io/mosaic-datasets/data/nyc-rides-2010.parquet';

-- Write output parquet file
COPY (SELECT
  HOUR(datetime) + MINUTE(datetime) / 60 AS time,
  ST_X(pick)::INTEGER AS px, -- extract pickup x-coord
  ST_Y(pick)::INTEGER AS py, -- extract pickup y-coord
  ST_X(drop)::INTEGER AS dx, -- extract dropoff x-coord
  ST_Y(drop)::INTEGER AS dy  -- extract dropoff y-coord
  FROM rides
  ORDER BY 2,3,4,5,1 -- optimize output size by sorting
) TO 'trips.parquet' (COMPRESSION 'ZSTD', row_group_size 10000000);
EOF

cat trips.parquet >&1 # Copy payload to stdout
rm trips.parquet # Clean up
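
The `time` column written by the query above folds the hour and minute of each pickup into a single fractional hour, which is what the histogram bins on its "Pickup Hour" axis. A minimal sketch of the same arithmetic in JavaScript, assuming floating-point division as in the SQL expression; the `fractionalHour` name is hypothetical:

```javascript
// Hypothetical helper mirroring HOUR(datetime) + MINUTE(datetime) / 60.
// Seconds are ignored, so results fall in the range [0, 24).
function fractionalHour(hour, minute) {
  return hour + minute / 60;
}

const halfPastNine = fractionalHour(9, 30); // 9.5
const quarterToMidnight = fractionalHour(23, 45); // 23.75
const midnight = fractionalHour(0, 0); // 0
```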
1 change: 1 addition & 0 deletions observablehq.config.ts
@@ -70,6 +70,7 @@ export default {
{name: "Mapbox GL JS", path: "/lib/mapbox-gl"},
{name: "Mermaid", path: "/lib/mermaid"},
{name: "Microsoft Excel (XLSX)", path: "/lib/xlsx"},
{name: "Mosaic", path: "/lib/mosaic"},
{name: "Observable Generators", path: "/lib/generators"},
{name: "Observable Inputs", path: "/lib/inputs"},
{name: "Observable Plot", path: "/lib/plot"},