Cross-filter millions (or even billions) of data entries with no interaction delay

cmudig/falcon-vis

 
 


FalconVis is a JavaScript library that links your own custom visualizations at scale! We also support a variety of data formats for different scales of data (e.g., Apache Arrow, DuckDB WASM, backend servers, and more).

You can cross-filter billions of data entries in the browser with no interaction delay by using the Falcon data index.

FalconVis was created by Donny Bertucci and Dominik Moritz because the previous implementation (vega/falcon) could not be used as a library or with custom visualizations.

Examples

GitHub Pages

| Data | Type | Count | Live Demo |
| --- | --- | --- | --- |
| Movies | Arrow | 3k | Click to open on GitHub Pages |
| Movies | JSON | 3k | Click to open on GitHub Pages |
| Movies | DuckDB WASM | 3k | Click to open on GitHub Pages |
| Flights (with US Map) | DuckDB WASM | 3m | Click to open on GitHub Pages |
| Flights (comparison with crossfilter fork) | DuckDB WASM | 3m | Click to open on GitHub Pages |
| Flights (comparison with crossfilter fork) | HeavyAI | 7m | Click to open on GitHub Pages |
| Flights (comparison with crossfilter fork) | HeavyAI | 196m | Click to open on GitHub Pages |

ObservableHQ

| Data | Type | Count | Live Demo |
| --- | --- | --- | --- |
| Flights | Arrow | 1m | Click to open on ObservableHQ |
| Flights | DuckDB WASM | 3m | Click to open on ObservableHQ |
| Flights | DuckDB WASM | 10m | Click to open on ObservableHQ |

Other

| Data | Type | Count | Live Demo |
| --- | --- | --- | --- |
| Flights (with US Map) | HTTP to DuckDB Python | 10m | Click to open on HuggingFace🤗 Spaces |

Usage

Install FalconVis via npm.

npm install falcon-vis

Data

Before you filter your data, you need to tell FalconVis about your data.

FalconVis currently supports JavaScript objects, Apache Arrow tables, DuckDB WASM, HeavyAI connections, and HTTP GET requests. Which one to use depends on the size of your data and on whether the computation should run in the browser or on a backend.

| DB | Recommended Data Size | Memory/Computation | Description |
| --- | --- | --- | --- |
| JsonDB | up to 500k | Browser | Takes JavaScript object |
| ArrowDB | up to 1m | Browser | Takes Apache Arrow table |
| DuckDB | up to 10m | Browser | Queries DuckDB WASM database |
| HeavyaiDB | whatever your backend can handle | Backend | Queries HeavyAI database connection |
| HttpDB | whatever your backend can handle | Backend | Sends GET request to a backend server (sends SQL queries and expects Arrow tables in response) |
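
As a rough illustration, the size guidance in the table can be written as a small lookup. `recommendDb` is a hypothetical helper, not part of falcon-vis, and the thresholds are only the recommended sizes listed above; smaller datasets also work with the larger-capacity options.

```typescript
// Hypothetical helper (not part of falcon-vis): returns the smallest-capacity
// DB from the table above whose recommended data size fits the row count.
type DbName = "JsonDB" | "ArrowDB" | "DuckDB" | "HeavyaiDB or HttpDB";

function recommendDb(rowCount: number): DbName {
	if (rowCount <= 500000) return "JsonDB"; // up to 500k, in the browser
	if (rowCount <= 1000000) return "ArrowDB"; // up to 1m, in the browser
	if (rowCount <= 10000000) return "DuckDB"; // up to 10m, in the browser
	// beyond the browser recommendations: memory and computation on a backend
	return "HeavyaiDB or HttpDB";
}

console.log(recommendDb(3000000));
```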

They are all typed as FalconDB.

import { JsonDB, ArrowDB, DuckDB, HeavyaiDB, HttpDB } from "falcon-vis";

Linking Views

First, initialize the FalconVis instance with your data. This example uses ArrowDB with the 1M flights dataset.

import { tableFromIPC } from "apache-arrow";
import { FalconVis, ArrowDB } from "falcon-vis";

// load the flights-1m.arrow data into memory
const buffer = await (await fetch("data/flights-1m.arrow")).arrayBuffer();
const arrowTable = await tableFromIPC(buffer);

// initialize the falcon instance with the data
const db = new ArrowDB(arrowTable);
const falcon = new FalconVis(db);

Next, create views that contain the data dimension and what happens when the cross-filtered counts change (onChange). FalconVis supports 0D and 1D views.

Note that your specified onChange function is called every time the cross-filtered counts change so that you can update your visualization with the new filtered counts.

Distance View

const distanceView = await falcon.view1D({
	type: "continuous",
	name: "Distance",
	bins: 25,
	resolution: 400,
});
distanceView.onChange((counts) => {
	updateDistanceBarChart(counts);
});

Arrival Delay View

const arrivalDelayView = await falcon.view1D({
	type: "continuous",
	name: "ArrDelay",
	range: [-20, 140],
	bins: 25,
	resolution: 400,
});
arrivalDelayView.onChange((counts) => {
	updateDelayBarChart(counts);
});

Total Count

const countView = await falcon.view0D();
countView.onChange((count) => {
	updateCount(count);
});

Link the views together to fetch the initial counts. Each view's onChange is then called with its initial counts.

await falcon.link();

Cross-Filtering Views

Now, you can cross-filter the views by calling .select() on a view. FalconVis uses the Falcon data index to cross-filter the views.

Falcon works by activating a single view that you plan to interact with. In the background, we compute the Falcon data index when you activate a view. Then, when you .select() on an activated view, we fetch the cross-filtered counts for the other views in constant time.
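
To give an intuition for why .select() can be independent of the number of rows, here is a simplified sketch of a prefix-sum index for one active view and one passive view. It is not the falcon-vis implementation, and every name in it (`buildIndex`, `brushCounts`, the sample arrays) is illustrative only.

```typescript
// Simplified sketch of a Falcon-style index; not the falcon-vis source code.

/** index[p][b] = number of rows with active pixel < p that fall in passive bin b. */
function buildIndex(
	activePixel: number[], // per-row pixel position in the active view, in [0, resolution)
	passiveBin: number[], // per-row bin in the passive view, in [0, bins)
	resolution: number,
	bins: number
): Float64Array[] {
	const index: Float64Array[] = [];
	for (let p = 0; p <= resolution; p++) index.push(new Float64Array(bins));
	// count rows per (pixel, bin) cell, shifted by one so index[0] stays all zeros
	for (let i = 0; i < activePixel.length; i++) {
		index[activePixel[i] + 1][passiveBin[i]] += 1;
	}
	// prefix-sum along the pixel axis (done once, when the view is activated)
	for (let p = 1; p <= resolution; p++) {
		for (let b = 0; b < bins; b++) index[p][b] += index[p - 1][b];
	}
	return index;
}

/** Counts per passive bin for a brush over pixels [start, end): two lookups per bin. */
function brushCounts(index: Float64Array[], start: number, end: number): Float64Array {
	return index[end].map((upper, b) => upper - index[start][b]);
}

// six rows, an active view that is 4 pixels wide, a passive view with 2 bins
const flightIndex = buildIndex([0, 1, 1, 2, 3, 3], [0, 0, 1, 1, 0, 1], 4, 2);
const brushed = brushCounts(flightIndex, 1, 3); // rows at pixel 1 or 2: bin 0 has 1, bin 1 has 2
console.log(Array.from(brushed));
```

Building the index touches every row once, while each brush only subtracts two precomputed rows, so its cost depends on the number of bins and not on the number of data entries.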

For Example

I directly .activate() the distanceView from before to prefetch the Falcon data index.

await distanceView.activate();

Then, I can apply a filter with .select([rangeStart, rangeEnd]) for continuous data:

await distanceView.select([1000, 2000]); // 1k to 2k miles

This automatically cross-filters and updates the counts for the other views in constant time (onChange is called for each other view).

In the live example, mouse events call select() with the user-selected filters, as shown in the demo video (1m-demo.mov).
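
A brush handler usually reports pixel positions, while .select() on a continuous view takes data values. The sketch below shows one way to convert between the two. `brushToRange` is a hypothetical helper, and the 400px width and [0, 4000] mile extent are assumptions made for illustration.

```typescript
// Hypothetical helper (not part of falcon-vis): converts a brush in pixels into
// the [rangeStart, rangeEnd] data values passed to select() on a continuous view.
function brushToRange(
	pixelStart: number,
	pixelEnd: number,
	width: number, // chart width in pixels
	extent: [number, number] // data [min, max] shown on the x axis
): [number, number] {
	const [min, max] = extent;
	const toValue = (px: number): number => {
		const clamped = Math.min(Math.max(px, 0), width); // keep the brush inside the chart
		return min + (clamped / width) * (max - min);
	};
	const a = toValue(pixelStart);
	const b = toValue(pixelEnd);
	// a brush dragged right-to-left still yields an ordered range
	return a <= b ? [a, b] : [b, a];
}

// assumed: a 400px wide histogram showing distances from 0 to 4000 miles
const selectedRange = brushToRange(100, 200, 400, [0, 4000]);
console.log(selectedRange);
// in a mouse event handler: await distanceView.select(selectedRange);
```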

API Reference

# class JsonDB(object)

Takes a JavaScript object and attaches FalconVis data index methods to it. Under the hood, it converts into an ArrowDB class.

The JsonDB supports row-wise or column-wise object formats, but it is recommended to use column-wise format because the row-wise format converts to column-wise with a copy.

Columns JSON Example

import { JsonDB } from "falcon-vis";

const columnarJson = {
	names: ["bob", "billy", "joe"],
	ages: [21, 42, 40],
};

const db = new JsonDB(columnarJson); // ⬅️

Rows JSON Example

import { JsonDB } from "falcon-vis";

const rowJson = [
	{ name: "bob", age: 21 },
	{ name: "billy", age: 42 },
	{ name: "joe", age: 40 },
];

const db = new JsonDB(rowJson); // ⬅️, but does a copy over rowJson
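
To show why the row-wise format costs a copy, here is a sketch of a row-to-column conversion. falcon-vis performs its own conversion internally; `rowsToColumns` is only an illustration and is not taken from the library.

```typescript
// Illustrative only: one way row-wise objects can be turned into columns.
type Row = Record<string, unknown>;
type Columns = Record<string, unknown[]>;

function rowsToColumns(rows: Row[]): Columns {
	const columns: Columns = {};
	rows.forEach((row, i) => {
		for (const [key, value] of Object.entries(row)) {
			// allocate each column once, sized to the number of rows
			if (!(key in columns)) columns[key] = new Array(rows.length).fill(null);
			columns[key][i] = value; // every value is copied into its column
		}
	});
	return columns;
}

const converted = rowsToColumns([
	{ name: "bob", age: 21 },
	{ name: "billy", age: 42 },
	{ name: "joe", age: 40 },
]);
console.log(converted);
```

If your data already arrives as columns, passing it directly avoids this extra pass over every row.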


# class ArrowDB(table)

Takes an Apache Arrow Table created using the apache-arrow package and attaches FalconVis data index methods to it.

Example

import { ArrowDB } from "falcon-vis";
import { tableFromIPC } from "apache-arrow";

const buffer = await (await fetch("data/flights-1m.arrow")).arrayBuffer();
const table = await tableFromIPC(buffer);

const db = new ArrowDB(table); // ⬅️

Arrow Shorthand Example

import { ArrowDB } from "falcon-vis";

const db = await ArrowDB.fromArrowFile("data/flights-1m.arrow"); // ⬅️


# class DuckDB(duckdb, table)

Takes a @duckdb/duckdb-wasm db and table name within the db and attaches FalconVis data index methods to it.

Example

import { DuckDB } from "falcon-vis";
import * as duckdb from "@duckdb/duckdb-wasm";

// duckdb setup
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
const worker = await duckdb.createWorker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const flightsDb = new duckdb.AsyncDuckDB(logger, worker);
await flightsDb.instantiate(bundle.mainModule, bundle.pthreadWorker);
const c = await flightsDb.connect();
// load parquet file into table called flights
await c.query(
	`CREATE TABLE flights
     AS SELECT * FROM parquet_scan('${window.location.href}/data/flights-1m.parquet')`
);
c.close();

const db = new DuckDB(flightsDb, "flights"); // ⬅️

Parquet Shorthand Example

If you just want to load one parquet file, you can use the shorthand method DuckDB.fromParquetFile().

import { DuckDB } from "falcon-vis";

const db = await DuckDB.fromParquetFile("data/flights-1m.parquet"); // ⬅️


# class HeavyaiDB(session, table)

Takes in a session from @heavyai/connector with a given table name.

Example

import { HeavyaiDB } from "falcon-vis";
import HeavyaiCon from "@heavyai/connector";

const connector = new HeavyaiCon();
const conn = {
	host: "your host url address",
	dbName: "db name",
	user: "user name",
	password: "password",
	protocol: "https",
	port: 443,
};
const connection = connector
	.protocol(conn.protocol)
	.host(conn.host)
	.port(conn.port)
	.dbName(conn.dbName)
	.user(conn.user)
	.password(conn.password);

const session = await connection.connectAsync();

const tableName = "flights";
const db = new HeavyaiDB(session, tableName); // ⬅️

Session Connection Shorthand

import { HeavyaiDB } from "falcon-vis";

const tableName = "flights";
const db = await HeavyaiDB.connectSession({
    host: "your host url address",
    dbName: "db name",
    user: "user name",
    password: "password",
    protocol: "https",
    port: 443
  }, tableName); // ⬅️


# class HttpDB(url, table, encodeQuery?)

HttpDB sends SQL queries (against the given table name) over HTTP GET to the url and expects Apache Arrow table bytes in response.

encodeQuery is an optional parameter that encodes the SQL query before sending it over HTTP GET. By default it uses the encodeURIComponent function on the SQL query so that it can be sent in the url.

Example

import { HttpDB } from "falcon-vis";

const tableName = "flights";
const db = new HttpDB("http://localhost:8000", tableName); // ⬅️
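
The sketch below shows what the default encodeURIComponent encoding does to a SQL string and how a custom encodeQuery could be swapped in. How falcon-vis assembles the final request URL is not specified here, so `requestUrl`, the "/query/" path, and the plain concatenation are assumptions for illustration only.

```typescript
type EncodeQuery = (sql: string) => string;

// Assumption: the encoded query is appended directly to the base url.
// Check your own server's routing; this is not falcon-vis's documented layout.
function requestUrl(
	baseUrl: string,
	sql: string,
	encodeQuery: EncodeQuery = encodeURIComponent
): string {
	return baseUrl + encodeQuery(sql);
}

const sql = "SELECT count(*) FROM flights WHERE Distance > 1000";

// default: spaces and ">" are percent-encoded so the SQL is safe inside a URL
console.log(requestUrl("http://localhost:8000/query/", sql));

// hypothetical custom encoder for a server that expects "+" instead of "%20"
const plusEncoded: EncodeQuery = (q) => encodeURIComponent(q).replace(/%20/g, "+");
console.log(requestUrl("http://localhost:8000/query/", sql, plusEncoded));
```

Whatever encoder you pass as the third HttpDB argument, the backend has to apply the matching decoding before running the query.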


# class FalconVis(db)

The main logic that orchestrates the cross-filtering between views.

Takes in the data (JsonDB, ArrowDB, DuckDB, HeavyaiDB, or HttpDB).

Example

import { FalconVis } from "falcon-vis";

// given a db: FalconDB
const falcon = new FalconVis(db); // ⬅️


# function falcon.view0D(onChangeCallback?)

Adds a 0D view onto an existing FalconVis instance named falcon and describes what to execute when the counts change.

Takes an onChangeCallback function that is called whenever the view count changes (after cross-filtering).

Returns a View0D instance (you can add more onChange callbacks to it later).

The onChangeCallback gives you access to the updated filtered count and total count of the rows (View0DState) object as a parameter.

interface View0DState {
	total: number | null;
	filter: number | null;
}

Example

import { FalconVis } from "falcon-vis";

const falcon = new FalconVis(db);

const countView = falcon.view0D((count) => {
	console.log(count.total, count.filter); // gets called every cross-filter
}); // ⬅️

Example multiple and disposable onChangeCallbacks

import { FalconVis } from "falcon-vis";

const falcon = new FalconVis(db);

// create view0D
const countView = falcon.view0D();
// add onChange callbacks
const disposeA = countView.onChange((count) => {
	console.log("A", count.total, count.filter);
}); // ⬅️
const disposeB = countView.onChange((count) => {
	console.log("B", count.total, count.filter);
}); // ⬅️

// then can be disposed later to stop listening for onChange
disposeA();
disposeB();
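
Since both fields of View0DState can be null, a callback typically guards against that before rendering. The sketch below formats a count label; `formatCount` and the "no counts yet" text are hypothetical, and when exactly the fields are null is not stated here, so the guard is a defensive assumption.

```typescript
interface View0DState {
	total: number | null;
	filter: number | null;
}

// Hypothetical formatter for a count label driven by a view0D onChange callback.
function formatCount(state: View0DState): string {
	if (state.total === null || state.filter === null) return "no counts yet";
	const percent = state.total === 0 ? 0 : (100 * state.filter) / state.total;
	return `${state.filter} of ${state.total} (${percent.toFixed(1)}%)`;
}

console.log(formatCount({ total: 1000, filter: 250 }));
// usage with a view: countView.onChange((count) => updateCount(formatCount(count)));
```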


# function falcon.view1D(dimension, onChangeCallback?)

Adds a 1D view onto an existing FalconVis instance named falcon and describes what to execute when the counts change. A 1D view is a histogram of the data with counts per bin.

dimension is a Dimension object that defines which data column to use for the 1D view. (more info below)

Takes an onChangeCallback function that is called whenever the view count changes (after cross-filtering).

Returns a View1D instance (you can add more onChange callbacks to it later).

The dimension can be type: "categorical" for discrete values or type: "continuous" for ranged values.

A continuous Dimension can be defined as follows (with ? being optional parameters):

interface ContinuousDimension {
	/* continuous range of values */
	type: "continuous";

	/* column name in the data table */
	name: string;

	/**
	 * resolution of visualization brushing (e.g., histogram is 400px wide, resolution: 400)
	 * a resolution smaller than the brush width approximates the counts, but is faster
	 */
	resolution: number;

	/**
	 * max number of bins to create; the result could have fewer bins
	 *
	 * @default computed from the data using Scott's rule
	 */
	bins?: number;

	/**
	 * if true, uses exactly the specified number of bins
	 * otherwise, treats the specified number of bins as a suggestion
	 *
	 * @default false
	 */
	exact?: boolean;

	/**
	 * [min, max] extent to limit the range of data values
	 * @default computed from the data
	 */
	range?: [number, number];

	/* should format for dates */
	time?: boolean;
}
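
For reference, the textbook form of Scott's rule picks a bin width of 3.49 · σ · n^(-1/3), where σ is the sample standard deviation and n the number of values. The sketch below turns that into a bin count. `scottBinCount` is a hypothetical function, and FalconVis's own default computation may differ in its details.

```typescript
// Textbook Scott's rule as a sketch; not taken from the falcon-vis source.
function scottBinCount(values: number[]): number {
	const n = values.length;
	if (n < 2) return 1;
	let min = values[0];
	let max = values[0];
	let sum = 0;
	for (const v of values) {
		if (v < min) min = v;
		if (v > max) max = v;
		sum += v;
	}
	const mean = sum / n;
	let squared = 0;
	for (const v of values) squared += (v - mean) * (v - mean);
	const std = Math.sqrt(squared / (n - 1)); // sample standard deviation
	const width = 3.49 * std * Math.pow(n, -1 / 3); // Scott's bin width
	if (width === 0) return 1; // all values identical
	return Math.max(1, Math.ceil((max - min) / width));
}

// 100 evenly spaced values from 0 to 99
const sample = Array.from({ length: 100 }, (_, i) => i);
console.log(scottBinCount(sample));
```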

A categorical dimension can be defined as follows:

interface CategoricalDimension {
	/* categorical values */
	type: "categorical";

	/* column name in the data table */
	name: string;

	/**
	 * categorical values to include
	 *
	 * @default computed from the data
	 */
	range?: string[];
}

The onChangeCallback gives you access to the updated counts per bin (View1DState) object as a parameter.

If the view is type continuous:

interface ContinuousView1DState {
	/* total counts per bin */
	total: Float64Array | null;
	/* filtered counts per bin */
	filter: Float64Array | null;
	/* continuous bins */
	bin: { binStart: number; binEnd: number }[] | null;
}

If the view is type categorical:

interface CategoricalView1DState {
	/* total counts per bin */
	total: Float64Array | null;
	/* filtered counts per bin */
	filter: Float64Array | null;
	/* categorical bin labels */
	bin: string[] | null;
}
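
Chart code often wants one record per bar instead of three parallel arrays. The sketch below zips a ContinuousView1DState into such records. `Bar` and `toBars` are hypothetical names, and returning an empty array while any field is null is an assumption about how you might want to handle that state.

```typescript
interface ContinuousView1DState {
	total: Float64Array | null;
	filter: Float64Array | null;
	bin: { binStart: number; binEnd: number }[] | null;
}

// Hypothetical record shape for a bar chart
interface Bar {
	binStart: number;
	binEnd: number;
	total: number;
	filtered: number;
}

function toBars(state: ContinuousView1DState): Bar[] {
	const { total, filter, bin } = state;
	// nothing to draw until all three arrays are available
	if (total === null || filter === null || bin === null) return [];
	return bin.map((b, i) => ({
		binStart: b.binStart,
		binEnd: b.binEnd,
		total: total[i],
		filtered: filter[i],
	}));
}

const bars = toBars({
	total: new Float64Array([5, 3]),
	filter: new Float64Array([2, 1]),
	bin: [
		{ binStart: 0, binEnd: 10 },
		{ binStart: 10, binEnd: 20 },
	],
});
console.log(bars);
// usage with a view: distanceView.onChange((counts) => updateDistanceBarChart(toBars(counts)));
```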

Initialization

import { FalconVis } from "falcon-vis";

const falcon = new FalconVis(db);

// continuous
const distanceView = await falcon.view1D(
	{
		type: "continuous",
		name: "Distance",
		resolution: 400,
		bins: 25,
	},
	(counts) => {
		console.log(counts.total, counts.filter, counts.bin); // gets called every cross-filter
	}
); // ⬅️

// categorical
const originStateView = await falcon.view1D(
	{
		type: "categorical",
		name: "OriginState",
	},
	(counts) => {
		console.log(counts.total, counts.filter, counts.bin);
	}
); // ⬅️

Interaction

# function view.activate()

You must .activate() a view before .select()ing it. .activate() computes the Falcon index so that subsequent .select()s are fast (constant time). More details on the Falcon index can be found in the paper.

# function view.select(filter)

You can directly interact with your View1D (view) instance to filter the dimension and automatically cross-filter all other views on the same FalconVis instance.

You only need to call .activate() once each time you start interacting with a different view, not before every .select().

The index changes when new filters are present, so if you .activate() a view, then .activate() a different view and filter that view, you have to call .activate() again when you come back to the original view.

Continuous view selection:

await distanceView.activate(); // compute Falcon index
await distanceView.select([0, 1000]); // filter to only flights with distance between 0 and 1000 miles
await distanceView.select([600, 800]); // change filter
await distanceView.select(); // deselect all

Categorical view selection:

await originStateView.activate(); // compute Falcon index
await originStateView.select(["CA", "PA", "OR"]); // select California, Pennsylvania, and Oregon
await originStateView.select(["FL"]); // change filter
await originStateView.select(); // deselect all

After each .select() the onChangeCallback will be called with the updated counts on all other views.
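
One way to follow the activation rule above in UI code is to remember which view was activated last and only call .activate() when the user moves to a different one. `ActiveViewTracker` and `Activatable` are hypothetical and not part of falcon-vis; the sketch assumes only that .activate() returns a promise, as in the examples above.

```typescript
// Minimal shape of the one view method this helper relies on
interface Activatable {
	activate(): Promise<void>;
}

// Hypothetical helper: runs .activate() once per switch between views
// instead of on every brush event.
class ActiveViewTracker {
	private current: Activatable | null = null;

	async ensureActive(view: Activatable): Promise<void> {
		if (this.current === view) return; // same view as last time: nothing to do
		this.current = view;
		try {
			await view.activate(); // compute the Falcon index for this view
		} catch (error) {
			// activation failed: forget the view so the next call retries
			if (this.current === view) this.current = null;
			throw error;
		}
	}
}

const tracker = new ActiveViewTracker();
// in a brush handler (hypothetical):
//   await tracker.ensureActive(distanceView);
//   await distanceView.select([0, 1000]);
```

Because any switch triggers a fresh .activate(), coming back to a view after filtering another one recomputes its index, which matches the rule described above.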


# function view.detach()

Detach is how you remove your view from the FalconVis instance. Note that you directly call this on the view instance, not the FalconVis instance.


# function view.attach()

Attach is how you add your view back onto the FalconVis instance. Note that you directly call this on the view instance, not the FalconVis instance.


# function falcon.link()

The link function takes the added views and links them together. This is required before cross-filtering.

link also initializes the counts for all views.

Call link whenever you add or remove views. Calling link once will suffice after adding (or removing) multiple views.

Example

import { FalconVis } from "falcon-vis";

const falcon = new FalconVis(db);

const distanceView = await falcon.view1D(
	{
		type: "continuous",
		name: "Distance",
		resolution: 400,
		bins: 25,
	},
	(counts) => {
		console.log(counts.total, counts.filter, counts.bin);
	}
);
const countView = falcon.view0D((count) => {
	console.log(count.total, count.filter);
});

await falcon.link(); // 🔗⬅️

link then calls the onChangeCallback for each view with the initial counts, so this particular example starts with two console.log calls (one per view).


# function falcon.entries(location)

This gives you access to the filtered entries. So after cross-filtering you need to manually call this if you want to extract the filtered entries.

Takes a location defined by

interface Location {
	/* defaults to 0 */
	offset?: number;

	/* defaults to Infinity (all) */
	length?: number;
}

Where offset refers to the offset in the data table and length refers to the number of rows to return.

Note that offset refers to the filtered data table, so if you have a filter applied, the offset will be relative to the filtered data table.

Returns an Iterator over the entries in the data table as Iterable<Row> where Row is an object with key names corresponding to the column names in the data table.

Example

import { FalconVis } from "falcon-vis";

const falcon = new FalconVis(db);

const entries = await falcon.entries({
	offset: 0,
	length: 25,
}); // first 25 entries ⬅️

// print out first 25 distances
for (const entry of entries) {
	console.log(entry["Distance"]);
}

To get the second 25 entries, shift offset by 25 (or by whatever page size you use).

const entries = await falcon.entries({
	offset: 25, // start after 25 entries
	length: 25,
}); // second 25 entries ⬅️

// print out second 25 distances
for (const entry of entries) {
	console.log(entry["Distance"]);
}
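
The offset arithmetic generalizes to simple pagination. In this sketch, `pageLocation` and `pageCount` are hypothetical helpers, and the interface is named `EntriesLocation` only to avoid clashing with the browser's global Location type. The filtered row count would come from a view0D callback (count.filter).

```typescript
// Same shape as the Location parameter described above
interface EntriesLocation {
	offset?: number;
	length?: number;
}

// Hypothetical helper: location of a zero-based page within the filtered entries
function pageLocation(page: number, pageSize: number): EntriesLocation {
	return { offset: page * pageSize, length: pageSize };
}

// Hypothetical helper: number of pages needed to show all filtered entries
function pageCount(filteredCount: number, pageSize: number): number {
	return Math.ceil(filteredCount / pageSize);
}

console.log(pageLocation(1, 25), pageCount(60, 25));
// usage: const entries = await falcon.entries(pageLocation(1, 25)); // second 25 entries
```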
