# What if millions of points are not enough?

## Vaex: Visualization and eXploration

https://github.com/maartenbreddels/vaex

A data decimation engine for Jupyter interactive widgets, by **Maarten Breddels**


 - Vaex is an out-of-core DataFrame library (similar to Pandas) for visualizing and exploring big tabular datasets.
 
 - It can calculate statistics such as `mean`, `sum`, `count` and standard deviation on an N-dimensional grid at up to **a billion ($10^9$) objects/rows per second**.
 - Visualization is done using histograms, density plots and 3-D volume rendering, allowing interactive exploration of big data.
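
To make "statistics on a grid" concrete, here is a minimal pure-Python sketch of the idea, not Vaex's implementation. The function name `binned_mean` and the toy data are made up for illustration. Each row is assigned to a grid cell, and a count and a sum are accumulated per cell, so the size of the result depends only on the grid shape and not on the number of rows.

```python
def binned_mean(xs, ys, values, limits, shape):
    """Count rows and average `values` in each cell of a 2-D grid."""
    (x_min, x_max), (y_min, y_max) = limits
    nx, ny = shape
    counts = [[0] * ny for _ in range(nx)]
    sums = [[0.0] * ny for _ in range(nx)]
    for x, y, v in zip(xs, ys, values):
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # rows outside the grid limits are ignored
        # Equal-width bins; min() guards against float rounding at the upper edge.
        i = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        j = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[i][j] += 1
        sums[i][j] += v
    # Empty cells have no defined mean, so they are reported as NaN.
    means = [[s / c if c else float("nan") for s, c in zip(s_row, c_row)]
             for s_row, c_row in zip(sums, counts)]
    return counts, means


# Hypothetical toy data: four rows binned onto a 2 x 2 grid.
counts, means = binned_mean(
    xs=[0.1, 0.2, 0.9, 0.8],
    ys=[0.1, 0.3, 0.9, 0.7],
    values=[1.0, 3.0, 10.0, 20.0],
    limits=((0.0, 1.0), (0.0, 1.0)),
    shape=(2, 2),
)
print(counts)
print(means[0][0], means[1][1])
```

Only the small `counts` and `means` grids need to reach the plotting widget, which is what makes the approach a form of data decimation.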
 
Vaex uses **memory mapping**, a **zero memory copy policy** and **lazy computations** for best performance (no memory wasted).
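
As a rough illustration of memory mapping without extra copies, the sketch below uses only the Python standard library. It is not how Vaex reads HDF5 files. The helper `mapped_sum` and the temporary demo file are assumptions made for the example. The file is mapped read-only and its bytes are reinterpreted as doubles in place, so the operating system pages data in on demand instead of the whole file being loaded first.

```python
import array
import mmap
import os
import tempfile


def mapped_sum(path):
    """Sum the float64 values of a raw binary file through a read-only memory map."""
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm, \
            memoryview(mm) as raw, \
            raw.cast("d") as view:  # reinterpret the mapped bytes as doubles, no copy
        return sum(view)


# Hypothetical demo file holding three doubles in native byte order.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as fh:
    array.array("d", [1.0, 2.0, 3.5]).tofile(fh)

total = mapped_sum(demo_path)
os.remove(demo_path)
print(total)
```

The same pattern scales to files larger than RAM, because pages are only read when the sum touches them.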

## Example 1: The NYC taxi dataset + ipyleaflet

25 GB in a single HDF5 file on my local hard drive.

In [None]:
import warnings
import numpy as np
import vaex
warnings.filterwarnings('ignore')  # np.warnings was removed in NumPy 1.24; use the stdlib module
dstaxi = vaex.open('nyc_taxi2015.hdf5')  # memory-mapped, so it doesn't cost extra memory

In [None]:
dstaxi.plot_widget("pickup_longitude", "pickup_latitude", f="log", backend="ipyleaflet", shape=600)

In [None]:
dstaxi.plot_widget("dropoff_longitude", "dropoff_latitude", f="log", backend="ipyleaflet",
                   z="dropoff_hour", type="slice", z_shape=24, shape=400, z_relative=True,
                   limits=[None, None, (-0.5, 23.5)])
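
The `z` limits `(-0.5, 23.5)` combined with `z_shape=24` are worth a quick check. The sketch below assumes that `dropoff_hour` holds integer hours 0 to 23 and that the slices are equal-width bins between the two limits. Both are assumptions about the data and about the binning, not something confirmed by the Vaex API. Under those assumptions each bin is exactly one hour wide and centered on an integer hour.

```python
lower, upper, n_bins = -0.5, 23.5, 24

# Width of one slice along the z axis.
width = (upper - lower) / n_bins

# Center of each slice; with these limits the centers are the integer hours.
centers = [lower + (i + 0.5) * width for i in range(n_bins)]

# Slice index that each integer hour falls into.
hour_to_bin = [int((hour - lower) / width) for hour in range(24)]

print(width)
print(centers[:3], centers[-1])
print(hour_to_bin == list(range(24)))
```

With limits such as `(0, 23)` the integer hours would instead sit on or near bin edges, which is why the half-unit offset is a common choice for integer-valued columns.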

## Example 2: Gaia dataset + bqplot

In [None]:
ds = vaex.open('helmi-dezeeuw-2000-FeH.hdf5')

In [None]:
ds.plot_widget("x", "y", f="log", limits=[-20, 20])

In [None]:
ds.plot_widget("Lz", "E", f="log")