# Vega, Ibis, and OmniSci Performance

In this notebook we will show two charts. The first generally works, albeit a bit slowly. The second is basically inoperable because of performance issues.

I believe these performance issues are primarily due to two limitations in Vega currently:

1. Each transform in the dataflow graph is executed synchronously, one after another. Ideally, we should be able to parallelize the database queries launched by each transform.
2. The UI blocks while waiting for an async transform to complete. This isn't noticeable normally in Vega, but when running all the transforms takes multiple seconds, it makes scrolling and panning basically inoperable.
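
To make the first limitation concrete, here is a minimal Python sketch of the difference between running queries one at a time and running them in parallel. It is only an analogy: Vega runs in the browser, and `fake_query` is a hypothetical stand-in that sleeps instead of talking to a database.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(name, seconds=0.2):
    """Hypothetical stand-in for one database round trip (it only sleeps)."""
    time.sleep(seconds)
    return name


def run_sequentially(names):
    # One query at a time: total time is roughly the sum of all durations.
    return [fake_query(name) for name in names]


def run_in_parallel(names):
    # All queries in flight at once: total time is roughly the slowest query.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(fake_query, names))


def timed(fn, names):
    start = time.perf_counter()
    result = fn(names)
    return result, time.perf_counter() - start


QUERIES = ["count_filter", "flights_by_state", "carrier_delay"]

sequential_result, sequential_time = timed(run_sequentially, QUERIES)
parallel_result, parallel_time = timed(run_in_parallel, QUERIES)
print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

With three queries of equal length, the sequential run should take about three times as long as the parallel one, which is the gap we would hope to close in the dataflow.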

We will use Jaeger / OpenTracing to look at the timing of the various events to understand the performance.
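
For readers new to tracing, the following sketch shows the basic idea of a span: a named, timed region that can nest inside other spans. It uses only the standard library; the `span` helper and the span names are hypothetical and are not the OpenTracing API or the instrumentation used in this notebook.

```python
import time
from contextlib import contextmanager

spans = []  # (name, depth, seconds), appended when each span finishes
_depth = 0


@contextmanager
def span(name):
    """Record how long the enclosed block took and how deeply it was nested."""
    global _depth
    depth = _depth
    _depth += 1
    start = time.perf_counter()
    try:
        yield
    finally:
        _depth -= 1
        spans.append((name, depth, time.perf_counter() - start))


with span("render-chart"):
    with span("transform-1"):
        time.sleep(0.05)
    with span("transform-2"):
        time.sleep(0.05)

# Children finish before their parent, so they appear first in the list.
for name, depth, seconds in spans:
    print("  " * depth + f"{name}: {seconds * 1000:.0f} ms")
```

A trace viewer such as Jaeger draws spans like these on a shared timeline, which is what lets us see whether transforms overlap or run back to back.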

## Setup

Before launching these, first open the "Jaeger UI" in a new window, so traces will be collected. You can do this by going to `./jaeger` instead of `./lab` or by clicking the `Jaeger` button in the JupyterLab launcher.

## Time Series Chart

1. Run these cells to create a chart

In [1]:
import altair as alt
import ibis_vega_transform
import ibis.mapd


conn = ibis.mapd.connect(
    host='metis.mapd.com', user='mapd', password='HyperInteractive',
    port=443, database='mapd', protocol='https'
)

In [2]:
t = conn.table("flights_donotmodify")

states = alt.selection_multi(fields=['origin_state'])
airlines = alt.selection_multi(fields=['carrier_name'])

# Copy default from 
# https://github.com/vega/vega-lite/blob/8936751a75c3d3713b97a85b918fb30c35262faf/src/selection.ts#L281
# but add debounce
# https://vega.github.io/vega/docs/event-streams/#basic-selectors

DEBOUNCE_MS = 400

dates = alt.selection_interval(
    fields=['dep_timestamp'],
    encodings=['x'],
    on=f'[mousedown, window:mouseup] > window:mousemove!{{0, {DEBOUNCE_MS}}}',
    translate=f'[mousedown, window:mouseup] > window:mousemove!{{0, {DEBOUNCE_MS}}}',
    zoom=False
)

HEIGHT = 800
WIDTH = 1000

count_filter = alt.Chart(
    t[t.dep_timestamp, t.depdelay, t.origin_state, t.carrier_name],
    title="Selected Rows"
).transform_filter(
    airlines
).transform_filter(
    dates
).transform_filter(
    states
).mark_text().encode(
    text='count()'
)

count_total = alt.Chart(
    t,
    title="Total Rows"
).mark_text().encode(
    text='count()'
)

flights_by_state = alt.Chart(
    t[t.origin_state, t.carrier_name, t.dep_timestamp],
    title="Total Number of Flights by State"
).transform_filter(
    airlines
).transform_filter(
    dates
).mark_bar().encode(
    x='count()',
    y=alt.Y('origin_state', sort=alt.Sort(encoding='x', order='descending')),
    color=alt.condition(states, alt.ColorValue("steelblue"), alt.ColorValue("grey"))
).add_selection(
    states
).properties(
    height=2 * HEIGHT / 3,
    width=WIDTH / 2
) + alt.Chart(
    t[t.origin_state, t.carrier_name, t.dep_timestamp],
).transform_filter(
    airlines
).transform_filter(
    dates
).mark_text(dx=20).encode(
    x='count()',
    y=alt.Y('origin_state', sort=alt.Sort(encoding='x', order='descending')),
    text='count()'
).properties(
    height=2 * HEIGHT / 3,
    width=WIDTH / 2
)

carrier_delay = alt.Chart(
    t[t.depdelay, t.arrdelay, t.carrier_name, t.origin_state, t.dep_timestamp],
    title="Carrier Departure Delay by Arrival Delay (Minutes)"
).transform_filter(
    states
).transform_filter(
    dates
).transform_aggregate(
    depdelay='mean(depdelay)',
    arrdelay='mean(arrdelay)',
    groupby=["carrier_name"]
).mark_point(filled=True, size=200).encode(
    x='depdelay',
    y='arrdelay',
    color=alt.condition(airlines, alt.ColorValue("steelblue"), alt.ColorValue("grey")),
    tooltip=['carrier_name', 'depdelay', 'arrdelay']
).add_selection(
    airlines
).properties(
    height=2 * HEIGHT / 3,
    width=WIDTH / 2
) + alt.Chart(
    t[t.depdelay, t.arrdelay, t.carrier_name, t.origin_state, t.dep_timestamp],
).transform_filter(
    states
).transform_filter(
    dates
).transform_aggregate(
    depdelay='mean(depdelay)',
    arrdelay='mean(arrdelay)',
    groupby=["carrier_name"]
).mark_text().encode(
    x='depdelay',
    y='arrdelay',
    text='carrier_name',
).properties(
    height=2 * HEIGHT / 3,
    width=WIDTH / 2
)

time = alt.Chart(
    t[t.dep_timestamp, t.depdelay, t.origin_state, t.carrier_name],
    title='Number of Flights by Departure Time'
).transform_filter(
    'datum.dep_timestamp != null'
).transform_filter(
    airlines
).transform_filter(
    states
).mark_line().encode(
    alt.X(
        'yearmonthdate(dep_timestamp):T',
    ),
    alt.Y(
        'count():Q',
        scale=alt.Scale(zero=False)
    )
).add_selection(
    dates
).properties(
    height=HEIGHT / 3,
    width=WIDTH + 50
)

(
    (count_filter | count_total) &
    (flights_by_state | carrier_delay) &
    time
).configure_axis(
    grid=False
).configure_view(
    strokeOpacity=0
)

alt.VConcatChart(...)

2. Wait for it to render
3. Reload the Jaeger UI page
4. Select the "kernel" service
5. Select "Find Traces"
6. Select the first trace.
7. Now you should be able to see that each transform happens synchronously.
8. If you click on each trace, you should also be able to see logs, including the original Vega-Lite spec, the original Vega spec, and the transformed Vega spec.

If you filter based on the top charts, things seem to work OK, even though the UI is a bit slow.

However, if you try to filter based on the bottom chart by clicking and dragging, you will see that it does work, but the UI is not ideal: you can't see your selection until it finishes getting the data. Ideally, it would show your current time selection and show some sort of loading UI in the other sections.
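
The `on` and `translate` arguments of the `dates` selection in the chart cell are built with an f-string, which can be hard to read because of the doubled braces. This small sketch shows the string that actually reaches Vega. As I understand the Vega event-stream syntax, the trailing `{0, 400}` means a throttle of 0 ms and a debounce of 400 ms, so a query is only triggered once the mouse has paused.

```python
DEBOUNCE_MS = 400

# "{{" and "}}" are literal braces in an f-string; "{DEBOUNCE_MS}" is substituted.
event_stream = f'[mousedown, window:mouseup] > window:mousemove!{{0, {DEBOUNCE_MS}}}'
print(event_stream)
# [mousedown, window:mouseup] > window:mousemove!{0, 400}
```

A larger `DEBOUNCE_MS` means fewer database queries while dragging, at the cost of a longer wait before the charts update.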

## Geospatial Chart

Now we will try to render a geospatial chart, by binning by pixel: 

In [3]:
t2 = conn.table("tweets_nov_feb")
x, y = t2.goog_x, t2.goog_y

WIDTH = 385
HEIGHT = 564
X_DOMAIN = [
    -3650484.1235206556,
    7413325.514451755
]
Y_DOMAIN = [
    -5778161.9183506705,
    10471808.487466192
]

DEBOUNCE_MS = 100

scales = alt.selection_interval(
    bind='scales',
    on=f'[mousedown, window:mouseup] > window:mousemove!{{0, {DEBOUNCE_MS}}}',
    translate=f'[mousedown, window:mouseup] > window:mousemove!{{0, {DEBOUNCE_MS}}}',
    zoom=f'wheel!{{0, {DEBOUNCE_MS}}}',
)

alt.Chart(t2[x, y], width=WIDTH, height=HEIGHT).mark_rect().encode(
    alt.X(
        'bin_x:Q',
        bin=alt.Bin(binned=True),
        title='goog_x',
        scale=alt.Scale(domain=X_DOMAIN)
    ),
    alt.X2('bin_x_end'),
    alt.Y(
        'bin_y:Q',
        bin=alt.Bin(binned=True),
        title='goog_y',
        scale=alt.Scale(domain=Y_DOMAIN)
    ),
    alt.Y2('bin_y_end'),
    tooltip='count()'
).add_selection(
    scales
).transform_filter(
    scales
).transform_bin(
    'bin_x',
    'goog_x',
    bin=alt.Bin(maxbins=WIDTH)
).transform_bin(
    'bin_y',
    'goog_y',
    bin=alt.Bin(maxbins=HEIGHT)
)

alt.Chart(...)

Now try to drag this to pan around.

You will notice a few things. First, it actually does work, but it takes so long to move that it's hard to control. Second, it seems like the initial bin is different from the later bins.
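
As a rough illustration of what "binning by pixel" means, the sketch below splits the x domain into `WIDTH` equal-width bins and computes the bin edges for one coordinate. This is a simplification under my own assumptions: `bin_edges` is a hypothetical helper, and Vega's real bin transform picks "nice" step sizes rather than dividing the extent evenly.

```python
WIDTH = 385
X_DOMAIN = [-3650484.1235206556, 7413325.514451755]


def bin_edges(value, domain, maxbins):
    """Return (start, end) of the equal-width bin that contains value."""
    lo, hi = domain
    step = (hi - lo) / maxbins
    # Clamp so that value == hi falls in the last bin instead of one past it.
    index = min(int((value - lo) // step), maxbins - 1)
    start = lo + index * step
    return start, start + step


initial = bin_edges(0.0, X_DOMAIN, WIDTH)

# Simulate a small pan: the whole domain shifts by 10 km in projected units.
panned_domain = [X_DOMAIN[0] + 10_000, X_DOMAIN[1] + 10_000]
panned = bin_edges(0.0, panned_domain, WIDTH)

print("initial bin:", initial)
print("after pan:  ", panned)
```

In this simplified model the bin edges depend on the visible domain, so the same point lands in a differently positioned bin after a pan. Whether that is related to the difference between the initial and later bins seen above has not been checked here.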