# Partition Visualizations

`Partition` is designed for visualizing big data streams of any type. Rather than plotting every observation, `Partition` splits the stream into contiguous sections and plots a summary of each section.
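
To make "a summary of each section" concrete, here is a minimal sketch in plain Julia (no OnlineStats). It assumes the length of the data is known in advance, so the data can be cut into `k` sections of nearly equal length. The helper `section_extrema` is hypothetical. Each `(min, max)` pair stands in for the kind of per-section summary that a `Partition` plot draws in place of the raw points.

```julia
# Hypothetical helper (not part of OnlineStats): cut `y` into `k` contiguous
# sections of nearly equal length and summarize each one by its (min, max).
function section_extrema(y::AbstractVector{<:Real}, k::Integer)
    n = length(y)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= length(y)"))
    cuts = [div(i * n, k) for i in 0:k]   # section i is cuts[i]+1 : cuts[i+1]
    return [extrema(view(y, cuts[i]+1:cuts[i+1])) for i in 1:k]
end

y = cumsum(randn(10^4)) .+ 100 .* randn(10^4)   # same kind of data as below, smaller
s = section_extrema(y, 50)   # 50 (min, max) pairs, in stream order
```

Fifty pairs are far cheaper to draw than ten thousand points, yet they still show how the range of `y` drifts over time.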

`Partition` plots behave similarly to the [`tabplot`](https://github.com/mtennekes/tabplot) R package, but:

1. **Data can be summarized by any `OnlineStat`**
1. **Data can grow to any size**
1. **Calculations can be done in parallel**
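
All three points rest on the summaries being mergeable. The sketch below is a hypothetical, simplified stand-in for `Partition` and is not the OnlineStats implementation. It keeps at most `cap` contiguous parts, each summarized by its extrema. When the cap is exceeded, it merges neighboring parts pairwise and doubles the target part width, so memory stays bounded however long the stream runs. Because two adjacent summaries merge exactly, partitions fitted on consecutive chunks (for example, by different workers) can be combined afterwards. The names `Part`, `StreamPartition`, `observe!`, and `combine` are all illustrative.

```julia
# Hypothetical sketch, not OnlineStats' implementation: a streaming partition
# that keeps at most `cap` contiguous parts, each summarized by its (min, max).
struct Part
    first::Int      # index of the first observation in this part
    last::Int       # index of the last observation in this part
    lo::Float64
    hi::Float64
end

# Summaries of adjacent parts merge exactly: min of mins, max of maxes.
merge_parts(a::Part, b::Part) = Part(a.first, b.last, min(a.lo, b.lo), max(a.hi, b.hi))

mutable struct StreamPartition
    cap::Int              # maximum number of parts kept in memory
    parts::Vector{Part}
    n::Int                # observations seen so far
    width::Int            # target number of observations per part
end
StreamPartition(cap::Int) = StreamPartition(cap, Part[], 0, 1)

# Halve the number of parts by merging neighbors pairwise, then double the width.
function collapse!(p::StreamPartition)
    merged = Part[]
    for i in 1:2:length(p.parts)
        push!(merged, i < length(p.parts) ? merge_parts(p.parts[i], p.parts[i+1]) : p.parts[i])
    end
    p.parts = merged
    p.width *= 2
    return p
end

# Add one observation: grow the last part until it holds `width` observations,
# otherwise start a new part; collapse whenever the cap is exceeded.
function observe!(p::StreamPartition, x::Real)
    p.n += 1
    if !isempty(p.parts) && p.parts[end].last - p.parts[end].first + 1 < p.width
        q = p.parts[end]
        p.parts[end] = Part(q.first, p.n, min(q.lo, x), max(q.hi, x))
    else
        push!(p.parts, Part(p.n, p.n, x, x))
        length(p.parts) > p.cap && collapse!(p)
    end
    return p
end

# Combine partitions fitted on consecutive chunks of one stream: shift b's
# indices past a's, concatenate, and collapse until the cap is respected.
function combine(a::StreamPartition, b::StreamPartition)
    shifted = [Part(q.first + a.n, q.last + a.n, q.lo, q.hi) for q in b.parts]
    out = StreamPartition(a.cap, vcat(a.parts, shifted), a.n + b.n, max(a.width, b.width))
    while length(out.parts) > out.cap
        collapse!(out)
    end
    return out
end

y = cumsum(randn(10^5)) .+ 100 .* randn(10^5)

# Serial: one pass over the whole stream.
p = StreamPartition(100)
foreach(x -> observe!(p, x), y)

# "Parallel": fit two halves independently, then combine.
a, b = StreamPartition(100), StreamPartition(100)
foreach(x -> observe!(a, x), view(y, 1:50_000))
foreach(x -> observe!(b, x), view(y, 50_001:length(y)))
c = combine(a, b)
```

Replacing the `lo`/`hi` fields with another summary that has an exact merge rule (counts, sums, a fixed-bin histogram) keeps the same structure. That is the sense in which the partitioning idea does not depend on which statistic is tracked.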

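```julia
using OnlineStats, Plots
gr(palette = :viridis)

# A random walk (the cumsum) plus Gaussian noise with standard deviation 100,
# so the distribution of y drifts over time
y = cumsum(randn(10^6)) + 100randn(10^6);
```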

# Motivation

Consider the histogram below. It describes the overall distribution of `y`, but only under the implicit assumption that the observations are independent and identically distributed.

```julia
plot(Series(y, Hist(100)))
```

If the distribution changes over time (as it does here, since `y` contains a random walk), a single histogram leaves a lot of information "on the table". However, plotting every point may not be an option either, since there may be too many points to comprehend or render. This is the use case for `Partition`:

**Use `Partition` when the bottleneck to plotting is rendering the image**

```julia
plot(Series(y, Partition(Hist(25))); legend=false)
```
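
In this plot each section of the stream gets its own histogram. Below is a minimal sketch of that idea in plain Julia. It assumes fixed, equal-width bins shared by all sections and computed from the global range of `y`, which needs a first pass over the data. OnlineStats' `Hist` need not work this way. `section_histograms` is a hypothetical helper that returns a `k`-by-`nbins` matrix of counts, one row per section.

```julia
# Hypothetical helper (not part of OnlineStats): one histogram per contiguous
# section, with nbins equal-width bins spanning the global range of y.
function section_histograms(y::AbstractVector{<:Real}, k::Integer, nbins::Integer)
    n = length(y)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= length(y)"))
    lo, hi = extrema(y)               # first pass: shared bin edges
    width = (hi - lo) / nbins
    cuts = [div(i * n, k) for i in 0:k]
    counts = zeros(Int, k, nbins)     # row i holds the counts for section i
    for i in 1:k, j in cuts[i]+1:cuts[i+1]
        # bin index in 1:nbins; the maximum value goes into the last bin
        b = width == 0 ? 1 : min(nbins, floor(Int, (y[j] - lo) / width) + 1)
        counts[i, b] += 1
    end
    return counts
end

y = cumsum(randn(10^4)) .+ 100 .* randn(10^4)
h = section_histograms(y, 20, 25)   # 20 sections, 25 bins each
```

Shared bin edges make rows directly comparable, so a shift of mass from low to high bins across rows is exactly the drift that a single overall histogram hides.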

# Tabplot example

Here we use the OnlineStats/JuliaDB integration to recreate the `diamonds` dataset example from the [tabplot README](https://github.com/mtennekes/tabplot):

```julia
# Load data into JuliaDB, sorted by carat
using JuliaDB
diamonds = loadtable("diamonds.csv"; indexcols = [:carat])
```

```
Table with 53940 rows, 10 columns:
carat  cut          color  clarity  depth  table  price  x      y      z
───────────────────────────────────────────────────────────────────────────
0.2    "Premium"    "E"    "SI2"    60.2   62.0   345    3.79   3.75   2.27
0.2    "Premium"    "E"    "VS2"    59.8   62.0   367    3.79   3.77   2.26
0.2    "Premium"    "E"    "VS2"    59.0   60.0   367    3.81   3.78   2.24
0.2    "Premium"    "E"    "VS2"    61.1   59.0   367    3.81   3.78   2.32
0.2    "Premium"    "E"    "VS2"    59.7   62.0   367    3.84   3.8    2.28
0.2    "Ideal"      "E"    "VS2"    59.7   55.0   367    3.86   3.84   2.3
0.2    "Premium"    "F"    "VS2"    62.6   59.0   367    3.73   3.71   2.33
0.2    "Ideal"      "D"    "VS2"    61.5   57.0   367    3.81   3.77   2.33
0.2    "Very Good"  "E"    "VS2"    63.4   59.0   367    3.74   3.71   2.36
0.2    "Ideal"      "E"    "VS2"    62.2   57.0   367    3.76   3.73   2.33
0.2    "Premium"    "D"    "VS2"    62.3   60.0
```

```julia
# Partition one column of table `t` into 75 sections: numeric columns are
# summarized by their extrema, string columns by category counts
function tabplot(t, col; kw...)
    T = typeof(t[1][col])
    o = T <: Number ? Extrema() : CountMap(String)
    s = reduce(Partition(o, 75), t; select = col)
    plot(s; kw...)
end
```
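
For a string column such as `cut`, `tabplot` summarizes each of the 75 sections with a `CountMap`, that is, a count of each category. Because the table is sorted by `carat`, each section should correspond to a range of carats (assuming `reduce` visits rows in index order). The hypothetical helper below sketches the same per-section counting in plain Julia on a toy vector. It is not how OnlineStats implements `CountMap`.

```julia
# Hypothetical helper (not part of OnlineStats): for each of `k` contiguous
# sections of a categorical vector, a Dict mapping each level to its count.
function section_counts(x::AbstractVector{T}, k::Integer) where {T}
    n = length(x)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= length(x)"))
    cuts = [div(i * n, k) for i in 0:k]   # section i is cuts[i]+1 : cuts[i+1]
    out = [Dict{T,Int}() for _ in 1:k]
    for i in 1:k, j in cuts[i]+1:cuts[i+1]
        out[i][x[j]] = get(out[i], x[j], 0) + 1
    end
    return out
end

cut = ["Ideal", "Premium", "Ideal", "Good", "Premium", "Premium"]   # toy data
s = section_counts(cut, 2)
# s[1]: "Ideal" => 2, "Premium" => 1
# s[2]: "Good" => 1, "Premium" => 2
```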

```julia
# One panel per column of the table, arranged in a 5-by-2 grid
plot([tabplot(diamonds, c) for c in colnames(diamonds)]...;
    size = (1000, 2000),
    title = hcat(colnames(diamonds)...),
    layout = (5, 2))
```