JuliaComputing/JuliaDB.jl

Plotting

``````julia
using Pkg, Random
Pkg.add("StatPlots")
Pkg.add("GR")
using StatPlots
ENV["GKSwstype"] = "100"
gr()
Random.seed!(1234)  # set random seed to get consistent plots
``````

StatPlots

JuliaDB has access to all the power and flexibility of Plots via StatPlots and the `@df` macro, which lets you refer to table columns by name (for example `:x`) inside a plotting call.

``````julia
using JuliaDB, StatPlots

t = table((x = randn(100), y = randn(100)))

@df t scatter(:x, :y)
savefig("statplot.png"); nothing # hide
``````

Plotting Big Data

For large datasets, it isn't feasible to render every data point. The OnlineStats package provides a number of data structures for big data visualization that can be created via the `reduce` and `groupreduce` functions.
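The reason a reduction suits this job can be sketched in plain Julia, without OnlineStats: the summary has a fixed size, it is updated one observation at a time, and two summaries can be merged, so the data can be processed in chunks. The `BinCounts` type and the `update!` and `combine` helpers below are hypothetical names for illustration only; they are not part of OnlineStats or JuliaDB, and this is not how those packages are implemented.

```julia
using Random

# Hypothetical fixed-edge histogram: memory depends on the number of bins,
# not on the number of observations.
struct BinCounts
    edges::Vector{Float64}   # sorted bin edges, length = number of bins + 1
    counts::Vector{Int}      # counts[i] covers [edges[i], edges[i+1])
end

BinCounts(edges::AbstractVector{<:Real}) =
    BinCounts(collect(Float64, edges), zeros(Int, length(edges) - 1))

# Fold a single observation into the summary.
# Values outside [first edge, last edge) are ignored.
function update!(h::BinCounts, x::Real)
    i = searchsortedlast(h.edges, x)   # index of the last edge <= x
    if 1 <= i < length(h.edges)
        h.counts[i] += 1
    end
    return h
end

# Merge two summaries that were built on different chunks of the data.
function combine(a::BinCounts, b::BinCounts)
    a.edges == b.edges || throw(ArgumentError("bin edges must match"))
    return BinCounts(a.edges, a.counts .+ b.counts)
end

Random.seed!(1234)
x = randn(10^5)

# One pass over all of the data...
whole = foldl(update!, x; init = BinCounts(-5:0.5:5))

# ...gives the same counts as summarizing two chunks separately and merging.
halves = (view(x, 1:50_000), view(x, 50_001:100_000))
parts = [foldl(update!, c; init = BinCounts(-5:0.5:5)) for c in halves]
merged = reduce(combine, parts)
```

Whatever the number of rows, the object handed to the plotting code here holds only 20 counts, which is what makes rendering cheap.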

• Example data:
``````julia
using JuliaDB, Plots, OnlineStats

x = randn(10^6)
y = x + randn(10^6)
z = x .> 1
z2 = (x .+ y) .> 0
t = table((x=x, y=y, z=z, z2=z2))
``````

Mosaic Plots

A mosaic plot visualizes the bivariate distribution of two categorical variables.
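The quantity behind such a plot is the table of joint counts for the two variables, with tile areas proportional to those counts. A minimal sketch in plain Julia, using the same definitions of `z` and `z2` as the example data on a smaller sample; the `joint_counts` helper is a hypothetical name, not an OnlineStats function.

```julia
using Random

# Hypothetical helper: count how often each (a, b) pair occurs.
function joint_counts(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || throw(DimensionMismatch("columns must have equal length"))
    counts = Dict{Tuple{eltype(a), eltype(b)}, Int}()
    for pair in zip(a, b)
        counts[pair] = get(counts, pair, 0) + 1
    end
    return counts
end

Random.seed!(1234)
x = randn(10^4)
y = x + randn(10^4)
z = x .> 1
z2 = (x .+ y) .> 0

counts = joint_counts(z, z2)

# Print each combination with its share of all rows.
for key in sort(collect(keys(counts)))
    share = round(100 * counts[key] / length(z); digits = 1)
    println(key, " => ", counts[key], " (", share, "%)")
end
```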

``````julia
# Columns 3 and 4 of `t` are the two Bool columns, :z and :z2
o = reduce(Mosaic(Bool, Bool), t; select = (3, 4))
plot(o)
png("mosaic.png"); nothing  # hide
``````

Histograms

``````julia
# Fixed bin edges from -5 to 5 in steps of 0.5; one histogram of :x per value of :z
grp = groupreduce(Hist(-5:.5:5), t, :z, select = :x)
plot(plot.(select(grp, 2))...; link=:all)
png("hist.png"); nothing # hide
``````

``````julia
# Same grouping, but KHist(20) adapts its bins to the data instead of using fixed edges
grp = groupreduce(KHist(20), t, :z, select = :x)
plot(plot.(select(grp, 2))...; link = :all)
png("hist2.png"); nothing # hide
``````

Partition and IndexedPartition

• `Partition(stat, n)` summarizes a univariate data stream.
    • The `stat` is fitted over `n` approximately equal-sized pieces.
• `IndexedPartition(T, stat, n)` summarizes a bivariate data stream.
    • The `stat` is fitted over `n` pieces covering the domain of another variable of type `T`.
``````julia
o = reduce(Partition(KHist(10), 50), t; select=:y)
plot(o)
png("partition.png"); nothing # hide
``````

``````julia
o = reduce(IndexedPartition(Float64, KHist(10), 50), t; select=(:x, :y))
plot(o)
png("partition2.png"); nothing # hide
``````
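To make the two descriptions above concrete, here is an offline sketch in plain Julia of the kind of result each one produces. It is only an illustration: `piecewise` and `piecewise_by` are hypothetical helpers, they hold the data in memory rather than streaming it, and the choice of equal-width intervals of `x` in `piecewise_by` is an assumption made for simplicity, not a statement about how OnlineStats picks its pieces.

```julia
using Random

# Hypothetical helper: apply `stat` to `n` contiguous, near-equal pieces of `v`.
function piecewise(stat, v::AbstractVector, n::Integer)
    1 <= n <= length(v) || throw(ArgumentError("need 1 <= n <= length(v)"))
    bounds = [div(i * length(v), n) for i in 0:n]   # integer cut points, 0 to length(v)
    return [stat(view(v, (bounds[i] + 1):bounds[i + 1])) for i in 1:n]
end

# Hypothetical helper: group `y` by `n` equal-width intervals of `x`,
# then apply `stat` to each group (`nothing` for an empty interval).
function piecewise_by(stat, x::AbstractVector, y::AbstractVector, n::Integer)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    lo, hi = extrema(x)
    hi > lo || throw(ArgumentError("x must take more than one value"))
    width = (hi - lo) / n
    groups = [eltype(y)[] for _ in 1:n]
    for (xi, yi) in zip(x, y)
        # clamp so that xi == hi falls in the last interval
        k = clamp(floor(Int, (xi - lo) / width) + 1, 1, n)
        push!(groups[k], yi)
    end
    return [isempty(g) ? nothing : stat(g) for g in groups]
end

Random.seed!(1234)
x = randn(10^5)
y = x + randn(10^5)

pieces = piecewise(extrema, y, 50)        # univariate: 50 (min, max) pairs of y
by_x = piecewise_by(extrema, x, y, 10)    # bivariate: (min, max) of y per interval of x
```

In both cases the plot is drawn from a fixed number of per-piece summaries (50 and 10 here), not from the individual rows.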

GroupBy

``````julia
# One KHist of :x for each distinct value of the Bool column :z
o = reduce(GroupBy{Bool}(KHist(20)), t; select = (:z, :x))
plot(o)
png("groupby.png"); nothing # hide
``````

Convenience function for Partition and IndexedPartition

You can also use the `partitionplot` function, a slightly less verbose way of plotting `Partition` and `IndexedPartition` objects.

``````julia
# x by itself
partitionplot(t, :x, stat = Extrema())
savefig("partitionplot1.png"); nothing # hide
``````

``````julia
# y by x, grouped by z
partitionplot(t, :x, :y, stat = Extrema(), by = :z)
savefig("partitionplot2.png"); nothing # hide
``````