
Plotting

using Pkg, Random
Pkg.add("StatPlots")
Pkg.add("GR")
using StatPlots
ENV["GKSwstype"] = "100"
gr()
Random.seed!(1234)  # set random seed to get consistent plots

StatPlots

JuliaDB has access to all the power and flexibility of Plots via StatPlots and the @df macro.

using JuliaDB, StatPlots

t = table((x = randn(100), y = randn(100)))

@df t scatter(:x, :y)
savefig("statplot.png"); nothing # hide

Plotting Big Data

For large datasets, it isn't feasible to render every data point. The OnlineStats package provides a number of data structures for big data visualization that can be created via the reduce and groupreduce functions.
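As a rough illustration of why this scales, the sketch below uses OnlineStats on its own, without JuliaDB. It assumes that reduce and groupreduce rely on the same fit!/merge! pattern internally, which is an assumption here rather than something this page states. The `data` vector is a hypothetical stand-in, not the table used in the examples that follow.

```julia
using OnlineStats

# Hypothetical stand-in data; not the table used elsewhere on this page.
data = randn(10_000)

# An OnlineStat is updated one observation at a time,
# so the full dataset never needs to be summarized in one pass over memory.
o = fit!(Mean(), data)

# Stats fitted on separate chunks can be merged afterwards.
a = fit!(Mean(), data[1:5_000])
b = fit!(Mean(), data[5_001:end])
merge!(a, b)

# The merged stat has seen every observation and agrees with the single-pass result.
println(nobs(a))
println(value(a) ≈ value(o))
```

Because each chunk produces a small summary object rather than a copy of the data, the same idea applies to plotting: the plot is drawn from the summary, not from the individual points.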

  • Example data:
using JuliaDB, Plots, OnlineStats

x = randn(10^6)
y = x + randn(10^6)
z = x .> 1
z2 = (x .+ y) .> 0
t = table((x=x, y=y, z=z, z2=z2))

Mosaic Plots

A mosaic plot visualizes the bivariate distribution of two categorical variables.

# select = (3, 4) picks the third and fourth columns (z and z2) by position
o = reduce(Mosaic(Bool, Bool), t; select = (3, 4))
plot(o)
png("mosaic.png"); nothing  # hide

Histograms

# Fixed-bin histogram of x with bin edges -5:0.5:5, one per value of z
grp = groupreduce(Hist(-5:.5:5), t, :z, select = :x)
plot(plot.(select(grp, 2))...; link=:all)
png("hist.png"); nothing # hide

# KHist(20): histogram of x with 20 adaptively placed bins, one per value of z
grp = groupreduce(KHist(20), t, :z, select = :x)
plot(plot.(select(grp, 2))...; link = :all)
png("hist2.png"); nothing # hide

Partition and IndexedPartition

  • Partition(stat, n) summarizes a univariate data stream.
    • The stat is fitted over n approximately equal-sized pieces.
  • IndexedPartition(T, stat, n) summarizes a bivariate data stream.
    • The stat is fitted over n pieces covering the domain of another variable of type T.
o = reduce(Partition(KHist(10), 50), t; select=:y)
plot(o)
png("partition.png"); nothing # hide

o = reduce(IndexedPartition(Float64, KHist(10), 50), t; select=(:x, :y))
plot(o)
png("partition2.png"); nothing # hide

GroupBy

o = reduce(GroupBy{Bool}(KHist(20)), t; select = (:z, :x))
plot(o)
png("groupby.png"); nothing # hide

Convenience function for Partition and IndexedPartition

You can also use the partitionplot function, a slightly less verbose way of plotting Partition and IndexedPartition objects.

# x by itself
partitionplot(t, :x, stat = Extrema())
savefig("partitionplot1.png"); nothing # hide

# y by x, grouped by z
partitionplot(t, :x, :y, stat = Extrema(), by = :z)
savefig("partitionplot2.png"); nothing # hide