# Fall in love with Julia: Dataframes & Plots
![logo](logo-fall-in-love-with-julia.png)

Welcome to the fall-in-love-with-julia 101 series. My name is Stephan Sahm. You are always welcome to reach me at s.sahm@reply.de.

This time we focus on plots and dataframes.

# Julia basics - Multiple Dispatch

Let's start by recapping some Julia basics.

You can easily define your own type hierarchies. Note that only concrete leaf types such as `struct`s can be instantiated. Abstract types are merely tags which are used for overloading functions.

In [None]:
abstract type Pet end
struct Cat <: Pet
    name
end
struct Dog <: Pet
    name
end
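
As a small sketch of what "abstract types are merely tags" means, the cell below repeats two of the definitions so that it runs on its own, and then inspects the hierarchy with functions from Julia's `Base`.

```julia
# Repeated from above so that this cell is self-contained
abstract type Pet end
struct Cat <: Pet
    name
end

isabstracttype(Pet)    # Pet is only a tag in the hierarchy
isconcretetype(Cat)    # Cat can actually hold values
Cat <: Pet             # subtype check
supertype(Cat)         # walks one step up the hierarchy

# Abstract types have no constructor, so this call fails with a MethodError
cannot_construct = try
    Pet("Nemo")
    false
catch err
    err isa MethodError
end
```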

In [None]:
kitty = Cat("Kitty")
leo = Cat("Leo")
rex = Dog("Rex")
charlie = Dog("Charlie")

You can define functions super easily

In [None]:
name(pet) = pet.name

You can specify types by annotating arguments with e.g. `::Cat` to ensure it is of type `Cat`.
This way you can overload/specialize the function to various specific methods.

This is the core of julia and is called Multiple Dispatch.

In [None]:
meet(pet1::Cat, pet2::Cat) = "$(name(pet1)) looks curiously at $(name(pet2))"
meet(pet1::Cat, pet2::Dog) = "$(name(pet1)) hisses at $(name(pet2))"
meet(pet1::Pet, pet2::Pet) = "$(name(pet1)) just looks at $(name(pet2))."
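
To see which method gets picked, here is a self-contained sketch that repeats the definitions above and calls `meet` with different argument types. Julia selects the most specific method that matches the types of all arguments, and falls back to the `Pet, Pet` method otherwise.

```julia
# Repeated from above so that this cell is self-contained
abstract type Pet end
struct Cat <: Pet
    name
end
struct Dog <: Pet
    name
end
name(pet) = pet.name

meet(pet1::Cat, pet2::Cat) = "$(name(pet1)) looks curiously at $(name(pet2))"
meet(pet1::Cat, pet2::Dog) = "$(name(pet1)) hisses at $(name(pet2))"
meet(pet1::Pet, pet2::Pet) = "$(name(pet1)) just looks at $(name(pet2))."

cat_cat = meet(Cat("Kitty"), Cat("Leo"))   # uses the (Cat, Cat) method
cat_dog = meet(Cat("Kitty"), Dog("Rex"))   # uses the (Cat, Dog) method
dog_cat = meet(Dog("Rex"), Cat("Kitty"))   # no (Dog, Cat) method, so the (Pet, Pet) fallback is used

# `methods` lists every method currently defined for a function
methods(meet)
```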

In [None]:
function pet_story(pet1, pet2)
    println("On a lovely autumn afternoon, $(name(pet1)) meets $(name(pet2))...")
    println(meet(pet1, pet2))
    println("and then goes away")
end

In [None]:
pet_story(kitty, charlie)

Julia comes with full-fledged support for Arrays and Matrices. If you already liked numpy, this takes it to the next level.

In [None]:
pets = [kitty, leo, rex, charlie]
name.(pets)

In [None]:
[(pet1, pet2, meet(pet1, pet2)) for pet1 in pets for pet2 in pets]

In [None]:
[meet(pet1, pet2) for pet1 in pets, pet2 in pets]
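
The three cells above use three different array features. The sketch below shows them on plain numbers: the dot syntax broadcasts a function over every element, a comprehension with two `for` clauses produces a flat vector, and a comprehension with a comma produces a matrix.

```julia
xs = [1, 2, 3]

# dot syntax: apply abs2 to every element
squares = abs2.(xs)

# two `for` clauses: one flat Vector, the second loop is the inner one
flat = [(a, b) for a in xs for b in xs]

# comma-separated iterators: a Matrix, first iterator runs along the rows
grid = [(a, b) for a in xs, b in xs]

size(flat), size(grid)
```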

## it is your time:
Define a new Pet type `Rabbit` and let a rabbit meet a cat. 🐕 🐈 🐇

In [None]:
# go on here

# Plots

Let's jump into plotting things

In [None]:
using Plots  # top level go-to package for plotting in Julia, integrating with many many packages

In [None]:
n = 20
x = 1:n
y = rand(n)
x, y

In [None]:
plot(x, y)

In [None]:
x, y = range(0, 1, length = n), rand(n, 3)*10

In [None]:
plot(x, y, labels = ["one" "two" "three"], xlabel = "xlabel", ylabel = "αβγ",  title = "\$ \\nabla_p^q \\sum_{i = 0}^{n} \$")

You can easily access documentation for the attributes

In [None]:
?plot

In [None]:
plotattr(:Series)

In [None]:
plotattr("seriescolor")

You can also plot functions and let julia infer which concrete points need to be drawn

In [None]:
plot(sin, cos, 0, pi)

You can also add to an existing plot by using `plot!` with exclamation mark

In [None]:
plot!(sin, cos, pi, 2pi)
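
A minimal sketch of the same idea with an explicit plot object: `plot` returns the figure, and the mutating variants such as `plot!` and `scatter!` accept it as their first argument. The labels and sample points are made up for illustration.

```julia
using Plots

# `plot` creates a new figure and returns it
p = plot(sin, 0, 2pi, label = "sin")

# `plot!` adds a series to an existing figure; passing `p` explicitly
# avoids relying on whichever plot happens to be the current one
plot!(p, cos, 0, 2pi, label = "cos")

# other series types follow the same convention
scatter!(p, [0, pi, 2pi], [0, 0, 0], label = "zeros of sin")
```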

In [None]:
plot(1:0.1:10, [sin, cos, log], labels=["sin" "cos" "log"])

Some easy layouts are quickly done

In [None]:
x = 1:10
y = rand(10, 4)
plot(x, y, layout = (2, 2))

In [None]:
p1 = plot(x, y) # Make a line plot
p2 = scatter(x, y) # Make a scatter plot
p3 = plot(x, y, xlabel = "This one is labelled", lw = 3, title = "Subtitle")
p4 = plot(x, y, seriestype = :histogram) # Four histograms each with 10 points? Why not!
plot(p1, p2, p3, p4, layout = (2, 2), legend = false)

And finally, a fancy 3D animation using Plots

In [None]:
using Plots
# define the Lorenz attractor
Base.@kwdef mutable struct Lorenz
    dt::Float64 = 0.02
    σ::Float64 = 10
    ρ::Float64 = 28
    β::Float64 = 8/3
    x::Float64 = 1
    y::Float64 = 1
    z::Float64 = 1
end

function step!(l::Lorenz)
    dx = l.σ * (l.y - l.x)
    l.x += l.dt * dx
    
    dy = l.x * (l.ρ - l.z) - l.y
    l.y += l.dt * dy
    
    dz = l.x * l.y - l.β * l.z
    l.z += l.dt * dz
end

attractor = Lorenz()


# initialize a 3D plot with 1 empty series
plt = plot3d(
    1,
    xlim = (-30, 30),
    ylim = (-30, 30),
    zlim = (0, 60),
    title = "Lorenz Attractor",
    marker = 2,
)

# build an animated gif by pushing new points to the plot, saving every 10th frame
@gif for i=1:1500
    step!(attractor)
    push!(plt, attractor.x, attractor.y, attractor.z)
end every 10
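
If you want to look at the numbers behind the animation, the integrator can be run without any plotting. The sketch below uses the hypothetical names `LorenzState` and `advance!` so that it does not clash with the cell above. Note that the three updates are applied one after another, so `dy` already sees the updated `x`.

```julia
# Stand-alone copy of the integrator (hypothetical names, no plotting)
Base.@kwdef mutable struct LorenzState
    dt::Float64 = 0.02
    σ::Float64 = 10
    ρ::Float64 = 28
    β::Float64 = 8/3
    x::Float64 = 1
    y::Float64 = 1
    z::Float64 = 1
end

function advance!(l::LorenzState)
    dx = l.σ * (l.y - l.x)
    l.x += l.dt * dx
    dy = l.x * (l.ρ - l.z) - l.y
    l.y += l.dt * dy
    dz = l.x * l.y - l.β * l.z
    l.z += l.dt * dz
    return l
end

state = LorenzState()
trajectory = map(1:1500) do _
    advance!(state)
    (state.x, state.y, state.z)   # record the position after each step
end

first(trajectory)
```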

## it is your time: 

Try to plot a heart ❤️

In [None]:
# go on here

# Interact

You can also add interactivity very easily

In [None]:
using Interact
using Colors

In [None]:
@manipulate for r = 0:0.05:1, g = 0:0.05:1, b = 0:0.05:1
    HTML(string("<div style='color:#", hex(RGB(r,g,b)), "'>Color me</div>"))
end

In [None]:
@manipulate for n=1:25, g=[:scatter, :path], col=colorant"red"
    plot(rand(n), rand(n), linetype=g, color=col)
end

# DataFrames

Let's look at [DataFrames.jl](https://dataframes.juliadata.org/stable/)

It is very intuitive, much more intuitive than pandas.

You can create dataframes the way you would first guess

In [None]:
using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

You can index into dataframes

In [None]:
df[:, :A]

In [None]:
df[2, :]

And also update them

In [None]:
df[:, :A] = [4,3,4,2]
df

In [None]:
df[1, :] = [100, "Z"]
df

`.=` gives you elementwise/broadcast behaviour

In [None]:
df[:, :A] .= 1
df
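
One detail worth knowing when indexing columns: in DataFrames.jl, `df[:, :A]` returns a copy of the column, while `df[!, :A]` (or `df.A`) returns the stored column itself. A small sketch on a fresh dataframe, here called `demo` so that `df` from above stays untouched:

```julia
using DataFrames

demo = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

col_copy = demo[:, :A]     # a copy: changing it leaves `demo` untouched
col_copy[1] = 100
unchanged = demo.A[1]

col_ref = demo[!, :A]      # the stored column itself: changes are visible in `demo`
col_ref[1] = 100
changed = demo.A[1]

# broadcast assignment overwrites every element of the column in place
demo[:, :A] .= 0
demo
```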

## RDatasets
With [RDatasets](https://github.com/JuliaStats/RDatasets.jl) you have access to all standard datasets you know from R

In [None]:
import RDatasets

In [None]:
iris = RDatasets.dataset("datasets", "iris")

## StatsPlots
[StatsPlots](https://github.com/JuliaPlots/StatsPlots.jl) gives you many so called plot recipes with which you can easily visualize your dataframes

In [None]:
using StatsPlots

In [None]:
dataviewer(iris)

StatsPlots also gives you a special macro `@df` with which you can use normal plots functions

In [None]:
@df iris scatter(:PetalLength, :PetalWidth)
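
Roughly speaking, `@df` replaces each column symbol with the corresponding column of the table. The sketch below uses a small made-up dataframe `toy` to show the two equivalent spellings, plus the `group` keyword for splitting a plot by a categorical column.

```julia
using DataFrames, StatsPlots

# made-up data, only for illustration
toy = DataFrame(x = [1.0, 2.0, 3.0, 4.0],
                y = [2.0, 1.0, 4.0, 3.0],
                kind = ["a", "a", "b", "b"])

p_macro = @df toy scatter(:x, :y)       # column symbols are looked up in `toy`
p_plain = scatter(toy.x, toy.y)         # the same plot written by hand

# one series per distinct value of `kind`
p_grouped = @df toy scatter(:x, :y, group = :kind)
```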

StatsPlots also adds common data science plots

In [None]:
@df iris marginalkde(:PetalLength, :PetalWidth)

## it is your time:

Using the `dataviewer`, investigate another standard RDataset 📊 📈 📉

In [None]:
# go on here

# Transforming DataFrames - Query.jl

If you know dplyr, [Query.jl](https://www.queryverse.org/Query.jl/stable/) will be very intuitive for you. If you don't know it, it will be similarly intuitive.

It combines a functional style and so-called piping with operators like filter, map, groupby, and joins.

In [None]:
using Query, Statistics

In [None]:
df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])

df |>
    @filter(_.age>50) |>
    @map({_.name, _.children}) |>
    DataFrame

Full list of commands can be found here
http://www.queryverse.org/Query.jl/stable/standalonequerycommands/

In [None]:
df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,2,2])

df |>
    @groupby(_.children) |>
    @map({Key=key(_), Count=length(_)}) |>
    DataFrame

In [None]:
df = DataFrame(a=[1,1,2,3], b=[4,5,6,8])
df |>
    @groupby(_.a) |>
    @map({a=key(_), b=mean(_.b)}) |>
    @filter(_.b > 5) |>
    @orderby_descending(_.b) |>
    DataFrame
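
To make clear what this pipeline computes, here is a hand-written sketch of the same steps in plain Julia, without Query.jl. It is only meant as an illustration of the semantics, not of how Query.jl is implemented.

```julia
using Statistics

a = [1, 1, 2, 3]
b = [4, 5, 6, 8]

# @groupby(_.a) followed by @map: one row per distinct key, with the mean of its b values
groups = [(a = k, b = mean(b[a .== k])) for k in unique(a)]

# @filter(_.b > 5)
kept = filter(g -> g.b > 5, groups)

# @orderby_descending(_.b)
result = sort(kept, by = g -> g.b, rev = true)
```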

In [None]:
df1 = DataFrame(a=[1,2,3], b=[1.,2.,3.])
df2 = DataFrame(c=[2,4,2], d=["John", "Jim","Sally"])

df1 |> @groupjoin(df2, _.a, _.c, {t1=_.a, t2=length(__)}) |> DataFrame
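
In the `@groupjoin` above, `_` stands for a row of `df1` and `__` for the group of all rows of `df2` whose key `c` matches `a`. As a plain-Julia sketch of the same counting logic (illustrative only, using bare vectors instead of dataframes):

```julia
a = [1, 2, 3]    # join key of df1
c = [2, 4, 2]    # join key of df2

# for every key in `a`, count how many entries of `c` match it
result = [(t1 = x, t2 = count(==(x), c)) for x in a]
```

Keys without a match are kept with a count of zero, which is what distinguishes a group join from an inner join.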

## it is your time:

Apply your first transformation on the iris dataset
* filter `PetalLength` > 2
* group by `Species`
* get the mean of `SepalWidth`

In [None]:
# go on here

For those interested, note that Query.jl also supports LINQ syntax. See https://www.queryverse.org/Query.jl/stable/linqquerycommands/ for more details.

# Other Tables

DataFrames are really just one of several possible table types

In [None]:
using StatsPlots

In [None]:
namedtuple_of_vector = (one = [1,2,3,4], two = [9,1,3,2])

In [None]:
@df namedtuple_of_vector plot(:one, :two)

In [None]:
vector_of_namedtuple = [
    (one = 1, two = 3.0),
    (one = 1, two = 4.0),
    (one = 2, two = 5.0),
]
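
Both shapes, a named tuple of vectors and a vector of named tuples, are recognized as tables by Tables.jl. As a short sketch, `Tables.rowtable` and `Tables.columntable` convert between the two:

```julia
using Tables

namedtuple_of_vector = (one = [1, 2, 3, 4], two = [9, 1, 3, 2])

Tables.istable(namedtuple_of_vector)            # column-oriented table

rows = Tables.rowtable(namedtuple_of_vector)    # Vector of NamedTuples, one per row
Tables.istable(rows)                            # row-oriented table

cols = Tables.columntable(rows)                 # and back to a NamedTuple of Vectors
```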

In [None]:
using Query
using Statistics

In [None]:
vector_of_namedtuple |>
    @groupby(_.one) |>
    @map({one=key(_), two=mean(_.two)}) |>
    collect

In [None]:
dataviewer(vector_of_namedtuple)

## Creating your own Table Type

Finally, I want to show you that you can also implement your own rigorous Table type.
(We follow the [Tables.jl](https://tables.juliadata.org/stable/) interface.)

In [None]:
using Tables

In [None]:
struct MatrixTable <: Tables.AbstractColumns
    names
    matrix
end
# getter methods to avoid getproperty clash
names(m::MatrixTable) = getfield(m, :names)
matrix(m::MatrixTable) = getfield(m, :matrix)

# declare that MatrixTable is a table
Tables.istable(::Type{<:MatrixTable}) = true
# schema is column names and types
Tables.schema(m::MatrixTable) = Tables.Schema(names(m), fill(eltype(matrix(m)), size(matrix(m), 2)))

In [None]:
# column interface
Tables.columnaccess(::Type{<:MatrixTable}) = true
Tables.columns(m::MatrixTable) = m

# required Tables.AbstractColumns object methods
Tables.getcolumn(m::MatrixTable, nm::Symbol) = matrix(m)[:, findfirst(x -> x == nm, names(m))]
Tables.getcolumn(m::MatrixTable, i::Int) = matrix(m)[:, i]
Tables.columnnames(m::MatrixTable) = names(m)

Finally, you may also want to implement the older table interface shown below. Some Julia packages may still rely on it, although this is quite rare these days.

In [None]:
import IteratorInterfaceExtensions
IteratorInterfaceExtensions.getiterator(x::MatrixTable) = Tables.datavaluerows(x)

Let's see that it works

In [None]:
mat = [1 4.0 "7"; 2 5.0 "8"; 3 6.0 "9"]

In [None]:
mattbl = MatrixTable([:one, :two, :three], mat)

Now you have nice support for indexing and the like (indexing is all column-wise)

In [None]:
mattbl[:one]

In [None]:
mattbl[2]

In [None]:
mattbl.one

You can also access the table row-wise

In [None]:
collect(Tables.rows(mattbl))[1].three
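
If you only need a matrix to behave like a table and do not want to define your own type, Tables.jl also ships a ready-made wrapper, `Tables.table`. A minimal sketch with the same matrix as above:

```julia
using Tables

mat = [1 4.0 "7"; 2 5.0 "8"; 3 6.0 "9"]

# wrap the matrix; `header` supplies the column names
tbl = Tables.table(mat; header = [:one, :two, :three])

second_column = Tables.getcolumn(tbl, :two)   # column-wise access
first_row = Tables.rowtable(tbl)[1]           # row-wise access as a NamedTuple
```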

## it is your time:

Try out Query.jl, `@df` and `dataviewer` on the newly defined table type.

In [None]:
# go on here

# The time-to-first-plot problem

The time-to-first-plot is currently Julia's major drawback. Julia compiles code just in time and has so far been optimized for throughput rather than startup latency, which means that producing a simple plot in a fresh session can take really long.

There are ways to improve these startup times
- wait for newer julia versions, as this is one of the top 3 focus points of the julia development team right now
- reuse your Julia session as much as possible. Check out [`Revise.jl`](https://github.com/timholy/Revise.jl).
- use [`PackageCompiler`](https://github.com/JuliaLang/PackageCompiler.jl) to build faster precompiled versions of packages you use often. At JuliaCon 2020 there was a lovely [introduction talk for it](https://www.youtube.com/watch?v=d7avhSuK2NA&list=PLP8iPy9hna6Tl2UHTrm4jnIYrLkIcAROR).
- use [`SnoopCompile`](https://github.com/timholy/SnoopCompile.jl) to speed up your own developed packages, or as an alternative to PackageCompiler
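
You can observe the effect on a small scale with any function. In the sketch below, `sumsq` is a made-up stand-in for "code that has not run yet in this session": the first call includes compilation, the second call reuses the compiled code. The exact timings depend on your machine and Julia version.

```julia
# hypothetical helper, only used to have something to compile
sumsq(xs) = sum(abs2, xs)

data = rand(1_000)

first_call  = @elapsed sumsq(data)   # includes compiling sumsq for Vector{Float64}
second_call = @elapsed sumsq(data)   # runs the already compiled code

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The same pattern, on a much larger scale, is what happens with the first `plot` call in a fresh session.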

# Thank you all for joining today!

Always feel welcome to contact me at s.sahm@reply.de

![bye bye](https://images.unsplash.com/photo-1581368242547-06eef3599510?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2550&q=80)