# Numerical and statistical tools

**@ CEF 2017**

**Authors**: Chase Coleman and Spencer Lyon

**Date**: 27 June 2017

- In this notebook we cover packages that didn't have a home in one of the other sections
- These include packages for computing derivatives, basic statistics, handling data and more

## Distributions.jl

- In my opinion, Distributions.jl is one of the best examples of flexible, performant, and idiomatic Julia code
- Provides routines for working with probability distributions, including:
    - Computing moments/statistics: mean, var, std, median, mode, entropy, mgf
    - Probability evaluation: pdf, cdf, ccdf, quantile, invlogcdf
    - Sampling: rand and rand!
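
To make the shape of that interface concrete, here is a minimal sketch of the same pattern in plain Julia: a distribution is a type, and each operation is a function with a method for that type. The `MyExponential` type and the `my*` function names are hypothetical stand-ins, chosen so they don't clash with Distributions.jl's own `mean`, `pdf`, `cdf`, `quantile` and `rand`; this is not how Distributions.jl itself is implemented.

```julia
# Hypothetical exponential distribution with scale θ (so the mean is θ).
# Illustrative only; Distributions.jl provides the real `Exponential`.
struct MyExponential
    θ::Float64   # scale parameter, assumed > 0
end

# Moments/statistics
mymean(d::MyExponential) = d.θ

# Probability evaluation
mypdf(d::MyExponential, x::Real) = x < 0 ? 0.0 : exp(-x / d.θ) / d.θ
mycdf(d::MyExponential, x::Real) = x < 0 ? 0.0 : 1 - exp(-x / d.θ)
myquantile(d::MyExponential, p::Real) = -d.θ * log1p(-p)   # inverse of mycdf for p in [0, 1)

# Sampling by inverse transform: push uniform draws through the quantile function
myrand(d::MyExponential, n::Integer) = [myquantile(d, rand()) for _ in 1:n]

d = MyExponential(2.0)
@show mymean(d)
@show mycdf(d, myquantile(d, 0.3))   # recovers 0.3 up to round-off
xs = myrand(d, 100_000)
@show sum(xs) / length(xs)           # close to mymean(d)
```

Adding another family then only means adding a new type and its methods; code written against the generic functions keeps working unchanged.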

In [None]:
# Pkg.add("Distributions")

### Distributions.jl Basics

In [None]:
using Distributions

In [None]:
# number of direct subtypes of `Distributions.Distribution`
length(subtypes(Distribution))

In [None]:
?Normal  # good documentation

In [None]:
dists = [
    Normal(0, 1),
    Beta(1.0, 2.0),
    Chisq(5),
    Frechet(5.0, 2.0),
    Gamma(1.0, 2.0),
    Pareto(3.0, 2.0),
    Binomial(10, 0.6),
    Poisson(0.7),
    MvLogNormal(ones(2), 3*eye(2)),
    Dirichlet([0.1, 0.2, 0.3, 0.4]),
    InverseWishart(5, eye(2)),
    MixtureModel(Normal[
        Normal(-2.0, 1.2),
        Normal(0.0, 1.0),
        Normal(3.0, 2.5)], 
        [0.2, 0.5, 0.3]  # component weights (prior probabilities)
    )
]

for d in dists
    println("Working with distribution: $(repr(d))")
    @show mean(d)
    if isa(d, Distributions.UnivariateDistribution)
        @show rand(d, 2, 2)
    else
        @show rand(d, 2)
    end
    
    @show pdf(d, rand(d))
    println("\n\n\n")
end
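
`MixtureModel` is the least self-explanatory entry in that list. As a rough hand-rolled sketch of what a finite mixture means, using the same three components and weights as above: the density is the weight-averaged component density, and a draw first picks a component with probability equal to its weight, then samples from that component. The names `normal_pdf`, `mixture_pdf` and `mixture_rand` are hypothetical; this is not Distributions.jl's implementation.

```julia
# Normal density written out by hand
normal_pdf(x, μ, σ) = exp(-((x - μ) / σ)^2 / 2) / (σ * sqrt(2π))

# Same components and weights as the MixtureModel above
const MIX_MEANS   = [-2.0, 0.0, 3.0]
const MIX_STDS    = [1.2, 1.0, 2.5]
const MIX_WEIGHTS = [0.2, 0.5, 0.3]

# Mixture density: weighted sum of the component densities
mixture_pdf(x) = sum(w * normal_pdf(x, μ, σ)
                     for (w, μ, σ) in zip(MIX_WEIGHTS, MIX_MEANS, MIX_STDS))

# Two-stage sampling: pick component k with probability MIX_WEIGHTS[k], then draw from it
function mixture_rand()
    u = rand()
    acc = 0.0
    for k in eachindex(MIX_WEIGHTS)
        acc += MIX_WEIGHTS[k]
        if u < acc
            return MIX_MEANS[k] + MIX_STDS[k] * randn()
        end
    end
    k = length(MIX_WEIGHTS)   # only reached if round-off leaves acc just below 1
    return MIX_MEANS[k] + MIX_STDS[k] * randn()
end

@show sum(mixture_pdf, -40:0.01:40) * 0.01               # Riemann sum, close to 1
@show sum(mixture_rand() for _ in 1:100_000) / 100_000   # close to 0.5
```

The sample mean should sit near the mixture mean, 0.2·(-2) + 0.5·0 + 0.3·3 = 0.5, which is also what `mean(d)` reports for the `MixtureModel` in the loop above.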

### More than you need

Let's list all the available distributions, grouped by variate form (univariate, multivariate, matrix-variate) and support (discrete, continuous)

In [None]:
dist_types = [
    Distributions.DiscreteMatrixDistribution,
    Distributions.DiscreteMultivariateDistribution,
    Distributions.DiscreteUnivariateDistribution,
    Distributions.ContinuousMatrixDistribution,
    Distributions.ContinuousMultivariateDistribution,
    Distributions.ContinuousUnivariateDistribution,   
]

for T in dist_types
    println("$T: ")
    @show subtypes(T)
    println("\n\n")
end 
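
The six types in `dist_types` are not unrelated abstract types: they are aliases of the two-parameter `Distribution{F, S}` type, where `F` is the variate form (`Univariate`, `Multivariate`, `Matrixvariate`) and `S` is the value support (`Discrete`, `Continuous`). A small check, assuming a reasonably recent Distributions.jl:

```julia
using Distributions

# The "kind" types are instances of Distribution{VariateForm, ValueSupport}
@show Distributions.ContinuousUnivariateDistribution == Distribution{Univariate, Continuous}
@show Distributions.DiscreteMultivariateDistribution == Distribution{Multivariate, Discrete}

# Concrete families from the `dists` list above each fall under one of them
@show Normal <: Distributions.ContinuousUnivariateDistribution
@show Poisson <: Distributions.DiscreteUnivariateDistribution
@show Dirichlet <: Distributions.ContinuousMultivariateDistribution
@show InverseWishart <: Distributions.ContinuousMatrixDistribution
```

This is also why the `isa(d, Distributions.UnivariateDistribution)` branch in the earlier loop works: `UnivariateDistribution` fixes the variate form but leaves the support free, so it matches both discrete and continuous families.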

In [None]:
# fitting a distribution, given some samples
fit_mle(Normal, randn(100_000)) # should get close to N(0, 1)

In [None]:
# fit_mle works the same way for other families
fit_mle(Uniform, rand(100_000) .* 2 .+ 1) # should get close to U(1, 3)
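
For these two families the maximum-likelihood estimates have closed forms, so it is easy to check what `fit_mle` should return. A minimal sketch, assuming the textbook estimators (we believe `fit_mle` computes the same quantities for these families, but its documentation is the authority); `normal_mle` and `uniform_mle` are hypothetical helper names:

```julia
# Normal: sample mean, and the standard deviation with a 1/n (not 1/(n-1)) factor
function normal_mle(x::AbstractVector{<:Real})
    n = length(x)
    μ = sum(x) / n
    σ = sqrt(sum((xi - μ)^2 for xi in x) / n)
    return μ, σ
end

# Uniform(a, b): the likelihood (b - a)^(-n) is maximized by the tightest
# interval that still contains every observation
uniform_mle(x::AbstractVector{<:Real}) = (minimum(x), maximum(x))

@show normal_mle(randn(100_000))             # close to (0, 1)
@show uniform_mle(rand(100_000) .* 2 .+ 1)   # close to (1, 3)
```

Note the `1/n` in the normal case: the MLE of `σ` is slightly biased downward, unlike the usual `1/(n-1)` sample standard deviation.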

## Calculus.jl

- Computes symbolic derivatives of Julia `Expr`essions and accurate finite-difference approximations to derivatives of functions

In [None]:
# Pkg.add("Calculus")

### Calculus.jl Basics

In [None]:
using Calculus

#### Symbolic derivatives

In [None]:
differentiate(:(sin(x)), :x)

In [None]:
differentiate(:(cos(sin(y))), :y)

In [None]:
differentiate(:(c^(1-γ)/(1-γ)), :c)
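
Symbolic differentiation of an `Expr` amounts to a recursive walk over the expression tree, applying the sum, product and chain rules at each node. A toy sketch of that idea: `deriv` is a hypothetical name, it handles only numbers, symbols, `+`, binary `*`, `sin` and `cos`, and unlike `differentiate` it does no simplification (so expect terms like `cos(x) * 1`).

```julia
# Leaves: constants differentiate to 0, a symbol to 1 if it is the variable
deriv(ex::Number, x::Symbol) = 0
deriv(ex::Symbol, x::Symbol) = ex == x ? 1 : 0

function deriv(ex::Expr, x::Symbol)
    ex.head == :call || error("unsupported expression: $ex")
    op, args = ex.args[1], ex.args[2:end]
    if op == :+
        # sum rule: differentiate each term
        return Expr(:call, :+, [deriv(a, x) for a in args]...)
    elseif op == :* && length(args) == 2
        # product rule: (uv)' = u'v + uv'
        u, v = args
        return :($(deriv(u, x)) * $v + $u * $(deriv(v, x)))
    elseif op == :sin
        # chain rule: sin(u)' = cos(u) * u'
        u = args[1]
        return :(cos($u) * $(deriv(u, x)))
    elseif op == :cos
        # chain rule: cos(u)' = -sin(u) * u'
        u = args[1]
        return :(-(sin($u)) * $(deriv(u, x)))
    else
        error("unsupported operator: $op")   # includes a*b*c with three factors
    end
end

@show deriv(:(sin(x) * x), :x)
g = eval(:(x -> $(deriv(:(sin(x) * x), :x))))
@show g(1.0) - (cos(1.0) + sin(1.0))   # should print 0.0
```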

#### Finite difference

In [None]:
derivative(sin, 1.0) - cos(1.0)

In [None]:
second_derivative(sin, 1.0) + sin(1.0)
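
Both results above are tiny but not zero, because finite differences trade truncation error (which shrinks with the step size `h`) against floating-point round-off (which grows as `h` shrinks). A minimal sketch of the central-difference formulas, assuming the common rules of thumb `h ≈ ε^(1/3)·max(1, |x|)` for the first derivative and `h ≈ ε^(1/4)·max(1, |x|)` for the second; Calculus.jl's exact step choices may differ, and the function names are hypothetical:

```julia
# f'(x) ≈ (f(x + h) - f(x - h)) / 2h, truncation error O(h^2)
function central_derivative(f, x::Float64)
    h = cbrt(eps(Float64)) * max(1.0, abs(x))
    return (f(x + h) - f(x - h)) / (2h)
end

# f''(x) ≈ (f(x + h) - 2f(x) + f(x - h)) / h^2, truncation error O(h^2)
function central_second_derivative(f, x::Float64)
    h = eps(Float64)^(1/4) * max(1.0, abs(x))
    return (f(x + h) - 2f(x) + f(x - h)) / h^2
end

@show central_derivative(sin, 1.0) - cos(1.0)
@show central_second_derivative(sin, 1.0) + sin(1.0)
```

The second derivative uses a larger step because dividing by `h^2` amplifies round-off much more than dividing by `2h`.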

In [None]:
Calculus.gradient(x -> exp(x[1]) + sin(x[2]) / x[1], [1.0, π])

In [None]:
Calculus.hessian(x -> exp(x[1]) + sin(x[2]) / x[1], [1.0, π])

In [None]:
Calculus.jacobian(x -> [exp(x[1]),  sin(x[2]) / x[1]], [1.0, π], :central)
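
Finite-difference gradients and Jacobians can be understood as the same idea applied one coordinate at a time: perturb `x[j]` up and down, and the central difference of the outputs gives column `j` of the Jacobian. A hypothetical sketch (`fd_jacobian` and `fd_gradient` are illustrative names, the step rule is the assumed `ε^(1/3)` heuristic, and this is not Calculus.jl's code):

```julia
# Central-difference Jacobian of a vector-valued f at x, built column by column
function fd_jacobian(f, x::Vector{Float64})
    fx = f(x)
    J = zeros(length(fx), length(x))
    for j in eachindex(x)
        h = cbrt(eps(Float64)) * max(1.0, abs(x[j]))
        xp = copy(x); xp[j] += h   # step up in coordinate j only
        xm = copy(x); xm[j] -= h   # step down in coordinate j only
        J[:, j] = (f(xp) .- f(xm)) ./ (2h)
    end
    return J
end

# For a scalar-valued f, the gradient is the single row of its 1×n Jacobian
fd_gradient(f, x::Vector{Float64}) = vec(fd_jacobian(y -> [f(y)], x))

objective(x) = exp(x[1]) + sin(x[2]) / x[1]
@show fd_gradient(objective, [1.0, π])   # analytic value (e - sin(π), cos(π)) ≈ (2.71828, -1.0)
@show fd_jacobian(x -> [exp(x[1]), sin(x[2]) / x[1]], [1.0, π])
```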

## SymEngine.jl

- Julia wrapper for SymEngine, a fast C++ symbolic manipulation library intended as a next-generation core for the SymPy computer algebra system
- A very fast alternative to Calculus.jl for symbolic differentiation

In [None]:
# Pkg.add("SymEngine")

### SymEngine.jl Basics

In [None]:
using SymEngine

In [None]:
# needs first argument to be of type SymEngine.Basic
diff(Basic(:(sin(x))), :x)

In [None]:
diff(Basic("cos(sin(y))"), :y)

In [None]:
diff(Basic("c^(1-γ)/(1-γ)"), :c)

Let's see how fast SymEngine is compared to Calculus.jl.

To do this we will load the BenchmarkTools.jl package, which goes to great lengths to produce statistically accurate and robust timing estimates, even at the sub-microsecond level.
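
Part of what BenchmarkTools.jl automates is the bookkeeping in the sketch below: a warm-up call so compilation is not timed, many repeated samples, and summary statistics such as the minimum and median instead of a single noisy measurement. `crude_benchmark` is a hypothetical and far cruder stand-in (no tuning of evaluations per sample, no garbage-collection control) meant only to show the idea; use `@benchmark` or `@btime` for real measurements.

```julia
# Time `f()` `samples` times and return (minimum, median) in seconds.
# For an even number of samples this returns the lower of the two middle values.
function crude_benchmark(f; samples = 1_000)
    f()                      # warm-up: keep compilation out of the timings
    times = zeros(samples)   # per-call wall-clock times, in seconds
    for i in 1:samples
        t0 = time_ns()
        f()
        times[i] = (time_ns() - t0) / 1e9
    end
    sort!(times)
    return times[1], times[cld(samples, 2)]
end

tmin, tmed = crude_benchmark(() -> sum(rand(1_000)))
println("minimum: $(tmin) s, median: $(tmed) s")
```

The minimum is usually the most stable summary for fast code, since noise from the OS and other processes can only add time.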

In [None]:
# Pkg.add("BenchmarkTools")
using BenchmarkTools

In [None]:
@benchmark Calculus.differentiate(:((y + r*a - ap)^(1-γ)/(1-γ)), :ap)

In [None]:
@benchmark diff(Basic("(y + r*a - ap)^(1-γ)/(1-γ)"), :ap)

## Data handling

- Julia's data ecosystem is young and still maturing
- Python is still my go-to choice for data cleaning/analysis
- That being said, working with data in Julia is still doable and effective

I won't demo them now, but some of the key packages are:

- [DataFrames.jl](https://github.com/JuliaStats/DataFrames.jl): Provides a DataFrame type for handling columnar data
- [CSV.jl](https://github.com/JuliaData/CSV.jl): Very high-performance reading and writing of delimited data files
- [DataStreams.jl](https://github.com/JuliaData/DataStreams.jl): Provides an interface for streaming data from a source to a sink
- [Query.jl](https://github.com/davidanthoff/Query.jl): Filter, project, join, and group any iterable data source