# Mill v2.0 features

In [1]:
using Mill, Flux, FileIO, JLD2, SparseArrays, BenchmarkTools, Setfield


## Bag count

- `AggregationFunction` was renamed to `AggregationOperator` for clarity; operators are not meant to be used directly by the user.
- Reduced number of exported `Segmented*` methods
- `Segmented*` calls now return an `Aggregation` type even for aggregations using only one operator.
- All `Aggregation{T}` types now append `log(length(bag) + one(T))` unless the global `bagcount` flag is switched off with `Mill.bagcount!(false)`
- slightly stricter type checking
- `Aggregation` is now flattened upon construction
- smart `vcat` implemented

In [2]:
a = SegmentedMeanMax(3)

Aggregation{Float32,2}:
 SegmentedMean(ψ = Float32[0.0, 0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])

In [3]:
SegmentedMean(3) |> typeof

Aggregation{Float32,1}

In [4]:
SegmentedMean(zeros(3)) |> typeof

SegmentedMean{Float64,Array{Float64,1}}

In [5]:
x = reshape(1:9, 3, 3) |> f32

3×3 Array{Float32,2}:
 1.0  4.0  7.0
 2.0  5.0  8.0
 3.0  6.0  9.0

In [6]:
a(x, Mill.bags([1:2, 3:3]))

7×2 Array{Float32,2}:
 2.5      7.0
 3.5      8.0
 4.5      9.0
 4.0      7.0
 5.0      8.0
 6.0      9.0
 1.09861  0.693147

In [7]:
a(x[:, 1:2], Mill.bags([1:2, 0:-1]))

7×2 Array{Float32,2}:
 2.5      0.0
 3.5      0.0
 4.5      0.0
 4.0      0.0
 5.0      0.0
 6.0      0.0
 1.09861  0.0
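
The numbers printed in In [6] and In [7] can be reproduced without Mill. The following is only a sketch in plain Julia: `meanmax_with_bagcount` is a hypothetical helper, not part of the package, and it assumes that `ψ` is the value emitted for an empty bag, as the zeros in In [7] suggest.

```julia
# Hypothetical helper, not Mill's implementation: per-bag mean and max,
# followed by the bag-count row log(length(bag) + 1).
function meanmax_with_bagcount(x::AbstractMatrix{T}, bags; ψ = zeros(T, size(x, 1))) where {T}
    cols = map(bags) do b
        if isempty(b)
            # assumption: an empty bag yields ψ for every operator and log(0 + 1) = 0
            vcat(ψ, ψ, log(one(T)))
        else
            xb = x[:, b]
            vcat(vec(sum(xb; dims = 2)) ./ length(b),   # mean over the bag
                 vec(maximum(xb; dims = 2)),            # max over the bag
                 log(length(b) + one(T)))               # bag count feature
        end
    end
    reduce(hcat, cols)
end

x = Float32.(reshape(1:9, 3, 3))
meanmax_with_bagcount(x, [1:2, 3:3])           # same values as the 7×2 output of In [6]
meanmax_with_bagcount(x[:, 1:2], [1:2, 0:-1])  # second column is all zeros, as in In [7]
```

The last row is `log(3) ≈ 1.09861` for the two-instance bag and `log(2) ≈ 0.693147` for the single-instance bag, which matches the printed matrices.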

In [8]:
Mill.bagcount()

true

In [9]:
Mill.bagcount!(false)
Mill.bagcount()

false

In [10]:
a(x, Mill.bags([1:2, 3:3]))

6×2 Array{Float32,2}:
 2.5  7.0
 3.5  8.0
 4.5  9.0
 4.0  7.0
 5.0  8.0
 6.0  9.0

In [11]:
a = Aggregation(SegmentedPNormLSE(3), Aggregation(SegmentedMean(3)), SegmentedMax(3))

Aggregation{Float32,4}:
 SegmentedPNorm(ψ = Float32[-1.16676, -0.50926, -1.0909], ρ = Float32[-0.114274, -0.177889, -0.630055], c = Float32[0.0, 0.0, 0.0])
 SegmentedLSE(ψ = Float32[-2.76175, -0.205343, -1.6741], ρ = Float32[0.0, 0.0, 0.0])
 SegmentedMean(ψ = Float32[0.0, 0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])

In [12]:
vcat(SegmentedMean(2), SegmentedMeanMax(2))

Aggregation{Float32,3}:
 SegmentedMean(ψ = Float32[0.0, 0.0])
 SegmentedMean(ψ = Float32[0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0])

## Pre (row) imputing

In [13]:
A = PreImputingMatrix(rand(3,3))
A::AbstractMatrix{Float64}

3×3 PreImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.409449  0.190346   0.455708
 0.25234   0.855877   0.133464
 0.80829   0.0958988  0.390138

ψ:
 0.0  0.0  0.0

In [14]:
hcat(A, A)

3×6 PreImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.409449  0.190346   0.455708  0.409449  0.190346   0.455708
 0.25234   0.855877   0.133464  0.25234   0.855877   0.133464
 0.80829   0.0958988  0.390138  0.80829   0.0958988  0.390138

ψ:
 0.0  0.0  0.0  0.0  0.0  0.0

In [15]:
X = rand(3, 2)

3×2 Array{Float64,2}:
 0.761107  0.976524
 0.939899  0.776236
 0.111206  0.727761

In [16]:
A * X

3×2 Array{Float64,2}:
 0.541218  0.879237
 1.01134   1.00791
 0.748717  1.14768

In [17]:
Y = [1.0 missing; missing 2.0; 3.0 4.0]

3×2 Array{Union{Missing, Float64},2}:
 1.0        missing
  missing  2.0
 3.0       4.0

In [18]:
A * Y

3×2 Array{Float64,2}:
 1.77657   2.20352
 0.652732  2.24561
 1.9787    1.75235

In [19]:
Z = [missing, missing, missing]

3-element Array{Missing,1}:
 missing
 missing
 missing

In [20]:
A * Z

3-element Array{Float64,1}:
 0.0
 0.0
 0.0

In [21]:
gradient((x, y) -> x * y |> sum, A, X)

((W = [1.7376317682933795 1.7161351527001492 0.8389669804895821; 1.7376317682933795 1.7161351527001492 0.8389669804895821; 1.7376317682933795 1.7161351527001492 0.8389669804895821], ψ = nothing), [1.470079394605298 1.470079394605298; 1.1421219378754743 1.1421219378754743; 0.9793091739213722 0.9793091739213722])

In [22]:
gradient((x, y) -> x * y |> sum, A, Y)

((W = [1.0 2.0 7.0; 1.0 2.0 7.0; 1.0 2.0 7.0], ψ = [1.470079394605298; 1.1421219378754743; 0.0]), [1.470079394605298 0.0; 0.0 1.1421219378754743; 0.9793091739213722 0.9793091739213722])

In [23]:
gradient((x, y) -> x * y |> sum, A, Z)

((W = [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0], ψ = [1.470079394605298, 1.1421219378754743, 0.9793091739213722]), nothing)
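
The pre-imputing products above are consistent with filling each missing input entry in row `i` with `ψ[i]` before an ordinary multiplication. A minimal sketch of that reading in plain Julia follows; `preimpute_mul` is a hypothetical name used only for illustration and is not how Mill implements it.

```julia
# Hypothetical helper: impute missing entries row-wise with ψ, then multiply.
# ψ has one entry per column of W, that is, one per row of the input.
preimpute_mul(W, ψ, X) = W * coalesce.(X, ψ)

W = [1.0 2.0; 3.0 4.0]
ψ = [10.0, 20.0]

preimpute_mul(W, ψ, [1.0 missing; missing 2.0])  # multiplies W by [1.0 10.0; 20.0 2.0]
preimpute_mul(W, ψ, [missing, missing])          # equals W * ψ
```

With `ψ` set to zeros this gives the same values as `A * Y` in In [18], and a fully missing vector maps to `W * ψ`, which is the zero vector seen in In [20].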

## Maybe hot

In [24]:
oh1 = Flux.onehot(1, 1:3)

3-element Flux.OneHotVector:
 1
 0
 0

In [25]:
mh1 = maybehot(1, 1:3)
mh1::AbstractVector{Bool}

3-element MaybeHotVector{Int64,Int64,Bool}:
 1
 0
 0

In [26]:
Flux.onehot(mh1)

3-element Flux.OneHotVector:
 1
 0
 0

In [27]:
mh2 = Mill.maybehot(missing, 1:3)
mh2::AbstractVector{Missing}

3-element MaybeHotVector{Missing,Int64,Missing}:
 missing
 missing
 missing

In [28]:
ohb1 = Flux.onehotbatch([1, 3], 1:3)

3×2 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 1  0
 0  0
 0  1

In [29]:
mhb1 = Mill.maybehotbatch([1, 3], 1:3)
mhb1::AbstractMatrix{Bool}

3×2 MaybeHotMatrix{Int64,Array{Int64,1},Int64,Bool}:
 1  0
 0  0
 0  1

In [30]:
Flux.onehotbatch(mhb1)

3×2 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 1  0
 0  0
 0  1

In [31]:
mhb2 = Mill.maybehotbatch([1, missing, 3], 1:3)
mhb2::AbstractMatrix{Union{Bool, Missing}}

3×3 MaybeHotMatrix{Union{Missing, Int64},Array{Union{Missing, Int64},1},Int64,Union{Missing, Bool}}:
  true  missing  false
 false  missing  false
 false  missing   true

In [32]:
x = rand(3,3)

3×3 Array{Float64,2}:
 0.659976  0.249149   0.340554
 0.188916  0.0600629  0.707417
 0.796855  0.366108   0.873953

In [33]:
x * oh1

3-element Array{Float64,1}:
 0.6599758842988541
 0.18891628920034576
 0.7968547948247389

In [34]:
x * mh1

3-element Array{Float64,1}:
 0.6599758842988541
 0.18891628920034576
 0.7968547948247389

In [35]:
x * mh2

3-element Array{Missing,1}:
 missing
 missing
 missing

In [36]:
x * ohb1

3×2 Array{Float64,2}:
 0.659976  0.340554
 0.188916  0.707417
 0.796855  0.873953

In [37]:
x * mhb1

3×2 Array{Float64,2}:
 0.659976  0.340554
 0.188916  0.707417
 0.796855  0.873953

In [38]:
x * mhb2

3×3 Array{Union{Missing, Float64},2}:
 0.659976  missing  0.340554
 0.188916  missing  0.707417
 0.796855  missing  0.873953

In [39]:
gradient((x, y) -> x * y |> sum, x, mh1)

([1.0 0.0 0.0; 1.0 0.0 0.0; 1.0 0.0 0.0], nothing)

In [40]:
gradient((x, y) -> x * y |> sum, x, mh2)

LoadError: Output should be scalar; gradients are not defined for output missing

In [41]:
gradient((x, y) -> x * y |> sum, x, mhb1)

([1.0 0.0 1.0; 1.0 0.0 1.0; 1.0 0.0 1.0], nothing)

In [42]:
gradient((x, y) -> x * y |> sum, x, mhb2)

LoadError: Output should be scalar; gradients are not defined for output missing
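
The products in In [33] to In [38] behave like column selection in which a missing index propagates to a column of `missing`. The sketch below shows that behaviour on plain arrays; `maybehot_mul` and `maybehotbatch_mul` are hypothetical helpers that take the hot index directly and do not use the `MaybeHotVector` or `MaybeHotMatrix` types.

```julia
# Hypothetical helpers: multiplying by a (maybe) one-hot vector selects a column,
# and a missing index yields an all-missing result.
maybehot_mul(x, i) = ismissing(i) ? fill(missing, size(x, 1)) : x[:, i]
maybehotbatch_mul(x, is) = reduce(hcat, (maybehot_mul(x, i) for i in is))

x = [1.0 2.0 3.0; 4.0 5.0 6.0]
maybehot_mul(x, 3)                      # third column of x
maybehot_mul(x, missing)                # vector of missing values
maybehotbatch_mul(x, [1, missing, 3])   # columns 1 and 3 with a missing column in between
```

Because the sum of an array containing `missing` is itself `missing`, a scalar loss built this way has no defined gradient, which is the situation reported by the errors in In [40] and In [42].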

## NGramMatrix with Missing

In [43]:
NGramIterator([3,2,1] |> collect, 4, 10) |> collect

6-element Array{Any,1}:
   3
  32
 321
 321
  21
   1
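
The six values above can be read as a window of width `n = 4` sliding over the sequence, where each window is encoded as a number in base `b = 10` from the elements it actually covers. The sketch below reproduces that output in plain Julia; `ngram_codes` is a hypothetical helper, and it leaves out any modulo reduction that maps codes into the rows of an `NGramMatrix`.

```julia
# Hypothetical helper, not Mill's NGramIterator: base-b codes of all n-grams,
# including the partial windows at the start and end of the sequence.
function ngram_codes(s::AbstractVector{<:Integer}, n::Integer, b::Integer)
    map(1:(length(s) + n - 1)) do k
        lo, hi = max(1, k - n + 1), min(k, length(s))   # part of the window inside s
        foldl((acc, c) -> acc * b + c, view(s, lo:hi); init = 0)
    end
end

ngram_codes([3, 2, 1], 4, 10)   # [3, 32, 321, 321, 21, 1]
```

A sequence of length `l` therefore produces `l + n - 1` codes, which is why three elements with `n = 4` give six n-grams.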

In [44]:
Y1 = NGramMatrix(["hello", "world"])

2053×2 NGramMatrix{String,Array{String,1},Int64}:
 "hello"
 "world"

In [45]:
Y1S = SparseMatrixCSC(Y1)

2053×2 SparseMatrixCSC{Float32,UInt64} with 14 stored entries:
  [37  , 1]  =  1.0
  [105 , 1]  =  1.0
  [112 , 1]  =  1.0
  [215 , 1]  =  1.0
  [1071, 1]  =  1.0
  [1113, 1]  =  1.0
  [1332, 1]  =  1.0
  [101 , 2]  =  1.0
  [120 , 2]  =  1.0
  [1060, 2]  =  1.0
  [1268, 2]  =  1.0
  [1279, 2]  =  1.0
  [1297, 2]  =  1.0
  [1834, 2]  =  1.0

In [46]:
A1 = rand(10, 2053);
A1 * Y1

10×2 Array{Float64,2}:
 6.7429    4.89728
 0.802493  1.2574
 5.50749   5.05097
 6.3558    6.35945
 2.24496   2.18578
 6.03901   5.82321
 4.33471   4.04752
 0.96762   2.32251
 5.86388   4.25235
 0.120321  1.14825

In [47]:
gradient((x, y) -> x * y |> sum, A1, Y1)

([0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], nothing)

In [48]:
Y2 = NGramMatrix([missing, missing])
Y2::AbstractMatrix{Missing}

2053×2 NGramMatrix{Missing,Array{Missing,1},Missing}:
 missing
 missing

In [49]:
Y3 = NGramMatrix([[1,2,3], [4,5,6]])
Y3::AbstractMatrix{Int}

2053×2 NGramMatrix{Array{Int64,1},Array{Array{Int64,1},1},Int64}:
 [1, 2, 3]
 [4, 5, 6]

In [50]:
Y4 = NGramMatrix([missing, "a"])
Y4::AbstractMatrix{Union{Missing,Int}}

2053×2 NGramMatrix{Union{Missing, String},Array{Union{Missing, String},1},Union{Missing, Int64}}:
 missing
 "a"

In [51]:
Mill.Sequence

Union{AbstractString, Base.CodeUnits, AbstractArray{var"#s49",1} where var"#s49"<:Integer}

In [52]:
A2 = PostImputingMatrix(A1)

10×2053 PostImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.963272   0.0404606  0.390423   0.373278   …  0.44633    0.779017  0.372863
 0.114642   0.342094   0.148009   0.652059      0.189425   0.234137  0.921334
 0.786784   0.558523   0.441413   0.431616      0.0529169  0.941059  0.03051
 0.907972   0.909793   0.758402   0.305525      0.956063   0.332242  0.488133
 0.320709   0.291119   0.910328   0.484522      0.0268798  0.33306   0.71015
 0.862715   0.754819   0.613018   0.977915   …  0.565116   0.77766   0.376214
 0.619245   0.47565    0.0211459  0.124565      0.0512021  0.209234  0.985623
 0.138231   0.815674   0.792538   0.0188099     0.0257565  0.35159   0.283921
 0.837698   0.0319319  0.999336   0.209678      0.966193   0.168612  0.285202
 0.0171888  0.531155   0.649938   0.967179      0.0472571  0.100991  0.479984

ψ:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [53]:
gradient((x, y) -> x * y |> sum, A2, Y1)

((W = [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ψ = nothing), nothing)

In [54]:
gradient((x, y) -> x * y |> sum, A2, Y2)

((W = nothing, ψ = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]), nothing)

In [55]:
gradient((x, y) -> x * y |> sum, A2, Y3)

((W = [0.0 1.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 1.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0], ψ = nothing), nothing)

In [56]:
gradient((x, y) -> x * y |> sum, A2, Y4)

((W = [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ψ = [1.0; 1.0; … ; 1.0; 1.0]), nothing)

## Post (column) imputing

In [57]:
A = PostImputingMatrix(rand(3,3))
A::AbstractMatrix{Float64}

3×3 PostImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.326253  0.0580193   0.386594
 0.58196   0.268532    0.139718
 0.510112  0.00101466  0.397212

ψ:
 0.0
 0.0
 0.0

In [58]:
X = rand(3)

3-element Array{Float64,1}:
 0.6185220343693185
 0.27575829359651394
 0.6233332923556927

In [59]:
A * X

3-element Array{Float64,1}:
 0.4587711954120451
 0.5210961391508726
 0.5633903832329112

In [60]:
Y = maybehotbatch([1, missing, 3], 1:3)

3×3 MaybeHotMatrix{Union{Missing, Int64},Array{Union{Missing, Int64},1},Int64,Union{Missing, Bool}}:
  true  missing  false
 false  missing  false
 false  missing   true

In [61]:
A * Y

3×3 Array{Float64,2}:
 0.326253  0.0  0.386594
 0.58196   0.0  0.139718
 0.510112  0.0  0.397212

In [62]:
Z = maybehot(1, 1:3)

3-element MaybeHotVector{Int64,Int64,Bool}:
 1
 0
 0

In [63]:
A * Z

3-element Array{Float64,1}:
 0.3262534742448131
 0.5819604255407069
 0.510111637214435

In [64]:
gradient((x, y) -> x * y |> sum, A, X)

((W = [0.6185220343693185 0.27575829359651394 0.6233332923556927; 0.6185220343693185 0.27575829359651394 0.6233332923556927; 0.6185220343693185 0.27575829359651394 0.6233332923556927], ψ = nothing), [1.418325536999955, 0.3275656686722288, 0.9235238651272217])

In [65]:
gradient((x, y) -> x * y |> sum, A, Y)

((W = [1.0 0.0 1.0; 1.0 0.0 1.0; 1.0 0.0 1.0], ψ = [1.0; 1.0; 1.0]), nothing)

In [66]:
gradient((x, y) -> x * y |> sum, A, Z)

((W = [1.0 0.0 0.0; 1.0 0.0 0.0; 1.0 0.0 0.0], ψ = nothing), nothing)
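
In contrast to pre-imputing, the results above suggest that post-imputing replaces the whole output column with `ψ` whenever the corresponding input column is missing. A rough sketch on plain arrays follows; `postimpute_mul` is a hypothetical helper, and it assumes every input column is either fully observed or fully missing.

```julia
# Hypothetical helper: ordinary product for observed columns, ψ for missing ones.
# ψ has one entry per row of W, that is, one per output dimension.
function postimpute_mul(W, ψ, X)
    out = similar(W, size(W, 1), size(X, 2))
    for j in axes(X, 2)
        col = view(X, :, j)
        # assumption: a column is never only partially missing
        out[:, j] = all(ismissing, col) ? ψ : W * convert.(eltype(W), col)
    end
    out
end

W = [1.0 2.0; 3.0 4.0]
ψ = [-1.0, -2.0]
postimpute_mul(W, ψ, [1.0 missing; 0.0 missing])   # first column is W[:, 1], second is ψ
```

This matches In [61], where the middle column of `A * Y` equals `ψ` (all zeros), and it is in line with `ψ` receiving a gradient in In [65] but not in In [66].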

## Reflect in model and integration

- better IO for all types and trees
- `single_key_identity` keyword argument of `reflectinmodel`
- `single_scalar_identity` keyword argument of `reflectinmodel`

In [67]:
m = PreImputingDense(5, 5)

PreImputingDense(5, 5)

In [68]:
typeof(m)

Dense{typeof(identity),PreImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}},Array{Float32,1}}

In [69]:
m.W

5×5 PreImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
  0.090362   0.113137    0.59918   -0.23468    -0.516138
 -0.22188   -0.399462   -0.755334   0.714779   -0.342677
  0.502931  -0.0837839   0.14204   -0.621753    0.677199
 -0.400684   0.400101    0.614787  -0.406998    0.501981
 -0.449089   0.460327    0.729261  -0.0588182  -0.552794

ψ:
 0.0  0.0  0.0  0.0  0.0

In [70]:
m.b

5-element Array{Float32,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [71]:
m.σ

identity (generic function with 1 method)

In [72]:
m = PostImputingDense(5, 5)

PostImputingDense(5, 5)

In [73]:
typeof(m)

Dense{typeof(identity),PostImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}},Array{Float32,1}}

In [74]:
m.W

5×5 PostImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
 -0.656622   0.312365   0.44091   -0.0537639  -0.680408
  0.111318  -0.585882  -0.762487  -0.152752    0.312368
 -0.148554   0.400867  -0.415371   0.290294    0.239165
 -0.271938   0.380027  -0.273672  -0.297351    0.714995
 -0.628056   0.450633  -0.468083  -0.291115   -0.398236

ψ:
 0.0
 0.0
 0.0
 0.0
 0.0

In [75]:
m.b

5-element Array{Float32,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [76]:
m.σ

identity (generic function with 1 method)

In [77]:
x1 = reshape([i%3 == 0 ? missing : i for i in 1:10], 1, 10) |> collect
aa = BagNode(ArrayNode(x1), bags([1:2, 3:7, 0:-1, 8:10]))
a = ProductNode((; aa))

ba = ArrayNode(NGramMatrix(["a", missing, missing, "b"]))
bb = ArrayNode(NGramMatrix([[1,2], [3,4], [5], [6, 7, 8]]))
b = ProductNode((; ba, bb))

ca = ArrayNode(maybehotbatch([1,missing,9,missing], 1:10))
cb = ArrayNode(maybehotbatch([1,2,3,4], 1:10))
c = ProductNode((; ca, cb))

ds = ProductNode((; a, b, c))
printtree(ds)

ProductNode with 4 obs
  ├── a: ProductNode with 4 obs
  │        └── aa: BagNode with 4 obs
  │                  └── ArrayNode(1x10 Array, Union{Missing, Int64}) with 10 obs
  ├── b: ProductNode with 4 obs
  │        ├── ba: ArrayNode(2053x4 NGramMatrix, Union{Missing, Int64}) with 4 obs
  │        └── bb: ArrayNode(2053x4 NGramMatrix, Int64) with 4 obs
  └── c: ProductNode with 4 obs
           ├── ca: ArrayNode(10x4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs
           └── cb: ArrayNode(10x4 MaybeHotMatrix, Bool) with 4 obs

In [78]:
m = reflectinmodel(ds)
printtree(m; trav=true)

ProductModel ↦ ArrayModel(Dense(21, 10)) [""]
  ├── a: ProductModel ↦ ArrayModel(identity) ["E"]
  │        └── aa: BagModel ↦ ⟨SegmentedMean(1)⟩ ↦ ArrayModel(identity) ["M"]
  │                  └── ArrayModel(PreImputingDense(1, 1)) ["Q"]
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10)) ["U"]
  │        ├── ba: ArrayModel(PostImputingDense(2053, 10)) ["Y"]
  │        └── bb: ArrayModel(Dense(2053, 10)) ["c"]
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10)) ["k"]
           ├── ca: ArrayModel(PostImputingDense(10, 10)) ["o"]
           └── cb: ArrayModel(Dense(10, 10)) ["s"]

In [79]:
m["E"].m

ArrayModel(identity)

In [80]:
m["Q"].m.W

1×1 PreImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
 1.0

ψ:
 0.0

In [81]:
m = reflectinmodel(ds; single_key_identity=false, single_scalar_identity=false)
printtree(m)

ProductModel ↦ ArrayModel(Dense(30, 10))
  ├── a: ProductModel ↦ ArrayModel(Dense(10, 10))
  │        └── aa: BagModel ↦ ⟨SegmentedMean(10)⟩ ↦ ArrayModel(Dense(10, 10))
  │                  └── ArrayModel(PreImputingDense(1, 10))
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10))
  │        ├── ba: ArrayModel(PostImputingDense(2053, 10))
  │        └── bb: ArrayModel(Dense(2053, 10))
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10))
           ├── ca: ArrayModel(PostImputingDense(10, 10))
           └── cb: ArrayModel(Dense(10, 10))

In [82]:
m(ds)

ArrayNode{Array{Float32,2},Nothing}:
 -0.6906526f0   -1.5865679f0   -0.0047252662f0  -2.7836263f0
  1.459733f0     2.668045f0    -0.036307476f0    5.152928f0
  1.7881393f-7   0.27434653f0  -0.21518002f0     0.42916465f0
  0.2371651f0    0.80481863f0  -0.0021744743f0   0.8122953f0
  0.7773101f0    1.1419597f0   -0.3208381f0      2.0551722f0
 -1.6020952f0   -3.4314964f0   -0.16650856f0    -6.8887873f0
 -0.97207224f0  -2.1263258f0    0.7065586f0     -3.518024f0
 -0.98224175f0  -2.0711553f0   -0.12955739f0    -3.7919593f0
  0.46770677f0   0.77656f0     -0.39608592f0     1.1985433f0
 -0.14452562f0  -0.1414824f0   -0.16099665f0    -0.42402506f0

In [83]:
g = gradient(m -> sum(m(ds).data), m)

((ms = (a = (ms = (aa = (im = (m = (W = (W = Float32[7.246476; 2.9262583; … ; 12.581346; -8.315677], ψ = Float32[-0.9903941]), b = Float32[2.0317223, 0.8204463, -2.1628757, -2.4787538, -3.0932555, 2.4060163, 0.3597223, 1.4212981, 3.5274801, -2.3314981], σ = nothing),), a = (fs = ((ψ = Float32[0.6772408, 0.27348208, -0.72095853, -0.82625127, -1.0310851, 0.8020054, 0.11990744, 0.47376606, 1.1758267, -0.77716607],),),), bm = (m = (W = Float32[0.6928727 -1.6442821 … -1.4455364 0.7557327; -0.9883072 2.345389 … 2.0619 -1.0779701; … ; 2.2793193 -5.409138 … -4.755331 2.4861078; -0.5148125 1.2217209 … 1.0740505 -0.5615183], b = Float32[1.1737905, -1.6742839, -2.4035165, -1.7210643, -0.5549123, -4.620958, 2.9776359, -1.8292284, 3.861378, -0.8721401], σ = nothing),)),), m = (m = (W = Float32[-0.45558468 1.6156411 … -1.2320051 0.24427187; 0.53019345 -1.8802263 … 1.4337642 -0.28427503; … ; -1.9729818 6.9967904 … -5.335394 1.0578582; 0.61080927 -2.1661143 … 1.6517678 -0.32749897], b = Float32[0.7137

## Lens utilities

- `ModelLens`
- `findnonempty`
- `findin`
- `replacein`

In [84]:
printtree(ds; trav=true)

ProductNode with 4 obs [""]
  ├── a: ProductNode with 4 obs ["E"]
  │        └── aa: BagNode with 4 obs ["M"]
  │                  └── ArrayNode(1x10 Array, Union{Missing, Int64}) with 10 obs ["Q"]
  ├── b: ProductNode with 4 obs ["U"]
  │        ├── ba: ArrayNode(2053x4 NGramMatrix, Union{Missing, Int64}) with 4 obs ["Y"]
  │        └── bb: ArrayNode(2053x4 NGramMatrix, Int64) with 4 obs ["c"]
  └── c: ProductNode with 4 obs ["k"]
           ├── ca: ArrayNode(10x4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs ["o"]
           └── cb: ArrayNode(10x4 MaybeHotMatrix, Bool) with 4 obs ["s"]

In [85]:
printtree(m; trav=true)

ProductModel ↦ ArrayModel(Dense(30, 10)) [""]
  ├── a: ProductModel ↦ ArrayModel(Dense(10, 10)) ["E"]
  │        └── aa: BagModel ↦ ⟨SegmentedMean(10)⟩ ↦ ArrayModel(Dense(10, 10)) ["M"]
  │                  └── ArrayModel(PreImputingDense(1, 10)) ["Q"]
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10)) ["U"]
  │        ├── ba: ArrayModel(PostImputingDense(2053, 10)) ["Y"]
  │        └── bb: ArrayModel(Dense(2053, 10)) ["c"]
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10)) ["k"]
           ├── ca: ArrayModel(PostImputingDense(10, 10)) ["o"]
           └── cb: ArrayModel(Dense(10, 10)) ["s"]

In [86]:
lens = findnonempty(ds)

5-element Array{Setfield.ComposedLens{Setfield.PropertyLens{:data},_A} where _A,1}:
 (@lens _.data.a.data.aa.data.data)
 (@lens _.data.b.data.ba.data)
 (@lens _.data.b.data.bb.data)
 (@lens _.data.c.data.ca.data)
 (@lens _.data.c.data.cb.data)

In [87]:
[ModelLens(m, l) for l in lens]

5-element Array{Setfield.ComposedLens{Setfield.PropertyLens{:ms},LI} where LI,1}:
 (@lens _.ms.a.ms.aa.im.m)
 (@lens _.ms.b.ms.ba.m)
 (@lens _.ms.b.ms.bb.m)
 (@lens _.ms.c.ms.ca.m)
 (@lens _.ms.c.ms.cb.m)

In [88]:
n = ArrayNode(rand(1, 10))
ds2 = replacein(ds, ds["Q"], n)
printtree(ds2)

ProductNode with 4 obs
  ├── a: ProductNode with 4 obs
  │        └── aa: BagNode with 4 obs
  │                  └── ArrayNode(1x10 Array, Float64) with 10 obs
  ├── b: ProductNode with 4 obs
  │        ├── ba: ArrayNode(2053x4 NGramMatrix, Union{Missing, Int64}) with 4 obs
  │        └── bb: ArrayNode(2053x4 NGramMatrix, Int64) with 4 obs
  └── c: ProductNode with 4 obs
           ├── ca: ArrayNode(10x4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs
           └── cb: ArrayNode(10x4 MaybeHotMatrix, Bool) with 4 obs

In [89]:
findin(ds, n)

In [90]:
findin(ds2, n)

(@lens _.data.a.data.aa.data)

## Error checks

In [91]:
vcat(PreImputingMatrix(rand(2,2)),
     PreImputingMatrix(rand(2,2))
)

LoadError: ArgumentError: It doesn't make sense to vcat PreImputingMatrices

In [92]:
hcat(PostImputingMatrix(rand(2,2)),
     PostImputingMatrix(rand(2,2))
)

LoadError: ArgumentError: It doesn't make sense to hcat PostImputingMatrices

In [93]:
PreImputingMatrix(rand(2,2)) * rand(3,3)

LoadError: DimensionMismatch("Number of columns of A (2) must correspond with number of rows of B (3)")

In [94]:
PostImputingMatrix(rand(2,2)) * maybehot(1, 1:4)

LoadError: DimensionMismatch("Number of columns of A (2) must correspond with length of b (4)")

In [95]:
maybehot(1, 1:4)[5]

LoadError: BoundsError: attempt to access 4-element MaybeHotVector{Int64,Int64,Bool} at index [5]

In [96]:
NGramMatrix(["a", "b"])[:, 3]

LoadError: BoundsError: attempt to access 2053×2 NGramMatrix{String,Array{String,1},Int64} at index [:, 3]

## Other changes

- renamed default params everywhere to `ψ` for consistency
- `terseprint` is gone and will be available from a standalone package
- `!` versions of functions for global flags
- `ChainRulesCore.rrule` instead of `Zygote.@adjoint` where possible
- `Nothing{T}` and `Maybe{T}` union types
- `ImputingMatrix`, `Sequence`
- `IdentityModel` changed to `ArrayModel{typeof(identity)}`
- `3x` more tests than before
- more efficient aggregation operators
- at least `julia-1.5` required from now on
- `nobs` from `LearnBase` gone and replaced by `StatsBase` version
- `MacroTools` as a dependency used from `Flux`
- reworked and simplified gradient checking tests

# Still TODO:
- better alphabet (reduced, wildcards, start/end of word characters `\‘`, `\'`)
- profiling and benchmarking performance
- documentation
- merge to master and release