# Mill v2.0 features

In [1]:
using Mill, Flux, FileIO, JLD2, SparseArrays, BenchmarkTools, Setfield


## Bag count

- `AggregationFunction` was renamed to `AggregationOperator` for clarity; these types are not meant to be used directly by the user.
- Reduced the number of exported `Segmented*` methods.
- `Segmented*` calls now return an `Aggregation` type even for aggregations using only one operator.
- All `Aggregation{T}` types now append `log(length(bag) + one(T))` to their output when the global `bagcount` flag is set.
- Slightly stricter type checking.
- `Aggregation` is now flattened upon construction.
- Smart `vcat` is implemented.
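
To make the bag-count feature concrete, here is a minimal sketch in plain Julia that does not use Mill at all. The helper `mean_max_with_count` is a hypothetical name introduced only for illustration, and it assumes every bag is non-empty. It computes the per-bag mean and max and then appends `log(length(bag) + one(T))` as an extra row, which should correspond to the seventh row in the `SegmentedMeanMax` output shown in this section.

```julia
using Statistics  # standard library, provides `mean`

# Hypothetical helper (not part of Mill): per-bag mean and max, followed by
# the bag-count feature log(length(bag) + 1). Assumes non-empty bags.
function mean_max_with_count(x::AbstractMatrix{T}, bags) where {T<:AbstractFloat}
    cols = map(bags) do b
        inst = x[:, b]                      # instances (columns) belonging to bag b
        vcat(vec(mean(inst; dims=2)),       # per-feature mean
             vec(maximum(inst; dims=2)),    # per-feature max
             log(length(b) + one(T)))       # bag-count feature
    end
    reduce(hcat, cols)                      # one output column per bag
end

x = Float32.(reshape(1:9, 3, 3))
out = mean_max_with_count(x, [1:2, 3:3])
```

With a bag of two instances and a bag of one instance, the last row holds `log(3)` and `log(2)`.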

In [98]:
a = SegmentedMeanMax(3)

Aggregation{Float32,2}:
 SegmentedMean(ψ = Float32[0.0, 0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])

In [3]:
SegmentedMean(3) |> typeof

Aggregation{Float32,1}

In [4]:
SegmentedMean(zeros(3)) |> typeof

SegmentedMean{Float64,Array{Float64,1}}

In [99]:
x = reshape(1:9, 3, 3) |> f32

3×3 Array{Float32,2}:
 1.0  4.0  7.0
 2.0  5.0  8.0
 3.0  6.0  9.0

In [6]:
a(x, Mill.bags([1:2, 3:3]))

7×2 Array{Float32,2}:
 2.5      7.0
 3.5      8.0
 4.5      9.0
 4.0      7.0
 5.0      8.0
 6.0      9.0
 1.09861  0.693147

In [100]:
a(x[:, 1:2], Mill.bags([1:2, 0:-1]))

6×2 Array{Float32,2}:
 2.5  0.0
 3.5  0.0
 4.5  0.0
 4.0  0.0
 5.0  0.0
 6.0  0.0

In [7]:
Mill.bagcount()

true

In [8]:
Mill.bagcount!(false)
Mill.bagcount()

false

In [9]:
a(x, Mill.bags([1:2, 3:3]))

6×2 Array{Float32,2}:
 2.5  7.0
 3.5  8.0
 4.5  9.0
 4.0  7.0
 5.0  8.0
 6.0  9.0

In [10]:
a = Aggregation(SegmentedPNormLSE(3), Aggregation(SegmentedMean(3)), SegmentedMax(3))

Aggregation{Float32,4}:
 SegmentedPNorm(ψ = Float32[-3.06445, 2.75515, -0.276394], ρ = Float32[-0.384319, 0.15008, 0.489879], c = Float32[0.0, 0.0, 0.0])
 SegmentedLSE(ψ = Float32[-1.15335, -1.46075, -0.611554], ρ = Float32[0.0, 0.0, 0.0])
 SegmentedMean(ψ = Float32[0.0, 0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])

In [11]:
vcat(SegmentedMean(2), SegmentedMeanMax(2))

Aggregation{Float32,3}:
 SegmentedMean(ψ = Float32[0.0, 0.0])
 SegmentedMean(ψ = Float32[0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0])

## Row imputing

In [12]:
A = RowImputingMatrix(rand(3,3))
A::AbstractMatrix{Float64}

3×3 RowImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.155935  0.723339  0.580908
 0.155259  0.180053  0.212448
 0.162214  0.737055  0.739456

ψ:
 0.0  0.0  0.0

In [13]:
hcat(A, A)

3×6 RowImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.155935  0.723339  0.580908  0.155935  0.723339  0.580908
 0.155259  0.180053  0.212448  0.155259  0.180053  0.212448
 0.162214  0.737055  0.739456  0.162214  0.737055  0.739456

ψ:
 0.0  0.0  0.0  0.0  0.0  0.0

In [14]:
X = rand(3, 2)

3×2 Array{Float64,2}:
 0.0891233  0.648923
 0.799927   0.681042
 0.0731798  0.519837

In [15]:
A * X

3×2 Array{Float64,2}:
 0.635026  0.895792
 0.173413  0.333813
 0.658161  0.991626

In [16]:
Y = [1.0 missing; missing 2.0; 3.0 4.0]

3×2 Array{Union{Missing, Float64},2}:
 1.0        missing
  missing  2.0
 3.0       4.0

In [17]:
A * Y

3×2 Array{Float64,2}:
 1.89866   3.77031
 0.792604  1.2099
 2.38058   4.43194

In [18]:
Z = [missing, missing, missing]

3-element Array{Missing,1}:
 missing
 missing
 missing

In [19]:
A * Z

3-element Array{Float64,1}:
 0.0
 0.0
 0.0

In [20]:
gradient((x, y) -> x * y |> sum, A, X)

((W = [0.7380466677961675 1.4809689890984554 0.5930163827685073; 0.7380466677961675 1.4809689890984554 0.5930163827685073; 0.7380466677961675 1.4809689890984554 0.5930163827685073], ψ = nothing), [0.47340833531452753 0.47340833531452753; 1.6404463299186878 1.6404463299186878; 1.5328131540709018 1.5328131540709018])

In [21]:
gradient((x, y) -> x * y |> sum, A, Y)

((W = [1.0 2.0 7.0; 1.0 2.0 7.0; 1.0 2.0 7.0], ψ = [0.47340833531452753; 1.6404463299186878; 0.0]), [0.47340833531452753 0.0; 0.0 1.6404463299186878; 1.5328131540709018 1.5328131540709018])

In [22]:
gradient((x, y) -> x * y |> sum, A, Z)

((W = [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0], ψ = [0.47340833531452753, 1.6404463299186878, 1.5328131540709018]), nothing)
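
Judging from the outputs above, `RowImputingMatrix` appears to replace each `missing` entry in row `i` of the right-hand operand with `ψ[i]` before multiplying. The following is a sketch of that reading in plain Julia. `row_impute_mul` is a hypothetical helper, not Mill's implementation, and the numbers are made up for the example.

```julia
# Hypothetical helper (not Mill's code): fill a missing entry in row i of Y
# with ψ[i], then do an ordinary matrix product.
function row_impute_mul(W::AbstractMatrix, ψ::AbstractVector, Y::AbstractMatrix)
    Yfilled = [ismissing(Y[i, j]) ? ψ[i] : Y[i, j]
               for i in axes(Y, 1), j in axes(Y, 2)]
    W * Yfilled
end

W = [1.0 2.0; 3.0 4.0]
ψ = [10.0, 20.0]
Y = [1.0 missing; missing 2.0]

result = row_impute_mul(W, ψ, Y)   # same as W * [1.0 10.0; 20.0 2.0]
```

Under this reading, a parameter `ψ[i]` only influences the result when row `i` contains a `missing`, which would be consistent with the zero entry in the `ψ` gradient for the row of `Y` that has no missing values.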

## Maybe hot

In [23]:
oh1 = Flux.onehot(1, 1:3)

3-element Flux.OneHotVector:
 1
 0
 0

In [24]:
mh1 = maybehot(1, 1:3)
mh1::AbstractVector{Bool}

3-element MaybeHotVector{Int64,Int64,Bool}:
 1
 0
 0

In [25]:
Flux.onehot(mh1)

3-element Flux.OneHotVector:
 1
 0
 0

In [26]:
mh2 = Mill.maybehot(missing, 1:3)
mh2::AbstractVector{Missing}

3-element MaybeHotVector{Missing,Int64,Missing}:
 missing
 missing
 missing

In [27]:
ohb1 = Flux.onehotbatch([1, 3], 1:3)

3×2 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 1  0
 0  0
 0  1

In [28]:
mhb1 = Mill.maybehotbatch([1, 3], 1:3)
mhb1::AbstractMatrix{Bool}

3×2 MaybeHotMatrix{Int64,Array{Int64,1},Int64,Bool}:
 1  0
 0  0
 0  1

In [29]:
Flux.onehotbatch(mhb1)

3×2 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 1  0
 0  0
 0  1

In [30]:
mhb2 = Mill.maybehotbatch([1, missing, 3], 1:3)
mhb2::AbstractMatrix{Union{Bool, Missing}}

3×3 MaybeHotMatrix{Union{Missing, Int64},Array{Union{Missing, Int64},1},Int64,Union{Missing, Bool}}:
  true  missing  false
 false  missing  false
 false  missing   true

In [31]:
x = rand(3,3)

3×3 Array{Float64,2}:
 0.183704  0.0606907  0.962855
 0.456992  0.331529   0.612568
 0.17048   0.449118   0.654352

In [32]:
x * oh1

3-element Array{Float64,1}:
 0.18370432477484222
 0.456992456780293
 0.17048007252034814

In [33]:
x * mh1

3-element Array{Float64,1}:
 0.18370432477484222
 0.456992456780293
 0.17048007252034814

In [34]:
x * mh2

3-element Array{Missing,1}:
 missing
 missing
 missing

In [35]:
x * ohb1

3×2 Array{Float64,2}:
 0.183704  0.962855
 0.456992  0.612568
 0.17048   0.654352

In [36]:
x * mhb1

3×2 Array{Float64,2}:
 0.183704  0.962855
 0.456992  0.612568
 0.17048   0.654352

In [37]:
x * mhb2

3×3 Array{Union{Missing, Float64},2}:
 0.183704  missing  0.962855
 0.456992  missing  0.612568
 0.17048   missing  0.654352

In [38]:
gradient((x, y) -> x * y |> sum, x, mh1)

([1.0 0.0 0.0; 1.0 0.0 0.0; 1.0 0.0 0.0], nothing)

In [39]:
gradient((x, y) -> x * y |> sum, x, mh2)

LoadError: Output should be scalar; gradients are not defined for output missing

In [40]:
gradient((x, y) -> x * y |> sum, x, mhb1)

([1.0 0.0 1.0; 1.0 0.0 1.0; 1.0 0.0 1.0], nothing)

In [41]:
gradient((x, y) -> x * y |> sum, x, mhb2)

LoadError: Output should be scalar; gradients are not defined for output missing
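
The two errors above are presumably a consequence of how `missing` propagates: once any column of the product is `missing`, the `sum` is `missing` too, so there is no scalar to differentiate. A small sketch in plain Julia, where `select_column` is a hypothetical stand-in for multiplying by a maybe-hot vector:

```julia
x = [1.0 2.0; 3.0 4.0]

# Hypothetical stand-in (not Mill's code): a known index selects a column,
# a missing index yields an all-missing column.
select_column(x, i) = ismissing(i) ? fill(missing, size(x, 1)) : x[:, i]

known   = select_column(x, 1)         # first column of x
unknown = select_column(x, missing)   # column of `missing`

s_known   = sum(known)     # a regular Float64
s_unknown = sum(unknown)   # `missing`, so no scalar loss is available
```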

## NGramMatrix with Missing

In [106]:
NGramIterator([3,2,1] |> collect, 4, 10) |> collect

6-element Array{Any,1}:
   3
  32
 321
 321
  21
   1

In [43]:
Y1 = NGramMatrix(["hello", "world"])

2053×2 NGramMatrix{String,Array{String,1},Int64}:
 "hello"
 "world"

In [44]:
Y1S = SparseMatrixCSC(Y1)

2053×2 SparseMatrixCSC{Float32,UInt64} with 14 stored entries:
  [37  , 1]  =  1.0
  [105 , 1]  =  1.0
  [112 , 1]  =  1.0
  [215 , 1]  =  1.0
  [1071, 1]  =  1.0
  [1113, 1]  =  1.0
  [1332, 1]  =  1.0
  [101 , 2]  =  1.0
  [120 , 2]  =  1.0
  [1060, 2]  =  1.0
  [1268, 2]  =  1.0
  [1279, 2]  =  1.0
  [1297, 2]  =  1.0
  [1834, 2]  =  1.0
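
The row indices above can be reproduced with a short sketch in plain Julia. It assumes that each n-gram is read as a base-`b` number over the in-range elements of a sliding window (out-of-range positions are simply skipped) and that the row is the code modulo `m`, plus one. The helper `ngram_codes` is hypothetical, and the values `n = 3`, `b = 256` are assumptions chosen because they reproduce the stored rows for `"hello"`; only `m = 2053` is visible in the output.

```julia
# Hypothetical helper (not Mill's code): codes of all n-grams of `seq`,
# including the partial windows at the start and end of the sequence.
function ngram_codes(seq::AbstractVector{<:Integer}, n::Integer, b::Integer)
    L = length(seq)
    map(1:(L + n - 1)) do k
        lo = max(k - n + 1, 1)        # clip the window to the sequence
        hi = min(k, L)
        foldl((acc, c) -> acc * b + c, seq[lo:hi]; init = 0)
    end
end

codes = ngram_codes([3, 2, 1], 4, 10)   # compare with the NGramIterator cell

# Assumed parameters n = 3, b = 256, m = 2053 for the string case.
rows = mod.(ngram_codes(Int.(codeunits("hello")), 3, 256), 2053) .+ 1
```

A string of length 5 yields 5 + 3 - 1 = 7 n-grams, which matches the seven stored entries in the first column.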

In [45]:
A1 = rand(10, 2053);
A1 * Y1

10×2 Array{Float64,2}:
 2.5996    2.72308
 2.09448   1.88689
 3.92957   3.48553
 4.78446   4.94053
 2.66736   3.02912
 3.39244   3.9153
 0.737915  1.86471
 1.12539   2.77622
 3.19073   4.0037
 3.44317   2.76024

In [46]:
gradient((x, y) -> x * y |> sum, A1, Y1)

([0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], nothing)

In [47]:
Y2 = NGramMatrix([missing, missing])
Y2::AbstractMatrix{Missing}

2053×2 NGramMatrix{Missing,Array{Missing,1},Missing}:
 missing
 missing

In [48]:
Y3 = NGramMatrix([[1,2,3], [4,5,6]])
Y3::AbstractMatrix{Int}

2053×2 NGramMatrix{Array{Int64,1},Array{Array{Int64,1},1},Int64}:
 [1, 2, 3]
 [4, 5, 6]

In [49]:
Y4 = NGramMatrix([missing, "a"])
Y4::AbstractMatrix{Union{Missing,Int}}

2053×2 NGramMatrix{Union{Missing, String},Array{Union{Missing, String},1},Union{Missing, Int64}}:
 missing
 "a"

In [50]:
Mill.Sequence

Union{AbstractString, Base.CodeUnits, AbstractArray{var"#s49",1} where var"#s49"<:Integer}

In [51]:
A2 = ColImputingMatrix(A1)

10×2053 ColImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.371371  0.433111  0.0900573  0.911569   …  0.683739  0.163985    0.112132
 0.299212  0.195413  0.737097   0.133165      0.305439  0.916603    0.252151
 0.561368  0.339347  0.29878    0.516022      0.241723  0.00308394  0.199264
 0.683495  0.76153   0.264764   0.49617       0.051125  0.358108    0.170918
 0.381051  0.561935  0.0876429  0.297594      0.581506  0.684473    0.283201
 0.484634  0.746067  0.745403   0.494892   …  0.343896  0.925658    0.0668124
 0.105416  0.668816  0.587261   0.60605       0.790473  0.248388    0.179196
 0.16077   0.986188  0.748311   0.702658      0.23601   0.197103    0.881278
 0.455819  0.862303  0.901519   0.0671007     0.546569  0.760512    0.625165
 0.491881  0.150416  0.300263   0.621804      0.939945  0.790308    0.305915

ψ:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [52]:
gradient((x, y) -> x * y |> sum, A2, Y1)

((W = [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ψ = nothing), nothing)

In [53]:
gradient((x, y) -> x * y |> sum, A2, Y2)

((W = nothing, ψ = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]), nothing)

In [54]:
gradient((x, y) -> x * y |> sum, A2, Y3)

((W = [0.0 1.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 1.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0], ψ = nothing), nothing)

In [55]:
gradient((x, y) -> x * y |> sum, A2, Y4)

((W = [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ψ = [1.0; 1.0; … ; 1.0; 1.0]), nothing)

## Column imputing

In [56]:
A = ColImputingMatrix(rand(3,3))
A::AbstractMatrix{Float64}

3×3 ColImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.29483   0.0104799  0.422587
 0.760895  0.598869   0.188454
 0.735848  0.849712   0.349429

ψ:
 0.0
 0.0
 0.0

In [57]:
X = rand(3)

3-element Array{Float64,1}:
 0.05507061817246606
 0.6645043380807911
 0.4211639292287035

In [58]:
A * X

3-element Array{Float64,1}:
 0.20117878573668493
 0.5192239956138598
 0.752328099117545

In [59]:
Y = maybehotbatch([1, missing, 3], 1:3)

3×3 MaybeHotMatrix{Union{Missing, Int64},Array{Union{Missing, Int64},1},Int64,Union{Missing, Bool}}:
  true  missing  false
 false  missing  false
 false  missing   true

In [60]:
A * Y

3×3 Array{Float64,2}:
 0.29483   0.0  0.422587
 0.760895  0.0  0.188454
 0.735848  0.0  0.349429

In [61]:
Z = maybehot(1, 1:3)

3-element MaybeHotVector{Int64,Int64,Bool}:
 1
 0
 0

In [62]:
A * Z

3-element Array{Float64,1}:
 0.2948302541126848
 0.7608953514114609
 0.7358481588433874

In [63]:
gradient((x, y) -> x * y |> sum, A, X)

((W = [0.05507061817246606 0.6645043380807911 0.4211639292287035; 0.05507061817246606 0.6645043380807911 0.4211639292287035; 0.05507061817246606 0.6645043380807911 0.4211639292287035], ψ = nothing), [1.7915737643675331, 1.4590615820491695, 0.9604693728280791])

In [64]:
gradient((x, y) -> x * y |> sum, A, Y)

((W = [1.0 0.0 1.0; 1.0 0.0 1.0; 1.0 0.0 1.0], ψ = [1.0; 1.0; 1.0]), nothing)

In [65]:
gradient((x, y) -> x * y |> sum, A, Z)

((W = [1.0 0.0 0.0; 1.0 0.0 0.0; 1.0 0.0 0.0], ψ = nothing), nothing)
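
Judging from the outputs above, `ColImputingMatrix` appears to emit `ψ` as the whole output column whenever the corresponding input column is `missing`, and the selected column of `W` otherwise. A sketch of that reading in plain Julia, with the hypothetical helper `col_impute_mul` taking the batch as a vector of possibly missing indices:

```julia
# Hypothetical helper (not Mill's code): each index selects a column of W;
# a missing index is replaced by the imputation vector ψ.
function col_impute_mul(W::AbstractMatrix, ψ::AbstractVector, idx)
    reduce(hcat, [ismissing(i) ? ψ : W[:, i] for i in idx])
end

W = [1.0 2.0 3.0; 4.0 5.0 6.0]
ψ = [-1.0, -2.0]

result = col_impute_mul(W, ψ, [1, missing, 3])
```

Here the middle column of `result` is exactly `ψ`, just as the middle column of `A * Y` above equals the zero-initialized `ψ`.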

## Reflect in model and integration

- better IO for all types and trees
- single_key_identity
- single_scalar_identity

In [66]:
m = RowImputingDense(5, 5)

RowImputingDense(5, 5)

In [67]:
typeof(m)

Dense{typeof(identity),RowImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}},Array{Float32,1}}

In [68]:
m.W

5×5 RowImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
 -0.372289    0.605197    0.446778     -0.0499073  -0.652797
 -0.0474058   0.291017   -0.0245484     0.452029   -0.3542
  0.291981   -0.392264    0.0248956    -0.250354   -0.467441
  0.121865    0.0668249   0.290167      0.371941    0.0992193
  0.664294   -0.0982265  -0.000151436   0.129793   -0.0363971

ψ:
 0.0  0.0  0.0  0.0  0.0

In [69]:
m.b

5-element Array{Float32,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [70]:
m.σ

identity (generic function with 1 method)

In [71]:
m = ColImputingDense(5, 5)

ColImputingDense(5, 5)

In [72]:
typeof(m)

Dense{typeof(identity),ColImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}},Array{Float32,1}}

In [73]:
m.W

5×5 ColImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
 -0.757606  -0.604557   0.437204  -0.725591   -0.0154609
 -0.214702  -0.1378     0.142801  -0.0645493  -0.445341
 -0.7349     0.352808  -0.426866  -0.424481    0.29268
  0.131789  -0.387726   0.27454    0.320751   -0.295696
  0.13635    0.565318   0.42899   -0.172284   -0.641756

ψ:
 0.0
 0.0
 0.0
 0.0
 0.0

In [74]:
m.b

5-element Array{Float32,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [75]:
m.σ

identity (generic function with 1 method)

In [76]:
x1 = reshape([i%3 == 0 ? missing : i for i in 1:10], 1, 10) |> collect
aa = BagNode(ArrayNode(x1), bags([1:2, 3:7, 0:-1, 8:10]))
a = ProductNode((; aa))

ba = ArrayNode(NGramMatrix(["a", missing, missing, "b"]))
bb = ArrayNode(NGramMatrix([[1,2], [3,4], [5], [6, 7, 8]]))
b = ProductNode((; ba, bb))

ca = ArrayNode(maybehotbatch([1,missing,9,missing], 1:10))
cb = ArrayNode(maybehotbatch([1,2,3,4], 1:10))
c = ProductNode((; ca, cb))

ds = ProductNode((; a, b, c))
printtree(ds)

ProductNode with 4 obs
  ├── a: ProductNode with 4 obs
  │        └── aa: BagNode with 4 obs
  │                  └── ArrayNode(1x10 Array, Union{Missing, Int64}) with 10 obs
  ├── b: ProductNode with 4 obs
  │        ├── ba: ArrayNode(2053x4 NGramMatrix, Union{Missing, Int64}) with 4 obs
  │        └── bb: ArrayNode(2053x4 NGramMatrix, Int64) with 4 obs
  └── c: ProductNode with 4 obs
           ├── ca: ArrayNode(10x4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs
           └── cb: ArrayNode(10x4 MaybeHotMatrix, Bool) with 4 obs

In [77]:
m = reflectinmodel(ds)
printtree(m; trav=true)

ProductModel ↦ ArrayModel(Dense(21, 10)) [""]
  ├── a: ProductModel ↦ ArrayModel(identity) ["E"]
  │        └── aa: BagModel ↦ ⟨SegmentedMean(1)⟩ ↦ ArrayModel(identity) ["M"]
  │                  └── ArrayModel(RowImputingDense(1, 1)) ["Q"]
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10)) ["U"]
  │        ├── ba: ArrayModel(ColImputingDense(2053, 10)) ["Y"]
  │        └── bb: ArrayModel(Dense(2053, 10)) ["c"]
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10)) ["k"]
           ├── ca: ArrayModel(ColImputingDense(10, 10)) ["o"]
           └── cb: ArrayModel(Dense(10, 10)) ["s"]

In [78]:
m["E"].m

ArrayModel(identity)

In [79]:
m["Q"].m.W

1×1 RowImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
 1.0

ψ:
 0.0

In [80]:
m = reflectinmodel(ds; single_key_identity=false, single_scalar_identity=false)
printtree(m)

ProductModel ↦ ArrayModel(Dense(30, 10))
  ├── a: ProductModel ↦ ArrayModel(Dense(10, 10))
  │        └── aa: BagModel ↦ ⟨SegmentedMean(10)⟩ ↦ ArrayModel(Dense(10, 10))
  │                  └── ArrayModel(RowImputingDense(1, 10))
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10))
  │        ├── ba: ArrayModel(ColImputingDense(2053, 10))
  │        └── bb: ArrayModel(Dense(2053, 10))
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10))
           ├── ca: ArrayModel(ColImputingDense(10, 10))
           └── cb: ArrayModel(Dense(10, 10))

In [81]:
m(ds)

ArrayNode{Array{Float32,2},Nothing}:
 -0.67199457f0   -1.1701161f0   -0.044900842f0  -1.9943943f0
 -0.45965022f0   -0.33575702f0  -0.20677046f0   -0.696443f0
  0.22884655f0    0.25142282f0  -0.030810641f0   0.3107742f0
  0.050240092f0   0.3117562f0    0.07796707f0    0.51512957f0
 -0.2282879f0     0.1307442f0    0.051295295f0  -0.06423733f0
  0.2772269f0    -0.06357747f0   0.015852883f0   0.24160242f0
  0.16687497f0    0.16743556f0  -0.045151904f0   0.20538574f0
  0.31265113f0    0.48242566f0  -0.08425555f0    0.705531f0
 -0.70362395f0   -0.6313346f0   -0.006941357f0  -0.95065695f0
  0.18386018f0    0.06354078f0  -0.02975182f0    0.3103891f0

In [82]:
g = gradient(m -> sum(m(ds).data), m)

((ms = (a = (ms = (aa = (im = (m = (W = (W = Float32[0.57542586; 10.136815; … ; -3.6215904; -6.633128], ψ = Float32[-0.11896703]), b = Float32[0.16133434, 2.8420978, -1.9447837, -1.6839157, -1.1364326, 1.5087241, 0.46518844, 1.9201753, -1.0153992, -1.8597556], σ = nothing),), a = (fs = ((ψ = Float32[0.05377812, 0.9473659, -0.64826113, -0.5613053, -0.37881088, 0.502908, 0.15506282, 0.6400584, -0.3384664, -0.6199185],),),), bm = (m = (W = Float32[-1.0884475 -3.1780763 … -2.7260938 -0.58055586; 2.668156 7.7905483 … 6.6825852 1.4231403; … ; 0.7188727 2.0989828 … 1.8004676 0.38343215; -0.49809363 -1.4543463 … -1.2475107 -0.26567304], b = Float32[1.942512, -4.7617593, 0.7608926, -1.307604, 1.198051, -1.3519645, -3.0714536, -0.8468269, -1.2829456, 0.8889293], σ = nothing),)),), m = (m = (W = Float32[2.9960237 -0.21085687 … -2.1349876 -1.0979007; -1.9759058 0.13906208 … 1.4080443 0.72407585; … ; -7.8498507 0.5524639 … 5.593859 2.8765984; -1.6288086 0.114633776 … 1.1607006 0.5968812], b = Float

## Lens utilities
- ModelLens
- findnonempty
- findin
- replacein

In [83]:
printtree(ds; trav=true)

ProductNode with 4 obs [""]
  ├── a: ProductNode with 4 obs ["E"]
  │        └── aa: BagNode with 4 obs ["M"]
  │                  └── ArrayNode(1x10 Array, Union{Missing, Int64}) with 10 obs ["Q"]
  ├── b: ProductNode with 4 obs ["U"]
  │        ├── ba: ArrayNode(2053x4 NGramMatrix, Union{Missing, Int64}) with 4 obs ["Y"]
  │        └── bb: ArrayNode(2053x4 NGramMatrix, Int64) with 4 obs ["c"]
  └── c: ProductNode with 4 obs ["k"]
           ├── ca: ArrayNode(10x4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs ["o"]
           └── cb: ArrayNode(10x4 MaybeHotMatrix, Bool) with 4 obs ["s"]

In [84]:
printtree(m; trav=true)

ProductModel ↦ ArrayModel(Dense(30, 10)) [""]
  ├── a: ProductModel ↦ ArrayModel(Dense(10, 10)) ["E"]
  │        └── aa: BagModel ↦ ⟨SegmentedMean(10)⟩ ↦ ArrayModel(Dense(10, 10)) ["M"]
  │                  └── ArrayModel(RowImputingDense(1, 10)) ["Q"]
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10)) ["U"]
  │        ├── ba: ArrayModel(ColImputingDense(2053, 10)) ["Y"]
  │        └── bb: ArrayModel(Dense(2053, 10)) ["c"]
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10)) ["k"]
           ├── ca: ArrayModel(ColImputingDense(10, 10)) ["o"]
           └── cb: ArrayModel(Dense(10, 10)) ["s"]

In [85]:
lens = findnonempty(ds)

5-element Array{Setfield.ComposedLens{Setfield.PropertyLens{:data},_A} where _A,1}:
 (@lens _.data.a.data.aa.data.data)
 (@lens _.data.b.data.ba.data)
 (@lens _.data.b.data.bb.data)
 (@lens _.data.c.data.ca.data)
 (@lens _.data.c.data.cb.data)

In [86]:
[ModelLens(m, l) for l in lens]

5-element Array{Setfield.ComposedLens{Setfield.PropertyLens{:ms},LI} where LI,1}:
 (@lens _.ms.a.ms.aa.im.m)
 (@lens _.ms.b.ms.ba.m)
 (@lens _.ms.b.ms.bb.m)
 (@lens _.ms.c.ms.ca.m)
 (@lens _.ms.c.ms.cb.m)

In [87]:
n = ArrayNode(rand(1, 10))
ds2 = replacein(ds, ds["Q"], n)
printtree(ds2)

ProductNode with 4 obs
  ├── a: ProductNode with 4 obs
  │        └── aa: BagNode with 4 obs
  │                  └── ArrayNode(1x10 Array, Float64) with 10 obs
  ├── b: ProductNode with 4 obs
  │        ├── ba: ArrayNode(2053x4 NGramMatrix, Union{Missing, Int64}) with 4 obs
  │        └── bb: ArrayNode(2053x4 NGramMatrix, Int64) with 4 obs
  └── c: ProductNode with 4 obs
           ├── ca: ArrayNode(10x4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs
           └── cb: ArrayNode(10x4 MaybeHotMatrix, Bool) with 4 obs

In [88]:
findin(ds, n)

In [89]:
findin(ds2, n)

(@lens _.data.a.data.aa.data)

## Error checks

In [90]:
vcat(RowImputingMatrix(rand(2,2)),
     RowImputingMatrix(rand(2,2))
)

LoadError: ArgumentError: It doesn't make sense to vcat RowImputingMatrices

In [91]:
hcat(ColImputingMatrix(rand(2,2)),
     ColImputingMatrix(rand(2,2))
)

LoadError: ArgumentError: It doesn't make sense to hcat ColImputingMatrices

In [92]:
RowImputingMatrix(rand(2,2)) * rand(3,3)

LoadError: DimensionMismatch("Number of columns of A (2) must correspond with number of rows of B (3)")

In [93]:
ColImputingMatrix(rand(2,2)) * maybehot(1, 1:4)

LoadError: DimensionMismatch("Number of columns of A (2) must correspond with length of b (4)")

In [94]:
maybehot(1, 1:4)[5]

LoadError: BoundsError: attempt to access 4-element MaybeHotVector{Int64,Int64,Bool} at index [5]

In [95]:
NGramMatrix(["a", "b"])[:, 3]

LoadError: BoundsError: attempt to access 2053×2 NGramMatrix{String,Array{String,1},Int64} at index [:, 3]

## Other changes

- renamed default params everywhere to `ψ` for consistency
- `terseprint` is gone and will be available from a standalone package
- `!` versions of functions for global flags
- `ChainRulesCore.rrule` instead of `Zygote.@adjoint` where possible
- `Nothing{T}` and `Maybe{T}` union types
- `ImputingMatrix`, `Sequence`
- `IdentityModel` changed to `ArrayModel{::typeof(identity)}`
- `3x` more tests than before
- more efficient aggregation operators
- at least `julia-1.5` required from now on
- `nobs` from `LearnBase` gone and replaced by `StatsBase` version
- `MacroTools` as a dependency used from `Flux`
- reworked and simplified gradient checking tests

# Still TODO:
- better alphabet (reduced, wildcards, start/end of word characters `` \` ``, `\'`)
- profiling and benchmarking performance
- documentation
- merge to master and release