# Mill v2.0 features

In [1]:
using Mill, Flux, FileIO, JLD2, SparseArrays, BenchmarkTools, Setfield

┌ Info: Precompiling Mill [1d0525e4-8992-11e8-313c-e310e1f6ddea]
└ @ Base loading.jl:1278
┌ Info: Precompiling JLD2 [033835bb-8acc-5ee8-8aae-3f567f8a3819]
└ @ Base loading.jl:1278


## Bag count

- `AggregationFunction` was renamed to `AggregationOperator` for clarity; these types are not meant to be used directly by the user.
- Reduced the number of exported `Segmented*` methods.
- `Segmented*` calls now return an `Aggregation` even when only one operator is used.
- All `Aggregation{T}` types now append `log(length(bag) + one(T))` to the output unless the global flag is switched off with `Mill.bagcount!(false)`.
- Slightly stricter type checking.
- `Aggregation` is now flattened upon construction.
- Smart `vcat` is implemented.

In [2]:
a = SegmentedMeanMax(3)

Aggregation{Float32,2}:
 SegmentedMean(ψ = Float32[0.0, 0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])

In [3]:
SegmentedMean(3) |> typeof

Aggregation{Float32,1}

In [4]:
SegmentedMean(zeros(3)) |> typeof

SegmentedMean{Float64,Array{Float64,1}}

In [5]:
x = reshape(1:9, 3, 3) |> f32

3×3 Array{Float32,2}:
 1.0  4.0  7.0
 2.0  5.0  8.0
 3.0  6.0  9.0

In [6]:
a(x, Mill.bags([1:2, 3:3]))

7×2 Array{Float32,2}:
 2.5      7.0
 3.5      8.0
 4.5      9.0
 4.0      7.0
 5.0      8.0
 6.0      9.0
 1.09861  0.693147

In [7]:
a(x[:, 1:2], Mill.bags([1:2, 0:-1]))

7×2 Array{Float32,2}:
 2.5      0.0
 3.5      0.0
 4.5      0.0
 4.0      0.0
 5.0      0.0
 6.0      0.0
 1.09861  0.0
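
The two outputs above can be reproduced by hand, which makes the layout of the result easier to read: mean rows first, max rows next, and the bag count term `log(length(bag) + 1)` in the last row. The following is a minimal sketch in plain Julia, not Mill's implementation. The function name `mean_max_count` is hypothetical, and for simplicity it shares a single zero `ψ` between both operators, whereas Mill keeps a separate `ψ` per operator.

```julia
# Hypothetical re-implementation in plain Julia; `mean_max_count` is not part of Mill.
# Each bag is a range of column indices into `x`; ψ is the value used for empty bags.
function mean_max_count(x::Matrix{Float32}, bags; ψ = zeros(Float32, size(x, 1)))
    cols = map(bags) do b
        inst = x[:, b]                                   # instances of this bag
        m  = isempty(b) ? ψ : vec(sum(inst; dims = 2)) ./ length(b)
        mx = isempty(b) ? ψ : vec(maximum(inst; dims = 2))
        vcat(m, mx, log(length(b) + one(Float32)))       # bag count term goes last
    end
    reduce(hcat, cols)
end

x = Float32.(reshape(1:9, 3, 3))
y = mean_max_count(x, [1:2, 3:3])               # 7×2, last row is log(3), log(2)
e = mean_max_count(x[:, 1:2], [1:2, 0:-1])      # second bag is empty
```

For an empty bag the count term is `log(0 + 1) = 0`, which is why the second column of the empty-bag output is all zeros when `ψ` is zero.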

In [8]:
Mill.bagcount()

true

In [9]:
Mill.bagcount!(false)
Mill.bagcount()

false

In [10]:
a(x, Mill.bags([1:2, 3:3]))

6×2 Array{Float32,2}:
 2.5  7.0
 3.5  8.0
 4.5  9.0
 4.0  7.0
 5.0  8.0
 6.0  9.0

In [11]:
a = Aggregation(SegmentedPNormLSE(3), Aggregation(SegmentedMean(3)), SegmentedMax(3))

Aggregation{Float32,4}:
 SegmentedPNorm(ψ = Float32[0.257819, 0.870431, 0.31768], ρ = Float32[0.313808, -0.555411, -0.26048], c = Float32[0.0, 0.0, 0.0])
 SegmentedLSE(ψ = Float32[0.121926, 0.597329, -0.961717], ρ = Float32[0.0, 0.0, 0.0])
 SegmentedMean(ψ = Float32[0.0, 0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])

In [12]:
vcat(SegmentedMean(2), SegmentedMeanMax(2))

Aggregation{Float32,3}:
 SegmentedMean(ψ = Float32[0.0, 0.0])
 SegmentedMean(ψ = Float32[0.0, 0.0])
 SegmentedMax(ψ = Float32[0.0, 0.0])

## Pre (row) imputing

In [13]:
A = PreImputingMatrix(rand(3,3))
A::AbstractMatrix{Float64}

3×3 PreImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.837137   0.122103  0.90182
 0.0409905  0.469328  0.123207
 0.0387109  0.178327  0.833356

ψ:
 0.0  0.0  0.0

In [14]:
hcat(A, A)

3×6 PreImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.837137   0.122103  0.90182   0.837137   0.122103  0.90182
 0.0409905  0.469328  0.123207  0.0409905  0.469328  0.123207
 0.0387109  0.178327  0.833356  0.0387109  0.178327  0.833356

ψ:
 0.0  0.0  0.0  0.0  0.0  0.0

In [15]:
X = rand(3, 2)

3×2 Array{Float64,2}:
 0.936239  0.545714
 0.314886  0.124654
 0.492016  0.542917

In [16]:
A * X

3×2 Array{Float64,2}:
 1.26592   0.961671
 0.246782  0.147764
 0.50242   0.495797

In [17]:
Y = [1.0 missing; missing 2.0; 3.0 4.0]

3×2 Array{Union{Missing, Float64},2}:
 1.0        missing
  missing  2.0
 3.0       4.0

In [18]:
A * Y

3×2 Array{Float64,2}:
 3.5426    3.85148
 0.410611  1.43148
 2.53878   3.69008

In [19]:
Z = [missing, missing, missing]

3-element Array{Missing,1}:
 missing
 missing
 missing

In [20]:
A * Z

3-element Array{Float64,1}:
 0.0
 0.0
 0.0

In [21]:
gradient((x, y) -> x * y |> sum, A, X)

((W = [1.4819533162052188 0.43953996253651906 1.0349329737845445; 1.4819533162052188 0.43953996253651906 1.0349329737845445; 1.4819533162052188 0.43953996253651906 1.0349329737845445], ψ = nothing), [0.9168388994682652 0.9168388994682652; 0.76975819065658 0.76975819065658; 1.8583828666042692 1.8583828666042692])

In [22]:
gradient((x, y) -> x * y |> sum, A, Y)

((W = [1.0 2.0 7.0; 1.0 2.0 7.0; 1.0 2.0 7.0], ψ = [0.9168388994682652; 0.76975819065658; 0.0]), [0.9168388994682652 0.0; 0.0 0.76975819065658; 1.8583828666042692 1.8583828666042692])

In [23]:
gradient((x, y) -> x * y |> sum, A, Z)

((W = [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0], ψ = [0.9168388994682652, 0.76975819065658, 1.8583828666042692]), nothing)
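
The results above are consistent with the following reading of "pre (row) imputing": a missing entry in row `i` of the right-hand operand is replaced by `ψ[i]` before the ordinary product is taken. The sketch below illustrates that reading in plain Julia. The helper `preimpute_mul` and the variable names `Wd`, `ψd` and `Xd` are hypothetical and not part of Mill.

```julia
# Hypothetical helper, not Mill's implementation: a missing entry in row i of X
# is replaced by ψ[i], then the usual matrix product is computed.
function preimpute_mul(W::AbstractMatrix, ψ::AbstractVector, X::AbstractMatrix)
    filled = [ismissing(X[i, j]) ? ψ[i] : X[i, j] for i in axes(X, 1), j in axes(X, 2)]
    W * filled
end

Wd = [1.0 2.0; 3.0 4.0]
ψd = [10.0, 20.0]
Xd = [1.0 missing; missing 2.0]
R = preimpute_mul(Wd, ψd, Xd)   # same as Wd * [1.0 10.0; 20.0 2.0]
```

Under this reading, the gradient of `sum(W * X)` with respect to `ψ[i]` is the sum of column `i` of `W` times the number of missing entries in row `i` of `X`, which matches the `ψ` gradients printed above.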

## Maybe hot

In [24]:
oh1 = Flux.onehot(1, 1:3)

3-element Flux.OneHotVector:
 1
 0
 0

In [25]:
mh1 = maybehot(1, 1:3)
mh1::AbstractVector{Bool}

3-element MaybeHotVector{Int64,Int64,Bool}:
 1
 0
 0

In [26]:
Flux.onehot(mh1)

3-element Flux.OneHotVector:
 1
 0
 0

In [27]:
mh2 = Mill.maybehot(missing, 1:3)
mh2::AbstractVector{Missing}

3-element MaybeHotVector{Missing,Int64,Missing}:
 missing
 missing
 missing

In [28]:
ohb1 = Flux.onehotbatch([1, 3], 1:3)

3×2 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 1  0
 0  0
 0  1

In [29]:
mhb1 = Mill.maybehotbatch([1, 3], 1:3)
mhb1::AbstractMatrix{Bool}

3×2 MaybeHotMatrix{Int64,Array{Int64,1},Int64,Bool}:
 1  0
 0  0
 0  1

In [30]:
Flux.onehotbatch(mhb1)

3×2 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 1  0
 0  0
 0  1

In [31]:
mhb2 = Mill.maybehotbatch([1, missing, 3], 1:3)
mhb2::AbstractMatrix{Union{Bool, Missing}}

3×3 MaybeHotMatrix{Union{Missing, Int64},Array{Union{Missing, Int64},1},Int64,Union{Missing, Bool}}:
  true  missing  false
 false  missing  false
 false  missing   true

In [32]:
x = rand(3,3)

3×3 Array{Float64,2}:
 0.335087  0.36826   0.367776
 0.578647  0.585546  0.94464
 0.956251  0.873661  0.0490286

In [33]:
x * oh1

3-element Array{Float64,1}:
 0.33508705348542134
 0.5786474548222256
 0.9562510531949802

In [34]:
x * mh1

3-element Array{Float64,1}:
 0.33508705348542134
 0.5786474548222256
 0.9562510531949802

In [35]:
x * mh2

3-element Array{Missing,1}:
 missing
 missing
 missing

In [36]:
x * ohb1

3×2 Array{Float64,2}:
 0.335087  0.367776
 0.578647  0.94464
 0.956251  0.0490286

In [37]:
x * mhb1

3×2 Array{Float64,2}:
 0.335087  0.367776
 0.578647  0.94464
 0.956251  0.0490286

In [38]:
x * mhb2

3×3 Array{Union{Missing, Float64},2}:
 0.335087  missing  0.367776
 0.578647  missing  0.94464
 0.956251  missing  0.0490286

In [39]:
gradient((x, y) -> x * y |> sum, x, mh1)

([1.0 0.0 0.0; 1.0 0.0 0.0; 1.0 0.0 0.0], nothing)

In [40]:
gradient((x, y) -> x * y |> sum, x, mh2)

LoadError: Output should be scalar; gradients are not defined for output missing

In [41]:
gradient((x, y) -> x * y |> sum, x, mhb1)

([1.0 0.0 1.0; 1.0 0.0 1.0; 1.0 0.0 1.0], nothing)

In [42]:
gradient((x, y) -> x * y |> sum, x, mhb2)

LoadError: Output should be scalar; gradients are not defined for output missing
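
The two `LoadError`s above are what one would expect from how `missing` propagates in Base Julia: a product containing `missing` columns yields `missing` entries, and `sum` over them is `missing` rather than a number, so there is no scalar to differentiate. A small illustration using only Base (the vector `v` is made up for the example):

```julia
v = [1.0, missing, 3.0]

s_all  = sum(v)                # missing propagates through the reduction
s_skip = sum(skipmissing(v))   # only the observed entries are summed
```

Skipping missing values changes the objective, so whether this is appropriate depends on the model. The imputing matrices shown in this notebook are the alternative that keeps the output free of `missing`.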

## NGramMatrix with Missing

In [43]:
NGramIterator([3,2,1] |> collect, 4, 10) |> collect

6-element Array{Int64,1}:
 2223
 2232
 2321
 3213
 2133
 1333

In [44]:
Mill.string_start_code!(0)

0

In [45]:
Mill.string_start_code!(0)

0

In [46]:
NGramIterator([3,2,1] |> collect, 4, 10) |> collect

6-element Array{Int64,1}:
    3
   32
  321
 3213
 2133
 1333
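
Both outputs can be reproduced with a short plain-Julia sketch, assuming the sequence is padded with `n - 1` start codes and `n - 1` end codes and each window of `n` symbols is read as an `n`-digit number in base `b`. The function `ngram_codes` is hypothetical, and the default codes 2 and 3 are inferred from the first output above rather than taken from Mill's source.

```julia
# Hypothetical re-implementation, not Mill's NGramIterator.
function ngram_codes(s::Vector{Int}, n::Int, b::Int; start_code::Int = 2, end_code::Int = 3)
    padded = vcat(fill(start_code, n - 1), s, fill(end_code, n - 1))
    # read every window of n symbols as an n-digit number in base b
    [foldl((acc, d) -> acc * b + d, padded[i:i+n-1]; init = 0)
     for i in 1:length(padded)-n+1]
end

c_default = ngram_codes([3, 2, 1], 4, 10)
c_zero    = ngram_codes([3, 2, 1], 4, 10; start_code = 0)
```

A sequence of length `l` yields `l + n - 1` n-grams, six in this case.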

In [47]:
Y1 = NGramMatrix(["hello", "world"])

2053×2 NGramMatrix{String,Array{String,1},Int64}:
 "hello"
 "world"

In [48]:
Y1S = SparseMatrixCSC(Y1)

2053×2 SparseMatrixCSC{Float32,UInt64} with 14 stored entries:
  [37  , 1]  =  1.0
  [105 , 1]  =  1.0
  [215 , 1]  =  1.0
  [875 , 1]  =  1.0
  [1113, 1]  =  1.0
  [1332, 1]  =  1.0
  [1489, 1]  =  1.0
  [112 , 2]  =  1.0
  [120 , 2]  =  1.0
  [1196, 2]  =  1.0
  [1268, 2]  =  1.0
  [1279, 2]  =  1.0
  [1297, 2]  =  1.0
  [1834, 2]  =  1.0

In [49]:
A1 = rand(10, 2053);
A1 * Y1

10×2 Array{Float64,2}:
 3.90584  3.77515
 2.89506  3.28477
 2.509    3.89028
 4.10531  3.3759
 2.63214  4.1285
 3.08886  3.52846
 4.61518  3.36014
 4.5827   3.36295
 3.6208   4.8233
 5.10926  2.94604

In [50]:
gradient((x, y) -> x * y |> sum, A1, Y1)

([0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], nothing)

In [51]:
Y2 = NGramMatrix([missing, missing])
Y2::AbstractMatrix{Missing}

2053×2 NGramMatrix{Missing,Array{Missing,1},Missing}:
 missing
 missing

In [52]:
Y3 = NGramMatrix([[1,2,3], [4,5,6]])
Y3::AbstractMatrix{Int}

2053×2 NGramMatrix{Array{Int64,1},Array{Array{Int64,1},1},Int64}:
 [1, 2, 3]
 [4, 5, 6]

In [53]:
Y4 = NGramMatrix([missing, "a"])
Y4::AbstractMatrix{Union{Missing,Int}}

2053×2 NGramMatrix{Union{Missing, String},Array{Union{Missing, String},1},Union{Missing, Int64}}:
 missing
 "a"

In [54]:
Mill.Sequence

Union{AbstractString, Base.CodeUnits, AbstractArray{var"#s49",1} where var"#s49"<:Integer}

In [55]:
A2 = PostImputingMatrix(A1)

10×2053 PostImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.485399  0.112598  0.342821   0.90522   …  0.294549   0.713174   0.0616785
 0.59183   0.43315   0.23828    0.074511     0.940449   0.0686212  0.71156
 0.487902  0.306681  0.853688   0.636416     0.825156   0.0472395  0.718607
 0.881556  0.458054  0.363097   0.132742     0.304384   0.0965665  0.469558
 0.555162  0.657948  0.899837   0.234271     0.90012    0.846639   0.673113
 0.792749  0.653375  0.781468   0.8923    …  0.841502   0.651107   0.096538
 0.500394  0.885379  0.0125803  0.158539     0.329611   0.89235    0.833952
 0.737689  0.834877  0.270169   0.785252     0.165079   0.348791   0.529045
 0.768394  0.638561  0.382991   0.477191     0.0648912  0.310851   0.402197
 0.743375  0.788291  0.188621   0.754287     0.555854   0.155029   0.416455

ψ:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [56]:
gradient((x, y) -> x * y |> sum, A2, Y1)

((W = [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ψ = nothing), nothing)

In [57]:
gradient((x, y) -> x * y |> sum, A2, Y2)

((W = nothing, ψ = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]), nothing)

In [58]:
gradient((x, y) -> x * y |> sum, A2, Y3)

((W = [0.0 1.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 1.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0], ψ = nothing), nothing)

In [59]:
gradient((x, y) -> x * y |> sum, A2, Y4)

((W = [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ψ = [1.0; 1.0; … ; 1.0; 1.0]), nothing)

## Post (column) imputing

In [60]:
A = PostImputingMatrix(rand(3,3))
A::AbstractMatrix{Float64}

3×3 PostImputingMatrix{Float64,Array{Float64,1},Array{Float64,2}}:
W:
 0.983661  0.861208   0.389237
 0.142421  0.500954   0.0707543
 0.617372  0.0474067  0.573172

ψ:
 0.0
 0.0
 0.0

In [61]:
X = rand(3)

3-element Array{Float64,1}:
 0.12311507296192148
 0.4846945229271662
 0.7055062638526115

In [62]:
A * X

3-element Array{Float64,1}:
 0.8131352416991011
 0.3102616531576294
 0.5033617006419145

In [63]:
Y = maybehotbatch([1, missing, 3], 1:3)

3×3 MaybeHotMatrix{Union{Missing, Int64},Array{Union{Missing, Int64},1},Int64,Union{Missing, Bool}}:
  true  missing  false
 false  missing  false
 false  missing   true

In [64]:
A * Y

3×3 Array{Float64,2}:
 0.983661  0.0  0.389237
 0.142421  0.0  0.0707543
 0.617372  0.0  0.573172

In [65]:
Z = maybehot(1, 1:3)

3-element MaybeHotVector{Int64,Int64,Bool}:
 1
 0
 0

In [66]:
A * Z

3-element Array{Float64,1}:
 0.9836610471467733
 0.1424212473370705
 0.6173722875436805

In [67]:
gradient((x, y) -> x * y |> sum, A, X)

((W = [0.12311507296192148 0.4846945229271662 0.7055062638526115; 0.12311507296192148 0.4846945229271662 0.7055062638526115; 0.12311507296192148 0.4846945229271662 0.7055062638526115], ψ = nothing), [1.7434545820275242, 1.4095691663291088, 1.0331624822711363])

In [68]:
gradient((x, y) -> x * y |> sum, A, Y)

((W = [1.0 0.0 1.0; 1.0 0.0 1.0; 1.0 0.0 1.0], ψ = [1.0; 1.0; 1.0]), nothing)

In [69]:
gradient((x, y) -> x * y |> sum, A, Z)

((W = [1.0 0.0 0.0; 1.0 0.0 0.0; 1.0 0.0 0.0], ψ = nothing), nothing)
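
The results above are consistent with reading "post (column) imputing" as follows: multiplying by a one-hot column selects a column of `W`, and when the observation is missing the whole output column is replaced by `ψ`. A minimal sketch of that reading in plain Julia; the helper `postimpute_mul` and the names `Wp` and `ψp` are hypothetical and not part of Mill.

```julia
# Hypothetical helper, not Mill's implementation: column j of the result is
# W[:, idx[j]] when idx[j] is observed and ψ when it is missing.
function postimpute_mul(W::AbstractMatrix, ψ::AbstractVector, idx::AbstractVector)
    reduce(hcat, [ismissing(i) ? ψ : W[:, i] for i in idx])
end

Wp = [1.0 2.0 3.0; 4.0 5.0 6.0]
ψp = [-1.0, -2.0]
P = postimpute_mul(Wp, ψp, [1, missing, 3])   # middle column is ψp
```

Here `ψ` has one entry per row of `W`, whereas in the pre-imputing case it has one entry per column.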

## Reflect in model and integration

- better IO for all types and trees
- `single_key_identity` keyword argument of `reflectinmodel`
- `single_scalar_identity` keyword argument of `reflectinmodel`

In [70]:
m = PreImputingDense(5, 5)

PreImputingDense(5, 5)

In [71]:
typeof(m)

Dense{typeof(identity),PreImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}},Array{Float32,1}}

In [72]:
m.W

5×5 PreImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
  0.770642  -0.105618   -0.639469  -0.0613623   0.312846
 -0.608667   0.194929   -0.720417  -0.178393    0.474514
  0.400999  -0.530013    0.153515   0.342197    0.275042
 -0.738011   0.0715067  -0.449064   0.498823   -0.474731
  0.589017   0.279058    0.063485   0.162089   -0.264956

ψ:
 0.0  0.0  0.0  0.0  0.0

In [73]:
m.b

5-element Array{Float32,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [74]:
m.σ

identity (generic function with 1 method)

In [75]:
m = PostImputingDense(5, 5)

PostImputingDense(5, 5)

In [76]:
typeof(m)

Dense{typeof(identity),PostImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}},Array{Float32,1}}

In [77]:
m.W

5×5 PostImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
 -0.0619664  -0.712689  -0.361695  -0.728934  -0.684152
  0.517435   -0.263719  -0.200393   0.527432   0.733296
  0.209236   -0.224855  -0.227583  -0.493923  -0.42822
  0.250025    0.389438   0.643126  -0.512282  -0.562059
  0.0553283  -0.106888   0.609508  -0.166765  -0.560717

ψ:
 0.0
 0.0
 0.0
 0.0
 0.0

In [78]:
m.b

5-element Array{Float32,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [79]:
m.σ

identity (generic function with 1 method)

In [80]:
x1 = reshape([i%3 == 0 ? missing : i for i in 1:10], 1, 10) |> collect
aa = BagNode(ArrayNode(x1), bags([1:2, 3:7, 0:-1, 8:10]))
a = ProductNode((; aa))

ba = ArrayNode(NGramMatrix(["a", missing, missing, "b"]))
bb = ArrayNode(NGramMatrix([[1,2], [3,4], [5], [6, 7, 8]]))
b = ProductNode((; ba, bb))

ca = ArrayNode(maybehotbatch([1,missing,9,missing], 1:10))
cb = ArrayNode(maybehotbatch([1,2,3,4], 1:10))
c = ProductNode((; ca, cb))

ds = ProductNode((; a, b, c))
printtree(ds)

ProductNode with 4 obs
  ├── a: ProductNode with 4 obs
  │        └── aa: BagNode with 4 obs
  │                  └── ArrayNode(1×10 Array, Union{Missing, Int64}) with 10 obs
  ├── b: ProductNode with 4 obs
  │        ├── ba: ArrayNode(2053×4 NGramMatrix, Union{Missing, Int64}) with 4 obs
  │        └── bb: ArrayNode(2053×4 NGramMatrix, Int64) with 4 obs
  └── c: ProductNode with 4 obs
           ├── ca: ArrayNode(10×4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs
           └── cb: ArrayNode(10×4 MaybeHotMatrix, Bool) with 4 obs

In [81]:
m = reflectinmodel(ds)
printtree(m; trav=true)

ProductModel ↦ ArrayModel(Dense(21, 10)) [""]
  ├── a: ProductModel ↦ ArrayModel(identity) ["E"]
  │        └── aa: BagModel ↦ ⟨SegmentedMean(1)⟩ ↦ ArrayModel(identity) ["M"]
  │                  └── ArrayModel(PreImputingDense(1, 1)) ["Q"]
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10)) ["U"]
  │        ├── ba: ArrayModel(PostImputingDense(2053, 10)) ["Y"]
  │        └── bb: ArrayModel(Dense(2053, 10)) ["c"]
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10)) ["k"]
           ├── ca: ArrayModel(PostImputingDense(10, 10)) ["o"]
           └── cb: ArrayModel(Dense(10, 10)) ["s"]

In [82]:
m["E"].m

ArrayModel(identity)

In [83]:
m["Q"].m.W

1×1 PreImputingMatrix{Float32,Array{Float32,1},Array{Float32,2}}:
W:
 1.0

ψ:
 0.0

In [84]:
m = reflectinmodel(ds; single_key_identity=false, single_scalar_identity=false)
printtree(m)

ProductModel ↦ ArrayModel(Dense(30, 10))
  ├── a: ProductModel ↦ ArrayModel(Dense(10, 10))
  │        └── aa: BagModel ↦ ⟨SegmentedMean(10)⟩ ↦ ArrayModel(Dense(10, 10))
  │                  └── ArrayModel(PreImputingDense(1, 10))
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10))
  │        ├── ba: ArrayModel(PostImputingDense(2053, 10))
  │        └── bb: ArrayModel(Dense(2053, 10))
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10))
           ├── ca: ArrayModel(PostImputingDense(10, 10))
           └── cb: ArrayModel(Dense(10, 10))

In [85]:
m(ds)

ArrayNode{Array{Float32,2},Nothing}:
 -0.5824197f0    -1.2803533f0    0.27412367f0  -1.3914753f0
 -0.017148316f0  -0.28040183f0  -0.5657954f0   -0.6519864f0
 -0.13130462f0    0.1808702f0   -0.47555473f0   0.2845776f0
  0.016083531f0  -0.48261806f0  -0.37675557f0  -0.89310277f0
 -0.20278543f0   -0.99934375f0  -0.00856679f0  -1.3289661f0
 -0.610347f0     -0.6957619f0    0.36581814f0  -1.3133608f0
  0.46559334f0    0.09769949f0   0.07922515f0   0.069334954f0
  0.4810563f0     0.32970187f0  -0.16699517f0   0.36774486f0
 -0.6218535f0    -0.34487528f0   0.20834185f0  -0.07497307f0
  0.2815043f0    -0.21695824f0  -0.17535739f0  -0.5719641f0

In [86]:
g = gradient(m -> sum(m(ds).data), m)

((ms = (a = (ms = (aa = (im = (m = (W = (W = Float32[0.81533116; 1.1350142; … ; 3.475454; 2.0494342], ψ = Float32[-0.71509427]), b = Float32[0.22859754, 0.31822827, 1.3198678, 3.3016734, 1.5490509, -1.2158129, -2.9288917, 4.742551, 0.9744264, 0.57460773], σ = nothing),), a = (fs = ((ψ = Float32[0.076199174, 0.10607609, 0.439956, 1.1005578, 0.5163504, -0.40527093, -0.97629726, 1.5808504, 0.3248088, 0.19153592],),),), bm = (m = (W = Float32[-0.32294756 -1.4823573 … 2.4718795 -1.4233576; 0.29382825 1.3486971 … -2.2489967 1.2950172; … ; -0.50565386 -2.320995 … 3.8703356 -2.2286165; 1.1035519 5.0653987 … -8.446719 4.8637896], b = Float32[1.333145, -1.2129389, 3.1436987, 1.9234235, 4.6478662, 5.5194383, 2.3805075, 4.8733354, 2.0873666, -4.555522], σ = nothing),)),), m = (m = (W = Float32[7.464212 3.4295392 … -8.68814 8.58575; -3.0392654 -1.3964341 … 3.5376225 -3.4959314; … ; -6.0898952 -2.7980897 … 7.0884724 -7.0049343; 4.066519 1.8684204 … -4.7333174 4.677535], b = Float32[6.671994, -2.7166

## Lens utilities

- `ModelLens`
- `findnonempty`
- `findin`
- `replacein`

In [87]:
printtree(ds; trav=true)

ProductNode with 4 obs [""]
  ├── a: ProductNode with 4 obs ["E"]
  │        └── aa: BagNode with 4 obs ["M"]
  │                  └── ArrayNode(1×10 Array, Union{Missing, Int64}) with 10 obs ["Q"]
  ├── b: ProductNode with 4 obs ["U"]
  │        ├── ba: ArrayNode(2053×4 NGramMatrix, Union{Missing, Int64}) with 4 obs ["Y"]
  │        └── bb: ArrayNode(2053×4 NGramMatrix, Int64) with 4 obs ["c"]
  └── c: ProductNode with 4 obs ["k"]
           ├── ca: ArrayNode(10×4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs ["o"]
           └── cb: ArrayNode(10×4 MaybeHotMatrix, Bool) with 4 obs ["s"]

In [88]:
printtree(m; trav=true)

ProductModel ↦ ArrayModel(Dense(30, 10)) [""]
  ├── a: ProductModel ↦ ArrayModel(Dense(10, 10)) ["E"]
  │        └── aa: BagModel ↦ ⟨SegmentedMean(10)⟩ ↦ ArrayModel(Dense(10, 10)) ["M"]
  │                  └── ArrayModel(PreImputingDense(1, 10)) ["Q"]
  ├── b: ProductModel ↦ ArrayModel(Dense(20, 10)) ["U"]
  │        ├── ba: ArrayModel(PostImputingDense(2053, 10)) ["Y"]
  │        └── bb: ArrayModel(Dense(2053, 10)) ["c"]
  └── c: ProductModel ↦ ArrayModel(Dense(20, 10)) ["k"]
           ├── ca: ArrayModel(PostImputingDense(10, 10)) ["o"]
           └── cb: ArrayModel(Dense(10, 10)) ["s"]

In [89]:
lens = findnonempty(ds)

5-element Array{Setfield.ComposedLens{Setfield.PropertyLens{:data},_A} where _A,1}:
 (@lens _.data.a.data.aa.data.data)
 (@lens _.data.b.data.ba.data)
 (@lens _.data.b.data.bb.data)
 (@lens _.data.c.data.ca.data)
 (@lens _.data.c.data.cb.data)

In [90]:
[ModelLens(m, l) for l in lens]

5-element Array{Setfield.ComposedLens{Setfield.PropertyLens{:ms},LI} where LI,1}:
 (@lens _.ms.a.ms.aa.im.m)
 (@lens _.ms.b.ms.ba.m)
 (@lens _.ms.b.ms.bb.m)
 (@lens _.ms.c.ms.ca.m)
 (@lens _.ms.c.ms.cb.m)

In [91]:
n = ArrayNode(rand(1, 10))
ds2 = replacein(ds, ds["Q"], n)
printtree(ds2)

ProductNode with 4 obs
  ├── a: ProductNode with 4 obs
  │        └── aa: BagNode with 4 obs
  │                  └── ArrayNode(1×10 Array, Float64) with 10 obs
  ├── b: ProductNode with 4 obs
  │        ├── ba: ArrayNode(2053×4 NGramMatrix, Union{Missing, Int64}) with 4 obs
  │        └── bb: ArrayNode(2053×4 NGramMatrix, Int64) with 4 obs
  └── c: ProductNode with 4 obs
           ├── ca: ArrayNode(10×4 MaybeHotMatrix, Union{Missing, Bool}) with 4 obs
           └── cb: ArrayNode(10×4 MaybeHotMatrix, Bool) with 4 obs

In [92]:
findin(ds, n)

In [93]:
findin(ds2, n)

(@lens _.data.a.data.aa.data)
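
The lenses printed above are ordinary `Setfield` lenses, so they can be read and applied with `get` and `set` from that package. The sketch below shows the mechanics on a toy nested named tuple rather than on a Mill tree. The tuple `tree` and its field names are made up for illustration, and the `Setfield` package is assumed to be installed.

```julia
using Setfield

# Toy nested named tuple standing in for a data tree; the field names are made up.
tree = (data = (a = (data = [1, 2, 3],), b = (data = "x",)),)

l = @lens _.data.a.data        # path to the leaf, same notation as the output above
leaf  = get(tree, l)           # read the value the lens points at
tree2 = set(tree, l, [0])      # returns an updated copy, `tree` itself is unchanged
```

`set` never mutates its argument, which matches the way `replacein` returned a new tree `ds2` above while `findin(ds, n)` still found nothing in the original `ds`.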

## Error checks

In [94]:
vcat(PreImputingMatrix(rand(2,2)),
     PreImputingMatrix(rand(2,2))
)

LoadError: ArgumentError: It doesn't make sense to vcat PreImputingMatrices

In [95]:
hcat(PostImputingMatrix(rand(2,2)),
     PostImputingMatrix(rand(2,2))
)

LoadError: ArgumentError: It doesn't make sense to hcat PostImputingMatrices

In [96]:
PreImputingMatrix(rand(2,2)) * rand(3,3)

LoadError: DimensionMismatch("Number of columns of A (2) must correspond with number of rows of B (3)")

In [97]:
PostImputingMatrix(rand(2,2)) * maybehot(1, 1:4)

LoadError: DimensionMismatch("Number of columns of A (2) must correspond with length of b (4)")

In [98]:
maybehot(1, 1:4)[5]

LoadError: BoundsError: attempt to access 4-element MaybeHotVector{Int64,Int64,Bool} at index [5]

In [99]:
NGramMatrix(["a", "b"])[:, 3]

LoadError: BoundsError: attempt to access 2053×2 NGramMatrix{String,Array{String,1},Int64} at index [:, 3]

## Other changes

- renamed default params everywhere to `ψ` for consistency
- `terseprint` is gone and will be available from a standalone package
- `!` versions of functions for global flags
- `ChainRulesCore.rrule` instead of `Zygote.@adjoint` where possible
- `Nothing{T}` and `Maybe{T}` union types
- `ImputingMatrix`, `Sequence`
- `IdentityModel` changed to `ArrayModel{typeof(identity)}`
- `3x` more tests than before
- more efficient aggregation operators
- Julia 1.5 or newer is required from now on
- `nobs` from `LearnBase` is gone and replaced by the `StatsBase` version
- `MacroTools` as a dependency is used from `Flux`
- `NGramIterator` now works with starting and ending characters
- `MillString` prototyped
- reworked and simplified gradient checking tests

# Still TODO:
- profiling and benchmarking performance
- documentation
- merge to master and release