[speculative] What does it look like to use StatsModels with a neural net? #116
This is something we've talked about a bit, so here are some assorted thoughts I've had. Quality varies, so take these as purely speculative at this point (not necessarily strong suggestions).
I think it could open up a lot of opportunities if formulas could chain terms together in various ways, similar to how Turing.jl allows a series of `~` equations inside a model block. So the classic chain of layers could look something like:
```julia
@formula begin
    layer1 ~ Conv(training_images, ...)
    layer2 ~ Conv(layer1, ...)
end
```
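For comparison, Turing.jl's `@model` blocks already chain a series of `~` statements in roughly this spirit. A sketch only (requires Turing.jl and Distributions.jl; the model and variable names are illustrative):

```julia
using Turing, Distributions

# A Turing.jl model is a series of `~` statements, each tying a symbol
# to a distribution — analogous here to tying a layer to a transformation.
@model function demo(x)
    μ ~ Normal(0, 1)                    # prior on the mean
    σ ~ truncated(Normal(0, 1), 0, Inf) # prior on the scale
    for i in eachindex(x)
        x[i] ~ Normal(μ, σ)             # likelihood, one term per observation
    end
end
```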
I know this is more verbose, but it also leaves room for several interesting ways of intuitively customizing other aspects of a neural network.
### Control over kernel weights
The following syntax is probably not ideal, but with formulas you could condition weights on particular prior distributions.
```julia
@formula begin
    layer1_kernel_weights ~ Kernel(Normal(), (2, 2), channel1 => channel2)
    layer1 ~ Conv(training_images, layer1_kernel_weights)
    layer2 ~ Conv(layer1, ...)
end
```
Treating the kernels as statistical weights, and a neural network layer as a traditional statistical model, also opens the door to very simple transfer learning, because you could just take another model's weights.
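As a rough sketch of that idea in plain Julia (the weight-table names are made up; no actual StatsModels or Flux API is assumed):

```julia
# Two "models" whose kernel weights are just named arrays, the way a fitted
# statistical model exposes a coefficient table.
pretrained = Dict(:layer1_kernel_weights => randn(2, 2, 3, 8))
fresh      = Dict(:layer1_kernel_weights => zeros(2, 2, 3, 8))

# Transfer learning then reduces to copying another model's weights in:
fresh[:layer1_kernel_weights] = copy(pretrained[:layer1_kernel_weights])
```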
### Simple topology manipulation
Because formulas define relationships between symbols, rather than just chaining layers together, you can do a lot of topological manipulation without writing new custom layers. A DenseNet could be something like:
```julia
@formula begin
    layer1 ~ training_variables
    layer2 ~ layer1
    layer3 ~ layer1 + layer2
end
```
This is nice because it doesn't require any novel syntax to implement, and the concatenation uses the same `+` you'd typically see in a formula.
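A minimal sketch of what `+` as concatenation would mean at the design-matrix level (plain Julia; `layer1` and `layer2` are stand-ins for each layer's output columns):

```julia
layer1 = [1.0 2.0; 3.0 4.0]    # two columns of features from layer 1
layer2 = 10 .* layer1          # stand-in for a transformation of layer1
layer3 = hcat(layer1, layer2)  # `layer1 + layer2`: columns side by side
size(layer3)                   # (2, 4)
```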
### Per-layer loss functions and gradient updates
Again this syntax is in no way polished, but being able to easily specify per layer loss functions would be pretty interesting.
```julia
@formula(ŷ ~ x | abs(ŷ - mean(y)))
```
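Whatever the final syntax, the loss after the `|` could just become a function of the layer's prediction. A sketch of the one above, averaged over observations for a scalar value (uses only Statistics from the standard library; the function name is hypothetical):

```julia
using Statistics

# Hypothetical per-layer loss lifted from the formula: abs(ŷ - mean(y)),
# averaged over observations.
layer_loss(ŷ, y) = mean(abs.(ŷ .- mean(y)))

y = [1.0, 2.0, 3.0]
ŷ = [2.0, 2.0, 2.0]
layer_loss(ŷ, y)   # 0.0, since every prediction equals mean(y) = 2.0
```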
Maybe a grad update could be done with something like
```julia
@formula(y ~ x Δ ADAM)
```
@Tokazama that is a cool idea, but I think it is outside the scope of StatsModels.jl
Those are some very cool ideas. I tend to think of a formula as specifying a single many-to-many transformation, so chaining a bunch of those together to specify a NN topology certainly makes sense to me. I agree with @oxinabox that the specifics seem out of scope for StatsModels, though; what's NOT out of scope is making sure that we don't build in overly restrictive assumptions about how the abstractions we have here will be used in other packages.
To @oxinabox's original questions: the interaction stuff can be handled at the
The "always matrix" and obs-dims stuff is more complicated, and I think is part of the general problem of allowing formula consumers control over the destination container (e.g., sparse matrix, GPU array, row/column major).
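One sketch of what "consumer control over the destination container" could mean, using SparseArrays from the standard library (the one-hot example and names are purely illustrative, not a proposed API):

```julia
using SparseArrays

# The same categorical column materialized two ways, per the consumer's choice.
levels = [1, 3, 2, 1]
dense = zeros(length(levels), 3)
for (i, l) in enumerate(levels)
    dense[i, l] = 1.0              # dense one-hot design matrix
end
sparse_mat = sparse(dense)         # identical content, sparse storage
```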