Salsa Example: Linear Regression + Incremental Maintenance #24
Merged
Adds a super simple Linear Regression example, together with a version that incrementally maintains the learned weights. The incrementally maintained algorithm keeps constant-time performance for every inserted batch (constant in the size of the batch), whereas for the naive, non-maintained version, the time to re-train grows with each batch, in proportion to the total number of observed samples. Which is super cool! 😁

-----------

I was starting to work on the outline for the Salsa presentation for JuliaCon, but since coding is always easier than writing slides, I made this fun example that I think really showcases the benefits of incremental maintenance (through recursion!) 🙂:

First, `examples/ML-LR-IVM/linear-regression.jl` contains a _very_ simple version of linear regression with gradient descent in Salsa (very simple; only `m`, no `bias`), which unrolls the gradient descent loop via an `iteration::Int` parameter in Salsa. The outer loop just keeps bumping `i` and calling the unrolled function, one call after another, until it converges; each unrolled call requests the result from the previous call, computes the new gradient, and returns the adjusted weights.

And then, I made it incremental, via the `previous_output()` function that @comnik added to Salsa! 🙂

And I only needed this one super simple hack to make it work: to incrementally maintain the derivative at each unrolled step, we basically just need to compute the derivative of the weights with respect to the loss _only for the newly added samples_, and _add it_ to the derivative we'd already computed. So this means that our derived function needs access to the previous derivative **and the diff between the samples**, so it knows which ones to compute. This is the same as in general IVM, where we need to get both the previously computed value _and_ the previous values of all our inputs. But in this case, I was able to hack this in by just changing the function that computes the derivative to _also_ return `n`, so that I can get `n` from `previous_output()`!! 😅

And with that, I have an incrementally maintained linear regression computation!

Here's the diff that made it incremental -- it's just a few lines of change! 🙂

```diff
--- examples/ML-LR-IVM/linear-regression.jl        2020-06-07 23:45:17.000000000 -0400
+++ examples/ML-LR-IVM/linear-regression-incremental.jl        2020-06-07 23:45:16.000000000 -0400
@@ -1,4 +1,4 @@
-module SalsaML
+module SalsaMLIncr
 
 using Salsa
 
@@ -47,6 +47,7 @@
     @info "timed out in learn iterations"
     return weights
 end
+
 
 @derived function lr_learned_weights_unrolled(s::Runtime, iteration::Int)
     if iteration == 0
@@ -54,18 +55,30 @@
     else
         n = num_samples(s)
         weights = lr_learned_weights_unrolled(s, iteration-1)
-        δweights_δcost = lr_δweights_δmse(s, iteration-1) .* learning_rate
+        deriv,_ = lr_δweights_δmse(s, iteration-1)
+        δweights_δcost = deriv .* learning_rate
         weights = weights .- δweights_δcost
         return weights
     end
 end
 
+# HACK: Return the actual value, AND `n`, so that we can use Salsa.previous_output() to
+# incrementally maintain the result! :D
 @derived function lr_δweights_δmse(s::Runtime, iteration::Int)
-    n = num_samples(s)
-    (-2/n) * sum(
+    prev_value = Salsa.previous_output(s)
+    prev_deriv = 0
+    start_n = 1
+    if prev_value !== nothing  # Incremental case
+        (prev_deriv, start_n) = prev_value
+    end
+
+    N = num_samples(s)
+    n = length(1:N)
+    new_deriv = (-2/n) * sum(
         sum(sample(s, i) .* (response(s, i) - lr_predicted_unrolled(s, i, iteration)))
-        for i in 1:n
+        for i in start_n:N
     )
+    return prev_deriv + new_deriv, n
 end
 
 @derived function lr_predicted_unrolled(s::Runtime, i::Int, iteration::Int)
```

And after that, as you can see, the incremental algorithm performs in constant time on every inserted batch, which is really pretty amazing! 😁

```julia
julia> s = let s = SalsaML.new_lr_runtime()
           for i in 1:10
               for _ in 1:100
                   SalsaML.insert_training_pair!(s, (i,), i + rand(-0.2:0.2))
               end
               # Gets slower as the number of total observed samples increases
               @time SalsaML.predict(s, (rand(),))
           end
           s
       end
[ Info: timed out in learn iterations
  0.080560 seconds (837.95 k allocations: 20.501 MiB)
[ Info: timed out in learn iterations
  0.173123 seconds (2.06 M allocations: 48.929 MiB)
[ Info: timed out in learn iterations
  0.553986 seconds (3.26 M allocations: 74.498 MiB, 50.89% gc time)
[ Info: timed out in learn iterations
  0.420706 seconds (4.47 M allocations: 101.702 MiB, 10.86% gc time)
[ Info: timed out in learn iterations
  0.640236 seconds (5.68 M allocations: 135.161 MiB, 20.79% gc time)
[ Info: timed out in learn iterations
  0.752063 seconds (6.90 M allocations: 156.269 MiB, 21.50% gc time)
[ Info: timed out in learn iterations
  0.892328 seconds (8.12 M allocations: 183.637 MiB, 22.46% gc time)
[ Info: timed out in learn iterations
  1.014956 seconds (9.34 M allocations: 211.028 MiB, 22.25% gc time)
[ Info: timed out in learn iterations
  1.152366 seconds (10.56 M allocations: 238.399 MiB, 22.00% gc time)
[ Info: timed out in learn iterations
  1.424530 seconds (11.78 M allocations: 265.786 MiB, 30.55% gc time)
Salsa.Runtime(Salsa.DefaultStorage(3001, ...))

julia> SalsaML.predict(s, (10,))
9.527912879192533

julia> SalsaML.predict(s, (100,))
95.27912879192533

julia> s1 = let s = SalsaMLIncr.new_lr_runtime()
           for i in 1:10
               for _ in 1:100
                   SalsaMLIncr.insert_training_pair!(s, (i,), i + rand(-1.0:1.0))
               end
               # Time remains constant: O(the batch size, or, 100)
               @time SalsaMLIncr.predict(s, (rand(),))
           end
           s
       end
[ Info: timed out in learn iterations
  0.156828 seconds (819.17 k allocations: 19.918 MiB, 51.31% gc time)
[ Info: timed out in learn iterations
  0.082489 seconds (874.45 k allocations: 21.896 MiB)
[ Info: timed out in learn iterations
  0.073752 seconds (875.01 k allocations: 20.342 MiB)
[ Info: timed out in learn iterations
  0.142901 seconds (875.00 k allocations: 20.342 MiB, 43.37% gc time)
[ Info: timed out in learn iterations
  0.087172 seconds (875.02 k allocations: 26.592 MiB)
[ Info: timed out in learn iterations
  0.076289 seconds (875.62 k allocations: 20.352 MiB)
[ Info: timed out in learn iterations
  0.140228 seconds (875.76 k allocations: 20.354 MiB, 39.38% gc time)
[ Info: stopped after 99 iterations
  0.077766 seconds (866.83 k allocations: 20.147 MiB)
[ Info: stopped after 81 iterations
  0.067455 seconds (709.44 k allocations: 16.488 MiB)
[ Info: stopped after 66 iterations
  0.055761 seconds (577.98 k allocations: 13.434 MiB)
Salsa.Runtime(Salsa.DefaultStorage(3001, ...))

julia> SalsaMLIncr.predict(s1, (10,))
10.696869285338447

julia> SalsaMLIncr.predict(s1, (100,))
106.96869285338448

julia>
```

---------
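Before the takeaways, here is the `previous_output()` trick from the diff above distilled into a minimal, reusable shape. This is an illustration only, not code from the PR: `running_sum` and `expensive_term` are hypothetical names (the real example also keys on the `iteration` parameter and returns the scaled derivative).

```julia
using Salsa

# Minimal sketch of the previous_output() hack. `expensive_term(s, i)` is a
# hypothetical stand-in for the real per-sample work; `num_samples` is the
# example's input, as used in the diff above.
@derived function running_sum(s::Runtime)
    prev = Salsa.previous_output(s)   # `nothing` on the very first evaluation
    (prev_sum, prev_n) = prev === nothing ? (0.0, 0) : prev

    N = num_samples(s)
    # Only fold in the samples inserted since the last evaluation.
    new_sum = sum((expensive_term(s, i) for i in prev_n+1:N); init = 0.0)

    # Return `N` alongside the value, so the *next* evaluation knows where we left off.
    return prev_sum + new_sum, N
end
```

The design point is the same as in the diff: the derived function's return value is the only state `previous_output()` hands back, so anything needed to interpret the old value (here, how many samples it covered) has to be packed into the return value itself.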
So I think this is a pretty neat example of a bunch of things:

- The unrolled loop is sort of like staging? TJ and I originally imagined we would unroll recursion via a loop parameter exactly like this, but it proved too hard to implement right away. (The pattern is sketched just after this list.)
- The incremental maintenance is really neat. Even in this simple example, we can see why we also need the diffs for all of the inputs, between their values when _we last ran_ and their values now. We could start tracking this in Salsa, but without it we have to do some kind of hack like the one here. For cheap inputs like `n`, though, this is not bad.
- It's also just another interesting application of Salsa to a new domain: simple ML algorithms. People usually think of ML as an imperative process performed over some data, but of course, as we know at RAI, it can also be expressed as a declarative transformation over input data that includes looping (a.k.a. recursion), which is exactly what is done here.
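For readers who don't want to open the example files, here is roughly what that unrolled-loop pattern looks like. It is a simplified sketch reconstructed from the diff's context lines, not the exact source: the `lr_learned_weights` driver name, the `learning_rate` and `max_iterations` constants, and the zero-iteration starting weights are all stand-ins for whatever the example actually uses.

```julia
using Salsa

const learning_rate = 0.01    # stand-in value
const max_iterations = 100    # stand-in value

# Gradient descent "unrolled" over an `iteration::Int` parameter: each step is its own
# memoized derived value that depends only on the previous step's result.
@derived function lr_learned_weights_unrolled(s::Runtime, iteration::Int)
    if iteration == 0
        return (0.0,)   # stand-in starting weights (only `m`, no bias)
    else
        weights = lr_learned_weights_unrolled(s, iteration - 1)  # previous step's result
        δweights_δcost = lr_δweights_δmse(s, iteration - 1) .* learning_rate
        return weights .- δweights_δcost                         # one descent step
    end
end

# The outer driver just keeps bumping `iteration` until the weights stop changing
# (sketch; the real code logs "timed out in learn iterations" when it gives up).
function lr_learned_weights(s::Runtime)
    weights = lr_learned_weights_unrolled(s, 0)
    for i in 1:max_iterations
        new_weights = lr_learned_weights_unrolled(s, i)
        all(new_weights .≈ weights) && return new_weights
        weights = new_weights
    end
    @info "timed out in learn iterations"
    return weights
end
```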