Salsa Example: Linear Regression + Incremental Maintenance #24
Merged
Adds a super simple Linear Regression example, together with a version that incrementally maintains the learned weights. The incrementally maintained algorithm keeps constant-time performance for every inserted batch (constant in the size of the batch), whereas for the naive, non-maintained version, the time to re-train grows with each batch, in proportion to the total number of observed samples. Which is super cool! 😁

-----------

I was starting to work on the outline for the Salsa presentation for JuliaCon, but since coding is always easier than writing slides, I made this fun example that I think really showcases the benefits of incremental maintenance (through recursion!) 🙂:

First, `examples/ML-LR-IVM/linear-regression.jl` contains a _very_ simple version of linear regression with gradient descent in Salsa (very simple; only `m`, no `bias`), which unrolls the gradient descent loop via an `iteration::Int` parameter in Salsa. The outer loop just keeps bumping `i` and calling the unrolled function, one call after another, until it converges; each unrolled call requests the result from the previous call, computes the new gradient, and returns the adjusted weights.

And then, I made it incremental, via the `previous_output()` function that @comnik added to Salsa! 🙂

And I only needed this one super simple hack to make it work: to incrementally maintain the derivative at each unrolled step, we basically just need to compute the derivative of the weights with respect to the loss _only for the newly added samples_, and _add it_ to the derivative we'd already computed. So this means that our derived function needs access to the previous derivative **and the diff between the samples**, so it knows which ones to compute. This is the same as in general IVM, where we need to get both the previously computed value _and_ the previous values of all our inputs. But in this case, I was able to hack this in by just changing the function that computes the derivative to _also_ return `n`, so that I can get `n` from `previous_output()`!! 😅

And with that, I have an incrementally maintained linear regression computation!

Here's the diff that made it incremental -- it's just a few lines of change! 🙂

```diff
--- examples/ML-LR-IVM/linear-regression.jl        2020-06-07 23:45:17.000000000 -0400
+++ examples/ML-LR-IVM/linear-regression-incremental.jl        2020-06-07 23:45:16.000000000 -0400
@@ -1,4 +1,4 @@
-module SalsaML
+module SalsaMLIncr
 
 using Salsa
 
@@ -47,6 +47,7 @@
     @info "timed out in learn iterations"
     return weights
 end
+
 
 @derived function lr_learned_weights_unrolled(s::Runtime, iteration::Int)
     if iteration == 0
@@ -54,18 +55,30 @@
     else
         n = num_samples(s)
         weights = lr_learned_weights_unrolled(s, iteration-1)
-        δweights_δcost = lr_δweights_δmse(s, iteration-1) .* learning_rate
+        deriv,_ = lr_δweights_δmse(s, iteration-1)
+        δweights_δcost = deriv .* learning_rate
         weights = weights .- δweights_δcost
         return weights
     end
 end
 
+# HACK: Return the actual value, AND `n`, so that we can use Salsa.previous_output() to
+# incrementally maintain the result! :D
 @derived function lr_δweights_δmse(s::Runtime, iteration::Int)
-    n = num_samples(s)
-    (-2/n) * sum(
+    prev_value = Salsa.previous_output(s)
+    prev_deriv = 0
+    start_n = 1
+    if prev_value !== nothing  # Incremental case
+        (prev_deriv, start_n) = prev_value
+    end
+
+    N = num_samples(s)
+    n = length(1:N)
+    new_deriv = (-2/n) * sum(
         sum(sample(s, i) .* (response(s, i) - lr_predicted_unrolled(s, i, iteration)))
-        for i in 1:n
+        for i in start_n:N
     )
+    return prev_deriv + new_deriv, n
 end
 
 @derived function lr_predicted_unrolled(s::Runtime, i::Int, iteration::Int)
```

And after that, as you can see, the incremental algorithm performs in constant time on every inserted batch, which is really pretty amazing! 😁

```julia
julia> s = let s = SalsaML.new_lr_runtime()
           for i in 1:10
               for _ in 1:100
                   SalsaML.insert_training_pair!(s, (i,), i + rand(-0.2:0.2))
               end
               # Gets slower as the number of total observed samples increases
               @time SalsaML.predict(s, (rand(),))
           end
           s
       end
[ Info: timed out in learn iterations
  0.080560 seconds (837.95 k allocations: 20.501 MiB)
[ Info: timed out in learn iterations
  0.173123 seconds (2.06 M allocations: 48.929 MiB)
[ Info: timed out in learn iterations
  0.553986 seconds (3.26 M allocations: 74.498 MiB, 50.89% gc time)
[ Info: timed out in learn iterations
  0.420706 seconds (4.47 M allocations: 101.702 MiB, 10.86% gc time)
[ Info: timed out in learn iterations
  0.640236 seconds (5.68 M allocations: 135.161 MiB, 20.79% gc time)
[ Info: timed out in learn iterations
  0.752063 seconds (6.90 M allocations: 156.269 MiB, 21.50% gc time)
[ Info: timed out in learn iterations
  0.892328 seconds (8.12 M allocations: 183.637 MiB, 22.46% gc time)
[ Info: timed out in learn iterations
  1.014956 seconds (9.34 M allocations: 211.028 MiB, 22.25% gc time)
[ Info: timed out in learn iterations
  1.152366 seconds (10.56 M allocations: 238.399 MiB, 22.00% gc time)
[ Info: timed out in learn iterations
  1.424530 seconds (11.78 M allocations: 265.786 MiB, 30.55% gc time)
Salsa.Runtime(Salsa.DefaultStorage(3001, ...))

julia> SalsaML.predict(s, (10,))
9.527912879192533

julia> SalsaML.predict(s, (100,))
95.27912879192533

julia> s1 = let s = SalsaMLIncr.new_lr_runtime()
           for i in 1:10
               for _ in 1:100
                   SalsaMLIncr.insert_training_pair!(s, (i,), i + rand(-1.0:1.0))
               end
               # Time remains constant: O(the batch size, or, 100)
               @time SalsaMLIncr.predict(s, (rand(),))
           end
           s
       end
[ Info: timed out in learn iterations
  0.156828 seconds (819.17 k allocations: 19.918 MiB, 51.31% gc time)
[ Info: timed out in learn iterations
  0.082489 seconds (874.45 k allocations: 21.896 MiB)
[ Info: timed out in learn iterations
  0.073752 seconds (875.01 k allocations: 20.342 MiB)
[ Info: timed out in learn iterations
  0.142901 seconds (875.00 k allocations: 20.342 MiB, 43.37% gc time)
[ Info: timed out in learn iterations
  0.087172 seconds (875.02 k allocations: 26.592 MiB)
[ Info: timed out in learn iterations
  0.076289 seconds (875.62 k allocations: 20.352 MiB)
[ Info: timed out in learn iterations
  0.140228 seconds (875.76 k allocations: 20.354 MiB, 39.38% gc time)
[ Info: stopped after 99 iterations
  0.077766 seconds (866.83 k allocations: 20.147 MiB)
[ Info: stopped after 81 iterations
  0.067455 seconds (709.44 k allocations: 16.488 MiB)
[ Info: stopped after 66 iterations
  0.055761 seconds (577.98 k allocations: 13.434 MiB)
Salsa.Runtime(Salsa.DefaultStorage(3001, ...))

julia> SalsaMLIncr.predict(s1, (10,))
10.696869285338447

julia> SalsaMLIncr.predict(s1, (100,))
106.96869285338448

julia>
```

---------
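Before the takeaways, here is the `previous_output()` trick from the diff above distilled into a minimal, reusable shape. This is an illustration only, not code from the PR: `running_sum` and `expensive_term` are hypothetical names (the real example also keys on the `iteration` parameter and returns the scaled derivative).

```julia
using Salsa

# Minimal sketch of the previous_output() hack. `expensive_term(s, i)` is a
# hypothetical stand-in for the real per-sample work; `num_samples` is the
# example's input, as used in the diff above.
@derived function running_sum(s::Runtime)
    prev = Salsa.previous_output(s)   # `nothing` on the very first evaluation
    (prev_sum, prev_n) = prev === nothing ? (0.0, 0) : prev

    N = num_samples(s)
    # Only fold in the samples inserted since the last evaluation.
    new_sum = sum((expensive_term(s, i) for i in prev_n+1:N); init = 0.0)

    # Return `N` alongside the value, so the *next* evaluation knows where we left off.
    return prev_sum + new_sum, N
end
```

The design point is the same as in the diff: the derived function's return value is the only state `previous_output()` hands back, so anything needed to interpret the old value (here, how many samples it covered) has to be packed into the return value itself.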
So I think this is a pretty neat example of a bunch of things:

- The unrolled loop is sort of like staging? TJ and I originally imagined we would unroll recursion via a loop parameter exactly like this, but it proved too hard to implement right away. (The pattern is sketched just after this list.)
- The incremental maintenance is really neat. Even in this simple example, we can see why we also need the diffs for all of the inputs, between their values when _we last ran_ and their values now. We could start tracking this in Salsa, but without it we have to do some kind of hack like the one here. For cheap inputs like `n`, though, this is not bad.
- It's also just another interesting application of Salsa to a new domain: simple ML algorithms. People usually think of ML as an imperative process performed over some data, but of course, as we know at RAI, it can also be expressed as a declarative transformation over input data that includes looping (a.k.a. recursion), which is exactly what is done here.
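For readers who don't want to open the example files, here is roughly what that unrolled-loop pattern looks like. It is a simplified sketch reconstructed from the diff's context lines, not the exact source: the `lr_learned_weights` driver name, the `learning_rate` and `max_iterations` constants, and the zero-iteration starting weights are all stand-ins for whatever the example actually uses.

```julia
using Salsa

const learning_rate = 0.01    # stand-in value
const max_iterations = 100    # stand-in value

# Gradient descent "unrolled" over an `iteration::Int` parameter: each step is its own
# memoized derived value that depends only on the previous step's result.
@derived function lr_learned_weights_unrolled(s::Runtime, iteration::Int)
    if iteration == 0
        return (0.0,)   # stand-in starting weights (only `m`, no bias)
    else
        weights = lr_learned_weights_unrolled(s, iteration - 1)  # previous step's result
        δweights_δcost = lr_δweights_δmse(s, iteration - 1) .* learning_rate
        return weights .- δweights_δcost                         # one descent step
    end
end

# The outer driver just keeps bumping `iteration` until the weights stop changing
# (sketch; the real code logs "timed out in learn iterations" when it gives up).
function lr_learned_weights(s::Runtime)
    weights = lr_learned_weights_unrolled(s, 0)
    for i in 1:max_iterations
        new_weights = lr_learned_weights_unrolled(s, i)
        all(new_weights .≈ weights) && return new_weights
        weights = new_weights
    end
    @info "timed out in learn iterations"
    return weights
end
```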