# User Item Biases With Regularization
* Prediction for user $i$ and item $j$ is $\tilde r_{ij} = u_i + a_j$
* Loss function is $L = \sum_{\Omega}w_{ij}\text{loss}(r_{ij}, \tilde r_{ij}) + \lambda_u \sum_i (u_i - \bar u) ^2 + \lambda_a \sum_j (a_j - \bar a)^2 $
* $\bar u$ is the mean of $u_i$ and $\bar a$ is the mean of $a_j$ 
* $\Omega$ is the set of observed pairs $(i, j)$
* $r_{ij}$ is the rating for user $i$ and item $j$
* $w_{ij}$ is the weight of the observed pair $(i, j)$ and is modeled as a power law in the number of items seen by user $i$ and the number of users that have seen item $j$: $w_{ij} = |\{j' : (i, j') \in \Omega\}| ^ {\lambda_{wu}} |\{i' : (i', j) \in \Omega\}| ^ {\lambda_{wa}}$
* $\text{loss}$ is the squared error, $\text{loss}(r, \tilde r) = (r - \tilde r)^2$
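
To make the definitions above concrete, here is a minimal sketch that evaluates the power-law weights and the regularized loss on a tiny hand-made dataset. The data and the helper names (`count_by`, `powerlaw_weights`, `regularized_loss`) are hypothetical and are not part of the notebook's helper files; the exponents `p_u` and `p_a` stand in for $\lambda_{wu}$ and $\lambda_{wa}$.

```julia
using Statistics: mean

# Toy observed ratings as parallel arrays of (user, item, rating). All values are made up.
users   = [1, 1, 2, 3, 3, 3]
items   = [1, 2, 1, 1, 2, 3]
ratings = [4.0, 3.0, 5.0, 2.0, 1.0, 3.0]

# Number of observed ratings attached to each id in 1:n.
function count_by(ids, n)
    counts = zeros(Int, n)
    for id in ids
        counts[id] += 1
    end
    counts
end

# w_ij = (number of items seen by i)^p_u * (number of users that have seen j)^p_a
function powerlaw_weights(users, items, p_u, p_a)
    user_counts = count_by(users, maximum(users))
    item_counts = count_by(items, maximum(items))
    [float(user_counts[i])^p_u * float(item_counts[j])^p_a for (i, j) in zip(users, items)]
end

# Weighted squared error plus the two mean-centered penalties from the loss above.
function regularized_loss(users, items, ratings, w, u, a, λ_u, λ_a)
    fit = sum(w[k] * (ratings[k] - u[users[k]] - a[items[k]])^2 for k in eachindex(ratings))
    fit + λ_u * sum(abs2, u .- mean(u)) + λ_a * sum(abs2, a .- mean(a))
end

# With p_u = -1 and p_a = 0, every user contributes a total weight of 1.
w = powerlaw_weights(users, items, -1.0, 0.0)
loss_at_zero = regularized_loss(users, items, ratings, w, zeros(3), zeros(3), 1.0, 1.0)
```

Because the penalties are centered on $\bar u$ and $\bar a$, adding a constant to every $u_i$ and subtracting it from every $a_j$ leaves this loss unchanged.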

In [1]:
const name = "ExplicitUserItemBiases"
const implicit = false;

In [2]:
import NBInclude: @nbinclude
@nbinclude("Alpha.ipynb");
@nbinclude("ExplicitUserItemBiasesBase.ipynb");

In [3]:
const training = get_split("training", implicit)
const validation = get_split("validation", implicit);

## Alternating Least Squares
* Given some hyperparameters $\lambda$, we can solve for the bias vectors $u$ and $a$ via Alternating Least Squares
* This is an iterative algorithm where we fix $a$, then solve for the $u$ that minimizes the loss function
* Then we fix $u$ and solve for the best $a$
* These two steps are repeated until $u$ and $a$ converge
### More details
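* If we fix $a$ and treat $\bar u$ as a constant, then for each user $i$ the loss is minimized when
* $u_i = \dfrac{\sum_{j \in \Omega_i}(r_{ij} - a_j) w_{ij} + \bar u \lambda_u}{ \sum_{j \in \Omega_i} w_{ij} + \lambda_u}$
* $\Omega$ is the set of (user, item) pairs that we have ratings for
* $\Omega_i$ is the subset of $\Omega$ for which the user is the $i$-th user
* The update for $a_j$ with $u$ fixed is symmetric: $a_j = \dfrac{\sum_{i \in \Omega^j}(r_{ij} - u_i) w_{ij} + \bar a \lambda_a}{ \sum_{i \in \Omega^j} w_{ij} + \lambda_a}$, where $\Omega^j$ is the subset of $\Omega$ for which the item is the $j$-th item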

In [None]:
function train_model(training, stop_criteria, λ)
    @info "training model with parameters $λ"
    λ_u, λ_a, λ_wu, λ_wa = λ
    users, items, ratings = training.user, training.item, training.rating
    weights =
        expdecay(get_counts("training", implicit), log(λ_wu)) .*
        expdecay(get_counts("training", implicit; by_item = true), log(λ_wa))
    u = zeros(eltype(λ_u), num_users())
    a = zeros(eltype(λ_a), num_items())

    ρ_u = zeros(eltype(u), length(u), Threads.nthreads())
    Ω_u = zeros(eltype(u), length(u), Threads.nthreads())
    ρ_a = zeros(eltype(a), length(a), Threads.nthreads())
    Ω_a = zeros(eltype(a), length(a), Threads.nthreads())

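    # Alternate the two closed-form updates until the stopping criterion is met.
    while !stop!(stop_criteria, [u, a])
        # Fix a and solve for u.
        update_users!(users, items, ratings, weights, u, a, λ_u, ρ_u, Ω_u)
        # Fix u and solve for a: the same routine with the user and item roles swapped.
        update_users!(items, users, ratings, weights, a, u, λ_a, ρ_a, Ω_a)
    end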
    u, a
end;

In [7]:
function validation_mse(λ)
    λ = exp.(λ) # optimize in log space so that every λ stays strictly positive
    stop_criteria = convergence_stopper(1e-6, max_iters = 16)
    u, a = train_model(training, stop_criteria, λ)
    r = make_prediction(validation.user, validation.item, u, a)
    residualized_loss([], implicit, r)
end;
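
The `exp.(λ)` step above lets an unconstrained optimizer search over all real numbers while the model only ever sees positive hyperparameters. A minimal sketch of that reparameterization on a one-parameter toy problem follows. The data, `shrunk_bias`, `validation_loss`, and `minimize_1d` are all hypothetical, and plain gradient descent with a finite-difference derivative is swapped in for BFGS with forward-mode autodiff purely to keep the example dependency-free.

```julia
using Statistics: mean

# Made-up training and validation ratings for a single bias term.
y_train = [4.0, 5.0, 6.0]
y_val   = [2.0, 3.0, 4.0]

# Regularized estimate of the bias, shrunk toward zero by λ > 0.
shrunk_bias(λ) = sum(y_train) / (length(y_train) + λ)

# Validation loss as a function of the unconstrained parameter x = log(λ).
function validation_loss(x)
    λ = exp(x)   # strictly positive for every real x
    mean(abs2, y_val .- shrunk_bias(λ))
end

# Plain gradient descent using a central finite-difference derivative.
function minimize_1d(f, x; step = 0.2, iters = 200, h = 1e-6)
    for _ in 1:iters
        grad = (f(x + h) - f(x - h)) / (2h)
        x -= step * grad
    end
    x
end

x_best = minimize_1d(validation_loss, 0.0)   # start at x = 0, i.e. λ = 1
λ_best = exp(x_best)                         # map back to the positive scale
```

Starting from `x = 0` corresponds to $\lambda = 1$, which mirrors the `fill(0.0f0, 4)` starting point used for the real search.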

In [8]:
# Find the best regularization hyperparameters
res = Optim.optimize(
    validation_mse,
    fill(0.0f0, 4),
    Optim.BFGS(),
    autodiff = :forward,
    Optim.Options(show_trace = true, extended_trace = true),
);
λ = exp.(Optim.minimizer(res));

[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:01:16 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0,1.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0,0.0,1.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0,0.0,0.0,1.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0,0.0,0.0,0.0,1.0)]


Iter     Function value   Gradient norm 
     0     1.778537e+00     7.680463e-02
 * Current step size: 1.0
 * time: 0.008833169937133789
 * g(x): Float32[-0.0065774093, -1.8496149f-6, 0.019703476, 0.07680463]
 * ~inv(H): Float32[1.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0; 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 1.0]
 * x: Float32[0.0, 0.0, 0.0, 0.0]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:01:47 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0065991,1.0065991,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000019,0.0,1.0000019,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9804894,0.0,0.0,0.9804894,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.92607075,0.0,0.0,0.0,0.92607075)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:02:10 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0039606,1.0039606,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000011,0.0,1.0000011,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9882287,0.0,0.0,0.9882287,0.0), Dual{Forwar

     1     1.776097e+00     1.640687e-02
 * Current step size: 0.6009688
 * time: 44.82783508300781
 * g(x): Float32[-0.00197358, -2.6437733f-6, 0.016406871, 0.011619643]
 * ~inv(H): Float32[1.000341 2.7888984f-6 -0.0145523 0.010747191; 2.7888984f-6 1.0 -1.2159333f-5 -2.8423645f-5; -0.0145523 -1.2159333f-5 1.0841252 0.1257996; 0.010747187 -2.8423647f-5 0.12579961 0.7024923]
 * x: Float32[0.0039528175, 1.1115608f-6, -0.011841174, -0.046157185]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:02:32 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0060593,1.0060593,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000043,0.0,1.0000043,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9693605,0.0,0.0,0.9693605,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9451963,0.0,0.0,0.0,0.9451963)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:02:54 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0144975,1.0144975,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.000017,0.0,1.000017,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.8974221,0.0,0.0,0.8974221,0.0), Dual{ForwardDif

     2     1.775774e+00     1.966514e-02
 * Current step size: 1.27296
 * time: 111.10748219490051
 * g(x): Float32[0.00046438412, -3.2511734f-6, 0.012925716, -0.019665137]
 * ~inv(H): Float32[1.0123938 3.6858783f-5 -0.1905426 0.015131294; 3.6858783f-5 1.0000001 -0.00042522332 -9.858053f-5; -0.19054261 -0.00042522332 3.3063543 0.4016363; 0.015131268 -9.858055f-5 0.4016364 0.37174293]
 * x: Float32[0.0066109267, 5.1583597f-6, -0.036380745, -0.05914835]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:03:38 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0089409,1.0089409,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000119,0.0,1.0000119,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.93133986,0.0,0.0,0.93133986,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.94455963,0.0,0.0,0.0,0.94455963)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:04:00 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0182266,1.0182266,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000391,0.0,1.0000391,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.81047714,0.0,0.0,0.81047714,0.0), Dual{Fo

     3     1.773997e+00     5.667517e-02
 * Current step size: 5.177169
 * time: 199.42511200904846
 * g(x): Float32[0.0041741333, -5.634783f-6, 0.00068079436, -0.056675166]
 * ~inv(H): Float32[1.0753716 0.00028660655 -1.3252076 0.22585917; 0.00028660652 1.0000011 -0.0047457465 0.0005844027; -1.3252076 -0.004745747 23.240063 -2.9608421; 0.22585915 0.00058440276 -2.9608421 0.70682037]
 * x: Float32[0.018468294, 4.032052f-5, -0.21628883, -0.04821463]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:05:06 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.028069,1.028069,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000811,0.0,1.0000811,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6740943,0.0,0.0,0.6740943,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9929425,0.0,0.0,0.0,0.9929425)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:05:28 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0274218,1.0274218,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000783,0.0,1.0000783,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6823496,0.0,0.0,0.6823496,0.0), Dual{ForwardDif

     4     1.772833e+00     2.355569e-02
 * Current step size: 0.9316544
 * time: 243.5141670703888
 * g(x): Float32[0.0023460775, -2.1898916f-7, -0.005226848, -0.02355569]
 * ~inv(H): Float32[1.0455711 0.00018076552 -0.84519076 0.1661436; 0.00018076548 1.0000007 -0.0031249207 0.00043649; -0.84519076 -0.0031249213 15.817319 -2.2351243; 0.16614358 0.00043649023 -2.2351246 0.76753336]
 * x: Float32[0.027052611, 7.832312f-5, -0.38221312, -0.009893708]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:05:50 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0243882,1.0243882,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000721,0.0,1.0000721,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.70454335,0.0,0.0,0.70454335,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.99612117,0.0,0.0,0.0,0.99612117)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:06:12 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0255562,1.0255562,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000745,0.0,1.0000745,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.69590634,0.0,0.0,0.69590634,0.0), Dual{Fo

     5     1.772745e+00     5.477091e-04
 * Current step size: 0.6146303
 * time: 287.61205315589905
 * g(x): Float32[0.0003168242, -2.1243325f-6, 0.0005232881, 0.0005477091]
 * ~inv(H): Float32[1.0368905 0.00013218846 -0.69322425 0.17726777; 0.0001321884 1.0000005 -0.002476379 0.00052139984; -0.69322425 -0.00247638 13.544536 -2.473375; 0.17726775 0.00052140007 -2.4733753 0.758161]
 * x: Float32[0.02523512, 7.4477524f-5, -0.36254022, -0.006201418]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:06:34 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0254917,1.0254917,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000776,0.0,1.0000776,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6920801,0.0,0.0,0.6920801,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9946359,0.0,0.0,0.0,0.9946359)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:06:57 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.025234,1.025234,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.00009,0.0,1.00009,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.67698437,0.0,0.0,0.67698437,0.0), Dual{ForwardDiff.

     6     1.772743e+00     3.758553e-04
 * Current step size: 1.1086383
 * time: 353.76730608940125
 * g(x): Float32[0.00037585528, -1.7127486f-6, 2.304113f-5, 0.000167877]
 * ~inv(H): Float32[1.0579392 -0.00043456792 0.2328557 0.041176725; -0.00043456798 1.0000052 -0.006900351 0.0010770167; 0.2328557 -0.006900352 14.038434 -2.3605082; 0.041176703 0.0010770172 -2.360509 0.71346825]
 * x: Float32[0.025165446, 7.7906254f-5, -0.36865255, -0.005289148]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:07:41 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0250645,1.0250645,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000798,0.0,1.0000798,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6916555,0.0,0.0,0.6916555,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9946444,0.0,0.0,0.0,0.9946444)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:08:03 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0233852,1.0233852,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0000871,0.0,1.0000871,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6916148,0.0,0.0,0.6916148,0.0), Dual{ForwardD

     7     1.772742e+00     3.045545e-03
 * Current step size: 20.18761
 * time: 441.73513412475586
 * g(x): Float32[0.0006191753, -1.573021f-6, -0.000512142, -0.0030455454]
 * ~inv(H): Float32[40.34989 -0.17752174 1.958013 5.304354; -0.17752174 1.0008034 -0.014680797 -0.022602467; 1.958013 -0.014680799 14.111665 -2.10952; 5.304355 -0.022602465 -2.1095207 1.2609664]
 * x: Float32[0.016890328, 0.00011533969, -0.36894968, -0.0069215335]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:09:09 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.009105,1.009105,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0001504,0.0,1.0001504,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6911769,0.0,0.0,0.6911769,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9925817,0.0,0.0,0.0,0.9925817)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:09:31 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9780033,0.9780033,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0002911,0.0,1.0002911,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6900447,0.0,0.0,0.6900447,0.0), Dual{ForwardDif

     8     1.772694e+00     2.604062e-04
 * Current step size: 32.185417
 * time: 552.1630251407623
 * g(x): Float32[-5.385749f-6, 1.5282944f-6, 0.0002604062, -5.5135028f-5]
 * ~inv(H): Float32[683.0526 -3.0630753 45.46489 46.680702; -3.063075 1.0137587 -0.21001069 -0.20837119; 45.464893 -0.2100107 16.905798 0.7172562; 46.680702 -0.20837119 0.7172552 3.920292]
 * x: Float32[-0.23500894, 0.0012462023, -0.3821405, -0.023799732]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:10:59 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.78616303,0.78616303,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0012722,0.0,1.0012722,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.6795951,0.0,0.0,0.6795951,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9767558,0.0,0.0,0.0,0.9767558)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:11:21 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.78627104,0.78627104,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0012715,0.0,1.0012715,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.67966396,0.0,0.0,0.67966396,0.0), Dual{Fo

     9     1.772694e+00     1.762123e-05
 * Current step size: 0.97539425
 * time: 596.2922401428223
 * g(x): Float32[1.859895f-6, 1.987044f-6, -3.8121045f-6, -1.7621225f-5]
 * ~inv(H): Float32[685.1778 -3.0621037 46.025127 46.72199; -3.0621035 1.0137067 -0.20475462 -0.20908837; 46.025135 -0.20475464 16.575941 0.79843825; 46.721985 -0.20908837 0.79843724 3.910745]
 * x: Float32[-0.24045375, 0.0012707366, -0.38615683, -0.023525553]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:11:43 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.78605914,0.78605914,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0012708,0.0,1.0012708,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.67965853,0.0,0.0,0.67965853,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.9767348,0.0,0.0,0.0,0.9767348)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:12:05 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.7852121,0.7852121,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0012677,0.0,1.0012677,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.67963696,0.0,0.0,0.67963696,0.0), Dual{Fo

    10     1.772694e+00     1.990028e-06
 * Current step size: 1.0213934
 * time: 662.778510093689
 * g(x): Float32[-5.409738f-8, 1.9900276f-6, 1.15627984f-7, 8.306294f-7]
 * ~inv(H): Float32[701.3097 -0.5484613 46.069664 48.020363; -0.54846114 1.0281942 -0.13199009 -0.07236117; 46.069668 -0.1319901 16.564577 0.81346804; 48.02036 -0.07236118 0.813467 4.003841]
 * x: Float32[-0.24072903, 0.0012699357, -0.38616493, -0.02354039]


[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:12:50 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.7860497,0.7860497,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0012687,0.0,1.0012687,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.67965853,0.0,0.0,0.67965853,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.97673386,0.0,0.0,0.0,0.97673386)]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:13:12 training model with parameters ForwardDiff.Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}, Float32, 4}[Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.7860303,0.7860303,0.0,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(1.0012608,0.0,1.0012608,0.0,0.0), Dual{ForwardDiff.Tag{typeof(validation_mse), Float32}}(0.67965895,0.0,0.0,0.67965895,0.0), Dual{Fo

    11     1.772694e+00     1.201726e-05
 * Current step size: 22.339348
 * time: 751.7949130535126
 * g(x): Float32[5.4693913f-7, 1.9903284f-6, -6.620153f-7, -1.2017262f-5]
 * ~inv(H): Float32[540.6965 43.46041 48.073666 33.137325; 43.460415 46.311863 -2.027276 5.6351824; 48.07367 -2.0272758 16.571175 0.9609119; 33.137318 5.635182 0.96091086 2.6710274]
 * x: Float32[-0.24086717, 0.0012252473, -0.38616127, -0.023555536]


In [9]:
@info "The optimal λ is $λ, found in " * repr(Optim.f_calls(res)) * " function calls"

[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:14:19 The optimal λ is Float32[0.785946, 1.001226, 0.6796609, 0.97671974], found in 35 function calls


In [10]:
stop_criteria = convergence_stopper(1e-6, max_iters = 16)
u, a = train_model(training, stop_criteria, λ);

[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:14:19 training model with parameters Float32[0.785946, 1.001226, 0.6796609, 0.97671974]


In [11]:
validation_mse(Optim.minimizer(res))

[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:14:26 training model with parameters Float32[0.785946, 1.001226, 0.6796609, 0.97671974]


1.7725976f0

## Inference

In [12]:
model(users, items) = make_prediction(users, items, u, a)
write_alpha(model, [], implicit, name);

[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:14:36 validation loss: 1.7725976, β: Float32[1.0017253]
[38;5;6m[1m[ [22m[39m[38;5;6m[1mInfo: [22m[39m20220621 21:14:39 training loss: 1.6141207, β: Float32[1.0017253]


In [13]:
write_params(Dict("u" => u, "a" => a, "λ" => λ), name);