# Simple Explicit Baseline
* Computes a bias for each user and for each item
* Prediction for user $i$ and item $j$ is $\tilde r_{ij} = m + u_i + a_j$
* $m = \text{mean}_{ij}(r_{ij})$
* $u_i = \text{mean}_j(r_{ij}) - m$
* $a_j = \text{mean}_i(r_{ij}) - m$
* $r_{ij}$ is the rating for user $i$ and item $j$
* Useful as a benchmark against which to compare more sophisticated algorithms
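
The formulas above can be checked by hand on a tiny dataset. The following is a minimal sketch, not the notebook's own pipeline: the toy ratings and the names `biases`, `user_bias`, `item_bias`, and `predict` are hypothetical and introduced only for illustration. It assumes that a user or item with no training ratings gets a bias of zero.

```julia
using Statistics

# Hypothetical toy ratings as (user, item, rating) tuples
ratings = [("alice", "x", 5.0), ("alice", "y", 3.0), ("bob", "x", 4.0), ("bob", "z", 2.0)]

# m: global mean over all observed ratings
m = mean(t[3] for t in ratings)

# Mean rating per key (position 1 = user, position 2 = item) minus the global mean
function biases(ratings, keyindex, m)
    groups = Dict{String,Vector{Float64}}()
    for t in ratings
        push!(get!(groups, t[keyindex], Float64[]), t[3])
    end
    Dict(k => mean(v) - m for (k, v) in groups)
end

user_bias = biases(ratings, 1, m)  # u_i
item_bias = biases(ratings, 2, m)  # a_j

# Prediction m + u_i + a_j; unseen users or items contribute a zero bias (assumption)
predict(user, item) = m + get(user_bias, user, 0.0) + get(item_bias, item, 0.0)

predict("alice", "z")  # 3.5 + 0.5 - 1.5 = 2.5
predict("carol", "x")  # unseen user: 3.5 + 0.0 + 1.0 = 4.5
```

Here $m = 3.5$, alice's bias is $+0.5$, and item `z`'s bias is $-1.5$, so the prediction for a pair never seen in training is just the sum of the three terms.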

In [1]:
const name = "SimpleExplicitBaseline";
const residual_alphas = [];

In [2]:
using NBInclude
@nbinclude("Alpha.ipynb");

In [3]:
using DataFrames
using DataStructures
using Statistics  # provides `mean`; harmless if Alpha.ipynb already loads it

In [4]:
const training = get_residuals("training", residual_alphas);

## Training

In [5]:
training_df =
    DataFrame(user = training.user, item = training.item, rating = training.rating);

In [6]:
μ = mean(training.rating);

In [7]:
u = combine(groupby(training_df, :user), :rating => mean => :rating)
u = Dict(Pair.(u.user, u.rating .- μ));

In [8]:
a = combine(groupby(training_df, :item), :rating => mean => :rating)
a = Dict(Pair.(a.item, a.rating .- μ));

## Inference

In [9]:
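function make_prediction(users, items, u, a, μ)
    # Users or items unseen in training fall back to a zero bias.
    # `get` only reads the Dict; indexing a DefaultDict can insert missing
    # keys, which is not safe inside a threaded loop.
    z = zero(eltype(μ))
    r = zeros(eltype(μ), length(users))
    Threads.@threads for i = 1:length(r)
        # prediction = user bias + item bias + global mean
        r[i] = get(u, users[i], z) + get(a, items[i], z) + μ
    end
    r
end;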

model(users, items) = make_prediction(users, items, u, a, μ);

In [10]:
write_predictions(model; residual_alphas = residual_alphas);

Progress: 100%|███████████████████████████| Time: 0:00:01 ( 0.15 μs/it)
[ Info: 20220514 17:05:39 training set weighted-loss: RMSE 1.3280821 MAE 0.99296266 R2 0.45764953
[ Info: 20220514 17:05:41 validation set weighted-loss: RMSE 1.3916637 MAE 1.0381515 R2 0.3538339


In [11]:
write_params(Dict("u" => u, "a" => a));