---
layout: site
title: Algorithms Reference Factorization Machines
---
The Factorization Machine (FM) is a general predictor like SVMs, but it can also estimate reliable parameters under very high sparsity. The FM models all nested variable interactions (comparable to a polynomial kernel in an SVM), but uses a factorized parameterization instead of the dense parameterization used in SVMs.
$$
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left\langle v_i, v_j \right\rangle x_i x_j
$$
where the model parameters that have to be estimated are $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^n$, and $V \in \mathbb{R}^{n \times k}$, and $\left\langle v_i, v_j \right\rangle$ is the dot product of two vectors of size $k$:

$$
\left\langle v_i, v_j \right\rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}
$$
A row $v_i$ of $V$ describes the $i$th variable with $k$ factors, where $k$ is a hyperparameter defining the dimensionality of the factorization. The model parameters are:
- $ w_0 $ : the global bias
- $ w_i $ : models the strength of the $i$th variable
- $ w_{i,j} = \left\langle v_i, v_j \right\rangle $ : models the interaction between the $i$th and $j$th variables
Instead of using its own model parameter $w_{i,j} \in \mathbb{R}$ for each interaction, the FM models the interaction by factorizing it. It is well known that for any positive definite matrix $W$ there exists a matrix $V$ such that $W = V V^T$, provided $k$ is sufficiently large; hence an FM can express any interaction matrix $W$ if $k$ is chosen large enough.
In sparse settings, there is usually not enough data to estimate the interactions between variables directly and independently. FMs can estimate interactions well even in these settings, because factorizing the interaction parameters breaks their independence: for example, an interaction between two variables that never co-occur in the training data can still be estimated, since each variable's factor vector is learned from all of its other interactions.
Due to the factorization of pairwise interactions, no model parameter directly depends on two variables (e.g., there is no parameter with an index $(i, j)$). The pairwise interactions can therefore be reformulated:
$$
\sum_{i=1}^n \sum_{j=i+1}^n \left\langle v_i, v_j \right\rangle x_i x_j
= \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \left\langle v_i, v_j \right\rangle x_i x_j - \frac{1}{2} \sum_{i=1}^n \left\langle v_i, v_i \right\rangle x_i x_i
$$

$$
= \frac{1}{2} \sum_{f=1}^{k} \left( \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right),
$$

which can be evaluated in $O(kn)$ instead of $O(kn^2)$.
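This reformulation is what makes a vectorized implementation cheap. As a minimal sketch (not the actual implementation in the scripts; the function name and variable layout are assumptions), the forward pass can be written in DML as:

```dml
# Minimal sketch of the O(k*n) FM forward pass; fm_forward_sketch is a
# hypothetical name, not a function from the scripts.
# X: (n, d) features, w0: (1, 1) bias, W: (d, 1) linear weights, V: (d, k) factors
fm_forward_sketch = function(matrix[double] X, matrix[double] w0,
                             matrix[double] W, matrix[double] V)
    return (matrix[double] out) {
  linear = X %*% W;                                         # first-order terms, (n, 1)
  pairwise = 0.5 * rowSums((X %*% V)^2 - (X^2) %*% (V^2));  # reformulated pairwise terms, (n, 1)
  out = as.scalar(w0) + linear + pairwise;                  # \hat{y} for each row of X
}
```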
The gradient of the model with respect to each parameter $\theta$ is

$$
\frac{\partial \hat{y}(x)}{\partial \theta} =
\begin{cases}
1, & \text{if } \theta \text{ is } w_0 \\
x_i, & \text{if } \theta \text{ is } w_i \\
x_i \sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2, & \text{if } \theta \text{ is } v_{i,f}
\end{cases}
$$
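Vectorized over a batch, and assuming an upstream loss gradient `dout` of shape (n, 1) (as produced by `l2_loss::backward` in the architecture below), the parameter gradients follow directly. This is a sketch under those assumptions, not the scripts' code:

```dml
# Sketch of the FM parameter gradients; fm_backward_sketch is a hypothetical name.
# dout: (n, 1) gradient of the loss w.r.t. the predictions
fm_backward_sketch = function(matrix[double] dout, matrix[double] X, matrix[double] V)
    return (matrix[double] dw0, matrix[double] dW, matrix[double] dV) {
  dw0 = as.matrix(sum(dout));                  # d y / d w0 = 1, summed over examples
  dW  = t(X) %*% dout;                         # d y / d w_i = x_i
  # d y / d v_{i,f} = x_i * sum_j(v_{j,f} x_j) - v_{i,f} * x_i^2
  dV  = t(X) %*% ((X %*% V) * dout) - V * (t(X^2) %*% dout);
}
```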
The `train()` function in the `fm-regression.dml` script takes in the input feature matrix and the corresponding target vector, with some of the input held out for validation during training.
```dml
train = function(matrix[double] X, matrix[double] y, matrix[double] X_val, matrix[double] y_val)
    return (matrix[double] w0, matrix[double] W, matrix[double] V) {
  /*
   * Trains the FM model.
   *
   * Inputs:
   *  - X     : n examples with d features, of shape (n, d)
   *  - y     : Target matrix, of shape (n, 1)
   *  - X_val : Input validation data matrix, of shape (n, d)
   *  - y_val : Target validation matrix, of shape (n, 1)
   *
   * Outputs:
   *  - w0, W, V : updated model parameters
   *
   * Network Architecture:
   *
   * X --> [model] --> out --> l2_loss::backward(out, y) --> dout
   */
  ...
  # 7. Call adam::update for all parameters
  [w0, mw0, vw0] = adam::update(w0, dw0, lr, beta1, beta2, epsilon, t, mw0, vw0);
  [W,  mW,  vW ] = adam::update(W,  dW,  lr, beta1, beta2, epsilon, t, mW,  vW );
  [V,  mV,  vV ] = adam::update(V,  dV,  lr, beta1, beta2, epsilon, t, mV,  vV );
}
```
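A driver for this function might look as follows; the `source` path and the synthetic data are assumptions for illustration only:

```dml
# Hypothetical driver; path and data are illustrative, not from the docs.
source("fm-regression.dml") as fm_reg

X     = rand(rows=1000, cols=100, sparsity=0.1)   # sparse synthetic features
y     = rand(rows=1000, cols=1)                   # regression targets
X_val = rand(rows=200,  cols=100, sparsity=0.1)   # held-out validation features
y_val = rand(rows=200,  cols=1)                   # held-out validation targets

[w0, W, V] = fm_reg::train(X, y, X_val, y_val)
```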
Once the `train` function returns the weights for the FM model, these are passed to the `predict` function.
```dml
predict = function(matrix[double] X, matrix[double] w0, matrix[double] W, matrix[double] V)
    return (matrix[double] out) {
  /*
   * Computes the predictions for the given inputs.
   *
   * Inputs:
   *  - X        : n examples with d features, of shape (n, d)
   *  - w0, W, V : trained model parameters
   *
   * Outputs:
   *  - out : target vector, y
   */
  out = fm::forward(X, w0, W, V);
}
```
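Continuing the hypothetical driver above, scoring new data is a single call:

```dml
# Hypothetical scoring step; X_test is illustrative synthetic data.
X_test = rand(rows=100, cols=100, sparsity=0.1)
y_hat  = fm_reg::predict(X_test, w0, W, V)
```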
For binary classification, the sign of $\hat{y}(x)$ is used as the predicted class label.
The `train()` function in the `fm-binclass.dml` script likewise takes in the input feature matrix and the corresponding target vector, with some of the input held out for validation during training. This script contains `train()` and `predict()` functions analogous to those in the regression case.
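Putting the classification pieces together, a hedged sketch of end-to-end use might look as follows; the `source` path, the label encoding, and the sign-based decision rule are assumptions (the script's `predict` may instead emit probabilities to be thresholded at 0.5):

```dml
# Hypothetical binary-classification driver; names and data are illustrative.
source("fm-binclass.dml") as fm_bin

X = rand(rows=1000, cols=100, sparsity=0.1)   # sparse synthetic features
y = rand(rows=1000, cols=1) > 0.5             # binary labels in {0, 1}, assumed encoding

[w0, W, V] = fm_bin::train(X, y, X, y)        # validating on training data, for brevity

scores = fm_bin::predict(X, w0, W, V)
labels = scores >= 0                          # sign-based decision rule from the text above
```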