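The FM formula. Compared with an ordinary linear model:  
$$y=\omega_{0}+\sum_{i=1}^{n} \omega_{i} x_{i}$$
FM adds pairwise cross terms between features:  
$$y=\omega_{0}+\sum_{i=1}^{n} \omega_{i} x_{i}+\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \omega_{i j} x_{i} x_{j}$$
The cross term shows the problem: it needs $n(n-1)/2$ parameters, so the parameter count and complexity are high, and with sparse features most $\omega_{i j}$ cannot be learned because $x_{i} x_{j}$ is rarely nonzero. FM therefore introduces a k-dimensional auxiliary (latent) vector $\mathbf{v}_{i}$ for each feature, replaces $\omega_{i j}$ with $\left\langle\mathbf{v}_{i}, \mathbf{v}_{j}\right\rangle$, and after a series of algebraic tricks the cross term becomes:  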
$$\begin{aligned} & \sum_{i=1}^{n} \sum_{j=i+1}^{n}\left\langle\mathbf{v}_{i}, \mathbf{v}_{j}\right\rangle x_{i} x_{j} \\=& \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}\left\langle\mathbf{v}_{i}, \mathbf{v}_{j}\right\rangle x_{i} x_{j}-\frac{1}{2} \sum_{i=1}^{n}\left\langle\mathbf{v}_{i}, \mathbf{v}_{i}\right\rangle x_{i} x_{i} \\=& \frac{1}{2}\left(\sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{f=1}^{k} v_{i, f} v_{j, f} x_{i} x_{j}-\sum_{i=1}^{n} \sum_{f=1}^{k} v_{i, f} v_{i, f} x_{i} x_{i}\right) \\=& \frac{1}{2} \sum_{f=1}^{k}\left(\left(\sum_{i=1}^{n} v_{i, f} x_{i}\right)\left(\sum_{j=1}^{n} v_{j, f} x_{j}\right)-\sum_{i=1}^{n} v_{i, f}^{2} x_{i}^{2}\right) \\=& \frac{1}{2} \sum_{f=1}^{k}\left(\left(\sum_{i=1}^{n} v_{i, f} x_{i}\right)^{2}-\sum_{i=1}^{n} v_{i, f}^{2} x_{i}^{2}\right) \end{aligned}$$
For a step-by-step walkthrough of this derivation, see [全能的FM模型](https://zhuanlan.zhihu.com/p/58160982), which explains it very clearly.
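
As a quick sanity check on the derivation, the sketch below compares the first line (an explicit double loop over pairs i < j) with the last line (the O(kn) form) on random data. The function names `fm_pairwise_naive` and `fm_pairwise_fast` are illustrative and not part of the notebook; NumPy is assumed.

```python
import numpy as np

def fm_pairwise_naive(x, V):
    """Sum over i<j of <v_i, v_j> * x_i * x_j, written as an explicit double loop."""
    n = x.shape[0]
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += np.dot(V[i], V[j]) * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """Last line of the derivation: 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    sum_then_square = (x @ V) ** 2          # shape (k,)
    square_then_sum = (x ** 2) @ (V ** 2)   # shape (k,)
    return 0.5 * np.sum(sum_then_square - square_then_sum)

# Random test data: n features, k latent dimensions (values are arbitrary).
rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)
V = rng.normal(size=(n, k))

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
print(np.isclose(naive, fast))
```

The naive version costs O(kn^2) per sample, while the rewritten form only needs one pass over the features for each latent dimension.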

The formula above can be sketched in pseudocode:  
``y_linear = tf.add(w0, tf.reduce_sum(tf.matmul(wi, x)))
y_cross = 0.5 * tf.reduce_sum(
    tf.subtract(
        tf.pow(tf.matmul(v, x), 2),
        tf.matmul(tf.pow(v, 2), tf.pow(x, 2))
    )  # matrix dimensions are ignored for now
)``  
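`tf.multiply` is element-wise multiplication; `tf.matmul` is matrix multiplication.  
It looks simple, but the code has to match the formula carefully. Some blog posts I found online are messy, and a few are simply wrong. Writing it in numpy is actually more intuitive, because explicit loops map directly onto the element-wise products in the formula ([for example, this one](https://blog.csdn.net/lieyingkub99/article/details/80897743)). With tf.keras the input is a matrix, so a matrix product is exactly the sum of element-wise products. This point deserves careful thought.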
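
To make the "matrix product is the sum of element-wise products" point concrete, here is a small sketch that uses NumPy as a stand-in for TensorFlow (`np.multiply` and `np.matmul` play the roles of `tf.multiply` and `tf.matmul`). The numbers are made up for illustration.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0]])        # one sample, shape (1, n)
w = np.array([[0.5], [-1.0], [2.0]])   # weights, shape (n, 1)

# Element-wise route: multiply matching entries, then add them up.
elementwise_sum = np.sum(np.multiply(x[0], w[:, 0]))

# Matrix route: a single matmul performs the multiply and the sum together.
matmul_result = np.matmul(x, w)        # shape (1, 1)

print(elementwise_sum, matmul_result[0, 0])
```

Both routes give the same number, which is why `tf.matmul(x, w)` can replace the explicit sum of w_i * x_i in the linear term.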
References:  
1. [推荐系统召回四模型之：全能的FM模型](https://zhuanlan.zhihu.com/p/58160982)  
2. Rendle, S. Factorization Machines. https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf

In [1]:
import tensorflow as tf
from tensorflow.keras import backend as K

from tensorflow.keras.layers import Layer
from tensorflow.keras.regularizers import l2

import pandas as pd
from sklearn.model_selection import train_test_split



In [None]:
import sys
sys.path.append('./util/')
from utils import load_data

users, movies, ratings = load_data()

In [None]:
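# Drop timestamp to keep preprocessing simple. The feature is useful in principle, and the dataset is
# best split by time because user interests drift, but the dataset is too small for that to pay off here.
# The goal is to compare models, so feature engineering is kept minimal.
data1 = pd.merge(ratings.drop(columns = ['timestamp']), movies, how = 'left', on = 'movieid')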
data = pd.merge(data1, users, how = 'left', on = 'userid')

X = data.drop(columns = ['userid', 'movieid', 'genres', 'title', 'rating'])
Y = data['rating'].values

from sklearn import preprocessing
X_norm = preprocessing.scale(X)

train_x, test_x, train_y, test_y = train_test_split(X_norm, Y)
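
The cell above uses a random split. If one wanted the time-based split mentioned in the comments, a minimal sketch might look like the following. The helper `split_by_time`, the 75% cut-off, and the toy table are assumptions for illustration, not part of the notebook; the column names follow the ones used above.

```python
import pandas as pd

# Toy stand-in for the ratings table; the real one comes from load_data().
toy_ratings = pd.DataFrame({
    "userid":    [1, 1, 2, 2, 3, 3, 4, 4],
    "movieid":   [10, 11, 10, 12, 11, 13, 12, 13],
    "rating":    [5, 3, 4, 2, 1, 4, 5, 3],
    "timestamp": [100, 400, 200, 800, 300, 700, 500, 600],
})

def split_by_time(df, time_col="timestamp", train_frac=0.75):
    """Oldest train_frac of the rows go to train; the most recent rows go to test."""
    ordered = df.sort_values(time_col, kind="stable").reset_index(drop=True)
    cut = int(len(ordered) * train_frac)
    return ordered.iloc[:cut], ordered.iloc[cut:]

train_df, test_df = split_by_time(toy_ratings)
print(len(train_df), len(test_df))
```

With this kind of split the model is always evaluated on ratings that are newer than anything it saw during training.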

In [None]:
class FM(Layer):
    def __init__(self, units, k, **kwargs):
        self.units = units
        self.k = k
        super(FM, self).__init__(**kwargs)

    def build(self, input_shape):
        input_dim = input_shape[-1]
        self.w0 = self.add_weight(name = 'W0', 
                                 shape=(self.units,),
                                 initializer='glorot_uniform',
                                 trainable=True)
        self.w = self.add_weight(name = 'W', 
                                 shape=(input_dim, self.units),
                                 initializer='glorot_uniform',
                                 trainable=True)
        self.v = self.add_weight(name='V',
                                 shape=(input_dim, self.k),
                                 initializer='glorot_uniform',
                                 trainable=True)

        super(FM, self).build(input_shape)

    def call(self, inputs, **kwargs):
        x = inputs
        linear_terms = tf.add(tf.matmul(x, self.w), self.w0) #(None, units)
        #tf.matmul(x, self.w) is exactly the sum over i of (wi*xi)
        pair_interactions = 0.5 * tf.reduce_sum(
            tf.subtract(
                tf.pow(tf.matmul(x, self.v), 2),              #(None, k) 
                tf.matmul(tf.pow(x, 2), tf.pow(self.v, 2))    #(None, k)
            ),                                                              
            1, keepdims=True)                                 #(None, 1) 
        #print (pair_interactions.shape, linear_terms.shape)
        output = tf.add(linear_terms, pair_interactions)      #(None, 1) broadcasts to (None, units)
        return output
    def compute_output_shape(self, input_shape):
        return (None,self.units)
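
To check the shapes and the batched matrix form used in `call`, the sketch below mirrors the same computation in NumPy and compares one sample against the explicit pairwise sum. `fm_forward` and the sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def fm_forward(X, w0, W, V):
    """NumPy mirror of the layer's call: X is (batch, n), W is (n, units), V is (n, k)."""
    linear = X @ W + w0                                   # (batch, units)
    pair = 0.5 * np.sum((X @ V) ** 2 - (X ** 2) @ (V ** 2),
                        axis=1, keepdims=True)            # (batch, 1)
    return linear + pair                                  # broadcasts to (batch, units)

rng = np.random.default_rng(0)
batch, n, units, k = 4, 5, 3, 2
X = rng.normal(size=(batch, n))
w0 = rng.normal(size=(units,))
W = rng.normal(size=(n, units))
V = rng.normal(size=(n, k))

out = fm_forward(X, w0, W, V)

# Explicit pairwise sum for sample 0, straight from the formula.
pair0 = sum((V[i] @ V[j]) * X[0, i] * X[0, j]
            for i in range(n) for j in range(i + 1, n))
print(out.shape, np.allclose(out[0], X[0] @ W + w0 + pair0))
```

Note that the single pairwise scalar per sample is added to every one of the `units` linear outputs, which is what the broadcast in the layer does as well.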

In [None]:
input_shape = train_x.shape[1]
learning_rate = 0.001  # passed to the optimizer below

linear_input = tf.keras.layers.Input(shape = (input_shape,), name = "linear")
fm = FM(32,10)(linear_input)
outputs = tf.keras.layers.Dense(1, name = "outputs")(fm)

model = tf.keras.Model(inputs = [linear_input], outputs = [outputs])

optimizer = tf.keras.optimizers.RMSprop(learning_rate = learning_rate)
model.compile(loss='mean_squared_error',
            optimizer=optimizer,
            metrics=['mean_absolute_error', 'mean_squared_error'])

In [None]:
EPOCHS = 50
model.fit(
    train_x, train_y,
    epochs=EPOCHS, 
    validation_data=(test_x, test_y),
    batch_size=256, shuffle=True
)