## Factorization Machine

- Factorization Machines (FMs) are a flexible and powerful modeling framework for collaborative filtering recommendation.
- The two broad high-level approaches to recommender systems are Content-Based Filtering (CBF) and Collaborative Filtering (CF). CBF models represent users and items as vectors of attributes or features (e.g. user age, state, income, activity level; item department, category, genre, price). In contrast, CF methods rely only on past user behavior: the model analyzes co-occurrence patterns to determine user and/or item similarities and tries to infer a user’s preferences for unseen items using only that user’s recorded interactions. CF-based approaches have the advantage of being domain-free (i.e. no specific business knowledge or feature engineering is required), and they are generally more accurate and more scalable than CBF models.
- Factorization Machines (FMs) are generic supervised learning models that map arbitrary real-valued features into a low-dimensional latent factor space and can be applied naturally to a wide variety of prediction tasks, including regression, classification, and ranking. FMs can estimate model parameters accurately from very sparse data, and their training time is linear in the number of (non-zero) features, which lets them scale to very large data sets. These characteristics make FMs a natural fit for real-world recommendation problems. Unlike the classic MF model discussed above, which takes a user-item interaction matrix as input, FM models represent user-item interactions as tuples of real-valued feature vectors and numeric target variables. This data format should be familiar to anyone who has trained a standard regression or classification model.
- Typically for collaborative filtering the base features will be binary vectors of user and item indicators, such that each training sample has exactly two non-zero entries corresponding to the given user/item combination. However, these user/item indicators can be augmented with arbitrary auxiliary features, for example, user or item attributes and/or contextual features relevant to the interaction itself (e.g. day-of-week, add-to-cart order, etc.).
![MFtoFM.png](attachment:MFtoFM.png)
- The FM model equation consists of interactions between features up to some order n (n-way interactions). A second-order model (by far the most common) includes a weight for each base feature as well as an interaction term for each pairwise feature combination.
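To make the data format concrete, here is a minimal NumPy sketch of how (user, item) interactions might be encoded as FM feature vectors. The user and item IDs, the `is_weekend` context feature, the column layout, the targets, and the helper name `encode` are all hypothetical illustrations, not part of any particular library.

```python
import numpy as np

# Hypothetical toy catalogue: 3 users, 4 items, plus one contextual feature.
users = ["u0", "u1", "u2"]
items = ["i0", "i1", "i2", "i3"]
context = ["is_weekend"]  # hypothetical auxiliary feature

# Column layout: [user one-hots | item one-hots | context features]
n_features = len(users) + len(items) + len(context)

def encode(user, item, is_weekend=None):
    """Build one FM feature vector x for a (user, item[, context]) interaction."""
    x = np.zeros(n_features)
    x[users.index(user)] = 1.0                  # user indicator
    x[len(users) + items.index(item)] = 1.0     # item indicator
    if is_weekend is not None:
        x[len(users) + len(items)] = float(is_weekend)  # optional context column
    return x

# Pure CF: exactly two non-zero entries per sample.
x_cf = encode("u1", "i2")
# The same interaction augmented with a context feature.
x_ctx = encode("u1", "i2", is_weekend=True)

X = np.stack([x_cf, x_ctx])   # design matrix, one row per interaction
y = np.array([1.0, 0.0])      # hypothetical targets, e.g. click / no click
print(np.count_nonzero(x_cf), np.count_nonzero(x_ctx))  # 2 3
```

Real implementations usually store such vectors in a sparse format (for example SciPy sparse matrices), since almost every entry is zero; dense arrays are used here only for readability.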
----

Formally, let $x \in \mathbb{R}^d$ denote the feature vector of one sample, and let $y$ denote the corresponding label, which can be a real-valued label or a class label such as the binary "click/non-click". The model for a factorization machine of degree two is defined as:

$$ \hat{y}(x) = \mathbf{w}_0 + \sum_{i=1}^d \mathbf{w}_i x_i + \sum_{i=1}^d\sum_{j=i+1}^d \langle\mathbf{v}_i, \mathbf{v}_j\rangle x_i x_j $$

where $\mathbf{w}_0 \in \mathbb{R}$ is the global bias; $\mathbf{w} \in \mathbb{R}^d$ holds the feature weights, with $\mathbf{w}_i$ the weight of the $i^\mathrm{th}$ variable; $\mathbf{V} \in \mathbb{R}^{d\times k}$ represents the feature embeddings; $\mathbf{v}_i$ is the $i^\mathrm{th}$ row of $\mathbf{V}$; $k$ is the dimensionality of the latent factors; and $\langle\cdot, \cdot \rangle$ is the dot product of two vectors. $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ models the interaction between the $i^\mathrm{th}$ and $j^\mathrm{th}$ features. Some feature interactions are easy to understand and can be designed by experts, but most others are hidden in the data and difficult to identify, so modeling feature interactions automatically can greatly reduce feature-engineering effort.

The first two terms correspond to a linear regression model, and the last term is an extension of the matrix factorization model. If feature $i$ represents an item and feature $j$ represents a user, the third term is exactly the dot product between the user and item embeddings. FM can also generalize to higher orders (degree > 2), although numerical stability issues can weaken the generalization of such higher-order models.
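The equation and the claims above can be checked numerically with a small NumPy sketch. `fm_predict_naive` evaluates the degree-2 equation term by term; `fm_predict_fast` uses the standard algebraic rewrite of the pairwise term, which is what makes the linear-time complexity of FMs possible. Both function names, the random parameters, and the chosen user/item column indices are hypothetical.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Degree-2 FM prediction, summing every pair i < j explicitly: O(k d^2)."""
    d = x.shape[0]
    y = w0 + w @ x                      # global bias + linear terms
    for i in range(d):
        for j in range(i + 1, d):
            y += (V[i] @ V[j]) * x[i] * x[j]   # <v_i, v_j> x_i x_j
    return y

def fm_predict_fast(x, w0, w, V):
    """Same prediction via the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i, f] x_i)^2 - sum_i V[i, f]^2 x_i^2 ],
    which costs O(k d)."""
    xv = x @ V                  # shape (k,): sum_i V[i, f] x_i
    x2v2 = (x ** 2) @ (V ** 2)  # shape (k,): sum_i V[i, f]^2 x_i^2
    return w0 + w @ x + 0.5 * np.sum(xv ** 2 - x2v2)

rng = np.random.default_rng(0)
d, k = 8, 4
w0 = 0.1
w = rng.normal(size=d)
V = rng.normal(size=(d, k))
x = rng.normal(size=d)

# Both formulations give the same prediction.
print(np.isclose(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V)))  # True

# One-hot user/item vector: the pairwise term reduces to <v_user, v_item>.
x_ui = np.zeros(d)
x_ui[1] = 1.0   # hypothetical user column
x_ui[5] = 1.0   # hypothetical item column
pairwise = fm_predict_naive(x_ui, 0.0, np.zeros(d), V)
print(np.isclose(pairwise, V[1] @ V[5]))  # True
```

The rewrite works because summing $\langle\mathbf{v}_i, \mathbf{v}_j\rangle x_i x_j$ over all ordered pairs $(i, j)$ equals $\sum_f \left(\sum_i v_{i,f} x_i\right)^2$; subtracting the diagonal ($i = j$) terms and halving leaves exactly the $i < j$ pairs. With sparse inputs, both inner sums only need to visit the non-zero $x_i$.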

----

### Field-aware Factorization Machine
----

### Deep FM

-----
