We will consider the following five simple baseline models. Their scores set the baseline performance that any subsequent machine learning model should beat.

1. **Global Mean Rating**:
This model predicts the same value, the mean of all observed ratings, for every user-item pair. It is the most basic form of recommendation.

2. **User Mean Rating**:
This model computes each user's mean rating and uses it as the prediction for every item that user has not yet interacted with.

3. **Item Mean Rating**:
Conversely to the User Mean Rating model, this model computes each item's mean rating and uses it as the prediction for every user.

4. **User-Item Mean Rating**:
This model takes a more nuanced approach by predicting a rating for a user-item pair as the average of the user's mean rating and the item's mean rating. The formula is:
$$\text{prediction} = \frac{\text{User Mean Rating} + \text{Item Mean Rating}}{2}$$

5. **Weighted Mean Rating**:
This model uses a weighted average of the user mean and item mean ratings. The weight $w$ can be set from domain knowledge; setting $w = 0.5$ recovers the User-Item Mean Rating model. The formula is:
$$\text{prediction} = w \times \text{User Mean Rating} + (1 - w) \times \text{Item Mean Rating}, \quad \text{where } 0 \leq w \leq 1$$
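The five baselines above can be sketched in pandas as follows. This is a minimal sketch, assuming the ratings live in a DataFrame with hypothetical columns `user_id`, `item_id`, and `rating`; the helper names `fit_baselines` and `predict_baselines` are illustrative, and falling back to the global mean for users or items absent from the training data is our own assumption, not something the text specifies.

```python
import pandas as pd


def fit_baselines(train: pd.DataFrame):
    """Return the global mean, per-user means, and per-item means of `rating`."""
    global_mean = train["rating"].mean()
    user_means = train.groupby("user_id")["rating"].mean()
    item_means = train.groupby("item_id")["rating"].mean()
    return global_mean, user_means, item_means


def predict_baselines(pairs: pd.DataFrame, global_mean, user_means, item_means, w=0.5):
    """Predict a rating for each (user_id, item_id) row with all five baselines."""
    if not 0 <= w <= 1:
        raise ValueError("w must satisfy 0 <= w <= 1")
    # Look up each row's user and item mean; unseen users/items fall back
    # to the global mean (an assumption, not part of the original text).
    user_mean = pairs["user_id"].map(user_means).fillna(global_mean)
    item_mean = pairs["item_id"].map(item_means).fillna(global_mean)
    return pd.DataFrame(
        {
            "global_mean": global_mean,                            # model 1
            "user_mean": user_mean,                                # model 2
            "item_mean": item_mean,                                # model 3
            "user_item_mean": (user_mean + item_mean) / 2,         # model 4
            "weighted_mean": w * user_mean + (1 - w) * item_mean,  # model 5
        },
        index=pairs.index,
    )


# Hypothetical toy data for illustration only.
train = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": ["a", "b", "a", "c", "b"],
    "rating": [4, 2, 5, 3, 1],
})
test = pd.DataFrame({"user_id": [1, 2, 4], "item_id": ["c", "b", "a"]})

preds = predict_baselines(test, *fit_baselines(train), w=0.75)
print(preds)
```

With this toy data the global mean is 3.0, so the last row (user 4, never seen in training) uses 3.0 as its user mean. Computing the means once in `fit_baselines` and reusing them keeps the test set out of the statistics, which matters when these baselines are later compared against trained models.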

```python
# Core numerics, data handling, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from math import sqrt  # e.g., to turn MSE into RMSE

# Data splitting and error metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Feature scaling, text vectorization, and pairwise similarity
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```
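The `train_test_split`, `mean_squared_error`, and `sqrt` imports suggest that the baselines are scored by root mean squared error (RMSE) on a held-out split. A minimal sketch of that evaluation for the Global Mean Rating baseline follows; the randomly generated ratings, the column names, and the `rmse` helper are hypothetical placeholders, not the notebook's actual data or code.

```python
from math import sqrt

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of scikit-learn's MSE."""
    return sqrt(mean_squared_error(y_true, y_pred))


# Hypothetical random ratings (1-5) standing in for the real dataset.
rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "user_id": rng.integers(0, 20, size=200),
    "item_id": rng.integers(0, 30, size=200),
    "rating": rng.integers(1, 6, size=200).astype(float),
})

# Hold out 20% of the ratings; compute the baseline from the training part only.
train, test = train_test_split(ratings, test_size=0.2, random_state=42)
global_mean = train["rating"].mean()

# Global Mean Rating baseline: the same prediction for every test pair.
y_pred = np.full(len(test), global_mean)
baseline_rmse = rmse(test["rating"], y_pred)
print(f"Global mean baseline RMSE: {baseline_rmse:.3f}")
```

The same `rmse` call applies to the other four baselines once their predictions are computed for `test`. Taking `sqrt` of the MSE works across scikit-learn versions; recent releases also provide `root_mean_squared_error` directly.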