<div align="center">
  <h2>Basic recommender system</h2>
</div>




We build a basic recommender system using singular value decomposition
(SVD). MovieLens 100K is a public dataset that contains 943 users, 1682
movies, and about 100K ratings given by users to movies on a scale of 1 to 5.
Each rating is associated with a timestamp, as shown in the following
sample. <br>

*Table 1. Example of the MovieLens dataset structure with 2 users and 3 movies.*

|User_id |Movie_id |Rating  |Timestamp |
|--------|---------|--------|----------|
|1       |1        |4       |455664    |
|1       |2        |1       |455555    |
|2       |2        |4       |444555    |
|2       |3        |2       |555554    |


A main challenge in building a reliable recommender system is data sparsity,
that is, the proportion of missing values, which limits the performance of
any statistical technique. The previous table can be turned into a 2D table
($i.e.$, rows represent users and columns represent movies) to show the
sparsity of the data, where $X$ marks a missing value, which is also a target
value that we would like to predict.

*Table 2. Structure of the dense table with missing values.*

|User / Movie |1  |2  |3  |
|-------------|---|---|---|
|1            |4  |1  |X  |
|2            |X  |4  |2  |

Applying SVD to MovieLens 100K generates three matrices $U$, $S$, and $V$.
$U$ is a matrix of $|U|$ vectors, each of which represents the hidden
preferences of a given user $u$ from $U$ toward the set of movies. Similarly,
$V$ is a matrix of $|V|$ vectors, each of which explains how a given movie $v$
from $V$ is liked by users. $S$ is a diagonal matrix that holds the singular
values.
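
To make the shapes of the three factors concrete, here is a small sketch that assumes NumPy's `numpy.linalg.svd`. It decomposes the toy 2 x 3 table with its missing cells set to 0, purely for illustration. Note that NumPy returns the singular values as a 1-D vector and returns $V$ already transposed, so the variable is named `Vt` here.

```python
import numpy as np

# Toy 2-users x 3-movies table (missing cells set to 0 for illustration only).
table = np.array([[4.0, 1.0, 0.0],
                  [0.0, 4.0, 2.0]])

# full_matrices=False gives the compact factors.
U, s, Vt = np.linalg.svd(table, full_matrices=False)
S = np.diag(s)  # turn the vector of singular values into a diagonal matrix

# One row of U per user, one column of Vt per movie.
print(U.shape, S.shape, Vt.shape)

# Multiplying the three factors back together recovers the table.
reconstructed = U @ S @ Vt
print(np.allclose(reconstructed, table))
```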

To apply and test SVD:
1. Load the MovieLens 100K <b>data</b> and omit the timestamp column. The data
is stored as lines of fields separated by a delimiter ($i.e.$, a comma or a
tab), as in a CSV file.
2. Split the <b>data</b> into <b>train</b> and <b>test</b> parts, using $80\%$ and $20\%$
of the ratings respectively.
3. Convert <b>train</b> and <b>test</b> into dense tables named <b>training_data</b>
and <b>testing_data</b> ($i.e.$, as in the previous example, we pass from Table
1 to Table 2).
4. Calculate the <b>global_mean</b> of all ratings in the training data.
5. Fix the following control parameter: <br>
-  <b>Lamb:</b> a normalization parameter that we set to $0.99$.
6. Calculate the bias $b_u$ of each user $u$ as:
$$
 b_u = \frac{\sum_{i \in I_u} (\text{rating}(u, i) - \text{global\_mean})}{\text{lamb} + |I_u|}
$$
where $I_u$ is the set of items rated by $u$.



|User |Ratings (movies 1, 2, 3) |User’s bias |
|-----|-------------------------|------------|
|1    |4 1 X                    |?           |
|2    |X 4 2                    |?           |
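
The user bias formula can be checked on the toy table above with a short sketch. It assumes NumPy and the 0-means-missing convention. The helper name `user_bias` is illustrative.

```python
import numpy as np

LAMB = 0.99  # parameter fixed in step 5

# Toy training table; 0 marks a missing rating (assumption).
training_data = np.array([[4.0, 1.0, 0.0],
                          [0.0, 4.0, 2.0]])
rated = training_data != 0                   # True where a rating exists
global_mean = training_data[rated].mean()    # mean over rated cells only

def user_bias(data, rated, global_mean, lamb):
    """b_u = sum over rated items of (rating - global_mean) / (lamb + |I_u|)."""
    deviations = np.where(rated, data - global_mean, 0.0)
    return deviations.sum(axis=1) / (lamb + rated.sum(axis=1))

b_u = user_bias(training_data, rated, global_mean, LAMB)
print(global_mean, b_u)
```

Here the global mean is 2.75. User 1 rated below it on balance and gets a negative bias, while user 2 gets a positive bias of the same magnitude.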


7. Calculate the bias $b_i$ of each item $i$ as:
$$
 b_i = \frac{\sum_{u \in U_i} (\text{rating}(u, i) - \text{global\_mean})}{\text{lamb} + |U_i|}
$$
where $U_i$ is the set of users who rated $i$.

8. Fill the missing values of the <b>training_data</b> using the formula:
$$
\text{missing}(u,i) = b_u + b_i + \text{global\_mean}
$$

|User / Movie |1  |2  |3  |
|-------------|---|---|---|
|1            |4  |1  |?  |
|2            |?  |4  |2  |
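
The item bias and the filling step can be sketched together on the same toy table. This again assumes NumPy and that 0 marks a missing rating. The function name `fill_missing` is illustrative.

```python
import numpy as np

LAMB = 0.99  # parameter fixed in step 5

def fill_missing(data, lamb):
    """Replace 0 entries with b_u + b_i + global_mean (0 = missing, assumption)."""
    rated = data != 0
    global_mean = data[rated].mean()
    # Deviation from the global mean, counted only where a rating exists.
    deviations = np.where(rated, data - global_mean, 0.0)
    b_u = deviations.sum(axis=1) / (lamb + rated.sum(axis=1))  # one value per user
    b_i = deviations.sum(axis=0) / (lamb + rated.sum(axis=0))  # one value per item
    # Broadcast the two bias vectors into a full users x items baseline.
    baseline = global_mean + b_u[:, None] + b_i[None, :]
    # Keep known ratings, use the baseline only for missing cells.
    return np.where(rated, data, baseline)

training_data = np.array([[4.0, 1.0, 0.0],
                          [0.0, 4.0, 2.0]])
filled = fill_missing(training_data, LAMB)
print(np.round(filled, 3))
```

The known ratings are left untouched. Only the two cells marked `?` receive a baseline estimate.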

9. Apply SVD to the <b>training_data</b> to get $U$, $S$, and $V$: <br>
$(U,S,V) = SVD(\text{training\_data})$

10. Reduce $U$, $S$, and $V$ to keep only the first <b>Approx</b> columns of $U$,
the first <b>Approx</b> rows of $V$, and the first <b>Approx</b> rows and columns
of $S$:

-  $U$ = $U$[:, :<b>Approx</b>]
-  $V$ = $V$[:<b>Approx</b>, :]
-  $S$ = $S$[:<b>Approx</b>, :<b>Approx</b>]

11. Calculate <b>$Z$</b>, the matrix of predicted ratings, as:
$$Z = U \cdot S \cdot V$$

12. Calculate the MAE of the model using the following formula:
$$
MAE = \frac{1}{|T|} \sum_{(u, i) \in T} |\text{testing\_data}(u,i) - Z(u,i)|
$$
where $T$ is the set of pairs $(u, i)$ such that $\text{testing\_data}(u,i) \neq 0$.

13. Repeat steps 9 to 12, setting <b>Approx</b> to each value in
$[5,10,15,20,25,30,35,40,45,50]$.
14. Plot a bar graph of the <b>MAE</b> for each configuration.
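
Steps 8 to 13 can be sketched end to end as follows. This is a hedged illustration, not a reference solution. It assumes NumPy, treats 0 as a missing rating, and averages the absolute errors over the rated test cells (the "mean" in MAE). Because the MovieLens file is not loaded here, a small synthetic rating matrix stands in for it, so the printed errors say nothing about the real dataset. The names `fill_missing`, `mae`, and `results` are illustrative.

```python
import numpy as np

LAMB = 0.99
APPROX_VALUES = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

def fill_missing(data, lamb):
    """Replace 0 entries with b_u + b_i + global_mean (0 = missing, assumption)."""
    rated = data != 0
    global_mean = data[rated].mean()
    deviations = np.where(rated, data - global_mean, 0.0)
    b_u = deviations.sum(axis=1) / (lamb + rated.sum(axis=1))
    b_i = deviations.sum(axis=0) / (lamb + rated.sum(axis=0))
    return np.where(rated, data, global_mean + b_u[:, None] + b_i[None, :])

def mae(testing_data, predictions):
    """Mean absolute error over the rated (non-zero) test cells only."""
    mask = testing_data != 0
    return np.abs(testing_data[mask] - predictions[mask]).mean()

# Synthetic stand-in for MovieLens (assumption): 60 users, 80 movies, ~30% rated.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(60, 80)).astype(float)
observed = rng.random(ratings.shape) < 0.3
in_train = rng.random(ratings.shape) < 0.8  # roughly 80/20 split of observed cells
training_data = np.where(observed & in_train, ratings, 0.0)
testing_data = np.where(observed & ~in_train, ratings, 0.0)

# Step 8, then step 9: fill the gaps and decompose once.
filled = fill_missing(training_data, LAMB)
U, s, Vt = np.linalg.svd(filled, full_matrices=False)

results = {}
for approx in APPROX_VALUES:
    # Steps 10 and 11: keep the `approx` largest singular values, then rebuild Z.
    Z = U[:, :approx] @ np.diag(s[:approx]) @ Vt[:approx, :]
    # Step 12: score the predictions on the held-out ratings.
    results[approx] = mae(testing_data, Z)

for approx, error in results.items():
    print(f"Approx={approx:2d}  MAE={error:.4f}")
```

The decomposition is computed once and only the truncation changes inside the loop, since the factors do not depend on <b>Approx</b>. For step 14, the keys and values of `results` could be passed to a plotting library, for example `matplotlib.pyplot.bar`.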