### 🧠 Recommendation System with NMF in Python
You’ve been hired by BookHive, a growing online bookstore that wants to improve how users find books they’ll enjoy. The current system shows trending titles and editor picks, but it doesn’t adapt to individual user preferences.

Your task as a data scientist is to build a personalized book recommendation system using Non-negative Matrix Factorization (NMF). BookHive has shared a dataset of user-book ratings. These ratings are the only source of interaction data available: there’s no browsing history and no purchase logs, just how users rated different books.

Using this data, you'll create a recommendation model that:

- Learns hidden patterns in user preferences
- Predicts how users might rate books they haven’t seen yet
- Suggests books each user is likely to enjoy
---

Before we start, here is a quick look at the algorithm we'll use to solve BookHive's problem.


<img src="./assets/nmf.png">

**Non-negative Matrix Factorization (NMF)** is a dimensionality reduction technique commonly used in recommendation systems. It decomposes a non-negative matrix `A` into two smaller non-negative matrices, `W` and `H`, whose product approximates the original matrix.

Here’s how this relates to recommendation:

<img src="./assets/matrixA.png" height=300 width=300>

Matrix \( A \) is the user-item rating matrix:  
- Rows = users  
- Columns = items  

$$
A_{ij} = \text{rating by user } i \text{ for item } j
$$


In this matrix, most entries are missing (the matrix is sparse), since no user has read every book.


NMF approximates \( A \) as:

$$
A \approx W \cdot H
$$

- \( W \): user-feature matrix  
- \( H \): feature-item matrix  

If you multiply `W` and `H`, the result is a **dense** matrix that predicts ratings for all user-item pairs:

$$
\hat{A} = W \cdot H
$$
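
Entry by entry, each predicted rating is the dot product of one row of `W` and one column of `H`. Writing \( k \) for the number of latent features (a symbol introduced here for illustration):

$$
\hat{A}_{ij} = \sum_{f=1}^{k} W_{if} \, H_{fj}
$$

So the predicted rating of user \( i \) for item \( j \) is high when the user's weights and the item's weights are large on the same features.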

Let's start the project by importing the modules we need.
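
A plausible set of imports for this walkthrough is sketched below. It assumes NumPy, pandas, Matplotlib, and scikit-learn are installed; the exact list used for BookHive's data may differ.

```python
import numpy as np                      # array math (matrix product, RMSE)
import pandas as pd                     # loading and pivoting the ratings table
import matplotlib.pyplot as plt         # plotting error against n_components
from sklearn.decomposition import NMF   # the factorization model
```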

Next, let's load our dataset.
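
As a minimal sketch of the loading step, the snippet below reads a few made-up rows from an in-memory CSV. The rows and the CSV format are assumptions for illustration; only the column names `UserID`, `BookID`, and `Rating` come from the dataset description.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for BookHive's ratings file.
csv_text = """UserID,BookID,Rating
1,0001,8
1,0002,6
2,0001,9
2,0003,7
3,0002,5
3,0004,10
4,0003,6
4,0004,9
"""

# Read BookID as text so ISBN-like IDs keep their leading zeros.
df = pd.read_csv(io.StringIO(csv_text), dtype={"BookID": str})
print(df.head())
```

With a real file, the same call would take the file path in place of `io.StringIO(csv_text)`.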

We can see that the dataset has three columns: `UserID`, `BookID`, and `Rating`.  
Each row records the rating a user gave to a specific book.

Now let's find out how many unique users and unique books are in our dataset.
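
One way to count them is pandas' `nunique()`. The toy DataFrame below is a hypothetical stand-in for the real ratings, used only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical toy ratings (not BookHive's data) with the dataset's three columns.
df = pd.DataFrame({
    "UserID": [1, 1, 2, 2, 3, 3, 4, 4],
    "BookID": ["0001", "0002", "0001", "0003", "0002", "0004", "0003", "0004"],
    "Rating": [8, 6, 9, 7, 5, 10, 6, 9],
})

n_users = df["UserID"].nunique()   # number of distinct users
n_books = df["BookID"].nunique()   # number of distinct books
print(f"Unique users: {n_users}, unique books: {n_books}")
```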

Now let's create our User-Item Matrix by pivoting our dataset.
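
A minimal sketch of the pivot, again on hypothetical toy ratings: users become rows, books become columns, and any user-book pair without a rating shows up as `NaN`.

```python
import pandas as pd

# Hypothetical toy ratings (not BookHive's data).
df = pd.DataFrame({
    "UserID": [1, 1, 2, 2, 3, 3, 4, 4],
    "BookID": ["0001", "0002", "0001", "0003", "0002", "0004", "0003", "0004"],
    "Rating": [8, 6, 9, 7, 5, 10, 6, 9],
})

# Rows = users, columns = books, values = ratings.
user_item = df.pivot(index="UserID", columns="BookID", values="Rating")
print(user_item)
```

`pivot` raises an error if the same user rated the same book more than once; in that case `pivot_table`, which averages duplicates by default, is a possible alternative.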

This is the user-item matrix. As you can see, most of the values are `NaN`, because each user has rated only a small fraction of the books.
Let's fill these `NaN` values with `0`.
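
The fill itself can be sketched with `fillna(0)`. The small matrix here is a hypothetical example, not the real user-item matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 user-item matrix with missing ratings as NaN.
user_item = pd.DataFrame(
    [[8, 6, np.nan, np.nan],
     [9, np.nan, 7, np.nan],
     [np.nan, 5, np.nan, 10],
     [np.nan, np.nan, 6, 9]],
    index=pd.Index([1, 2, 3, 4], name="UserID"),
    columns=pd.Index(["0001", "0002", "0003", "0004"], name="BookID"),
)

# NMF cannot handle NaN, so replace every missing rating with 0.
A = user_item.fillna(0)
print(A)
```

One consequence worth keeping in mind: after this step the model cannot tell "not rated" apart from "rated 0", so it tends to be pulled toward low predictions for unrated books.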

This is our matrix `A`. Let's apply NMF to it.
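
A minimal sketch of the factorization with scikit-learn's `NMF`, run on a hypothetical 4x4 matrix. `n_components=2` and `init='nndsvd'` follow this tutorial; `max_iter` and `random_state` are assumptions added here for convergence and reproducibility.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical user-item matrix; 0 marks a missing rating.
A = np.array([
    [8, 6, 0, 0],
    [9, 0, 7, 0],
    [0, 5, 0, 10],
    [0, 0, 6, 9],
], dtype=float)

model = NMF(n_components=2, init="nndsvd", max_iter=1000, random_state=42)
W = model.fit_transform(A)   # user-feature matrix, shape (n_users, 2)
H = model.components_        # feature-item matrix, shape (2, n_books)
print(W.shape, H.shape)
```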

The `init` parameter in NMF plays an important role in how the algorithm starts the factorization process. In recent scikit-learn versions the default is `init=None`, which selects `nndsvda` when `n_components` is at most `min(n_samples, n_features)` and `random` otherwise.

In this case, we used `init='nndsvd'`, which stands for **Nonnegative Double Singular Value Decomposition (NNDSVD)**. This method provides a smart, data-driven initialization of the matrices `W` and `H`, which brings several benefits:

- **Better convergence**: Good initialization helps the algorithm converge faster and more reliably.
- **Improved results**: It can help avoid poor local minima that sometimes occur with random starting values.
- **More stability**: Especially useful on sparse datasets like user-item rating matrices.

For sparse data such as user-item rating matrices, `nndsvd` is often a better starting point than `random` initialization.

Now let's calculate the product of `W` and `H`.
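
The product can be sketched as `W @ H`, wrapped in a DataFrame so that rows and columns keep their user and book labels. The toy matrix and the `max_iter` and `random_state` settings are assumptions for illustration.

```python
import pandas as pd
from sklearn.decomposition import NMF

# Hypothetical user-item matrix; 0 marks a missing rating.
A = pd.DataFrame(
    [[8, 6, 0, 0],
     [9, 0, 7, 0],
     [0, 5, 0, 10],
     [0, 0, 6, 9]],
    index=pd.Index([1, 2, 3, 4], name="UserID"),
    columns=pd.Index(["0001", "0002", "0003", "0004"], name="BookID"),
    dtype=float,
)

model = NMF(n_components=2, init="nndsvd", max_iter=1000, random_state=42)
W = model.fit_transform(A.to_numpy())
H = model.components_

# Dense matrix of predicted ratings, labelled like the original matrix.
predicted = pd.DataFrame(W @ H, index=A.index, columns=A.columns)
print(predicted.round(2))
```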

As you can see, this matrix is dense: its values are the predicted ratings, showing us how a user would rate a book if they had read it.

Now let's define a method to recommend books to a user

This method should generate top-N book recommendations for a given user.  
So it needs to:
- Identify books the user has already rated
- Use the predicted ratings matrix to find unrated books with the highest predicted scores
- Return the top-N recommended book IDs for that user
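
The three steps above can be sketched as follows. The function name `recommend_books`, its signature, and the toy data are hypothetical; the function returns a pandas Series whose index holds the recommended book IDs and whose values are the predicted ratings.

```python
import pandas as pd
from sklearn.decomposition import NMF

def recommend_books(user_id, ratings, predicted, top_n=5):
    """Return the top-N unrated books for user_id, best prediction first."""
    already_rated = ratings.loc[user_id] > 0              # books the user has rated
    candidates = predicted.loc[user_id][~already_rated]   # keep unrated books only
    return candidates.sort_values(ascending=False).head(top_n)

# Hypothetical user-item matrix; 0 marks a missing rating.
A = pd.DataFrame(
    [[8, 6, 0, 0],
     [9, 0, 7, 0],
     [0, 5, 0, 10],
     [0, 0, 6, 9]],
    index=pd.Index([1, 2, 3, 4], name="UserID"),
    columns=pd.Index(["0001", "0002", "0003", "0004"], name="BookID"),
    dtype=float,
)

model = NMF(n_components=2, init="nndsvd", max_iter=1000, random_state=42)
W = model.fit_transform(A.to_numpy())
predicted = pd.DataFrame(W @ model.components_, index=A.index, columns=A.columns)

# User 1 has rated books 0001 and 0002, so only 0003 and 0004 can be recommended.
print(recommend_books(1, A, predicted, top_n=2))
```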

Now let's try it out with a random user ID.

We can see that if this user reads the book with ID `0312980140`, they would likely rate it above 6.3. This makes it a strong recommendation for them.

Now imagine your boss has asked you:  
**"How would you evaluate the performance of this recommendation system?"** 😒

A simple and effective way is to check how close the model’s predicted ratings are to the actual ratings that users have given — but only for the ratings we already know (ignoring the missing ones).

This is where **Root Mean Squared Error (RMSE)** comes in. It is the square root of the average squared difference between the predicted and actual ratings, so large errors are penalized more heavily than small ones.

A lower RMSE means the predictions are more accurate.

Let’s take a look at the code to calculate it.
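
Here is a minimal sketch of that calculation. The helper name `rmse_on_known` is hypothetical, and it assumes a `0` in the actual matrix always means "not rated". The two small matrices are made up so the result can be checked by hand.

```python
import numpy as np

def rmse_on_known(actual, predicted):
    """RMSE over entries where a real rating exists (actual > 0)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    known = actual > 0   # mask of ratings the users actually gave
    return float(np.sqrt(np.mean((actual[known] - predicted[known]) ** 2)))

# Made-up example: only two known ratings (4 and 2).
actual = np.array([[4.0, 0.0],
                   [0.0, 2.0]])
predicted = np.array([[3.0, 1.0],
                      [1.0, 4.0]])

# Errors on known entries: 4-3 = 1 and 2-4 = -2, so RMSE = sqrt((1 + 4) / 2).
print(rmse_on_known(actual, predicted))
```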

The RMSE is around `4.5`, meaning the predictions are typically off by about 4.5 rating points. 🫢  
This suggests that our recommendation system isn't performing very well at the moment.

Now how can we make it better?

This is where **parameter optimization** comes in.

In real-world scenarios, your first model is rarely the best one. But by tuning the model’s parameters, you can often make significant improvements.

One of the key parameters we can adjust in **NMF** is the number of **latent features**, also known as `n_components`. This determines how many hidden patterns (e.g. user preferences or book characteristics) the model tries to learn.

Let’s see how different values of `n_components` affect the model’s accuracy. Recall that the current model has 2 latent features. Let's use a `for` loop to try different values.
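
The loop might look like the sketch below. It runs on a hypothetical 4x4 matrix, so the candidate values are tiny; on the real dataset the list of values would be much larger. `max_iter` and `random_state` are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical user-item matrix; 0 marks a missing rating.
A = np.array([
    [8, 6, 0, 0],
    [9, 0, 7, 0],
    [0, 5, 0, 10],
    [0, 0, 6, 9],
], dtype=float)
known = A > 0   # evaluate only on ratings that exist

rmse_by_k = {}
for k in [1, 2, 3]:   # toy candidates; nndsvd needs k <= min(A.shape)
    model = NMF(n_components=k, init="nndsvd", max_iter=1000, random_state=42)
    W = model.fit_transform(A)
    H = model.components_
    A_hat = W @ H
    rmse_by_k[k] = float(np.sqrt(np.mean((A[known] - A_hat[known]) ** 2)))

for k, err in rmse_by_k.items():
    print(f"n_components={k}: RMSE={err:.3f}")
```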

Now let's visualize `n_components` vs. error.
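
A minimal plotting sketch with Matplotlib follows. To stay self-contained it recomputes the errors on a hypothetical 4x4 matrix with toy values of `n_components`, so the curve it draws is not the one for BookHive's data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import NMF

# Hypothetical user-item matrix; 0 marks a missing rating.
A = np.array([
    [8, 6, 0, 0],
    [9, 0, 7, 0],
    [0, 5, 0, 10],
    [0, 0, 6, 9],
], dtype=float)
known = A > 0

ks = [1, 2, 3]   # toy candidate values
errors = []
for k in ks:
    model = NMF(n_components=k, init="nndsvd", max_iter=1000, random_state=42)
    A_hat = model.fit_transform(A) @ model.components_
    errors.append(float(np.sqrt(np.mean((A[known] - A_hat[known]) ** 2))))

# Line plot of RMSE against the number of latent features.
fig, ax = plt.subplots()
ax.plot(ks, errors, marker="o")
ax.set_xlabel("n_components")
ax.set_ylabel("RMSE on known ratings")
ax.set_title("n_components vs error")
fig.tight_layout()
```

In a notebook the figure renders inline; in a script, `plt.show()` or `fig.savefig(...)` would be needed to see it.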

We can see that as we increase `n_components`, the error (RMSE) tends to decrease.  
This means the model is capturing more subtle patterns in the data.

But we should be careful — if `n_components` is too high:

- The model may **overfit**: it memorizes the training data instead of learning general patterns.
- **Sparse data** can't support too many latent features — leading to noisy, unstable `W` and `H` matrices.
- **Training time increases** significantly with more components.
- You might see **lower RMSE** on training data, but worse performance on unseen users/items.

So, looking at this chart, a good trade-off between performance and efficiency seems to be around **`n_components = 55`**.  
It gives you a significantly lower RMSE compared to smaller values, without the added complexity of going all the way to 110.

Amazing! 

You now have a working NMF-based recommender system and a basic understanding of model tuning and evaluation.  
In future tutorials, we’ll explore other use cases of NMF — starting with **topic modeling for text data**.

Stay Tuned!