## Deep Learning Project

**1) Describe (i) how you split the dataset into training/validation/test sets for the experiments and
(ii) the NeuMF parameter setting you used
(0.5 points)**

**(i) Dataset Splitting Methodology**

The experiment was conducted using the **MovieLens 100K (ml-100k)** dataset. The raw data, contained in the `u.data` file, consists of 100,000 ratings from 943 users for 1682 movies. To prepare this data for the NeuMF model, a custom Python script (`prepare_ml100k.py`) was implemented to automate the data splitting process, strictly adhering to the methodology described in the NCF paper.

The splitting process followed the **leave-one-out evaluation** protocol, which was implemented in these specific steps:

1.  **Data Loading and ID Conversion**: The `u.data` file, which is tab-separated and has no header, was loaded into a pandas DataFrame. A critical preprocessing step was performed: since the original user and item IDs in the file are 1-based (starting from 1), they were converted to be **0-based** (starting from 0) by subtracting 1 from each ID. This is necessary for the model's embedding layers, which expect zero-indexed inputs.

2.  **Training/Test Split**: For each user, all of their interactions were sorted chronologically by the provided timestamp. The single most recent interaction of every user was designated as that user's **test sample**, and all of the user's remaining (earlier) interactions were placed in the **training set**.

3.  **Evaluation Set Generation**: The model's performance is evaluated by its ability to rank a list of items. To create a standardized test for this, the test set was augmented. For each user's single positive test item, **99 other items** that the user had *never* rated were randomly sampled from the entire item pool. These serve as the negative samples for evaluation.
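The three steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the contents of `prepare_ml100k.py`: the function names `load_ml100k`, `leave_one_out_split` and `write_ncf_files` are hypothetical, ties between identical timestamps are broken by file order, and the output layout (tab-separated `user item rating timestamp` rows, and a `(user,item)` field followed by the negative item IDs) is assumed to match the data loader of the original NCF code.

```python
import numpy as np
import pandas as pd

def load_ml100k(path="ml-100k/u.data"):
    """Load the tab-separated, header-less u.data file and shift IDs to 0-based."""
    df = pd.read_csv(path, sep="\t", header=None,
                     names=["user", "item", "rating", "timestamp"])
    df["user"] -= 1  # 1-based -> 0-based, as expected by embedding lookups
    df["item"] -= 1
    return df

def leave_one_out_split(df, num_items, num_neg=99, seed=0):
    """Hold out each user's most recent interaction and sample num_neg unrated items."""
    rng = np.random.default_rng(seed)
    # Sort by user, then time: the last row of each user is the most recent interaction.
    df = df.sort_values(["user", "timestamp"], kind="mergesort")
    is_last = ~df.duplicated("user", keep="last")
    test = df[is_last].reset_index(drop=True)
    train = df[~is_last].reset_index(drop=True)

    # Negatives are drawn only from items the user never rated (train or test).
    rated = df.groupby("user")["item"].apply(set)
    negatives = {}
    for u in test["user"]:
        candidates = np.setdiff1d(np.arange(num_items), list(rated[u]))
        negatives[int(u)] = rng.choice(candidates, size=num_neg, replace=False).tolist()
    return train, test, negatives

def write_ncf_files(train, test, negatives, prefix="ml-100k"):
    """Write the three files in the (assumed) layout of the original NCF loader."""
    cols = ["user", "item", "rating", "timestamp"]
    train[cols].to_csv(f"{prefix}.train.rating", sep="\t", header=False, index=False)
    test[cols].to_csv(f"{prefix}.test.rating", sep="\t", header=False, index=False)
    with open(f"{prefix}.test.negative", "w") as f:
        for u, i in zip(test["user"], test["item"]):
            f.write(f"({u},{i})\t" + "\t".join(map(str, negatives[int(u)])) + "\n")
```

With the real data this would be called as `df = load_ml100k()`, then `leave_one_out_split(df, num_items=1682)`, then `write_ncf_files(...)` on the result.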

This procedure yields a test environment in which, for each of the 943 users, the model must rank a list of 100 items: the 1 held-out item the user actually interacted with and 99 items the user never interacted with. The process outputs three files used by the program: `ml-100k.train.rating`, `ml-100k.test.rating`, and `ml-100k.test.negative`. No separate validation set was created; instead, performance was evaluated on the test set after each training epoch.
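On such a 100-item candidate list, HR@10 checks whether the held-out item appears among the 10 highest-scored items, and NDCG@10 additionally rewards a higher position with `log(2) / log(rank + 2)` for a 0-based rank. The sketch below assumes the model's predictions for one user are already available as a dictionary mapping each candidate item to its score; `hit_ratio_and_ndcg` and `evaluate` are hypothetical helper names.

```python
import math

def hit_ratio_and_ndcg(scores, gt_item, k=10):
    """HR@k and NDCG@k for one user.

    scores: dict item_id -> predicted score for the 100 candidates
            (1 held-out positive + 99 sampled negatives).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if gt_item not in ranked:
        return 0.0, 0.0
    pos = ranked.index(gt_item)  # 0-based position inside the top-k list
    return 1.0, math.log(2) / math.log(pos + 2)

def evaluate(all_scores, test_pairs, k=10):
    """Average HR@k and NDCG@k over (user, held-out item) pairs."""
    hrs, ndcgs = [], []
    for user, gt_item in test_pairs:
        hr, ndcg = hit_ratio_and_ndcg(all_scores[user], gt_item, k)
        hrs.append(hr)
        ndcgs.append(ndcg)
    return sum(hrs) / len(hrs), sum(ndcgs) / len(ndcgs)
```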

**(ii) NeuMF Parameter Settings**

The NeuMF model was trained with the following hyperparameters, set through a combination of command-line arguments and the script's default values:

*   **Optimizer**: Adam
*   **Learning Rate**: 0.001
*   **Batch Size**: 256
*   **Number of Epochs**: 20
*   **Negative Samples (for training)**: 4 (For each positive interaction in a training batch, 4 negative items were dynamically sampled).
*   **Predictive Factors (GMF Embedding Size)**: 8
*   **MLP Layers**: `[64, 32, 16, 8]`. This defines a "tower" architecture for the MLP part: the first entry (64) is the size of the concatenated user and item embeddings (32 dimensions each), which is then processed by fully connected layers of 32, 16, and finally 8 neurons.
*   **Regularization**: No regularization was applied (`reg_mf=0`, `reg_layers=[0,0,0,0]`).
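The dynamic negative sampling listed above (4 negatives per positive) can be sketched as below. It assumes negatives are drawn uniformly from items the user has not interacted with in the training set, and that the function is called once per epoch so that each epoch sees a fresh set of negatives; `get_train_instances` is used here as a hypothetical name.

```python
import random

def get_train_instances(train_pairs, num_items, num_neg=4, seed=None):
    """Build (user, item, label) lists: each positive plus num_neg sampled negatives.

    train_pairs: list of (user, item) positive interactions.
    """
    rng = random.Random(seed)
    positives = {}
    for u, i in train_pairs:
        positives.setdefault(u, set()).add(i)
    users, items, labels = [], [], []
    for u, i in train_pairs:
        users.append(u); items.append(i); labels.append(1)
        for _ in range(num_neg):
            j = rng.randrange(num_items)
            while j in positives[u]:  # resample until the item is unobserved
                j = rng.randrange(num_items)
            users.append(u); items.append(j); labels.append(0)
    return users, items, labels
```

Resampling inside the training loop rather than fixing the negatives once keeps the model from memorizing a single static set of negative items.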


**2) Show how HR@10 is affected when varying the MLP layers from 1 to 3 in steps of 1 for
NeuMF (i) with pretraining and (ii) without pretraining
(1 point)**

Summary Results (Mean ± Std over 10 runs):

|Configuration            |  HR@10            | NDCG@10         |
|-------------------------|-------------------|-----------------|
|  1 Layer  No Pretrain   | 0.6735 ± 0.0080   | 0.3925 ± 0.0079 |
|  1 Layer  With Pretrain | 0.6720 ± 0.0104   | 0.3923 ± 0.0048 |
|  2 Layers No Pretrain   | 0.6743 ± 0.0084   | 0.3857 ± 0.0052 |
|  2 Layers With Pretrain | 0.6717 ± 0.0085   | 0.3898 ± 0.0034 |
|  3 Layers No Pretrain   | 0.6657 ± 0.0076   | 0.3859 ± 0.0051 |
|  3 Layers With Pretrain | 0.6665 ± 0.0022   | 0.3878 ± 0.0008 |
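Each cell in the table is a mean ± standard deviation over 10 independent runs. A sketch of that aggregation, assuming each run reports a single `(HR@10, NDCG@10)` pair (for example from its best or last epoch) and using NumPy's default population standard deviation (`ddof=0`); the script that produced the table may use a different convention. `summarize` is a hypothetical helper.

```python
import numpy as np

def summarize(runs):
    """runs: dict configuration label -> list of (hr, ndcg) tuples, one per run."""
    rows = {}
    for config, results in runs.items():
        arr = np.asarray(results, dtype=float)          # shape (n_runs, 2)
        mean, std = arr.mean(axis=0), arr.std(axis=0)   # population std (ddof=0)
        rows[config] = (f"{mean[0]:.4f} ± {std[0]:.4f}",
                        f"{mean[1]:.4f} ± {std[1]:.4f}")
    return rows
```

The returned strings can be written straight into Markdown rows, e.g. `f"| {config} | {hr} | {ndcg} |"`.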


**3) Show how the number of parameters (weight parameters) is affected when varying the MLP
layers from 1 to 3 in steps of 1 for NeuMF without pretraining
(1 point)**


### Summary Table


| MLP Layers | `layers`       | Total Parameters |
| ---------- | -------------- | ---------------- |
| 1          | `[16]`         | 63,033           |
| 2          | `[32, 16]`     | 84,561           |
| 3          | `[64, 32, 16]` | 128,641          |

---


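Increasing the number of MLP layers in NeuMF roughly doubles the total number of trainable parameters, from about 63K with one layer to about 129K with three layers. In these configurations most of the growth comes from the MLP embedding tables rather than from the fully connected layers themselves: the MLP embedding size is `layers[0] // 2`, so each added layer doubles the width of the first layer and therefore of the user and item embeddings (8 → 16 → 32 dimensions, i.e. 21,000 → 42,000 → 84,000 parameters), while the dense layers contribute only 528 and 2,608 parameters with two and three layers. The larger model increases memory usage and may influence overfitting behavior depending on the dataset size.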

---

### Parameter Breakdown

The total can be broken down into the following components:

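* **Embedding Parameters**:

  * MF: `943 × 16 + 1682 × 16` = 15,088 + 26,912 = 42,000 (constant across runs)
  * MLP: `(943 + 1682) × (layers[0] // 2)` = 21,000 / 42,000 / 84,000 for 1 / 2 / 3 layers
* **MLP Dense Layers** (weights and biases between consecutive entries of `layers`): 0 / 528 / 2,608 for 1 / 2 / 3 layers
* **Final prediction layer**: (16 + `layers[-1]`) weights + 1 bias = 33 in all three cases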
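The breakdown above can be checked with a short function. It assumes the NCF-style convention that the breakdown implies: an MF (GMF) embedding size of 16, MLP embeddings of size `layers[0] // 2`, dense layers between consecutive entries of `layers`, and a single-output prediction layer with a bias on the concatenated GMF and MLP vectors. `neumf_param_count` is a hypothetical helper, not part of the project code.

```python
def neumf_param_count(num_users, num_items, mf_dim, layers):
    """Count trainable NeuMF parameters under the assumed NCF-style layout."""
    # GMF branch: one user and one item embedding table of width mf_dim.
    gmf_emb = (num_users + num_items) * mf_dim
    # MLP branch: user/item embeddings of width layers[0] // 2, concatenated to layers[0].
    mlp_emb = (num_users + num_items) * (layers[0] // 2)
    # Fully connected layers between consecutive sizes (weights + biases).
    dense = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
    # Prediction layer on [GMF vector, last MLP layer] -> 1 output, plus a bias.
    predict = (mf_dim + layers[-1]) + 1
    return gmf_emb + mlp_emb + dense + predict
```

With 943 users and 1,682 items, `neumf_param_count(943, 1682, 16, [16])`, `[32, 16]` and `[64, 32, 16]` give 63,033, 84,561 and 128,641, matching the summary table.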



**4) Show in the corresponding 3 figures (see Figure 6 of the paper) how (i) the training loss, (ii)
HR@10, and (iii) NDCG@10 change at each iteration/epoch during model training.
(1.5 points)**
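One way to produce the three figures is to record the training loss, HR@10 and NDCG@10 at every epoch and plot each metric in its own figure, mirroring the layout of Figure 6. The sketch below assumes these values were collected into a `history` dictionary (a hypothetical structure, e.g. one entry per model such as GMF, MLP and NeuMF) and uses matplotlib's non-interactive `Agg` backend to save PNG files; `plot_training_curves` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

def plot_training_curves(history, out_prefix="neumf"):
    """history: dict label -> {'loss': [...], 'hr': [...], 'ndcg': [...]} (per epoch).

    Saves one figure per metric and returns the file paths.
    """
    titles = {"loss": "Training Loss", "hr": "HR@10", "ndcg": "NDCG@10"}
    paths = []
    for key, title in titles.items():
        fig, ax = plt.subplots(figsize=(4, 3))
        for label, curves in history.items():
            ax.plot(range(len(curves[key])), curves[key], marker="o", label=label)
        ax.set_xlabel("Iteration (epoch)")
        ax.set_ylabel(title)
        ax.legend()
        fig.tight_layout()
        path = f"{out_prefix}_{key}.png"
        fig.savefig(path)
        plt.close(fig)  # free memory when plotting many runs
        paths.append(path)
    return paths
```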

## Done up to 8

------
9) Implement Non-negative Matrix Factorization (NMF)
(https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html), and find the
best parameter setting by varying the number of latent factors from 1 to 30 in steps of 5,
showing the effect on NDCG@10. Note that you should adapt the NMF implementation to the
top-k recommendation problem, as is done in the NeuMF implementation you are
using.
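A minimal sketch of one possible adaptation, assuming the same leave-one-out protocol as NeuMF: `sklearn.decomposition.NMF` is fit on the binary user-item matrix built from the training interactions, the reconstructed score `W[u] @ H[:, i]` ranks each user's candidate list (held-out item plus sampled negatives), and NDCG@10 is computed as in the NeuMF evaluation. Ties are counted against the held-out item, a pessimistic choice. The function name `nmf_ndcg_at_k` and the `test_negatives` structure are hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_ndcg_at_k(train_pairs, test_negatives, num_users, num_items,
                  n_components, k=10, seed=0):
    """train_pairs: iterable of (user, item) training interactions.
    test_negatives: dict user -> (held_out_item, list_of_negative_items).
    """
    R = np.zeros((num_users, num_items))
    for u, i in train_pairs:
        R[u, i] = 1.0  # implicit feedback: any interaction counts as positive
    model = NMF(n_components=n_components, init="nndsvda",
                max_iter=500, random_state=seed)
    W = model.fit_transform(R)  # user factors, shape (num_users, n_components)
    H = model.components_       # item factors, shape (n_components, num_items)
    ndcgs = []
    for u, (gt, negs) in test_negatives.items():
        candidates = [gt] + list(negs)
        scores = W[u] @ H[:, candidates]
        # 0-based rank of the held-out item; ties count against it (pessimistic).
        rank = int(np.sum(scores[1:] >= scores[0]))
        ndcgs.append(np.log(2) / np.log(rank + 2) if rank < k else 0.0)
    return float(np.mean(ndcgs))
```

The requested sweep corresponds to `range(1, 31, 5)`, i.e. 1, 6, 11, 16, 21 and 26 latent factors, calling the function once per value and keeping the setting with the highest NDCG@10.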