Mugdho & Imtiaz (2023) address a core privacy risk in recommendation systems: if someone gains access to a trained model, they may be able to reverse-engineer individual users' rating histories. Their solution is a differentially private variant of matrix factorization, where the rating matrix is approximated as a product of a movie-profile matrix X and a user-profile matrix Theta. At every training iteration, Gaussian noise is injected into the user gradient with standard deviation σ = (τ·C/ε_i)·√(2·ln(1.25/δ)), masking each user's individual contribution. Privacy costs accumulate across iterations and are tracked via Rényi DP composition, giving a closed-form overall ε. The paper studies the resulting privacy-utility tradeoff across three datasets — MovieLens 1M, Netflix Prize, and Anime Recommendations — sweeping per-iteration privacy budget ε_i, latent dimension n, and step size μ.
We reconstructed the full implementation from scratch in PyTorch (for GPU acceleration), since the paper provides only pseudocode. This meant building sparse-to-dense data pipelines, a pause/resume training system, and a separate Netflix script that fits the dataset within 8 GB of VRAM. All 15+ figures from the paper are reproduced and compared side by side below.
Replication of:
"Privacy-Preserving Matrix Factorization for Recommendation Systems using Gaussian Mechanism" Mugdho & Imtiaz (2023) — arXiv:2304.09096
```
.
├── src/
│   ├── recommender.py            # MovieLens + Anime — training, experiments, recommendations
│   └── netflix_experiments.py    # Netflix — separate script (large dataset)
├── data/
│   └── Dataset.txt               # Dataset download instructions / links
├── docs/
│   ├── Paper 23.pdf              # Original paper (Mugdho & Imtiaz, 2023)
│   └── summary.docx              # Project summary document
├── results/                      # All figures (paper screenshots + our reproductions)
├── requirements.txt
├── .gitignore
└── README.md
```
The rating matrix is decomposed into movie profiles X and user profiles Theta:

```
V ≈ X @ Theta.T
```

Gaussian noise is added only to the user gradient each iteration to protect user privacy:

```
σ = (τ · C / ε_i) · √(2 · ln(1.25/δ))
grad_Theta_hat = grad_Theta + η,   η ~ N(0, σ²·I)
```

Overall privacy budget (Theorem 2 / Eq. 9):

```
ε_opt = J·ε_i² / [4·ln(1.25/δ)] + 2·√( J·ε_i²·ln(1/δ_r) / [4·ln(1.25/δ)] )
```
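Both closed-form expressions are easy to evaluate directly. A minimal sketch (our own helper names, not the paper's code), plugged with the default parameters τ = 4, C = 1.0, ε_i = 0.30, δ = 0.01, δ_r = 1e-5 from the tables below:

```python
import math
import torch

def noise_std(tau, C, eps_i, delta):
    """Per-iteration noise scale: sigma = (tau*C/eps_i) * sqrt(2*ln(1.25/delta))."""
    return (tau * C / eps_i) * math.sqrt(2.0 * math.log(1.25 / delta))

def overall_eps(J, eps_i, delta, delta_r):
    """Closed-form overall budget after J iterations (Theorem 2 / Eq. 9)."""
    a = J * eps_i ** 2 / (4.0 * math.log(1.25 / delta))
    return a + 2.0 * math.sqrt(a * math.log(1.0 / delta_r))

# Defaults from the parameter table (MovieLens: tau = 4)
sigma = noise_std(tau=4, C=1.0, eps_i=0.30, delta=0.01)

# One noisy update of the user-profile gradient (toy shapes: users x latent dims)
grad_theta = torch.randn(6040, 20)
noisy_grad = grad_theta + sigma * torch.randn_like(grad_theta)
```

Note how large σ is at ε_i = 0.30 (σ ≈ 41 here): tens of units of noise per gradient entry, which is why convergence slows sharply at small per-iteration budgets.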
| Dataset | Movies | Users | Ratings | Density | Rating Range | τ (range width) |
|---|---|---|---|---|---|---|
| MovieLens 1M | 3,706 | 6,040 | ~1M | 4.47% | 1–5 | 4 |
| Netflix Prize | ~5,466 | ~11,345 | ~5.36M | 8.65% | 1–5 | 4 |
| Anime Reco | 2,772 | 4,623 | ~1.54M | 12.03% | 1–10 | 9 |
| Parameter | Default | Sweep values |
|---|---|---|
| n | 20 | 20, 50, 100, 200, 320, 500 |
| ε_i | 0.30 | 0.15, 0.20, 0.30, 0.50, 0.75, 0.90 |
| μ | 0.0005 | 0.0005, 0.0008, 0.0010, 0.0013 |
| δ | 0.01 | — |
| δ_r | 1e-5 | — |
| λ | 0.01 | — |
| C | 1.0 | — |
```shell
# Install dependencies
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install scipy pandas matplotlib

# Run MovieLens + Anime
python src/recommender.py

# Run Netflix (separate script — large dataset)
python src/netflix_experiments.py
```

```python
from recommender import RecommenderSystem

rec = RecommenderSystem(dataset="movielens", private=True, eps_i=0.30)
rec.load_data()
rec.train()
rec.save()

rec.recommend(user_id=1, top_k=10)       # Top-10 movies for a user
rec.similar_movies(movie_id=1, top_k=5)  # Movies similar to a given movie
```

Left = Paper (Mugdho & Imtiaz, 2023) | Right = Our Reproduction
RMSE decreases over iterations. Lower ε_i = more noise = slower convergence and higher final RMSE.
*(Paper vs. ours — one side-by-side comparison per dataset; images in `results/`)*
As ε_i increases: RMSE falls (better accuracy) while overall ε rises (less private). Classic privacy-utility tradeoff.
*(Paper vs. ours — one side-by-side comparison per dataset; images in `results/`)*
RMSE first improves as n increases (richer model), then degrades as noise grows with matrix size.
*(Paper vs. ours — one side-by-side comparison per dataset; images in `results/`)*
Overall privacy budget grows with n — larger latent dimension needs more iterations, accumulating more privacy cost.
*(Paper vs. ours side-by-side comparison; images in `results/`)*
Combined view: RMSE (solid) and overall ε (dashed) vs n on same plot.
*(Paper vs. ours side-by-side comparison; images in `results/`)*
Computation time scales with n. Private and non-private are nearly identical — noise addition is negligible vs matrix multiplication.
*(Paper vs. ours side-by-side comparison; images in `results/`)*
Fig 1–3 (RMSE vs Iterations): Our reproduction captures the same convergence shape. RMSE values are slightly higher than the paper's, which we attribute to dataset-size differences and to evaluating on a held-out 90/10 test split, which exposes the generalisation gap.
Fig 4–6 (Privacy-Utility Tradeoff): Both paper and ours show the same X-shaped crossing of RMSE and overall ε. The tradeoff is faithfully reproduced across all three datasets.
Fig 7–9 (RMSE vs n): Paper shows RMSE decreasing monotonically (sign of train-set overfitting). Our results show RMSE rising after n=20–50, reflecting proper test-set generalisation behaviour.
Fig 10–12 (Overall ε vs n): Paper shows ε rising with n. Our Netflix result is flat because n was capped at 100 due to 8GB VRAM constraint.
Fig 16–18 (Time vs n): Paper shows linear growth. Our Netflix times are flatter due to chunked implementation.
- GPU-accelerated via PyTorch (NVIDIA RTX 4060 8GB)
- Netflix uses chunked matrix multiplication (100 rows/pass) to stay within 8GB VRAM
- Raw ratings — no mean-centering (matches paper)
- Unit-norm row initialization as per Algorithm 1
- 90/10 train/test split (paper does not specify; standard practice applied)
- Netflix n-sweep limited to n ≤ 100 due to GPU memory constraint
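The chunked multiplication mentioned above can be sketched like this (a simplified CPU toy with illustrative names and shapes — the actual script streams 100 movie rows per pass on the GPU):

```python
import torch

def chunked_matmul(X, Theta, chunk=100):
    """Compute X @ Theta.T one block of rows at a time, so the full dense
    prediction matrix is never materialized in a single large allocation."""
    out = torch.empty(X.shape[0], Theta.shape[0])
    for start in range(0, X.shape[0], chunk):
        out[start:start + chunk] = X[start:start + chunk] @ Theta.T
    return out

# Toy shapes; the real Netflix run is ~5,466 movies x ~11,345 users
X = torch.randn(350, 20)       # movie profiles
Theta = torch.randn(500, 20)   # user profiles
assert torch.allclose(chunked_matmul(X, Theta), X @ Theta.T, atol=1e-4)
```

In the GPU version, only the current chunk's result needs to live on the device at once, which is what keeps the ~5,466 × ~11,345 prediction matrix inside 8 GB of VRAM.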
Riyansh Saxena, BTech (Year 3, Semester 6) — Recommendation Systems Project
Mugdho, S. A., & Imtiaz, H. (2023). Privacy-Preserving Matrix Factorization for Recommendation Systems using Gaussian Mechanism. arXiv:2304.09096.