# Paper Implementation: An Improved Collaborative Filtering Recommendation Algorithm and Recommendation Strategy

This project is based on the paper “An Improved Collaborative Filtering Recommendation Algorithm and Recommendation Strategy” by Xiaofeng Li and Dong Li. All research rights and intellectual property belong to the original authors under the Creative Commons Attribution License. We (Matteo and Julian, students at the University of Bolzano) have chosen this work as the foundation for a full analysis and software implementation of its proposed methods, with the goal of both validating and extending its contributions to community‑aware collaborative filtering.

## 1. Introduction

Li & Li (2019) address three key limitations of traditional collaborative filtering (CF): data sparsity, cold start, and scalability. They do so by integrating overlapping community detection into the CF pipeline. They propose two algorithms, one central‑node‑based and one called k‑faction, that mine user communities from a social network projection of user–item interactions. By restricting neighbor selection to these communities and combining rating‑based similarity with category‑based similarity, they report significant reductions in MAE and RMSE on MovieLens‑100K.

## 2. Implementation Roadmap

Below is our high‑level plan to reproduce and extend Li & Li’s community‑aware CF framework:

1. **Dataset Preparation**
   - Download and preprocess the MovieLens 100K dataset.
   - Build the user–item rating matrix.

2. **Community Detection**
   1. **Central‑Node Algorithm**
      - Compute node degrees; seed each community with the highest‑degree node.
      - Iteratively add neighbors that maximize the local contribution \(q\).
      - Merge any two communities whose overlap \(S \ge 0.7\).
   2. **k‑Faction Algorithm**
      - Use the Bron–Kerbosch algorithm to enumerate maximal cliques and keep those of size ≥ *k*.
      - Merge cliques based on an overlap threshold \(T\) and inter‑community connectivity.
      - Assign remaining nodes to their closest community; refine by maximizing modularity \(Q_c\).

3. **Community‑Based Collaborative Filtering**
   - For each target user, restrict neighbor search to their detected community.
   - Construct a user–category binary matrix (e.g. item genres or tags).
   - Compute hybrid similarity:
     \[
       \text{sim}(u,v) = (1 - \lambda)\,\text{sim}_R(u,v) + \lambda\,\text{sim}_{\text{cate}}(u,v).
     \]
   - Predict ratings by aggregating the top‑*K* most similar users’ ratings.

4. **Evaluation Framework**
   - Perform 5‑fold cross‑validation, and separately vary the training share of the data from 20 % to 80 % (the remainder is the test set).
   - Measure MAE and RMSE for:
     - **CFCD** (Community‑based CF)
     - **CFC** (Cosine CF)
     - **CFP** (Pearson CF)

5. **Parameter Tuning & Experiments**
   - **Experiment 1:** Fix *K* = 30; vary train:test ratio → assess sparsity impact.
   - **Experiment 2:** Fix training ratio at 80 %; vary *K* → find optimal neighbor set size.

6. **Optimizations & Extensions**
   - Scale community detection to large graphs (e.g. using NetworkX/igraph).
   - Incorporate implicit feedback (timestamps, clicks).
   - Prototype a real‑time recommendation pipeline with incremental updates.
   - Explore deep‑learning–based community embeddings as an alternative to classic detection.

7. **Documentation & Reporting**
   - Write clear API docs and usage examples for each module.
   - Produce reproducible scripts and Jupyter notebooks.
   - Summarize results with tables, charts, and a discussion of future work.
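
For step 1 of the roadmap (Dataset Preparation), the following is a minimal sketch of how the user–item rating matrix could be built. It assumes the tab-separated `u.data` layout of MovieLens 100K (user id, item id, rating, timestamp); the function names `load_ratings` and `to_dense` are our own placeholders, and the three sample rows are illustrative only.

```python
import csv
import io
from collections import defaultdict


def load_ratings(lines):
    """Parse rows of the form: user_id <TAB> item_id <TAB> rating <TAB> timestamp."""
    ratings = defaultdict(dict)  # ratings[user][item] = rating
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        user, item, rating = int(row[0]), int(row[1]), float(row[2])
        ratings[user][item] = rating
    return dict(ratings)


def to_dense(ratings):
    """Return (users, items, matrix); matrix[r][c] is 0.0 where no rating exists."""
    users = sorted(ratings)
    items = sorted({item for row in ratings.values() for item in row})
    column = {item: c for c, item in enumerate(items)}
    matrix = [[0.0] * len(items) for _ in users]
    for r, user in enumerate(users):
        for item, value in ratings[user].items():
            matrix[r][column[item]] = value
    return users, items, matrix


# Illustrative rows in the u.data layout (not real experiment data).
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "196\t302\t5\t881250950\n"
)
ratings = load_ratings(sample)
users, items, matrix = to_dense(ratings)
```

For the real dataset, `load_ratings(open("u.data"))` should work the same way. A dense matrix is shown only for clarity; the dictionary form is likely the better choice for a sparse dataset.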
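
For the merge rule in step 2 (Central‑Node Algorithm, overlap \(S \ge 0.7\)), a possible sketch is shown below. The roadmap does not define \(S\), so the code assumes \(S = |A \cap B| / \min(|A|, |B|)\); this definition is our assumption and should be checked against the paper before use. The names `overlap` and `merge_communities` are hypothetical.

```python
def overlap(a, b):
    # ASSUMPTION: overlap S is the shared fraction of the smaller community.
    # The paper's exact definition may differ.
    return len(a & b) / min(len(a), len(b))


def merge_communities(communities, threshold=0.7):
    """Repeatedly merge any two communities whose overlap reaches the threshold."""
    merged = [set(c) for c in communities if c]  # drop empty communities
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap(merged[i], merged[j]) >= threshold:
                    merged[i] |= merged.pop(j)  # absorb community j into i
                    changed = True
                    break
            if changed:
                break  # restart the scan, since indices have shifted
    return merged


result = merge_communities([{1, 2, 3, 4}, {2, 3, 4, 5}, {8, 9}])
```

Here the first two communities share three of four members (overlap 0.75), so they are merged, while `{8, 9}` stays separate. Restarting the scan after every merge is simple but quadratic per pass; that seems acceptable for a first prototype.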
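
For the clique extraction in step 2 (k‑Faction Algorithm), the sketch below implements Bron–Kerbosch with pivoting in plain Python. It yields maximal cliques and then filters by size; the graph representation (`{node: set_of_neighbors}`), the helper name `cliques_at_least`, and the toy graph are our own choices for illustration.

```python
def bron_kerbosch(graph, r=None, p=None, x=None):
    """Yield every maximal clique of an undirected graph {node: set(neighbors)}."""
    if r is None:
        r, p, x = set(), set(graph), set()
    if not p and not x:
        yield frozenset(r)  # r cannot be extended, so it is maximal
        return
    # Pivot on the node with the most neighbors in p to prune branches.
    pivot = max(p | x, key=lambda n: len(graph[n] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}  # v is fully explored
        x = x | {v}  # exclude v from later cliques in this branch


def cliques_at_least(graph, k):
    """Keep only the maximal cliques with at least k members."""
    return [c for c in bron_kerbosch(graph) if len(c) >= k]


# Toy graph: two triangles joined by the edge 3-4.
graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5, 6},
    5: {4, 6},
    6: {4, 5},
}
seeds = cliques_at_least(graph, 3)
```

With `k = 3` the bridge edge `{3, 4}` is discarded and only the two triangles remain as seed cliques. The recursive form is fine for small graphs; for the full MovieLens user graph a library routine such as NetworkX's `find_cliques` may be more practical.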
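
For step 3 (Community‑Based Collaborative Filtering), the hybrid similarity \((1 - \lambda)\,\text{sim}_R + \lambda\,\text{sim}_{\text{cate}}\) and the top‑*K* prediction can be sketched as follows. Several choices here are assumptions on our side rather than the paper's definitions: cosine over co-rated items for \(\text{sim}_R\), Jaccard over category sets for \(\text{sim}_{\text{cate}}\), and a similarity-weighted average for the prediction. All function names and the toy data are hypothetical.

```python
import math


def cosine_sim(ru, rv):
    # ASSUMPTION: sim_R is cosine similarity over co-rated items.
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    norm = math.sqrt(sum(ru[i] ** 2 for i in common)) * math.sqrt(
        sum(rv[i] ** 2 for i in common)
    )
    return dot / norm if norm else 0.0


def jaccard_sim(cu, cv):
    # ASSUMPTION: sim_cate is Jaccard similarity of the users' category sets.
    union = cu | cv
    return len(cu & cv) / len(union) if union else 0.0


def hybrid_sim(u, v, ratings, categories, lam):
    return (1 - lam) * cosine_sim(ratings[u], ratings[v]) + lam * jaccard_sim(
        categories[u], categories[v]
    )


def predict(u, item, ratings, categories, community, lam=0.5, k=30):
    """Similarity-weighted average over the top-k community members who rated item."""
    scored = [
        (hybrid_sim(u, v, ratings, categories, lam), v)
        for v in community
        if v != u and item in ratings[v]
    ]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weight = sum(s for s, _ in top)
    if weight <= 0:
        return None  # no usable neighbors
    return sum(s * ratings[v][item] for s, v in top) / weight


ratings = {
    "u": {"a": 5, "b": 3},
    "v": {"a": 5, "b": 3, "c": 4},
    "w": {"a": 1, "b": 5, "c": 2},
}
categories = {"u": {"action"}, "v": {"action", "drama"}, "w": {"drama"}}
sim_uv = hybrid_sim("u", "v", ratings, categories, lam=0.5)
pred = predict("u", "c", ratings, categories, {"u", "v", "w"}, lam=0.5, k=1)
```

In the toy data, `u` and `v` rate their shared items identically and share one of two categories, so with \(\lambda = 0.5\) their hybrid similarity is about 0.75. With `k=1` only `v` is used as a neighbor, so the predicted rating for item `c` is `v`'s rating of 4.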
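
For step 4 (Evaluation Framework), MAE and RMSE follow their standard definitions, \(\text{MAE} = \frac{1}{n}\sum |p_i - r_i|\) and \(\text{RMSE} = \sqrt{\frac{1}{n}\sum (p_i - r_i)^2}\). A minimal sketch, with made-up numbers purely to exercise the functions:

```python
import math


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Illustrative values only, not experimental results.
example_mae = mae([4, 3, 5], [3, 3, 4])
example_rmse = rmse([4, 3, 5], [3, 3, 4])
```

RMSE penalizes large individual errors more heavily than MAE, which is why reporting both for CFCD, CFC, and CFP gives a fuller picture than either metric alone.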