# Collaborative Filtering Recommender

This notebook trains the `src.recommendation` ALS (alternating least squares) pipeline on a small in-memory sample, then queries the trained model for per-user recommendations.

In [None]:
from pyspark.sql import SparkSession
from src import recommendation

# Local Spark session that uses all available cores.
spark = SparkSession.builder.master("local[*]").appName("movielens-demo").getOrCreate()

In [None]:
# Six toy ratings from three users over three movies.
ratings = spark.createDataFrame([
    ("1", "10", 4.0, 1000),
    ("1", "20", 3.5, 1001),
    ("2", "10", 4.5, 1002),
    ("2", "30", 4.0, 1003),
    ("3", "20", 2.5, 1004),
    ("3", "30", 4.0, 1005),
], ["userId", "movieId", "rating", "timestamp"])

# Metadata for the three movies rated above.
movies = spark.createDataFrame([
    ("10", "The First Movie", "Adventure"),
    ("20", "The Second Movie", "Comedy"),
    ("30", "The Third Movie", "Drama"),
], ["movieId", "title", "genres"])

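# Fit ALS with a small rank and few iterations to keep the demo fast.
training = recommendation.train_model(ratings, rank=4, reg_param=0.05, max_iter=6)
training.metrics  # inspect the metrics reported by the training run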
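
What `training.metrics` contains depends on `src.recommendation`, which this notebook does not show. ALS pipelines are commonly evaluated with root-mean-square error (RMSE) on held-out ratings, so the following is a minimal, Spark-free sketch of that metric, assuming the pipeline reports something similar. The `rmse` helper and the predicted values are hypothetical and exist only for illustration.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one rating")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Three ratings from the toy sample, paired with made-up predictions.
actual = [4.0, 3.5, 4.5]
predicted = [3.5, 3.5, 4.0]  # hypothetical model output, not real results
error = rmse(actual, predicted)
```

Lower values mean the predictions sit closer to the observed ratings. On a six-row sample any such number is only a smoke test, not a measure of model quality.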

In [None]:
# Top 3 recommendations for user "1", shown with the highest score first.
recs = recommendation.recommend_for_user(training, "1", top_n=3, movies=movies)
recs.orderBy("score", ascending=False).toPandas()
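
For intuition about the `score` column: a matrix-factorization model such as ALS learns one latent vector per user and one per item, and the predicted preference is their dot product. The sketch below shows that ranking step in plain Python. The factor values and the `top_n_for_user` helper are invented for illustration and are not the output or API of `src.recommendation`. Whether `recommend_for_user` excludes movies the user has already rated is not stated here, so the sketch leaves that step out.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_n_for_user(user_vec, item_factors, top_n):
    """Score every item against one user and return the top_n (item, score) pairs."""
    scored = [(item_id, dot(user_vec, vec)) for item_id, vec in item_factors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:top_n]

# Hypothetical rank-4 factors (four values per vector, matching rank=4 above).
user_factors = {"1": [1.0, 0.5, 0.0, 2.0]}
item_factors = {
    "10": [1.0, 1.0, 0.0, 1.0],
    "20": [0.5, 0.0, 1.0, 1.0],
    "30": [2.0, 0.0, 0.0, 0.5],
}

ranked = top_n_for_user(user_factors["1"], item_factors, top_n=2)
```

With these made-up factors, movie `"10"` scores 3.5 and movie `"30"` scores 3.0, so they are returned in that order and movie `"20"` (2.5) is cut off by `top_n=2`.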

When running inside the Docker cluster, point `load_ratings` at the HDFS CSV paths, or run the pipeline from the command line:

```bash
python -m src.recommendation --user-id 123 --top-n 10
```