This project is a modular and scalable Movie Recommendation System built using Ploomber for pipeline orchestration and DuckDB for efficient in-process analytics. It demonstrates how modern data tools can be combined to build a fast, lightweight, and reproducible machine learning pipeline for content-based movie recommendations.
- Ploomber – For defining, orchestrating, and running modular data pipelines.
- DuckDB – An in-process SQL OLAP database optimized for analytical workloads.
- Python – Core language for data processing and modeling.
- scikit-learn – For building and training the recommendation model.
- FastAPI – For serving recommendations through a lightweight REST API.
- Docker – For containerized deployment and reproducibility.
- Cleans and transforms raw movie metadata.
- Generates movie embeddings using TF-IDF on genres, overviews, and more.
- Computes cosine similarity for movie recommendations.
- Returns top-N similar movies for a given title.
- Powered by SQL queries on DuckDB for fast, memory-efficient processing.
- Modular pipeline with Ploomber for easy debugging, testing, and extension.
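The recommendation steps above (TF-IDF embeddings, cosine similarity, top-N lookup) can be sketched with scikit-learn as follows. This is a minimal illustration under assumptions, not the project's code: the function name `recommend`, the `stop_words="english"` setting, and passing one pre-combined text string per movie are all hypothetical choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(titles: list[str], texts: list[str], query: str, top_n: int = 5) -> list[str]:
    """Return up to `top_n` titles most similar to `query`.

    Hypothetical helper: `texts[i]` is one pre-combined string
    (e.g. genres + overview) describing `titles[i]`.
    """
    # Build TF-IDF vectors; each row is L2-normalised by default.
    matrix = TfidfVectorizer(stop_words="english").fit_transform(texts)
    idx = titles.index(query)  # raises ValueError for an unknown title
    # Cosine similarity between the query movie and every movie.
    scores = cosine_similarity(matrix[idx], matrix).ravel()
    # Rank by descending similarity and drop the query movie itself.
    ranked = [i for i in scores.argsort()[::-1] if i != idx]
    return [titles[i] for i in ranked[:top_n]]


if __name__ == "__main__":
    titles = ["Heat", "Collateral", "Toy Story", "Finding Nemo"]
    texts = [
        "crime thriller heist detective los angeles",
        "crime thriller hitman taxi los angeles",
        "animation family toys friendship",
        "animation family ocean fish",
    ]
    print(recommend(titles, texts, "Heat", top_n=1))  # ['Collateral']
```

Because TF-IDF rows are L2-normalised by default, cosine similarity here reduces to a dot product. For a larger catalogue you would typically fit the vectoriser once and reuse the matrix across queries instead of refitting on every call.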
This system is ideal for:
- Small to medium-scale movie recommendation tasks
- Educational purposes in data science and ML pipelines
- A starting point for building more advanced recommender systems using collaborative filtering or deep learning
docker build -t movierec:latest -f Dockerfile .
docker run -it -p 8000:8000 movierec:latest
Open a new terminal window, run docker ps, and copy the container ID
docker exec -ti YOURCONTAINERID /bin/bash
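Explore the database file ./movies_data.duckdb, for example with the DuckDB CLI (if it is installed in the container):
duckdb ./movies_data.duckdb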
Navigate to http://localhost:8000 in a browser
Navigate to http://localhost:8000/docs in a browser to use the interactive API documentation
- Create a new environment
conda create --name poetry-env python=3.10
- Activate the environment
conda activate poetry-env
- Install Poetry
pip install poetry
- Install dependencies
poetry lock
poetry install
- Run the project as a Ploomber pipeline
cd mini-projects/
poetry run ploomber build