MichaelFirstAC/MovieCatalog

Movie Recommender and Catalog System

Introduction

This project is a web-based movie recommendation application developed as a final project for the Fundamentals of Data Science course. It addresses the "choice overload" and "cold-start" problems common on streaming platforms by providing content-based suggestions that do not require user history.

The system utilizes a hybrid approach:

  1. Weighted Content-Based Filtering: Uses TF-IDF Vectorization and Cosine Similarity to find semantically similar movies, with higher weights assigned to Directors and Cast to capture "auteur" style and star power.
  2. Bayesian Quality Scoring: Implements the IMDb weighted rating formula, which balances a movie's raw average rating against its vote count so that titles with only a few votes do not outrank well-established ones.
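
As a rough illustration of the second component, the widely published IMDb weighted rating formula can be sketched as follows. The function name `weighted_rating`, the sample numbers, and the values chosen for `m` (minimum votes) and `C` (catalog mean rating) are assumptions for illustration only; the values actually used by prepare_model.py are not stated in this README.

```python
def weighted_rating(R: float, v: int, m: float, C: float) -> float:
    """IMDb-style weighted rating.

    R: the movie's own average rating
    v: the movie's vote count
    m: vote count treated as "enough" (assumed tuning parameter)
    C: mean rating across the whole catalog
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical numbers: catalog mean of 6.0 and a threshold of 1000 votes.
C, m = 6.0, 1000
niche = weighted_rating(R=9.0, v=10, m=m, C=C)            # high rating, few votes
established = weighted_rating(R=8.0, v=10000, m=m, C=C)   # lower rating, many votes
print(round(niche, 2), round(established, 2))
```

With few votes the score is pulled toward the catalog mean `C`, so the 9.0-rated movie with 10 votes ends up below the 8.0-rated movie with 10,000 votes.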

Key Features

  • Content-Based Recommender: Suggests movies based on a "Weighted Soup" of metadata (Director x3, Cast x2, Keywords x1, Genres x1).
  • Browse by Star: Search for movies featuring specific Actors or Directors. Includes a "Cold Start" fix that suggests popular stars if the search is empty.
  • Surprise Me!: A discovery feature that randomly selects a movie from the 500 top-rated films.
  • Smart Catalog: A full, paginated library of over 4,800 movies, filterable by Genre and sorted by Bayesian Quality Score.
  • Interactive Metadata: All Directors, Cast members, and Genres are clickable, allowing seamless navigation to related content.
  • Modern UI: A responsive, dark-themed interface built with Tailwind CSS.
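
The "Weighted Soup" idea from the first feature can be sketched in a few lines: each metadata field is repeated according to its weight before the text is vectorized, so a shared director counts for more than a shared genre. The weights come from the list above; the field names, the `clean` helper, the token-collapsing step, and the sample record are hypothetical and may differ from what prepare_model.py does.

```python
from collections import Counter

# Weights as listed in Key Features; the dictionary keys are assumed field names.
WEIGHTS = {"director": 3, "cast": 2, "keywords": 1, "genres": 1}


def clean(name: str) -> str:
    """Lowercase and drop spaces so a full name becomes a single token (assumed step)."""
    return name.lower().replace(" ", "")


def build_soup(movie: dict) -> str:
    """Repeat each field's tokens by the field's weight and join them into one string."""
    tokens = []
    for field, weight in WEIGHTS.items():
        values = movie.get(field, [])
        tokens.extend([clean(v) for v in values] * weight)
    return " ".join(tokens)


# Made-up record, for illustration only.
movie = {
    "director": ["Ada Example"],
    "cast": ["Sam Sample", "Pat Placeholder"],
    "keywords": ["heist", "dream"],
    "genres": ["Science Fiction"],
}
soup = build_soup(movie)
counts = Counter(soup.split())
print(counts["adaexample"], counts["samsample"], counts["heist"])
```

Repeating tokens raises their term frequency, which is one simple way to make a TF-IDF vectorizer weight some fields more heavily than others.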

Tech Stack

  • Backend: Python, Flask
  • Data Manipulation: Pandas, NumPy
  • Machine Learning: Scikit-learn (TF-IDF, Cosine Similarity)
  • Frontend: HTML5, Tailwind CSS (via CDN)

Running the Application

Prerequisites

Before you begin, ensure you have the following installed on your system:

  • Python 3.x (developed with 3.14; other recent Python 3 releases should also work)
  • The pip package manager

Installation and Setup

Follow these steps to set up and run the application locally.

  1. Clone the Repository
git clone https://github.com/MichaelFirstAC/MovieCatalog.git
cd MovieCatalog
  2. Install Dependencies
Install the required Python libraries using pip:
pip install flask pandas scikit-learn
  3. Prepare the Data and Model (One-Time Setup)

The application requires the raw CSV files (tmdb_5000_movies.csv and tmdb_5000_credits.csv) to be present in the root directory.

  • Note: If these files are zipped (archive.zip), please extract them into the root folder first.
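
If you prefer to extract the archive from Python rather than with a desktop tool, a minimal sketch using the standard `zipfile` module might look like this. The helper name `extract_dataset` is hypothetical, and the sketch assumes archive.zip sits in the repository root and holds the two CSV files at its top level.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: str = "archive.zip", dest: str = ".") -> list[str]:
    """Extract every file in the archive into dest and return the archived names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Only attempt extraction when the archive is actually present.
    if Path("archive.zip").exists():
        print(extract_dataset())
```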

Run the prepare_model.py script. This script will:

  • Clean the CSV data and parse its JSON-encoded columns.
  • Calculate the Bayesian Quality Score for every movie.
  • Build the TF-IDF and Cosine Similarity matrices.
  • Save the processed models (movies.pkl and cosine_sim.pkl).
python prepare_model.py
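
To make the parsing and similarity steps above more concrete, here is a dependency-free sketch of the arithmetic involved. It is a stand-in written for illustration, not the project's code: prepare_model.py uses scikit-learn, while this version reimplements TF-IDF (with a smoothed IDF and L2 normalization) and cosine similarity by hand. The function names and the sample rows are assumptions.

```python
import json
import math
from collections import Counter


def names_from_json(cell: str) -> list[str]:
    """Pull the 'name' values out of a JSON-encoded cell and collapse each to one token."""
    return [item["name"].lower().replace(" ", "") for item in json.loads(cell)]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """L2-normalized TF-IDF vectors with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_matrix(vectors: list[dict[str, float]]) -> list[list[float]]:
    """Pairwise cosine similarity; the vectors are unit length, so a dot product suffices."""
    return [
        [sum(w * b.get(t, 0.0) for t, w in a.items()) for b in vectors]
        for a in vectors
    ]


# Hypothetical rows in the style of a JSON-encoded genres column.
rows = [
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
    '[{"id": 28, "name": "Action"}, {"id": 53, "name": "Thriller"}]',
    '[{"id": 35, "name": "Comedy"}]',
]
docs = [names_from_json(r) for r in rows]
sim = cosine_matrix(tfidf_vectors(docs))
print([round(x, 2) for x in sim[0]])
```

The first two rows share one genre and so get a similarity between 0 and 1, while the third row shares nothing with the first and scores 0.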
  4. Run the Web Application
Once the model files are generated, start the Flask server:
python app.py

You should see output indicating the server is running, typically on http://127.0.0.1:5000/.

  5. Access the Application
Open your web browser and navigate to:

http://127.0.0.1:5000/

Project Structure

  • app.py: The main Flask application containing routing logic and the recommendation engine.
  • prepare_model.py: The data pipeline script for cleaning, feature engineering, and model training.
  • templates/index.html: The unified frontend template handling all views (Home, Catalog, Browse, etc.).
  • static/: Contains CSS assets and team images.
  • movies.pkl & cosine_sim.pkl: Serialized model files generated by the preparation script.
  • Note: Files other than those listed above are documentation only and are not required to run the program.
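
As a hedged sketch of how the serialized files might be consumed at request time, the snippet below round-trips a tiny similarity matrix through `pickle` and ranks the nearest neighbours of one title. The `recommend` function, the dictionary layout, and the sample data are hypothetical; the real movies.pkl and cosine_sim.pkl hold whatever objects prepare_model.py saves, which are not described in this README.

```python
import pickle


def recommend(title, titles, sim, top_n=5):
    """Return up to top_n titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)
    ranked = sorted(
        (i for i in range(len(titles)) if i != idx),
        key=lambda i: sim[idx][i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_n]]


# Stand-in data for illustration; not the contents of the project's pickle files.
titles = ["Movie A", "Movie B", "Movie C"]
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

# Serialize and load again, mimicking a save in one script and a load in another.
blob = pickle.dumps({"titles": titles, "sim": sim})
data = pickle.loads(blob)
print(recommend("Movie C", data["titles"], data["sim"], top_n=2))
```

Precomputing the similarity matrix once and loading it at startup keeps each recommendation request down to a row lookup and a sort.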
