Automated aesthetic quality assessment of movie trailers. Aggregates per-frame neural aesthetic scores into a histogram feature vector and trains a linear SVM to distinguish high-rated from low-rated trailers — validated against IMDb audience ratings.
Image aesthetics (is this photo beautiful?) has been studied extensively. This project extends aesthetic analysis to video by treating a trailer as an aggregate of the aesthetics of its perceptually salient key-frames:
```
Movie Trailer
      ↓
Scene Detection → Exemplar Frame Extraction
      ↓
Per-frame Aesthetic Score   [ILGnet / NIMA, trained on AVA2]
      ↓
16-bin Histogram of scores  [distribution over the trailer]
      ↓
Linear SVM → High-rated vs Low-rated trailer
```
The hypothesis: visually compelling trailers show a characteristic distribution of aesthetic scores — more high-quality frames, consistent composition, balanced exposure — that a simple linear classifier can detect.
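The histogram feature itself is straightforward to sketch with NumPy. This is a minimal illustration of the idea, not the package's actual `build_aesthetic_histogram` implementation; the bin count and L1-normalisation follow the description above, everything else is an assumption:

```python
import numpy as np

def aesthetic_histogram(frame_scores, n_bins=16):
    """Bin per-frame aesthetic scores in [0, 1] into a normalised
    histogram, so trailers of different lengths are comparable."""
    scores = np.asarray(frame_scores, dtype=float)
    hist, _ = np.histogram(scores, bins=n_bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)  # L1-normalise; guard against empty input

# A trailer with many high-scoring frames concentrates mass in the top bins
feature = aesthetic_histogram([0.91, 0.88, 0.95, 0.42, 0.77])
print(feature.shape)  # (16,)
```

Because the histogram is normalised, the classifier sees the *distribution* of frame quality rather than the trailer's length or frame count.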
```mermaid
flowchart TB
    subgraph Acquire["Data Acquisition (scripts/)"]
        IMDB["IMDb Top Lists\n(acquire_imdb_list.py)"] --> RATINGS["Movie IDs + Ratings"]
        RATINGS --> DL["Download Trailers\n(acquire_trailers.py)\nYouTube Data API v3 + yt-dlp"]
    end
    subgraph Extract["Feature Extraction (src/)"]
        DL --> SCENES["Scene Detection\nPySceneDetect content threshold=30"]
        SCENES --> FRAMES["Exemplar Frame Extraction\nscene.SceneExtractor"]
        FRAMES --> SCORE["Per-frame Aesthetic Score\nscorer.AestheticScorer ← ILGnet/Caffe"]
    end
    subgraph Classify["Classification (src/)"]
        SCORE --> HIST["16-bin Aesthetic Histogram\nfeatures.build_aesthetic_histogram"]
        RATINGS --> LABEL["Binary Label\nIMDb > 8.2 = high, < 2.8 = low"]
        HIST --> SVM["Linear SVM 6-fold Stratified CV\nclassifier.TrailerClassifier"]
        LABEL --> SVM
        SVM --> ROC["ROC / AUC\nclassifier.CVResults"]
    end
```
```
VideoAesthetics/
├── src/
│   └── video_aesthetics/              # Installable Python package
│       ├── __init__.py                # Public API exports
│       ├── _timeout.py                # Thread-based timeout decorator
│       ├── scorer.py                  # AestheticScorer — lazy-loading Caffe/ILGnet wrapper
│       ├── scene.py                   # SceneExtractor — PySceneDetect + pims frame extraction
│       ├── features.py                # build_aesthetic_histogram, build_dataset, CSV I/O
│       ├── classifier.py              # TrailerClassifier — linear SVM, CV, ROC/AUC
│       └── cli.py                     # video-aesthetics console entry point
├── scripts/
│   ├── acquire_imdb_list.py           # Scrape IMDb Top/Bottom/Genre lists (cinemagoer)
│   └── acquire_trailers.py            # YouTube search + yt-dlp download pipeline
├── tests/
│   ├── test_features.py               # 18 tests: histogram, CSV I/O, dataset construction
│   ├── test_classifier.py             # 11 tests: SVM CV, ROC, plot output
│   └── test_timeout.py                # 6 tests: timelimit decorator
├── util/                              # Legacy Python 2/3 scripts (reference only)
│   ├── _legacy_aesthetic_score.py
│   ├── _legacy_trailer_aesthetic_desc.py
│   ├── _legacy_find_frames.py
│   └── ...
├── aesthetic_data.csv                 # Sample aesthetic histogram features
├── ROC.png                            # ROC curve from SVM evaluation
├── pyproject.toml                     # Build config (hatchling), uv deps, pytest settings
└── requirements.txt                   # Legacy flat requirements (see pyproject.toml)
```
Requires Python 3.10+. Uses uv for dependency management.
```bash
git clone https://github.com/ashish-code/VideoAesthetics.git
cd VideoAesthetics

# Using uv (recommended)
uv sync --extra dev

# Or using pip
pip install -e ".[dev]"
```

For data acquisition (IMDb + YouTube + download):

```bash
uv sync --extra acquisition
```

For scene detection and frame extraction:

```bash
uv sync --extra scene
```

```bash
# Set your YouTube Data API v3 key (never hard-code it)
export YOUTUBE_API_KEY="your-key-here"

# Scrape IMDb Top 250 movie IDs
python scripts/acquire_imdb_list.py --list top250 --output data/top250_ids.txt

# Download trailers (YouTube search → yt-dlp)
python scripts/acquire_trailers.py \
    --imdb-list data/top250_ids.txt \
    --output-dir data/trailer_video \
    --trailer-list data/trailer_ratings.txt
```

```bash
# Set the dataset root
export VIDEO_AESTHETICS_DATA_ROOT=/path/to/data

# Extract exemplar frames via scene detection
video-aesthetics scenes --data-root $VIDEO_AESTHETICS_DATA_ROOT

# Score frames with ILGnet (requires Caffe + model weights)
export ILGNET_ROOT=~/Repos/ILGnet
video-aesthetics score --data-root $VIDEO_AESTHETICS_DATA_ROOT

# Build the aesthetic histogram feature CSV
video-aesthetics features --data-root $VIDEO_AESTHETICS_DATA_ROOT

# Train SVM and plot ROC curve
video-aesthetics classify \
    --data-root $VIDEO_AESTHETICS_DATA_ROOT \
    --roc-output results/ROC.png
```

```python
from pathlib import Path

from video_aesthetics.features import build_dataset, save_feature_csv
from video_aesthetics.classifier import TrailerClassifier

# Load pre-computed features
X, y, ids = build_dataset(
    rating_file=Path("data/trailer_ratings.txt"),
    score_dir=Path("data/scene_aesthetics"),
)

# Cross-validate and plot ROC
clf = TrailerClassifier(n_folds=6)
results = clf.cross_validate(X, y)
print(f"Mean AUC: {results.mean_auc:.3f} ± {results.std_auc:.3f}")
clf.plot_roc(results, output_path=Path("results/ROC.png"))
```

```bash
# Run all 35 tests
pytest

# With coverage
pytest --cov=video_aesthetics --cov-report=term-missing
```

Frame-level aesthetic scores are computed using a pre-trained ILGnet model (Inception modules with connected Local and Global features) trained on the AVA2 dataset, derived from AVA (Murray et al., CVPR 2012). The model is loaded via the Caffe framework and outputs a continuous aesthetic quality score in [0, 1] for each frame.
Note: Caffe has no official Python 3.10+ support. The `AestheticScorer` class uses lazy model loading to avoid import-time errors when Caffe is absent. A drop-in PyTorch NIMA replacement (e.g. `idealo/image-quality-assessment`) can be substituted by subclassing `AestheticScorer`.
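As a sketch of what such a substitution might involve (the class and method names below are illustrative assumptions, not this repository's or NIMA's actual API): a NIMA-style model predicts a distribution over the 1–10 score bins, which can be collapsed to a single mean score and rescaled to the [0, 1] range that the rest of the pipeline expects:

```python
import numpy as np

class NimaStyleScorer:
    """Illustrative stand-in: maps a NIMA-style 10-bin score
    distribution to a single aesthetic score in [0, 1]."""

    def score_from_distribution(self, probs):
        probs = np.asarray(probs, dtype=float)
        probs = probs / probs.sum()                     # ensure a valid distribution
        mean = float(np.dot(np.arange(1, 11), probs))   # expected score on the 1..10 scale
        return (mean - 1.0) / 9.0                       # rescale to [0, 1]

scorer = NimaStyleScorer()
uniform = [0.1] * 10
print(scorer.score_from_distribution(uniform))  # ≈ 0.5 (uniform → mean score 5.5)
```

A real replacement would wrap the network's forward pass in a `score(image_path)` method so it is interchangeable with the Caffe-backed scorer.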
Configure the model via environment variables:

```bash
export ILGNET_ROOT=~/Repos/ILGnet   # directory with deploy.prototxt + .caffemodel
export VIDEO_AESTHETICS_GPU=1       # optional: use GPU (requires GPU-enabled Caffe)
```

| Source | Content | API |
|---|---|---|
| IMDb Top 250 | Movie ratings | cinemagoer (Python IMDb API) |
| YouTube Data API v3 | Trailer video URLs | YOUTUBE_API_KEY env var |
| yt-dlp | Video download | Open-source, no API key required |
Rating thresholds: Trailers from movies rated > 8.2 (high) or < 2.8 (low) are used for binary classification, focusing on the discriminative extremes of the rating distribution.
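The thresholding itself is simple; a minimal sketch (the function name is assumed for illustration, not the repository's API):

```python
def rating_to_label(imdb_rating, high=8.2, low=2.8):
    """Map an IMDb rating to a binary class label, discarding
    the ambiguous middle of the rating distribution."""
    if imdb_rating > high:
        return 1      # high-rated
    if imdb_rating < low:
        return 0      # low-rated
    return None       # excluded from training

print(rating_to_label(8.9))  # 1
print(rating_to_label(2.1))  # 0
print(rating_to_label(6.5))  # None
```

Dropping the middle of the distribution keeps the two classes well separated, at the cost of a smaller training set.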
`AestheticScorer` — lazy-loading wrapper for the ILGnet Caffe model.

- `score(image_path)` → `float` — aesthetic quality in [0, 1].
- `score_trailer(trailer_id, ...)` → `Path` — score all frames, write CSV.
- `process_trailer_list(rating_file, ...)` — batch mode with skip-on-exist.
`SceneExtractor` — PySceneDetect-based scene boundary detection and frame extraction.

- `extract(trailer_id)` — detect scenes, save one PNG per scene (with timeout).
- `process_trailer_list(rating_file)` — batch mode with shuffle + skip.
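The per-scene timeout mentioned above relies on the thread-based decorator in `_timeout.py`; the following is a minimal self-contained sketch of that pattern (the details of the repository's implementation are assumed), running the wrapped call in a worker thread and raising if it does not finish in time:

```python
import functools
from concurrent.futures import ThreadPoolExecutor

def timelimit(seconds):
    """Run the decorated function in a worker thread and raise
    concurrent.futures.TimeoutError if it exceeds the limit.
    Caveat of the thread-based approach: the worker thread cannot
    be killed and may keep running in the background after timeout."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            pool = ThreadPoolExecutor(max_workers=1)
            try:
                return pool.submit(func, *args, **kwargs).result(timeout=seconds)
            finally:
                pool.shutdown(wait=False)  # don't block on a stuck worker
        return wrapper
    return decorator

@timelimit(2.0)
def quick():
    return "done"

print(quick())  # done
```

A thread-based limit is portable (unlike `signal.SIGALRM`, which is Unix-only and main-thread-only), which is presumably why the package takes this approach.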
`build_aesthetic_histogram` — convert per-frame scores to a normalised histogram feature vector.

`build_dataset` — build the feature matrix and binary labels from scored trailers.
`TrailerClassifier` — linear SVM with stratified k-fold cross-validation.

- `cross_validate(X, y)` → `CVResults` — per-fold and aggregate ROC/AUC.
- `evaluate(X, y)` → `float` — accuracy on a held-out test split.
- `plot_roc(results, output_path=None)` — render and optionally save the ROC curve.
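The classification stage can be approximated with scikit-learn primitives. This is a sketch of the approach on synthetic data, not the repository's `TrailerClassifier`: the fake histograms below simply stand in for real 16-bin features, with high-rated trailers skewed toward the upper bins:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def fake_histograms(n, high):
    """Synthetic 16-bin histogram features from 40 beta-distributed
    'frame scores' per trailer (stand-in for real data)."""
    scores = rng.beta(5, 2, size=(n, 40)) if high else rng.beta(2, 5, size=(n, 40))
    return np.stack([np.histogram(s, bins=16, range=(0, 1))[0] / 40 for s in scores])

X = np.vstack([fake_histograms(60, True), fake_histograms(60, False)])
y = np.array([1] * 60 + [0] * 60)

# Stratified 6-fold CV, mirroring the setup described above
aucs = []
for train, test in StratifiedKFold(n_splits=6, shuffle=True, random_state=0).split(X, y):
    clf = LinearSVC(C=1.0).fit(X[train], y[train])
    aucs.append(roc_auc_score(y[test], clf.decision_function(X[test])))

print(f"Mean AUC: {np.mean(aucs):.3f} ± {np.std(aucs):.3f}")
```

Stratification keeps the high/low class ratio constant across folds, which matters when the labelled set of extreme-rated trailers is small.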
- Murray, N., Marchesotti, L., Perronnin, F. (2012). AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR.
- Talebi, H., Milanfar, P. (2018). NIMA: Neural Image Assessment. IEEE TIP.
- Gupta, A. (2018). Video Aesthetics Analysis via Aggregate Frame Aesthetic Descriptors.
MIT — see LICENSE.