loopback

Two-tower neural recommender for music, built from scratch in PyTorch.

What this is

An open-source implementation of the canonical two-tower retrieval architecture used in production at YouTube, Spotify, and other recommenders. Trained on the Last.fm 1K users dataset (19M listening events).

Paper-style writeup: docs/PAPER.md — abstract, method, math, results, limitations.

The goal: a reference implementation that's readable, reproducible, and honest about the math — contrastive loss, in-batch negatives, sampled softmax, and FAISS retrieval explained from first principles.

Architecture

User tower:    user_id  ──► Embedding ──► MLP ──► user_vec  (dim=128)
                                                       │
                                          dot product  │
                                                       │
Track tower:   track_id ──► Embedding ──► MLP ──► track_vec (dim=128)
                + artist_id, audio features

Trained with InfoNCE contrastive loss using in-batch negatives. At inference, all track vectors are indexed in FAISS for sub-millisecond top-K retrieval.

Quickstart

uv sync
uv run python -m loopback.data prepare    # download + process Last.fm 1K
uv run python -m loopback.train           # train two-tower
uv run python -m loopback.eval            # recall@10, recall@50

Dataset

Last.fm 1K users — 19M listening events from 1000 users (2005-2009). Schema: (user_id, timestamp, artist_mbid, artist_name, track_mbid, track_name).

Temporal split: oldest 80% train, next 10% val, last 10% test.

After processing: 992 users · 1.5M unique tracks · 174K artists · 19.1M interactions (15.3M train / 1.9M val / 1.9M test).

Results

3 epochs, batch 4096, embed dim 128, in-batch negatives, AdamW lr=1e-3, Apple M-series MPS, ~9 min/epoch.

Metric	Value	Random baseline
Recall@10	0.0708	6.7e-6
Recall@50	0.2172	3.3e-5
Recall@100	0.3140	6.7e-5

Evaluated on 847 test users with seen-track filtering against the full 1.5M-track catalog.

Math: why InfoNCE with in-batch negatives works

Given a batch of B (user, positive-track) pairs, we compute the B × B similarity matrix S where S[i,j] = u_i · t_j. The diagonal is the positive pair, off-diagonal entries are treated as negatives — every other track in the batch is an "implicit" negative for user i.

The loss is symmetric cross-entropy on S (user→track) and S.T (track→user), exactly the CLIP objective. With a learnable temperature, the model learns how peaked the softmax should be. This avoids ever materializing the full 1.5M-track softmax denominator that would make naive maximum-likelihood training infeasible.

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
docs		docs
scripts		scripts
space		space
src/loopback		src/loopback
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

loopback

What this is

Architecture

Quickstart

Dataset

Results

Math: why InfoNCE with in-batch negatives works

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

loopback

What this is

Architecture

Quickstart

Dataset

Results

Math: why InfoNCE with in-batch negatives works

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages