AbdelStark/awesome-jepa

Awesome JEPA

A curated list of papers, models, code, datasets, and learning resources for Joint Embedding Predictive Architectures (JEPA), the self-supervised approach to world models proposed by Yann LeCun.

JEPA learns by predicting representations rather than reconstructing pixels or tokens. This page collects the canonical work from Meta FAIR alongside the wider research that has grown around it. Every link was checked and every attribution verified against primary sources in June 2026.

Contents

What is JEPA?

A Joint Embedding Predictive Architecture predicts the representation of a target signal from the representation of a context signal, entirely in an abstract latent space. Where generative models reconstruct every pixel or token, a JEPA predicts features, so it can discard unpredictable detail and keep the structure that matters for understanding, reasoning, and planning.

A JEPA has three parts: a context encoder, a target encoder, and a predictor that maps context embeddings to predicted target embeddings. Predicting in embedding space admits a trivial solution in which every embedding collapses to the same constant, so JEPAs add a safeguard against it: either an asymmetry between the two branches, such as a stop-gradient target encoder updated as an exponential moving average of the context encoder, or an explicit regularizer, such as a variance and covariance penalty on the embeddings.
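The two anti-collapse options described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not code from any JEPA release: the function names, the flat list-of-floats parameter layout, the momentum value of 0.996, and the `floor` and `eps` defaults are all assumptions chosen for the example. The variance term is written as a hinge on the per-dimension standard deviation, in the style of VICReg-like regularizers, and the covariance term is omitted for brevity.

```python
import math


def ema_update(target_params, context_params, momentum=0.996):
    """Move each target-encoder weight a small step toward the context encoder.

    The target encoder receives no gradients (stop-gradient); it only
    changes through this update. The momentum value is an assumed default.
    """
    return [momentum * t + (1.0 - momentum) * c
            for t, c in zip(target_params, context_params)]


def latent_prediction_loss(predicted, target):
    """Mean squared error between predicted and target embeddings.

    In training, `target` would be treated as a constant, so only the
    context encoder and the predictor would be updated by this loss.
    """
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def variance_penalty(batch, floor=1.0, eps=1e-4):
    """Hinge penalty that grows when an embedding dimension stops varying.

    `batch` is a list of embeddings (each a list of floats). A collapsed
    batch, where every embedding is identical, gets the largest penalty.
    """
    n = len(batch)
    dims = len(batch[0])
    penalty = 0.0
    for d in range(dims):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        penalty += max(0.0, floor - math.sqrt(var + eps))
    return penalty / dims
```

With these definitions, a batch such as `[[1.0, 1.0], [1.0, 1.0]]` is penalized heavily by `variance_penalty`, while a batch whose dimensions spread out beyond the floor incurs no penalty, which is the behavior that discourages the constant solution.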

This design is the centerpiece of LeCun's proposal for autonomous machine intelligence, where an agent learns a predictive world model in representation space and plans by searching for actions that lead to desired predicted states. The family began with images (I-JEPA) and video (V-JEPA, V-JEPA 2) and now reaches audio, point clouds, graphs, time series, and many scientific domains.
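Planning by search in representation space can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the method of any cited paper: `predict_next` is a hypothetical placeholder for a learned latent predictor (here it simply adds the action to the state), the states are two-dimensional tuples, and the search is exhaustive over short action sequences, whereas practical systems would use a learned model and a sampling-based optimizer.

```python
from itertools import product


def predict_next(state, action):
    """Hypothetical stand-in for a learned latent predictor.

    A real world model would be a trained network; here the action is
    simply added to the latent state so the example stays runnable.
    """
    return tuple(s + a for s, a in zip(state, action))


def distance(a, b):
    """Squared Euclidean distance between two latent states."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rollout(start, action_sequence):
    """Apply the predictor step by step and return the final predicted state."""
    state = start
    for action in action_sequence:
        state = predict_next(state, action)
    return state


def plan(start, goal, actions, horizon):
    """Return the action sequence whose predicted end state is closest to the goal."""
    best_cost, best_plan = float("inf"), None
    for candidate in product(actions, repeat=horizon):
        cost = distance(rollout(start, candidate), goal)
        if cost < best_cost:
            best_cost, best_plan = cost, candidate
    return list(best_plan), best_cost
```

Calling `plan((0, 0), (2, 1), [(1, 0), (0, 1), (-1, 0), (0, -1)], horizon=3)` searches all 64 candidate sequences and returns one whose predicted end state matches the goal. The key point is that candidates are scored entirely by predicted latent states, without generating any pixels.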

Foundations

Core Architectures

The canonical JEPA line from Meta FAIR.

Theory, Analysis, and Recipes

Variants by Domain

Audio and Speech

3D and Point Clouds

Graphs and Molecules

Time Series and Tabular Data

Medical Imaging and Biosignals

Earth Observation and Remote Sensing

Language and Recommendation

Generative Modeling

World Models, Robotics, and Planning

Models and Weights

Code and Frameworks

Datasets

  • ImageNet. The image pretraining corpus for I-JEPA. See the ILSVRC paper (Olga Russakovsky et al., 2014).
  • Kinetics (Will Kay et al., 2017). Human action video dataset used to pretrain V-JEPA. Downloader: cvdfoundation/kinetics-dataset.
  • Something-Something v2 (Raghav Goyal et al., 2017). Fine-grained motion video dataset used to evaluate V-JEPA models.
  • EPIC-KITCHENS-100 (Dima Damen et al., 2020). Egocentric video used for action anticipation.
  • DROID. A large in-the-wild robot manipulation dataset used in JEPA world-model planning.

Benchmarks

Physical-reasoning benchmarks released with V-JEPA 2.

  • IntPhys 2 (Florian Bordes et al., 2025). Measures whether a model can tell physically plausible scenes from implausible ones.
  • Minimal Video Pairs (MVPBench) (Benno Krojer et al., 2025). A shortcut-aware video question-answering benchmark for physical understanding.
  • CausalVQA (Aaron Foss et al., 2025). Tests physical cause-and-effect reasoning in video models.

Talks and Lectures

Courses

Articles and Explainers

Contributing

Contributions are welcome. Please open a pull request that follows the existing format: link to the primary source, attribute the first author and year accurately, and write one factual sentence describing the resource. Verify that every link resolves and that arXiv identifiers match the cited title before submitting.

License

CC0

To the extent possible under law, the contributors have waived all copyright and related or neighboring rights to this work.
