`Awesome object-centric learning (OCL)`

The list will be continually updated. Stay tuned!

OCL models decompose and reconstruct synthetic or real-world scenes by learning multiple disentangled abstract representations, which capture object-centric concepts at multiple levels, in a fully unsupervised manner.

< Last updated: Aug/07/2023 >

Table of Contents

- Datasets & Benchmarks
  - Synthetic Data
  - Real-World Data
- Methodologies
- Contact

Datasets & Benchmarks

Synthetic Data

Name	Source
Multi-Object Datasets	Link

Real-World Data

Name	Source
PASCAL VOC, COCO	Link
CUB200 Birds, Stanford Dogs, Stanford Cars, and Caltech Flowers	Link
YCB, ScanNet, and COCO	Link

Methodologies

Year 2024

Year	Publication	Title	Source	Forum	Real-World?	Using VLMs?
2024	arXiv	Representation Alignment for Generation: Training Diffusion Transformers is Easier Than You Think _{REPA: Aligning DINO and DiT features to potentially enable scalable object-centric learning.}	Code		✅

Year 2023

Year	Publication	Title	Source	Forum	Real-World?	Using VLMs?
2023	ICCV	Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models _{UCCD: Discovering a set of compositional concepts given a dataset of unlabeled images.}	Code		✅	✅
2023	ICML	Composer: Creative and Controllable Image Synthesis with Composable Conditions _{Composer: Learning multiple concepts from a given real-world image and synthesizing a new one by altering and composing them.}		PMLR	✅	✅
2023	ICML	Provably Learning Object-Centric Representations _{ProvablyOCL: Analyzing when object-centric representations can be learned without supervision and introducing two assumptions, compositionality and irreducibility, to prove that ground-truth object representations can be identified.}	Code	PMLR
2023	ICML	Unlocking Slot Attention by Changing Optimal Transport Costs _{MESH: A cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport.}		PMLR
2023	ICML	Slot-VAE: Object-Centric Scene Generation with Slot Attention _{SlotVAE: A generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation.}		PMLR
2023	ICML	Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames _{InvSlotAttns: Incorporating equivariance to per-object pose transformations into the attention and generation mechanism of Slot Attention by translating, scaling, and rotating position encodings.}	Code	PMLR	✅
2023	ICML	An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning _{OCRL: Examining critical aspects of incorporating object-centric representation pre-training in reinforcement learning, such as performance in visually complex environments and the selection of an appropriate pooling layer for aggregating object representations.}	Code	PMLR
2023	ICML	Discovering Object-Centric Generalized Value Functions From Pixels _{OCGVFs: Introducing a method that tries to discover meaningful features from objects, translating them to temporally coherent ‘question’ functions and leveraging the subsequent learned general value functions for control.}		PMLR
2023	UAI	Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction _{TCGM-OCL: Introducing a time-conditioned generative model for videos.}		PMLR
2023	CVPR	Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models _{PHYCINE: A system that infers physical concepts in different abstract levels without supervision.}		CVF Proceedings
2023	CVPR	Object Discovery from Motion-Guided Tokens _{MoTok: Enabling the emergence of interpretable object-specific mid-level features, demonstrating the benefits of motion-guidance (no labeling) and quantization (interpretability, memory efficiency).}	Code	CVF Proceedings	✅
2023	CVPR	Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning _{SLASH: Consisting of two simple-yet-effective modules on top of Slot Attention.}	Code	CVF Proceedings
2023	CVPR	Multi-Object Manipulation via Object-Centric Neural Scattering Functions _{OSFs-MOM: Combining object-centric neural scattering functions with inverse parameter estimation, and graph-based neural dynamics models.}	Code	CVF Proceedings	✅
2023	ICLR	Bridging the Gap to Real-World Object-Centric Learning _{DINOSAUR: Using slot attention with self-supervised DINO features to discover objects on real-world data.}	Code	OpenReview	✅
2023	ICLR	Improving Object-centric Learning with Query Optimization _{BO-QSA: Extending slot attention, outperforming previous baselines on both synthetic and real images.}	Code	OpenReview	✅
2023	ICLR	Learning to Reason over Visual Objects _{STSN: Combining slot attention, an object-centric encoding method, and a transformer reasoning module.}		OpenReview
2023	ICLR	Learning What and Where: Disentangling Location and Identity Tracking Without Supervision _{Loci: An unsupervised disentangled location and identity tracking system, which excels on the CATER and related object tracking challenges featuring emergent object permanence and stable entity disentanglement via fully unsupervised learning.}	Code	OpenReview	✅
2023	ICLR	Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement _{NCS: Demonstrating how to generalize over a combinatorially large space of rearrangement tasks from only pixel observations by constructing from video demonstrations a factorized transition graph over entity state transitions that we use for control.}	Code	OpenReview
2023	ICLR	Robust and Controllable Object-Centric Learning through Energy-based Models _{EGO: A conceptually simple and general approach to learning object-centric representations through an energy-based model.}		OpenReview
2023	ICLR	Neural Groundplans: Persistent Neural Scene Representations from a Single Image _{GroundPlans: Training a self-supervised model that learns to map a single image to a 3D representation of the scene, with separate components for the immovable and movable 3D regions.}	Code	OpenReview
2023	ICLR	Neural Systematic Binder _{NSB: Proposing a novel object-centric representation called block-slots, which unlike the conventional slots, provides within-slot disentanglement via vector-formed factor representations.}	Code	OpenReview
2023	ICLR	SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models _{SlotFormer: Proposing a general Transformer-based dynamic model to enable consistent future prediction in object-centric models.}	Code	OpenReview
2023	CLeaR	Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning _{CausalTriplet: Presenting a causal representation learning benchmark that is close to realistic settings and empirically demonstrating the strengths and weaknesses of recent hypotheses and methods.}	Code	OpenReview	✅
2023	NeurIPS	SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models _{SlotDiff: An object-centric latent diffusion model designed for both synthetic/real-world image and video data.}	Code		✅
2023	arXiv	Object-Centric Slot Diffusion _{LSD: Replacing the conventional slot decoders with a latent diffusion model conditioned on the object slots.}	Code		✅
2023	arXiv	Sensitivity of Slot-Based Object-Centric Models to their Number of Slots _{NumSlots: Proposing to use analogs to precision and recall based on the Adjusted Rand Index to accurately quantify model behavior over a large range of slots.}
2023	arXiv	Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior _{LSP: Incorporating a spatial-locality prior into state-of-the-art object-centric vision models, and obtaining significant improvements in segmenting objects in both synthetic and real-world datasets.}			✅
2023	arXiv	Unsupervised Open-Vocabulary Object Localization in Videos _{RWV-OCL: Proposing an unsupervised approach to localize and name objects in real-world videos.}			✅	✅

Year 2022

Year	Publication	Title	Source	Forum	Real-World?	Using VLMs?
2022	TMLR	Complex-Valued Autoencoders for Object Discovery _{CAE: A convolutional autoencoder with complex-valued activations that learns to encode feature information in the activations’ magnitudes and object affiliation in their phase values.}	Code
2022	NeurIPSw	Object-Centric Causal Representation Learning _{CausalOCL: Advancing causal representation learning by developing an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties.}		OpenReview
2022	NeurIPSw	Unlocking Slot Attention by Changing Optimal Transport Costs _{SA-MESH: Slot attention can do tiebreaking by changing the costs for optimal transport to minimize entropy, which improves results significantly on object detection.}	Code	OpenReview
2022	NeurIPS	Visual Concepts Tokenization _{VCT: Proposing an unsupervised transformer-based Visual Concepts Tokenization framework, to perceive an image into a set of disentangled visual concept tokens, with each concept token responding to one type of independent visual concept.}	Code	OpenReview	✅
2022	NeurIPS	Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns _{PPMP: Segmenting independent objects in still images by predicting regions that contain motion patterns likely to arise from such objects.}	Code	OpenReview	✅
2022	NeurIPS	Unsupervised Causal Generative Understanding of Images _{UCGU: A framework for unsupervised object-centric 3D scene understanding that generalizes robustly to out-of-distribution images.}		OpenReview
2022	NeurIPS	SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos _{SAVi++: An object-centric video model which is trained to predict depth signals from a slot-based video representation. SAVi++ is able to learn emergent object segmentation and tracking from videos in the real-world Waymo Open dataset.}	Code	OpenReview	✅
2022	NeurIPS	Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images _{UnsupObjSeg: Training more than 200 models to demonstrate that current unsupervised methods cannot segment generic objects from real-world single images, unless the complex objectness biases are removed.}	Code	OpenReview	✅
2022	NeurIPS	Object Scene Representation Transformer _{OSRT: Proposing Object Scene Representation Transformer, a highly efficient 3D-centric model in which individual object representations naturally emerge through novel view synthesis.}	Code	OpenReview
2022	NeurIPS	Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation _{iSlotAttns: Improving the training of object-centric learning methods by applying implicit differentiation to slot attention.}	Code	OpenReview
2022	NeurIPS	Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos _{STEVE: A simple fully unsupervised model for object-centric learning in complex and naturalistic videos.}	Code	OpenReview	✅
2022	ICML	Unsupervised Image Representation Learning with Deep Latent Particles _{DLP: Decomposing the visual input into low-dimensional latent particles, where each particle is described by its spatial location and features of its surrounding region.}	Code	PMLR	✅
2022	ICML	Toward Compositional Generalization in Object-Oriented World Modeling _{HOWM: Formalizing the compositional generalization problem with an algebraic approach and studying how a world model can achieve that.}	Code	PMLR
2022	ICML	COAT: Measuring Object Compositionality in Emergent Representations _{COAT: Directly measuring compositionality in the representation space as a form of objectness, making such evaluations tractable for a wider class of models.}		PMLR
2022	ICML	Generalization and Robustness Implications in Object-Centric Learning _{OCLLib: Showing that when the distribution shift affects the input in a less structured manner, robustness in terms of segmentation and downstream task performance may vary significantly across models and distribution shifts.}	Code	PMLR
2022	CVPR	HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network _{HPCapsule: Extending the application of capsule networks from digits to human faces and taking a step forward in showing how neural networks understand homologous objects without human intervention.}		CVF Proceedings	✅
2022	CVPR	Discovering Objects that Can Move _{ObjMove: Simplifying auto-encoders' architecture, and augmenting the resulting model with a weak learning signal from general motion segmentation algorithms.}	Code	CVF Proceedings	✅
2022	ICLRw	Towards Self-Supervised Learning of Global and Object-Centric Representations _{SSL-OCL: Discussing the interplay of attention, global and per-object contrastive losses, and data augmentation for learning object representations through self-supervision.}	Code	OpenReview
2022	ICLR	Illiterate DALL-E Learns to Compose _{SLATE: Learning compositional slot-based representations of an image and performing slot composition for zero-shot novel image generation.}	Code	OpenReview	✅	✅
2022	ICLR	Conditional Object-Centric Learning from Video _{SAVi: A sequential extension to Slot Attention.}	Code	OpenReview
2022	ICLR	Unsupervised Discovery of Object Radiance Fields _{uORF: Inferring object-centric factorized 3D scene representations from a single image, learned without 3D geometry or segmentation supervision.}	Code	OpenReview	✅
2022	ICLR	Evaluating Disentanglement of Structured Representations _{SLR-Metric: Introducing the first metric for evaluating disentanglement at individual hierarchy levels of a structured latent representation, and applying it to object-centric generative models.}		OpenReview
2022	CLeaR	VIM: Variational Independent Modules for Video Prediction _{VIM: Defining an object-centric video prediction model that learns modular object dynamics and displays good compositional generalization skills.}		OpenReview
2022	SIGGRAPH	Sprite-from-Sprite: Cartoon Animation Decomposition with Self-supervised Sprite Estimation _{ToonDecompose: Decomposing a cartoon animation into several components (a.k.a., "Sprites" in terminology), where the optical flow is the only external prior used for model training.}	Code		✅

Year 2021

Year	Publication	Title	Source	Forum	Real-World?	Using VLMs?
2021	NeurIPS	Unsupervised Foreground Extraction via Deep Region Competition _{DRC: An unsupervised foreground extraction algorithm applicable to unseen objects, in both synthetic and low-resolution real-world scenes.}	Code	OpenReview	✅
2021	NeurIPS	SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition _{SIMONe: A video scene model which separates the time-invariant, object-level contents of the scene from global time-varying elements such as viewpoint.}	Code	OpenReview
2021	NeurIPS	Object-Centric Representation Learning with Generative Spatial-Temporal Factorization _{DyMON: Extending unsupervised object-centric representation learning to multi-view-dynamic-scene settings.}		OpenReview	✅
2021	NeurIPS	Neural Production Systems _{NPS: Modelling sparse interactions among separate entities using dynamically selected rules.}		OpenReview
2021	NeurIPS	MarioNette: Self-Supervised Sprite Learning _{MarioNette: Jointly learning a dictionary of texture patches and training a network that places them onto a canvas, effectively deconstructing sprite-based video content.}	Code	OpenReview	✅
2021	NeurIPS	GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement _{GENESISv2: Presenting an improved object-centric generative model of visual scenes that uses a stochastic clustering algorithm for inferring object representations without imposing a fixed ordering on objects or using iterative refinement.}	Code	OpenReview	✅
2021	NeurIPS	Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language _{VRDP: A unified framework to learn visual concepts and infer physics models of objects and their interactions jointly from videos and language.}	Code	OpenReview	✅	✅
2021	NeurIPS	Attention over Learned Object Embeddings Enables Complex Visual Reasoning _{ALOE: A general framework of attention over learned object embeddings outperforms task-specific models on complex visual reasoning tasks thought to be too challenging for general models.}	Code	OpenReview
2021	ICML	Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations _{EfficientMORL: A framework for efficient multi-object representation learning consisting of a hierarchical VAE and a lightweight network for iterative refinement.}	Code	PMLR
2021	ICCV	PARTS: Unsupervised segmentation with slots, attention and independence maximization _{PARTS: Introducing a recurrent slot-attention-like encoder that allows for top-down influence during inference, applied to both 3D synthetic and real-world robotics scenes.}		CVF Proceedings	✅
2021	ICLR	Self-supervised Visual Reinforcement Learning with Object-centric Representations _{SMORL: The combination of object-centric representations and goal-conditioned attention policies helps autonomous agents to learn useful multi-task policies in visual multi-object environments.}	Code	OpenReview

Year 2020

Year	Publication	Title	Source	Forum	Real-World?	Using VLMs?
2020	NeurIPS	Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views _{MulMON: Extending IODINE with Generative Query Network (GQN)-based module, for unsupervised object segmentation in 2D multi-view synthetic data.}	Code	NeurIPS Proceedings
2020	NeurIPS	Object-Centric Learning with Slot Attention _{SlotAttns: Learning abstract representations from Convolutional Neural Network (CNN) features for unsupervised object segmentation in 2D synthetic data.}	Code	NeurIPS Proceedings
2020	NeurIPS	Unsupervised object-centric video generation and decomposition in 3D _{O3V: Generation and decomposition of 3D synthetic scenes with 2D synthetic videos.}	Code	NeurIPS Proceedings
2020	NeurIPS	BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images _{BlockGAN: Representing 3D synthetic objects using GANs based on 2D synthetic data.}	Code	NeurIPS Proceedings	✅
2020	ICLR	GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations _{GENESIS: Modeling relationship of scene components for decomposition and generation of 3D synthetic scenes.}	Code	OpenReview
2020	ICLR	Structured Object-Aware Physics Prediction for Video Modeling and Planning _{STOVE: Proposing a structured object-aware video prediction model that explicitly reasons about objects, and demonstrating that it provides high-quality long-term video predictions for planning.}	Code	OpenReview
2020	ICLR	SCALOR: Generative World Models with Scalable Object Representations _{SCALOR: Generation of both synthetic and low-resolution real-world scenes where a large number of objects exist.}	Code	OpenReview	✅
2020	ICML	Improving Generative Imagination in Object-Centric World Models _{G-SWM: Modeling multi-modal uncertainty and situation-awareness for 3D synthetic scene generation.}	Code	PMLR

Year 2019

Year	Publication	Title	Source	Forum	Real-World?	Using VLMs?
2019	ICML	Multi-Object Representation Learning with Iterative Variational Inference _{IODINE: Learning multi-object disentangled representations for unsupervised object segmentation in 2D synthetic data.}	Code	PMLR	✅
2019	arXiv	MONet: Unsupervised Scene Decomposition and Representation _{MONet: Training a Variational Autoencoder (VAE) and a recurrent attention network to decompose and represent 3D synthetic scenes.}	Code

Contact

Feel free to drop an e-mail to yi23zhang.2022@gmail.com
