Jun-Pu/Awesome-Object-Centric-Learning

Awesome object-centric learning (OCL)

MIT License

The list will be continually updated. Stay tuned!

OCL models decompose and reconstruct synthetic or real-world scenes by learning multiple disentangled abstract representations, which capture object-centric concepts at multiple levels, in a fully unsupervised manner.

< Last updated: Aug/07/2023 >

Table of Contents
  1. Datasets and Benchmarks
  2. Methodologies
  3. Contact

Datasets & Benchmarks

Synthetic Data

Name Source
Multi-Object Datasets Link

Real-World Data

Name Source
PASCAL VOC, COCO Link
CUB200 Birds, Stanford Dogs, Stanford Cars, and Caltech Flowers Link
YCB, ScanNet, and COCO Link

Methodologies

Year 2024

Year Publication Title Source Forum Real-World? Using VLMs?
2024 ArXiv Representation Alignment for Generation: Training Diffusion Transformers is Easier Than You Think
REPA: Aligning DINO and DiT features to potentially enable scalable object-centric learning.
Code

Year 2023

Year Publication Title Source Forum Real-World? Using VLMs?
2023 ICCV Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
UCCD: Discovering a set of compositional concepts given a dataset of unlabeled images.
Code
2023 ICML Composer: Creative and Controllable Image Synthesis with Composable Conditions
Composer: Learning multiple concepts of a given real-world image and synthesizing a new one by altering and composing them.
PMLR
2023 ICML Provably Learning Object-Centric Representations
ProvablyOCL: Analyzing when object-centric representations can be learned without supervision and introducing two assumptions, compositionality and irreducibility, to prove that ground-truth object representations can be identified.
Code PMLR
2023 ICML Unlocking Slot Attention by Changing Optimal Transport Costs
MESH: A cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport.
PMLR
2023 ICML Slot-VAE: Object-Centric Scene Generation with Slot Attention
SlotVAE: A generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation.
PMLR
2023 ICML Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
InvSlotAttns: Incorporating equivariance to per-object pose transformations into the attention and generation mechanism of Slot Attention by translating, scaling, and rotating position encodings.
Code PMLR
2023 ICML An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning
OCRL: Examining critical aspects of incorporating object-centric representation pre-training in reinforcement learning, such as performance in visually complex environments and the selection of an appropriate pooling layer for aggregating object representations.
Code PMLR
2023 ICML Discovering Object-Centric Generalized Value Functions From Pixels
OCGVFs: Introducing a method that tries to discover meaningful features from objects, translating them to temporally coherent ‘question’ functions and leveraging the subsequent learned general value functions for control.
PMLR
2023 UAI Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction
TCGM-OCL: Introducing a time-conditioned generative model for videos.
PMLR
2023 CVPR Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
PHYCINE: A system that infers physical concepts in different abstract levels without supervision.
CVF Proceedings
2023 CVPR Object Discovery from Motion-Guided Tokens
MoTok: Enabling the emergence of interpretable object-specific mid-level features, demonstrating the benefits of motion-guidance (no labeling) and quantization (interpretability, memory efficiency).
Code CVF Proceedings
2023 CVPR Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning
SLASH: Consisting of two simple-yet-effective modules on top of Slot Attention.
Code CVF Proceedings
2023 CVPR Multi-Object Manipulation via Object-Centric Neural Scattering Functions
OSFs-MOM: Combining object-centric neural scattering functions with inverse parameter estimation, and graph-based neural dynamics models.
Code CVF Proceedings
2023 ICLR Bridging the Gap to Real-World Object-Centric Learning
DINOSAUR: Using slot attention with self-supervised DINO features to discover objects on real-world data.
Code OpenReview
2023 ICLR Improving Object-centric Learning with Query Optimization
BO-QSA: Extending slot attention, outperforming previous baselines on both synthetic and real images.
Code OpenReview
2023 ICLR Learning to Reason over Visual Objects
STSN: Combining slot attention (an object-centric encoding method) with a transformer reasoning module.
OpenReview
2023 ICLR Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Loci: An unsupervised disentangled location and identity tracking system, which excels on the CATER and related object tracking challenges featuring emergent object permanence and stable entity disentanglement via fully unsupervised learning.
Code OpenReview
2023 ICLR Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement
NCS: Demonstrating how to generalize over a combinatorially large space of rearrangement tasks from only pixel observations by constructing from video demonstrations a factorized transition graph over entity state transitions that we use for control.
Code OpenReview
2023 ICLR Robust and Controllable Object-Centric Learning through Energy-based Models
EGO: A conceptually simple and general approach to learning object-centric representations through an energy-based model.
OpenReview
2023 ICLR Neural Groundplans: Persistent Neural Scene Representations from a Single Image
GroundPlans: Training a self-supervised model that learns to map a single image to a 3D representation of the scene, with separate components for the immovable and movable 3D regions.
Code OpenReview
2023 ICLR Neural Systematic Binder
NSB: Proposing a novel object-centric representation called block-slots, which unlike the conventional slots, provides within-slot disentanglement via vector-formed factor representations.
Code OpenReview
2023 ICLR SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models
SlotFormer: Proposing a general Transformer-based dynamic model to enable consistent future prediction in object-centric models.
Code OpenReview
2023 CLeaR Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning
CausalTriplet: Presenting a causal representation learning benchmark that is close to realistic settings and empirically demonstrating the strengths and weaknesses of recent hypotheses and methods.
Code OpenReview
2023 NeurIPS SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models
SlotDiff: An object-centric latent diffusion model designed for both synthetic/real-world image and video data.
Code
2023 arXiv Object-Centric Slot Diffusion
LSD: Replacing the conventional slot decoders with a latent diffusion model conditioned on the object slots.
Code
2023 arXiv Sensitivity of Slot-Based Object-Centric Models to their Number of Slots
NumSlots: Proposing to use analogs to precision and recall based on the Adjusted Rand Index to accurately quantify model behavior over a large range of slots.
2023 arXiv Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior
LSP: Incorporating a spatial-locality prior into state-of-the-art object-centric vision models, and obtaining significant improvements in segmenting objects in both synthetic and real-world datasets.
2023 arXiv Unsupervised Open-Vocabulary Object Localization in Videos
RWV-OCL: Proposing an unsupervised approach to localize and name objects in real-world videos.
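
The NumSlots entry above builds its evaluation on the Adjusted Rand Index (ARI), a common score for comparing a predicted segmentation against ground-truth object masks. As a rough illustration only, the sketch below computes the plain ARI between two flat label sequences (for example, per-pixel object IDs). It is not the precision/recall analog proposed in that paper, and the helper name `adjusted_rand_index` is a hypothetical choice made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Plain ARI between two equal-length flat label sequences.

    Hypothetical helper for illustration; not taken from any listed paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat as perfect agreement

    # Number of item pairs placed together in both partitions,
    # in the ground truth only, and in the prediction only.
    pairs_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = pairs_true * pairs_pred / total_pairs  # chance-level agreement
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (pairs_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same grouping under different label names: perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Grouping that splits every ground-truth object: below chance.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score is invariant to how clusters are named, which is why it suits slot-based models whose slot order is arbitrary. Object-centric papers often report a foreground-only variant that drops background pixels before scoring; that filtering step is left out here.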

Year 2022

Year Publication Title Source Forum Real-World? Using VLMs?
2022 TMLR Complex-Valued Autoencoders for Object Discovery
CAE: Introducing complex-valued activations into a convolutional autoencoder, which learns to encode feature information in the activations’ magnitudes and object affiliation in their phase values.
Code
2022 NeurIPSw Object-Centric Causal Representation Learning
CausalOCL: Advancing causal representation learning by developing an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties.
OpenReview
2022 NeurIPSw Unlocking Slot Attention by Changing Optimal Transport Costs
SA-MESH: Slot attention can do tiebreaking by changing the costs for optimal transport to minimize entropy, which improves results significantly on object detection.
Code OpenReview
2022 NeurIPS Visual Concepts Tokenization
VCT: Proposing an unsupervised transformer-based Visual Concepts Tokenization framework that parses an image into a set of disentangled visual concept tokens, with each concept token responding to one type of independent visual concept.
Code OpenReview
2022 NeurIPS Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns
PPMP: Segmenting independent objects in still images by predicting regions that contain motion patterns likely to arise from such objects.
Code OpenReview
2022 NeurIPS Unsupervised Causal Generative Understanding of Images
UCGU: A framework for unsupervised object-centric 3D scene understanding that generalizes robustly to out-of-distribution images.
OpenReview
2022 NeurIPS SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos
SAVi++: An object-centric video model which is trained to predict depth signals from a slot-based video representation. SAVi++ is able to learn emergent object segmentation and tracking from videos in the real-world Waymo Open dataset.
Code OpenReview
2022 NeurIPS Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images
UnsupObjSeg: Training more than 200 models to demonstrate that current unsupervised methods cannot segment generic objects from real-world single images, unless the complex objectness biases are removed.
Code OpenReview
2022 NeurIPS Object Scene Representation Transformer
OSRT: Proposing Object Scene Representation Transformer, a highly efficient 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
Code OpenReview
2022 NeurIPS Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation
iSlotAttns: Improving the training of object-centric learning methods by applying implicit differentiation to slot attention.
Code OpenReview
2022 NeurIPS Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos
STEVE: A simple fully unsupervised model for object-centric learning in complex and naturalistic videos.
Code OpenReview
2022 ICML Unsupervised Image Representation Learning with Deep Latent Particles
DLP: Decomposing the visual input into low-dimensional latent particles, where each particle is described by its spatial location and features of its surrounding region.
Code PMLR
2022 ICML Toward Compositional Generalization in Object-Oriented World Modeling
HOWM: Formalizing the compositional generalization problem with an algebraic approach and studying how a world model can achieve that.
Code PMLR
2022 ICML COAT: Measuring Object Compositionality in Emergent Representations
COAT: Directly measuring compositionality in the representation space as a form of objections, making such evaluations tractable for a wider class of models.
PMLR
2022 ICML Generalization and Robustness Implications in Object-Centric Learning
OCLLib: Showing that when the distribution shift affects the input in a less structured manner, robustness in terms of segmentation and downstream task performance may vary significantly across models and distribution shifts.
Code PMLR
2022 CVPR HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
HPCapsule: Extending the application of capsule networks from digits to human faces and taking a step forward to show how neural networks understand homologous objects without human intervention.
CVF Proceedings
2022 CVPR Discovering Objects that Can Move
ObjMove: Simplifying auto-encoders' architecture, and augmenting the resulting model with a weak learning signal from general motion segmentation algorithms.
Code CVF Proceedings
2022 ICLRw Towards Self-Supervised Learning of Global and Object-Centric Representations
SSL-OCL: Discussing the interplay of attention, global and per-object contrastive losses, and data augmentation for learning object representations through self-supervision.
Code OpenReview
2022 ICLR Illiterate DALL-E Learns to Compose
SLATE: Learning a compositional slot-based representation of an image and performing slot composition for zero-shot novel image generation.
Code OpenReview
2022 ICLR Conditional Object-Centric Learning from Video
SAVi: A sequential extension to Slot Attention.
Code OpenReview
2022 ICLR Unsupervised Discovery of Object Radiance Fields
uORF: Inferring object-centric factorized 3D scene representations from a single image, learned without 3D geometry or segmentation supervision.
Code OpenReview
2022 ICLR Evaluating Disentanglement of Structured Representations
SLR-Metric: Introducing the first metric for evaluating disentanglement at individual hierarchy levels of a structured latent representation, and applying it to object-centric generative models.
OpenReview
2022 CLeaR VIM: Variational Independent Modules for Video Prediction
VIM: Defining an object-centric video prediction model that learns modular object dynamics and displays good compositional generalization skills.
OpenReview
2022 SIGGRAPH Sprite-from-Sprite: Cartoon Animation Decomposition with Self-supervised Sprite Estimation
ToonDecompose: Decomposing a cartoon animation into several components (a.k.a., "Sprites" in terminology), where the optical flow is the only external prior used for model training.
Code

Year 2021

Year Publication Title Source Forum Real-World? Using VLMs?
2021 NeurIPS Unsupervised Foreground Extraction via Deep Region Competition
DRC: An unsupervised foreground extraction algorithm applicable to unseen objects, in both synthetic and low-resolution real-world scenes.
Code OpenReview
2021 NeurIPS SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
SIMONe: A video scene model which separates the time-invariant, object-level contents of the scene from global time-varying elements such as viewpoint.
Code OpenReview
2021 NeurIPS Object-Centric Representation Learning with Generative Spatial-Temporal Factorization
DyMON: Extending unsupervised object-centric representation learning to multi-view-dynamic-scene settings.
OpenReview
2021 NeurIPS Neural Production Systems
NPS: Modelling sparse interactions among separate entities using dynamically selected rules.
OpenReview
2021 NeurIPS MarioNette: Self-Supervised Sprite Learning
MarioNette: Jointly learning a dictionary of texture patches and training a network that places them onto a canvas, effectively deconstructing sprite-based video content.
Code OpenReview
2021 NeurIPS GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
GENESISv2: Presenting an improved object-centric generative model of visual scenes that uses a stochastic clustering algorithm for inferring object representations without imposing a fixed ordering on objects or using iterative refinement.
Code OpenReview
2021 NeurIPS Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
VRDP: A unified framework to learn visual concepts and infer physics models of objects and their interactions jointly from videos and language.
Code OpenReview
2021 NeurIPS Attention over Learned Object Embeddings Enables Complex Visual Reasoning
ALOE: A general framework of attention over learned object embeddings outperforms task-specific models on complex visual reasoning tasks thought to be too challenging for general models.
Code OpenReview
2021 ICML Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations
EfficientMORL: A framework for efficient multi-object representation learning consisting of a hierarchical VAE and a lightweight network for iterative refinement.
Code PMLR
2021 ICCV PARTS: Unsupervised segmentation with slots, attention and independence maximization
PARTS: Introducing a recurrent slot-attention-like encoder that allows for top-down influence during inference, applied to both 3D synthetic and real-world robotics scenes.
CVF Proceedings
2021 ICLR Self-supervised Visual Reinforcement Learning with Object-centric Representations
SMORL: The combination of object-centric representations and goal-conditioned attention policies helps autonomous agents to learn useful multi-task policies in visual multi-object environments.
Code OpenReview

Year 2020

Year Publication Title Source Forum Real-World? Using VLMs?
2020 NeurIPS Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views
MulMON: Extending IODINE with a Generative Query Network (GQN)-based module, for unsupervised object segmentation in 2D multi-view synthetic data.
Code NeurIPS Proceedings
2020 NeurIPS Object-Centric Learning with Slot Attention
SlotAttns: Learning abstract representations from Convolutional Neural Network (CNN) features for unsupervised object segmentation in 2D synthetic data.
Code NeurIPS Proceedings
2020 NeurIPS Unsupervised object-centric video generation and decomposition in 3D
O3V: Generation and decomposition of 3D synthetic scenes with 2D synthetic videos.
Code NeurIPS Proceedings
2020 NeurIPS BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images
BlockGAN: Representing 3D synthetic objects using GANs based on 2D synthetic data.
Code NeurIPS Proceedings
2020 ICLR GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
GENESIS: Modeling relationship of scene components for decomposition and generation of 3D synthetic scenes.
Code OpenReview
2020 ICLR Structured Object-Aware Physics Prediction for Video Modeling and Planning
STOVE: Proposing a structured object-aware video prediction model, which explicitly reasons about objects, and demonstrating that it provides high-quality long-term video predictions for planning.
Code OpenReview
2020 ICLR SCALOR: Generative World Models with Scalable Object Representations
SCALOR: Generation of both synthetic and low-resolution real-world scenes where a large number of objects exist.
Code OpenReview
2020 ICML Improving Generative Imagination in Object-Centric World Models
G-SWM: Modeling multi-modal uncertainty and situation-awareness for 3D synthetic scene generation.
Code PMLR

Year 2019

Year Publication Title Source Forum Real-World? Using VLMs?
2019 ICML Multi-Object Representation Learning with Iterative Variational Inference
IODINE: Learning multi-object disentangled representations for unsupervised object segmentation in 2D synthetic data.
Code PMLR
2019 arXiv MONet: Unsupervised Scene Decomposition and Representation
MONet: Training a Variational Autoencoder (VAE) and a recurrent attention network to decompose and represent 3D synthetic scenes.
Code

Contact

Feel free to drop an e-mail to yi23zhang.2022@gmail.com

About

A curated list of research on object-centric learning
