This repository will host the source code and supplementary material for DIRAC: Diffusion-Based Representation Learning for Modality-Agnostic Compositionality.
Please refer to our website https://anonymsubicml24.github.io/anonymsubicml24/ to listen to the audio results.
In this paper, we address the extrapolation and out-of-distribution generation problems in generative models by introducing a generic compositional inductive bias. Leveraging state-of-the-art generative models in an encoder-decoder scheme, our approach focuses on compositional representation learning without any form of supervision. We perform experiments on image and audio data, demonstrating the adaptability of our model to different modalities and representations. Our Diffusion-based Representation Learning for Modality-Agnostic Compositionality (DIRAC) builds upon diffusion models and shows promising results in separating meaningful entities in both images and music, serving as a strong baseline for future research on compositional generation and representation learning.
The code for the DIRAC model will be released in this repository soon.