Replies: 1 comment
-
Hi there,

This seems possible to me. The new dataloader shouldn't be necessary; we could define dataset transforms that take a 3D volume and extract 2D slices from it, something along the lines of the example here. Adding temporal components to the DDPM and the autoencoder should be fairly doable, too. Do you have any interest in having a go at implementing this?

Mark
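As a rough illustration of the transform idea (not an existing MONAI class; `ExtractSlices` and its axis convention are hypothetical), a callable that turns one channel-first 3D volume into a list of 2D slices could look like this:

```python
import numpy as np

class ExtractSlices:
    """Hypothetical dataset transform: turn one 3D volume of shape
    (C, H, W, D) into a list of 2D slices of shape (C, H, W) along a
    chosen axis, so an existing 2D pipeline can consume them unchanged."""

    def __init__(self, axis: int = -1):
        self.axis = axis

    def __call__(self, volume: np.ndarray) -> list[np.ndarray]:
        # Move the slicing axis to the front, then split into 2D slices.
        moved = np.moveaxis(volume, self.axis, 0)
        return [s for s in moved]

# Usage: a (1, 64, 64, 32) volume yields 32 slices of shape (1, 64, 64).
vol = np.zeros((1, 64, 64, 32), dtype=np.float32)
slices = ExtractSlices()(vol)
print(len(slices), slices[0].shape)  # 32 (1, 64, 64)
```

In a real pipeline this would be composed with the usual loading and intensity transforms, with each slice then treated as an independent 2D training sample.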
-
First of all, thanks for that great library bringing generative models to the MONAI framework, and also for the nice tutorials!
I was wondering if it would be possible to add the approach described by Blattmann et al. in their paper "Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models" (https://arxiv.org/abs/2304.08818; see also https://research.nvidia.com/labs/toronto-ai/VideoLDM/). As far as I understand, for the sake of efficiency they start from a pre-trained 2D autoencoder and fine-tune it along a temporal dimension (used for video generation; in medical images this would correspond to the z-dimension) by adding 3D layers to the decoder.
For GenerativeModels-integration, I was thinking of
Maybe this approach could also help to generate synthetic 3D medical datasets at diagnostic resolution.
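To make the "add temporal layers to a pre-trained 2D decoder" idea concrete, here is a minimal PyTorch sketch (module names and the zero-initialized residual are my assumptions, loosely following the paper's recipe of freezing spatial weights and training only the new temporal layers):

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """Hypothetical temporal layer: a 3D conv mixing only along the
    z/time axis, zero-initialized so the network initially reproduces
    the pre-trained 2D behaviour exactly."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                              padding=(1, 0, 0))
        nn.init.zeros_(self.conv.weight)  # residual starts at zero
        nn.init.zeros_(self.conv.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W); residual temporal mixing along D.
        return x + self.conv(x)

class Decoder2DPlusTemporal(nn.Module):
    """Wrap a frozen 2D decoder block with a trainable temporal block.
    The 2D block sees each slice independently (D folded into batch)."""

    def __init__(self, spatial: nn.Module, channels: int):
        super().__init__()
        self.spatial = spatial
        for p in self.spatial.parameters():
            p.requires_grad = False  # keep pre-trained 2D weights fixed
        self.temporal = TemporalBlock(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        # 2D path over all slices at once: (B*D, C, H, W)
        y = self.spatial(x.permute(0, 2, 1, 3, 4).reshape(b * d, c, h, w))
        y = y.reshape(b, d, c, h, w).permute(0, 2, 1, 3, 4)
        return self.temporal(y)

# Smoke test with a toy 2D "decoder" layer standing in for the real one:
block = Decoder2DPlusTemporal(nn.Conv2d(4, 4, 3, padding=1), channels=4)
out = block(torch.randn(2, 4, 8, 16, 16))
print(out.shape)  # torch.Size([2, 4, 8, 16, 16])
```

The zero-initialization means fine-tuning starts from the pre-trained 2D reconstruction quality and only gradually learns cross-slice consistency, which is the efficiency argument in the paper.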
Looking forward to hearing your opinion on this topic.

Best, Lorenz