MIRA - Multimodal Image Reconstruction with Attention

MIRA is a multimodal, transformer-based (encoder-decoder) architecture for text-to-3D and image-to-3D reconstruction. It focuses on generating a 3D representation of an object from a single 2D image within seconds. The text pipeline uses Stable Diffusion to generate an image from the prompt, which is then preprocessed and passed to the model.

The architecture uses a pre-trained DINOv2 model as the image encoder and a custom triplane decoder. The decoder learns to project image features onto the triplane via cross-attention and to model the relations among the spatially structured triplane tokens via self-attention. Camera features are injected into the decoder through modulation.
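The decoder's data flow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MIRA's implementation: it uses single-head attention without learned query/key/value projections (so token and image-feature widths must match), omits layer normalization and the MLP, and models camera modulation as a per-channel scale and shift, an adaLN-style choice assumed here. The names `decoder_block`, `modulate`, `cam_scale` and `cam_shift` are hypothetical.

```python
import math
from typing import List

Tokens = List[List[float]]  # a sequence of feature vectors


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Single-head scaled dot-product attention (no learned projections)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def modulate(tokens: Tokens, scale: List[float], shift: List[float]) -> Tokens:
    # Assumed camera conditioning: per-channel x * (1 + scale) + shift.
    return [[x * (1.0 + s) + b for x, s, b in zip(t, scale, shift)] for t in tokens]


def residual(x: Tokens, delta: Tokens) -> Tokens:
    return [[a + d for a, d in zip(r, dr)] for r, dr in zip(x, delta)]


def decoder_block(triplane: Tokens, image_feats: Tokens,
                  cam_scale: List[float], cam_shift: List[float]) -> Tokens:
    # Cross-attention: triplane tokens (queries) read from image features (keys/values).
    h = modulate(triplane, cam_scale, cam_shift)
    x = residual(triplane, attention(h, image_feats, image_feats))
    # Self-attention: triplane tokens exchange information with each other.
    h = modulate(x, cam_scale, cam_shift)
    return residual(x, attention(h, h, h))
```

In practice such blocks are stacked, and the final tokens are reshaped into the three feature planes that the renderer queries.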

It is highly efficient and adaptable, capable of handling a wide range of multi-view image datasets. It is trained by minimizing the difference between rendered images and ground-truth images at novel views, without excessive 3D-aware regularization or delicate hyperparameter tuning.
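A minimal sketch of the novel-view reconstruction objective, assuming a plain pixel-wise mean squared error averaged over views. MIRA's actual training loss may combine further terms (for example a perceptual loss); the function names here are hypothetical, and images are simplified to nested lists of grayscale values.

```python
from typing import List, Sequence

Image = List[List[float]]  # H x W grayscale pixels, e.g. in [0, 1]


def mse(rendered: Image, target: Image) -> float:
    """Mean squared error between two images of the same size."""
    total, count = 0.0, 0
    for row_r, row_t in zip(rendered, target):
        for a, b in zip(row_r, row_t):
            total += (a - b) ** 2
            count += 1
    return total / count


def novel_view_loss(renders: Sequence[Image], targets: Sequence[Image]) -> float:
    """Average MSE between rendered and ground-truth images at the supervised views."""
    if len(renders) != len(targets) or not renders:
        raise ValueError("need one ground-truth image per rendered view")
    return sum(mse(r, t) for r, t in zip(renders, targets)) / len(renders)
```

Because supervision comes only from rendered views, the renderer has to be differentiable so that gradients flow back through the triplane into the decoder.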

Due to limited resources, I was not able to train the model thoroughly, so the samples below come from a partially trained checkpoint that is not suitable for public release.

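Samples (columns: Image, Prompt, 3D generation): five samples, four conditioned on an input image only (Prompt: None) and one generated from the prompt "A photograph of cat sitting on table".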

Setup

Clone the repository

Only required for dataset preprocessing/rendering (Linux):

```
apt-get update -y
apt-get install -y xvfb
apt-get install -y libxrender1
apt-get install -y libxi6 libgconf-2-4
apt-get install -y libxkbcommon-x11-0
apt-get install -y libgl1-mesa-glx

echo "Installing Blender-4.0.2..."
wget https://ftp.nluug.nl/pub/graphics/blender//release/Blender4.0/blender-4.0.2-linux-x64.tar.xz && tar -xf blender-4.0.2-linux-x64.tar.xz && rm blender-4.0.2-linux-x64.tar.xz
```

Install the Python requirements

```
pip install -r requirements.txt
```

Dataset preparation

Run load_input_data.py:

```
python load_input_data.py
```

Then update the dataset directory in config.json.
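If you prefer to script the config change, here is a minimal sketch. The key name `dataset_dir` is a hypothetical placeholder: open config.json and pass whatever field actually holds the dataset directory. The helper refuses to add a key that is not already present, so a wrong guess fails loudly instead of silently writing an unused field.

```python
import json
from pathlib import Path


def set_dataset_dir(config_path: str, dataset_dir: str, key: str = "dataset_dir") -> dict:
    """Rewrite config.json so that `key` points at dataset_dir.

    `key` defaults to a hypothetical name; check config.json for the real field.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    if key not in config:
        raise KeyError(f"{key!r} not found in {config_path}; check the field name")
    config[key] = str(Path(dataset_dir).resolve())  # store an absolute path
    path.write_text(json.dumps(config, indent=4))
    return config
```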

Due to computational constraints, it is recommended to download the object data and render each object individually:

Linux

```
DISPLAY=:0.0 && xvfb-run --auto-servernum blender --background --python blender_scripts/dataset_rendering.py -- --object_path 'path to 3D object' --num_renders 32 --output_dir 'path to dataset_dir' --engine CYCLES
```

Other platforms

```
blender --background --python blender_scripts/dataset_rendering.py -- --object_path 'path to 3D object' --num_renders 32 --output_dir 'path to dataset_dir' --engine CYCLES
```
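Rendering objects one at a time, as recommended above, can be scripted with a small driver that reuses exactly the flags from the README commands. This is a sketch: the `*.glb` file pattern and the helper names are assumptions, and the Linux variant drops the `DISPLAY=:0.0` prefix because xvfb-run sets up its own virtual display.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List

RENDER_SCRIPT = "blender_scripts/dataset_rendering.py"


def render_command(object_path: str, output_dir: str, num_renders: int = 32,
                   headless: bool = False) -> List[str]:
    """Build the Blender command for a single object (same flags as the README)."""
    cmd = ["blender", "--background", "--python", RENDER_SCRIPT, "--",
           "--object_path", object_path,
           "--num_renders", str(num_renders),
           "--output_dir", output_dir,
           "--engine", "CYCLES"]
    if headless:
        # xvfb-run provides a virtual X display for Blender on headless Linux machines.
        cmd = ["xvfb-run", "--auto-servernum"] + cmd
    return cmd


def render_all(objects_dir: str, output_dir: str, pattern: str = "*.glb",
               dry_run: bool = True) -> List[List[str]]:
    """Render each matching object in its own Blender process, one after another."""
    headless = sys.platform.startswith("linux")
    commands = [render_command(str(obj), output_dir, headless=headless)
                for obj in sorted(Path(objects_dir).glob(pattern))]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failed render
    return commands
```

Call `render_all(...)` with the default `dry_run=True` first to inspect the generated commands, then with `dry_run=False` to actually render. Running one Blender process per object keeps peak memory bounded by the largest single object.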

Training

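On the main node, get its IP address:

```
hostname -i
```

Then run the following command on every node:

```
torchrun --nnodes=2 --nproc_per_node=8 --rdzv_id=100 --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29400 train_ddp.py
```

Replace $MASTER_ADDR with the address of the main node. This example launches 2 nodes with 8 processes each (typically one per GPU); adjust --nnodes and --nproc_per_node to match your setup.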
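A small helper can build the same torchrun invocation for any cluster size. This is a sketch: `torchrun_command` and `world_size` are hypothetical helpers, and `socket.gethostbyname(socket.gethostname())` only approximates `hostname -i` (both report the address the local hostname resolves to, which is not necessarily the interface the nodes use to reach each other).

```python
import shlex
import socket
from typing import List


def master_address() -> str:
    """Approximate `hostname -i`: the IP address the local hostname resolves to."""
    return socket.gethostbyname(socket.gethostname())


def torchrun_command(master_addr: str, nnodes: int = 2, nproc_per_node: int = 8,
                     port: int = 29400, rdzv_id: int = 100,
                     script: str = "train_ddp.py") -> List[str]:
    """Same launch as in the README; every node runs it with the same master_addr."""
    return ["torchrun",
            f"--nnodes={nnodes}",
            f"--nproc_per_node={nproc_per_node}",
            f"--rdzv_id={rdzv_id}",
            "--rdzv_backend=c10d",
            f"--rdzv_endpoint={master_addr}:{port}",
            script]


def world_size(nnodes: int = 2, nproc_per_node: int = 8) -> int:
    # Total number of training processes across all nodes.
    return nnodes * nproc_per_node
```

On the main node, print `shlex.join(torchrun_command(master_address()))` and run that same command on every node. With the defaults this starts 2 × 8 = 16 training processes.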

Inference

Run the test.py script:

```
python test.py --checkpoint_path=<path to model checkpoint> --config_path=<path to config.json file> --mode=<text/image> --input=<prompt/image_path> --output_path=<path to output directory> --export_video --export_mesh
```

This saves the rendered video and the mesh (in .ply format) to the specified output directory.
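To sanity-check an exported mesh without extra dependencies, you can read the PLY header, which lists the file format and the element counts before the vertex and face data. A minimal sketch; `read_ply_header` is a hypothetical helper, and the exact output file name depends on test.py.

```python
from typing import Dict, Optional


def read_ply_header(path: str) -> Dict[str, object]:
    """Return the PLY format and element counts (e.g. vertices, faces) from the header."""
    fmt: Optional[str] = None
    elements: Dict[str, int] = {}
    with open(path, "rb") as f:  # binary mode: the body after the header may be binary
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} is not a PLY file")
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                return {"format": fmt, "elements": elements}
            if parts[0] == "format":
                fmt = parts[1]  # ascii, binary_little_endian or binary_big_endian
            elif parts[0] == "element":
                elements[parts[1]] = int(parts[2])
    raise ValueError(f"{path} has no end_header line")
```

A vertex or face count of zero suggests that mesh extraction failed for that input.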

References

Papers
- LRM: Large Reconstruction Model for Single Image to 3D
- Efficient Geometry-aware 3D Generative Adversarial Networks (EG3D)
- TensoRF: Tensorial Radiance Fields

Open-source repos
- OpenLRM
- TensoRF
- EG3D

SwayamInSync/MIRA
