Repository for the "Multi-Modal World Models in Autonomous Driving" project in the 3D Vision course at ETHZ, 2025. The project has two parts: depth distillation for IJEPA features has been conducted through ground-truth depth on one hand, and through DepthFM features on the other.
This main branch contains the code and setup instructions for ground-truth depth distillation, while the branch ijepa_image_features contains the code and setup instructions for DepthFM distillation. The two setups differ, and the most interesting results come from the DepthFM distillation.
In order to run the latest training with ground-truth depth prediction with a DPT head, you need to clone this repository and download the dataset(s). First, start by cloning this repo:

```shell
cd /path/to/desired/workspace
git clone git@github.com:Juan5713/MM_WM_AD.git
```

To keep things clean, it is good to remove git-related files from the repo, so as to freeze it and avoid git issues with nested repos. Run the following if you want to do so:
```shell
cd MM_WM_AD # now in MM_WM_AD
rm -rf .git .github .gitignore
```

We will now grab the NYUv2 dataset from https://cs.nyu.edu/~fergus/datasets/nyu_depth_v2.html. In the download section, select the labeled dataset (this will download a .mat file). Back in the repository, create a datasets folder and a subfolder for nyuv2:
```shell
# now back in MM_WM_AD
mkdir datasets && cd datasets && mkdir nyuv2
```

Then move the downloaded file into this subfolder. Your folder structure should now look something like this:
```
MM_WM_AD/
|----- datasets/
| |----- nyuv2/
| | |----- nyu_depth_v2_labeled.mat
| | |...
| |...
|----- ijepa/
| |...
|----- dataloaders/
| |...
|----- jobscripts/
| |...
|----- notebooks/
| |...
|----- scripts/
| |...
|----- utils/
| |...
|...
```

Keep in mind that the datasets folder has to be set up manually, because the datasets take up a considerable amount of space and it is not viable to keep them in a repo. The ijepa folder contains the adapted code from the original IJEPA (https://github.com/facebookresearch/ijepa/tree/main).
If you are running this on the cluster, you will need to make some changes to the IJEPA configuration files. In particular, open the file MM_WM_AD/ijepa/configs/in1k_vith14_ep300_GTDEPTH.yaml and replace the scratch_dir argument with the absolute path to your scratch directory, i.e. /cluster/scratch/[your-eth-id]. Moreover, adapt the wandb logging parameters in the same config file under the wandb argument.
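The relevant entries might look like the following excerpt (the wandb sub-keys shown here are illustrative assumptions; check the actual config file for its exact field names):

```yaml
# in1k_vith14_ep300_GTDEPTH.yaml (excerpt; adapt to the actual file)
scratch_dir: /cluster/scratch/[your-eth-id]
wandb:
  project: mm-wm-ad            # hypothetical value; use your own project
  entity: [your-wandb-entity]  # hypothetical key; use your own entity
```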
Now, ssh into Euler and make a directory in your cluster home directory, named as you please. For example:

```shell
ssh [your-eth-id]@euler.ethz.ch # ssh into the cluster
cd ~
mkdir 3dvision
```

Now copy over the datasets, ijepa, jobscripts, dataloaders, utils and scripts directories to the cluster via scp. Run the following to copy the required directories:
```shell
exit # in case you were still ssh'ed into the cluster
cd /path/to/MM_WM_AD
scp -r scripts [your-eth-id]@euler.ethz.ch:/cluster/home/[your-eth-id]/3dvision/
```

Repeat the scp command, replacing scripts with each of the other directory names. Moreover, you will need to head to your scratch directory and create directories to store outputs and checkpoints from the training process:

```shell
cd $SCRATCH
mkdir ijepa && cd ijepa && mkdir predictions
```

Finally, make sure that you run pip install -U -r requirements.txt. For this, the modules stack/2024-06, python/3.11.6 and eth-proxy need to be loaded, similarly to what is done in the jobscripts.
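Instead of repeating the scp command by hand, the per-directory copies above can be scripted in a small loop. This is a sketch that only prints the commands; remove the leading `echo` to actually run the transfers:

```shell
# Print one scp command per directory to copy to the cluster
# (drop the `echo` to actually perform the transfers).
for d in datasets ijepa jobscripts dataloaders utils scripts; do
  echo scp -r "$d" "[your-eth-id]@euler.ethz.ch:/cluster/home/[your-eth-id]/3dvision/"
done
```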
This should complete the required setup for running on the cluster. Now, to run the finetuning, execute:

```shell
ssh [your-eth-id]@euler.ethz.ch # ssh back into the cluster
cd ~/3dvision/jobscripts
sbatch < gt_depth.sh
```

To monitor the running job, execute watch squeue, which updates the queue status every 2 seconds by default; exit the watch window with Ctrl+C. The output of the run will be stored in the jobscripts folder, and the training graphs will appear in your selected wandb project.
DISCLAIMER: Running these training scripts requires a large amount of memory and processing power; we recommend running on cluster resources where possible.
If you are running this locally, you will need to make some changes to the IJEPA configuration files, similarly to the cluster setup. In particular, open the file MM_WM_AD/ijepa/configs/in1k_vith14_ep300_GTDEPTH.yaml and replace the scratch_dir argument with the absolute path to the directory where you want to store output depth maps and checkpoints. Moreover, adapt the wandb logging parameters in the same config file under the wandb argument.
We can skip the step of copying folders that we would have done on the cluster. Next, move to the directory you indicated earlier for storing output depth maps and checkpoints, and make the output folders there:

```shell
cd /path/to/desired/out/dir
mkdir ijepa && cd ijepa && mkdir predictions
```

Finally, make sure that you run pip install -r requirements.txt inside a virtual environment or another software environment manager of your choice. This should complete the setup for running locally. Next, launch the training script as follows:
```shell
cd /path/to/MM_WM_AD
PYTHONPATH=./ijepa:./ python3 ./scripts/gt_depth_pred.py --fname ./ijepa/configs/in1k_vith14_ep300_GTDEPTH.yaml --devices cuda:0
```