Questions about visualizing predicted images


Hi,

Thank you for sharing your source code publicly. I appreciate your contribution. However, I ran into an issue while trying to reproduce the future-frame prediction results described in the paper: the predicted images look poor, as shown below.

![Image](https://github.com/user-attachments/assets/a228c6cf-d179-45df-ae2a-7cce4d9fd606)

To visualize the predictions, I added the following code in the `rollout` function of `eval_utils_calvin.py`. Some parts of the code are omitted for brevity:

```python
# Requires `import os` and `import torchvision` at the top of eval_utils_calvin.py.
os.makedirs('./saved_img/', exist_ok=True)  # create the output directory once
img_size = (args.calvin_input_image_size, args.calvin_input_image_size)
for step in range(EP_LEN):
    action, img_pred = model.step(obs, lang_annotation, step)
    if step % 50 == 0:  # dump predictions every 50 steps
        # Unpatchify the two slices along dim 1 of the prediction.
        img1 = unpatchify(img_pred[:, 0, ...], patch_size=args.patch_size, img_size=img_size)
        img2 = unpatchify(img_pred[:, 1, ...], patch_size=args.patch_size, img_size=img_size)
        for i in range(img1.shape[0]):
            torchvision.utils.save_image(img1[i, ...], f"./saved_img/img_{step}_t{i}_1.png")
            torchvision.utils.save_image(img2[i, ...], f"./saved_img/img_{step}_t{i}_2.png")
    if len(planned_actions) == 0: ...
```
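
One thing worth ruling out before calling `save_image` is a value-range mismatch. `torchvision.utils.save_image` clamps values to [0, 1] before writing 8-bit pixels, so predictions that live in a normalized space would look dark or washed out if saved directly. The sketch below is written in NumPy so it runs without the model. It assumes, purely for illustration, that the prediction targets were normalized with the standard ImageNet mean and standard deviation; whether this repository does so is not confirmed here. `ASSUMED_MEAN`, `ASSUMED_STD`, `value_range`, and `denormalize` are hypothetical names, not part of the codebase.

```python
import numpy as np

# ASSUMPTION: ImageNet statistics. Replace with whatever the data pipeline actually uses.
ASSUMED_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1)
ASSUMED_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1)


def value_range(imgs):
    """Return (min, max) of a batch; values outside [0, 1] hint at normalized data."""
    return float(imgs.min()), float(imgs.max())


def denormalize(imgs, mean=ASSUMED_MEAN, std=ASSUMED_STD):
    """Map a normalized (N, 3, H, W) batch back to [0, 1] pixel values."""
    return np.clip(imgs * std + mean, 0.0, 1.0)


if __name__ == "__main__":
    # Synthetic stand-in for an unpatchified prediction.
    pixels = np.linspace(0.0, 1.0, 2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
    normalized = (pixels - ASSUMED_MEAN) / ASSUMED_STD
    print("range before:", value_range(normalized))
    print("range after: ", value_range(denormalize(normalized)))
```

If the range printed for the real `img1` and `img2` tensors already sits inside [0, 1], this is not the cause and the normalization step can be ignored.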

Additionally, here is the unpatchify function I used:
```python
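def unpatchify(x, patch_size, img_size):
    """Convert patch tokens (N, L, 3 * patch_size**2) to images (N, 3, H, W).

    Each token is interpreted as a (3, patch_size, patch_size) block,
    i.e. channel-first within a patch.
    """
    N, L, _ = x.shape
    H, W = img_size
    h = H // patch_size  # number of patches along the height
    w = W // patch_size  # number of patches along the width
    assert L == h * w, "token count does not match the patch grid"

    x = x.view(N, h, w, 3, patch_size, patch_size)
    x = x.permute(0, 3, 1, 4, 2, 5).contiguous()  # -> (N, 3, h, p, w, p)
    x = x.view(N, 3, H, W)
    return x
```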
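
A check that can be run independently of the model is whether this `unpatchify` matches the patch layout used when the targets were patchified. The function above reads each token as `(3, p, p)` (channel-first), whereas the MAE reference implementation flattens each patch as `(p, p, 3)` (channel-last). Which layout this repository's training code uses is not confirmed here, so the following is only a sketch of the round-trip test, written in NumPy so it runs without the checkpoint. The function names are hypothetical, and `unpatchify_channel_first` mirrors the function above.

```python
import numpy as np


def patchify_channel_last(imgs, p):
    """(N, 3, H, W) -> (N, L, p*p*3), each patch flattened as (p, p, 3)."""
    n, c, H, W = imgs.shape
    h, w = H // p, W // p
    x = imgs.reshape(n, c, h, p, w, p)
    x = x.transpose(0, 2, 4, 3, 5, 1)  # -> (n, h, w, p, p, c)
    return x.reshape(n, h * w, p * p * c)


def unpatchify_channel_last(x, p, img_size):
    """Inverse of patchify_channel_last."""
    H, W = img_size
    h, w = H // p, W // p
    x = x.reshape(x.shape[0], h, w, p, p, 3)
    x = x.transpose(0, 5, 1, 3, 2, 4)  # -> (n, c, h, p, w, p)
    return x.reshape(x.shape[0], 3, H, W)


def unpatchify_channel_first(x, p, img_size):
    """Same layout assumption as the unpatchify in this issue: (3, p, p)."""
    H, W = img_size
    h, w = H // p, W // p
    x = x.reshape(x.shape[0], h, w, 3, p, p)
    x = x.transpose(0, 3, 1, 4, 2, 5)  # -> (n, c, h, p, w, p)
    return x.reshape(x.shape[0], 3, H, W)


if __name__ == "__main__":
    imgs = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
    patches = patchify_channel_last(imgs, p=2)
    ok_last = np.array_equal(unpatchify_channel_last(patches, 2, (4, 4)), imgs)
    ok_first = np.array_equal(unpatchify_channel_first(patches, 2, (4, 4)), imgs)
    print("channel-last round trip matches: ", ok_last)
    print("channel-first round trip matches:", ok_first)
```

Running the same round trip with the repository's own patchify on a ground-truth frame would show directly whether the channel-first interpretation is the right one; a mismatch scrambles pixels and colors within every patch even when the prediction itself is good.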

I used the pre-trained weights in the folder `finetune_bs=640_lr1e-4_atten_goal_state4_atten_only_obs_sv10_abc_reset_act_obs_ep5_abc`, specifically `19.pth`.

Here is part of my `eval.sh` script:
```bash
calvin_dataset_path="calvin/dataset/task_ABC_D"
calvin_conf_path="calvin/calvin_models/conf"
vit_checkpoint_path="checkpoints/vit_mae/mae_pretrain_vit_base.pth" # downloaded from https://drive.google.com/file/d/1bSsvRI4mDM3Gg51C6xO0l9CbojYw3OEt/view?usp=sharing
save_checkpoint_path="checkpoints/"
### NEED TO CHANGE the checkpoint path ###
resume_from_checkpoint="checkpoints/calvin/19.pth"
```

Could you help me understand why the predicted images look so poor? Am I missing something in the evaluation setup?
Thank you again for your time and assistance!

Questions about visualizing predicted image. #4

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Questions about visualizing predicted image. #4

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions