
vision chat error #13

Open
Minyoung1005 opened this issue Feb 16, 2024 · 7 comments
@Minyoung1005

Hi,

I'm trying to run run_vision_chat.sh but getting the following error:

(lwm) minyoung@claw2:~/Projects/LWM$ bash scripts/run_vision_chat.sh 
I0215 18:19:20.605390 140230836105600 xla_bridge.py:689] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0215 18:19:20.607900 140230836105600 xla_bridge.py:689] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
2024-02-15 18:19:29.755994: W external/xla/xla/service/gpu/nvptx_compiler.cc:744] The NVIDIA driver's CUDA version is 12.1 which is older than the ptxas CUDA version (12.3.107). Because the driver is older than the ptxas version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
Traceback (most recent call last):
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 254, in <module>
    run(main)
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 249, in main
    sampler = Sampler()
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 42, in __init__
    self.mesh = VideoLLaMAConfig.get_jax_mesh(FLAGS.mesh_dim)
  File "/home/minyoung/Projects/LWM/lwm/llama.py", line 260, in get_jax_mesh
    return get_jax_mesh(axis_dims, ('dp', 'fsdp', 'tp', 'sp'))
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/tux/distributed.py", line 140, in get_jax_mesh
    mesh_shape = np.arange(jax.device_count()).reshape(dims).shape
ValueError: cannot reshape array of size 1 into shape (1,newaxis,32,1)
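The last line suggests JAX is only seeing one device while the script requests a 32-way mesh; a minimal repro of just the failing reshape (a sketch using only numpy, no model needed):

import numpy as np

# jax.device_count() returned 1 here, but mesh_dim asked for (1, -1, 32, 1);
# numpy prints the -1 wildcard as "newaxis" in the error message:
np.arange(1).reshape(1, -1, 32, 1)  # ValueError: cannot reshape array of size 1 ...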

These are the model configs I used:

export llama_tokenizer_path="./LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="./LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="./LWM-Chat-1M-Jax/params"
export input_file="./traj0.mp4"
Minyoung1005 changed the title from "Video file format" to "vision chat error" on Feb 16, 2024
@pseudotensor

FYI what works for me:

#! /bin/bash

export SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
export PROJECT_DIR="$( cd -- "$( dirname -- "$SCRIPT_DIR" )" &> /dev/null && pwd )"
cd "$PROJECT_DIR"
export PYTHONPATH="$PYTHONPATH:$PROJECT_DIR"

export llama_tokenizer_path="LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="LWM-Chat-1M-Jax/params"
export input_file="taylor.jpg"

python3 -u -m lwm.vision_chat \
    --prompt="What is the image about?" \
    --input_file="$input_file" \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --max_n_frames=8 \
    --update_llama_config="dict(sample_mode='text',theta=50000000,max_sequence_length=131072,use_flash_attention=False,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,remat_attention='',scan_mlp=False,scan_mlp_chunk_size=2048,remat_mlp='',remat_block='',scan_layers=True)" \
    --load_checkpoint="params::$lwm_checkpoint" \
    --tokenizer.vocab_file="$llama_tokenizer_path" \
2>&1 | tee ~/output.log
read  # wait for Enter so the terminal stays open

But I haven't gotten video to work yet; it probably doesn't accept .mp4 input.

Also, --mesh_dim='!1,-1,32,1' always seems off; it has to be chosen to match your device count or removed.

I wish the creators provided minimal runnable examples for the scripts.

@Minyoung1005
Author

Thanks for sharing, @pseudotensor! I was also wondering whether the .mp4 video file format is unsupported.

@cyj95

cyj95 commented Feb 20, 2024

Is the .avi video format supported?

@ghost

ghost commented Feb 21, 2024

I got the same problem. It cannot process .mp4 files.

@mileyan

mileyan commented Feb 21, 2024

.mkv format works for me.

@ghost

ghost commented Feb 21, 2024

> .mkv format works for me.

Would you mind sharing your script? I tried to use .mkv but still got the same error. Thank you for your help.

@wilson1yan
Contributor

The mesh_dim argument depends on the number of devices you're using for inference. If you want to do tensor parallelism over 8 GPUs, then mesh_dim should be 1,1,8,1. The default of 32 may be too high if your machine doesn't have 32 devices.
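A minimal sketch (assuming jax and numpy are available) that mirrors the reshape inside tux's get_jax_mesh, so you can sanity-check a mesh_dim against your device count before launching:

import numpy as np
import jax

# The four axes are (dp, fsdp, tp, sp); their product (with -1 acting as a
# wildcard) must divide evenly into jax.device_count(), or the reshape raises
# the same ValueError as in the traceback above.
dims = (1, -1, 8, 1)  # hypothetical example for an 8-device machine
print(np.arange(jax.device_count()).reshape(dims).shape)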

Regarding supported video files, the code here:

vr = decord.VideoReader(f, ctx=decord.cpu(0))

just uses decord to read the video, so any video format that works for decord should work.
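A quick way to check a given file (a sketch, assuming decord is installed; the filename is just an example) is to open it with decord directly, the same way vision_chat does:

import decord

# If this succeeds, the container/codec should also work with vision_chat.
with open("traj0.mp4", "rb") as f:
    vr = decord.VideoReader(f, ctx=decord.cpu(0))
    print(len(vr), "frames; first frame shape:", vr[0].shape)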
