
vision chat error #13

Open
Minyoung1005 opened this issue Feb 16, 2024 · 7 comments
@Minyoung1005

Hi,

I'm trying to run run_vision_chat.sh but getting the following error:

(lwm) minyoung@claw2:~/Projects/LWM$ bash scripts/run_vision_chat.sh 
I0215 18:19:20.605390 140230836105600 xla_bridge.py:689] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0215 18:19:20.607900 140230836105600 xla_bridge.py:689] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
2024-02-15 18:19:29.755994: W external/xla/xla/service/gpu/nvptx_compiler.cc:744] The NVIDIA driver's CUDA version is 12.1 which is older than the ptxas CUDA version (12.3.107). Because the driver is older than the ptxas version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
Traceback (most recent call last):
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 254, in <module>
    run(main)
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 249, in main
    sampler = Sampler()
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 42, in __init__
    self.mesh = VideoLLaMAConfig.get_jax_mesh(FLAGS.mesh_dim)
  File "/home/minyoung/Projects/LWM/lwm/llama.py", line 260, in get_jax_mesh
    return get_jax_mesh(axis_dims, ('dp', 'fsdp', 'tp', 'sp'))
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/tux/distributed.py", line 140, in get_jax_mesh
    mesh_shape = np.arange(jax.device_count()).reshape(dims).shape
ValueError: cannot reshape array of size 1 into shape (1,newaxis,32,1)
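The last line suggests JAX is only seeing one device while the script requests a 32-way mesh; a minimal repro of just the failing reshape (a sketch using only numpy, no model needed):

import numpy as np

# jax.device_count() returned 1 here, but mesh_dim asked for (1, -1, 32, 1);
# numpy prints the -1 wildcard as "newaxis" in the error message:
np.arange(1).reshape(1, -1, 32, 1)  # ValueError: cannot reshape array of size 1 ...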

These are the model configs I used:

export llama_tokenizer_path="./LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="./LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="./LWM-Chat-1M-Jax/params"
export input_file="./traj0.mp4"
Minyoung1005 changed the title from "Video file format" to "vision chat error" on Feb 16, 2024
@pseudotensor

FYI what works for me:

#! /bin/bash

export SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
export PROJECT_DIR="$( cd -- "$( dirname -- "$SCRIPT_DIR" )" &> /dev/null && pwd )"
cd "$PROJECT_DIR"
export PYTHONPATH="$PYTHONPATH:$PROJECT_DIR"

export llama_tokenizer_path="LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="LWM-Chat-1M-Jax/params"
export input_file="taylor.jpg"

python3 -u -m lwm.vision_chat \
    --prompt="What is the image about?" \
    --input_file="$input_file" \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --max_n_frames=8 \
    --update_llama_config="dict(sample_mode='text',theta=50000000,max_sequence_length=131072,use_flash_attention=False,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,remat_attention='',scan_mlp=False,scan_mlp_chunk_size=2048,remat_mlp='',remat_block='',scan_layers=True)" \
    --load_checkpoint="params::$lwm_checkpoint" \
    --tokenizer.vocab_file="$llama_tokenizer_path" \
2>&1 | tee ~/output.log
read  # wait for Enter so the terminal stays open

But I haven't gotten video to work yet; it probably doesn't accept .mp4 input.

Also, --mesh_dim='!1,-1,32,1' always seems off; it has to be chosen to match your device count or removed.

I wish the creators provided minimal runnable examples for the scripts.

@Minyoung1005
Author

Thanks for sharing, @pseudotensor! I was also wondering whether the .mp4 video file format is unsupported.

@cyj95

cyj95 commented Feb 20, 2024

Is the .avi video format supported?

@ghost

ghost commented Feb 21, 2024

I got the same problem. It cannot process .mp4 files.

@mileyan

mileyan commented Feb 21, 2024

.mkv format works for me.

@ghost

ghost commented Feb 21, 2024

> .mkv format works for me.

Would you mind sharing your script? I tried to use .mkv but still got the same error. Thank you for your help.

@wilson1yan
Contributor

The mesh_dim argument depends on the number of devices you're using for inference. If you want to do tensor parallelism over 8 GPUs, then mesh_dim should be 1,1,8,1. The default of 32 may be too high if your machine doesn't have 32 devices.
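A minimal sketch (assuming jax and numpy are available) that mirrors the reshape inside tux's get_jax_mesh, so you can sanity-check a mesh_dim against your device count before launching:

import numpy as np
import jax

# The four axes are (dp, fsdp, tp, sp); their product (with -1 acting as a
# wildcard) must divide evenly into jax.device_count(), or the reshape raises
# the same ValueError as in the traceback above.
dims = (1, -1, 8, 1)  # hypothetical example for an 8-device machine
print(np.arange(jax.device_count()).reshape(dims).shape)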

Regarding supported video files, the code here:

vr = decord.VideoReader(f, ctx=decord.cpu(0))

just uses decord to read the video, so any video format that works for decord should work.
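A quick way to check a given file (a sketch, assuming decord is installed; the filename is just an example) is to open it with decord directly, the same way vision_chat does:

import decord

# If this succeeds, the container/codec should also work with vision_chat.
with open("traj0.mp4", "rb") as f:
    vr = decord.VideoReader(f, ctx=decord.cpu(0))
    print(len(vr), "frames; first frame shape:", vr[0].shape)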
