Issues with Running video_chat2 on Multi-GPU Setup with Nvidia Titan Xp #115
Comments
Thank you for your interest in our work. Could you provide more error information? We haven't attempted to load the model in shards before. We've successfully run it on a graphics card with at least 16GB of VRAM. We're happy to research this issue together if you can provide the error information from these lines.
I'm not sure all of these errors are general to this repository, because I've modified demo.py and mvbench.py to run inference without gradio. The changes are not significant, but for reference I will attach my code file. In any case, enabling low_resource returns multiple device mismatch errors inside llama_model. The location of the issue is as follows; a few more locations may show up after this example.
Here is one of the errors I get when the low_resource flag is enabled.
I found a way to perform inference using about 9 GiB of GPU memory by enabling low_resource. Here is how I modified the code.
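For context, here is a minimal sketch of the kind of 8-bit load that the low_resource flag typically enables in MiniGPT-4-style code; the checkpoint path is a placeholder and the exact arguments in videochat2_it.py may differ:

```python
import torch
from transformers import LlamaForCausalLM

LLAMA_PATH = "./llama-7b"  # placeholder; use the llama_model_path from your config

# low_resource-style load: weights are quantized to int8 and kept on one GPU,
# which is roughly what brings a 7B model down to ~9 GiB of VRAM.
llama_model = LlamaForCausalLM.from_pretrained(
    LLAMA_PATH,
    torch_dtype=torch.float16,
    load_in_8bit=True,      # requires bitsandbytes
    device_map={"": 0},     # keep every module on GPU 0
)
```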
What I'm struggling with now is that I want to run inference with llama-7b without low_resource's int8 model loading.
After setting device_map, a ValueError is returned.
@ddoron9 For the first error, I think this may be caused by the hard coding at Ask-Anything/video_chat2/conversation.py, line 233 (commit 389d886).
Changing Ask-Anything/video_chat2/conversation.py, line 233 (commit d57c30f) accordingly should help.
Please fix it and try again.
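As an illustration only (the exact change at line 233 is not reproduced here), a hard-coded device of this kind is usually replaced by deriving the device from the model that will consume the tensor, for example with a small helper:

```python
import torch

def to_model_device(tensor: torch.Tensor, model: torch.nn.Module) -> torch.Tensor:
    """Move a tensor to wherever the model's weights actually live,
    instead of hard-coding .to("cuda:0"). This avoids device-mismatch
    errors once the model is sharded or partially offloaded."""
    return tensor.to(next(model.parameters()).device)

# Hypothetical usage inside conversation.py (names are illustrative):
# img_embeds = to_model_device(img_embeds, self.model.llama_model)
```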
@ddoron9 For the second question, the ValueError indicates that ... I'm not very familiar with this, but adding ... Perhaps ...
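For reference, a common way to shard an fp16 LLaMA-7B across several small GPUs with the standard transformers/accelerate API is to pass an explicit max_memory map. This is only a hedged sketch (the path and memory limits are assumptions), not necessarily the change suggested above:

```python
import torch
from transformers import LlamaForCausalLM

LLAMA_PATH = "./llama-7b"  # placeholder path

# Cap per-GPU usage so accelerate spreads the layers over all available cards
# instead of trying to fit everything on one 12 GiB device.
max_memory = {i: "10GiB" for i in range(torch.cuda.device_count())}
max_memory["cpu"] = "30GiB"  # optional headroom for CPU offload

llama_model = LlamaForCausalLM.from_pretrained(
    LLAMA_PATH,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory=max_memory,
)
print(llama_model.hf_device_map)  # shows which layer landed on which device
```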
Have you solved this problem? I faced a similar problem -- I can perform inference by setting the flag 'low_resource = true' in the config file, but I would always get the "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" error when I set 'low_resource = false'. The above fixes do not work.
@Coronal-Halo Hi, it seems that the error occurs only when ...
Thanks for your reply. I solved this problem by trying every combination of putting variables on the CPU vs. the GPU.
Hi, I tried to load the model with dual 4090s and still faced the same error after applying the changes. I looked into the debugger and realized that it is because the input tensor's device is switched automatically by a pre-forward hook, which I believe is implemented by huggingface when setting device_map="auto".
In my case, it is model.layer.16, around Ask-Anything/video_chat2/models/blip2/modeling_llama.py, lines 565 to 572 (commit fedc486).
3. add ...
I did not encounter a similar issue when I loaded v1 with the same settings for inference. Is it because v1 uses the original llama while v2 doesn't? Is there any workaround or fix here? Thanks.
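If it helps with debugging, the hooks that move inputs between devices can be inspected directly. A rough sketch, assuming `model` is the already-dispatched llama_model:

```python
# Inspect how accelerate dispatched the sharded model. Each module that was
# assigned an execution device carries an AlignDevicesHook in `_hf_hook`,
# and that hook is what silently moves incoming tensors before forward().
for name, module in model.named_modules():
    hook = getattr(module, "_hf_hook", None)
    if hook is not None and getattr(hook, "execution_device", None) is not None:
        print(f"{name}: executes on {hook.execution_device}")

# The layer where the device changes (e.g. model.layers.16) is the hand-off
# point between GPUs; tensors created manually in a modified forward
# (masks, position ids, cached keys) must follow hidden_states.device there.
print(getattr(model, "hf_device_map", None))
```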
Hi,
I'm currently attempting to run the video_chat2 model on a multi-GPU setup consisting of 8 Nvidia Titan Xp GPUs, each with 12GiB of memory. I'm using the mvbench.ipynb notebook from the Ask-Anything/video_chat2 repository for this purpose.
To ensure the model loads on my GPUs, I've enabled the low_resource option in config.json. Additionally, I've specified device_map="auto" during the initialization of the llama_model in videochat2_it.py. The relevant code snippet is as follows:
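For illustration, an initialization along these lines is assumed (the path and exact keyword arguments are placeholders; the real videochat2_it.py wraps this call in its model class and reads the checkpoint path from the config):

```python
import torch
from transformers import LlamaForCausalLM

# Assumed shape of the modified llama_model initialization in videochat2_it.py.
llama_model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b",        # placeholder for the configured llama_model_path
    torch_dtype=torch.float16,
    load_in_8bit=True,         # what the low_resource option enables
    device_map="auto",         # let accelerate shard across the 8 Titan Xp GPUs
)
```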
However, when I execute the code, I encounter multiple errors originating from the following lines:
Could you provide some guidance or suggestions on how to effectively perform inference with sharded models in this multi-GPU environment?
Thank you for your incredible work.