[demo] RuntimeError: std::bad_alloc #165

Closed
Nntsyeo opened this issue Jun 23, 2023 · 6 comments

Nntsyeo commented Jun 23, 2023

The example code is pulled from here.
Error:

(otter) user@env:~/Otter$ python test-model.py                             
                                                                                    
Using pad_token, but it is not set yet.                                            
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 4/4 [00:35<00:00,  8.92s/it]
Enter prompts (comma-separated): what are they doing?                              

Prompt: what are they doing?                                                       
Traceback (most recent call last):                                                 
  File "/home/user/Otter/test-model.py", line 141, in <module>                                                                                                        
    response = get_response(frames_list, prompt, model, image_processor)                                                                                              
  File "/home/user/Otter/test-model.py", line 98, in get_response                                                                                                     
    generated_text = model.generate(                                               
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context                                        
    return func(*args, **kwargs)                                                   
  File "/home/user/Otter/otter/modeling_otter.py", line 873, in generate                                                                                              
    self._encode_vision_x(vision_x=vision_x)                                       
  File "/home/user/Otter/otter/modeling_otter.py", line 831, in _encode_vision_x                                                                                      
    vision_x = self.vision_encoder(vision_x)[0][:, 1:, :]                                                                                                             
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl                                             
    return forward_call(*args, **kwargs)                                           
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 940, in forward                                  
    return self.vision_model(                                                      
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl                                             
    return forward_call(*args, **kwargs)                                           
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 865, in forward                                  
    hidden_states = self.embeddings(pixel_values)                                  
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl                                             
    return forward_call(*args, **kwargs)                                           
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward                                  
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]                                                                               
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl                                             
    return forward_call(*args, **kwargs)                                           
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward                                                   
    return self._conv_forward(input, self.weight, self.bias)                                                                                                          
  File "/home/user/miniconda3/envs/otter/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward                                             
    return F.conv2d(input, weight, bias, self.stride,                              
RuntimeError: std::bad_alloc

I'm not sure if this is correct, but I've used it as stated in the example (main):

    model = OtterForConditionalGeneration.from_pretrained(
        "luodian/otter-9b-dc-hf",
    )
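For readers hitting the same allocation error, here is a minimal sketch of a more memory-conservative load. It assumes OtterForConditionalGeneration accepts the standard Hugging Face from_pretrained keyword arguments (torch_dtype, device_map, which needs accelerate installed) and that the import path matches the file layout shown in the traceback; whether this avoids the error depends on the memory actually available.

    import torch
    from otter.modeling_otter import OtterForConditionalGeneration  # path assumed from the traceback

    # Load the weights in bfloat16 and let accelerate place layers across the
    # available GPU(s)/CPU. This roughly halves the footprint of a default
    # float32 load; it is an illustrative sketch, not the official demo code.
    model = OtterForConditionalGeneration.from_pretrained(
        "luodian/otter-9b-dc-hf",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )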

My packages are:

(otter) user@env:~/Otter$ pip list | grep -e torch -e xformers
open-clip-torch          2.20.0
torch                    2.0.1
torchaudio               2.0.2
torchvision              0.15.2

Originally posted by @Nntsyeo in #147 (comment)

ZhangYuanhan-AI self-assigned this Jun 23, 2023
ZhangYuanhan-AI (Collaborator) commented:

#147


Nntsyeo commented Jun 24, 2023

I may have found the cause of my problem. The GPU I'm using (1x RTX 4090, 24GB) isn't enough to run the model with video input. It works fine when I use an image instead.

nvcc version: 12.1.105
cuda version: 12.1
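In case it helps others confirm whether memory is the bottleneck, a small diagnostic sketch (plain PyTorch, not part of the Otter demo) that prints free GPU memory before generation:

    import torch

    # Report free vs. total memory on each visible GPU; run this right before
    # model.generate to see how much headroom is left for the video frames.
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        print(f"cuda:{i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")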

ZhangYuanhan-AI (Collaborator) commented:

Great.

Just to be specific: do you mean a single RTX 4090 (24GB) cannot handle 128 frames?

king159 added the area:demo (code of demo) label Jun 25, 2023

Nntsyeo commented Jun 26, 2023

I have tested a few more times with different video lengths. Videos of 10-20 seconds work fine, but my first few attempts with a 1-minute video failed with the error above.

Is this 128-frame limit a parameter of the model? Changing its value in the config.json in my cache folder doesn't seem to have any effect.

Nntsyeo closed this as completed Jun 26, 2023

Luodian commented Jun 26, 2023

We can run videos of more than 3 minutes on our machines (dual 3090 and A100). The max_num_frames in config.json means the cross-attentions are built across 128 frames during training. You can uniformly extract your videos into 16, 32, ..., 128 frames as you wish.

Please refer to this to see how we organize the input for the video demo, and consider giving it a ❤️~
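As an illustration of the uniform extraction mentioned above, here is a minimal sketch using OpenCV. The helper name and the choice of cv2 are assumptions for this example; the actual demo linked above may organize frames differently.

    import cv2
    import numpy as np

    def sample_frames(video_path, num_frames=16):
        """Uniformly sample num_frames RGB frames from a video (illustrative sketch)."""
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
        frames = []
        for idx in indices:
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
            ok, frame = cap.read()
            if ok:
                frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        cap.release()
        return frames  # e.g. feed these to the image_processor used in test-model.py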

ZhangYuanhan-AI (Collaborator) commented:

> I have tested a few more times with different video lengths. Videos of 10-20 seconds work fine, but my first few attempts with a 1-minute video failed with the error above.
>
> Is this 128-frame limit a parameter of the model? Changing its value in the config.json in my cache folder doesn't seem to have any effect.

Generally, 128 is the upper bound on the number of video frames.

This issue was closed.