
vllm backend failed #2028

Closed
chunniunai220ml opened this issue Jun 27, 2024 · 8 comments

Comments

@chunniunai220ml commented Jun 27, 2024

Hi, I tried to run an eval with accelerate:

    export CUDA_VISIBLE_DEVICES="2,3"
    accelerate launch -m lm_eval --model vllm \
        --model_args pretrained="THUDM/glm-4-9b",dtype=bfloat16 \
        --tasks mmlu \
        --device cuda \
        --batch_size 2 \
        --trust_remote_code \
        --cache_requests true \
        --num_fewshot 5

but it failed on an A100 (vllm=0.5.0.post1, vllm-flash-attn=2.5.9, torch=2.3.0+cu121):

[screenshot of the error]

@haileyschoelkopf (Contributor)

Hi, in order to help solve your issue we'd need more information. Namely:

  • can you provide the full traceback, not just this snippet from it?
  • can you provide the library versions that you are running with?

@chunniunai220ml (Author)

The full traceback is too long, so here are the key lines:

    vllm/vllm/engine/llm_engine.py", line 230
    vllm/vllm/executor/executor_base.py", line 41
    /vllm/vllm/executor/gpu_executor.py"
    vllm/vllm/distributed/parallel_state.py", line 771

[screenshot of the traceback]

Library versions:

vllm=0.5.0.post1, vllm-flash-attn=2.5.9, torch=2.3.0+cu121, lm-eval=0.4.2

@haileyschoelkopf (Contributor)

I see the problem: accelerate launch should not be used with vLLM. Instead, just use lm_eval --model vllm --model_args data_parallel_size=NUMGPUS.
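
For example, a minimal sketch of the invocation without accelerate, reusing the model and task arguments from the command above (data_parallel_size=2 assumes the two visible GPUs):

    export CUDA_VISIBLE_DEVICES="2,3"
    lm_eval --model vllm \
        --model_args pretrained="THUDM/glm-4-9b",dtype=bfloat16,data_parallel_size=2 \
        --tasks mmlu \
        --batch_size 2 \
        --trust_remote_code \
        --num_fewshot 5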

@chunniunai220ml (Author) commented Jun 28, 2024

Thanks for your comment. I hit another error with this test (accelerate launch removed):

    export CUDA_VISIBLE_DEVICES="2,7"
    lm_eval --model vllm \
        --model_args pretrained="THUDM/glm-4-9b",dtype=bfloat16,data_parallel_size=2 \
        --tasks mmlu \
        --device cuda \
        --batch_size 2 \
        --trust_remote_code \
        --cache_requests true \
        --num_fewshot 5

    /*/python3.10/site-packages/lm_eval/api/model.py", line 300, in _encode_pair
        if self.AUTO_MODEL_CLASS == transformers.AutoModelForCausalLM:
    AttributeError: 'VLLM' object has no attribute 'AUTO_MODEL_CLASS'

After running pip install lm_eval[vllm] and applying the fix referenced in #1953, I always get OOM, no matter whether I use 4 A100 cards or 2.

@haileyschoelkopf (Contributor)

Hi, could you try the following (see the example after this list):

  • setting enforce_eager=True as described in OOM Issue #1923
  • setting gpu_memory_utilization=0.8 or lower
  • using lm_eval==0.4.3 from PyPI, or the most recent commit from main
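
A sketch of how those settings could be combined, assuming the vLLM engine arguments are passed through --model_args (values are illustrative, not prescriptive):

    pip install lm_eval==0.4.3    # or: pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git
    lm_eval --model vllm \
        --model_args pretrained="THUDM/glm-4-9b",dtype=bfloat16,data_parallel_size=2,gpu_memory_utilization=0.8,enforce_eager=True \
        --tasks mmlu \
        --batch_size 2 \
        --num_fewshot 5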

@chunniunai220ml (Author)

With pip install lm_eval[vllm], I get this error:

    lm-evaluation-harness/lm_eval/evaluator.py", line 15, in
        from lm_eval.caching.cache import delete_cache
    ModuleNotFoundError: No module named 'lm_eval.caching.cache'

but I can import it in a Python shell:

[screenshots showing the import succeeding interactively]

@haileyschoelkopf (Contributor)

@chunniunai220ml are you by chance running your PyPI-installed lm_eval in the same folder as an older git clone of lm-evaluation-harness?
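
A quick way to check which installation Python is actually picking up (a generic diagnostic, not specific to this setup):

    python -c "import lm_eval; print(lm_eval.__file__)"   # path of the module being imported
    pip show lm_eval                                       # version and location of the pip-installed package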

@chunniunai220ml (Author) commented Jul 2, 2024

At first I ran pip install lm_eval[vllm] and got the error as reported. Then I pulled the latest code and ran pip install -e ., with the same error. I also tried creating an __init__.py in lm_eval/cache, which hasn't worked so far.
@haileyschoelkopf
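
If the stale-clone overlap suggested above is the cause, one thing to try is a clean reinstall run from outside the old checkout; a rough sketch, paths illustrative:

    cd ~                                  # leave the old lm-evaluation-harness checkout
    pip uninstall -y lm_eval              # remove the previous editable or PyPI install
    pip install "lm_eval[vllm]==0.4.3"    # fresh install from PyPI
    python -c "import lm_eval; print(lm_eval.__file__)"   # should now point into site-packages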
