
MPS backend out of memory evaluating fine-tuned Mixtral-8x7B-Instruct-v0.1 on a machine with 100+ GB #1835

Closed
chimezie opened this issue May 13, 2024 · 2 comments

@chimezie

I'm trying to evaluate a locally fine-tuned, unquantized Mixtral-8x7B-Instruct-v0.1 model on an Apple Mac Studio M1 Ultra with 128 GB of memory via the following command line:

lm_eval --model hf --model_args pretrained=/path/to/model,dtype="float" \
        --tasks medqa_4options \
        --device mps \
        --batch_size 1
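
For reference, the same run expressed through the harness's Python API (a rough sketch; the exact keyword arguments may vary between lm-evaluation-harness versions, and /path/to/model is a placeholder):

import lm_eval

# Roughly equivalent to the CLI invocation above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/model,dtype=float",
    tasks=["medqa_4options"],
    device="mps",
    batch_size=1,
)
print(results["results"])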

Note that the batch_size is 1 because, despite having over 100 GB of memory, which should be plenty for this model, I get an MPS backend out-of-memory error even when I specify auto for the batch_size:

2024-05-13:11:49:02,913 INFO     [__main__.py:254] Verbosity set to INFO
2024-05-13:11:49:05,262 INFO     [__main__.py:341] Selected Tasks: ['medqa_4options']
2024-05-13:11:49:05,264 INFO     [evaluator.py:141] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-05-13:11:49:05,264 INFO     [evaluator.py:178] Initializing hf model, with arguments: {'pretrained': '/Users/oori/medical_llm/raw_models/mlx/MrGrammaticaOntology-Mixtral-8x7B-Instruct-v0.1-clinical-problems-0.6.0', 'dtype': 'float'}
2024-05-13:11:49:05,276 INFO     [huggingface.py:165] Using device 'mps'
Loading checkpoint shards:  83%|█████████████████████████████████████████████████████████████████████████████████████████████████████████                     | 15/18 [01:43<00:20,  6.89s/it]
Traceback (most recent call last):
  File "/path/to/mmlu-eval/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
[..snip..]
  File "/path/to/lm_eval/api/model.py", line 134, in create_from_arg_string
    return cls(**args, **args2)
           ^^^^^^^^^^^^^^^^^^^^
  File "/path/to/lm_eval/models/huggingface.py", line 204, in __init__
    self._create_model(
  File "/Users/oori/medical_llm/lm-evaluation-harness/lm_eval/models/huggingface.py", line 547, in _create_model
    self._model = self.AUTO_MODEL_CLASS.from_pretrained(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[..snip..]
  File "/path/to/python3.11/site-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/path/to/python3.11/site-packages/accelerate/utils/modeling.py", line 387, in set_module_tensor_to_device
    new_value = value.to(device)
                ^^^^^^^^^^^^^^^^
RuntimeError: MPS backend out of memory (MPS allocated: 163.01 GB, other allocations: 384.00 KB, max allowed: 163.20 GB). Tried to allocate 224.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

I'm wary of setting PYTORCH_MPS_HIGH_WATERMARK_RATIO, since I should have plenty of memory for this and I don't want to crash the server.

I can run the evaluation with other models (Llama 3 8B, for instance) without an issue.
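
For what it's worth, here is a rough back-of-the-envelope check of the weight footprint by dtype (a sketch only; the ~46.7B total parameter count for Mixtral-8x7B is the commonly quoted figure, and dtype="float" should resolve to torch.float32):

# Estimated memory for the model weights alone, by dtype.
# The ~46.7B parameter count is an assumed figure, not measured from the checkpoint.
n_params = 46.7e9
for name, bytes_per_param in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{name}: ~{n_params * bytes_per_param / 1e9:.0f} GB")
# float32: ~187 GB
# float16/bfloat16: ~93 GB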

@LSinev
Contributor

LSinev commented May 13, 2024

The /path/to/python3.11/site-packages/accelerate part of the traceback shows that the problem is in the accelerate package, not in lm-evaluation-harness, so it should probably be investigated with its developers.

@chimezie
Author

Thanks. I have created an issue with accelerate.
