bug: OpenLLM not loading the model #125
This is CUDA 11.3, which I didn't test on. Can you try CUDA 11.8? Let me add a section to the README about known CUDA support.
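For reference, a quick way to confirm which CUDA driver, toolkit, and PyTorch build an environment actually picks up (generic commands, not specific to OpenLLM):

```bash
# driver version and the highest CUDA version the driver supports
nvidia-smi
# toolkit version on the PATH, if the CUDA toolkit is installed
nvcc --version
# CUDA version PyTorch was built against, and whether it can see the GPU
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```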
Thanks for your answer (and the great lib, by the way!). Starting from another fresh install and running:
# uninstall previous cuda install
sudo /usr/bin/nvidia-uninstall
# install cuda 11.8
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run --silent
# install openllm
conda create -n py10 python=3.10 -y
conda activate py10
pip install "openllm[llama, fine-tune, vllm]"
openllm start llama --model-id huggyllama/llama-13b

The missing SciPy issue still shows up. After installing it, the logs go straight to loading the checkpoint shards (without displaying anything about downloading the model weights). Then, nothing much happens: OpenLLM slowly uses more and more RAM but barely any CPU and no GPU. Any chance loading via CPU may be the bottleneck here (despite the GPU being found, as evidenced by DeepSpeed setting the right accelerator)?
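One sanity check worth running in the same conda env (a suggestion, not from the thread) is to confirm that PyTorch can both see and allocate on the GPU:

```bash
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count(), torch.cuda.get_device_name(0))"
# a small allocation rules out a driver/runtime mismatch that only shows up at use time
python -c "import torch; print(torch.zeros(1024, 1024, device='cuda').sum())"
```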
I just fixed a bug for loading on a single GPU. Can you try with 0.2.6? Since you are using an A100, it should be able to load the whole model into memory.
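For completeness, upgrading in place would look roughly like this (a sketch; the extras mirror the earlier install command, and 0.2.6 is the release suggested above):

```bash
pip install --upgrade "openllm[llama,fine-tune,vllm]"
# or pin the suggested release explicitly
pip install "openllm[llama,fine-tune,vllm]==0.2.6"
```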
The logs, an hour and a half after running:
Still nothing loaded on the GPU by that time, unfortunately.
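While the server sits in this state, watching GPU memory next to host memory makes it easier to tell whether the weights are going to system RAM instead of the card (generic monitoring commands, not from the thread; the process name filter is an assumption):

```bash
# GPU utilisation and memory, refreshed every 2 seconds
watch -n 2 nvidia-smi
# resident memory of the Python workers (adjust the command name to match your env)
watch -n 2 "ps -o pid,rss,cmd -C python3.10"
```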
What happens with
Pretty much the same thing at first (using 0.2.9):
But things got moving when I tried to shut down the command:
^C^C^C^C^C^CStarting server with arguments: ['/opt/conda/envs/py10/bin/python3.10', '-m', 'bentoml', 'serve-http', '_service.py:svc', '--host', '0.0.0.0', '--port', '3000', '--backlog', '2048', '--api-workers', '12', '--working-dir', '/opt/conda/envs/py10/lib/python3.10/site-packages/openllm', '--ssl-version', '17', '--ssl-ciphers', 'TLSv1']
2023-07-25T14:25:28+0000 [DEBUG] [cli] Importing service "_service.py:svc" from working dir: "/opt/conda/envs/py10/lib/python3.10/site-packages/openllm"
2023-07-25T14:25:31+0000 [DEBUG] [cli] Initializing MLIR with module: _site_initialize_0
2023-07-25T14:25:31+0000 [DEBUG] [cli] Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/opt/conda/envs/py10/lib/python3.10/site-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
2023-07-25T14:25:32+0000 [DEBUG] [cli] No jax_plugins namespace packages available
2023-07-25T14:25:33+0000 [DEBUG] [cli] etils.epath found. Using etils.epath for file I/O.
2023-07-25T14:25:51+0000 [INFO] [cli] Created a temporary directory at /tmp/tmpgwt7mutk
2023-07-25T14:25:51+0000 [INFO] [cli] Writing /tmp/tmpgwt7mutk/_remote_module_non_scriptable.py
[2023-07-25 14:25:52,312] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-07-25T14:26:01+0000 [DEBUG] [cli] Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 7 to 5
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 5 to 7
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 7 to 5
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 5 to 7
2023-07-25T14:26:11+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-25T14:26:11+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-25T14:26:11+0000 [DEBUG] [cli] Trying paths: ['/home/user/.docker/config.json', '/home/qlutz/.dockercfg']
2023-07-25T14:26:11+0000 [DEBUG] [cli] Found file at path: /home/user/.docker/config.json
2023-07-25T14:26:11+0000 [DEBUG] [cli] Found 'credHelpers' section
2023-07-25T14:26:11+0000 [DEBUG] [cli] [Tracing] Create new propagation context: {'trace_id': '663640676af84209a41185161a0d1eac', 'span_id': 'b2ab05f9966f5d45', 'parent_span_id': None, 'dynamic_sampling_context': None}
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Either way, nothing is loaded on the GPU.
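To isolate whether the hang is in the Hugging Face loading path itself rather than in OpenLLM/BentoML, one option is to load the same checkpoint directly with transformers (a sketch, assuming the weights are already in the local Hugging Face cache and accelerate is installed so device_map="auto" works):

```bash
python - <<'EOF'
import torch
from transformers import AutoModelForCausalLM

# fp16 weights for a 13B model are roughly 26 GB, which fits on an 80 GB A100.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Print the devices of a few parameters to confirm they landed on cuda:0.
for name, param in list(model.named_parameters())[:3]:
    print(name, param.device)
EOF
```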
How many GPUs do you have?
Still the same setup as in the original post: 1x A100 80GB. I tested on CUDA 11.6 and 11.8.
Fixed in the last version (
@aarnphm I still have the same problem when using
Describe the bug
Starting from a clean setup (Python 3.10), trying to start a LLaMA 13B results in a ModuleNotFoundError which, when corrected (by installing SciPy), results in nothing much happening after the weights are loaded.

To reproduce
conda create -n py10 python=3.10 -y
conda activate py10
pip install "openllm[llama, fine-tune, vllm]"
pip install scipy
openllm start llama --model-id huggyllama/llama-13b
Logs
Also, nvidia-smi reveals that nothing is loaded on the GPU (after 20+ minutes):

Environment
Debian 10
Python 3.10
OpenLLM 0.2.0
Environment variable
System information
bentoml: 1.0.24
python: 3.10.12
platform: Linux-4.19.0-22-cloud-amd64-x86_64-with-glibc2.28
uid_gid: 1004:1005
conda: 22.9.0
in_conda_env: True
conda_packages
pip_packages
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda/envs/py10 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
[2023-07-20 13:31:57,679] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
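One detail that stands out in this output: bitsandbytes detects CUDA 11.3 (via the libcudart it finds under /usr/local/cuda) and loads the cuda113 binary. If the toolkit is later switched to 11.8, it can help to make sure the 11.8 runtime is the one on the library path and then re-run the bitsandbytes self-check (the path below is an assumption about the default runfile installer layout, not verified on this machine):

```bash
# put the CUDA 11.8 runtime first on the library path
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
# bitsandbytes' own diagnostic; it should report a cuda118 binary if the runtime is picked up
python -m bitsandbytes
```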
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
transformers version: 4.31.0

System information (Optional)
a2-highgpu-1g GCP instance (1xA100 80GB)