bug: OpenLLM not loading the model #125

QLutz · 2023-07-20T13:32:59Z

Describe the bug

Starting from a clean setup (Python 3.10), trying to start LLaMA 13B raises a ModuleNotFoundError. Once that is corrected (by installing SciPy), the weights are downloaded and the checkpoint shards load, but nothing much happens after that.

To reproduce

conda create -n py10 python=3.10 -y
conda activate py10
pip install "openllm[llama, fine-tune, vllm]"
pip install scipy
openllm start llama --model-id huggyllama/llama-13b

Logs

Make sure to have the following dependencies available: ['openllm[vllm]']
bin /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
[2023-07-20 13:18:51,311] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Downloading (…)fetensors.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33.4k/33.4k [00:00<00:00, 98.5MB/s]
Downloading (…)of-00003.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.95G/9.95G [00:24<00:00, 408MB/s]
Downloading (…)of-00003.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.90G/9.90G [00:24<00:00, 401MB/s]
Downloading (…)of-00003.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.18G/6.18G [00:15<00:00, 402MB/s]
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:04<00:00, 21.61s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.25s/it]
Downloading (…)neration_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 621kB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 700/700 [00:00<00:00, 2.61MB/s]
Downloading tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 375MB/s]
Downloading (…)/main/tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 4.55MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 1.95MB/s]

Also, nvidia-smi reveals that nothing is loaded on the GPU (after 20+ minutes):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   33C    P0    58W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
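
The same check can be scripted instead of read off the table. This is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names (parse_gpu_memory, query_gpu_memory, weights_on_gpu) and the 1024 MiB threshold are hypothetical and chosen for illustration, while --query-gpu and --format=csv,noheader,nounits are standard nvidia-smi options.

```python
import subprocess

# Standard nvidia-smi query flags; the fields are index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn 'index, used, total' CSV rows (values in MiB) into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"index": index, "used_mib": used, "total_mib": total})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def weights_on_gpu(rows, min_used_mib=1024):
    """Heuristic: at least min_used_mib in use means 'something is loaded'.

    The 1024 MiB default is an arbitrary assumption, not an OpenLLM value.
    """
    return any(row["used_mib"] >= min_used_mib for row in rows)


if __name__ == "__main__":
    # Values copied from the table above: 2 MiB used out of 81920 MiB.
    sample = parse_gpu_memory("0, 2, 81920\n")
    print(sample, weights_on_gpu(sample))
```

On a machine with a GPU, calling query_gpu_memory() in a loop while openllm start is running would show whether memory usage ever rises above the idle 2 MiB.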

Environment

Debian 10
Python 3.10
OpenLLM 0.2.0

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.0.24
python: 3.10.12
platform: Linux-4.19.0-22-cloud-amd64-x86_64-with-glibc2.28
uid_gid: 1004:1005
conda: 22.9.0
in_conda_env: True

conda_packages

name: py10
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - bzip2=1.0.8=h7f98852_4
  - ca-certificates=2023.5.7=hbcca054_0
  - ld_impl_linux-64=2.40=h41732ed_0
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=13.1.0=he5830b7_0
  - libgomp=13.1.0=he5830b7_0
  - libnsl=2.0.0=h7f98852_0
  - libsqlite=3.42.0=h2797004_0
  - libuuid=2.38.1=h0b41bf4_0
  - libzlib=1.2.13=hd590300_5
  - ncurses=6.4=hcb278e6_0
  - openssl=3.1.1=hd590300_1
  - pip=23.2=pyhd8ed1ab_0
  - python=3.10.12=hd12c33a_0_cpython
  - readline=8.2=h8228510_1
  - setuptools=68.0.0=pyhd8ed1ab_0
  - tk=8.6.12=h27826a3_0
  - wheel=0.40.0=pyhd8ed1ab_1
  - xz=5.2.6=h166bdaf_0
  - pip:
    - accelerate==0.21.0
    - aiofiles==23.1.0
    - aiohttp==3.8.5
    - aiosignal==1.3.1
    - altair==5.0.1
    - anyio==3.7.1
    - appdirs==1.4.4
    - asgiref==3.7.2
    - async-timeout==4.0.2
    - attrs==23.1.0
    - bentoml==1.0.24
    - bitsandbytes==0.39.1
    - build==0.10.0
    - cattrs==23.1.2
    - certifi==2023.5.7
    - charset-normalizer==3.2.0
    - circus==0.18.0
    - click==8.1.6
    - click-option-group==0.5.6
    - cloudpickle==2.2.1
    - cmake==3.27.0
    - coloredlogs==15.0.1
    - contextlib2==21.6.0
    - contourpy==1.1.0
    - cuda-python==12.2.0
    - cycler==0.11.0
    - cython==3.0.0
    - datasets==2.13.1
    - deepmerge==1.1.0
    - deepspeed==0.10.0
    - deprecated==1.2.14
    - dill==0.3.6
    - docker-pycreds==0.4.0
    - exceptiongroup==1.1.2
    - fairscale==0.4.13
    - fastapi==0.100.0
    - ffmpy==0.3.1
    - filelock==3.12.2
    - filetype==1.2.0
    - fonttools==4.41.0
    - frozenlist==1.4.0
    - fs==2.4.16
    - fschat==0.2.3
    - fsspec==2023.6.0
    - gitdb==4.0.10
    - gitpython==3.1.32
    - gradio==3.23.0
    - grpcio==1.51.3
    - grpcio-health-checking==1.51.3
    - h11==0.14.0
    - hjson==3.1.0
    - httpcore==0.17.3
    - httpx==0.24.1
    - huggingface-hub==0.16.4
    - humanfriendly==10.0
    - idna==3.4
    - importlib-metadata==6.0.1
    - inflection==0.5.1
    - jinja2==3.1.2
    - jsonschema==4.18.4
    - jsonschema-specifications==2023.7.1
    - kiwisolver==1.4.4
    - linkify-it-py==2.0.2
    - lit==16.0.6
    - markdown-it-py==2.2.0
    - markdown2==2.4.9
    - markupsafe==2.1.3
    - matplotlib==3.7.2
    - mdit-py-plugins==0.3.3
    - mdurl==0.1.2
    - mpmath==1.3.0
    - msgpack==1.0.5
    - multidict==6.0.4
    - multiprocess==0.70.14
    - mypy-extensions==1.0.0
    - networkx==3.1
    - ninja==1.11.1
    - numpy==1.25.1
    - nvidia-cublas-cu11==11.10.3.66
    - nvidia-cuda-cupti-cu11==11.7.101
    - nvidia-cuda-nvrtc-cu11==11.7.99
    - nvidia-cuda-runtime-cu11==11.7.99
    - nvidia-cudnn-cu11==8.5.0.96
    - nvidia-cufft-cu11==10.9.0.58
    - nvidia-curand-cu11==10.2.10.91
    - nvidia-cusolver-cu11==11.4.0.1
    - nvidia-cusparse-cu11==11.7.4.91
    - nvidia-nccl-cu11==2.14.3
    - nvidia-nvtx-cu11==11.7.91
    - openllm==0.2.0
    - opentelemetry-api==1.18.0
    - opentelemetry-instrumentation==0.39b0
    - opentelemetry-instrumentation-aiohttp-client==0.39b0
    - opentelemetry-instrumentation-asgi==0.39b0
    - opentelemetry-instrumentation-grpc==0.39b0
    - opentelemetry-sdk==1.18.0
    - opentelemetry-semantic-conventions==0.39b0
    - opentelemetry-util-http==0.39b0
    - optimum==1.9.1
    - orjson==3.9.2
    - packaging==23.1
    - pandas==2.0.3
    - pathspec==0.11.1
    - pathtools==0.1.2
    - peft==0.4.0
    - pillow==10.0.0
    - pip-requirements-parser==32.0.1
    - pip-tools==7.1.0
    - prometheus-client==0.17.1
    - prompt-toolkit==3.0.39
    - protobuf==4.23.4
    - psutil==5.9.5
    - py-cpuinfo==9.0.0
    - pyarrow==12.0.1
    - pydantic==1.10.11
    - pydub==0.25.1
    - pygments==2.15.1
    - pynvml==11.5.0
    - pyparsing==3.0.9
    - pyproject-hooks==1.0.0
    - pyre-extensions==0.0.29
    - python-dateutil==2.8.2
    - python-json-logger==2.0.7
    - python-multipart==0.0.6
    - pytz==2023.3
    - pyyaml==6.0.1
    - pyzmq==25.1.0
    - ray==2.5.1
    - referencing==0.30.0
    - regex==2023.6.3
    - requests==2.31.0
    - rich==13.4.2
    - rpds-py==0.9.2
    - safetensors==0.3.1
    - schema==0.7.5
    - scipy==1.11.1
    - semantic-version==2.10.0
    - sentencepiece==0.1.99
    - sentry-sdk==1.28.1
    - setproctitle==1.3.2
    - shortuuid==1.0.11
    - simple-di==0.1.5
    - six==1.16.0
    - smmap==5.0.0
    - sniffio==1.3.0
    - starlette==0.27.0
    - svgwrite==1.4.3
    - sympy==1.12
    - tabulate==0.9.0
    - tokenizers==0.13.3
    - tomli==2.0.1
    - toolz==0.12.0
    - torch==2.0.1
    - tornado==6.3.2
    - tqdm==4.65.0
    - transformers==4.31.0
    - triton==2.0.0
    - trl==0.4.7
    - typing-extensions==4.7.1
    - typing-inspect==0.9.0
    - tzdata==2023.3
    - uc-micro-py==1.0.2
    - urllib3==2.0.4
    - uvicorn==0.23.1
    - vllm==0.1.2
    - wandb==0.15.5
    - watchfiles==0.19.0
    - wavedrom==2.0.3.post3
    - wcwidth==0.2.6
    - websockets==11.0.3
    - wrapt==1.15.0
    - xformers==0.0.20
    - xxhash==3.2.0
    - yarl==1.9.2
    - zipp==3.16.2
prefix: /opt/conda/envs/py10

pip_packages

accelerate==0.21.0
aiofiles==23.1.0
aiohttp==3.8.5
aiosignal==1.3.1
altair==5.0.1
anyio==3.7.1
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.2
attrs==23.1.0
bentoml==1.0.24
bitsandbytes==0.39.1
build==0.10.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.2.0
circus==0.18.0
click==8.1.6
click-option-group==0.5.6
cloudpickle==2.2.1
cmake==3.27.0
coloredlogs==15.0.1
contextlib2==21.6.0
contourpy==1.1.0
cuda-python==12.2.0
cycler==0.11.0
Cython==3.0.0
datasets==2.13.1
deepmerge==1.1.0
deepspeed==0.10.0
Deprecated==1.2.14
dill==0.3.6
docker-pycreds==0.4.0
exceptiongroup==1.1.2
fairscale==0.4.13
fastapi==0.100.0
ffmpy==0.3.1
filelock==3.12.2
filetype==1.2.0
fonttools==4.41.0
frozenlist==1.4.0
fs==2.4.16
fschat==0.2.3
fsspec==2023.6.0
gitdb==4.0.10
GitPython==3.1.32
gradio==3.23.0
grpcio==1.51.3
grpcio-health-checking==1.51.3
h11==0.14.0
hjson==3.1.0
httpcore==0.17.3
httpx==0.24.1
huggingface-hub==0.16.4
humanfriendly==10.0
idna==3.4
importlib-metadata==6.0.1
inflection==0.5.1
Jinja2==3.1.2
jsonschema==4.18.4
jsonschema-specifications==2023.7.1
kiwisolver==1.4.4
linkify-it-py==2.0.2
lit==16.0.6
markdown-it-py==2.2.0
markdown2==2.4.9
MarkupSafe==2.1.3
matplotlib==3.7.2
mdit-py-plugins==0.3.3
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==1.0.0
networkx==3.1
ninja==1.11.1
numpy==1.25.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
openllm==0.2.0
opentelemetry-api==1.18.0
opentelemetry-instrumentation==0.39b0
opentelemetry-instrumentation-aiohttp-client==0.39b0
opentelemetry-instrumentation-asgi==0.39b0
opentelemetry-instrumentation-grpc==0.39b0
opentelemetry-sdk==1.18.0
opentelemetry-semantic-conventions==0.39b0
opentelemetry-util-http==0.39b0
optimum==1.9.1
orjson==3.9.2
packaging==23.1
pandas==2.0.3
pathspec==0.11.1
pathtools==0.1.2
peft==0.4.0
Pillow==10.0.0
pip-requirements-parser==32.0.1
pip-tools==7.1.0
prometheus-client==0.17.1
prompt-toolkit==3.0.39
protobuf==4.23.4
psutil==5.9.5
py-cpuinfo==9.0.0
pyarrow==12.0.1
pydantic==1.10.11
pydub==0.25.1
Pygments==2.15.1
pynvml==11.5.0
pyparsing==3.0.9
pyproject_hooks==1.0.0
pyre-extensions==0.0.29
python-dateutil==2.8.2
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3
PyYAML==6.0.1
pyzmq==25.1.0
ray==2.5.1
referencing==0.30.0
regex==2023.6.3
requests==2.31.0
rich==13.4.2
rpds-py==0.9.2
safetensors==0.3.1
schema==0.7.5
scipy==1.11.1
semantic-version==2.10.0
sentencepiece==0.1.99
sentry-sdk==1.28.1
setproctitle==1.3.2
shortuuid==1.0.11
simple-di==0.1.5
six==1.16.0
smmap==5.0.0
sniffio==1.3.0
starlette==0.27.0
svgwrite==1.4.3
sympy==1.12
tabulate==0.9.0
tokenizers==0.13.3
tomli==2.0.1
toolz==0.12.0
torch==2.0.1
tornado==6.3.2
tqdm==4.65.0
transformers==4.31.0
triton==2.0.0
trl==0.4.7
typing-inspect==0.9.0
typing_extensions==4.7.1
tzdata==2023.3
uc-micro-py==1.0.2
urllib3==2.0.4
uvicorn==0.23.1
vllm==0.1.2
wandb==0.15.5
watchfiles==0.19.0
wavedrom==2.0.3.post3
wcwidth==0.2.6
websockets==11.0.3
wrapt==1.15.0
xformers==0.0.20
xxhash==3.2.0
yarl==1.9.2
zipp==3.16.2

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda/envs/py10 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
[2023-07-20 13:31:57,679] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

transformers version: 4.31.0
Platform: Linux-4.19.0-22-cloud-amd64-x86_64-with-glibc2.28
Python version: 3.10.12
Huggingface_hub version: 0.16.4
Safetensors version: 0.3.1
Accelerate version: 0.21.0
Accelerate config: not found
PyTorch version (GPU?): 2.0.1+cu117 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?:
Using distributed or parallel set-up in script?:

System information (Optional)

a2-highgpu-1g GCP instance (1xA100 80GB)

aarnphm · 2023-07-20T22:11:50Z

This is CUDA 11.3, which I haven't tested on. Can you try CUDA 11.8?

Let me add a section to the readme about known CUDA support.
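
The report above actually contains three different CUDA version strings: bitsandbytes detects 113, PyTorch is a +cu117 build, and the nvidia-smi header shows 11.6 (which is the highest CUDA version the driver supports, not the installed toolkit). A small sketch for pulling them out of pasted logs and comparing them; the function names are hypothetical, and the parsing assumes the compact form ("113", "cu117") always ends in a single-digit minor version.

```python
import re


def _split_compact(digits):
    """'113' -> (11, 3). Assumes the last digit is the minor version."""
    return int(digits[:-1]), int(digits[-1])


def parse_bnb_cuda(line):
    """Parse a bitsandbytes 'Detected CUDA version NNN' log line."""
    match = re.search(r"Detected CUDA version (\d{2,})", line)
    return _split_compact(match.group(1)) if match else None


def parse_torch_cuda(version):
    """Parse a PyTorch version such as '2.0.1+cu117'; None if no +cu tag."""
    match = re.search(r"\+cu(\d{2,})", version)
    return _split_compact(match.group(1)) if match else None


def parse_smi_cuda(header):
    """Parse 'CUDA Version: X.Y' from the nvidia-smi header line."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", header)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    # All three strings are copied from the issue text above.
    found = {
        "bitsandbytes": parse_bnb_cuda("CUDA SETUP: Detected CUDA version 113"),
        "torch": parse_torch_cuda("2.0.1+cu117"),
        "driver (max supported)": parse_smi_cuda(
            "| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |"
        ),
    }
    for source, version in found.items():
        print(f"{source}: {version}")
    if len(set(found.values())) > 1:
        print("CUDA versions differ across components")
```

A difference between these values is not necessarily an error, but it is worth listing them side by side when reporting which CUDA setup was tested.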

QLutz · 2023-07-21T08:48:00Z

Thanks for your answer (and the great lib by the way!)

Starting from another fresh install and running:

# uninstall the previous CUDA install
sudo /usr/bin/nvidia-uninstall
# install cuda 11.8
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run --silent
# install openllm
conda create -n py10 python=3.10 -y
conda activate py10
pip install "openllm[llama, fine-tune, vllm]"
openllm start llama --model-id huggyllama/llama-13b

The missing SciPy issue still shows up. After installing it, the logs go straight to loading the checkpoint shards (without displaying anything about downloading the model weights). Then nothing much happens: OpenLLM slowly uses more and more RAM, but barely any CPU and no GPU. Could loading via the CPU be the bottleneck here, even though the GPU is found (as evidenced by DeepSpeed setting the right accelerator)?

aarnphm · 2023-07-21T19:43:18Z

I just fixed a bug with loading on a single GPU.

Can you try with 0.2.6?

Since you are using an A100, it should be fine to load the whole model into memory.
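
As a rough sanity check of that claim, the shard sizes from the download progress bars (9.95G, 9.90G and 6.18G) can be compared with the 81920 MiB that nvidia-smi reports for the A100. This is only a back-of-the-envelope sketch: the function name is hypothetical, the on-disk size is taken as an approximation of the loaded weights, and runtime overhead (activations, caches) is ignored. Whether "G" in the progress bars means 10^9 or 2^30 bytes is an assumption, so the pessimistic 2^30 is used by default.

```python
# Sizes copied from the download logs and the nvidia-smi table in this issue.
SHARD_SIZES_G = [9.95, 9.90, 6.18]
GPU_TOTAL_MIB = 81920


def weights_fit(shard_sizes_g, gpu_total_mib, bytes_per_g=2**30):
    """Return (weights_bytes, gpu_bytes, fits) for a size comparison.

    bytes_per_g defaults to 2**30, the larger of the two plausible
    meanings of 'G', so the estimate errs on the pessimistic side.
    """
    weights_bytes = sum(shard_sizes_g) * bytes_per_g
    gpu_bytes = gpu_total_mib * 2**20
    return weights_bytes, gpu_bytes, weights_bytes < gpu_bytes


if __name__ == "__main__":
    weights, gpu, fits = weights_fit(SHARD_SIZES_G, GPU_TOTAL_MIB)
    print(f"weights: {weights / 2**30:.2f} GiB, GPU: {gpu / 2**30:.0f} GiB, fits: {fits}")
```

The shards add up to roughly 26 G against 80 GiB of GPU memory, so GPU capacity alone does not explain why nothing gets loaded.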

QLutz · 2023-07-24T08:25:57Z

Here are the logs an hour and a half after running openllm start llama --model-id huggyllama/llama-13b:

bin /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
[2023-07-24 07:39:35,243] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Downloading (…)fetensors.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33.4k/33.4k [00:00<00:00, 13.5MB/s]
Downloading (…)of-00003.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.95G/9.95G [02:43<00:00, 60.7MB/s]
Downloading (…)of-00003.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.90G/9.90G [02:40<00:00, 61.7MB/s]
Downloading (…)of-00003.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.18G/6.18G [01:41<00:00, 61.0MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [07:06<00:00, 142.29s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.21s/it]
Downloading (…)neration_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 1.01MB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 700/700 [00:00<00:00, 5.03MB/s]
Downloading tokenizer.model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 5.05MB/s]
Downloading (…)/main/tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 12.5MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 3.17MB/s]
^C^C^C^C^C^C2023-07-24T08:00:54+0000 [DEBUG] [cli] Importing service "_service.py:svc" from working dir: "/opt/conda/envs/py10/lib/python3.10/site-packages/openllm"
bin /opt/conda/envs/py10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
2023-07-24T08:01:14+0000 [INFO] [cli] Created a temporary directory at /tmp/tmpqthsnq8d
2023-07-24T08:01:14+0000 [INFO] [cli] Writing /tmp/tmpqthsnq8d/_remote_module_non_scriptable.py
[2023-07-24 08:01:14,881] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-07-24T08:01:16+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-24T08:01:17+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-24T08:01:17+0000 [DEBUG] [cli] Trying paths: ['/home/user/.docker/config.json', '/home/user/.dockercfg']
2023-07-24T08:01:17+0000 [DEBUG] [cli] Found file at path: /home/user/.docker/config.json
2023-07-24T08:01:17+0000 [DEBUG] [cli] Found 'credHelpers' section
2023-07-24T08:01:17+0000 [DEBUG] [cli] [Tracing] Create new propagation context: {'trace_id': 'daf4767d6aa948b4b96d0cdc18949e70', 'span_id': '8ddcc746bd7df314', 'parent_span_id': None, 'dynamic_sampling_context': None}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [12:19<00:00, 246.41s/it]
Using pad_token, but it is not set yet.

Unfortunately, still nothing was loaded on the GPU by that time.

aarnphm · 2023-07-24T14:33:16Z

What happens with openllm start llama --model-id huggyllama/llama-13b --debug?

QLutz · 2023-07-25T14:23:42Z

Pretty much the same thing at first (using 0.2.9):

[2023-07-25 14:03:55,952] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:04<00:00, 81.50s/it]

But things got moving when I tried to shut down the command:

^C^C^C^C^C^CStarting server with arguments: ['/opt/conda/envs/py10/bin/python3.10', '-m', 'bentoml', 'serve-http', '_service.py:svc', '--host', '0.0.0.0', '--port', '3000', '--backlog', '2048', '--api-workers', '12', '--working-dir', '/opt/conda/envs/py10/lib/python3.10/site-packages/openllm', '--ssl-version', '17', '--ssl-ciphers', 'TLSv1']
2023-07-25T14:25:28+0000 [DEBUG] [cli] Importing service "_service.py:svc" from working dir: "/opt/conda/envs/py10/lib/python3.10/site-packages/openllm"
2023-07-25T14:25:31+0000 [DEBUG] [cli] Initializing MLIR with module: _site_initialize_0
2023-07-25T14:25:31+0000 [DEBUG] [cli] Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/opt/conda/envs/py10/lib/python3.10/site-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
2023-07-25T14:25:32+0000 [DEBUG] [cli] No jax_plugins namespace packages available
2023-07-25T14:25:33+0000 [DEBUG] [cli] etils.epath found. Using etils.epath for file I/O.
2023-07-25T14:25:51+0000 [INFO] [cli] Created a temporary directory at /tmp/tmpgwt7mutk
2023-07-25T14:25:51+0000 [INFO] [cli] Writing /tmp/tmpgwt7mutk/_remote_module_non_scriptable.py
[2023-07-25 14:25:52,312] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-07-25T14:26:01+0000 [DEBUG] [cli] Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 7 to 5
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 5 to 7
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 7 to 5
2023-07-25T14:26:02+0000 [DEBUG] [cli] Creating converter from 5 to 7
2023-07-25T14:26:11+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-25T14:26:11+0000 [DEBUG] [cli] Popen(['git', 'version'], cwd=/opt/conda/envs/py10/lib/python3.10/site-packages/openllm, universal_newlines=False, shell=None, istream=None)
2023-07-25T14:26:11+0000 [DEBUG] [cli] Trying paths: ['/home/user/.docker/config.json', '/home/qlutz/.dockercfg']
2023-07-25T14:26:11+0000 [DEBUG] [cli] Found file at path: /home/user/.docker/config.json
2023-07-25T14:26:11+0000 [DEBUG] [cli] Found 'credHelpers' section
2023-07-25T14:26:11+0000 [DEBUG] [cli] [Tracing] Create new propagation context: {'trace_id': '663640676af84209a41185161a0d1eac', 'span_id': 'b2ab05f9966f5d45', 'parent_span_id': None, 'dynamic_sampling_context': None}
Loading checkpoint shards:   0%|                                                                                                                                                                                                                                                                                                                      | 0/3 [00:00<?, ?it/s]

Either way, nothing is loaded on the GPU.

aarnphm · 2023-07-25T15:48:45Z

How many GPUs do you have? Can you share the nvidia-smi output?

QLutz · 2023-07-26T06:59:09Z

Still the same setup as in the original post: 1xA100 80GB. I tested on CUDA 11.6 and 11.8.

QLutz · 2023-08-17T12:40:51Z

Fixed in the latest version (0.2.25) for the described setup and model. Thanks!

npuichigo · 2023-09-09T10:37:15Z

@aarnphm I still have the same problem when using openllm start baichuan to load the Baichuan LLM: no GPU usage, and it cannot accept requests.

QLutz changed the title ~~bug:~~ bug: OpenLLM not loading the model Jul 20, 2023

QLutz closed this as completed Aug 17, 2023
