
bug: Not able to start tiiuae/falcon-7b #41

Closed · Kizy625 opened this issue Jun 20, 2023 · 5 comments
Kizy625 commented Jun 20, 2023

Describe the bug

Hi there,

I followed the instructions on GitHub to start tiiuae/falcon-7b:

pip install "openllm[falcon]"
openllm start falcon --model-id tiiuae/falcon-7b

Then, when I call localhost:3000 for the first time, the request times out after 30 seconds.

The second time it returns the output below (see the logs) and then also times out after a while.
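For reference, this is roughly how I'm calling the server. The /v1/generate endpoint and request shape are my assumption based on the /docs.json output, so they may differ between OpenLLM versions:

# Minimal sketch of the client call. The /v1/generate path and the
# {"prompt": ...} body are assumptions, not confirmed against the
# OpenLLM 0.1.8 API reference.
import requests

try:
    resp = requests.post(
        "http://localhost:3000/v1/generate",
        json={"prompt": "What is the capital of France?"},
        timeout=30,  # this is where the 30-second timeout shows up
    )
    print(resp.status_code, resp.text)
except requests.exceptions.Timeout:
    print("request timed out after 30s")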

Thanks in advance!

To reproduce

No response

Logs

openllm start falcon --model-id tiiuae/falcon-7b
Make sure to have the following dependencies available: ['einops', 'xformers', 'safetensors']
2023-06-20T16:17:11+0000 [INFO] [cli] Environ for worker 0: set CUDA_VISIBLE_DEVICES to 0
2023-06-20T16:17:11+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service.py:svc" can be accessed at http://localhost:3000/metrics.
2023-06-20T16:17:12+0000 [INFO] [cli] Starting production HTTP BentoServer from "_service.py:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2023-06-20 16:17:15.720532: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.81s/it]
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
2023-06-20T16:19:42+0000 [INFO] [runner:llm-falcon-runner:1] _ (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 0.616ms (trace=61c419d1f6ebf4618a33c76ab591ca84,span=f0534e5aa9799160,sampled=1,service.name=llm-falcon-runner)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:8] 127.0.0.1:32864 (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 139.235ms (trace=61c419d1f6ebf4618a33c76ab591ca84,span=8acdd6ebfdfc0bc3,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [runner:llm-falcon-runner:1] _ (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 0.315ms (trace=e9158a075fc27b60719a6852115ec748,span=5147903d91a8f5cc,sampled=1,service.name=llm-falcon-runner)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:3] 127.0.0.1:32872 (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 140.223ms (trace=e9158a075fc27b60719a6852115ec748,span=3340643d8fd8fbf3,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:8] 127.0.0.1:32874 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=6855) 10.052ms (trace=78691d213c95604978c79a03e7af901e,span=b1ae35a9f48663c7,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:1] 127.0.0.1:32882 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 4.523ms (trace=994b1e4334df607b036138b15b5bd92d,span=8fe7b9288f48d7eb,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:7] 127.0.0.1:32896 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 3.804ms (trace=1cd9f72ec6c621f4dfc0378da339833f,span=f05a197843500866,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:43+0000 [INFO] [api_server:llm-falcon-service:4] 127.0.0.1:32900 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 3.348ms (trace=a2df8f149d8c22318f5bee1beef3b58b,span=38859e751c1b52fa,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:43+0000 [INFO] [api_server:llm-falcon-service:4] 127.0.0.1:32906 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 0.691ms (trace=b24a982fb330e7db790eee4e166c5fbe,span=63ab2145057b9fb1,sampled=1,service.name=llm-falcon-service)
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

Environment

bentoml: 1.0.22
openllm: 0.1.8
platform: paperspace

aarnphm (Member) commented Jun 20, 2023

Falcon requires a lot of resources to run, even during inference.

This has to do with the model having to compute all of the attention matrices at every layer.

On 4 A10Gs, the average latency I'm seeing is around 140s.
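Back-of-envelope, the fp16 weights alone are sizable, and this ignores the KV cache and activations, which grow with sequence length:

# Rough memory estimate for the falcon-7b weights alone.
n_params = 7e9
bytes_per_param = 2  # fp16/bf16
print(f"weights: ~{n_params * bytes_per_param / 1e9:.0f} GB")  # ~14 GB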

Kizy625 (Author) commented Jun 20, 2023

Hey,

Yes, I know, but in my case I do not think it is a resource problem. It is not about the response time; the server is not responding at all.

On Paperspace I created a dedicated A100 GPU instance with 12 CPUs and 90 GB of memory, with no other services running on it.

That's why I think the problem is this line in the logs:

The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', ...].
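My understanding (an assumption on my part) is that falcon-7b ships its own RWForCausalLM modeling code on the Hub, which the stock transformers pipeline registry does not recognize. A minimal sketch of loading the model directly, assuming transformers and accelerate are installed and that trust_remote_code=True is acceptable:

# Sketch: load falcon-7b through its custom modeling code.
# trust_remote_code=True is needed because RWForCausalLM lives in the
# model repo, not in transformers itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    trust_remote_code=True,  # opt in to the repo's RWForCausalLM class
    device_map="auto",       # requires accelerate; spreads weights across GPUs
)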

aarnphm (Member) commented Jun 20, 2023

Got it, I will take a look.

aarnphm (Member) commented Jun 26, 2023

I was only able to run Falcon on a g5.24xlarge, which has 96 GB of GPU memory and 384 GB of RAM :)
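For anyone sizing hardware, a quick way to check what the process actually sees (assuming PyTorch is installed):

# Print the name and total memory of each GPU visible to the process.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, f"{props.total_memory / 1e9:.0f} GB")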

Kizy625 (Author) commented Jun 29, 2023

Wow
Okay, I will give it a try
Thanks!

Kizy625 closed this as completed Jun 29, 2023