
bug: Not able to start tiiuae/falcon-7b #41

Closed · Kizy625 opened this issue Jun 20, 2023 · 5 comments
Kizy625 commented Jun 20, 2023

Describe the bug

Hi there,

I followed the instructions on GitHub to start tiiuae/falcon-7b:

pip install "openllm[falcon]"
openllm start falcon --model-id tiiuae/falcon-7b

Then, when I call localhost:3000 for the first time, the request times out after 30 seconds.

The second time it returns the output below (see the logs) and then also times out after a while.
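For reference, this is roughly how I'm calling the server. The /v1/generate endpoint and request shape are my assumption based on the /docs.json output, so they may differ between OpenLLM versions:

# Minimal sketch of the client call. The /v1/generate path and the
# {"prompt": ...} body are assumptions, not confirmed against the
# OpenLLM 0.1.8 API reference.
import requests

try:
    resp = requests.post(
        "http://localhost:3000/v1/generate",
        json={"prompt": "What is the capital of France?"},
        timeout=30,  # this is where the 30-second timeout shows up
    )
    print(resp.status_code, resp.text)
except requests.exceptions.Timeout:
    print("request timed out after 30s")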

Thanks in advance!

To reproduce

No response

Logs

openllm start falcon --model-id tiiuae/falcon-7b
Make sure to have the following dependencies available: ['einops', 'xformers', 'safetensors']
2023-06-20T16:17:11+0000 [INFO] [cli] Environ for worker 0: set CUDA_VISIBLE_DEVICES to 0
2023-06-20T16:17:11+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service.py:svc" can be accessed at http://localhost:3000/metrics.
2023-06-20T16:17:12+0000 [INFO] [cli] Starting production HTTP BentoServer from "_service.py:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2023-06-20 16:17:15.720532: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.81s/it]
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
2023-06-20T16:19:42+0000 [INFO] [runner:llm-falcon-runner:1] _ (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 0.616ms (trace=61c419d1f6ebf4618a33c76ab591ca84,span=f0534e5aa9799160,sampled=1,service.name=llm-falcon-runner)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:8] 127.0.0.1:32864 (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 139.235ms (trace=61c419d1f6ebf4618a33c76ab591ca84,span=8acdd6ebfdfc0bc3,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [runner:llm-falcon-runner:1] _ (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 0.315ms (trace=e9158a075fc27b60719a6852115ec748,span=5147903d91a8f5cc,sampled=1,service.name=llm-falcon-runner)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:3] 127.0.0.1:32872 (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 140.223ms (trace=e9158a075fc27b60719a6852115ec748,span=3340643d8fd8fbf3,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:8] 127.0.0.1:32874 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=6855) 10.052ms (trace=78691d213c95604978c79a03e7af901e,span=b1ae35a9f48663c7,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:1] 127.0.0.1:32882 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 4.523ms (trace=994b1e4334df607b036138b15b5bd92d,span=8fe7b9288f48d7eb,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:42+0000 [INFO] [api_server:llm-falcon-service:7] 127.0.0.1:32896 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 3.804ms (trace=1cd9f72ec6c621f4dfc0378da339833f,span=f05a197843500866,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:43+0000 [INFO] [api_server:llm-falcon-service:4] 127.0.0.1:32900 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 3.348ms (trace=a2df8f149d8c22318f5bee1beef3b58b,span=38859e751c1b52fa,sampled=1,service.name=llm-falcon-service)
2023-06-20T16:19:43+0000 [INFO] [api_server:llm-falcon-service:4] 127.0.0.1:32906 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=706) 0.691ms (trace=b24a982fb330e7db790eee4e166c5fbe,span=63ab2145057b9fb1,sampled=1,service.name=llm-falcon-service)
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

Environment

bentoml: 1.0.22
openllm: 0.1.8
platform: paperspace

aarnphm (Member) commented Jun 20, 2023

Falcon requires a lot of resources to run, even during inference.

This has to do with the model having to compute all of the attention matrices at every layer.

On 4 A10Gs, the average latency I'm seeing is around 140s.
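Back-of-envelope, the fp16 weights alone are sizable, and this ignores the KV cache and activations, which grow with sequence length:

# Rough memory estimate for the falcon-7b weights alone.
n_params = 7e9
bytes_per_param = 2  # fp16/bf16
print(f"weights: ~{n_params * bytes_per_param / 1e9:.0f} GB")  # ~14 GB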

Kizy625 (Author) commented Jun 20, 2023

Hey,

Yes, I know, but in my case I do not think it is a resource problem. It is not about the response time; the server is not responding at all.

On Paperspace I created a dedicated A100 GPU instance with 12 CPUs and 90 GB of memory, with no other services running on it.

That's why I think the problem is this line in the logs:

The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', ...].
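My understanding (an assumption on my part) is that falcon-7b ships its own RWForCausalLM modeling code on the Hub, which the stock transformers pipeline registry does not recognize. A minimal sketch of loading the model directly, assuming transformers and accelerate are installed and that trust_remote_code=True is acceptable:

# Sketch: load falcon-7b through its custom modeling code.
# trust_remote_code=True is needed because RWForCausalLM lives in the
# model repo, not in transformers itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    trust_remote_code=True,  # opt in to the repo's RWForCausalLM class
    device_map="auto",       # requires accelerate; spreads weights across GPUs
)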

aarnphm (Member) commented Jun 20, 2023

Got it, I will take a look.

aarnphm (Member) commented Jun 26, 2023

I was only able to run Falcon on a g5.24xlarge, which has 96 GB of GPU memory and 384 GB of RAM :)
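For anyone sizing hardware, a quick way to check what the process actually sees (assuming PyTorch is installed):

# Print the name and total memory of each GPU visible to the process.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, f"{props.total_memory / 1e9:.0f} GB")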

Kizy625 (Author) commented Jun 29, 2023

Wow
Okay, I will give it a try
Thanks!

Kizy625 closed this as completed Jun 29, 2023