Support multi-gpu inference #699

research4pan · 2024-03-04T15:10:05Z

Use accelerate to support multi-gpu chatbot

wheresmyhair

Perhaps change the token_per_step in examples/chatbot.py. Other changes LGTM.

research4pan requested review from yaoguany, shizhediao and SHUMKASHUN March 4, 2024 15:10

Support multi-gpu inference

c2551fe

research4pan force-pushed the rpan-batch-infer branch from adf825a to c2551fe Compare March 14, 2024 13:14

Fix offload_weight()

ec9eeb7

- Bug introduced by `transformers 4.38.0`; fixed upstream a week ago, and the fix will appear in releases >4.38.2 (huggingface/transformers#29457)
- Worked around by restricting `transformers<4.38.0` in `requirements.txt`
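
To illustrate what the `transformers<4.38.0` pin accepts and rejects, here is a minimal sketch of a version check using only the standard library. The helper names `parse_release` and `satisfies_pin` are hypothetical and not part of this PR; the parsing is deliberately simplified (it reads only the leading numeric release segment), so a real project would more likely rely on pip or the `packaging` library.

```python
import re


def parse_release(version: str) -> tuple:
    """Return the numeric release segment padded to three parts.

    Hypothetical helper for illustration: '4.37' -> (4, 37, 0).
    Pre-release or dev suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad short versions so that '4.38' compares equal to '4.38.0'.
    return parts + (0,) * max(0, 3 - len(parts))


def satisfies_pin(version: str) -> bool:
    """True if `version` falls under the `transformers<4.38.0` restriction."""
    return parse_release(version) < (4, 38, 0)


# Versions before 4.38.0 pass; 4.38.0 and later are rejected by the pin.
for candidate in ("4.37.2", "4.38.0", "4.38.2"):
    print(candidate, satisfies_pin(candidate))
```

In an installed environment, the version string could be obtained with `importlib.metadata.version("transformers")` and passed to `satisfies_pin`.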

research4pan requested a review from wheresmyhair March 22, 2024 15:24

wheresmyhair requested changes Mar 24, 2024

Export arg of num_token_per_step for chatbot

af24ec4

- Set default to num_token_per_step=4
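
A minimal sketch of how such an argument might be exposed, assuming a plain `argparse` command line. The flag spelling `--num_token_per_step`, the help text, and the `build_parser` helper are assumptions for illustration; the actual `examples/chatbot.py` may define the argument differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; not the project's actual argument set."""
    parser = argparse.ArgumentParser(description="Chatbot options (illustrative)")
    parser.add_argument(
        "--num_token_per_step",
        type=int,
        default=4,  # default taken from the commit message
        help="tokens handled per step (meaning assumed for this sketch)",
    )
    return parser


# With no flags given, the default of 4 applies.
args = build_parser().parse_args([])
print(args.num_token_per_step)
```

Passing `--num_token_per_step 8` on the command line would override the default.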

wheresmyhair approved these changes Mar 24, 2024

research4pan merged commit 8cc21b8 into main Mar 24, 2024
0 of 2 checks passed

research4pan deleted the rpan-batch-infer branch March 31, 2024 09:42
