Add various server timeouts, lower --max_batch_size and --inference_max_length defaults #97

borzunov · 2022-11-29T05:21:00Z

Summary:

parser.add_argument('--inference_max_length', type=int, default=2048,
                    help='Maximum total sequence length permitted per inference, defaults to 2048 tokens')
parser.add_argument('--max_batch_size', type=int, default=2048,
                    help='The total number of tokens in the same batch will not exceed this value')

parser.add_argument('--alloc_timeout', type=float, default=60,
                    help='If the cache is full, the server will wait for this number of seconds hoping that some memory will be freed '
                         'before rejecting the request')
parser.add_argument('--request_timeout', type=float, required=False, default=3 * 60,
                    help='Timeout for the whole rpc_forward/rpc_backward/rpc_forward_stream/rpc_backward_stream request')
parser.add_argument('--session_timeout', type=float, required=False, default=30 * 60,
                    help='Timeout for the whole inference session')
parser.add_argument('--step_timeout', type=float, required=False, default=5 * 60,
                    help="Timeout for waiting for the next step's inputs inside an inference session")
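
To illustrate the behaviour described for --alloc_timeout (wait for memory to be freed, then reject), here is a minimal sketch. It is not the code from this PR: `ToyCache`, `AllocationFailed`, and the token-count bookkeeping are hypothetical names and simplifications chosen for the example.

```python
import asyncio


class AllocationFailed(Exception):
    """Hypothetical error raised when space does not free up in time."""


class ToyCache:
    """Toy token-budget cache used only for illustration (not the server's real cache)."""

    def __init__(self, max_tokens: int, alloc_timeout: float):
        self.max_tokens = max_tokens
        self.alloc_timeout = alloc_timeout
        self.used_tokens = 0
        self._freed = asyncio.Condition()

    async def allocate(self, n_tokens: int) -> None:
        async def _wait_for_space() -> None:
            async with self._freed:
                # Sleep until enough tokens have been released by free().
                await self._freed.wait_for(
                    lambda: self.used_tokens + n_tokens <= self.max_tokens
                )
                self.used_tokens += n_tokens

        try:
            # Give up after alloc_timeout seconds instead of waiting forever.
            await asyncio.wait_for(_wait_for_space(), timeout=self.alloc_timeout)
        except asyncio.TimeoutError:
            raise AllocationFailed(
                f"could not allocate {n_tokens} tokens within {self.alloc_timeout} s"
            )

    async def free(self, n_tokens: int) -> None:
        async with self._freed:
            self.used_tokens -= n_tokens
            # Wake up every waiter so each can re-check its own condition.
            self._freed.notify_all()
```

With a full cache, `allocate()` in this sketch blocks for at most `alloc_timeout` seconds; a concurrent `free()` during that window lets the request proceed instead of being rejected.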

This PR also enables PETALS_8BIT_BACKWARD by default.
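
As a rough illustration of how the --step_timeout and --session_timeout flags from the summary could interact, the sketch below bounds each wait for the next step by the step timeout and by whatever remains of the session budget. The function `iterate_steps`, the queue-based input, and the `None` end-of-session sentinel are assumptions made for this example, not the server's actual handler code.

```python
import asyncio
from typing import AsyncIterator


async def iterate_steps(
    queue: asyncio.Queue, step_timeout: float, session_timeout: float
) -> AsyncIterator[object]:
    """Yield step inputs from `queue` until a None sentinel or a timeout (hypothetical helper)."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + session_timeout  # absolute end of the whole session
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise asyncio.TimeoutError("session timeout exceeded")
        # Wait for the next step, but never past the session deadline.
        item = await asyncio.wait_for(
            queue.get(), timeout=min(step_timeout, remaining)
        )
        if item is None:  # assumed sentinel: the client closed the session
            return
        yield item
```

For simplicity, both limits surface as `asyncio.TimeoutError` here; a real server would likely report which limit was hit.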

borzunov added 2 commits November 29, 2022 04:55

Add cache allocation timeout

8af3ac3

Add --{request,session,step}_timeout

ebf07d3

borzunov force-pushed the server-timeouts branch from 88d61b1 to ebf07d3 November 29, 2022 05:27

borzunov added 2 commits November 29, 2022 05:40

Enable PETALS_8BIT_BACKWARD by default

eb58097

Lower --max_batch_size and --inference_max_length defaults to 2048

5578378

borzunov changed the title ~~Add various server timeouts~~ Add various server timeouts, lower --max_batch_size and --inference_max_length defaults Nov 29, 2022

borzunov merged commit c6e1b5a into main Nov 29, 2022
