Inference: require max sequence length instead of assuming 2048 #52
Conversation
cli/run_server.py (outdated)

```python
def parse_size_as_bytes(size: str) -> int:
    """parse human-readable data size e.g. 1.5GB, based on https://stackoverflow.com/a/42865957/2002471"""
```
This cuts some corners (e.g. it treats 1GB = 1GiB). We could get it right all the time by using https://pypi.org/project/humanfriendly/ , but I'm not sure it justifies the extra dependency.
@borzunov any preferences?
- I'm okay with adding this light dependency. I think it's good if we don't have extra code in the PETALS codebase to keep it small.
- I'm (weakly) not okay with the same behavior for `GB` and `GiB`.

If you don't want a new dependency, to support both `GB` and `GiB` in a correct way, you can actually simplify the code to something like:

```python
units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
unit = unit.upper()
if 'I' in unit:
    multiplier = 1024 ** units.index(unit.replace('I', ''))
else:
    multiplier = 1000 ** units.index(unit)
```

In other words, you can use the same array of units, just with different bases :)
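Putting the two ideas together, here is a minimal self-contained sketch of a `parse_size_as_bytes` that distinguishes decimal and binary units; the regex and error handling are my own assumptions for illustration, not the PR's exact code:

```python
import re

# Decimal unit names; an 'i' infix (KiB, GiB, ...) switches the base to 1024.
UNITS = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def parse_size_as_bytes(size: str) -> int:
    """Parse a human-readable size such as '1.5GB' or '300MiB' into bytes."""
    match = re.fullmatch(r'\s*([\d.]+)\s*([A-Za-z]+)\s*', size)
    if match is None:
        raise ValueError(f"Cannot parse size: {size!r}")
    number, unit = float(match.group(1)), match.group(2).upper()
    if 'I' in unit:
        # Binary units: GiB = 1024 ** 3
        multiplier = 1024 ** UNITS.index(unit.replace('I', ''))
    else:
        # Decimal units: GB = 1000 ** 3
        multiplier = 1000 ** UNITS.index(unit)
    return int(number * multiplier)
```

With this, `parse_size_as_bytes("1GB")` yields 1,000,000,000 while `parse_size_as_bytes("1GiB")` yields 1,073,741,824, matching the reviewer's point about the two bases.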
cli/run_server.py (outdated)

```diff
@@ -8,6 +8,20 @@
 use_hivemind_log_handler("in_root_logger")
 logger = get_logger(__file__)

+import re
```
Let's move this to the top of the file
cli/run_server.py (outdated)

```python
parser.add_argument('--device', type=str, default=None, required=False,
                    help='all experts will use this device in torch notation; default: cuda if available else cpu')
parser.add_argument("--torch_dtype", type=str, default="auto",
                    help="Use this dtype to store block weights and do computations. "
                         "By default, respect the dtypes in the pre-trained state dict.")
parser.add_argument('--attention_cache_bytes', type=str, default=None,
```
Suggested change:

```diff
-parser.add_argument('--attention_cache_bytes', type=str, default=None,
+parser.add_argument('--attn_cache_size', type=str, default=None,
```
I strongly advise replacing `cache_bytes` with `cache_size` because:

- Specifying the size in bytes is standard across Python libs
- Moreover, here the size can be specified in any units

Also, I weakly advise replacing `attention` with `attn` because it's much shorter (but still understandable to everyone). This is optional though.
[done both]
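For illustration, the renamed flag can plug a size parser straight into argparse via the `type=` hook. This is a sketch, not the PR's actual code: the inline `parse_size` handles decimal units only, and the flag name follows the review suggestion above.

```python
import argparse
import re

def parse_size(size: str) -> int:
    """Minimal stand-in for the size parser discussed above (decimal units only)."""
    number, unit = re.fullmatch(r'([\d.]+)([A-Z]+)', size.upper()).groups()
    return int(float(number) * 1000 ** ['B', 'KB', 'MB', 'GB', 'TB'].index(unit))

parser = argparse.ArgumentParser()
# Renamed per the review: --attn_cache_size instead of --attention_cache_bytes.
parser.add_argument('--attn_cache_size', type=parse_size, default=None,
                    help='attention cache size, e.g. 300MB or 2GB')

args = parser.parse_args(['--attn_cache_size', '300MB'])
print(args.attn_cache_size)  # 300000000
```

Using `type=` keeps validation at the CLI boundary, so a malformed size fails at parse time rather than deep inside server startup.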
src/server/server.py (outdated)

```diff
@@ -135,13 +138,15 @@ def create(
     assert (block_indices is None) != (num_blocks is None), "please specify num_blocks or block_indices, not both"
     if expiration is None:
         expiration = max(2 * update_period, MAX_DHT_TIME_DISCREPANCY_SECONDS)
+    if inference_max_length is None:
+        inference_max_length = max_batch_size
```
Why do we assign max batch size to max sequence size?
Both are (meant to be) in tokens. Would you prefer to set it to a constant by default?
Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
src/server/handler.py (outdated)

```python
if not requested_uids:
    raise ValueError("User must specify at least one block for inference, but got none")
assert isinstance(max_length, int), f"rpc_inference metadata must contain int seq_length, got {max_length}"
```
Suggested change:

```diff
-assert isinstance(max_length, int), f"rpc_inference metadata must contain int seq_length, got {max_length}"
+assert isinstance(max_length, int), f"rpc_inference metadata must contain int max_length, got {max_length}"
```
Maximum length is now provided in `.inference_session(max_length=100)`.
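To show what the server side now expects, here is a hypothetical validation helper mirroring the checks discussed above; the plain-dict metadata shape and the function name are my assumptions for illustration, not the PR's exact code:

```python
def check_inference_metadata(metadata: dict, max_supported_length: int) -> int:
    """Validate the client-provided max_length before allocating attention caches."""
    max_length = metadata.get("max_length")
    assert isinstance(max_length, int), f"rpc_inference metadata must contain int max_length, got {max_length}"
    if not 0 < max_length <= max_supported_length:
        # A human-readable error instead of a bare assertion failure,
        # in the spirit of the PR's "more human-readable errors" item.
        raise ValueError(
            f"Cannot allocate cache for {max_length} tokens, "
            f"this server supports at most {max_supported_length}"
        )
    return max_length
```

Checking the limit up front lets the server reject an over-long session with a clear message before any cache memory is reserved.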
- Added a generic way to forward `**kwargs` to the inference session
- `run_server` can be started with a custom `max_length` for inference
- Renamed `--cache_size_bytes` to `--attention_cache_bytes` (to avoid collision with `--cache_dir`)
- `--attn_cache_bytes` can now support human-readable file sizes (e.g. 300MB instead of 314572800)
- Made some server-side errors more human-readable to the user (e.g. when max length is exceeded)
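The `**kwargs` forwarding item can be sketched as follows; the class and attribute names here are illustrative, not Petals' actual API:

```python
class InferenceSession:
    """Toy session object that accepts arbitrary configuration kwargs."""

    def __init__(self, max_length: int = 2048, **extra):
        self.max_length = max_length
        self.extra = extra  # any future options land here unchanged


class RemoteModel:
    """Toy model that forwards session kwargs down without inspecting them."""

    def inference_session(self, **kwargs):
        # New session options (e.g. max_length=100) pass straight through,
        # so adding one does not require touching intermediate layers.
        return InferenceSession(**kwargs)


session = RemoteModel().inference_session(max_length=100)
print(session.max_length)  # 100
```

The design choice is that intermediate layers stay oblivious to individual options, which is what lets `max_length` reach the session without a dedicated parameter at every level.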