[BREAKING][fix] Use jagged tensor as default tensor type#92
[BREAKING][fix] Use jagged tensor as default tensor type#92ji-huazhong merged 5 commits intoAscend:mainfrom
Conversation
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
CLA Signature Pass0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
There was a problem hiding this comment.
Pull request overview
This PR changes TransferQueue’s tensor aggregation contract so that batched tensor fields are returned as nested tensors by default (instead of using torch.stack() when shapes are uniform), aligning behavior across backends and reducing downstream branching on return types.
Changes:
- Removed the dense
torch.stack()fast-path during aggregation in both KV and async simple storage manager codepaths, preferringas_nested_tensor(..., layout=jagged)with a strided fallback. - Updated metadata/docs/comments to reflect the new “nested-by-default” behavior.
- Updated unit + e2e tests to compare nested tensors safely (per-sample comparisons) and to use more realistic variable-length test data.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| transfer_queue/utils/common.py | Updates docstring to reflect broader “tensor aggregation” use (not just torch.stack). |
| transfer_queue/storage/managers/simple_storage_manager.py | Removes torch.stack fast-path; always builds nested tensors (jagged → strided). |
| transfer_queue/storage/managers/base.py | KV merge path now always attempts nested tensors (jagged → strided) before NonTensorStack. |
| tests/test_kv_storage_manager.py | Updates fixtures/assertions for nested-by-default behavior and variable-length fields. |
| tests/test_async_simple_storage_manager.py | Updates _pack_field_values expectations: uniform tensors now return nested tensors. |
| tests/e2e/test_kv_interface_e2e.py | Updates equality/close helpers to handle nested vs dense comparisons. |
| tests/e2e/test_e2e_lifecycle_consistency.py | Updates e2e verification helpers and comparisons for nested-by-default return values. |
Comments suppressed due to low confidence (1)
tests/e2e/test_e2e_lifecycle_consistency.py:896
np_arrayis treated as a dense 2D tensor ([0, 0]), but earlier in this file it’s noted it may be returned as a nested tensor by default. For nested tensors, update the mutation/assertion to use per-sample indexing (e.g.,[0][0]) to avoid indexing errors.
# 8. np_array: verify it's a tensor now (TensorDict auto-converts numeric numpy)
# If it's a tensor, writability is guaranteed by nested tensor creation
np_arr_retrieved = retrieved["np_array"]
if isinstance(np_arr_retrieved, torch.Tensor):
np_arr_retrieved[0, 0] = 22222.0
assert np_arr_retrieved[0, 0].item() == 22222.0, "np_array (as tensor) should be writable"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
CLA Signature Pass0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Pass0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Pass0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
CLA Signature Pass0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍 |
| if all(isinstance(v, torch.Tensor) for v in chunk) and all(v.dim() == 0 for v in chunk): | ||
| return field, torch.stack(chunk) |
| { | ||
| "cell_type": "code", | ||
| "execution_count": 23, | ||
| "metadata": {}, | ||
| "source": [], | ||
| "outputs": [], | ||
| "source": [] | ||
| "execution_count": 23 | ||
| } |
Background
Previously, TransferQueue would try
torch.stack()first when merging per-sample tensors into a batched tensordict for user retrieval. As a result, tensors with uniform shapes were returned as regular dense tensors, while jagged data fell back to nested tensors. This inconsistency forced downstream code to handle two distinct data types (torch.Tensor vs. nested tensor), adding unnecessary branching logic.Changes
This PR changes the default aggregation strategy so that all tensor fields are returned as nested tensors by default, eliminating the
torch.stack()fast-path.Specifically:
KVStorageManager._merge_tensors_to_tensordict: Removed thetorch.stack(chunk)fallback. The new chain is as_nested_tensor(jagged) → nested_tensor(strided) → NonTensorStack.AsyncSimpleStorageManager._pack_field_values: Removed thetorch.stack(values)fast-path for uniform-shape tensors. The new in is as_nested_tensor(jagged) → as_nested_tensor(strided) → NonTensorStack, consistent with the KV backend.KVStorageManager, ensuring both backends behave identically when jagged layout fails (e.g., for zero-dim tensors).torch.stack-first behavior.Test updates