Labels: Bug

Description
In iris/iris.py, get_device_context() calls torch.tensor(context_data, dtype=torch.int64, device=self.device) every time it is invoked. The context data (rank, world_size, heap_bases) never changes after initialization.
This causes two problems:
- Performance: Unnecessary GPU memory allocation on every kernel call.
- CUDAGraph/HIPGraph incompatibility: `torch.tensor()` allocates GPU memory, which is forbidden during graph capture. Any kernel that calls `get_device_context()` in its launch path therefore cannot be captured in a CUDAGraph.
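A torch-free sketch of the problematic pattern (hypothetical names, not the real iris API): every call rebuilds the same context object, so a counter stands in for the GPU allocation that graph capture would reject.

```python
# Minimal sketch of the uncached pattern. `UncachedIris` is a hypothetical
# stand-in for the iris class; `allocations` counts where torch.tensor(...)
# would allocate GPU memory.

class UncachedIris:
    def __init__(self, rank, world_size, heap_bases):
        # Static after initialization, just like rank/world_size/heap_bases.
        self.context_data = [rank, world_size, *heap_bases]
        self.allocations = 0

    def get_device_context(self):
        self.allocations += 1           # torch.tensor(...) would run here
        return list(self.context_data)  # fresh object on every call

iris = UncachedIris(rank=0, world_size=2, heap_bases=[0x1000, 0x2000])
for _ in range(3):
    iris.get_device_context()
print(iris.allocations)  # 3 allocations for identical, never-changing data
```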
Impact
Blocks CUDAGraph integration for any iris-backed fused kernel (e.g., matmul_all_reduce used via vLLM's torch.compile pipeline). Minor performance overhead otherwise.
Fix
Cache the tensor on first call:
```python
def get_device_context(self):
    if self._cached_device_context is not None:
        return self._cached_device_context
    # ... existing construction logic ...
    self._cached_device_context = context_tensor
    return context_tensor
```

Add `self._cached_device_context = None` in `__init__`.
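A self-contained sketch of the cached version under the same hypothetical names: construction runs exactly once, and every later call returns the same object, so nothing allocates inside a capture.

```python
# Sketch of the caching fix. `CachedIris` is hypothetical; `allocations`
# again stands in for GPU allocations that would break graph capture.

class CachedIris:
    def __init__(self, rank, world_size, heap_bases):
        self.context_data = [rank, world_size, *heap_bases]
        self._cached_device_context = None  # filled lazily on first call
        self.allocations = 0

    def get_device_context(self):
        if self._cached_device_context is not None:
            return self._cached_device_context
        self.allocations += 1  # construction happens exactly once
        self._cached_device_context = tuple(self.context_data)
        return self._cached_device_context

iris = CachedIris(rank=0, world_size=2, heap_bases=[0x1000, 0x2000])
a = iris.get_device_context()
b = iris.get_device_context()
assert a is b and iris.allocations == 1  # one allocation, shared object
```

If the heap bases could ever change after initialization, the cache would need invalidation, but per the description above the context data is fixed once set up.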
Component
iris/iris.py