
executor should be able to put checks into prologue trace #263

Open
jjsjann123 opened this issue Apr 24, 2024 · 0 comments

🚀 Feature

Dynamic constraints require inserting executor-specific checks into the prologue trace, since different backends have different dynamic constraints.

A quick example: given a program to be compiled, where we expect the reduction axis to be a direct input to the program:

def foo(a, reducedim):
    return a.sum(reducedim)

The computation trace looks like this:

import thunder
import thunder.torch as ltorch
import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast()
def computation(a, i0, i1):
  # a: "cuda:0 f32[8, 16, 32]"
  # i0: "int 0"
  # i1: "int 1"
  t2 = ltorch.sum(a, (i0, i1), False, dtype=None)  # t2: "cuda:0 f32[32]"
    # t2 = prims.sum(a, (i0, i1))  # t2: "cuda:0 f32[32]"
  return t2

What happens next depends on which backend claims the ltorch.sum. For torchex, since ATen can handle an arbitrary reducedim given at runtime, we can reuse the cache and there's no need to insert any check on arg[1].
By contrast, nvfuserex requires the program to bake the reduction axis in as a compile-time constant, so we'd want to insert that as part of the prologue trace checks.
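To make this concrete, here is a minimal sketch (invented names, not the actual thunder API) of the kind of executor-specific check nvfuserex would need to contribute to the prologue trace. The prologue runs before the cached computation trace and decides whether the cached compilation is still valid:

```python
# Hypothetical sketch: nvfuserex compiled the fusion for reduction dims
# (0, 1), so those values are baked in as compile-time constants.
COMPILED_REDUCTION_DIMS = (0, 1)

def prologue_check(i0, i1):
    """Return True if the cached computation trace can be reused.

    torchex would skip this check entirely, since ATen accepts arbitrary
    reduction dims at runtime; nvfuserex needs the runtime values to match
    the constants it compiled against, otherwise recompilation is required.
    """
    return (i0, i1) == COMPILED_REDUCTION_DIMS

print(prologue_check(0, 1))  # cache hit: reuse the compiled trace
print(prologue_check(1, 2))  # cache miss: recompile for the new dims
```

The point is that only the claiming executor knows whether such a check is needed, which is why it should be able to push checks into the prologue trace itself.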

Alternatives

Alternative 0: we can converge on the most conservative backends and apply a simpler caching strategy at the primitive level. In the example above, we would simply require the reduction axis to stay a compile-time constant across the board, even though the cache entry could be reused for some executors. This would unfortunately give us some negative cache hits, but it would be easier to plumb through.
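A rough sketch of what Alternative 0 implies (names are invented for illustration): the cache key always includes the reduction dims, so even a torchex-only program recompiles when the dims change, although ATen could have reused one entry:

```python
# Conservative caching sketch: reducedim is always part of the cache key,
# regardless of which executor claims the reduction.
cache = {}

def cached_compile(shape, dtype, reducedim):
    key = (shape, dtype, reducedim)  # dims specialize the entry unconditionally
    if key not in cache:
        # Placeholder for the actual trace compilation.
        cache[key] = f"compiled trace for dims={reducedim}"
    return cache[key]

cached_compile((8, 16, 32), "f32", (0, 1))
cached_compile((8, 16, 32), "f32", (0, 2))  # second entry: a negative cache
                                            # hit for torchex, which could
                                            # have reused the first one
print(len(cache))
```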

Alternative 1: thunder as a system can establish a caching strategy. When a backend sees a cache requirement on a certain op that it cannot fulfill, the backend could simply reject the operation.
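Alternative 1 could look roughly like this (a hypothetical interface, not thunder's actual executor protocol): the system attaches a cache requirement to the op, and claiming walks the executor list until one accepts:

```python
# Hypothetical executor-rejection sketch for Alternative 1.
class NvFuserEx:
    def claim(self, op, requirement):
        # nvFuser bakes reduction dims in at compile time, so it rejects
        # any op whose cache requirement says the dims may vary at runtime.
        return requirement != "dynamic_reduction_dims"

class TorchEx:
    def claim(self, op, requirement):
        # ATen handles runtime reduction dims, so any requirement is fine.
        return True

executors = [NvFuserEx(), TorchEx()]  # priority order
claimed_by = next(ex for ex in executors
                  if ex.claim("sum", "dynamic_reduction_dims"))
print(type(claimed_by).__name__)  # the op falls through to TorchEx
```

The design trade-off versus Alternative 0 is that the cache stays maximally reusable, at the cost of threading requirement metadata through the claiming machinery.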
