An environment variable that dumps the various Thunder-provided debug traces to a log file. It could support several levels, for example:
export THUNDER_DEBUG=<option>
0/'' : Disable
1/'trace' : Enable and dump the Thunder-generated trace. Can be limited to the trace after the delete-last-used pass
2/'nvfuser_region' : Enable and dump the nvFuser-captured regions in addition to 1
3/'nvfuser_code' : Enable and dump the nvFuser-generated CUDA kernel code in addition to 1 and 2
4/'torch_compile_debug' : Enable the torch.compile debug logging (TORCH_COMPILE_DEBUG=1)
These are only a few examples of possible debug log levels. Each of these logs could be written to a separate file.
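A minimal sketch of how the option could be parsed, in Python since Thunder is a Python library. Everything here is hypothetical: the function names parse_thunder_debug, debug_outputs and extra_env, and the log file names, are made up for illustration. Accepting both the number and the name is an assumption, and so is treating every level as cumulative (the list says so for levels 2 and 3, but not for level 4). Only TORCH_COMPILE_DEBUG=1 comes from the proposal itself.

```python
import os

# Option names from the proposed THUNDER_DEBUG levels. Accepting either the
# number or the name is an assumption of this sketch.
_LEVEL_NAMES = {
    "": 0,
    "trace": 1,
    "nvfuser_region": 2,
    "nvfuser_code": 3,
    "torch_compile_debug": 4,
}

# Hypothetical file names: one log file per kind of output.
_LOG_FILES = {
    "trace": "thunder_trace.log",
    "nvfuser_region": "thunder_nvfuser_regions.log",
    "nvfuser_code": "thunder_nvfuser_code.log",
}


def parse_thunder_debug(value):
    """Turn a THUNDER_DEBUG value such as '2' or 'nvfuser_region' into a level."""
    value = value.strip()
    if value.isdigit():
        level = int(value)
    elif value in _LEVEL_NAMES:
        level = _LEVEL_NAMES[value]
    else:
        raise ValueError(f"unknown THUNDER_DEBUG option: {value!r}")
    if level not in _LEVEL_NAMES.values():
        raise ValueError(f"THUNDER_DEBUG level out of range: {level}")
    return level


def debug_outputs(environ=None):
    """Return {output_name: log_file} for the outputs enabled by THUNDER_DEBUG.

    Assumes levels are cumulative: level N enables every output of a lower level.
    """
    environ = os.environ if environ is None else environ
    level = parse_thunder_debug(environ.get("THUNDER_DEBUG", ""))
    outputs = {}
    for name in ("trace", "nvfuser_region", "nvfuser_code"):
        if level >= _LEVEL_NAMES[name]:
            outputs[name] = _LOG_FILES[name]
    return outputs


def extra_env(environ=None):
    """Extra environment settings implied by THUNDER_DEBUG (level 4 only)."""
    environ = os.environ if environ is None else environ
    level = parse_thunder_debug(environ.get("THUNDER_DEBUG", ""))
    return {"TORCH_COMPILE_DEBUG": "1"} if level >= 4 else {}
```

For example, debug_outputs({"THUNDER_DEBUG": "2"}) enables the trace and nvFuser region logs but not the CUDA kernel code log. Because the setting comes from the environment, the training code itself does not need to change.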
Motivation
To get the trace and other debugging information today, we need to add code that captures the trace and prints it after running a model iteration with the inputs.
This is cumbersome because the training code has to be edited to enable tracing and edited again when finished.
Finding where an iteration finishes and inserting the tracing code at the right place is not always possible, since Thunder aims to compile an increasingly complex set of repositories. For example, when using libraries like Lightning Trainer, the user may just want to call model.train(), and editing the iteration loop can be difficult.
I see three issues with using add_post_optimization_transform:
Currently, post_optimization_transform is not applied to prologue_trace, so we won't be able to save it.
The transform is applied to the forward and backward traces independently, but we don't explicitly indicate whether a given trace is the forward or the backward one. We could probably derive this from the trace signature, but I don't think that is a good idea.
Also, when multiple post_optimization_transforms are used, the user has to make sure the saving transform runs last; otherwise it would miss the changes made by any transforms applied after it.
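The ordering problem in the third point can be shown with a toy model in plain Python. This is not Thunder's API: a "trace" here is just a list of op names, and add_cast, fuse, make_saver, apply_transforms and run_pipeline are hypothetical names for illustration. A saving transform only sees what the transforms before it produced. run_pipeline shows one assumed alternative: dump once after all transforms, with the trace kind passed explicitly.

```python
# Toy model only: a "trace" is a list of op names and a transform is a
# function trace -> trace. These names are hypothetical, not Thunder APIs.


def add_cast(trace):
    # Stand-in for a transform that appends an op.
    return trace + ["cast"]


def fuse(trace):
    # Stand-in for a transform that rewrites the whole trace into one region.
    return ["fused(" + ", ".join(trace) + ")"]


def make_saver(store):
    # A "saving" transform: snapshots the trace as it looks at this point.
    def save_trace(trace):
        store.append(list(trace))
        return trace

    return save_trace


def apply_transforms(trace, transforms):
    # Transforms run in list order, each seeing the previous one's output.
    for transform in transforms:
        trace = transform(trace)
    return trace


def run_pipeline(trace, transforms, kind, dump):
    # Alternative: dump once after all transforms, passing the trace kind
    # ("prologue", "forward", "backward") explicitly instead of inferring it.
    trace = apply_transforms(trace, transforms)
    dump(kind, trace)
    return trace


store = []
final = apply_transforms(["mul", "add"], [add_cast, make_saver(store), fuse])
print(store)  # [['mul', 'add', 'cast']]  (the fusion is missing)
print(final)  # ['fused(mul, add, cast)']
```

Because the saver sits before fuse, its snapshot never contains the fused region. Dumping at the end of the pipeline sidesteps the ordering requirement, and passing the kind explicitly avoids guessing it from the trace signature.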
cc @mruberry
cc @carmocca @apaz-cli