fix(ler): force num_workers=0 when torch.compile is active to prevent segfault #31
ivanbasov wants to merge 1 commit into
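The guard described in the title can be sketched in plain Python. This is a minimal sketch, not the code in this PR. It assumes that PREDECODER_TORCH_COMPILE (the variable named in the commit message) is how the code base decides whether torch.compile is active, that "0", "false", "off" and "no" disable it, and that an unset variable means enabled. The helpers torch_compile_enabled and resolve_num_workers, and the names val_dataset and cfg in the usage comment, are hypothetical.

```python
import os
import warnings
from typing import Mapping, Optional

# Assumption: PREDECODER_TORCH_COMPILE controls torch.compile; these values
# disable it, and an unset variable is treated as "enabled".
_DISABLED_VALUES = {"0", "false", "off", "no"}


def torch_compile_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless PREDECODER_TORCH_COMPILE holds a 'disabled' value."""
    source = os.environ if env is None else env
    value = source.get("PREDECODER_TORCH_COMPILE", "1").strip().lower()
    return value not in _DISABLED_VALUES


def resolve_num_workers(requested: int, compile_enabled: bool) -> int:
    """Hypothetical guard: drop DataLoader workers to 0 while torch.compile is on.

    With num_workers=0 the DataLoader loads batches in the main process, so no
    worker processes are started alongside the compiled model.
    """
    if requested < 0:
        raise ValueError("num_workers must be >= 0")
    if compile_enabled and requested > 0:
        warnings.warn(
            f"torch.compile is enabled; forcing num_workers=0 (requested {requested})",
            RuntimeWarning,
            stacklevel=2,
        )
        return 0
    # Compilation is off (or no workers were requested): keep the caller's value.
    return requested


# Usage (not executed here; requires PyTorch):
#   from torch.utils.data import DataLoader
#   loader = DataLoader(
#       val_dataset,
#       num_workers=resolve_num_workers(cfg.num_workers, torch_compile_enabled()),
#   )
```

Because the guard only fires when compilation is on, worker parallelism is kept whenever torch.compile is disabled; the cost is single-process data loading during compiled runs.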
If we are adding this to all orientations (O1-O4), when does torch compilation remain enabled?
7f0f6c8 to 8410009
8410009 to ec856fa
…fault

torch.compile=on combined with DataLoader spawn workers during LER validation causes a segfault (20 leaked semaphores, core dumped).

Set PREDECODER_TORCH_COMPILE=0 for the "Train all orientations" step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
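Setting PREDECODER_TORCH_COMPILE=0 for one step, rather than exporting it for the whole job, leaves every other step with its default environment. The Python sketch below illustrates only that scoping; it is not the project's CI configuration. step_env and run_step are hypothetical helpers, and the child process is a stand-in that merely prints the value it receives.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def step_env(overrides: Mapping[str, str],
             base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with `overrides` applied.

    The caller's environment is never mutated, so the override reaches only
    the child process that is given the returned mapping.
    """
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def run_step(cmd: Sequence[str],
             overrides: Mapping[str, str]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run one pipeline step with a step-scoped environment."""
    return subprocess.run(list(cmd), env=step_env(overrides),
                          check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for the "Train all orientations" step: the child process just
    # reports the value it sees for PREDECODER_TORCH_COMPILE.
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('PREDECODER_TORCH_COMPILE'))"]
    result = run_step(probe, {"PREDECODER_TORCH_COMPILE": "0"})
    print("child sees:", result.stdout.strip())
    print("parent sees:", os.environ.get("PREDECODER_TORCH_COMPILE"))
```

In a CI definition the same effect usually comes from a per-step environment setting; note that with this approach compilation is still disabled for every orientation trained inside that step.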
ec856fa to 6161a3e
Thanks for the great point! I originally attempted to fix this by disabling torch.compile in the CI
I am not sure how we ran into this; the following PR was made to test for this and make sure we had removed the problem:
Removing parallelism is a nice quick hack, but we need it for production.
This seems to be fixed already by #29.
🤖 Generated with Claude Code