fix: handle non-existent path in is_nfs_path for Triton autotune cache #7921

Merged
tohtana merged 2 commits into deepspeedai:master from Krishnachaitanyakc:fix/triton-autotune-nfs-check-nonexistent-path
Mar 25, 2026
@Krishnachaitanyakc
Contributor

Summary

  • `is_nfs_path()` in `matmul_ext.py` passes the cache directory path to `df -T` before the directory is created, causing `df: /root/.triton/autotune: No such file or directory` errors on stderr
  • Fix by walking up to the nearest existing ancestor directory before invoking `df`, which correctly resolves the filesystem type without requiring the target path to exist
  • Also suppress stderr via `subprocess.DEVNULL` and catch `FileNotFoundError` for environments where `df` is unavailable (e.g., minimal containers)
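
The three points above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the helper names `nearest_existing_ancestor` and `is_nfs_path_sketch` are hypothetical, and the output parsing assumes a GNU-style `df -T` whose second column is the filesystem type.

```python
import os
import subprocess


def nearest_existing_ancestor(path):
    """Return `path` itself if it exists, otherwise its closest existing ancestor.

    Hypothetical helper for illustration; not taken from the PR diff.
    """
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root, nothing left to walk
            break
        path = parent
    return path


def is_nfs_path_sketch(path):
    """Best-effort NFS check that returns False when the type cannot be determined.

    Assumes GNU-style `df -T` output (header line, then a line whose second
    column is the filesystem type).
    """
    check_path = nearest_existing_ancestor(path)
    try:
        output = subprocess.check_output(
            ["df", "-T", check_path],
            encoding="utf-8",
            stderr=subprocess.DEVNULL,  # keep df's error messages off the terminal
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Non-zero exit from df, or no df binary at all (e.g. minimal containers).
        return False
    lines = output.strip().splitlines()
    if len(lines) < 2:
        return False
    fields = lines[-1].split()
    return len(fields) > 1 and "nfs" in fields[1].lower()
```

Because the cache directory will be created beneath its nearest existing ancestor, it normally ends up on the same filesystem, so checking the ancestor should give the same answer as checking the directory after creation.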

Root Cause

In `AutotuneCacheManager.__init__`, `TritonCacheDir.warn_if_nfs(self.cache_dir)` is called before `os.makedirs(self.cache_dir, exist_ok=True)`. The `is_nfs_path()` function then runs `df -T` on a path that does not yet exist, which causes `df` to print an error to stderr. While the `CalledProcessError` exception was caught, the stderr output still leaked to the user's terminal.
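
The leak described above comes from how `subprocess.check_output` handles stderr. It captures stdout only, so unless told otherwise the child process inherits the parent's stderr. The sketch below reproduces that behavior with a stand-in command instead of `df`, so that it runs anywhere Python does. `FAILING_CMD` and `run_and_catch` are hypothetical names used only for this illustration.

```python
import subprocess
import sys

# Stand-in for `df -T <missing path>`: writes a message to stderr and exits non-zero.
FAILING_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('No such file or directory\\n'); sys.exit(1)",
]


def run_and_catch(cmd, stderr=None):
    """Run `cmd`, returning True on success and False on a non-zero exit."""
    try:
        subprocess.check_output(cmd, stderr=stderr)
        return True
    except subprocess.CalledProcessError:
        return False


# The exception is caught, but the child inherits our stderr,
# so its message still reaches the terminal.
leaky = run_and_catch(FAILING_CMD)

# Same outcome for the caller, but the child's stderr is discarded.
quiet = run_and_catch(FAILING_CMD, stderr=subprocess.DEVNULL)
```

Both calls return `False`. Only the first lets the child's message through, which is why catching the exception alone was not enough.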

Changes

  • `deepspeed/ops/transformer/inference/triton/matmul_ext.py`: Walk up to nearest existing ancestor before calling `df -T`; suppress stderr; catch `FileNotFoundError`

Testing

  • Python syntax validation: PASS
  • yapf formatting check: PASS (no diff)
  • flake8: PASS (no warnings)

Fixes #7642

deepspeedai#7642)

When the Triton autotune cache directory does not exist yet,
`is_nfs_path()` passes a non-existent path to `df -T`, which
fails with "No such file or directory" on stderr.

Fix by walking up to the nearest existing ancestor directory
before calling `df`. Also suppress stderr output and catch
`FileNotFoundError` for environments where `df` is unavailable.

Fixes deepspeedai#7642

Signed-off-by: Krishna Chaitanya Balusu <krishnabkc15@gmail.com>
Collaborator

@tohtana tohtana left a comment

Looks good to me. Thank you for your contribution, @Krishnachaitanyakc!

@tohtana tohtana merged commit f887b98 into deepspeedai:master Mar 25, 2026
10 of 11 checks passed
nathon-lee pushed a commit to nathon-lee/DeepSpeed_woo that referenced this pull request Mar 27, 2026
deepspeedai#7921)

### Summary

- `is_nfs_path()` in `matmul_ext.py` passes the cache directory path to
`df -T` before the directory is created, causing `df:
/root/.triton/autotune: No such file or directory` errors on stderr
- Fix by walking up to the nearest existing ancestor directory before
invoking `df`, which correctly resolves the filesystem type without
requiring the target path to exist
- Also suppress stderr via `subprocess.DEVNULL` and catch
`FileNotFoundError` for environments where `df` is unavailable (e.g.,
minimal containers)

### Root Cause

In `AutotuneCacheManager.__init__`,
`TritonCacheDir.warn_if_nfs(self.cache_dir)` is called before
`os.makedirs(self.cache_dir, exist_ok=True)`. The `is_nfs_path()`
function then runs `df -T` on a path that does not yet exist, which
causes `df` to print an error to stderr. While the `CalledProcessError`
exception was caught, the stderr output still leaked to the user's
terminal.

### Changes

- `deepspeed/ops/transformer/inference/triton/matmul_ext.py`: Walk up to
nearest existing ancestor before calling `df -T`; suppress stderr; catch
`FileNotFoundError`

### Testing

- Python syntax validation: PASS
- yapf formatting check: PASS (no diff)
- flake8: PASS (no warnings)

Fixes deepspeedai#7642

Signed-off-by: Krishna Chaitanya Balusu <krishnabkc15@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
nathon-lee pushed a commit to nathon-lee/DeepSpeed_woo that referenced this pull request Mar 28, 2026
Signed-off-by: nathon-lee <leejianwoo@gmail.com>
Development

Successfully merging this pull request may close these issues.

[BUG]deepspeed/ops/transformer/inference/triton/matmul_ext.py -> df: /root/.triton/autotune: No such file or directory