Skip to content

Close Files.walk stream in GitVersionControlLocalFileStorage.deleteRepo #5548

@aglinxinyuan

Description

@aglinxinyuan

Background

While reviewing #5447, Yicong-Huang flagged that the Files.walk(...) stream-leak pattern Copilot caught in my test cleanup helpers may exist elsewhere. A sweep of the codebase confirmed one production usage that leaks the underlying directory handle:

common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/util/dataset/GitVersionControlLocalFileStorage.java, lines 80-85:

public static void deleteRepo(Path directoryPath) throws IOException {
    Files.walk(directoryPath)
        .sorted(Comparator.reverseOrder())
        .map(Path::toFile)
        .forEach(File::delete);
}

Files.walk(...) returns a closeable java.util.stream.Stream backed by an open directory handle. Without an explicit close(), the handle stays open until GC — which can flake temp-dir deletion on Windows and leak file descriptors on long-lived JVMs (e.g. the dataset service that calls deleteRepo whenever a Git-backed dataset is removed).

Every other Files.walk / Files.list usage in the codebase already wraps the stream in try/finally and closes it (verified across PveManager.scala, HuggingFaceModelResource.scala, HuggingFaceModelResourceSpec.scala).

What needs to change

Convert deleteRepo to use try-with-resources so the stream is closed even if iteration throws:

public static void deleteRepo(Path directoryPath) throws IOException {
    try (var stream = Files.walk(directoryPath)) {
        stream
            .sorted(Comparator.reverseOrder())
            .map(Path::toFile)
            .forEach(File::delete);
    }
}

No behavior change for callers; just the stream-lifecycle fix.

Scope

  • Single-method edit in GitVersionControlLocalFileStorage.java.
  • No new tests required (existing dataset deletion paths already exercise this code).

Metadata

Metadata

Assignees

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions