Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iterate over directory and delete files no longer works #288

Closed
captain-igloo opened this issue May 31, 2023 · 2 comments
Closed

iterate over directory and delete files no longer works #288

captain-igloo opened this issue May 31, 2023 · 2 comments

Comments

@captain-igloo
Copy link

captain-igloo commented May 31, 2023

I am using boost::filesystem to loop over files in a directory and conditionally delete them. It works in boost 1.71 but in boost 1.74 it crashes after the first file has been deleted. Is this a bug or am I doing something wrong? What is the recommended way to iterate over a directory and delete files?

$ git clone https://github.com/captain-igloo/boost-filesystem-bug.git

$ cd boost-filesystem-bug

$ docker build -t focal  -f focal.Dockerfile  .

$ docker build -t jammy  -f jammy.Dockerfile  .

# boost 1.71, files deleted
$ docker run -it focal /test-boost/a.out
"/tmp/b"
filesystem::recursive_directory_iterator directory error: No such file or directory
"/tmp/a"
filesystem::recursive_directory_iterator directory error: No such file or directory
"/tmp/c"
filesystem::recursive_directory_iterator directory error: No such file or directory

# boost 1.74, first file deleted, then crashes
$ docker run -it jammy  /test-boost/a.out
"/tmp/b"
filesystem::recursive_directory_iterator increment error: No such file or directory
a.out: /usr/include/boost/filesystem/directory.hpp:534: void boost::filesystem::recursive_directory_iterator::disable_recursion_pending(bool): Assertion `(!is_end())&&("disable_recursion_pending() on end recursive_directory_iterator")' failed.

main.cpp:

#include <iostream>
#include <boost/filesystem.hpp>

int main() {
    boost::filesystem::path path("/tmp");
    boost::filesystem::recursive_directory_iterator it(path), ite;
    while(it != ite) {
        if (!boost::filesystem::is_directory(*it)) {
            std::cout << it->path() << std::endl;
            boost::filesystem::remove(it->path());
        }
        try {
            ++it;
        } catch(std::exception& e) {
            std::cout << e.what() << std::endl;
            it.no_push();
            ++it;
        }
    }
    return 0;
}
@Lastique
Copy link
Member

Lastique commented Jun 3, 2023

I don't think this code is correct. recursive_directory_iterator must query the file type on every increment, which means the file must be present on the filesystem. Effectively, this is the same as if some other process removed the file while directory iteration was in progress, and this counts as a filesystem race.

The fixed code would increment the iterator before removing the file. Additionally, if the increment throws, the iterator may become an end iterator, which you should check in the catch block.

#include <iostream>
#include <boost/filesystem.hpp>

int main() {
    boost::filesystem::path path("test");
    boost::filesystem::recursive_directory_iterator it(path), ite;
    while(it != ite) {
        boost::filesystem::path p = it->path();
        try {
            ++it;
        } catch(std::exception& e) {
            std::cout << e.what() << std::endl;
            if (it == ite)
                break;
            it.disable_recursion_pending();
            ++it;
        }
        if (!boost::filesystem::is_directory(p)) {
            std::cout << p << std::endl;
            boost::filesystem::remove(p);
        }
    }
    return 0;
}

That said, the original code does not have to fail because the implementation is allowed to cache some information in the directory_entry, if it is available during iteration without extra effort. That is, if the file type becomes available as a side effect of directory iteration, recursive_directory_iterator may reuse that information without having to query the filesystem. And in fact, that information is partially available on some filesystems and when glibc is used - directory_iterator does cache the file type in directory_entry it returns. Unfortunately, the permissions are not available, and only the symlink file type is available, so recursive_directory_iterator ends up querying the filesystem anyway. This is something I can fix, if only to improve performance. But I want to make it clear that this information is not available universally (i.e. some filesystems may not report anything at all and on some OSes/libcs this API is entirely missing), which means this behavior may vary between systems and filesystems and cannot be relied upon.

Lastique added a commit that referenced this issue Jun 4, 2023
… dir_it.

This commit changes behavior of directory_entry constructors and modifiers
that change the stored path: the methods will now automatically query
the filesystem for the file status instead of leaving the cached data
default-initialized. This means that the paths passed to directory_entry
must be valid, otherwise an error will be returned. Filesystem querying
is implemented in the new directory_entry::refresh methods.

The constructors and modifiers that accepted file_status arguments are
now marked deprecated. The cached file statuses are an implementation detail,
and eventually we may want to add more cached data, as we add more observers
to directory_entry.

Also added a few file type observers to directory_entry. These observers
allow to avoid querying the filesystem if the full file status is not cached
but the file type is (i.e. when permissions are not cached). This is the case
with readdir-based implementation of directory_iterator, if the underlying
C library supports dirent::d_type field.

recursive_directory_iterator has been updated to use the added file type
observers instead of querying the full status. This may improve performance
of directory iteration.

Closes #288.
@captain-igloo
Copy link
Author

Thanks a lot for your help. I have changed my code to increment the iterator first and it works nicely now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants