recursive_directory_iterator::increment fails assertion on corrupted file system #113

@xRodney

Description

From operations.cpp

  // Invariant: On return, the top of the iterator stack is the next valid (possibly
  // end) iterator, regardless of whether or not an error is reported, and regardless of
  // whether any error is reported by exception or error code. In other words, progress
  // is always made so a loop on the iterator will always eventually terminate
  // regardless of errors.
  BOOST_FILESYSTEM_DECL
  void recur_dir_itr_imp::increment(system::error_code* ec)

This invariant does not hold on corrupted filesystems, where a nested directory can be opened but cannot be traversed fully.
One example of such a "corrupted" filesystem is Apple's implementation of quarantine on macOS High Sierra and later.

Steps to reproduce

Consider this program:

#define BOOST_FILESYSTEM_VERSION 3

//  As an example program, we don't want to use any deprecated features
#ifndef BOOST_FILESYSTEM_NO_DEPRECATED 
#  define BOOST_FILESYSTEM_NO_DEPRECATED
#endif
#ifndef BOOST_SYSTEM_NO_DEPRECATED 
#  define BOOST_SYSTEM_NO_DEPRECATED
#endif

#include "boost/filesystem/operations.hpp"
#include "boost/filesystem/path.hpp"
#include "boost/progress.hpp"
#include <iostream>

namespace fs = boost::filesystem;

int main(int argc, char* argv[])
{
  fs::path p(fs::current_path());

  if (argc > 1)
    p = fs::system_complete(argv[1]);
  else
    std::cout << "\nusage:   recursive_ls [path]" << std::endl;

  unsigned long file_count = 0;
  unsigned long dir_count = 0;
  unsigned long other_count = 0;
  unsigned long err_count = 0;

  if (!fs::exists(p))
  {
    std::cout << "\nNot found: " << p << std::endl;
    return 1;
  }

  if (fs::is_directory(p))
  {
    boost::system::error_code ec;
    std::cout << "\nIn directory: " << p << "\n\n";
    fs::recursive_directory_iterator end_iter;
    for (fs::recursive_directory_iterator dir_itr(p);
          dir_itr != end_iter;)
    {
      try
      {
        if (fs::is_directory(dir_itr->status()))
        {
          ++dir_count;
          std::cout << dir_itr->path() << " [directory]\n";
        }
        else if (fs::is_regular_file(dir_itr->status()))
        {
          ++file_count;
          std::cout << dir_itr->path() << "\n";
        }
        else
        {
          ++other_count;
          std::cout << dir_itr->path() << " [other]\n";
        }

        boost::system::error_code ec;
        dir_itr.increment(ec);
        if (ec)
        {
            ++err_count;
            std::cout << "*" << dir_itr->path() << " *" << ec.message() << std::endl;
        }
      }
      catch (const std::exception & ex)
      {
        ++err_count;
        std::cout << dir_itr->path() << " " << ex.what() << std::endl;
      }
    }
    std::cout << "\n" << file_count << " files\n"
              << dir_count << " directories\n"
              << other_count << " others\n"
              << err_count << " errors\n";
  }
  else // must be a file
  {
    std::cout << "\nFound: " << p << "\n";    
  }
  return 0;
}


1. Real-world example (macOS High Sierra)

  1. Download a terminal application from https://www.iterm2.com (in theory, any app will do)
  2. Do not install it!
  3. Just run the downloaded file from wherever it was downloaded to
  4. I repeat, do not let macOS move it to /Applications or anywhere else
  5. Check that the mount command gives you something like:
/Users/<user>/Downloads/iTerm.app on /private/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/317B211C-C40C-4662-8947-416AC32DD07B (nullfs, local, nodev, nosuid, read-only, nobrowse, mounted by iscan)
  6. Leave the app running
  7. Run ./recursive_ls /var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/ (IDs may vary, but the path must end with AppTranslocation)

Output:


In directory: "/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/"

"/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/317B211C-C40C-4662-8947-416AC32DD07B" [directory]
"/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/317B211C-C40C-4662-8947-416AC32DD07B/d" [directory]
"/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/317B211C-C40C-4662-8947-416AC32DD07B/d/iTerm.app" [directory]

... shortened ...

"/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/317B211C-C40C-4662-8947-416AC32DD07B/d/iTerm.app/Contents/Frameworks/ColorPicker.framework/ColorPicker"
"/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/317B211C-C40C-4662-8947-416AC32DD07B/d/iTerm.app/Contents/Info.plist"
"/var/folders/ys/08961svn0qx6_1g9sg6j32sw0000gn/T/AppTranslocation/317B211C-C40C-4662-8947-416AC32DD07B/d/iTerm.app/Contents/PkgInfo"
Assertion failed: ((m_imp.get())&&("attempt to dereference end iterator")), function dereference, file ../../../boost/filesystem/operations.hpp, line 1013.
*Abort trap: 6

2. Artificial example (any Linux or Mac with FUSE)

I have been able to reproduce this behaviour fairly closely with fusepy, using the attached script:
memoryfs.py.txt

Python 3 and pip are required.

sudo -s                               # Switch to root
pip3 install fusepy
python3 memoryfs.py mountpoint/ &     # The file memoryfs.py must be downloaded to the current directory
mkdir -p mountpoint/dirxx/dir2
mkdir -p mountpoint/diryy/dir2
./recursive_ls

Output:


usage:   recursive_ls [path]

In directory: "/home/rodney/Projects/boost/libs/filesystem/test"

"/home/rodney/Projects/boost/libs/filesystem/test/mountpoint" [directory]
"/home/rodney/Projects/boost/libs/filesystem/test/mountpoint/dirxx" [directory]
*"/home/rodney/Projects/boost/libs/filesystem/test/mountpoint/dirxx" *Input/output error
"/home/rodney/Projects/boost/libs/filesystem/test/mountpoint/dirxx" [directory]
recursive_ls: ../../../boost/filesystem/operations.hpp:1013: boost::filesystem::directory_entry& boost::filesystem::directory_iterator::dereference() const: Assertion `(m_imp.get())&&("attempt to dereference end iterator")' failed.
Aborted

Analysis

In both cases, opendir succeeds and readdir (or readdir_r) returns the first few entries of the nested directory. After those entries, readdir(_r) fails with an error.
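To make the failure mode concrete, here is a minimal POSIX sketch of how a caller can tell "end of directory" from "read error" with readdir. The helper name read_all_entries is hypothetical and is not part of Boost.Filesystem; it only illustrates that a directory can open fine, yield some entries, and then fail partway through.

```cpp
#include <cerrno>
#include <string>
#include <vector>
#include <dirent.h>

// Hypothetical helper, not part of Boost.Filesystem.
// Appends every entry name of `dir_path` to `names`.
// Returns 0 on success, or the errno value of the first failure.
// On failure, `names` still holds the entries read before the error.
int read_all_entries(const std::string& dir_path, std::vector<std::string>& names)
{
    DIR* dir = ::opendir(dir_path.c_str());
    if (!dir)
        return errno;               // opendir itself failed

    int result = 0;
    for (;;)
    {
        errno = 0;                  // readdir reports errors only through errno
        const dirent* entry = ::readdir(dir);
        if (!entry)
        {
            // NULL with errno == 0 is a normal end of directory;
            // NULL with errno != 0 is a read error in the middle of the listing.
            result = errno;
            break;
        }
        names.push_back(entry->d_name);
    }
    ::closedir(dir);
    return result;
}
```

On the filesystems described above, I would expect a call like this to return a non-zero code (Input/output error in the FUSE case) after names has already received a few entries.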

In Boost.Filesystem, this failure activates the following condition:

void directory_iterator_increment(directory_iterator& it,
    system::error_code* ec)

    /* .... */

    if (increment_ec)  // happens if filesystem is corrupt, such as on a damaged optical disc
    {
      boost::intrusive_ptr< detail::dir_itr_imp > imp;
      imp.swap(it.m_imp);
      path error_path(imp->dir_entry.path().parent_path());  // fix ticket #5900
      if (ec == 0)
        BOOST_FILESYSTEM_THROW(
          filesystem_error("boost::filesystem::directory_iterator::operator++",
            error_path,
            increment_ec));
      *ec = increment_ec;
      return;
    }

The error is then handled here:

  void recur_dir_itr_imp::increment(system::error_code* ec)
  {
    /* ..... */

    //  Do the actual increment operation on the top iterator in the iterator
    //  stack, popping the stack if necessary, until either the stack is empty or a
    //  non-end iterator is reached.
    while (!m_stack.empty())
    {
      directory_iterator& it = m_stack.top();
      detail::directory_iterator_increment(it, ec);
      if (ec && *ec)     /// <<<<<----------------- here
        return;
      if (it != directory_iterator())
        break;

      m_stack.pop();
      --m_level;
    }

At the marked point, it equals directory_iterator(), but we return too early to progress to another (possibly valid) directory. The end iterator stays on top of the stack, so the next dereference fails the assertion.
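The difference can be sketched with a toy model that uses only the standard library. Everything here (toy_dir_iterator, increment_early_return, increment_keep_progress, make_example_stack) is a hypothetical stand-in, not Boost code, and the second function is just one possible way to keep the invariant, not necessarily how the library should or does fix it.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Toy stand-in for directory_iterator: walks a fixed list of names and
// fails when asked to advance to index `fail_at`.
class toy_dir_iterator
{
public:
    toy_dir_iterator(std::vector<std::string> entries, std::size_t fail_at)
        : m_entries(std::move(entries)), m_fail_at(fail_at) {}

    bool at_end() const { return m_pos >= m_entries.size(); }

    const std::string& dereference() const
    {
        assert(!at_end() && "attempt to dereference end iterator");
        return m_entries[m_pos];
    }

    // Like directory_iterator_increment: on failure the iterator becomes
    // the end iterator and the error is reported through ec.
    void increment(std::error_code& ec)
    {
        ec.clear();
        if (m_pos + 1 == m_fail_at)
        {
            m_pos = m_entries.size();
            ec = std::make_error_code(std::errc::io_error);
            return;
        }
        ++m_pos;
    }

private:
    std::vector<std::string> m_entries;
    std::size_t m_fail_at;
    std::size_t m_pos = 0;
};

typedef std::stack<toy_dir_iterator> toy_stack;

// Behaviour described above: return as soon as an error is seen,
// leaving an end iterator on top of the stack.
void increment_early_return(toy_stack& stack, std::error_code& ec)
{
    while (!stack.empty())
    {
        stack.top().increment(ec);
        if (ec)
            return;
        if (!stack.top().at_end())
            break;
        stack.pop();
    }
}

// One possible alternative: remember the first error, but keep popping
// exhausted iterators so the top of the stack is valid or the stack is empty.
void increment_keep_progress(toy_stack& stack, std::error_code& ec)
{
    ec.clear();
    while (!stack.empty())
    {
        std::error_code local_ec;
        stack.top().increment(local_ec);
        if (local_ec && !ec)
            ec = local_ec;
        if (!stack.top().at_end())
            break;
        stack.pop();
    }
}

// Parent directory with two entries, plus a nested directory whose third
// entry cannot be read.
toy_stack make_example_stack()
{
    const std::size_t never = static_cast<std::size_t>(-1);
    toy_stack stack;
    stack.push(toy_dir_iterator({"dirxx", "diryy"}, never));
    stack.push(toy_dir_iterator({"a", "b", "c"}, 2));
    return stack;
}
```

In this model, two calls to increment_early_return on make_example_stack() leave an end iterator on top of the stack, which is the state that trips the assertion on the next dereference. Two calls to increment_keep_progress report the same error but leave the parent iterator on top, pointing at "diryy", so a loop over the iterator can continue.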
