Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directories are not copied in export() function #663

Open
paul-demo opened this issue Jan 11, 2024 · 4 comments
Open

Directories are not copied in export() function #663

paul-demo opened this issue Jan 11, 2024 · 4 comments

Comments

@paul-demo
Copy link

paul-demo commented Jan 11, 2024

I found that only directories (empty, with none of the expected contents) are copied, due to what I think is a bug in capi2/core.py:

if not os.path.exists(dst) or not cmp(src, dst):

From lines 108 to 125 you have the following, so I assume you intend to support directories (since you handle the case that it is a directory and fall back to copytree()).

            if not os.path.isabs(f):
                if os.path.exists(os.path.join(self.core_root, f)):
                    src = os.path.join(self.core_root, f)
                elif os.path.exists(os.path.join(self.files_root, f)):
                    src = os.path.join(self.files_root, f)
                else:
                    _dirs = self.core_root
                    if self.files_root != self.core_root:
                        _dirs += " or " + self.files_root
                    raise RuntimeError(f"Cannot find {f} in {_dirs}")

                dst = os.path.join(dst_dir, f)
                # Only update if file is changed or doesn't exist
                if not os.path.exists(dst) or not cmp(src, dst):
                    try:
                        shutil.copy2(src, dst)
                    except IsADirectoryError:
                        shutil.copytree(src, dst, dirs_exist_ok=True)

However, filecmp.cmp() is not the right function to use for comparing directories. You would need to detect whether it is a file or directory and instead you should probably use filecmp.dircmp() if a directory. Or, you could skip the comparison entirely and just copy it over!

See here: https://docs.python.org/3/library/filecmp.html

@paul-demo
Copy link
Author

Actually, I suppose giving two directories to cmp would return false: filecmp.cmp('dir1', 'dir2'), so maybe that isn't my problem. I basically see the directories created, which is happening on lines 94-96, but then there is no file content inside those directories.

        dirs = list(set(map(os.path.dirname, src_files)))
        for d in dirs:
            os.makedirs(os.path.join(dst_dir, d), exist_ok=True)

@olofk
Copy link
Owner

olofk commented Jan 12, 2024

There might very well be a bug, but I would like to get a better understanding of what you're trying to do. Can you describe your use-case?

For some context, I'm philosophically against copying directories as that will break dependency tracking that we use to avoid unnecessary rebuilds (in the Flow API). But IIRC there was someone with a need to use copyto to get a directory of data files into the build tree who came up with a patch to support that, and I agreed to merge that. But it's really not intended to work for generally exporting directories.

@paul-demo
Copy link
Author

paul-demo commented Jan 12, 2024

We have an existing method to build documentation from a potentially large doc/ directory. You can imagine this could be a website, Latex document, Markdown, etc. In this case it happens to be a Jupyter book, but that's besides the point. The point is that there could a large number of files in the doc/ directory and they all get built into some consumable document by some tool. It would be nice to just specify the whole directory as a dependency and not have to list every individual file, because that is very likely to result in people forgetting to add files to the list and we then have missing sections of the documentation.

The tool that builds documentation from the directory of source files doesn't actually accept a list of source files. It accepts a directory name and basically crawls every file in the directory. We currently do this outside FuseSoC file discovery/manipulation. While I can provide the directory name in a core file; I'm finding that the directory contents in the build area are all missing, though the tree of all directories is recreated (just with every file missing). We actually don't currently use FuseSoC's fileset for this (and this bug is filed because when I tried to do so, I found that it didn't work with directories).

It's also possible that we aren't using export() and some other piece of our code is failing to copy over directory contents, so it could be unrelated. One possibility is that while FuseSoC's code above attempts to support directories in the export() function, in the rest of FuseSoC functionality like making lists of files in filelists it does not crawl the directory and expand into a list of contained files; it might just report the name of the directory. It's unclear to me why the directory tree is recreated but none of the contents.

Ideally, I should be able to make a filelist with a directory as the contents, and I would assume based on the code above that this would be supported (even though filecmp.cmp() is the not the right function for directories, it returns false when you feed it two directory paths, so the copy should still happen, just it might get invoked unnecessarily).

filesets:
  documentation_files:
    files:
      - doc
    file_type: directory

@olofk
Copy link
Owner

olofk commented Jan 25, 2024

I see your use case. I'm not saying no right away, but I think we would need to carefully analyze this, as it has severe impact on some fundamental design choices.

I do wonder if adding a copyto directive would sort things out for your particular case, since that demonstrably already works with directories. The difference is that your directory would end up directly under the work directory instead of the source directory, but perhaps that's not an issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants