Add PathSpec.match_file #12

demurgos opened this issue Aug 3, 2016 · 3 comments



@demurgos demurgos commented Aug 3, 2016

I was looking for a Python implementation that matches files against the rules defined in .gitignore files, and this project is great!
My use case is synchronizing directories across a network. Most of the control logic (filter, compare, update) works at the inode level so that I can skip as many elements as possible (for example, by not exploring excluded directories).
I would like to update my current filter logic to support git patterns: given a list of patterns, is my file path matched or not? The issue is that pathspec currently seems heavily oriented toward processing lists of paths. What if I have a single file?
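To make the use case concrete, here is a minimal sketch of a directory walk that prunes excluded directories by using a single-path predicate. It assumes a hypothetical `is_excluded` helper and uses the standard library's `fnmatch` as a stand-in matcher; `fnmatch` does not implement gitignore semantics, and none of these names come from pathspec.

```python
import fnmatch
import os


def is_excluded(rel_path, patterns):
    """Hypothetical single-path predicate; fnmatch stands in for git patterns."""
    return any(fnmatch.fnmatchcase(rel_path, pattern) for pattern in patterns)


def _rel(rel_dir, name):
    """Build a '/'-separated path relative to the walk root."""
    if rel_dir == ".":
        return name
    return rel_dir.replace(os.sep, "/") + "/" + name


def walk_included(root, patterns):
    """Yield relative file paths under root without entering excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Assigning to dirnames[:] prunes the top-down walk in place,
        # so excluded directories are never explored.
        dirnames[:] = [
            d for d in dirnames if not is_excluded(_rel(rel_dir, d), patterns)
        ]
        for name in filenames:
            rel_path = _rel(rel_dir, name)
            if not is_excluded(rel_path, patterns):
                yield rel_path
```

The point of the sketch is that the decision is made one path at a time, which is what makes pruning possible; a list-only API forces the caller to wrap each path in a collection first.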

Here is what my current implementation boils down to:

spec = pathspec.PathSpec.from_lines(pathspec.GitIgnorePattern, patterns)
def match_file(file_path):
    return len(list(spec.match_files([file_path]))) > 0  # This should not be so complicated

is_ignored = match_file(u'')

As you can see, it's pretty cumbersome: I have to create a collection with a single item, run the matcher, and then extract the result.

Ideally, PathSpec would expose a match_file function returning a boolean, and match_files (or filter_files, since it currently acts as a filter?) would just reuse it:

class PathSpec(object):
    # ...

    def match_file(self, file, separators=None):  # Core logic
        norm, path = util.normalize_file(file, separators=separators)  # Single file version
        is_matched = util.match_file(self.patterns, norm)  # Single file version
        return is_matched  # bool

    def match_files(self, files, separators=None):  # Quality of life function: it just replaces a one line generator
        return (file for file in files if self.match_file(file, separators))

Basically, it boils down to the fact that the library does not expose single-item functions that would let me iterate over my files as I want; instead, it hides a loop inside every function.
What do you think about adding better support for single-file matching? I am aware that, given the current architecture of the library, it would require some refactoring, but I believe it would be for the best. Could you implement it, or should I do it and send a PR? (Since it's a big change, I'd rather wait for your feedback.)
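For illustration, here is a self-contained sketch of that layering: a boolean single-item method holds the core logic, and the multi-item method is a thin generator on top of it. `GlobSpec` is a hypothetical class, and `fnmatch` is only a stand-in for the real matcher; this is not pathspec's implementation.

```python
import fnmatch


class GlobSpec:
    """Hypothetical stand-in for a pattern spec (not pathspec's implementation)."""

    def __init__(self, patterns):
        self.patterns = list(patterns)

    def match_file(self, path):
        """Core logic: return True if the single path matches any pattern."""
        normalized = path.replace("\\", "/")
        return any(fnmatch.fnmatchcase(normalized, p) for p in self.patterns)

    def match_files(self, paths):
        """Convenience wrapper: lazily filter an iterable using match_file."""
        return (path for path in paths if self.match_file(path))


spec = GlobSpec(["*.log", "tmp/*"])
print(spec.match_file("debug.log"))  # True
print(list(spec.match_files(["debug.log", "src/app.py", "tmp/cache"])))
# ['debug.log', 'tmp/cache']
```

With this shape, callers who already have their own iteration (a directory walk, a sync queue) call the predicate directly, while callers with a plain list keep the one-line filter.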

Side note: the real name of the gitignore matcher is wildmatch. How about adding this as an alias name when registering the pattern? Your module deserves to be easier to find (I had some trouble finding it even though I knew what I was looking for).



@cpburnz cpburnz commented Aug 16, 2016

@demurgos Matching files one at a time that way is indeed quite cumbersome. In the next release, I'll add PathSpec.match_file. Thanks for pointing out the proper name for the pattern matching git uses for ".gitignore". I tried searching for its name when I started this project, but I came up with nothing.



@cpburnz cpburnz commented Aug 23, 2016

PathSpec.match_file has been implemented, and GitIgnorePattern has been renamed to GitWildMatchPattern. GitIgnorePattern is still available for backward compatibility.
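As a generic illustration of keeping an old class name available after a rename, one common approach is a thin subclass that emits a deprecation warning. The sketch below borrows the two class names from this thread purely as placeholders; the class bodies are hypothetical, and this is an assumption about one possible approach, not a description of how pathspec actually does it.

```python
import warnings


class GitWildMatchPattern:
    """Placeholder for the renamed class (real matching logic omitted)."""

    def __init__(self, pattern):
        self.pattern = pattern


class GitIgnorePattern(GitWildMatchPattern):
    """Old name kept as a thin subclass that warns when instantiated."""

    def __init__(self, pattern):
        warnings.warn(
            "GitIgnorePattern is deprecated; use GitWildMatchPattern.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(pattern)
```

Existing code that constructs the old name keeps working, and `isinstance` checks against the new class still pass because the old name is a subclass.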

@cpburnz cpburnz closed this Aug 23, 2016


@demurgos demurgos commented Aug 23, 2016

Ok, thank you.
