Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

directories should be ignored even without trailing slash #43

Open
JulienPalard opened this issue Mar 24, 2023 · 5 comments
Open

directories should be ignored even without trailing slash #43

JulienPalard opened this issue Mar 24, 2023 · 5 comments

Comments

@JulienPalard
Copy link
Contributor

To put it in code, I think that this one shall pass:

    def test_ignore_directory_no_slash(self):
        matches = _parse_gitignore_string('.venv', fake_base_dir='/home/michael')
        self.assertTrue(matches('/home/michael/.venv'))
        self.assertTrue(matches('/home/michael/.venv/folder'))
        self.assertTrue(matches('/home/michael/.venv/file.txt'))

Because according to man gitignore:

$ man gitignore | grep 'If there is a separator at the end of the pattern'
       •   If there is a separator at the end of the pattern then the pattern will only match directories, otherwise the pattern can match both files and directories.
@JulienPalard
Copy link
Contributor Author

Tried to compare the tests against git check-ignore and found some inconsistencies, see:

https://github.com/JulienPalard/gitignore_parser/blob/mdk-against-git/tests.py

Out of 4 inconsistencies, two are directly linked to this issue.

I think it could be nice to keep the "test against git" part to ensure all the tests are always consistent with git (but my branch just tests against git, not against the code, it obviously can't be used as is, it was just a one-shot thing).

@pacha
Copy link

pacha commented Nov 9, 2023

That's right. And it is not only and end slash. If there's no slash anywhere in the pattern, then the pattern is matched against every part of the path, which is not the case otherwise.
I was fighting with this while creating my own package. I implemented @JulienPalard's idea and it was a game changer. Now, all tests are checked against git itself, which digged up a crazy amount of missed edge cases. In case it helps, this is how it ended up looking (unfortunately is not unittest code but pytest, but reusable in any case):
https://github.com/pacha/py-walk/blob/main/tests/conftest.py#L54-L79

@Fs02
Copy link

Fs02 commented Dec 3, 2023

I created a kotlin port from this library and also encountered the same issue

I fixed it like this (Fs02/gitignore-parser@08b83f1), although I'm not sure if this will break other things or not

@corinfinite
Copy link

I believe the issue lies in the fnmatch_pathname_to_regex function. Specifically, the problem is with the handling of directory-only patterns when the pattern does not end with a slash.

I found the following worked: modify the fnmatch_pathname_to_regex function to handle directory-only patterns without a trailing slash:

def fnmatch_pathname_to_regex(
    pattern, directory_only: bool, negation: bool, anchored: bool = False
):
    # ... (previous code remains the same)

    if anchored:
        res.insert(0, '^')
    else:
        res.insert(0, f"(^|{seps_group})")
    
    if directory_only and not pattern.endswith('/'):
        res.append(f"($|{seps_group}.*)")
    elif not directory_only:
        res.append('$')
    elif directory_only and negation:
        res.append('/$')
    else:
        res.append('($|\\/)')
    
    return ''.join(res)

In this modified version, we check if the pattern is directory-only and does not end with a slash. If that's the case, we append ($|{seps_group}.*) to the regex pattern. This allows the pattern to match either the end of the string or a path separator followed by any characters.

Update the rule_from_pattern function to set directory_only to True when the pattern does not end with a slash:

def rule_from_pattern(pattern, base_path=None, source=None):
    # ... (previous code remains the same)

    directory_only = pattern[-1] == '/' or '/' not in pattern
    
    # ... (remaining code stays the same)

This isn't a proper PR but it works for what I needed it to and can hopefully inform a proper fix.

@Illegal-Services
Copy link

UP

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants