Feature Request: Could you please add a option to exclude a folder and also exclude its subfolders? #15159

Open
NamelessUzer opened this issue Apr 5, 2021 · 20 comments
Labels
Area-FileSystem-Provider specific to the FileSystem provider Issue-Enhancement the issue is more of a feature request than a bug KeepOpen The bot will ignore these and not auto-close WG-Cmdlets-Management cmdlets in the Microsoft.PowerShell.Management module WG-Engine-Providers built-in PowerShell providers such as FileSystem, Certificates, Registry, etc. WG-NeedsReview Needs a review by the labeled Working Group

Comments

@NamelessUzer

NamelessUzer commented Apr 5, 2021

Summary of the new feature/enhancement

As a user, I want PowerShell to be able to exclude a folder together with all of its subfolders and files.

Proposed technical implementation details (optional)

For instance:

Get-ChildItem -Recurse -ExcludeFolderAndItsSubFolders "ExcludeFolder"

The folder "ExcludeFolder", all of its subfolders, and all files in it and in its subfolders would then be excluded.
"ExcludeFolderAndItsSubFolders" is only an example, and it is a bit long. Of course, you can choose a better parameter name.

@NamelessUzer NamelessUzer added Issue-Enhancement the issue is more of a feature request than a bug Needs-Triage The issue is new and needs to be triaged by a work group. labels Apr 5, 2021
@mklement0
Contributor

mklement0 commented Apr 5, 2021

That would indeed be useful.

There is a related idea in #4126 - though that isn't obvious from the wording, and the idea isn't fully fleshed out - which has been green-lit, but not implemented:
It suggests making the existing -Include and -Exclude parameters accept paths rather than name-only patterns.

That is, the syntax would be something like:

Get-ChildItem -Recurse -Exclude ExcludeFolder/*

However, that pattern would only cover ExcludeFolder and its children, not any more deeply nested items.

To support something open-ended, we'd need something like ** (e.g., ExcludeFolder/**, as used in the Unix world) as a signifier that the entire subtree should be excluded - but that would be a departure from how PowerShell's wildcard patterns work.


So perhaps the two ideas can be combined:

  • Allow path patterns, using the normal matching rules.

  • Additionally introduce -ExcludeRecursive (which I propose in lieu of your -ExcludeFolderAndItsSubFolders suggestion) and -IncludeRecursive parameters.

@237dmitry

In my opinion, this is unnecessary. This will complicate the cmdlet. This functionality is very easy to replace with -notmatch and -notlike operators. For example:

(Get-Childitem -Recurse ./.local) -notlike '*powershell*'   # returns fullnames as string[]
(Get-Childitem -Recurse ./.local) -notmatch 'powershell'    # returns FileInfo[], DirectoryInfo[]

That said, I am not against the idea itself; I would rather see broader functionality for -Exclude and -Include (regex).

@mklement0
Contributor

@237dmitry:

  • While use of up-front collection of all output combined with operators is a workaround, it is not a satisfying solution, especially if you want to walk large hierarchies in a streaming fashion and if you want to exclude large subtrees to begin with (rather than enumerating them and excluding them after the fact).

  • The proposed new parameters may complicate the implementation, but they are easy to conceptualize and cover an important use case.

  • Certainly, regex patterns would give us more power and flexibility (at the expense of increased pattern-language complexity). However, given that PowerShell has no regex literals (it uses strings to represent regexes), also supporting regexes would require parallel parameters (e.g., -Exclude vs. -ExcludeRegex) - and that sounds like a problematic complication of the cmdlet.

@jborean93
Collaborator

Personally, I agree with @237dmitry that this cmdlet is already complex enough and that adding even more parameters would only make things worse. I don't see how an -Exclude or -ExcludeRegex parameter would be beneficial over a simple | Where-Object Name -ne 'Blah'; you can even use -match to match against a regex. The latter is highly customizable and already fits the existing PowerShell paradigms. Providers also don't support a way to natively exclude values in their underlying implementations, so there is no native benefit to filtering that far left.

@mklement0
Contributor

mklement0 commented Apr 5, 2021

@jborean93, I definitely agree that a -ExcludeRegex parameter is not called for - I used it to illustrate that it would be the only way to bring regex matching to Get-ChildItem and is therefore not a good idea.

-Exclude and -Include are existing, wildcard-based parameters that currently:

  • work on an item's name only
  • at every level of the subtree being traversed with -Recurse.

What this doesn't give you is a simple way to say "exclude all node_modules subfolders and their subtrees", and that is what this suggestion is about - and it seems worthwhile to me:

  • because it is a common use case

  • and something like -ExcludeRecursive node_modules is the most straightforward way to express this intent - no post-processing workarounds with additional patterns needed, and it also prevents unnecessary enumeration at the source.
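To make "prevents unnecessary enumeration at the source" concrete, here is a minimal sketch of a pruning traversal. `Get-ChildItemPruned` and its `-ExcludeSubtree` parameter are hypothetical names invented for illustration; they are not part of PowerShell and are not the proposed implementation. The sketch assumes a file-system path and does not guard against symbolic-link cycles.

```powershell
# Hypothetical helper (not an existing cmdlet): walks a tree depth-first and
# never descends into directories whose name matches one of the wildcard patterns.
function Get-ChildItemPruned {
    param(
        [string] $Path = '.',
        [string[]] $ExcludeSubtree = @()
    )
    foreach ($item in Get-ChildItem -LiteralPath $Path -Force) {
        if (-not $item.PSIsContainer) {
            $item   # files are emitted as soon as they are seen (streaming)
            continue
        }
        $skip = $false
        foreach ($pattern in $ExcludeSubtree) {
            if ($item.Name -like $pattern) { $skip = $true; break }
        }
        if ($skip) { continue }   # prune: neither emit nor enumerate this subtree
        $item
        Get-ChildItemPruned -Path $item.FullName -ExcludeSubtree $ExcludeSubtree
    }
}
```

Called as `Get-ChildItemPruned -ExcludeSubtree node_modules`, the function never lists the contents of any node_modules directory, which is the difference from filtering the output after the fact.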

@NamelessUzer
Author

That would indeed be useful.

There is a related idea in #4126 - though that isn't obvious from the wording, and the idea isn't fully fleshed out - which has been green-lit, but not implemented:
It suggests making the existing -Include and -Exclude parameters accept path rather than name-only patterns.

That is, the syntax would be something like:

Get-ChildItem -Recurse -Exclude ExcludeFolder/*

However, that pattern would only cover ExcludeFolder and its children, not any more deeply nested items.

To support something open-ended, we'd need something like ** (e.g., ExcludeFolder/**, as used in the Unix world) as a signifier that the entire subtree should be excluded - but that would be a departure from how PowerShell's wildcard patterns work.

So perhaps the two ideas can be combined:

  • Allow path patterns, using the normal matching rules.
  • Additionally introduce -ExcludeRecursive (which I propose in lieu of your -ExcludeFolderAndItsSubFolders suggestion) and -IncludeRecursive parameters.

Your understanding is absolutely what I want, Thank you.

@NamelessUzer
Author

In my opinion, this is unnecessary. This will complicate the cmdlet. This functionality is very easy to replace with -notmatch and -notlike operators. For example:

(Get-Childitem -Recurse ./.local) -notlike '*powershell*'   # returns fullnames as string[]
(Get-Childitem -Recurse ./.local) -notmatch 'powershell'    # returns FileInfo[], DirectoryInfo[]

That said, I am not against the idea itself; I would rather see broader functionality for -Exclude and -Include (regex).

Suppose you are in a folder "PowerShell\blah\blah", your code will output nothing.

@237dmitry

Suppose you are in a folder "PowerShell\blah\blah", your code will output nothing.

Suppose that you will create a specific regular expression for a specific case.

@NamelessUzer
Author

Suppose you are in a folder "PowerShell\blah\blah", your code will output nothing.

Suppose that you will create a specific regular expression for a specific case.

Create it, please.

@237dmitry

Create it, please.

Which subfolder has to be excluded? blah\powershell? What is the structure?

<#
Path\To\Powershell\2                        
Path\To\Powershell\blah                     
Path\To\Powershell\2\test                   
Path\To\Powershell\blah\blah                
Path\To\Powershell\blah\blah\powershell     
Path\To\Powershell\blah\blah\powershell\test
#>
((Get-Childitem PowerShell -Recurse) -notmatch '.+\\powershell.+powershell\\?').FullName
<#
Path\To\Powershell\2        
Path\To\Powershell\blah     
Path\To\Powershell\2\test   
Path\To\Powershell\blah\blah
#>

@NamelessUzer
Author

Create it, please.

Which subfolder has to be excluded? blah\powershell? What is the structure?

<#
Path\To\Powershell\2                        
Path\To\Powershell\blah                     
Path\To\Powershell\2\test                   
Path\To\Powershell\blah\blah                
Path\To\Powershell\blah\blah\powershell     
Path\To\Powershell\blah\blah\powershell\test
#>
((Get-Childitem PowerShell -Recurse) -notmatch '.+\\powershell.+powershell\\?').FullName
<#
Path\To\Powershell\2        
Path\To\Powershell\blah     
Path\To\Powershell\2\test   
Path\To\Powershell\blah\blah
#>

Thank you. You are right: it is possible to implement it your way.
The reason for this request is that I was recently writing a script that formats the file names in a directory but needs to recursively exclude some directories, such as .vs, .git, and .github, and I didn't find an easy way to achieve that.

@iSazonov
Collaborator

iSazonov commented Apr 6, 2021

We could add a recursion predicate. De facto, we already have one internally (for depth control), but we could expose it as a public parameter: a script block.

@iSazonov iSazonov added WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module WG-Engine-Providers built-in PowerShell providers such as FileSystem, Certificates, Registry, etc. labels Apr 6, 2021
@mklement0
Contributor

To make the regex for the given example fully robust and cross-platform, it would have to be something like:

-notmatch '(^|[\\/])powershell[\\/](.+[\\/])?powershell([\\/]|$)'

That said, that example would not - and, to me, need not - be addressed by -ExcludeRecursive, because for such special needs post-processing is indeed the right answer.

To emulate what -ExcludeRecursive powershell would do, the regex is a bit simpler:

-notmatch '(^|[\\/])powershell([\\/]|$)'

but that:

  • is still a lot of ceremony
  • doesn't filter at the source.
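For reference, a minimal sketch of how that second regex would be applied as a post-filter today. The variable name `$excludePattern` is only illustrative, and matching against the FullName property is an assumption about which string the filter should see.

```powershell
# Post-filter emulation of the proposed (not yet existing) -ExcludeRecursive powershell:
# drop every item whose full path has a component named "powershell".
# -notmatch is case-insensitive by default, and [\\/] accepts both separators.
$excludePattern = '(^|[\\/])powershell([\\/]|$)'

Get-ChildItem -Recurse |
    Where-Object { $_.FullName -notmatch $excludePattern }
```

Note that the whole tree is still enumerated; the excluded subtrees are only discarded afterwards.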

As an aside: the difference in behavior between -like and -match is troubling - see #15172

@mklement0
Contributor

And if the subfolders to exclude in the example are at a known level of the hierarchy, then, in combination with #4126, the solution would be -ExcludeRecursive */*/*/*/*/powershell
You could argue that the recursive (subtree) exclusion should be implied if you use a wildcard path.
Note that since PowerShell allows interchangeable use of \ and / in paths, there is no cross-platform concern.

@iSazonov
Collaborator

iSazonov commented Apr 7, 2021

I suggest a more generic way:

Get-ChildItem -RecurseCondition <ScriptBlock gets DirectoryInfo in $Item> -RecurseFilter <ScriptBlock gets FileInfo in $Item>

This is very flexible.
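As a rough illustration of the predicate idea, the following sketch emulates a recursion condition in script. `Get-ChildItemWhere` is a hypothetical function name, and passing the directory as `$_` (rather than `$Item`, as in the suggestion above) is an assumption made to keep the sketch short; neither is an existing cmdlet or parameter.

```powershell
# Hypothetical helper (not an existing cmdlet): the script block receives each
# directory as $_ and decides whether the traversal should descend into it.
function Get-ChildItemWhere {
    param(
        [string] $Path = '.',
        [scriptblock] $RecurseCondition = { $true }
    )
    foreach ($item in Get-ChildItem -LiteralPath $Path -Force) {
        $item   # every item is emitted; the predicate only controls descent
        if ($item.PSIsContainer -and ($item | Where-Object $RecurseCondition)) {
            Get-ChildItemWhere -Path $item.FullName -RecurseCondition $RecurseCondition
        }
    }
}
```

For example, `Get-ChildItemWhere -RecurseCondition { $_.Name -notin '.git', '.vs', '.github' }` lists those directories themselves but none of their contents. Unlike a name-based subtree exclusion, this version still emits the directory that is not entered.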

@mklement0
Contributor

mklement0 commented Apr 7, 2021

I think in this case @jborean93's concern about trying to pack too much functionality into a single cmdlet applies (I also don't understand how that would address the exclude-subtree use case - perhaps you can elaborate).

What I - and @ssfjhh, I presume - had in mind was a simple extension of existing functionality that is based on the simplicity of name literals and wildcards (and potentially paths).

I see the following as the primary use cases of -Exclude / -Include in combination with Get-ChildItem -Recurse (without the -Recurse, the behavior of these parameters is unfortunate - see #3304):

  • [Already available] Get files matching a name (pattern) across all levels of a subtree hierarchy:
Get-ChildItem -Recurse -Include *.json
  • [Already available] Get directories matching a name (pattern) across all levels of a subtree hierarchy:
Get-ChildItem -Recurse -Directory -Include node_modules

Note: Using -Filter rather than -Include is actually more efficient and noticeably faster, because it lets the system APIs do the filtering, but -Include is still necessary if you want to (a) take advantage of PowerShell's [...] wildcard construct and/or (b) specify multiple patterns.

  • What's missing is the ability to fundamentally, up front, include / exclude directories and their subtrees, before potentially applying additional filtering (with -Include, -Exclude, -Filter and/or -File / -Directory):

    • -ExcludeRecursive would exclude any matching directories and all their contents (subtrees).

    • -IncludeRecursive would limit further processing only to the items in matching directories and their subtrees.

Note: Perhaps -ExcludeSubtree, -IncludeSubtree would be clearer, but so far we haven't used that terminology elsewhere, from what I can tell.

For instance, the following would then allow you to list all *.js and *.ts files in a directory subtree, while excluding any obj, bin, and node_modules directories from the search:

 Get-ChildItem -Recurse -Include *.js, *.ts -ExcludeRecursive obj, bin, node_modules

@daxian-dbw daxian-dbw removed the Needs-Triage The issue is new and needs to be triaged by a work group. label Jun 24, 2021
@daxian-dbw
Member

The Engine Working Group discussed this issue today. We believe this is a reasonable ask -- it would be nice to make it easier to exclude/include a folder completely with Get-ChildItem:

  • We prefer to reuse the -Include and -Exclude parameters instead of introducing new ones.
  • The proposal we had is to have -Include and -Exclude support the Git pathspecs, like in a .gitignore file.
  • Supporting pathspecs means we will need to support something like a/**/b. This obviously doesn't work with the current globbing in PowerShell, since a/**/b is treated as a/*/b in PowerShell wildcards today. Changing the semantics of a/**/b would be a breaking change, but it likely falls into bucket 3 (unlikely grey area).
    • As for how to support a pathspec syntax like a/**/b, since the -Include and -Exclude parameters are handled directly by the underlying provider, a possible option is to have the a/**/b syntax supported directly by a provider, so that we can skip the globbing and just pass the literal string to the provider.
    • An alternative option is to change how globbing works to support the a/**/b syntax, but that could be tricky (hard to do) and risky (regressions).
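To illustrate what a/**/b would need to mean, here is a small sketch that translates a simplified subset of such patterns into a regex. `ConvertTo-GlobRegex` is a hypothetical helper, not an existing PowerShell or Git API, and it deliberately ignores most pathspec and .gitignore rules (negation, character classes, leading `**/` matching at the top level, and so on). It assumes `/` as the only path separator.

```powershell
# Hypothetical helper: converts a simplified glob into an anchored regex string.
#   /**/  matches zero or more whole directory levels
#   **    matches anything, including separators
#   *     matches within a single path component
#   ?     matches one character that is not a separator
function ConvertTo-GlobRegex {
    param([Parameter(Mandatory)] [string] $Glob)

    # Escape everything first, then re-enable the wildcard constructs.
    $pattern = [regex]::Escape($Glob)
    $pattern = $pattern.Replace('/\*\*/', '/(.*/)?')
    $pattern = $pattern.Replace('\*\*', '.*')
    $pattern = $pattern.Replace('\*', '[^/]*')
    $pattern = $pattern.Replace('\?', '[^/]')
    '^' + $pattern + '$'
}
```

With this sketch, `'a/x/y/b' -match (ConvertTo-GlobRegex 'a/**/b')` and `'a/b' -match (ConvertTo-GlobRegex 'a/**/b')` are both true, whereas under today's wildcard rules a/**/b behaves like a/*/b.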

@iSazonov
Collaborator

From my experience with #12834:

  • We must fix a lot of existing issues in providers and, principally, add a lot of new tests to avoid regressions.
  • We could consider Git pathspecs for Path parameters and generic globbing, but not for the Include/Exclude parameters, since those parameters are provider-specific.
  • Implementing a/**/b in Include/Exclude would also be very expensive, and I think it makes no sense. It is globbing syntax, and it belongs in globbing.
  • The current Include/Exclude semantics are to filter leaves in the output, and there are serious, odd issues with how they work (with -Name and -Recurse). We should review and fix those issues. I think we will have to redesign them too.
  • Since Include/Exclude apply only to leaves, switching to applying them to recursion would be a huge breaking change, especially for cross-platform scenarios: we would be forced to do escaping, which is just a nightmare.
  • So the proposal is to introduce a new parameter to control recursion. See #15159 (comment) as a draft.

@iSazonov iSazonov added Area-FileSystem-Provider specific to the FileSystem provider WG-Cmdlets-Management cmdlets in the Microsoft.PowerShell.Management module and removed WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module labels Nov 30, 2021
@mavaddat

IMO, @daxian-dbw's suggestion is the golden functionality for Get-ChildItem:

  • […] reuse the -Include and -Exclude parameters instead of introducing new ones.
  • The proposal we had is to have -Include and -Exclude support the Git pathspecs, like in a .gitignore file.

There is no need for new parameter names here. The existing parameters should be able to infer the user's intended functionality based on the specified parameter set and the glob pattern they gave in the param.

Moreover, the current Exclude and Include behaviors are functionally handicapped if not outright broken. They are not worth preserving in their current form, so it makes no sense to avoid stepping on their toes by introducing new parameters with almost identical names.

@jszabo98

jszabo98 commented Apr 7, 2024

This is probably an old argument, but to me, if a folder is excluded, a user wouldn't expect it to be recursed into. How often has this question been asked on Stack Overflow and Reddit?

mkdir dir1\dir2\dir3
dir -r -exclude dir2 | % fullname

C:\users\admin\foo\dir1
C:\users\admin\foo\dir1\dir2\dir3

@SteveL-MSFT SteveL-MSFT added KeepOpen The bot will ignore these and not auto-close WG-NeedsReview Needs a review by the labeled Working Group labels Apr 29, 2024