Added Include/Exclude filtering capability to Unzip Task (#5169) #6018

Merged
merged 19 commits into from Feb 6, 2021

Conversation

@IvanLieckens
Contributor

@IvanLieckens IvanLieckens commented Jan 11, 2021

Fixes #5169

Context

See #5169

Changes Made

Unzip Task now has "Include" and "Exclude" optional properties to pass a pattern to filter archive entries to be unzipped.

Testing

Added the following tests:

  • CanUnzip_WithIncludeFilter
  • CanUnzip_WithExcludeFilter
  • CanUnzip_WithIncludeAndExcludeFilter

These three tests verify that files can be included in or excluded from the archive extraction.

Notes

Unable to translate the resources to all languages. Can someone provide guidance/translations?

@rainersigwald
Contributor

@rainersigwald rainersigwald commented Jan 11, 2021

Don't worry about the failing PR builds--I believe #6019 will fix that.

Unable to translate the resources to all languages. Can someone provide guidance/translations?

Thanks, but no need to worry! A Microsoft-funded team will do the translations. Details in the loc doc if you're interested.

@rainersigwald
Contributor

@rainersigwald rainersigwald commented Jan 11, 2021

/azp run

@azure-pipelines

@azure-pipelines azure-pipelines bot commented Jan 11, 2021

Azure Pipelines successfully started running 1 pipeline(s).
Member

@Forgind Forgind left a comment

LGTM!

@@ -2792,6 +2792,9 @@
<data name="Unzip.DidNotUnzipBecauseOfFileMatch">
<value>Did not unzip from file "{0}" to file "{1}" because the "{2}" parameter was set to "{3}" in the project and the files' sizes and timestamps match.</value>
</data>
<data name="Unzip.DidNotUnzipBecauseOfFilter">
<value>Did not unzip file "{0}" because it didn't match the include or matched the exclude filter.</value>

Forgind Jan 13, 2021
Member

nit:

Suggested change
- <value>Did not unzip file "{0}" because it didn't match the include or matched the exclude filter.</value>
+ <value>Did not unzip file "{0}" because it didn't match the include or because it matched the exclude filter.</value>

IvanLieckens Jan 14, 2021
Author Contributor

No problem, I have applied it in the new commits

using (TestEnvironment testEnvironment = TestEnvironment.Create())
{
TransientTestFolder source = testEnvironment.CreateFolder(createFolder: true);
TransientTestFolder destination = testEnvironment.CreateFolder(createFolder: false);

Forgind Jan 13, 2021
Member

Would you mind modifying this test to use wildcards so that two files are included, one of which is also excluded; a third is excluded; and a fourth is neither included nor excluded?

IvanLieckens Jan 14, 2021
Author Contributor

Not at all, I hope this new version is more what you were looking for?
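
The scenario requested above can be sketched roughly as follows. This is a minimal illustration and not the test from the PR: `FilterSketch`, `MatchesWildcard`, and `ShouldExtract` are hypothetical names, and the wildcard matcher is a simplified stand-in (only `*` and `?`, no `**` or path-separator handling) rather than MSBuild's internal FileMatcher.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterSketch
{
    // Hypothetical helper: turns a simple wildcard pattern ('*' and '?')
    // into an anchored, case-insensitive regex. NOT MSBuild's FileMatcher.
    public static bool MatchesWildcard(string pattern, string name)
    {
        string regex = "^" + Regex.Escape(pattern)
            .Replace("\\*", ".*")   // escaped '*' becomes "any run of characters"
            .Replace("\\?", ".")    // escaped '?' becomes "any single character"
            + "$";
        return Regex.IsMatch(name, regex, RegexOptions.IgnoreCase);
    }

    // Assumed semantics: an entry is extracted when it matches an include
    // pattern (or no include is given) and matches no exclude pattern.
    public static bool ShouldExtract(string name, string[] include, string[] exclude)
    {
        bool included = include.Length == 0 || include.Any(p => MatchesWildcard(p, name));
        bool excluded = exclude.Any(p => MatchesWildcard(p, name));
        return included && !excluded;
    }
}
```

With include `*.txt` and exclude `b.*;c.*`, the four cases are: `a.txt` (included only), `b.txt` (included and excluded), `c.log` (excluded only), and `d.bin` (neither). Under the assumed semantics only `a.txt` would be extracted.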

@@ -1217,6 +1217,8 @@ public sealed partial class Unzip : Microsoft.Build.Tasks.TaskExtension, Microso
public Unzip() { }
[Microsoft.Build.Framework.RequiredAttribute]
public Microsoft.Build.Framework.ITaskItem DestinationFolder { get { throw null; } set { } }
public string Exclude { get { throw null; } set { } }
public string Include { get { throw null; } set { } }

BenVillalobos Jan 15, 2021
Member

If we stick with regular expressions for this, we should rename Exclude and Include, since those are "default" MSBuild names: items are added via Include, and item patterns differ from regular expressions. A name suggested during PR review was IncludePattern & ExcludePattern.

IvanLieckens Jan 20, 2021
Author Contributor

This makes sense, the functionality differs indeed. I have pushed the change.

Forgind Jan 20, 2021
Member

@BenVillalobos, I thought we'd agreed globs were more MSBuild-y than regex? Confused by your comment.

BenVillalobos Jan 20, 2021
Member

We did, I posted this while we were talking about it, hence the "If we stick with regular expressions." The current implementation is still regex, is this how we process includes and excludes normally?

Forgind Jan 20, 2021
Member

No; we use globs.

IvanLieckens Jan 21, 2021
Author Contributor

@BenVillalobos OK, thank you for clearing that up. Is there a good example in MSBuild of how you'd like this to function that I can use as a lead for the implementation? I just want to make sure that it feels native, without any quirks.

BenVillalobos Jan 21, 2021
Member

Check out Expander.cs, ExpandIntoItemsLeaveEscaped may have what you're looking for.

BenVillalobos Jan 22, 2021
Member

ItemSpec.cs is also relevant here.

IvanLieckens Jan 26, 2021
Author Contributor

Thank you @BenVillalobos. I studied Expander and ItemSpec, but they are out of reach for the Tasks assembly and I did not want to introduce dependencies, so I leveraged FileMatcher to handle the normalization and verification of the globs and paths. Because of this, it does not support property references. I hope this matches your expected behavior.

cdmihai Jan 29, 2021
Contributor

MSBuildGlob would have been great here, but unfortunately it's not visible by tasks. :(

Member

@BenVillalobos BenVillalobos left a comment

The PR overall looks good, pending the switch from regex to globs.

@@ -1217,6 +1217,8 @@ public sealed partial class Unzip : Microsoft.Build.Tasks.TaskExtension, Microso
public Unzip() { }
[Microsoft.Build.Framework.RequiredAttribute]
public Microsoft.Build.Framework.ITaskItem DestinationFolder { get { throw null; } set { } }
public string Exclude { get { throw null; } set { } }
public string Include { get { throw null; } set { } }

BenVillalobos Jan 20, 2021
Member

So we're in agreement 🙂

@IvanLieckens the to-do list here would be:

  • Rename ExcludePattern and IncludePattern back to Exclude and Include (sorry)
    • We wanted ExcludePattern if we were sticking with RegEx, but it turns out we don't want to use regex here
  • Change the implementation to expect and parse a glob pattern
@@ -31,6 +31,8 @@ internal class FileMatcher
private static readonly char[] s_wildcardCharacters = { '*', '?' };
private static readonly char[] s_wildcardAndSemicolonCharacters = { '*', '?', ';' };

private static readonly string[] s_propertyReferences = { "$(", "@(" };

BenVillalobos Jan 27, 2021
Member

Suggested change
- private static readonly string[] s_propertyReferences = { "$(", "@(" };
+ private static readonly string[] s_propertyAndItemReferences = { "$(", "@(" };

Items are referred to with an @ and properties with $.

IvanLieckens Jan 28, 2021
Author Contributor

Thanks, I just copied from the old name of the method. I adjusted the naming now to these.

/// <summary>
/// Determines whether the given path has any property references.
/// </summary>
internal static bool HasPropertyReferences(string filespec)

BenVillalobos Jan 27, 2021
Member

Suggested change
- internal static bool HasPropertyReferences(string filespec)
+ internal static bool HasPropertyOrItemReferences(string filespec)

Another option would be to create a HasPropertyReferences that checks for $( and a separate HasItemReferences that checks for @(. I don't feel too strongly about that extra suggestion, though, since FileMatcher already has something like s_wildcardAndSemicolonCharacters and HasWildcardsOrSemicolon.

IvanLieckens Jan 28, 2021
Author Contributor

Yeah, I split up the existing method that was already there to allow more granular calling and identification. But I didn't want to introduce too many new methods if they weren't needed.

/// </summary>
internal static bool HasWildcardsOrSemicolon(string filespec)
{
return -1 != filespec.LastIndexOfAny(s_wildcardAndSemicolonCharacters);

BenVillalobos Jan 27, 2021
Member

Is there any significant perf difference between using Aggregate and -1 != filespec.LastIndexOfAny(s_wildcardAndSemicolonCharacters)?

Forgind Jan 27, 2021
Member

Were you thinking Aggregate with a function like Aggregate(false, (acc, ch) => acc || s_wildcardAndSemicolonCharacters.Contains(ch))? That would almost certainly be slower than this.

BenVillalobos Jan 28, 2021
Member

I was thinking the difference between s_propertyReferences.Aggregate(false, (current, propertyReference) => current | filespec.Contains(propertyReference)); and -1 != filespec.LastIndexOfAny(s_propertyReferences);

If the former is more efficient, we can change HasWildcardsOrSemicolon to do the same.

IvanLieckens Jan 28, 2021
Author Contributor

I believe LastIndexOf is going to be faster, but I had to use the aggregate for the other because the s_wildcardAndSemicolonCharacters are char[] and the s_propertyReferences are string[]. I would need to set up a microbenchmark to validate.

BenVillalobos Jan 28, 2021
Member

Ah! I didn't realize that we couldn't replicate what was already there. This looks fine to me 👍

IvanLieckens Jan 29, 2021
Author Contributor

Forgind improved it further now to use Any() :)
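
For readers following the thread, the three forms discussed above can be placed side by side. This is only a sketch: the class and method names `ReferenceChecks`, `HasReferencesViaAggregate`, and `HasReferencesViaAny` are hypothetical, and the fields mirror the FileMatcher snippets quoted in the review rather than reproducing the actual file.

```csharp
using System;
using System.Linq;

public static class ReferenceChecks
{
    private static readonly char[] WildcardAndSemicolonCharacters = { '*', '?', ';' };
    private static readonly string[] PropertyAndItemReferences = { "$(", "@(" };

    // char[] needles: string.LastIndexOfAny accepts a char array directly.
    public static bool HasWildcardsOrSemicolon(string filespec) =>
        filespec.LastIndexOfAny(WildcardAndSemicolonCharacters) != -1;

    // string[] needles: LastIndexOfAny has no string[] overload, so LINQ is used.
    // Aggregate with '|' visits every needle, even after a match was found.
    public static bool HasReferencesViaAggregate(string filespec) =>
        PropertyAndItemReferences.Aggregate(
            false, (current, reference) => current | filespec.Contains(reference));

    // Any stops at the first needle that is found.
    public static bool HasReferencesViaAny(string filespec) =>
        PropertyAndItemReferences.Any(filespec.Contains);
}
```

Both LINQ variants return the same result for any input; `Any` simply stops at the first match, which is presumably why it was preferred. No benchmark appears in the thread, so relative performance remains unmeasured here.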

patterns = pattern.Contains(';')
? pattern.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries).Select(FileMatcher.Normalize).ToArray()
: new[] { pattern };
if (patterns.Any(p => p.IndexOfAny(Path.GetInvalidPathChars()) != -1))

Forgind Jan 28, 2021
Member

nit:
Move this before the split?

Forgind Jan 28, 2021
Member

This is very strange... the docs for Path.GetInvalidPathChars explicitly say that "on Windows-based desktop platforms, invalid path characters might include...less than (<), greater than (>), pipe (|),..." yet when I tried that out on my Windows-based desktop platform, it didn't. I submitted an issue about it: dotnet/dotnet-api-docs#5292

I'd use FileUtilities.InvalidPathChars and modify the tests to target | or a character 1-31 and make it the same across all platforms.

IvanLieckens Jan 29, 2021
Author Contributor

That's very interesting, I changed to using the FileUtilities.InvalidPathChars and added a pipe in the name in the test whilst removing the platform specific flag.
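
A rough sketch of the platform-independent check discussed above, with the validation moved before the split as suggested in the earlier nit. `FileUtilities.InvalidPathChars` is internal to MSBuild, so this example substitutes its own fixed set; the contents of `AssumedInvalidPathChars` (the pipe plus control characters 0 to 31) are an assumption based on the comment above, and `PatternValidation` and `TryParsePatterns` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class PatternValidation
{
    // Assumption: a fixed set that is identical on every platform,
    // namely '|' plus the control characters 0 through 31.
    public static readonly char[] AssumedInvalidPathChars =
        new[] { '|' }.Concat(Enumerable.Range(0, 32).Select(i => (char)i)).ToArray();

    // Validates the raw pattern first, then splits it on ';'.
    // Returns false (and no patterns) when an invalid character is present.
    public static bool TryParsePatterns(string pattern, out string[] patterns)
    {
        patterns = Array.Empty<string>();
        if (string.IsNullOrWhiteSpace(pattern))
        {
            return true; // nothing to filter on
        }

        if (pattern.IndexOfAny(AssumedInvalidPathChars) != -1)
        {
            return false;
        }

        patterns = pattern.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        return true;
    }
}
```

Because the character set is fixed rather than taken from `Path.GetInvalidPathChars()`, a test that puts a pipe in a pattern behaves the same on Windows, Linux, and macOS.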

@@ -212,5 +275,41 @@ private bool ShouldSkipEntry(ZipArchiveEntry zipArchiveEntry, FileInfo fileInfo)
&& zipArchiveEntry.LastWriteTime == fileInfo.LastWriteTimeUtc
&& zipArchiveEntry.Length == fileInfo.Length;
}

private bool ParseIncludeExclude()

Forgind Jan 28, 2021
Member

Why does this return anything? Looking at ParsePattern below, it either throws an error or returns true. That means that this either throws an error or returns true. I was momentarily confused when I thought it was skipping everything if there were no include/exclude present, so I'd just remove that bit.

IvanLieckens Jan 29, 2021
Author Contributor

OK, I removed the return value and switched to checking Log.HasLoggedErrors to exit early, preventing further execution when the task is misconfigured.
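
The control flow described here might look roughly like the sketch below. It is not the merged code: `SketchLog` is a hypothetical stand-in for the task's logging helper (only the `HasLoggedErrors` and `LogError` names are borrowed from it), `UnzipSketch` is a hypothetical class, and the rule that a property or item reference is an error is an assumption drawn from the earlier note that property references are not supported.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the task's logging helper.
public sealed class SketchLog
{
    private readonly List<string> _errors = new List<string>();
    public bool HasLoggedErrors => _errors.Count > 0;
    public void LogError(string message) => _errors.Add(message);
}

public sealed class UnzipSketch
{
    public SketchLog Log { get; } = new SketchLog();
    public string Include { get; set; }
    public string Exclude { get; set; }

    // Returns nothing: a misconfiguration is reported through the log only.
    private void ParsePattern(string pattern, string parameterName)
    {
        if (string.IsNullOrWhiteSpace(pattern))
        {
            return;
        }

        // Assumed rule: property/item references are not supported in patterns.
        if (pattern.Contains("$(") || pattern.Contains("@("))
        {
            Log.LogError(parameterName + " contains a property or item reference: " + pattern);
        }
    }

    public bool Execute()
    {
        ParsePattern(Include, nameof(Include));
        ParsePattern(Exclude, nameof(Exclude));

        // Exit early when parsing logged an error; otherwise continue.
        if (Log.HasLoggedErrors)
        {
            return false;
        }

        // Extraction would happen here in a real task.
        return true;
    }
}
```

Keeping the parse methods `void` avoids the ambiguity raised in the review, where a boolean that was always `true` looked as if it could skip extraction.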

@IvanLieckens IvanLieckens changed the title from "Added Include/Exclude RegEx filtering capability to Unzip Task (#5169)" to "Added Include/Exclude filtering capability to Unzip Task (#5169)" Jan 29, 2021
Member

@Forgind Forgind left a comment

LGTM! Thanks for bearing with us on this!

@@ -204,7 +204,7 @@ internal static bool HasWildcardsSemicolonItemOrPropertyReferences(string filesp
/// </summary>
internal static bool HasPropertyOrItemReferences(string filespec)
{
- return s_propertyAndItemReferences.Any(ref=> filespec.Contains(ref));
+ return s_propertyAndItemReferences.Any(filespec.Contains);

Forgind Jan 29, 2021
Member

👍 Nice!

IvanLieckens and others added 2 commits Jan 29, 2021
Co-authored-by: Mihai Codoban <micodoba@microsoft.com>
Co-authored-by: Mihai Codoban <micodoba@microsoft.com>
@Forgind Forgind merged commit 70f6767 into dotnet:master Feb 6, 2021
8 checks passed

  • license/cla: All CLA requirements met.
  • msbuild-pr: Build #20210129.11 succeeded
  • msbuild-pr (Linux Core): Linux Core succeeded
  • msbuild-pr (Windows Core): Windows Core succeeded
  • msbuild-pr (Windows Full Release (no bootstrap)): Windows Full Release (no bootstrap) succeeded
  • msbuild-pr (Windows Full): Windows Full succeeded
  • msbuild-pr (macOS Core): macOS Core succeeded
  • msbuild-pr (macOS Mono): macOS Mono succeeded

@Forgind
Member

@Forgind Forgind commented Feb 6, 2021

Thanks @IvanLieckens!

@IvanLieckens IvanLieckens deleted the IvanLieckens:feature/UnzipFiltering branch Feb 7, 2021
6 participants