Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NuGet pack "The DateTimeOffset specified cannot be converted into a Zip file timestamp" #3793

Merged
merged 25 commits into from Feb 3, 2021

Conversation

erdembayar
Copy link
Contributor

@erdembayar erdembayar commented Dec 10, 2020

Bug

Fixes: Issue#7001
Fixes: Issue#7395

Regression: No

  • Last working version:
  • How are we preventing it in future:

Fix

Details: Running nuget pack .nuspec fails with the message
"The DateTimeOffset specified cannot be converted into a Zip file timestamp."
It happens with nuget 4.6.2 for a .dll file that has LastWriteTime 31/12/1979 23:00:00 +00:00

Reason for fail is ZipFile doesn't accept files with modified date before 1980/1/1 in constructor.

Also this fixes Timestamps of file of packed package is shifted by the timezone #7395
issue.

After fix

Input file for test:
image

Now this file modified in 1959 after packing looks like this:
image

Testing/Validation

Tests Added: Yes
Reason for not adding tests:
Validation:

@erdembayar erdembayar requested a review from a team as a code owner December 10, 2020 20:33
Copy link
Member

@nkolev92 nkolev92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of suggestions.

Copy link
Contributor

@zkat zkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor suggested changes but this looks good!

@erdembayar
Copy link
Contributor Author

erdembayar commented Dec 15, 2020

@NuGet/nuget-client

Can you someone refer to what is 'deterministic' flag here? Not sure if I need to take account of it too.

_deterministic = false; // fix in https://github.com/NuGet/Home/issues/8601

if (_deterministic)
{
entry.LastWriteTime = ZipFormatMinDate;
}

@nkolev92
Copy link
Member

Did you follow the link in NuGet/Home#8601?

It links the feature and why it was disabled.

…before 1/1/1980 00:00:00 UTC will be corrected, also any file created after 12/31/2107 23:59:58 UTC will be corrected.
@erdembayar
Copy link
Contributor Author

erdembayar commented Dec 16, 2020

Did you follow the link in NuGet/Home#8601?

It links the feature and why it was disabled.

Ok, just checked that one.

@erdembayar
Copy link
Contributor Author

@nkolev92 @zivkan
Addressed review concerns. Please review again.

@@ -893,7 +911,35 @@ private ZipArchiveEntry CreateEntry(ZipArchive package, string entryName, Compre
private ZipArchiveEntry CreatePackageFileEntry(ZipArchive package, string entryName, DateTimeOffset timeOffset, CompressionLevel compressionLevel)
{
var entry = package.CreateEntry(entryName, compressionLevel);
entry.LastWriteTime = timeOffset;

// Please note: ZipArchive stream reader randomly changes LastWriteTime by another 1 second off than what "entry.LastWriteTime" has, most likely bug on their side.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a limitation of the zip format, not a bug.

From https://en.wikipedia.org/wiki/ZIP_%28file_format%29 :

The FAT filesystem of DOS has a timestamp resolution of only two seconds; ZIP file records mimic this. As a result, the built-in timestamp resolution of files in a ZIP archive is only two seconds, though extra fields can be used to store more precise timestamps. The ZIP format has no notion of time zone, so timestamps are only meaningful if it is known what time zone they were created in.

I'd tweak the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkolev92
Wow, how do you know it?
I wish I knew it before spend quite some time on seemingly non-trivial issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Addressed.

@erdembayar
Copy link
Contributor Author

@nkolev92
Please review again.

Copy link
Member

@zivkan zivkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any tests that pack a file, then extract it, and make sure that the original and extracted files have the same timestamp (and that the CI agent is not on the UTC timezone)?

If not, I don't think we're done. I think we need to modify pack to make sure we're always setting UTC time, and modify the extraction code to make sure when setting the file's timestamp, that the API knows it's UTC, not local, time.

@erdembayar erdembayar force-pushed the dev-eryondon-CannotBeConverted2ZipfileTimestamp branch from f14cac8 to b212126 Compare January 13, 2021 00:02
@erdembayar
Copy link
Contributor Author

@erdembayar can you check if this PR will also fix NuGet/Home#7395?

@zivkan
I tested time zone scenarios in PST and Redmond timezones. Please review again.

@erdembayar
Copy link
Contributor Author

@NuGet/client-team
Ready for review.

}
else if (timeOffset > ZipFormatMaxDate)
{
if (!_zipFormatCorrected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why log only once?
We should log for all files that were adjusted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this comment to limit spamming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method WriteFiles iterates the files in the package and calls CreatePart for each. If you create a collection in this method, containing all the filenames with "out of range" dates in this method, then you can create a single message with all the affected files. As an example, see this pack validation rule

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @zivkan

Limit spamming means omitting repetitive information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I imitated your pattern but end result seems still same.

Currently dotnet.exe pack --no-build output looks like below:
image

       File 'net5.0-windows/10218.dll' was last modified on '1/29/1956', which is out of range of what the zip format supports. Using '1/1/1980' instead.
       File 'netcoreapp3.1/10218.dll' was last modified on '1/29/1957', which is out of range of what the zip format supports. Using '1/1/1980' instead.

Can you elaborate more on removing repetitive information? Is below looks correct approach? But this one has some complexities. Instead of is we need to use are, so we need need to have another string resource specific for purpose. Also how about someone mix 1956 and 2200 files then what message to show (not sure if it's possible scenario)?

        File 'net5.0-windows/10218.dll' was last modified on '1/29/1956', 'netcoreapp3.1/10218.dll' was last modified on '1/29/1957', which are out of range of what the zip format supports. Using '1/1/1980 instead'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other warnings had a block of text, followed by a list of shorter messages for each individual instance. So, you could think about changing the warning to something like:

The zip format supports a limited date range. The following files are outside the supported range:
'file1' changed from '1979-12-31' to '1980-01-01'
'file2' changed from '2108-01-01' to '2107-12-31'

This way we can avoid repeating the "out of range for what the zip format supports" text.

Also, if you were to do this in Visual Studio, you should see only a single entry in the error list, not one per file. I'm not sure if it reduces a list of warnings in a binlong.

Having said this, I was thinking about it over the weekend, and I'm feeling much less strongly about this. Consider a c# syntax error instead. If I have the same syntax error in multiple places in a single file, Roslyn is going to raise one warning per instance, not one warning per file or one warning per project. One difference is that in Visual Studio you can click the warning and go to the line of code, whereas NuGet doesn't. Even if NuGet had the capability to determine a filename and line number, for "go to error" in general, I'm not sure it makes sense for this specific warning. It probably makes sense here to have a single message with a list of affected files.

As mentioned in another comment, we should have an NU warning code, and a docs issue to document the NU code.

Copy link
Contributor Author

@erdembayar erdembayar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments.

}
else if (timeOffset > ZipFormatMaxDate)
{
if (!_zipFormatCorrected)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this comment to limit spamming.

Copy link
Member

@zivkan zivkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR's initial comment wasn't updated to say "fixing: ..." for the additional issue that it fixes and should be closed when this PR is merged.

Copy link
Contributor Author

@erdembayar erdembayar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zivkan
Thank you for your review. I addressed your comments. Please review again.

}
else if (timeOffset > ZipFormatMaxDate)
{
if (!_zipFormatCorrected)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I imitated your pattern but end result seems still same.

Currently dotnet.exe pack --no-build output looks like below:
image

       File 'net5.0-windows/10218.dll' was last modified on '1/29/1956', which is out of range of what the zip format supports. Using '1/1/1980' instead.
       File 'netcoreapp3.1/10218.dll' was last modified on '1/29/1957', which is out of range of what the zip format supports. Using '1/1/1980' instead.

Can you elaborate more on removing repetitive information? Is below looks correct approach? But this one has some complexities. Instead of is we need to use are, so we need need to have another string resource specific for purpose. Also how about someone mix 1956 and 2200 files then what message to show (not sure if it's possible scenario)?

        File 'net5.0-windows/10218.dll' was last modified on '1/29/1956', 'netcoreapp3.1/10218.dll' was last modified on '1/29/1957', which are out of range of what the zip format supports. Using '1/1/1980 instead'.

src/NuGet.Core/NuGet.Packaging/Strings.resx Outdated Show resolved Hide resolved
Copy link
Member

@zivkan zivkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress. Getting very close now.

}
else
{
entry.LastWriteTime = timeOffset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my previous round of reviews, I asked to make sure timeOffset was in the UTC timezone before setting entry.LastWriteTime, but I don't see that it's checked or converted anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case I added .UtcDateTime. But for actual physical files it's already in UTC timezone and it comes from below call stack.

  1. This method calls our above method and passing lastWriteTime.

    var entry = CreatePackageFileEntry(package, entryName, lastWriteTime, CompressionLevel.Optimal, warningMessage);

  2. And this one calls above 1 and passing lastWriteTime.

    CreatePart(
    package,
    file.Path,
    stream,
    lastWriteTime: _deterministic ? ZipFormatMinDate : file.LastWriteTime,

  3. In non-deterministic mode lastWriteTime is file.LastWriteTime which actually reads UTC datetime here:

else if (SourcePath != null)
{
_lastWriteTime = File.GetLastWriteTimeUtc(SourcePath);
return File.OpenRead(SourcePath);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for actual physical files it's already in UTC timezone and it comes from below call stack.

Yes, but NuGet.Packaging is a package that customers can reference in their own projects, and by implementing their own IPackageFile that does not return UTC time, they could encounter problems. You've fixed it by using .UtcDateTime, but my point is that since NuGet is not only a tool, but also a package/SDK, we need to consider all the different possible usage.

@@ -957,6 +992,7 @@ private HashSet<string> WriteFiles(ZipArchive package, HashSet<string> filesWith
}
}

_logger?.Log(PackagingLogMessage.CreateMessage(warningMessage.ToString(), LogLevel.Information));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice this before, but we should always have NU warning/error codes. So, you need to create a new one, and create a task in our docs repo to create docs for the new NU code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I created NU5132 warning code. If it's wrong choice of warning code then please correct me.
Also I created follow up document issue.

@@ -20,5 +20,17 @@ internal static class StringFormatter
source,
contentHash);
}

internal static string ZipFileTimeStampModified(
string entry,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
string entry,
string packagePath,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking your suggestion, but prefer filePath. Because packagePath might suggest actual package itself something like C:\Users\xxxxx\.nuget\packages\nuget.versioning, but here we're talking about just one of many files in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 as long as it's not "entry", because someone looking at this method without the context of knowing where it's being used may not undersatnd what "entry" means.

[Theory, InlineData(false), InlineData(true)]
public void EmitRequireLicenseAcceptance_ShouldEmitElement(bool requireLicenseAcceptance)
[Fact]
public void PackageBuilder_CorrectTestWriteTimeAfterYear2107_Succeeds()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test (after 2107) looks so similar to the previous test (before 1980) that it's quite difficult to tell them apart. It feels a lot like re-reviewing the same code.

Even though it will have slightly slower execution time, I think it would be better to have a single parameterized test (Theory) that takes fileTime (input) and packageTime (expected) as parameters, creates a nupkg with only a single file, and then asserts that single file's timestamp to the expected packageTime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they're very close. But they usually focus little bit different sides on edge cases, so I prefer to keep as it's.

Before 1980 2 files and after 1980 4 files are tested. More of upside of 1980.
But 2107 4 files and after 2107 2 files are tested. More of downside of 2107.

Of course we remove fine grained edge case testing then we can merge then into 1 unit test using Theory data.

Copy link
Contributor Author

@erdembayar erdembayar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zivkan
Thank you for your follow up review.
I addressed your review comments and implemented block text warning which you mentioned.

With nuget.exe it's working as expected.
1

But dotnet.exe does own things when dealing with multiline warning text. It breaks each line into new warning. Now I'm wondering whether to keep as it's or revert back.

Here is what is memory:

2

Actual printed text, it breaks down each line into new warning.
3

@@ -957,6 +992,7 @@ private HashSet<string> WriteFiles(ZipArchive package, HashSet<string> filesWith
}
}

_logger?.Log(PackagingLogMessage.CreateMessage(warningMessage.ToString(), LogLevel.Information));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I created NU5132 warning code. If it's wrong choice of warning code then please correct me.
Also I created follow up document issue.

}
else
{
entry.LastWriteTime = timeOffset;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case I added .UtcDateTime. But for actual physical files it's already in UTC timezone and it comes from below call stack.

  1. This method calls our above method and passing lastWriteTime.

    var entry = CreatePackageFileEntry(package, entryName, lastWriteTime, CompressionLevel.Optimal, warningMessage);

  2. And this one calls above 1 and passing lastWriteTime.

    CreatePart(
    package,
    file.Path,
    stream,
    lastWriteTime: _deterministic ? ZipFormatMinDate : file.LastWriteTime,

  3. In non-deterministic mode lastWriteTime is file.LastWriteTime which actually reads UTC datetime here:

else if (SourcePath != null)
{
_lastWriteTime = File.GetLastWriteTimeUtc(SourcePath);
return File.OpenRead(SourcePath);

@@ -20,5 +20,17 @@ internal static class StringFormatter
source,
contentHash);
}

internal static string ZipFileTimeStampModified(
string entry,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking your suggestion, but prefer filePath. Because packagePath might suggest actual package itself something like C:\Users\xxxxx\.nuget\packages\nuget.versioning, but here we're talking about just one of many files in it.

[Theory, InlineData(false), InlineData(true)]
public void EmitRequireLicenseAcceptance_ShouldEmitElement(bool requireLicenseAcceptance)
[Fact]
public void PackageBuilder_CorrectTestWriteTimeAfterYear2107_Succeeds()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they're very close. But they usually focus little bit different sides on edge cases, so I prefer to keep as it's.

Before 1980 2 files and after 1980 4 files are tested. More of upside of 1980.
But 2107 4 files and after 2107 2 files are tested. More of downside of 2107.

Of course we remove fine grained edge case testing then we can merge then into 1 unit test using Theory data.

@@ -957,6 +997,13 @@ private HashSet<string> WriteFiles(ZipArchive package, HashSet<string> filesWith
}
}

var warningMessageString = warningMessage.ToString().Trim();
Copy link
Member

@zivkan zivkan Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think warningMessage.Length -= Environment.NewLine.Length will have better performance than .Trim(). I'm assuming the only reason to call .Trim() is to remove the trailing newline character(s).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Good trick.

}
else
{
entry.LastWriteTime = timeOffset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for actual physical files it's already in UTC timezone and it comes from below call stack.

Yes, but NuGet.Packaging is a package that customers can reference in their own projects, and by implementing their own IPackageFile that does not return UTC time, they could encounter problems. You've fixed it by using .UtcDateTime, but my point is that since NuGet is not only a tool, but also a package/SDK, we need to consider all the different possible usage.

Copy link
Member

@zivkan zivkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the potential product issues I can think of are fixed. It will be really great to fix these time zone issues 👍

{
return string.Format(CultureInfo.CurrentCulture,
Strings.ZipFileTimeStampModifiedWarning,
listOfFileTimeStampModifiedMessages);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the "Writing High Performance .NET Code" book, string.Format is much slower than string.Concat (for which the + operator is syntactic sugar.

Although this method isn't on a hot path, so isn't itself perf-critical, I think it's good to practise these things all the time, so that it's natural when you are working on perf-critical code.

return Strings.ZipFileTimeStampModifiedWarning + Environment.NewLine + listOfFileTimeStampModifiedMessages;, and remove {0} from the resource files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. I bought the book. Currently couldn't make time to read it.

@erdembayar
Copy link
Contributor Author

@nkolev92

Do you have any concern if I merge this PR? Please check my comment for block text warning output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants