Skip to content

UTC Date not converted to Local if in Folder #422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
philip-firstorder opened this issue Jun 25, 2019 · 18 comments
Closed

UTC Date not converted to Local if in Folder #422

philip-firstorder opened this issue Jun 25, 2019 · 18 comments

Comments

@philip-firstorder
Copy link

philip-firstorder commented Jun 25, 2019

I archived a a file with modified date 2019-06-25 11:58:57.201 +0000 UTC using https://golang.org/pkg/archive/zip/

If I put the file inside a folder and the folder is archived, then the UTC date is not converted to Local Date when unarchiving, to reproduce you can unarchive this png.
Bad Date.zip

Bad UTC Date

But if the file is directly archived in the root, then the date is correctly converted to Local Date.
Correct Date.zip

The correct time using my Germany local +2 Timezone should be 13:58, but if the file is inside a folder it remains 11:58 when unarchived, seems a bug to me.

$ zipinfo -v "Bad Date.zip"

Archive:  Bad Date.zip
There is no zipfile comment.

End-of-central-directory record:
-------------------------------

  Zip archive file size:                     30923 (00000000000078CBh)
  Actual end-cent-dir record offset:         30898 (00000000000078B2h)
  Expected end-cent-dir record offset:       30898 (00000000000078B2h)
  (based on the length of the central directory and its expected offset)

  This zipfile constitutes the sole disk of a single-part archive; its
  central directory contains 2 entries.
  The central directory is 189 (00000000000000BDh) bytes long,
  and its (expected) offset in bytes from the beginning of the zipfile
  is 30709 (00000000000077F5h).


Central directory entry #1:
---------------------------

  Folder/

  offset of local header from start of archive:   0
                                                  (0000000000000000h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             none (stored)
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          2019 Jun 25 16:45:54
  file last modified on (UT extra field modtime): 2019 Jun 25 18:45:54 local
  file last modified on (UT extra field modtime): 2019 Jun 25 16:45:54 UTC
  32-bit CRC value (hex):                         00000000
  compressed size:                                0 bytes
  uncompressed size:                              0 bytes
  length of filename:                             7 characters
  length of extra field:                          9 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (00 hex):                none

  The central-directory extra field contains:
  - A subfield with ID 0x5455 (universal time) and 5 data bytes.
    The local extra field has UTC/GMT modification time.

  There is no file comment.

Central directory entry #2:
---------------------------

  Folder/firstorder-black.png

  offset of local header from start of archive:   46
                                                  (000000000000002Eh) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             none (stored)
  file security status:                           not encrypted
  extended local header:                          yes
  file last modified on (DOS date/time):          2019 Jun 25 11:58:56
  file last modified on (UT extra field modtime): 2019 Jun 25 13:58:57 local
  file last modified on (UT extra field modtime): 2019 Jun 25 11:58:57 UTC
  32-bit CRC value (hex):                         5c46ba63
  compressed size:                                30581 bytes
  uncompressed size:                              30581 bytes
  length of filename:                             27 characters
  length of extra field:                          9 bytes
  length of file comment:                         45 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (00 hex):                none

  The central-directory extra field contains:
  - A subfield with ID 0x5455 (universal time) and 5 data bytes.
    The local extra field has UTC/GMT modification time.

------------------------- file comment begins ----------------------------
5d013b72409327f33177b5ab/firstorder-black.png
-------------------------- file comment ends -----------------------------
zipinfo -v "Correct Date.zip"

Archive:  Correct Date.zip
There is no zipfile comment.

End-of-central-directory record:
-------------------------------

  Zip archive file size:                     30798 (000000000000784Eh)
  Actual end-cent-dir record offset:         30776 (0000000000007838h)
  Expected end-cent-dir record offset:       30776 (0000000000007838h)
  (based on the length of the central directory and its expected offset)

  This zipfile constitutes the sole disk of a single-part archive; its
  central directory contains 1 entry.
  The central directory is 120 (0000000000000078h) bytes long,
  and its (expected) offset in bytes from the beginning of the zipfile
  is 30656 (00000000000077C0h).


Central directory entry #1:
---------------------------

  firstorder-black.png

  offset of local header from start of archive:   0
                                                  (0000000000000000h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             none (stored)
  file security status:                           not encrypted
  extended local header:                          yes
  file last modified on (DOS date/time):          2019 Jun 25 11:58:56
  file last modified on (UT extra field modtime): 2019 Jun 25 13:58:57 local
  file last modified on (UT extra field modtime): 2019 Jun 25 11:58:57 UTC
  32-bit CRC value (hex):                         5c46ba63
  compressed size:                                30581 bytes
  uncompressed size:                              30581 bytes
  length of filename:                             20 characters
  length of extra field:                          9 bytes
  length of file comment:                         45 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (00 hex):                none

  The central-directory extra field contains:
  - A subfield with ID 0x5455 (universal time) and 5 data bytes.
    The local extra field has UTC/GMT modification time.

------------------------- file comment begins ----------------------------
5d013b72409327f33177b5ab/firstorder-black.png
-------------------------- file comment ends -----------------------------
@aonez aonez self-assigned this Jun 27, 2019
@aonez aonez added this to the Look at milestone Jun 27, 2019
@aonez aonez added the zip label Jun 27, 2019
@philip-firstorder
Copy link
Author

From what I read from the specifications the DOS date/time of .zip files CANNOT store locale information, so whatever you find there should be assumed to be local time and should NOT be modified when unarchiving.

However there are 2 extra fields that can be set, see the go code here

So if you have this in your archive then the third extra timestamp UTC time should be used instead of the DOS time, then convert the date depending on the locale of the client.

  file last modified on (DOS date/time):          2019 Jun 25 11:58:56
  file last modified on (UT extra field modtime): 2019 Jun 25 13:58:57 local
  file last modified on (UT extra field modtime): 2019 Jun 25 11:58:57 UTC

The thing is you already seem do this for files in the root of the archive, but not for the ones inside Folders.

@aonez aonez modified the milestones: Look at, Future Jun 27, 2019
@aonez
Copy link
Owner

aonez commented Jun 27, 2019

Good catch! Thanks @philip-firstorder. Due to differences in the headers, the "Correct.Date.zip" file is handled with p7zip and uses the correct date while the "Bad.Date.zip" file is handled with unar and uses the first date.

This should be fixed here: https://github.com/MacPaw/XADMaster

I'll try to check the code.

@aonez
Copy link
Owner

aonez commented Jun 28, 2019

I'm seeing that the normal behavior is to set the local time as the DOS date time. In this case the Go library is setting this as UTC, so probably this might also be fixed.

@aonez
Copy link
Owner

aonez commented Jun 28, 2019

I think the issue here might be the extra field length is incorrect, but I'm still checking. So the extra field is properly informed. It has 9 bytes and only contains the extended timestamp. XADMaster requires an extended field bigger that 9 bytes so it does not read that one.

@philip-firstorder
Copy link
Author

I managed to set the DOS time in local time like this, and the go library also sets the Expanded flags accordingly.

// Modified date should be set in LOCAL format of the hosting server
// This will be converted by the archive/zip package to 3 formats:
// file last modified on (DOS date/time):          2019 Jun 25 18:46:36 (up to 2 seconds loss because of DOS conversion)
// file last modified on (UT extra field modtime): 2019 Jun 25 18:46:37 local (ignored by some unzip software)
// file last modified on (UT extra field modtime): 2019 Jun 25 16:46:37 UTC   (ignored by some unzip software)
loc, err := time.LoadLocation("Europe/Berlin")
if err != nil {
	// Location data needs to be added to Dockerfile for dev/prod with
	// RUN apk add --no-cache tzdata
	// see: http://pouzek.si/blog/go-loadlocation-docker/
	return 0, fmt.Errorf("Can't load time Location: %v", err)
}

// Set file meta in the zip
fileHeader := &zip.FileHeader{
	Method: zip.Store, // no compression, use for fastest speed,
	Modified = modified.Time().In(loc),
}

@aonez
Copy link
Owner

aonez commented Jul 4, 2019

I think for Go this must be fixed in the writer rather than the user script. In the APPNOTE it only says MSDOS time, so technically using the UTC is ok, but since all other compressors seem to use local most extractors will expect it to be the local.

4.4.6 date and time fields: (2 bytes each)

   The date and time are encoded in standard MS-DOS format.
   If input came from standard input, the date and time are
   those at which compression was started for this data. 
   If encrypting the central directory and general purpose bit 
   flag 13 is set indicating masking, the value stored in the 
   Local Header will be zero. MS-DOS time format is different
   from more commonly used computer time formats such as 
   UTC. For example, MS-DOS uses year values relative to 1980
   and 2 second precision.

@philip-firstorder
Copy link
Author

The reasons why Go wants the Modified field in local are:

  1. To give priority to the legacy DOS time, which all extractors understand also as local
  2. Go cannot automatically read the user's or server's location, it needs to be specified like above
  3. The Go archive/zip writer can compute the 2 extra UT fields based on the local date
  4. Allows the user the possibility to set a custom local, like reading it from a client profile setting or passed by the front-end by some region auto-detection script
  5. Setting only the Modified field in local time is better than setting 2 fields ModifiedUTC & Location, they wanted to keep the zip.FileHeader field naming the same as in the official specs

So I think Go is right here.

What is not OK is the specification of zip itself, because they got too lazy to solve this problem when they first implemented MS-DOS times, and because they didn't deprecate it in 30 years.

@aonez
Copy link
Owner

aonez commented Jul 4, 2019

I'm not sure I understand your point. I meant here:

  file last modified on (DOS date/time):          2019 Jun 25 11:58:56
  file last modified on (UT extra field modtime): 2019 Jun 25 13:58:57 local
  file last modified on (UT extra field modtime): 2019 Jun 25 11:58:57 UTC

The first date, the DOS one, should be local instead of UTC, since most extractors expect that to be local. Looking at the writer code I see this:

// Contrary to the FileHeader.SetModTime method, we intentionally
// do not convert to UTC, because we assume the user intends to encode
// the date using the specified timezone. A user may want this control
// because many legacy ZIP readers interpret the timestamp according
// to the local timezone.
//
// The timezone is only non-UTC if a user directly sets the Modified
// field directly themselves. All other approaches sets UTC.
fh.ModifiedDate, fh.ModifiedTime = timeToMsDosTime(fh.Modified)

Even if they say "legacy ZIP readers", I think most readers (even Go I assume) will use the extended values instead of the DOS one, so keep that time to local will be better, in my opinion.

@philip-firstorder
Copy link
Author

philip-firstorder commented Jul 4, 2019

Ah, sorry, my example above was BEFORE I implemented in go the DOS date/time in local format, if for example you look on the other 5GB archive zipinfos you'll see it's all in local time.

Otherwise I wouldn't have been able to detect this bug, as both the DOS and UT extra fields would have had the same local date, giving me the impression that all is well.

So the problem is actually that Keka doesn't read from the UT extended time for the files in Folders.

@aonez
Copy link
Owner

aonez commented Jul 4, 2019

So the problem is actually that Keka doesn't read from the UT extended time for the files in Folders

#422 (comment) So unar was ignoring the extended data, but fixed that in MacPaw/XADMaster#99 and will release the fixed version in 1.1.18 😊

@philip-firstorder
Copy link
Author

Yes, then you can set this as solved I guess, that was the only problem.

@aonez
Copy link
Owner

aonez commented Jul 4, 2019

I did set it as fixed (the label) but I prefer to close it when 1.1.18 is released.

@aonez
Copy link
Owner

aonez commented Jul 4, 2019

Also would be awesome to change the writer.go behavior to set the DOS date in local instead of UTC.

@philip-firstorder
Copy link
Author

philip-firstorder commented Jul 4, 2019

I managed to set the DOS time in local time like this, and the go library also sets the Expanded flags accordingly.

// Modified date should be set in LOCAL format of the hosting server
// This will be converted by the archive/zip package to 3 formats:
// file last modified on (DOS date/time):          2019 Jun 25 18:46:36 (up to 2 seconds loss because of DOS conversion)
// file last modified on (UT extra field modtime): 2019 Jun 25 18:46:37 local (ignored by some unzip software)
// file last modified on (UT extra field modtime): 2019 Jun 25 16:46:37 UTC   (ignored by some unzip software)
loc, err := time.LoadLocation("Europe/Berlin")
if err != nil {
	// Location data needs to be added to Dockerfile for dev/prod with
	// RUN apk add --no-cache tzdata
	// see: http://pouzek.si/blog/go-loadlocation-docker/
	return 0, fmt.Errorf("Can't load time Location: %v", err)
}

// Set file meta in the zip
fileHeader := &zip.FileHeader{
	Method: zip.Store, // no compression, use for fastest speed,
	Modified = modified.Time().In(loc),
}
// Contrary to the FileHeader.SetModTime method, we intentionally
// do not convert to UTC, because we assume the user intends to encode
// the date using the specified timezone. A user may want this control
// because many legacy ZIP readers interpret the timestamp according
// to the local timezone.
//
// The timezone is only non-UTC if a user directly sets the Modified
// field directly themselves. All other approaches sets UTC.
fh.ModifiedDate, fh.ModifiedTime = timeToMsDosTime(fh.Modified)

They won't change it, because it would overwrite whatever custom timezone the user specifies. I fixed this on my side already.

@aonez
Copy link
Owner

aonez commented Jul 4, 2019

What they say there is that if the user informed fh.Modified (as you did in your script) do not modify that date.

	// If Modified is set, this takes precedence over MS-DOS timestamp fields.
	if !fh.Modified.IsZero() {
		// Contrary to the FileHeader.SetModTime method, we intentionally
		// do not convert to UTC, because we assume the user intends to encode
		// the date using the specified timezone. A user may want this control
		// because many legacy ZIP readers interpret the timestamp according
		// to the local timezone.
		//
		// The timezone is only non-UTC if a user directly sets the Modified
		// field directly themselves. All other approaches sets UTC.
		fh.ModifiedDate, fh.ModifiedTime = timeToMsDosTime(fh.Modified)

But the problem when fh.Modified is not informed (the default) is that fh.ModifiedDate and fh.ModifiedTime are set using time.Time, that is set in UTC:

	// Modified is the modified time of the file.
	//
	// When reading, an extended timestamp is preferred over the legacy MS-DOS
	// date field, and the offset between the times is used as the timezone.
	// If only the MS-DOS date is present, the timezone is assumed to be UTC.
	//
	// When writing, an extended timestamp (which is timezone-agnostic) is
	// always emitted. The legacy MS-DOS date field is encoded according to the
	// location of the Modified time.
	Modified     time.Time
	ModifiedTime uint16 // Deprecated: Legacy MS-DOS date; use Modified instead.
	ModifiedDate uint16 // Deprecated: Legacy MS-DOS time; use Modified instead.

But they are also converting the DOS date to UTC in the reader, so I'm not sure they would like to change this.

	msdosModified := msDosTimeToTime(f.ModifiedDate, f.ModifiedTime)
	f.Modified = msdosModified
	if !modified.IsZero() {
		f.Modified = modified.UTC()


		// If legacy MS-DOS timestamps are set, we can use the delta between
		// the legacy and extended versions to estimate timezone offset.
		//
		// A non-UTC timezone is always used (even if offset is zero).
		// Thus, FileHeader.Modified.Location() == time.UTC is useful for
		// determining whether extended timestamps are present.
		// This is necessary for users that need to do additional time
		// calculations when dealing with legacy ZIP formats.
		if f.ModifiedTime != 0 || f.ModifiedDate != 0 {
			f.Modified = modified.In(timeZone(msdosModified.Sub(modified)))
		}
	}

Since they also set the extended dates, I will keep that as fixed with the unar fix 🤷🏻‍♂️

@philip-firstorder
Copy link
Author

I got go‘s point: set only Modified field to the local timezone and they take care of everything else. That‘s why they deprecated the other time fields.

unar fix is enough. Thanks!

@aonez
Copy link
Owner

aonez commented Sep 4, 2019

@philip-firstorder version 1.1.19 has all fixes, released now in the WEB, in a few hours in the App Store :)

@aonez aonez closed this as completed Sep 4, 2019
@philip-firstorder
Copy link
Author

philip-firstorder commented Sep 4, 2019

Very nice, I just tested it and confirm it fixes this bug and can also open the 5GB archive without any warning.

I will send my users to use Keka :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants