Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows: attribute 'archive' only for completed downloads #3354

Closed
wmark opened this issue Dec 9, 2018 · 11 comments
Closed

windows: attribute 'archive' only for completed downloads #3354

wmark opened this issue Dec 9, 2018 · 11 comments

Comments

@wmark
Copy link
Contributor

wmark commented Dec 9, 2018

The archive bit (FILE_ATTRIBUTE_ARCHIVE, 0x20) tells files that shall be backed up from those that are either not ready or have not changed.

Downloads in progress are neither ready to be backed up, nor should they be opened by a different process. Only after a download has been completed it's sensible to include it in any integer snapshot or backup of the system.

For example, if you were downloading a Linux ISO you don't want to carry a fraction of it through the history of your backup sets. Indeed, it will give the wrong impression this were the file's valid state.


Open two command prompts, one to start the download and the second to see attributes. We'll focus on the output of the second while the download is in progress.

curl.exe --remote-name <URL>/<fname>
attrib <fname>

… which gives:

A            <fname>

… but the archive bit shouldn't be set at this point, so this is what we'd ideally see (with or without any FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS):

             <fname>

Happens with curl 7.61.1 (x86_64-pc-win32)
using Windows 10 1809, though likely not limited to 10 1809 or 7.61.1.

@bagder bagder added cmdline tool Windows Windows-specific labels Dec 9, 2018
@bagder
Copy link
Member

bagder commented Dec 9, 2018

We're using good old fopen() to create new files so clearly Windows then gives it that attribute.

But why is it curl's job to tell what files that should be backed up or not? curl creates files on request, if that should be backed up or not while being transferred seems like someone else's decision.

@wmark
Copy link
Contributor Author

wmark commented Dec 9, 2018

Your assessment is correct. Nonetheless, if curl stripped that bit after fopen() and set it after a complete download (no matter what), it would amount to a much friendlier behaviour. For the reasons described above. – So far it's no bug but RFE.

Most Windows programs do this, but I believe one shouldn't follow bad examples. Although the usual practice is to create a temporary file. Curl does everything in-place, hence this becomes relevant.

Unfortunately the workaround:

fsutil file createNew <fname> 0
attrib -A <fname>
curl.exe -C - --remote-name …

… will not work because now unexpectedly curl re-sets the attribute (retaining all other, like I or T).

@bagder
Copy link
Member

bagder commented Dec 9, 2018

curl re-sets the attribute

... because that's what fopen() does.

@wmark
Copy link
Contributor Author

wmark commented Dec 9, 2018

Prematurely, and as part of curl. ;-)

I think the same reasoning can be applied to flags like --cacert: Why use it if Windows' good old whatever exists.

Properly indicating that and when something is ready to be archived (or not) is an handy advancement.

@bagder
Copy link
Member

bagder commented Dec 9, 2018

Sure, but... if the almighty powers at Microsoft agree with your assessment on how this should work, why don't they just make sure that these flags are set accordingly and fully automatically since they are in full control?

I'm not actually arguing against your enhancement request, I just can't see the logic in how this works. But I'm not a Windows person, I don't use Windows, I don't backup on Windows and I will not personally work on implementing what you're suggesting (as it can only be done using windows-specific APIs and tests). If what you're suggesting is generally considered The Right Thing, then I'm sure others will give a possible future PR for this the necessary 👍

@wmark
Copy link
Contributor Author

wmark commented Dec 9, 2018

On Windows, much is not changed (evolved) for »legacy« reasons. I could tell a far better story for Linux.

Absence of divine intervention is no sign of disagreement (nor encouragement).

For my notes, and whoever else's, for a PR: I suspect it's two prayers of SetFileAttributesA

(Under Linux you avoid the create-times-attribs race condition by opening, trunc, (fallocate,) (madvise,), attribs, time. Ideally with an initial O_TMPFILE and linking the result into existence. Windows doesn't have the latter but said cludge, the archive attribute.)

@dscho
Copy link
Contributor

dscho commented Dec 14, 2018

@wmark how about working on this? Should be fun.

And yes, the SetFileAttributes() function would be the correct thing to use here (for inspiration: Git for Windows uses the same for setting the hidden flag: https://github.com/git-for-windows/git/blob/v2.20.0.windows.1/compat/mingw.c#L593-L604).

@jay
Copy link
Member

jay commented Dec 14, 2018

We could probably do this with CreateFile and _open_osfhandle to avoid the race condition, I will take a look.

@jay jay reopened this Dec 14, 2018
jay added a commit to jay/curl that referenced this issue Dec 15, 2018
(***Draft***)
Doesn't work fully. Windows still turns on archive bit even if
FILE_ATTRIBUTE_NORMAL is used to create the file, regardless of whether
or not it already exists. Also even if the archive bit is removed after
creation Windows can turn it back on whenever the file's buffer is
written to disk, ie modified.


- Change the output files created by the curl tool via
  tool_create_output_file() to disable the file's archive bit before
  writing to the file and enable it before closing the file.

The curl tool on Windows opens output files in a way that other
processes can read the file as it's being written, ie fopen default
shared read. Prior to this change a backup process that depends on the
archive bit could see fit to backup an output file before we are
finished writing it. Therefore we now disable the file's archive bit
before writing to it and enable it before closing the file.

Fixes curl#3354
Closes #xxxx
@jay
Copy link
Member

jay commented Dec 15, 2018

We could probably do this with CreateFile and _open_osfhandle to avoid the race condition, I will take a look.
I tried it but it doesn't work as I expected, refer to curl:master...jay:win32_disable_archive_bit_until_close. Basically Windows can turn on the archive bit whenever the file is modified and AFAICS we can't really stop that.

What about an option to disable shared read, this way when curl opens an output file for writing it cannot be read by other programs until the file is closed?

Or how about an option to use something temporary (like this fopen_excl_genfile from an old example)

jay added a commit to jay/curl that referenced this issue Dec 15, 2018
(***Draft***)
Doesn't work fully. Windows still turns on archive bit even if
FILE_ATTRIBUTE_NORMAL is used to create the file, regardless of whether
or not it already exists. Also even if the archive bit is removed after
creation Windows can turn it back on whenever the file's buffer is
written to disk, ie modified.

- Change the output files created by the curl tool via
  tool_create_output_file() to disable the file's archive bit before
  writing to the file and enable it before closing the file.

The curl tool on Windows opens output files in a way that other
processes can read the file as it's being written, ie fopen default
shared read. Prior to this change a backup process that depends on the
archive bit could see fit to backup an output file before we are
finished writing it. Therefore we now disable the file's archive bit
before writing to it and enable it before closing the file.

Fixes curl#3354
Closes #xxxx
jay added a commit to jay/curl that referenced this issue Dec 15, 2018
(***Draft***)
Doesn't work fully. Windows still turns on archive bit even if
FILE_ATTRIBUTE_NORMAL is used to create the file, regardless of whether
or not it already exists. Also even if the archive bit is removed after
creation Windows can turn it back on whenever the file's buffer is
written to disk, ie modified.

- Change the output files created by the curl tool via
  tool_create_output_file() to disable the file's archive bit before
  writing to the file and enable it before closing the file.

The curl tool on Windows opens output files in a way that other
processes can read the file as it's being written, ie fopen default
shared read. Prior to this change a backup process that depends on the
archive bit could see fit to backup an output file before we are
finished writing it. Therefore we now disable the file's archive bit
before writing to it and enable it before closing the file.

Fixes curl#3354
Closes #xxxx
jay added a commit to jay/curl that referenced this issue Dec 15, 2018
(***Draft***)
Doesn't work fully. Windows still turns on archive bit even if
FILE_ATTRIBUTE_NORMAL is used to create the file, regardless of whether
or not it already exists. Also even if the archive bit is removed after
creation Windows can turn it back on whenever the file's buffer is
written to disk, ie modified.

- Change the output files created by the curl tool via
  tool_create_output_file() to disable the file's archive bit before
  writing to the file and enable it before closing the file.

The curl tool on Windows opens output files in a way that other
processes can read the file as it's being written, ie fopen default
shared read. Prior to this change a backup process that depends on the
archive bit could see fit to backup an output file before we are
finished writing it. Therefore we now disable the file's archive bit
before writing to it and enable it before closing the file.

Fixes curl#3354
Closes #xxxx
jay added a commit to jay/curl that referenced this issue Dec 15, 2018
(***Draft***)
Doesn't work fully. Windows still turns on archive bit even if
FILE_ATTRIBUTE_NORMAL is used to create the file, regardless of whether
or not it already exists. Also even if the archive bit is removed after
creation Windows can turn it back on whenever the file's buffer is
written to disk, ie modified.

- Change the output files created by the curl tool via
  tool_create_output_file() to disable the file's archive bit before
  writing to the file and enable it before closing the file.

The curl tool on Windows opens output files in a way that other
processes can read the file as it's being written, ie fopen default
shared read. Prior to this change a backup process that depends on the
archive bit could see fit to backup an output file before we are
finished writing it. Therefore we now disable the file's archive bit
before writing to it and enable it before closing the file.

Fixes curl#3354
Closes #xxxx
jay added a commit to jay/curl that referenced this issue Dec 16, 2018
(***Draft***)
Doesn't work fully. Windows still turns on archive bit even if
FILE_ATTRIBUTE_NORMAL is used to create the file, regardless of whether
or not it already exists. Also even if the archive bit is removed after
creation Windows can turn it back on whenever the file's buffer is
written to disk, ie modified.

- Change the output files created by the curl tool via
  tool_create_output_file() to disable the file's archive bit before
  writing to the file and enable it before closing the file.

The curl tool on Windows opens output files in a way that other
processes can read the file as it's being written, ie fopen default
shared read. Prior to this change a backup process that depends on the
archive bit could see fit to backup an output file before we are
finished writing it. Therefore we now disable the file's archive bit
before writing to it and enable it before closing the file.

Fixes curl#3354
Closes #xxxx
@bagder
Copy link
Member

bagder commented Jan 10, 2019

@jay are you planning on turning this work into a real PR? If not, I rather make this a TODO item and close this issue for now.

@jay
Copy link
Member

jay commented Jan 20, 2019

@jay are you planning on turning this work into a real PR? If not, I rather make this a TODO item and close this issue for now.

No because it doesn't work the way I expected, it is due to the nature of the OS that we can't make the guarantee AFAICT so there's nothing really TODO. I don't know how we can solve this in a palatable way and the alternate ideas don't have any traction. If the file must only contain the output of a valid completed transfer (or whatever) then the onus is on the user to, for example, download to a temporary file and if exit code 0 and http code 200 then replace the older version.

@jay jay closed this as completed Jan 20, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Apr 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants