Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement issue# 13996. Add File.tempFile. #2956

Merged
merged 1 commit into from
Mar 31, 2015

Conversation

jmdavis
Copy link
Member

@jmdavis jmdavis commented Feb 4, 2015

This adds a File.tempFile, which generates a random file name and
returns an open std.stdio.File for it. Unlike with File.tmpfile, it's a
normal file which is not deleted when the file is closed, and you
actually have access to the file's name, which is necessary in many
situations - particularly when writing unit tests that need to write to
a file and then read from it.

version(Posix)
{
import core.sys.posix.sys.stat, core.sys.posix.fcntl;
enum flags = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quoting mktemp here.

When creating a file, the resulting file has read and write permissions for the current user, but no permissions for the group or others; these permissions are reduced if the current umask is more restrictive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I take it that you're suggesting that this just be

enum flags = S_IRUSR | S_IWUSR;

I guess that that makes sense. It's been so long since I wrote that that I don't even remember why I picked those permissions - probably because it was the least restrictive.

@MartinNowak
Copy link
Member

Great looks really good, much simpler than I thought due to the existing fdopen and name.

chars[0 .. letters.length] = letters[];
chars[letters.length .. $] = digits[];

foreach(ref c; name[prefix.length .. $])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ - suffix.length

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that that's what I get for just quickly throwing the suffix in there, since all it had before today was a prefix.

@jmdavis
Copy link
Member Author

jmdavis commented Feb 5, 2015

Okay. I made changes per the comments.

dir = Directory of the temporary file. Defaults to the result of
$(XREF file, tempDir).
*/
static File tempFile(string prefix = null, string suffix = null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the documentation is generated with warnings it will complain about no parameter for dir:

Warning: Ddoc: function declaration has no parameter 'dir'

If you put the documentation on the other overload instead it will not complain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's ugly, since the overload without dir is the short one. I'd have hoped that the compiler would be smart enough to not give a warning when one of the documented items matched the documentation - though honestly, even having warnings for documentation like that is a bit off when you consider cases like when a struct and function share documentation as sometimes occurs when a function returns a specific struct. So, while I can certainly see how such warnings could be useful, I question that they actually make sense ultimately.

Regardless, as ugly as it may be, I can rearrange the functions to avoid the warning.

@JakobOvrum
Copy link
Member

So now we'll have both File.tmpfile and File.tempFile, each with slightly different behaviour? :(

@jmdavis
Copy link
Member Author

jmdavis commented Feb 5, 2015

So now we'll have both File.tmpfile and File.tempFile, each with slightly different behaviour? :(

I would love to get rid of tmpfile. IMHO, it's useless. Every time that I've need a temporary file, I've need to be able to close it and reopen it again, and that's not possible with tmpfile. The only reason that tmpfile isn't marked as scheduled for deprecation here is because Martin was arguing against it in the enhancement request that this goes with, and I didn't really want to argue about it. But I have no problem with putting tmpfile on the deprecation path. It's a misfeature IMHO.

@andralex
Copy link
Member

andralex commented Feb 7, 2015

I think we should rename tempFile to scratchFile

@jacob-carlborg
Copy link
Contributor

I think we should rename tempFile to scratchFile

I don't think that is a name a developer would look for when trying to create a temporary file.

@quickfur
Copy link
Member

quickfur commented Feb 7, 2015

@jmdavis Isn't reopening a tempfile with a fixed filename vulnerable to security holes? E.g., if somebody successfully guesses the filename and creates it between the time it's closed and the time it's open, with maliciously-crafted file permissions/ownerships?

@schuetzm
Copy link
Contributor

schuetzm commented Feb 7, 2015

Andrei has a point, because if the file is not automatically deleted, it's not really temporary.

Anyway, maybe there could be a compromise, but I don't know whether it's possible: Could the file be deleted at first, so that it automatically gets released when the program ends, but as soon as someone accesses the file name, a link (hardlink) in /tmp is created? The first part is easily achievable on Posix, and I believe on Windows you can use FILE_SHARE_DELETE, but I don't know whether there's a cross-platform way to create a link from an fd. On older versions of Linux, you used to be able to create a link from /proc/self/fd/... to the desired filename, but this doesn't work anymore. OTOH, there's now (>= 3.11) O_TMPFILE, which can be used in combination with linkat() to achieve the same effect in a cleaner way [1], [2]. No idea about Windows and the BSDs though...

[1] http://man7.org/linux/man-pages/man2/open.2.html
[2] http://man7.org/linux/man-pages/man2/linkat.2.html

@schuetzm
Copy link
Contributor

schuetzm commented Feb 7, 2015

@quickfur It's opened with O_EXCL, kept open, and then a File is created from the fd. It's fine AFAICS.

@quickfur
Copy link
Member

quickfur commented Feb 7, 2015

@schuetzm I was responding to @jmdavis 's comment that he wanted to be able to close the file and reopen it again later.

@jmdavis
Copy link
Member Author

jmdavis commented Feb 8, 2015

@andralex

I think we should rename tempFile to scratchFile

I don't really care. It's functionality itself that matters, but I've never heard anyone use the term "scratch file, " and I'd expect anyone looking for this sort of functionality to be looking for something referring to "temporary files." So, I would think that it would be missed much more easily if it were called scratchFile. A link to it could be provided in tmpfile's documentation though, since tmpfile is likely what folks would find first when looking for this functionality.

Isn't reopening a tempfile with a fixed filename vulnerable to security holes

@quickfur I honestly have no idea what the purpose of a file is if you can't close it and reopen it. Obviously, someone has a use for that, or tmpdir wouldn't exist in the first place. But the fact that most files that anyone deals with have specific names means that most any file that you open has the risk of being replaced by another process with the permissions to do so, so if you have security concerns with a file being changed out from under you, because you closed it and reopened it, then you're going to have those concerns with almost every file on the computer. If there are good reasons to keep tmpfile around because of its unique properties, then fine. I didn't deprecate it as part of this PR, because I didn't want to argue the point. But for every situation that I've seen where a temporary file would be useful, tempFile does exactly what I'd need. The need for it has come up several times in the newsgroups before, and apparently, dub recently had a need for it as well. So, I'm quite certain that we should have something like tempFile, regardless of whether tmpfile is of any use or not (though I have never seen a use for it myself).

Andrei has a point, because if the file is not automatically deleted, it's not really temporary.

It's my understanding that most temporary files go into a directory where the OS will delete it later (typically when rebooting). Anyone who doesn't want to rely on that can just delete it themselves when they're done with it. But having the file deleted when it has been closed is something that I have never found a use for, and I have frequently needed to be able create temporary files that I've needed to reopen (especially in unit tests, though the need does come up occasionally as part of the normal operation of a program). So, I see no need to complicate things by trying to figure out how to the delete the file automatically, especially when it's trivial enough for the progam to take care of that itself if it cares.

@jmdavis
Copy link
Member Author

jmdavis commented Mar 23, 2015

Okay. It's now named scratchFile, and I made some minor improvements based on the feedback. And it now passes on all platforms - including Win64 (which it was failing on before).

@jmdavis jmdavis force-pushed the tempFile branch 2 times, most recently from c2ec5f9 to 75102a5 Compare March 24, 2015 01:54
return retval;
}

throw new Exception("Failed to create a temporary file.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too happy with this kind of exception usage. Basically you tried 3 times and then fail with an unspecific error, that doesn't indicate how to recover from the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the odds and an added errno check, failing 3 times would indicate a bad RNG or a hacking attempt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too happy with this kind of exception usage. Basically you tried 3 times and then fail with an unspecific error, that doesn't indicate how to recover from the issue.

I agree. I commented on the ErrnoException that was used previously. Throwing a plain Exception is not any better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. It now throws an ErrnoException for the last failed open attempt. So, it won't tell you why the others failed, but you could check the error code of the last one to see whether it was a permissions problem or whatever. I'm not sure what else you might want me to do here. The only other thing that I can think of would be to create a new exception type, which would be overkill IMHO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrnoException is just the right thing, because it contains a human-readable error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrnoException is just the right thing, because it contains a human-readable error message.

ErrnoException is always the wrong thing and shouldn't exist in the first place, see my comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what else you might want me to do here. The only other thing that I can think of would be to create a new exception type, which would be overkill IMHO.

Yes, that's exactly what you should do. In general there's way to much laziness when it comes to exceptions in Phobos. Basically everywhere enforce is used which by default throws a plain Exception. It's not possible to do good error handling with exception types that are loosely defined on what they mean.

This adds a File.scratchFile, which generates a random file name and
returns an open std.stdio.File for it. Unlike with File.tmpfile, it's a
normal file which is _not_ deleted when the file is closed, and you
actually have access to the file's name, which is necessary in many
situations - particularly when writing unit tests that need to write to
a file and then read from it.
@jmdavis
Copy link
Member Author

jmdavis commented Mar 30, 2015

Updated per most recent comments.

@MartinNowak
Copy link
Member

Auto-merge toggled on

MartinNowak added a commit that referenced this pull request Mar 31, 2015
Implement issue# 13996. Add File.tempFile.
@MartinNowak MartinNowak merged commit 8a65ff8 into dlang:master Mar 31, 2015
limit, filename);
throw new ErrnoException(msg);
}
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be slightly better to only retry when errno was EEXISTS, i.e. replace the if block above with errnoEnforce(errno == EEXISTS, msg), because none of the other open errors will go away by retying (http://linux.die.net/man/2/open).

@CyberShadow
Copy link
Member

This pull request caused a +508KB (684KB -> 1191KB) increase in the size of a compiled "Hello, world" program.

https://issues.dlang.org/show_bug.cgi?id=14539

@MartinNowak
Copy link
Member

Are static functions not emitted to a separate object like free standing functions?

@CyberShadow
Copy link
Member

It's probably due to the new imports.

@CyberShadow
Copy link
Member

Treemaps:
http://thecybershadow.net/d/mapview/view.php?id=5549a753478ac - before
http://thecybershadow.net/d/mapview/view.php?id=5549a75cb44fb - after
(These are from Windows map files; above measurements are from Linux/64)

It is now pulling in:

  • std.datetime
  • std.format
  • std.array
  • std.windows.registry
  • std.algorithm
  • std.utf
    etc.

@MartinNowak
Copy link
Member

I hate static constructors and how they bloat statically linked binaries.

@CyberShadow
Copy link
Member

There's also Object.factory, which is pulling in all declared classes' virtual methods and everything they use, but I'm not sure if it's at play here.

@MartinNowak
Copy link
Member

There's also Object.factory, which is pulling in all declared classes' virtual methods and everything they use, but I'm not sure if it's at play here.

A lot of this stuff should be done using weak-linkage, but we should also finally gear towars a shared phobos library.

@WalterBright
Copy link
Member

I'm reverting this because of the disastrous executable file size increase in code that doesn't use this new function. The functionality needs a redesign.

@CyberShadow
Copy link
Member

A simple but hackish fix would be to make the functions templated (add empty TemplateArgumentList). The imports will only be pulled in when the function is called (and the template instantiated). The downside is that things such as taking the address of the functions will fail and require an explicit instantiation (&scratchFile!()).

@jmdavis
Copy link
Member Author

jmdavis commented May 11, 2015

A simple but hackish fix would be to make the functions templated (add empty TemplateArgumentList). The imports will only be pulled in when the function is called (and the template instantiated). The downside is that things such as taking the address of the functions will fail and require an explicit instantiation (&scratchFile!()).

I wouldn't really consider that to be much of a downside, since I can't imagine that it's a common thing to do with a function like scratchFile, and scratchFile is new, so you're not going to have a bunch of code floating around that would break due to such a change. And if pretty much all it takes to get rid of the extra bloat in hello world is to templatize scratchFile, then seems like a sensible change to me, particularly since calling it is the same either way.

@CyberShadow
Copy link
Member

Right, I'm just saying that this isn't a universal fix, and doesn't handle the root of the issue.

@MartinNowak
Copy link
Member

Wait a second, this means anyone using the function would have to run semantic, import those module, and optimize the code. Right now this only has to be done one when compiling phobos. Please let's address the real cause.

@timotheecour
Copy link
Contributor

digger bisect returned this commit as cause of https://issues.dlang.org/show_bug.cgi?id=14828
(see comment in that bug though)

@MartinNowak
Copy link
Member

Binary size increase mostly seems to come from using std.file (tempDir). Maybe we can move the implementation of that into a std.internal.file module. Of course we could also split std.file into a package, but that wasn't received too well lately.

@MartinNowak
Copy link
Member

Also weird that a static function of a struct get's dragged into the executable without that function being used. Maybe we need to tweak our multilib code in the compiler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.