Fix issue 7033: File.rawWrite is slow on Windows #7590
Thanks for your pull request and interest in making D better, @canopyofstars! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla references
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub run digger -- build "master + phobos#7590"
The proposed changes introduce a slight breakage for code that operates on both the underlying FILE* and the File instance, because openMode may become stale when the file mode is modified via the FILE*.
Such code is obviously less than ideal, but it will be broken by this change.
std/stdio.d
Outdated
    p = 1 << 4
}

BitFlags!OpenMode openModeFlags;
- Make it version(Windows) (it's only used on Windows after all)
- BitFlags is redundant given your enum definition
- openModeFlags should be placed after isPopened for better packing.
Not that it matters anymore, but isn't the whole point of BitFlags to provide a safer interface than the raw enum value? If that's the case, I don't think it's superfluous.
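For readers following the BitFlags question, here is a minimal sketch of what the wrapper looks like in use. The enum members below are hypothetical (the real OpenMode from the diff is only partly visible above), and this is an illustration of std.typecons.BitFlags, not the code from the PR. As far as I understand it, the wrapper's operators only accept members of the enum, which is the "safer interface" being referred to.

```d
import std.typecons : BitFlags;

// Hypothetical flags, loosely modelled on the OpenMode enum in the diff.
// BitFlags expects every member to be a distinct power of two.
enum OpenMode
{
    read   = 1 << 0,
    write  = 1 << 1,
    append = 1 << 2,
    binary = 1 << 3,
}

unittest
{
    BitFlags!OpenMode flags;      // starts out empty
    flags |= OpenMode.write;      // set individual flags
    flags |= OpenMode.binary;

    // "&" yields another BitFlags; the explicit cast asks whether any bit is set.
    assert(cast(bool)(flags & OpenMode.binary));
    assert(!cast(bool)(flags & OpenMode.read));
}
```

It can be tried with something like rdmd -unittest --main on a file containing the block.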
std/stdio.d
Outdated
Orientation orientation;

enum OpenMode
Short DDOC comments are appreciated
std/stdio.d
Outdated
flush(); // before changing translation mode
immutable fd = ._fileno(_p.handle);
immutable mode = ._setmode(fd, _O_BINARY);
scope(exit) ._setmode(fd, mode);
This now happens before the actual write because the if introduces a scope relevant for scope (exit).
|
Is this actually faster? Are there any benchmarks? I think the proper way to do bulk writes is to use |
I think you are right -- in the general case. However, when you open a file for writing binary mode (as the bug report indicates), the fact that However, my view on the approach has changed, now that I've looked at this more closely. I originally thought you could cache the last mode (binary or not), but there are other modes besides BINARY which the FILE could be. Plus there's the valid concern that @MoonlightSentinel brings up that someone can alter the mode outside of the Here is my recommendation for implementation now (sorry for not giving more time to it earlier):
Edit, let's try this without an extra member. All that is needed is to try Here is the proposal, in pseudocode:
rawWrite(T[]) {
    auto oldMode = _setmode(fd, _O_BINARY);
    if(oldMode != _O_BINARY) {
        // need to flush the data that was written with the original mode
        _setmode(fd, oldMode);
        flush();
        _setmode(fd, _O_BINARY);
    }
    ... // do writing
    if(oldMode != _O_BINARY) {
        // flush the raw data before restoring the original mode
        flush();
        _setmode(fd, oldMode);
    }
}
The justification for calling All this should be versioned with Windows. This should solve the bug, AND satisfy @MoonlightSentinel's concerns. |
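To make the control flow of the pseudocode easier to check, here is a sketch that runs it against a mock stream instead of the real C runtime. MockStream, its fields and rawWriteSketch are hypothetical names invented for this illustration; the two mode constants only mirror the MSVC values as labels. The point is to show that a stream already in binary mode pays for no flushes, while a text-mode stream pays for two.

```d
// Labels for the two translation modes (values mirror the MSVC constants;
// nothing here talks to the real runtime).
enum O_TEXT   = 0x4000;
enum O_BINARY = 0x8000;

// Hypothetical stand-in for a C runtime stream.
struct MockStream
{
    int mode = O_TEXT;    // current translation mode
    int flushes;          // how many times flush() ran
    size_t bytesWritten;  // bytes handed to the "file"

    // Like _setmode: switch the mode and return the previous one.
    int setmode(int newMode)
    {
        immutable old = mode;
        mode = newMode;
        return old;
    }

    void flush() { ++flushes; }
}

void rawWriteSketch(ref MockStream s, const(ubyte)[] data)
{
    immutable oldMode = s.setmode(O_BINARY);
    if (oldMode != O_BINARY)
    {
        // Data buffered under the old mode must be flushed in that mode.
        s.setmode(oldMode);
        s.flush();
        s.setmode(O_BINARY);
    }

    s.bytesWritten += data.length;   // stands in for the actual write

    if (oldMode != O_BINARY)
    {
        // Push the raw bytes out before the original mode comes back.
        s.flush();
        s.setmode(oldMode);
    }
}

unittest
{
    MockStream text;                 // starts in text mode
    rawWriteSketch(text, [1, 2, 3]);
    assert(text.flushes == 2 && text.mode == O_TEXT);

    MockStream bin;
    bin.mode = O_BINARY;             // as if opened with "wb"
    rawWriteSketch(bin, [1, 2, 3]);
    assert(bin.flushes == 0 && bin.mode == O_BINARY);
}
```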
|
That sounds like it's in the right direction. Though, if I think it would be good to introduce a |
|
It's likely O(1), but still an opaque call at least, and at worst may have to lock something, which might make it more expensive? I'd expect that the likelihood that the binary mode changes between calls to For sure, we can do it without storing an extra boolean in the heap-allocated struct, but the cost of storing that extra boolean is pretty small. |
See my suggestions in previous message.
I was thinking more about the complexity and the additional state (e.g. to a casual observer it wouldn't be clear what the interaction is between |
|
The single point of the boolean is to do the minimum And thinking about it more, if you use |
|
Implemented the proposal. By the way, this is DMC's
int setmode(int fd, int mode)
{
// ...
for (fp = &_iob[0]; fp < &_iob[_NFILE]; fp++)
{
__fp_lock(fp);
// ...
}
// ...
} |
Yikes! Note that DMC has a limit of 64 open file descriptors anyway. But yeah, that's painful. However, it's clear from that code that the binary mode is not set on the file descriptor but rather the FILE * or |
|
Trailing whitespace on lines 1086, 1098, 1099, 1101 |
Unless I misunderstood, it seems that there is no _flag field for MS' implementation.
else version (CRuntime_Microsoft)
{
// ...
///
struct _iobuf
{
void* undefined;
}
// ...
} |
std/stdio.d
Outdated
immutable mode = ._setmode(fd, _O_BINARY);
scope(exit) ._setmode(fd, mode);
immutable oldMode = ._setmode(fd, _O_BINARY);
trailing whitespace line 1080 here (according to autotester)
Should be good, assuming my editor did its job since I'm incapable of discerning needless whitespace.
Yep, you did not misunderstand. You would have to do something separate for digital mars stdio. It's not critical to do now I don't think, let's just get this in, and we can worry about squeezing more performance later. I still think the speed is going to be possibly comparable, and I also don't imagine many people are using DMC runtime over MSVC runtime. It might be a good gut-check to run against the DMC runtime with the original bug report (but set the open mode to non-binary) and see how it does before and after. |
Wouldn't it be something to this effect?
version (Windows)
{
immutable fd = ._fileno(_p.handle);
version (MICROSOFT_STDIO)
{
// Do changes made in this commit.
}
version (DIGITAL_MARS_STDIO)
{
immutable info = __fhnd_info[fd];
if (info & FHND_TEXT)
{
flush();
atomicOp!"&="(__fhnd_info[fd], ~FHND_TEXT);
}
scope (exit)
{
if (info & FHND_TEXT)
{
flush();
__fhnd_info[fd] = info;
}
}
}
} |
No, you have to duplicate the code in DMC runtime and just read the flags (though you don't have to search, you know the |
Needs a squash-merge
Understood. But as you mentioned, the prevalence of DMC's runtime makes me question whether it's worth it. Regardless, thank you for posting this issue on the NG. I didn't do all that much but bungle your proposal's implementation, but it was still a learning experience. |
You did great! Thanks for making this fix. |
|
@CyberShadow any more comments on this? |
|
Well, in lieu of tests, some benchmark numbers would be nice, just so we're sure we're solving the right performance problem. (I see in principle why this would obviously be faster, but also, programs like |
Sure it does: https://github.com/coreutils/coreutils/blob/5b8161ee4dc95308c9ef89a5268aa87a7f32d4bf/src/cat.c#L513-L517
Buffering is faster because you spend fewer system calls. Yes, the OS does buffering, but the cost of getting at the buffer is much higher than writing to memory. In the case of
I hate to set up CI tests that depend on performance, because we can't control the system being used to test.
I agree. I can see about running this PR on a Windows box. @canopyofstars if you have some benchmarks you can run, it would be useful to post here. |
No,
Pretty sure that's not how it works. |
|
The use case is completely different. Yes, There is no way to for My usual way of showing that buffering is worth it is to use
$ time dd if=/dev/zero of=zeroes bs=4 count=1000000
1000000+0 records in
1000000+0 records out
4000000 bytes (4.0 MB, 3.8 MiB) copied, 1.91583 s, 2.1 MB/s
real 0m1.918s
user 0m0.681s
sys 0m1.237s
$ time dd if=/dev/zero of=zeroes bs=4000 count=1000
1000+0 records in
1000+0 records out
4000000 bytes (4.0 MB, 3.8 MiB) copied, 0.0069534 s, 575 MB/s
real 0m0.011s
user 0m0.004s
sys 0m0.007s
Both write 4MB, the first does it 4 bytes at a time, the second does it 4000 bytes at a time. |
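The same effect can be sketched without touching the OS by counting how often an underlying sink is called. CountingSink and Buffered are hypothetical types written for this illustration; each sink call stands in for one system call, and the sizes are chosen to match the dd runs (one million 4-byte writes, with and without a 4000-byte buffer in front).

```d
import std.algorithm.comparison : min;

// Each call to write() stands in for one system call.
struct CountingSink
{
    size_t calls;
    size_t bytes;

    void write(const(ubyte)[] chunk)
    {
        ++calls;
        bytes += chunk.length;
    }
}

// A fixed-size buffer that forwards to the sink only when it is full.
struct Buffered
{
    CountingSink* sink;
    ubyte[] buf;
    size_t used;

    this(CountingSink* sink, size_t capacity)
    {
        assert(capacity > 0);
        this.sink = sink;
        buf = new ubyte[capacity];
    }

    void put(const(ubyte)[] data)
    {
        while (data.length)
        {
            if (used == buf.length)
                flush();
            immutable n = min(data.length, buf.length - used);
            buf[used .. used + n] = data[0 .. n];
            used += n;
            data = data[n .. $];
        }
    }

    void flush()
    {
        if (used)
        {
            sink.write(buf[0 .. used]);
            used = 0;
        }
    }
}

unittest
{
    enum total = 1_000_000;
    ubyte[4] word;

    CountingSink direct;                 // 4 bytes per "system call"
    foreach (i; 0 .. total)
        direct.write(word[]);

    CountingSink behindBuffer;           // 4000 bytes per "system call"
    auto buffered = Buffered(&behindBuffer, 4000);
    foreach (i; 0 .. total)
        buffered.put(word[]);
    buffered.flush();

    assert(direct.calls == 1_000_000);
    assert(behindBuffer.calls == 1_000);
    assert(direct.bytes == behindBuffer.bytes);
}
```

This only models the call count; it says nothing about the real per-call cost, which is what the dd timings show.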
Yes, that's exactly what I meant. What I meant to say in addition to that is that, when
The kernel does the buffering in the case of process pipes. I'm not sure if there's a guarantee there, though. I see there's a way to create a datagram-like pipe, but I didn't get it to work.
Good illustration, thanks. |
|
I've benchmarked two programs, where one uses the current rawWrite ( ldc2 was used to compile both executables.
Please note that To benchmark them, I created a brief PowerShell script. This script will print out the mean execution time (in milliseconds) over a number of repetitions (e.g. 1000).
The results:
As you can see, |
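For anyone who wants to reproduce numbers of this kind, a timing harness along the following lines might do. This is not the benchmark or the PowerShell script used above; timeRawWrites, the file name and the call counts are all made up for the sketch. It times many small rawWrite calls on a file opened in a given mode, so the text-mode ("w") and binary-mode ("wb") cases can be compared on Windows.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : getSize, remove, tempDir;
import std.path : buildPath;
import std.stdio : File;

// Hypothetical helper: time `calls` rawWrite calls of `chunk` bytes each.
Duration timeRawWrites(string path, string mode, size_t calls, size_t chunk)
{
    auto data = new ubyte[chunk];
    auto sw = StopWatch(AutoStart.yes);
    {
        auto f = File(path, mode);
        foreach (i; 0 .. calls)
            f.rawWrite(data);
    }   // the File is closed (and flushed) here, inside the timed region
    sw.stop();
    return sw.peek();
}

unittest
{
    immutable path = buildPath(tempDir(), "rawwrite_sketch.bin");
    scope (exit) remove(path);

    // rawWrite bypasses text translation, so both modes produce the same size.
    immutable textTime = timeRawWrites(path, "w", 1000, 4);
    assert(getSize(path) == 4000);

    immutable binaryTime = timeRawWrites(path, "wb", 1000, 4);
    assert(getSize(path) == 4000);

    assert(textTime >= Duration.zero && binaryTime >= Duration.zero);
}
```

Printing the two durations (for example with writefln and total!"usecs") before and after the patch would give the comparison; on non-Windows systems the two modes should behave the same.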
|
Thanks @canopyofstars. If possible, do you think you can run dmd instead of ldc, and make sure you use -m32 to ensure we are using the DMC runtime? I still think the improvement will be just as stark. |
Fantastic. Thank you very much. It's good to have numbers like this on record to justify the change. |
I reduced the number of iterations in the interest of time. After a few tests with
In this instance,
Indeed. I should've done so from the beginning with my first commit. |
FWIW (and for future discussions) we no longer cater to DMD's outdated optimizer or DMC's runtime here. It's a waste of time. |
understood. I just wanted to make sure it's not worse. I didn't see how it could be, but we do have another option in terms of handling DMC's runtime without the slow loop that currently is |
Fix issue 7033: File.rawWrite is slow on Windows
merged-on-behalf-of: Steven Schveighoffer <schveiguy@users.noreply.github.com>
To fix this issue, the newly added openModeFlags (and the enum OpenMode from which it's derived) is used to store the current open mode of a file. Thus, rawWrite can check if the file was already opened in binary mode to obviate temporarily changing the mode and thus executing two flushes.
A new method, parseOpenModeString, is used to parse an open mode string, and it stores the desired open mode in openModeFlags. It's called in every method that changes a file's open mode. (See the documentation of this function for more information.)
I'd like your input on this, @schveiguy, since you asked to be notified about this.