Directory and file exists methods could be more efficient #22219
Comments
OK... Now I'm massively confused. The reason I got started on this was #21678 which started from another issue dotnet/corefx#19716 (comment) which started because checking that file exists || directory exists is inefficient. However, looking at this code, it seems to return true for
|
Hmm... maybe that comment is just wrong here: https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L176-L177 |
Sorry :)
Shouldn't.
Looks like it. :) |
OK... walking back from the ledge now. Let's see... so. Are you saying you recommend |
This is related to the other (#19717) in that checking for existence, period, isn't efficient. You have to check twice (#19717) when you don't care if the existing path is a file or directory, which is a pretty common scenario. Underlying that scenario is this one- if you do care if it is a file or directory, the existing methods do way more work than they need to, notably on Windows.
We pre-normalize the path, but the Windows API does that anyway when you get the attributes (which I presume is also the case for Unix). The current FileSystem impl throws, then we catch and turn it into false- which is also inefficient- allocating strings and exceptions then throwing them away. :( The Unix impl may actually benefit from pre-normalization as it calls multiple APIs. We'd have to validate.
We don't want to change the existing Trimming the trailing separator is still something we'd want to keep.
If you don't want to do this, that's fine, but I'm more than happy to help you through it if you do. I've got enough experience here that I can pretty confidently review your changes. :) |
I'd be happy to tackle this one. questions... It seems like the "common" function I want is something like a (not yet existing)
that doesn't ever throw on anything but rather just returns Also, it seems like we need to follow symlinks in this one, right? Just want to confirm. |
I've always been conflicted about how to represent the concept in a way that is easy to understand.
That is what I would do- much easier to iterate.
Yeah, as that is the default historical (Windows) behavior. |
@JeremyKuhne I've taken the first steps in addressing this one. Can you have a look over the commit referenced above (kellypleahy/corefx@1a3842c) and see if this is along the lines you were thinking?
My general philosophy was to pass through all errors that were easily captured from the underlying system (for free) to the caller as enum values. The enum values that are consistent between the two OSes are reused - some can only come from one or the other OS (for example,
I haven't done anything to use this new function yet. I assume we'd use it in
Next steps - I don't know how best to test some of these cases. Some of the tests should be straightforward, but for some of them I need to create a path that has a folder in it that is not accessible to the user running the
cc @karelz re: testing above ^ |
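For readers following along, the "errors as enum values" idea above might look roughly like the sketch below. This is not the code from the referenced commit: the type name PathProbeResult, its members, and the helper names are all hypothetical. Only the numeric Win32 error codes are real.

```csharp
using System;

// Hypothetical internal result type. The names are illustrative only and are
// not taken from the commit discussed in this thread.
public enum PathProbeResult
{
    File,
    Directory,
    NotFound,
    AccessDenied,
    InvalidPath,
    OtherError
}

public static class PathProbeSketch
{
    // Win32 error codes as defined in winerror.h.
    private const int ERROR_FILE_NOT_FOUND = 2;
    private const int ERROR_PATH_NOT_FOUND = 3;
    private const int ERROR_ACCESS_DENIED = 5;
    private const int ERROR_INVALID_NAME = 123;

    // Translate an OS error code into an enum value instead of throwing,
    // so no exception or message string is allocated on the failure path.
    public static PathProbeResult FromWin32Error(int errorCode)
    {
        switch (errorCode)
        {
            case ERROR_FILE_NOT_FOUND:
            case ERROR_PATH_NOT_FOUND:
                return PathProbeResult.NotFound;
            case ERROR_ACCESS_DENIED:
                return PathProbeResult.AccessDenied;
            case ERROR_INVALID_NAME:
                return PathProbeResult.InvalidPath;
            default:
                return PathProbeResult.OtherError;
        }
    }

    // Exists-style callers collapse every failure into false.
    public static bool Exists(PathProbeResult result) =>
        result == PathProbeResult.File || result == PathProbeResult.Directory;
}
```

Keeping the enum internal lets Exists-style callers discard the detail while other helpers can still branch on it.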
also,... might be good to assign this one to me so nobody else grabs it ;) |
Your wish was granted 😆, it's now assigned to you. (BTW: Thanks for your help!) Re: Testing - I don't think we have abilities to switch users in test runs. It is very corner-case scenario and our infra was focused on mainline scenarios so far. |
It is a good start. Some more outstanding stuff (to make sure they're in your todo list):
I think that is fine, as long as we don't surface them in our public API, as they don't align nicely cross-plat. Passing them back internally like this may make it easier to modify/add new API that depends on this helper. I would also make sure that
Afaik we haven't written anything to do this yet. A simpler answer might be to remove file rights (but not ownership), then restore them? We might also be able to find OS files that we can't hit? (Don't know any off the top of my head.) From this point I'd move towards an official PR for discussion after you make more changes- you can mark it as [WIP] if we're still hashing things out. |
OK cool. How do I "mark it as WIP"? Do I just put [WIP] in the PR title? |
yep |
@kellypleahy Just a heads up- I've got to make some changes around this area for UAP work. dotnet/corefx#21149 changes the core FillAttributes- I'll drop mentions of any other changes that cross paths with this issue. |
Thanks, I got a bit distracted but will be jumping back in this weekend. I can always rebase when you are done with your changes.
|
I have been very busy lately and haven't been able to do anything additional on this. I'm hoping to get back into it in a week or so. If you deem it important to get started earlier, please feel free to reassign - I don't want to delay anyone who is waiting on this. |
No worries. If someone wants to grab it, they can speak up. It is in Future milestone, which means we won't look at it ourselves anytime soon. |
Happy to reassign if you free up... |
Costed at 3-4 days if @JeremyKuhne does it. Depends on dotnet/corefx#25539 to add Span overloads first. |
I'd really love to know the perf numbers of the File.Create, File.Exists and File.Delete operations cross platform. In our app (Exceptionless), we've noticed better performance on Windows and we are trying to track it down. Currently we are looking to replace all unneeded operations in our storage provider and just catch exceptions. https://github.com/FoundatioFx/Foundatio/blob/master/src/Foundatio/Storage/FolderFileStorage.cs |
@niemyjski Exists isn't horrible- you'd have to call it an awful lot to get a noticeable hit. Directory enumeration is the most expensive API we have- a good place to focus your investigation. I recently did a big update for Windows with enumeration perf and I'm looking to provide high-performance extension points cross plat. dotnet/corefx#25873 If you get some measurements of where your app is spending time in System.IO that you think can be improved feel free to open issues and loop me in. |
@Anipik this is one you could look at when you have finished other tasks and have free time later. @JeremyKuhne can give more details. |
@niemyjski I made significant improvements to FillAttributeInfo. The marshaling involved now uses blittable structs and it only falls back to FindFirstFile in the needed error case. Some of the marshaling changes are in 4.7.2. The key place the existing code is still bad is that it try/catches for non-existence. That is the first thing we should try to remove here. Additionally we can look at removing the normalization (e.g. Path.GetFullPath). The OS doesn't need us to do that work and we've removed almost all of the pre-checks from that code. If we do remove the GetFullPath() we would need to allow for the fact that the p/invokes assume normalization has occurred. That entails trimming trailing periods, spaces, and separators if the path isn't a device path. If it is a device path just the trailing separator has to be removed. |
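The trimming rule described above can be sketched as follows. This is only an illustration, not the corefx implementation: the names PathTrimSketch, IsDevicePath and TrimForExistenceCheck are hypothetical, and treating only the \\?\ and \\.\ prefixes as device paths is an assumption. The sketch also does not special-case roots such as C:\ or the relative segments "." and "..", which a real implementation would have to handle.

```csharp
using System;

public static class PathTrimSketch
{
    // Assumption: a device path is one that starts with \\?\ or \\.\
    public static bool IsDevicePath(string path) =>
        path.StartsWith(@"\\?\", StringComparison.Ordinal) ||
        path.StartsWith(@"\\.\", StringComparison.Ordinal);

    // Mimics the trimming that full normalization would otherwise have done.
    // Caveat: roots ("C:\") and the segments "." and ".." are not handled.
    public static string TrimForExistenceCheck(string path)
    {
        if (string.IsNullOrEmpty(path))
            return path;

        if (IsDevicePath(path))
        {
            // Device paths: drop a single trailing separator and nothing
            // else. The length check keeps the 4-character prefix intact.
            bool endsWithSeparator = path[path.Length - 1] == '\\';
            return endsWithSeparator && path.Length > 4
                ? path.Substring(0, path.Length - 1)
                : path;
        }

        // Other paths: trailing periods, spaces, and separators all go.
        return path.TrimEnd(' ', '.', '\\', '/');
    }
}
```

For example, under these assumptions `C:\temp\foo. \` becomes `C:\temp\foo`, while `\\?\C:\temp\foo.\` loses only its final separator and keeps the trailing period.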
Not as big of an issue as it used to be. Closing. |
Related to #21678
Exists methods have a try-catch for error cases and they normalize, which is not necessary (at least on Windows). As all error cases return false, we can remove most of this overhead.
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L181-L210
The only pre-emptive check we might want to keep from GetFullPath is to return false if the passed in path contains an embedded null.
Maybe have a FileSystem.FileExistsFast? This is the current Windows one: https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L181-L210
Unix is already in much better state. I assume that crappy and partial (relative) paths will work.
Tests for bad and relative paths are critical, need to validate existing coverage and expand if necessary.
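A minimal sketch of the pre-emptive check mentioned above, assuming a hypothetical helper named PathExists. It only shows where the embedded-null check would sit. The public File.Exists and Directory.Exists calls it delegates to still do their own internal work, so it does not demonstrate the proposed speed-up itself.

```csharp
using System;
using System.IO;

public static class ExistsSketch
{
    // True when the path contains an embedded null character.
    public static bool HasEmbeddedNull(string path) => path.IndexOf('\0') >= 0;

    // Hypothetical helper: returns false for null, empty, or null-containing
    // paths without ever reaching the file system, then asks whether the
    // path names either a file or a directory.
    public static bool PathExists(string path)
    {
        if (string.IsNullOrEmpty(path) || HasEmbeddedNull(path))
            return false;

        return File.Exists(path) || Directory.Exists(path);
    }
}
```

The two-call fallback at the end is the "check twice" pattern the thread describes as inefficient. An internal helper that reads the attributes once could answer both questions in a single call.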