New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API suggestion: System.IO.FileAlreadyExistsException #34221

Open
Neme12 opened this Issue Dec 22, 2018 · 28 comments

Comments

Projects
None yet
7 participants
@Neme12
Copy link

Neme12 commented Dec 22, 2018

(similar to #34220)

Scenario: I want to create and write some default data into a file unless it already exists.

A naive implementation would be:

if (!File.Exists(path))
{
    using (var fileStream = File.Open(path, FileMode.CreateNew))
    {
        // ...
    }
}

The problem with this is that I'm accessing the file system at 2 points in time. As @jaredpar has taught me, File.Exists is evil. There's no guarantee that because File.Exists returned false, the file still won't exist when calling File.Open.

To be robust, we should just call File.Open and catch the exception it throws in case the file already exists. The problem is that it throws System.IO.IOException with a message of "The file already exists". There is no specific exception type for this scenario. At first it would seem that the only thing we can do is catch the exception depending on its message string (which is a terrible idea), but luckily, there is a specific HResult for this failure, leaving us with:

Stream fileStream = null;

try
{
    fileStream = File.Open(path, FileMode.CreateNew);
}
catch (IOException e) when (e.HResult == -2147024816) // FILE_EXISTS
{
}

if (fileStream != null)
{
    using (fileStream)
    {
        // ...
    }
}

This works OK but makes the code seem unreadable and maybe even brittle because we're depending on a magic constant that comes somewhere from the Windows API. I'm not sure if this code works outside of Windows at all, but even if it does, it's definitely not obvious.

Please add a dedicated exception type for this kind of failure.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 22, 2018

For what it's worth, this is something that shows up relatively often in .NET and it can be a little frustrating. Many APIs throw a generic exception type with no way to determine the failure other than checking the exception message or if you're lucky, an HResult.

Please consider this in future APIs especially in an area like IO where this is not a contract exception - these exceptions are meant to be caught and dealt with. There's no way of knowing whether a file system action would fail in advance.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 22, 2018

Question: Why not just catch IOException. Why would you need to differentiate between this failure and other IO failures?

Answer: If the file already exists, that's fine and we don't need to do anything in that scenario. But in case of other IO failures, we might want to show an error message to the user, a "retry" dialog or have some other fallback logic.

@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 23, 2018

This makes sense to me, right now we hand the user either 0x80070050 (ERROR_FILE_EXISTS) or 0x10014 (EEXIST). Same for the linked issue, the user has to look for ERROR_DIR_NOT_EMPTY or ENOTEMPTY or whatever it may be on any future OS.

That's doubly unfortunate because .NET is designed around exceptions not error codes. It is clumsy to handle both in the same method. Also, it forces platform specific code - and even on one platform several error codes may map to a single condition for example removing a non empty directory can give ENOTEMPTY or EEXISTS just on Linux. If the caller is not diligent, they may use the message field instead of the code, which will break when localized (or, more often if the OS is localized or changes the text).

If we do anything here, and it would be nice, it should not be bit by bit. We should do an audit to come up with a complete proposal, examining all the places where we overload IOException. In a quick glance it is used for include

  • Treating a directory as a file
  • Permission denied/read only (if not UnauthorizedAccessException)
  • File exists
  • Sharing violation
  • Invalid path in some cases
  • Part of path not found
  • Too large file mapping
  • Too many pipes
  • Invalid handle
  • Pipe disconnected
  • Write failed
  • Compression errors, eg., corruption
  • Copying to itself
  • Too many open files
  • File too large
  • File stream invalid position eg overrwite in append
  • Some registry access errors
  • Many network errors
  • Console errors, semaphores, ..
  • Memory streams
  • Any other IO error that's not specifically handled

Perhaps one reason IOException is so broad may have been a concern that it is easy to break code inadvertently by changing the exception type, if the differences are fine grained. (I know this is the reason that XmlException is used for so many distinct cases.) This does not seem like a good reason, with good unit tests and a stable platform. PathTooLongException, FileNotFoundException, DirectoryNotFoundException, DriveNotFoundException are all good stable fine grained exception types, and all derive from IOException. Also, if code depends on the error code, they already depend on the "type" of error.

As an aside we should also have a first class field for the path, if any. Right now that also has to be parsed out of the message if it's needed and not already known.

@stephentoub @JeremyKuhne @jkotas

@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 23, 2018

We would only want to do this for cases that are significant, worth distinguishing, are not programming errors, are stable, well defined, and can be programmatically distinguished.

Looking through our code I see five potential ones:

  • DirectoryAlreadyExistsException mapping to EISDIR in context of rename/move directory or copy file to path of directory, or possibly when failure due to treating a directory as a file, some ERROR_ALREADY_EXISTS cases
  • FileAlreadyExistsException mapping to ERROR_FILE_EXISTS/EEXIST in context of FileMode.CreateNew, or extracting from zip to existing file, possibly ENOTDIR when treating a directory as a file, some ERROR_ALREADY_EXISTS
  • DirectoryNotEmptyException mapping to ERROR_DIR_NOT_EMPTY/ENOTEMPTY/EEXISTS in context of removing directory
  • UnauthorizedIOException mapping to ERROR_ACCESS_DENIED/EACCES/EPERM/?EBUSY if currently throwing IOException (not UnauthorizedAccessException)
  • SharingViolationException mapping to ERROR_SHARING_VIOLATION/EWOULDBLOCK except where eg it's due to attempting to copy a file to a directory. (Windowsism)

That leaves alone the long tail of random IO errors, and non file system errors (networking, semaphores, ..).

This is just a sketch of a possible choice, I did not analyze carefully.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Dec 23, 2018

#31917 is related to this. The current file I/O APIs are broken for "file already exist" and other similar situations that the programs often want to recover from gracefully.

We can fix it by either:
(1) Introducing more fine grained exceptions, like what is proposed here. This option encourages using exception handling for control flow that is not desirable.
And/or (2) Introducing new set of non-throwing APIs, like what is proposed in #31917. The advantage of this option is that it is more robust better performing design that matches library capabilities of other programming languages (C, Go, ....).

My vote would be to spend energy on (2). It is the more forward looking option.

@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 23, 2018

Are they complementary?

This is an adjustment to help all the existing IO code written using exceptions be a bit more robust, with the addition of little or no code in CoreFX beyond the types - just changing some places we throw IOException to throw something more specific. It is just tuning an existing exception hierarchy.

The other proposal is a new approach to IO, which will need substantial design, establishing a new set of abstract IO codes, etc, ... It will take time, and does not help existing code that is not worth rewriting to the try model.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Dec 23, 2018

I do not think they are complementary. The value of (1) is going to be lost once we have (2). Once we have (2), the fine-grained exception hierarchy will be obsolete - the guidance is going to use the new APIs and to not use exception handling for control flow.

I agree that (2) is more work than (1).

We have been in similar situations before. For example, we used to have number of suggestions for new API variants that take arrays or unmanaged pointers. We could have spent energy on adding those and it would certainly be an incremental improvement on top of the existing designs. We have not done that. Instead, we have spent our energy on adding Span that was a superior design. API version of the Innovator's Dilemma.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 23, 2018

@jkotas I don't think this is quite the same as unmanaged pointers vs span. With span, you'll be able to use the same code as you would have if the API took an unmanaged pointer. It was worthwhile to wait for the new API because it was a generalization of the proposed one.

New non-throwing IO APIs on the other hand introduce a completely new paradigm. It's not a generalization. It's an alternative way of doing the same thing and it's somewhat a matter of style. I also believe that throwing and non-throwing alternatives should have the same feature set. Having the throwing versions lack certain exceptions for some very specific error codes, while having exceptions for the majority of other ones seems like it would be surprising to any user who isn't familiar with the fact that the non-throwing APIs are newer or that we're trying to encourage them to use the non-throwing API (which won't be obvious at all unless you actually make the exception hierarchy [Obsolete]).

Furthermore, it's not as simple as saying "do not use exceptions for control flow". In this case, you might consider "file already exists" to be part of control flow since it's an expected situation, but other kinds of IO failures would not be as expected and might be handled in a try-catch higher up the call stack. Exceptions would be more convenient for that use case.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 23, 2018

that matches library capabilities of other programming languages (C, Go, ....).

I find it hard to believe that C# programmers would expect the API to follow the patterns used in C and Go rather than C++ and Java.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Dec 23, 2018

Having both non-throwing and throwing versions of the same APIs is common for parsing or collections in C#. It is not a completely new paradigm. The two variants do not always have the same feature set and the throwing version is missing completely in some cases.

C# programmers would expect the API to follow the patterns

I have said capabilities, not patterns. The capability here is to be able to do non-exceptional file I/O error handling in robust performant way. It means without using exceptions in C# - throwing an exception in C# is a lot more expensive operation than opening a file.

Java

I believe exceptions for control flow are generally more acceptable in Java than in C#. For example, Java has int.Parse only. C# has both int.Parse and int.TryParse.

C++

Exceptions for control flow are generally less acceptable in C++ than in C#. I believe it is fairly common to see error codes returned in C++, e.g. https://en.cppreference.com/w/cpp/filesystem/create_directory. The standard C++ library does not have rich I/O exception hiearchy to encourage using exceptions for control flow.

other kinds of IO failures would not be as expected and might be handled in a try-catch higher up the call stack

This issue can be addressed by the API design, e.g. the API can have enum that says when to throw vs. when to return an error code.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 23, 2018

Having both non-throwing and throwing versions of the same APIs is common for parsing or collections in C#. It is not a completely new paradigm.

Sure. I meant it's a new paradigm to use with IO APIs. You will have to migrate your code over to this paradigm. It's very different from your example of pointers and Span.

I have said capabilities, not patterns. The capability here is to be able to do non-exceptional file I/O error handling in robust performant way.

I am not arguing against that, I like that proposal as well. But I think there should be consistency between these 2 paradigms. It's really a matter of style as to which one you prefer. I don't think we should be dictating that people use the new non-throwing if they don't want to deal with HResults.

The two variants do not always have the same feature set

But every time you have a throwing and a non-throwing alternative, the throwing alternative is never less complete - it's the Try-version that can lose some information. Isn't that the case?

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 23, 2018

I find it hard to believe that C# programmers would expect the API to follow the patterns used in C and Go rather than C++ and Java.

My point here was: users would also expect do to be able to the same thing with the throwing APIs as opposed to having to use the non-throwing one. Not that I don't want the non-throwing APIs to exist. But I think the throwing ones will always be seen as more idiomatic in .NET. It's definitely not something that people will consider to be obsolete.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 23, 2018

I have said capabilities, not patterns

I guess what I meant was: "the preferred API to follow the patterns used in C and Go..."

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Dec 24, 2018

It's really a matter of style as to which one you prefer.

We tend to form a strong opinion about the preferred style during the API design and add the new APIs around it. We do not generally add duplicate APIs to support other styles. Having one preferred style has a lot of value for platform usability. Also, the preferred styles tend to change over time as the new requirements or new fundamental features show up.

I think these are the questions to answer from the API design point of view:

  • What should the one preferred style for non-exceptional file I/O handling be? I am pretty sure that it should be the one that does not use exceptions for control flow.
  • Can we have both fine-grained exception hierarchy and error codes with full parity between them? Yes, but it comes with negative points for duplicate APIs.

I don't think we should be dictating that people use the new non-throwing if they don't want to deal with HResults.

Error codes do not imply HResults. .NET has number of domain-specific error code enums actually, e.g. SerialError, SocketError or WebSocketError. Maybe we should consider adding FileSystemError enum and matching property to IOException. The error code enum is much more lightweight than
fine grained exception hierarchy. And it would work nicely for both exception-throwing and error-code returning APIs and minimize duplication at the same time.

You will have to migrate your code over to this paradigm. It's very different from your example of pointers and Span.

I would expect that migrating the code to use error codes for file I/O would be much easier on average than migrating the code to use Spans. Opening a file and similar operations are usually scoped to a single method that one can change easily. On the other hand, Span is about passing data around. It is not unusual for Spans migration to involve non-trivial refactoring (changing many method arguments from arrays to Spans, etc.).

@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 24, 2018

consider adding FileSystemError enum and matching property to IOException

That sounds fine to me too. I do think something should be done, as we have made it worse when we made it cross platform.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 24, 2018

Maybe we should consider adding FileSystemError enum and matching property to IOException.

I think that's fine too.

@joshudson

This comment has been minimized.

Copy link
Contributor

joshudson commented Dec 26, 2018

Hey, I've got a file system error table lying around. The intended usage is

catch (IOException ex) when (ex.HResult == Emet.FileSystem.IOErrors.AlreadyExists)
@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 26, 2018

catch (IOException ex) when (ex.HResult == Emet.FileSystem.IOErrors.AlreadyExists)

I don't think we want to change what the value of HResult is; even if we pick the existing Win32 values (which is rather limiting and do not necessarily match to sufficient distinct causes) that would break anyone expecting the Unix code today. Plus, it's a poor name for a cross platform code. On top of that, it's weakly typed as int rather than a nice enum.

This would require an API proposal:

  1. to add a property to IOException named perhaps Error or IOError (I think FileSystemError is a poor choice as, at least on Windows, it did not necessarily originate with a file system operation.). It would have the type of the enum in 2.

  2. to add a new enumeration named eg IOError and populate it with several useful entries perhaps based on my analysis above. We do not need to think of them all now, it's not a breaking change to add more later, but it would need a (brief) review so better to do it now.

@Neme12 any interest in doing this? The process is here and you would append to your top post because that makes it easier on the review committee.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 26, 2018

@Neme12 any interest in doing this? The process is here and you would append to your top post because that makes it easier on the review committee.

Sorry, I was interested in making a proposal for the exception type, but I don't think I'm the right person to determine the shape of a new enum for all IO errors. My knowledge there is very limited. I would feel much more comfortable if someone else took over this.

@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 26, 2018

No problem. I"ll let this sit here for a volunteer.

@Neme12

This comment has been minimized.

Copy link

Neme12 commented Dec 27, 2018

Maybe the property should be called IOErrorCode to match SocketException.SocketErrorCode, WebSocketException.WebSocketErrorCode and to a lesser extent HttpResponseMessage.StatusCode.

@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 27, 2018

It seems it should include a None or Success value -- although that would never be used in an exception, in #31917 it would be returned even on success.

@MarcoRossignoli

This comment has been minimized.

Copy link
Collaborator

MarcoRossignoli commented Dec 28, 2018

No problem. I"ll let this sit here for a volunteer.

@danmosemsft up-for-grabs?

@danmosemsft

This comment has been minimized.

Copy link
Member

danmosemsft commented Dec 28, 2018

@MarcoRossignoli yep, but this is a bunch of work up front to define and reach consensus on the cases/values for API review. I made a good start above.

@peteraritchie

This comment has been minimized.

Copy link

peteraritchie commented Dec 29, 2018

As @jaredpar has taught me, File.Exists is evil. There's no guarantee that because File.Exists returned false, the file still won't exist when calling File.Open

It's not evil, it's a run-of-the-mill race condition :)

File,Open is not very intention-revealing. This doesn't leave a very stable foundation for making generic robust code. File.Open is here to stay, it's foundational, a wrapper to implementation details. It's probably best to treat it as foundational and not use it directly when possible. E.g. Because File.Open is used by many things with different goals it's near impossible to change it (non-breaking) to suit specific needs or scenarios.

So, the real solution is to have new intention-revealing interfaces (methods) that can encapsulate this type of logic. A quick-and-easy intention-revealing naming technique is the desired outcome. Since the intention isn't to simply open the file, the intention is to perform an operation on the file, the required outcome is a concrete reader or writer. e.g. the outcome in your example is to have a writable stream or a stream writer. So, we could create a method with a name describing that outcome. Maybe something like "CreateStreamWriter"? But, StreamWriter is a bit vague too, and your example uses Create, so maybe "CreateEmptyFileStreamWriter"? But, wait, don't we have a method that does this already? File.CreateText does exactly what your code describes

Creates or opens a file for writing UTF-8 encoded text. If the file already exists, its contents are overwritten.

So, I think there's a better way to provide value than to create a new FileAlreadyExistsException exception. While I might not think the naming is perfect, there's already something there that provides the end goal without adding a new means goal that gets you to the same end goal without spending much time ensuring a non-breaking change and reaching consensus.

@joshudson

This comment has been minimized.

Copy link
Contributor

joshudson commented Dec 30, 2018

I don't think we want to change what the value of HResult is

Correct. They way my thing lying around works is it's a platform-specific binary. If you build for win32 you get one version of Emet.Filesystems.dll. If you build for Linux you get a different dll. If you build platform-independent the loader selects the correct one at runtime.

@joshudson

This comment has been minimized.

Copy link
Contributor

joshudson commented Dec 30, 2018

@peteraritchie

File.Exists is evil because it returns false on transient IO errors at the hardware level (not a bad disk--SAN was overloaded). I've lost data due to this.

@peteraritchie

This comment has been minimized.

Copy link

peteraritchie commented Dec 31, 2018

@joshudson Another good reason to provide functionality to get the outcome you need rather than try to change something that can't realistically be "fixed" because it already does what it's supposed to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment