Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make RandomAccess.Read*|Write* methods work with non-seekable files #96711

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

adamsitnik
Copy link
Member

@adamsitnik adamsitnik commented Jan 9, 2024

fixes #58381

@ghost
Copy link

ghost commented Jan 9, 2024

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

fixes #58381

Author: adamsitnik
Assignees: adamsitnik
Labels:

area-System.IO

Milestone: -

@adamsitnik adamsitnik changed the title Relax random access Make RandomAccess.Read*|Write* methods work with non-seekable files Jan 9, 2024
@adamsitnik adamsitnik marked this pull request as ready for review January 10, 2024 12:26
@adamsitnik adamsitnik added this to the 9.0.0 milestone Jan 10, 2024
ex = await Assert.ThrowsAsync<TaskCanceledException>(() => RandomAccess.ReadAsync(writeHandle, GenerateVectors(1, 1), 0, token).AsTask());
Assert.Equal(token, ex.CancellationToken);
ex = await Assert.ThrowsAsync<TaskCanceledException>(() => RandomAccess.WriteAsync(writeHandle, GenerateReadOnlyVectors(1, 1), 0, token).AsTask());
Assert.Equal(token, ex.CancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you avoid the duplicated calls by doing sth like this?

Task t = RandomAccess.WriteAsync(writeHandle, GenerateReadOnlyVectors(1, 1), 0, token).AsTask();
Assert.True(t.IsCanceled);
ex = await Assert.ThrowsAsync<TaskCanceledException>(() => t);
Assert.Equal(token, ex.CancellationToken);

int bytesRead = 0;
do
{
bytesRead += RandomAccess.Read(readHandle, buffer.Slice(bytesRead), fileOffset: 0); // fileOffset NOT set to bytesRead on purpose
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be good to throw or break out of the loop if RandomAccess.Read returns 0 to avoid a potential infinite loop?


ReadExactly(readHandle, buffer, content.Length); // what is required for the above write to succeed

Assert.Equal(content, buffer.AsSpan(0, content.Length).ToArray());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we always used AssertExtensions.SequenceEqual for array comparisons.

byte[] buffer = new byte[content.Length * 2];
int readFromOffset456 = RandomAccess.Read(readHandle, buffer, fileOffset: 456);

Assert.InRange(readFromOffset456, 1, content.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the InRange used because it is not right to assume that Read will fill the whole buffer?

{
byte[] content = RandomNumberGenerator.GetBytes(BufferSize);
Task writeToOffset123 = RandomAccess.WriteAsync(writeHandle, content, fileOffset: 123).AsTask();
byte[] buffer = new byte[content.Length * 2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is read buffer's size double the size of content?

readOnlyVectors.SelectMany(vector => vector.ToArray()).Take(byteCount),
writableVectors.SelectMany(vector => vector.ToArray()).Take(byteCount));
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are missing tests for Get/SetLength.

{
// We need to fallback to the non-offset version for certain file types
// e.g: character devices (such as /dev/tty), pipes, and sockets.
Interop.ErrorInfo errorInfo = Interop.Sys.GetLastErrorInfo();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easy for a future refactor to introduce something that resets last error before this is called? Maybe last error should be passed in so it's visible.

@tmds
Copy link
Member

tmds commented Jan 23, 2024

I assume this PR is about bringing the capability to pass multiple buffers to a read/write with a non-seekable handle?

I think it would make sense to add new methods that don't take an offset argument.

Rather than try to handle it for the user, I think we can expect the user to not use non -seekable handles with methods that take an offset. We don't need to try to detect and handle it.

@adamsitnik what do you think?

@adamsitnik
Copy link
Member Author

I assume this PR is about bringing the capability to pass multiple buffers to a read/write with a non-seekable handle?

The multiple buffers is a secondary goal, the main one is to allow users to read/write to/from non-seekable files by using only SafeFileHandle.

I think it would make sense to add new methods that don't take an offset argument.

Yes and no. The main blocker to me is... the name of the type: RandomAccess, as what you describe is more of a SequentialAccess. When we were introduced this API we have discussed other names, the main problem was that all best names were already taken: File, FileAccess. Adding them to SafeFileHandle was not received well.

So we have 3 options right now:

  1. Relax the requirements for RandomAccess (this PR) and stop throwing for non-seekable files.
  2. Introduce non-offset based to RandomAccess type
  3. Introduce a new type, SequentialAccess and move the new methods there.

@stephentoub @jozkee what are your preferences here?

@stephentoub
Copy link
Member

stephentoub commented Jan 23, 2024

@stephentoub @jozkee what are your preferences here?

I still prefer (1).

We already found it valuable to be able to do that internally, which is why the single buffer methods already support this for internal use.

@tmds
Copy link
Member

tmds commented Jan 23, 2024

We already found it valuable to be able to do that internally, which is why the single buffer methods already support this for internal use.

I think the FileStream case is a special once since it is meant to work with both seekable and non-seekable handles.

The typical user of this API will have a handle that he means to use in a seekable way or not. I prefer (2) because if he does mean to use it in a seekable way, that is clear from the API he is using.

What I dislike about (1) is that the API takes an offset parameter and if the handle turns out to be non-seekable, it is just ignored.

@jozkee
Copy link
Member

jozkee commented Jan 23, 2024

What I dislike about (1) is that the API takes an offset parameter and if the handle turns out to be non-seekable, it is just ignored.

Could a middle-ground be to throw if offset is != 0 on non-seekable handles?

await Task.WhenAll(server.WaitForConnectionAsync(), client.ConnectAsync());

bool isAsync = (PipeOptions & PipeOptions.Asynchronous) != 0;
return (GetFileHandle(server, isAsync), GetFileHandle(client, isAsync));
Copy link
Member

@tmds tmds Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Unix, the handles are set to non-blocking when they are used in an async operation.
Because these handles weren't used yet, the handles are still blocking when PipeOptions.Asynchronous is set in RandomAccess_NonSeekable_AsyncHandles.

If the handles were non-blocking, some of the reads/writes should fail with EAGAIN.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When making the handles non-blocking by adding:

if (isAsync)
{
    await client.WriteAsync(new byte[1]);
    await server.ReadAsync(new byte[1]);
}

6 of the RandomAccess_NonSeekable_AsyncHandles tests fail with:

  Error Message:
   System.IO.IOException : The process cannot access the file because it is being used by another process.

This is the EAGAIN that is surfacing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the EAGAIN that is surfacing.

Great catch! In such case, we should most likely do a bigger refactor and use the epoll code path here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had some discussions in the past of making the epoll path more generic and usable elsewhere.
So far we've avoided the effort and used it as-is (like in #44647).

@stephentoub
Copy link
Member

What I dislike about (1) is that the API takes an offset parameter and if the handle turns out to be non-seekable, it is just ignored.

Could a middle-ground be to throw if offset is != 0 on non-seekable handles?

That's reasonable.

@adamsitnik
Copy link
Member Author

Could a middle-ground be to throw if offset is != 0 on non-seekable handles?

The problem is that sometimes you don't expect that given file is non-seekable (you get a path and you don't know what it is exactly) and following code could fail:

int index = 0;
int count = requestedSize;
byte[] bytes = new byte[count];
while (count > 0)
{
    int n = RandomAccess.Read(sfh, bytes.AsSpan(index, count), index);
    
    index += n;
    count -= n;
}

@tmds
Copy link
Member

tmds commented Jan 24, 2024

the main one is to allow users to read/write to/from non-seekable files by using only SafeFileHandle.

Perhaps the situation is different on Windows, but in contrast to using RandomAccess instead of FileStream, it's less obvious for me what the expected gains are for example when using it with a socket instead of using the Socket class directly.

@stephentoub
Copy link
Member

the main one is to allow users to read/write to/from non-seekable files by using only SafeFileHandle.

Perhaps the situation is different on Windows, but in contrast to using RandomAccess instead of FileStream, it's less obvious for me what the expected gains are for example when using it with a socket instead of using the Socket class directly.

From my perspective:

  • It's not obvious you can use Socket with things other than sockets.
  • You shouldn't need to construct a class just to perform an operation on the SafeHandle; these static APIs will let you do that.
  • If you don't care about offset, then it shouldn't matter whether it's ignored or not.
  • You shouldn't need to know the seekable-ness of the file descriptor to know whether you can use these APIs.
  • If we were to add new methods without offsets, I would exect them to behave the same as if a default 0 offset were supplied, which means the existing ones shouldn't fail for such cases, which means we also wouldn't need the new overloads (for usability we could make the existing parameters optional).

@tmds
Copy link
Member

tmds commented Jun 22, 2024

It's not obvious you can use Socket with things other than sockets.

I didn't mean to suggest that Socket should be used for everything. If you are dealing with a pipe, you can use an AnonymousPipeClientStream.

You shouldn't need to construct a class just to perform an operation on the SafeHandle; these static APIs will let you do that.

Yes, these APIs avoid a user having to create a class. My comment was meant to challenge if this is much of a problem.

If you don't care about offset, then it shouldn't matter whether it's ignored or not.

Non-seekable handles are limited compared to seekable handles. I think it would be good if the API shows that it will work with non-seekable handles. We're blurring this distinction by making users pass an offset which they should expect to be ignored.

@stephentoub
Copy link
Member

@adamsitnik, what are you planning to do with this PR? Do we want it in .NET 9?

@adamsitnik
Copy link
Member Author

what are you planning to do with this PR? Do we want it in .NET 9?

I wanted to have it merged for 9, but as @tmds has pointed in #96711 (comment) the current implementation is not going to work for async handles on Linux. Making it work is possible, but would require a lot of work (mostly refactoring the epoll-related code and re-using it here). On the other hand, we could just document this limitation as currently not supported and call it a day.

@stephentoub what is your opinion?

@stephentoub
Copy link
Member

what are you planning to do with this PR? Do we want it in .NET 9?

I wanted to have it merged for 9, but as @tmds has pointed in #96711 (comment) the current implementation is not going to work for async handles on Linux. Making it work is possible, but would require a lot of work (mostly refactoring the epoll-related code and re-using it here). On the other hand, we could just document this limitation as currently not supported and call it a day.

@stephentoub what is your opinion?

Today the methods don't work with any non-seekable file descriptors. With this change, as far as I can tell we're making the existing methods work with more file descriptors and without any take backs. Even if there are still more beyond that we could support in the future, this seems to me like moving in the right direction.

Up to you what you want to do with it.

@adamsitnik adamsitnik modified the milestones: 9.0.0, 10.0.0 Jul 31, 2024
@adamsitnik adamsitnik added the blocked Issue/PR is blocked on something - see comments label Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO blocked Issue/PR is blocked on something - see comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relax RandomAccess type requirements, make all Read*|Write* methods work with non-seekable files
5 participants