
Streams not quite working correctly #6

Closed
judgej opened this issue Nov 21, 2019 · 4 comments
Labels: bug (Something isn't working)

@judgej
Member

judgej commented Nov 21, 2019

Was trying to copy a file between two separate disks on Laravel:

        $destinationFileSystem->writeStream(
            $filename,
            $sourceFileSystem->readStream($filename)
        );

Error from Azure is:

Code: 400
Value: An HTTP header that's mandatory for this request is not specified.
details (if any): <?xml version="1.0" encoding="utf-8"?><Error><Code>MissingRequiredHeader</Code><Message>An HTTP header that's mandatory for this request is not specified.

Workaround in my application for now:

        $destinationFileSystem->put(
            $filename,
            stream_get_contents($sourceFileSystem->readStream($filename))
        );

My files are limited in size, so this is not such a problem. It would still be nice to get to the bottom of this issue though, albeit not with much urgency.

Since that workaround succeeds, the stream reading is working; it is the stream writing that is not.

A simpler call than the stream read would be the line below, but I've left the readStream() version above as a reminder of the failing case:

$sourceFileSystem->read($filename)
@judgej judgej added the bug Something isn't working label Nov 21, 2019
@judgej judgej self-assigned this Nov 21, 2019
@judgej
Member Author

judgej commented Nov 25, 2019

I'm sure this used to work, but it now throws "missing header" exceptions at the HTTP level when communicating with Azure. That threw me quite a lot.

It appears the read stream the API provides can no longer be stat()ed, i.e. its length is not available. When writing a stream to the API, the content length must be known from the start. The MS library was assuming the length was available simply because a resource was provided, when in fact the length was unknown (null). That null length corrupted the stream-write request to the API, which surfaced as all sorts of obscure errors.
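To illustrate the failure mode, here is a minimal sketch (the helper name is hypothetical, not from the MS library): a writer that needs the content length up front can only get it from fstat(), and fstat() only yields a usable size for stat-able streams.

```php
<?php

// Hypothetical helper illustrating the problem: fstat() only reports a
// usable size for "stat-able" streams; for others the size is unknown,
// which is exactly the null length that corrupted the write request.
function streamLength($resource): ?int
{
    $stat = @fstat($resource);

    if ($stat === false || !isset($stat['size'])) {
        return null; // length unknown: the case that broke the write
    }

    return $stat['size'];
}

// php://temp streams are stat-able, so the length is known:
$temp = fopen('php://temp', 'r+');
fwrite($temp, 'hello');
rewind($temp);
// streamLength($temp) gives the number of bytes written
```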

@judgej
Member Author

judgej commented Nov 25, 2019

The read streams provided by the MS library are now copied to php://temp streams before being returned to the caller.

This does mean the full file is read from the API to the local server, but using a temp stream avoids memory issues: php://temp holds the first 2 MB in memory, then falls back to local disk for any remaining bytes.
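A minimal sketch of that copy step (the function name is illustrative, not the library's): the length-less source stream is drained into php://temp, with the 2 MB memory ceiling made explicit, then rewound so the caller receives a seekable, stat-able stream.

```php
<?php

// Sketch, assuming the source is a non-seekable stream with unknown
// length (as a real readStream() result would be). The php://temp
// wrapper keeps the first 2 MB in memory and spills to a temporary
// file beyond that.
function toSeekableStream($source)
{
    $temp = fopen('php://temp/maxmemory:' . (2 * 1024 * 1024), 'r+');
    stream_copy_to_stream($source, $temp);
    rewind($temp);

    return $temp;
}

// Simulate a source stream (a real one would come from readStream()):
$source = fopen('php://memory', 'r+');
fwrite($source, str_repeat('x', 100));
rewind($source);

$copy = toSeekableStream($source);
// fstat($copy)['size'] now reports the full length, so a writer that
// needs the content length up front can accept it.
```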

An integration test has been added to demonstrate a streaming copy from one file to another.

@judgej
Member Author

judgej commented Dec 2, 2019

Please note: there is something very wrong within the Azure file storage API itself, and I don't believe it is the PHP library that uses it. A very wide class of errors, such as a simple extra "." on the end of a path, makes Azure file storage throw what can only be described as a "wobbly". It reports totally irrelevant errors about missing headers, invalid authentication, and so on. This is a big gotcha if you are not aware of it, and it can lead you down the wrong path diagnosing bugs for days if you're not careful.

@judgej
Member Author

judgej commented Dec 5, 2019

This workaround is going to be permanent unless the MS filesystem library changes. Basically, the API stream writer needs a property of the resource (its length) that the API stream reader does not provide, so the two cannot be piped back-to-back. The workaround puts an intermediate local memory+disk streaming cache between them.

A better approach might be to put a check in the wrapper around the API writer: if the stream being passed in is stat-able, pass it straight through; if not, wrap the resource in a stat-able local resource (php://temp). The API stream reader would then not need to be wrapped as a matter of course (as this workaround does), because not all applications reading a stream need to know its length before reading it. Those that do can deal with it themselves. This enhancement would effectively remove a layer of caching that takes up memory, and possibly disk too. Our use-case is tiny files, so it's not worth the effort for now, but if anyone else wants to tackle this (perhaps because they are streaming massive files direct to local disk), feel free.
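The suggested enhancement could be sketched like this (function name hypothetical): only buffer the stream through php://temp when its length cannot be determined, and let stat-able streams pass through untouched.

```php
<?php

// Sketch of the proposed check-then-wrap behaviour for the writer
// wrapper. Stat-able streams are returned as-is; anything else is
// drained into a stat-able php://temp buffer first.
function ensureStatable($resource)
{
    $stat = @fstat($resource);

    if ($stat !== false && isset($stat['size'])) {
        return $resource; // already stat-able: no extra caching layer
    }

    $temp = fopen('php://temp', 'r+');
    stream_copy_to_stream($resource, $temp);
    rewind($temp);

    return $temp;
}
```

With this in place, only streams that genuinely lack a length (such as the API's read streams) would pay for the intermediate copy.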

@judgej judgej closed this as completed Dec 5, 2019