
Last chunk is bigger than configured chunkSize #51

Closed
sreuter opened this issue Mar 14, 2013 · 20 comments

@sreuter commented Mar 14, 2013

While enforcing the maximum chunkSize on the server side (node.js), I realized that the last chunk is often bigger than the configured chunkSize. The source code states that this is intended behaviour: https://github.com/23/resumable.js/blob/master/resumable.js#L259

Why are we doing this? And if there is a good reason for it, shouldn't we mention the behavior in the documentation?
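For anyone hitting the same surprise, here is a minimal sketch (not the actual resumable.js source) of the chunking rule being discussed: the file is split into floor(fileSize / chunkSize) chunks, and the final chunk absorbs the trailing remainder, so it can be up to 2 * chunkSize - 1 bytes.

```javascript
// Illustrative sketch of the default chunk-boundary rule: every chunk
// ends on a chunkSize boundary except the last, which runs to the end
// of the file and therefore absorbs the remainder.
function chunkRanges(fileSize, chunkSize) {
  // At least one chunk, even for files smaller than chunkSize.
  var count = Math.max(Math.floor(fileSize / chunkSize), 1);
  var ranges = [];
  for (var i = 0; i < count; i++) {
    var start = i * chunkSize;
    var end = (i === count - 1) ? fileSize : start + chunkSize;
    ranges.push([start, end]);
  }
  return ranges;
}

// A 2.5 MB file with 1 MB chunks yields two chunks; the last one is 1.5 MB.
console.log(chunkRanges(2.5 * 1024 * 1024, 1024 * 1024));
```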

@steffentchr (Member)

The reason for this is related to the prioritizeFirstAndLastChunk option. This lets you receive the first and the last part of a file for verification of meta data, and in those cases it's necessary to know that both of these parts are at least chunkSize bytes. This is handy for files with meta data at the end of the file (think ID3 tags on an MP3 file).

(Resumable.js was originally written for us to handle video uploads to 23 Video, and in those cases we actually check whether a file is a valid video file as soon as the first and last chunks are uploaded. For this, we need all meta data for the file to be available -- and not just a trailing chunk of only a few bytes.)

@sreuter (Author) commented Mar 14, 2013

I guessed so, and I fully understand, as we also handle video uploads / encoding for our service. Still, I think this should be more configurable. I don't think I need 8 MB of the file's tail for proper (MOOV atom / ID3 tag) detection. Also, this behaviour should only be enforced when prioritizeFirstAndLastChunk is set to true, and it should be well documented, don't you think?

@steffentchr (Member)

I don't actually disagree about the configurability, but opted against it in the implementation -- for the simple reason that having it configurable on the frontend would also mean that it had to be configurable in any and all backend implementations (the node/express lib, the python/django lib, the php lib and so on). To me, this was too high a price to pay for the feature.

As a side-note, you're absolutely right: 8 MB chunks are probably too big. I usually opt for 1 MB chunks, making the trailing meta data chunk between one and two megabytes.

@srijs (Contributor) commented Mar 18, 2013

I find this behavior highly confusing, too. Could you elaborate in which way this would have to be configured server-side?

Wouldn't this be solvable by:
a) Only triggering this behavior when prioritizeFirstAndLastChunk is set, and documenting it along with that option.
b) Adding a lastChunkSize option with a sane default to make this behavior more explicit.
c) When prioritizeFirstAndLastChunk is set and the last chunk is smaller than the maximum, uploading the two last chunks separately. This would still allow for analysis, while the server could store them in a chunk-sized storage system without having to break the chunks apart itself.

I personally would opt for b) or c), but I'm sure there are many more ways to solve this.
In the current state this can be really confusing, and it unnecessarily complicates some use cases when it is not configurable.
So let's please discuss this.

@steffentchr (Member)

"Could you elaborate in which way this would have to be configured server-side?"

Because the server needs to verify the chunk size -- so if the size varied with client-side configuration, we would also need a matching server-side configuration. This is something to avoid. I agree with you that there are alternative defaults, but not that these would make enough of a difference to warrant breaking existing server-side implementations.
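To illustrate the server-side verification being described, here is a hedged sketch of a validator that accepts the oversized last chunk. The function name and signature are hypothetical, not part of any of the backend libraries mentioned; the point is only that the last chunk needs a different bound than the rest.

```javascript
// Hypothetical server-side chunk-size check. Under the default
// behaviour discussed here, non-final chunks are exactly chunkSize,
// while the final chunk carries the remainder and may be anywhere
// from chunkSize up to (but not including) 2 * chunkSize bytes.
function isValidChunkSize(chunkNumber, totalChunks, chunkSize, receivedBytes) {
  if (totalChunks > 1 && chunkNumber < totalChunks) {
    // Every chunk before the last must be exactly chunkSize.
    return receivedBytes === chunkSize;
  }
  // The last chunk absorbs the remainder: [chunkSize, 2 * chunkSize).
  // (For single-chunk files smaller than chunkSize, accept anything
  // below 2 * chunkSize.)
  return receivedBytes > 0 && receivedBytes < 2 * chunkSize;
}
```

This is why the size cannot vary freely on the client: every backend would need to know the client's bounds to reject malformed uploads.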

@srijs (Contributor) commented Mar 18, 2013

I understand you don't want to break existing server-side implementations. Could we add an opt-in to make this saner (e.g. forceChunkSize)?

@steffentchr (Member)

We can change behaviour, sure -- but let me first try to understand the issue with having a bigger-than-chunkSize chunk. Is it merely a size validation issue, or are there more pressing concerns?

Originally, I reasoned that "chunks are always chunkSize or bigger" made good sense for optimizing upload times, because you could otherwise end up uploading a final chunk of only a few bytes, but I agree that the potential gains are negligible.

@srijs (Contributor) commented Mar 18, 2013

The issue we are having is specifically the server-side enforcement of the maximum chunk size, combined with a chunk-based storage system that has a fixed block size. Server-side chunk splitting and using a larger block size are unfortunately both impossible for us.

Alongside with that -- more ideologically -- you have to admit the current behavior is rather not what one would expect when explicitly setting a chunk size, don't you?

@steffentchr (Member)

"Alongside with that -- more ideologically -- you have to admit the current behavior is rather not what one would expect when explicitly setting a chunk size, don't you?"

Of course, you're exactly right about that. I agree that the original choice could have been made better; the only question is whether the mistake is big enough to warrant breaking changes, or whether it just needs better documentation.

@srijs (Contributor) commented Mar 18, 2013

I don't understand. How would e.g. adding a forceChunkSize option with default false be a breaking change?
And yes -- better documentation about that should be done anyway.

@steffentchr (Member)

That wouldn't be breaking; the only question is whether to set it to true by default to make for a logical future.

@srijs (Contributor) commented Mar 18, 2013

When aiming for a logical future interface, I still think we should only send a last chunk >= chunkSize when prioritizeFirstAndLastChunk is set. It's hard to imagine an existing server-side implementation that assumes the last chunk to be >= chunkSize except when using prioritizeFirstAndLastChunk. Yes, this would theoretically be a breaking change, but I expect the percentage of implementations it would actually break to be very low.

@steffentchr (Member)

Let's go with your suggestion. So:

  • Keep the current behaviour by default, even if prioritizeFirstAndLastChunk is set to false.
  • Add forceChunkSize (defaulting to false). In the default mode, the last chunk will be between chunkSize and 2*chunkSize bytes; when set to true, the last chunk will be between 0 and chunkSize bytes.
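The agreed-on behaviour can be sketched as follows (illustrative only, not the merged implementation): with forceChunkSize the remainder becomes its own trailing chunk instead of being merged into the previous one.

```javascript
// Number of chunks under each mode of the proposed forceChunkSize flag.
function chunkCount(fileSize, chunkSize, forceChunkSize) {
  if (forceChunkSize) {
    // forceChunkSize: true -- last chunk is between 0 and chunkSize bytes,
    // so the remainder gets a chunk of its own.
    return Math.ceil(fileSize / chunkSize);
  }
  // Default -- the last chunk absorbs the remainder and is between
  // chunkSize and 2 * chunkSize bytes (at least one chunk for tiny files).
  return Math.max(Math.floor(fileSize / chunkSize), 1);
}

// A 2.5 MB file with 1 MB chunks: 3 chunks when forced, 2 by default.
console.log(chunkCount(2621440, 1048576, true), chunkCount(2621440, 1048576, false));
```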

Would you want to make the changes and send a pull request?

@srijs (Contributor) commented Mar 18, 2013

See #56

@talha-asad

Impressive work, but in my opinion a very poor default behaviour for the last chunk's size. Perhaps we can fix this gradually. :)

@steffentchr (Member)

It's a very important default for our use case -- we need a sizeable chunk of the uploaded file to determine whether it's a supported video file. So I wouldn't call it very bad.

@talha-asad

I do understand your use case, but this library is standalone, and it seems unreasonable to have that behaviour as its default. I think most users would expect the opposite.

@pacificsoftdevld

Hi all.
In r.on('fileAdded', (file, event) => {});
I want to read the file's content to compute an MD5 hash and add it to the query parameters for this file.
Please help me read the file content.

@cpnielsen

@pacificsoftdevld Please don't hijack other issues with unrelated comments (I can see you also created an issue, which is the right approach).
