Need sync compression, not just async #281
Hi,
sync vs async: I tried (see the comment on 0ceb14c) but eventually removed the sync methods because it became increasingly hard to make everything work with both async input and async output.

asap in external: A Promise is an ES6 object and we need a polyfill for older browsers. We let users override it because it directly affects the type of JSZip's output: if someone uses another Promise implementation (Bluebird, for example) or another Promise polyfill that is incompatible with the one we use, JSZip.external.Promise lets them fix that.
Instead of letting people release Zalgo and make *Async methods synchronous, I'd rather look for a robust way to support both sync and async methods. In the meantime, JSZip v2.6.0 still works synchronously.
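For readers unfamiliar with the "releasing Zalgo" reference: it means an API that invokes its callback sometimes synchronously and sometimes asynchronously, which makes caller code behave unpredictably. A minimal illustration (hypothetical, not JSZip's API):

```javascript
// Hypothetical illustration (not JSZip's API): a lookup that calls back
// synchronously on a cache hit and asynchronously on a miss.
function readValue(key, cache, callback) {
  if (key in cache) {
    callback(cache[key]);          // synchronous path: callback runs now
  } else {
    setTimeout(function () {       // asynchronous path: callback runs later
      cache[key] = "loaded:" + key;
      callback(cache[key]);
    }, 0);
  }
}

// The same caller code behaves differently depending on cache state:
var state = {};
var seen = false;
readValue("a", state, function () { seen = true; });
console.log(seen); // false here (miss); a cached key would print true
```

Code written against such an API cannot rely on ordering, which is why consistently-async (or consistently-sync) APIs are preferred.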
I'll come up with a proposal soon.
Hi, I want to add some arguments for a synchronous mode, because I believe there are more than enough good reasons to have one available. Just stating that

is no good reasoning. From my perspective there are good use cases where an async model is counterproductive or even bad, if not an anti-pattern:
Concerning performance, async execution is always slower than straight sync execution, simply because of all the overhead the async model requires (creating Promises, scheduling, creating and calling callbacks). So the WebWorker and command-line interface cases especially are very good reasons to have a sync execution path. When it is totally OK from an application perspective to run a task synchronously and just wait for its result in place, it is bad to force asynchronous execution. It adds a lot of boilerplate and it changes and complicates the code design; that's when I call it an anti-pattern. I made a couple of APIs that support both async and sync execution. I built a helper for it (https://github.com/graphicore/obtainJS) which at least helps me organize the similar and distinct pieces of an async/sync function. I don't want to push you into using it, especially because I think it needs a good amount of overhaul and some more tools to make it more broadly useful; it's probably also not the fastest possible implementation. However, I'm saying that this is possible without making your project unmaintainable, and that it's worth the effort.
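The dual-mode idea the commenter describes can be sketched very simply: write the computation once and expose it through both a sync and an async entry point. This is only a sketch of the general pattern, not obtainJS's actual interface; all names are made up.

```javascript
// One core computation (a stand-in for real compression work).
function compressCore(data) {
  return "compressed(" + data + ")";
}

// Sync entry point: call the core directly and return the value.
function compress(data) {
  return compressCore(data);
}

// Async entry point: the same core, wrapped so the caller gets a Promise.
function compressAsync(data) {
  return Promise.resolve().then(function () {
    return compressCore(data);
  });
}
```

The maintenance cost the commenter mentions is concentrated in keeping the core free of I/O so both wrappers stay thin.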
Hello, I have the exact same scenario as the OP. My situation is a bit sticky, but I need to compress data and store it in memory synchronously.
IMHO sync methods are useful in node only (but I agree they can be useful there). It's worth splitting entry points for node and for browserified builds (
Not only in node. We're on the client side in Angular. Edit, some clarification of the use case: keeping the unzipped data in memory can end up in data-duplication nightmares, so we just keep the data in uncompressed state for now.
@puzrin: Please explain. We're working here to give good reasons for a sync way to do zip stuff, also in browsers. I'd like to know whether you have arguments that make these reasons void, or whether you just don't care about sync execution in browsers.
Sync methods are also useful in browsers in the following situations:
@graphicore maybe I'm not familiar with some use cases. The requirements for async/sync interfaces are determined by the external components of the data-flow process. If the browser has only async interfaces to load/store data, it would be strange to use a sync interface under the hood.
A Worker is async (post messaging) by design. You have to pass/receive data in an async way, with no choice. So sync jszip is not a strict requirement here.
I agree in general. But the final choice depends on all stages of data flow in your system. If most of the data blocks are async and you try to cheat in a single block with sync code, that can be bad design. The problem with browsers is that data sources are async there (usually, callbacks or

Well, anyway, I didn't intend to troll anyone about "sync vs async" and so on. My point is that it's technically difficult/impossible to combine such things in one bottle. A working solution could be to split the top-level API into separate files, because some parts (encoding, for example) are implemented completely differently and some features may not be needed. The reason I mentioned sync for node.js is that I know an obvious use case for it: writing simple shell scripts.
@puzrin
I see at least two alternatives for using workers with jszip that are modular and don't require a sync interface:
I did not analyze it in detail. IMHO, in the API design process, the logic goes through stages like these:
There's no inherent async external dependency in this thing.
That's wrong. I/O is usually better done async, like XMLHttpRequest, and newer interfaces are often async-only. But look at a common example for unzipping: using 'changeHandler' for a

The problem here is not that we want to code around an inherently async interface. The problem is a long-running thread, and if you ask me, forcing people into this kind of execution is bad design. And we know it is forced, because just a version ago this ran perfectly fine as a sync call. Talking about this: I think the asynchronicity here is created not by running any async code, but by chaining sync code via Promises; the promises themselves behave asynchronously when they are resolved. That sounds to me like a rather simple thing to turn back into sync code.
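The "chaining sync code via Promises" point can be made concrete: when every step in a pipeline is itself synchronous, the Promise chain only defers when each step runs, and the same pipeline collapses to direct calls. A sketch with made-up step names (not JSZip internals):

```javascript
// Three purely synchronous steps, stand-ins for stages of a zip pipeline.
function stepRead(input)  { return input + "|read"; }
function stepDeflate(s)   { return s + "|deflate"; }
function stepWrite(s)     { return s + "|write"; }

// Promise composition: each step is sync, but the chain resolves async.
function runAsync(input) {
  return Promise.resolve(input)
    .then(stepRead)
    .then(stepDeflate)
    .then(stepWrite);
}

// The identical pipeline as direct synchronous calls.
function runSync(input) {
  return stepWrite(stepDeflate(stepRead(input)));
}
```

Both produce the same value; only the delivery differs, which is the commenter's argument that a sync variant is structurally within reach.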
Just by neglecting it, your "analysis" won't get any more true. There are many good cases where a much simpler sync call will suffice; it is enough to realize this. The other way around is forcing us users into a paradigm that is not justified.
I did it often. It's not that hard. You just have to organize your code in the right bite-size pieces. In this case, the whole process could internally be one big generator. The sync case is executed like this:

```javascript
function syncRunner(generator) {
    var result;
    while (!(result = generator.next()).done);
    return result.value;
}
```

To make it async, if you want to do it with promises, call the generator's `next` from a promise chain:

```javascript
function asyncRunner(generator) {
    // Note that in this case the first call to generator.next() is executed
    // when asyncRunner is initially called. That's usually not a good thing
    // for async code, because of error handling. I'd add an initial delay
    // from the initial caller, so this implementation can stay simpler.
    // Or there is something simple we could do in here.
    var result = generator.next();
    if (result.done)
        return Promise.resolve(result.value);
    return Promise.resolve(generator).then(asyncRunner);
}

// run like: asyncRunner(myGenerator()).then(onResult);
```

http://jsbin.com/mefevivuxi/1/edit?js,console

With this architecture, based on generators, we can even use the

Also, the library user could be exposed to the generator directly and fully control how and when delays are needed. The

Note that this is that easy because we have no real async interfaces in the pipeline.
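A toy example of driving such a generator through the sync runner (the runner is repeated here so the snippet is self-contained; `sumChunks` is a made-up stand-in for chunked zip work):

```javascript
// Same drain-in-place runner as sketched in the comment above.
function syncRunner(generator) {
  var result;
  while (!(result = generator.next()).done);
  return result.value;
}

// A toy generator: each yield marks one slice of work. In the async
// variant, each yield would become a scheduling point instead.
function* sumChunks(chunks) {
  var total = 0;
  for (var i = 0; i < chunks.length; i++) {
    total += chunks[i];
    yield;
  }
  return total;
}

var total = syncRunner(sumChunks([1, 2, 3, 4]));
// total === 10
```

The same generator instance could be handed to an async runner unchanged, which is the whole appeal of the design.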
Trade-offs are OK. Nobody expects that we live in a perfect world. Paternalism is not OK.
AFAIK, you have no sync file access there. An async read via FileReader is still required. That also means a modern browser with typed-array support (no need for boring fallbacks like binary strings).
Yes, that's a common use case. But since XHR* is event-driven, it doesn't look like a problem to extract data in an async way and rethrow the event. I don't mean that a sync method is not needed at all; I mean that in this case it doesn't seem to make a principal difference.
Promises were added not for fun, but because some internal steps, like reading a File in a memory-effective way, are async. I guess the choice was to reject new features or to reject the sync API as it was in the previous version.
That's normal, because the async API was required by new features like streamed processing of huge files, not by a desire to teach people programming. Sync methods need noticeable work to rewrite the top-level "gluer" components, because they can't be combined with the async ones. But as everywhere in OSS, nobody prohibits you from staying with the previous version or doing a PR :). As far as I can see, this issue is still not rejected.
Yeah, you're right. But once you've got the content, that's over.
It was sync before this release; stop speculating, please. They were added for no other reason than to keep the long process from triggering a "the script is running forever" alert.
The other way around is true: you can't make a sync call from a real async one, but it's easy to make a sync call execute in an async manner, i.e. via setTimeout.
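That asymmetry is easy to demonstrate: wrapping a synchronous function in a Promise plus setTimeout yields an async variant for free, while the reverse is not possible. A sketch with a made-up function name (not a real JSZip method):

```javascript
// A synchronous operation, stand-in for real compression.
function deflateSync(data) {
  return "deflated:" + data;
}

// The async variant is just the sync one deferred onto the event loop.
function deflateAsync(data) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(deflateSync(data));
    }, 0);
  });
}
```

Usage: `deflateAsync("abc").then(function (out) { /* ... */ });` while `deflateSync("abc")` returns the value in place.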
Yeah, I might. But not without the consent of the maintainer. Otherwise, without consent, I'd rather fork it and do my own implementation.
I don't speculate; I explain what the need for a 'sync' API means, as I understand it. It's technically impossible to move forward with big-file support without an async API. And it's technically impossible to provide both with zero effort, because they have significant internal differences. Maybe the authors have the human resources for one task only, but that's not due to a wish to make your life difficult :)
So what calls are responsible for this? And why not let the user decide
nobody wants that
I repeatedly offered my help. I'm waiting for a dialogue about that. Update: I'm AFK from now until Sunday. I'll catch up later.
"Let the user decide" = provide different APIs with different functionality. I said this in the first post, and explained steps for how to split methods in an optimal way. That includes checking whether any code can be shared at all. In the example with the image-size check, it was more effective to do two independent implementations instead of trying to reuse code (those are still in a single package, for "let users decide" :).
I'm not the author of jszip and can't comment. I'm the author of the deflate/inflate used here, and my activity is usually related to bug monitoring. We got a bug report for deflate, and I was checking whether there are any related issues here. I also posted some replies where I had experience to share.
There you have it!

```javascript
async function doMagic() {
  let blob = await zip.generateAsync({type: "blob"})
  saveAs(blob, filename)
}
```

Wasn't that hard. There is a way for async code to feel and operate kind of like synchronous code but still not block.
Thanks @jimmywarting, but this is not about syntactic sugar.
Sorry for the lack of updates on this issue. JSZip v3 isn't async "just" to avoid the "the script is running forever" alert
You're right @puzrin :)
@graphicore Generators look nice and I totally see the advantages when reading
We will with #290 (don't resolve the full blob but read chunks only when

I'm working on a fix for #290 which made implementing sync methods a bit
BTW, you have the sync blob reader inside workers, if that counts for something...
Sounds good to me.
Guys,
So there is still no way to unzip data synchronously, even when I have everything loaded in memory? |
There is jszip-sync on npm. |
I wrote my own ZIP reader in 50 lines of code (it supports only Deflate compression through pako.js, but it is enough in my case). |
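For readers curious what such a minimal reader involves: the ZIP local file header format (per PKWARE's APPNOTE) is simple enough to parse by hand. The sketch below is not the commenter's code; it handles only the first entry and only uncompressed ("stored", method 0) data, and assumes no data descriptor. For Deflate entries (method 8), the extracted byte slice would be passed to pako.inflateRaw, as the commenter describes.

```javascript
// Parse the first local file header of a ZIP held in an ArrayBuffer.
// Offsets follow the ZIP local file header layout: signature (0),
// compression method (8), compressed size (18), name length (26),
// extra-field length (28), then the file name and the data.
function readFirstEntry(buf) {
  var view = new DataView(buf);
  if (view.getUint32(0, true) !== 0x04034b50) {
    throw new Error("not a ZIP local file header");
  }
  var method = view.getUint16(8, true);
  var compressedSize = view.getUint32(18, true);
  var nameLen = view.getUint16(26, true);
  var extraLen = view.getUint16(28, true);
  var nameBytes = new Uint8Array(buf, 30, nameLen);
  var name = String.fromCharCode.apply(null, nameBytes);
  var dataStart = 30 + nameLen + extraLen;
  var data = new Uint8Array(buf, dataStart, compressedSize);
  if (method === 0) {
    return { name: name, data: data };  // stored: bytes used as-is
  }
  // method 8 would be: pako.inflateRaw(data) — omitted in this sketch
  throw new Error("only stored entries handled in this sketch");
}
```

Everything here is synchronous, which is the point: once the bytes are already in memory, extraction needs no async machinery at all.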
@Stuk: Can we backport |
Hi,
thanks for this impressive work.
I'm struggling to use jszip in a scenario which requires synchronous execution.
While async calls are great for probably most use cases, they make it impossible to have an in-memory zipped piece of data (in the form of an ArrayBuffer) as a value.
There are many use cases where this is required, such as returning a buffer. This is simply impossible to implement using async functions.
Example:
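A hypothetical illustration of the constraint (made-up names, not the reporter's actual code): a function whose contract is "return an ArrayBuffer" has no room to return a Promise instead.

```javascript
// A serializer whose callers expect an ArrayBuffer return value.
function serializeRecord(record) {
  var json = JSON.stringify(record);
  var bytes = new Uint8Array(json.length);
  for (var i = 0; i < json.length; i++) {
    bytes[i] = json.charCodeAt(i) & 0xff;
  }
  // With a synchronous zip API, the bytes could be compressed right here
  // before returning. With an async-only API, this function simply cannot
  // produce its return value — the compressed data only exists later,
  // inside a .then() callback.
  return bytes.buffer;
}
```

Changing every such call site to accept a Promise instead is exactly the cascading redesign the report is about.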
I notice that JSZip.generate was removed in recent releases. Would it be possible to bring it back (together with a strong warning that this is not the preferred approach)?
Eric