Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asynchronous backend file-uploading #58

Closed
mitom opened this issue Oct 8, 2013 · 5 comments
Closed

Asynchronous backend file-uploading #58

mitom opened this issue Oct 8, 2013 · 5 comments

Comments

@mitom
Copy link
Contributor

mitom commented Oct 8, 2013

#53 contains 2 issues, it will be easier to follow them like this.

An other option that came to my mind, which might fall out of the scope of this bundle and would only improve things for those with other than Local backends is that the tmp backend could be also used as a web-server too, and upload instead of actually uploading could just give them information of the file, such as the key and so. Maybe the upload should rather call a handler as the validation does (except using only one) and if there is no handler fall back to the default? Using that one could create a background job that would copy over the file to the desired location, looked it up in the database, changed the location for it and then removed it from temp.

@sheeep sheeep mentioned this issue Oct 8, 2013
7 tasks
@mitom
Copy link
Contributor Author

mitom commented Oct 14, 2013

I was thinking about this during #53 and now I sort of think that this falls out of the scope/is already covered in the bundle.

The only thing that we could offer - without enforcing a dependency on everyone - is a callback so people could implement their own way of moving files. Now the thing is, there is postPersist and postUpload, which are exactly that. They provide every information that is needed to schedule a job, or whatever one wants.

I wonder, do you know any use case where changes would be needed?

I know it was my suggestion in the first place, but if you don't know of any limitation/lack that would make the above unusable, I'd just close this one and put a note in the docs that explains how could one implement such a process.

@sheeep
Copy link
Contributor

sheeep commented Oct 14, 2013

I disagree on the out of scope/already implemented part. Only the possibility to implement it yourself is given. You're right that someone could use the given Event though to do exactly that.

This feature would be comparable to the provided upload validators (mime-type..). It's not crucial to provide them in the first place, but a nice one to have nonetheless. One could even say that the different direct implementations of the UploadControllers (Blueimp/Dropzone/FancyUpload..) are quite a bit of overhead.

There are however some differences:

  • I don't think that this feature will be used very often. Compared to validators (and controllers :)).
  • It's far more complex to implement an interface to background job workers than implement simple validation constraints.

But the usage of background jobs to move a file when using Gaufrette to a remote storage should be as easy as possible, because it should be the way to go anyway. And in the end the given features are what makes someone choose this bundle over another one.

[...] without enforcing a dependency on everyone

I don't think that it will lead to any enforced dependency. I imagine different EventListeners for different job backends (Resque, Gearman..) which only will be registered (and afterwards be usable) if the corresponding dependency is available. Enlarges only the suggest part in composer.json.

The only thing that we could offer is a callback so people could implement their own way of moving files. Now the thing is, there is postPersist and postUpload, which are exactly that.

The current controllers will automatically move the file to the permanent storage. I'd vote +1 for having at least the abstract possibility to implement it against a given interface (Event, callback, you name it). Having a specific implementation like Resque job enqueuer or the like is IMHO not needed for 1.0.

What do you think?

@mitom
Copy link
Contributor Author

mitom commented Oct 14, 2013

In short: I still hold up to my statement above.

In long:

One could even say that the different direct implementations of the UploadControllers (Blueimp/Dropzone/FancyUpload..) are quite a bit of overhead.

That was actually my first thought when I saw your bundle. I think design wise it would be way better to have a common bundle (including the abstract parts), and separate bundles for every frontend which would all only include 1 controller and has a dependency on the common one. It would help decouple a lot of things and remove that overhead, but this is just a side note.

This feature would be comparable to the provided upload validators (mime-type..).

I also disagree having those included. I think how things are checked there are really dependant on what sort of application are you using. In case you will sell some online storage based thing, the size calculation would be very important as well as content filtering so one would write their own listeners to do the job.

On the other hand if it's just some public service, or you provide infinite of space or whatever else, having listeners to do nothing would become an overhead, a small one but an overhead nonetheless. Obviously I'm not going to open issues or PR's that remove functionality from your bundle. :D

But this is once again, not relevant to the topic.

If you wish to, I'm be happy to discuss and help out with any change, idea, design, suggestion whatsoever that could make this bundle more robust, as is my goal (you may have noticed that nearly every suggestion of mine was performance related).

But the usage of background jobs to move a file when using Gaufrette to a remote storage should be as easy as possible, because it should be the way to go anyway. And in the end the given features are what makes someone choose this bundle over another one.

I agree that it should be as easy as possible, however it also should be as flexible as possible. Using an interface would mean it either had to be very extensive which implies it would need a huge amount of customization when implemented, or it would be rather limited. In the first case, one would be not much better off (if at all) than completely implementing it himself. Now that there is a StreamManager decoupled from the gaufrette storages since #53 anyone could just rely upon that to move files between filesystems staying consistent with the bundle's methods.

In the second case, I think it is never worth to introduce features causing regression and overhead in functionality.

I answer to why I disagree with using only gaufrette to move files later on

I don't think that it will lead to any enforced dependency. I imagine different EventListeners for different job backends (Resque, Gearman..) which only will be registered (and afterwards be usable) if the corresponding dependency is available. Enlarges only the suggest part in composer.json.

This is indeed true. However keep in mind that you would have to maintain all these things, alongside the controllers (or hope for PR's) to keep the functionality. I'd say that's quite the job, as I personally just had some serious dependency-hell with Resque and had to create my own repository of it (actually of it's symfony wrapper).

The above would mean if something slips by once, it will cause hundreds of application to malfunction, while if the implementation would be the user's task, something slipping by would case 1 application to fail and it wouldn't be related to this bundle either. Once again, decoupling things would help reduce the amount of error-possiblities.

The current controllers will automatically move the file to the permanent storage.

They will move the file to a storage configured by the users. It is not necessarily permanent either. Especially not permanent, if we are talking about moving the files further in any way.

If one would keep performance in mind I'd follow this scenario:

  1. Set up a shared storage and mount it using NFS as chunk/orphanage and file storage for the bundle.
  2. Choose any CDN or more robust file serving solution and configure it (be it gaufrette or not).
  3. Write a postPersist listener that will
    1. persist the files into the database in whataver form is needed
    2. schedules a job to move it
  4. Write a job in your choice of worker that will
    1. look up the file in the database
    2. locate it on the place set up in point 1.
    3. copy it to the place set up in point 2.
    4. update the database entry
    5. delete the file from the place in point 1.

This would be at the moment (in my knowledge) the most robust and scalable way of using this bundle. The upload process that happens in point 4. is also essentially different than the one happening when a user uploads a file. The first one is a safe process. It could skip a lot of checks (then again, overhead) depending on the implementations. There wouldn't be risk of losing a file, as even if the upload fails at some point, it is still stored intact. Information about the file is already decided, so everything there is a safe method too. It is basically just moving a file across 2 solutions.

One might even say, that Gaufrette could cause quite an overhead there too, relying on php. As it doesn't change anything how do you move files across filesystems only how you reach them, one may even want to leave gaufrette completely out and rely on a native program or php extension to do this. Another thing which would either fall out of the scope of the bundle, or would need a lot of work and probably would still not be able to cover every use case, or would end up causing more flaws than what it solves.

I do agree that people shall be given the right to break things and be dumb on their own accord, but that should not mean going out of your way to provide those options.

My conclusion is, if you see it as such things are essential, a plug-in like structure would be a better decision, where this bundle would turn into a small framework. Then there could be a bundle that includes the logic to handle the uploads in general, the chunk storage, the orphanage, naming, validation and event listeners. Then each frontend could get it's specific bundle containing logic limited to them, which would mean those could get a bit more extensive. And then there could be functional extensions, such as wrappers and abstraction for things such as asynchronous backend uploading. This way the size of the bundle could get smaller for those who only need the core and one frontend, or none. And would still stay smaller to those using more frontends (which is retarded in one project I'd say). Also it would mean a good decoupled set of components.

I am in favor of that, however I'd still disagree with trying to abstract away different workers, as that sooner or later would mean one's functionality has to be limited to match the others (as is the case with gaufrette itself, see how dumb the local adapter is or with the controllers, not every one of them supports the same functionality so it's either not covering things, or mismatch.), and people would be back to having to implement it themselves.

I hope I'm not too selfish or irrational stating my points, it is quite late already. :) I also find it strange having to attack arguments I posed myself too. ^^

So, counter my arguments? :P

edit: Sorry for writing you a shorter essay.

@sheeep
Copy link
Contributor

sheeep commented Oct 15, 2013

I'll just start with what I think the most relevant point to this issue:

They will move the file to a storage configured by the users. It is not necessarily permanent either. Especially not permanent, if we are talking about moving the files further in any way.

Absolutely true. To be honest, the implications of this simple bold a completely convinced me. It is indeed better to let the user implement a queuable behaviour himself. But not the fact that it would be rather limited (or complex to implement) if provided an interface1 is superior to me.

Dependency hell / Maintainability: You're worried that if something slips by once, it will cause hundreds of application to malfunction and you're right. The main purpose of a suistanable and trustworthy bundle is its permanence. The deep conviction that once installed and configured, the only source of error is the application itself. But not its underlying third party bundles/parts.

Well, speaking of maintainability: You made some very interesting point regarding a more concise architecture of such a bundle. And I think this is directly related to this issue. What is a feauture, what is overhead2? This has to be defined. I would be more than happy to discuss significant architectural changes that implies the decoupling of all Controllers/Responses/Error handlers/Namers, yes, even the Gaufrette feature to detached bundles or bundle extensions. And I think before the answer to this question (more like: the answer to these questions) is not clear and fully discussed we should refrain from implementing more dependencies, more abstraction, more architectural changes.

For staying on topic, here is my suggestion (well, you already proposed this, so its yours technically :D).

I'd just close this one and put a note in the docs that explains how could one implement such a process.

If you have any insights in a actual implementation you could share, this would be awesome. An example - working or not - is always nice to have.

To cover the other topic related to this issue, I'll create an RFC issue.

So, counter my arguments? :P

Not today. :) I have to admit, this is exactly the reason I love open source contributions. Keep up the good work!


1 Don't think of it as a the PHP feature but rather a common point of the application to work with. The most appropriate thing would be an EventListener.
2 More precise: "What should be decoupled?".

@mitom
Copy link
Contributor Author

mitom commented Oct 15, 2013

If you have any insights in a actual implementation you could share, this would be awesome. An example - working or not - is always nice to have.

I will have to come up with one for my job, only later. If legal allows, I'll definitely let you know about the details so it can be added to the wiki, if not, well I'm sure there will be something I could share.

And I think before the answer to this question (more like: the answer to these questions) is not clear and fully discussed we should refrain from implementing more dependencies, more abstraction, more architectural changes.

If you wish, I am most every day online on #symfony on irc.freenode.net (though I run as milka there). You can hit me up for a bit faster and more direct way of communication, and later we can write an architecture document or the like with the proposed changes and reasons behind them, which could also make it easier for people to understand them, than having to read trough these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants