New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server-side image upload refactor #635

Closed
ErisDS opened this Issue Sep 5, 2013 · 17 comments

Comments

Projects
None yet
6 participants
@ErisDS
Member

ErisDS commented Sep 5, 2013

At the moment, Ghost image uploads are thrown straight at the local file system. This is great and exactly what we want for standard Ghost - no config of an external system required.

However, it does pose a problem for hosting. Most node hosting platforms do not persist the local file system, and this will be true for our own platform.

The server-side part of accepting and handling uploads, along with the retrieval of images needs to be refactored such that the local file system handler is a module and is the 'default' image upload handler.

The handler should support upload/post requests, retrieve/get requests, and delete requests, but not modify. The handler should store the files in an immutable fashion, using a unique filename (closing #619).

Ghost should then be wired up in such a way that if another handler is configured it will use that, but otherwise it will use the local file system handler by default for all requests to store or retrieve an image (there's no concept of delete in the UI yet).

We will also need to implement a different file upload handler, but the details of what it should do, and whether or not it should be core or a plugin are not yet confirmed.

@jamesbloomer

This comment has been minimized.

Show comment
Hide comment
@jamesbloomer

jamesbloomer Sep 26, 2013

Member

A few approaches could be taken.

1 - The easiest is to assume that any images will always be available via a normal url. This would be applicable for the local file system or something like S3. In this case the retrieve method doesn't need to be implemented, it's just a url. The upload method needs to put the file wherever and return the correct url, the delete likewise. In this case the admin controller upload function needs to forward to the handler.

2 - If however the uploaded image is not available via a static url, maybe in a database somewhere, then the image will have to be streamed down to the client. The difficulty here is to try and enable some caching and the load on the server may be higher, see here for a discussion.

3 - Also a hybrid approach could be taken, using a dynamic route in express, then using res.sendfile to return the file. This has the advantage of allowing some dynamic logic but also caching, see res.sendfile. Although with this approach it looks like the file still has to be on disk for express.

The first approach is the easiest but makes some assumptions that the files will always be accessible in a certain way.

Member

jamesbloomer commented Sep 26, 2013

A few approaches could be taken.

1 - The easiest is to assume that any images will always be available via a normal url. This would be applicable for the local file system or something like S3. In this case the retrieve method doesn't need to be implemented, it's just a url. The upload method needs to put the file wherever and return the correct url, the delete likewise. In this case the admin controller upload function needs to forward to the handler.

2 - If however the uploaded image is not available via a static url, maybe in a database somewhere, then the image will have to be streamed down to the client. The difficulty here is to try and enable some caching and the load on the server may be higher, see here for a discussion.

3 - Also a hybrid approach could be taken, using a dynamic route in express, then using res.sendfile to return the file. This has the advantage of allowing some dynamic logic but also caching, see res.sendfile. Although with this approach it looks like the file still has to be on disk for express.

The first approach is the easiest but makes some assumptions that the files will always be accessible in a certain way.

@jamesbloomer

This comment has been minimized.

Show comment
Hide comment
@jamesbloomer

jamesbloomer Sep 28, 2013

Member

Anyone else started doing anything with this? Got time to start extracting the local file system storage, can then push to a branch for other people to work on.

Member

jamesbloomer commented Sep 28, 2013

Anyone else started doing anything with this? Got time to start extracting the local file system storage, can then push to a branch for other people to work on.

@ErisDS

This comment has been minimized.

Show comment
Hide comment
@ErisDS

ErisDS Sep 28, 2013

Member

No one has picked it up just yet, I can create a branch and pick it up from you 👍

Member

ErisDS commented Sep 28, 2013

No one has picked it up just yet, I can create a branch and pick it up from you 👍

@jamesbloomer

This comment has been minimized.

Show comment
Hide comment
@jamesbloomer

jamesbloomer Sep 28, 2013

Member

Sounds good.

Member

jamesbloomer commented Sep 28, 2013

Sounds good.

@ErisDS

This comment has been minimized.

Show comment
Hide comment
@ErisDS

ErisDS Sep 28, 2013

Member

Branch is called image-refactor - just so it's easy to find ;)

Member

ErisDS commented Sep 28, 2013

Branch is called image-refactor - just so it's easy to find ;)

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Sep 29, 2013

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Sep 29, 2013

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Sep 29, 2013

ErisDS added a commit that referenced this issue Sep 29, 2013

Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module

ErisDS added a commit that referenced this issue Sep 29, 2013

@ErisDS

This comment has been minimized.

Show comment
Hide comment
@ErisDS

ErisDS Sep 29, 2013

Member

Progress on this issue exists on the image-refactor branch. There is a somewhat related issue open #937.

Member

ErisDS commented Sep 29, 2013

Progress on this issue exists on the image-refactor branch. There is a somewhat related issue open #937.

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Oct 2, 2013

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Oct 2, 2013

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Oct 2, 2013

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Oct 3, 2013

@matteocrippa

This comment has been minimized.

Show comment
Hide comment
@matteocrippa

matteocrippa Oct 19, 2013

'll try to give my 2 cents about this, probably you will have to point out the two most common situation when Ghost is hosted in a not writable fs hosting:

  • user has got a S3 repo (or similar), so you upload your files on his/her bucket;
  • user without any S3 or similar... this is probably the worst situation, I remember that there are some ports of Wordpress for Heroku that make use of Postgres and the images/pdf/attachments are saved inside the DB as blob.

Dunno if you plan to have this already in place now, but probably it would be interesting to allow user to pickup their own way allowing to use a plugin to re-route in the best way their uploads.

Eg. (I admit prob I don't know if this is still allowed) Some user may prefer to use Dropbox to store the public attachments.

matteocrippa commented Oct 19, 2013

'll try to give my 2 cents about this, probably you will have to point out the two most common situation when Ghost is hosted in a not writable fs hosting:

  • user has got a S3 repo (or similar), so you upload your files on his/her bucket;
  • user without any S3 or similar... this is probably the worst situation, I remember that there are some ports of Wordpress for Heroku that make use of Postgres and the images/pdf/attachments are saved inside the DB as blob.

Dunno if you plan to have this already in place now, but probably it would be interesting to allow user to pickup their own way allowing to use a plugin to re-route in the best way their uploads.

Eg. (I admit prob I don't know if this is still allowed) Some user may prefer to use Dropbox to store the public attachments.

@jamesbloomer

This comment has been minimized.

Show comment
Hide comment
@jamesbloomer

jamesbloomer Oct 21, 2013

Member

Almost ready to push the S3 storage version, all pretty straight forward. Undoubtedly there will be other types of storage that people want to use.

Member

jamesbloomer commented Oct 21, 2013

Almost ready to push the S3 storage version, all pretty straight forward. Undoubtedly there will be other types of storage that people want to use.

@hugo

This comment has been minimized.

Show comment
Hide comment
@hugo

hugo Oct 21, 2013

Contributor

Currently it looks like Ghost will act as a relay for any image storage service i.e. the image will be uploaded to Ghost and then to S3/Azure/etc. Is the plugin system flexible enough that eventually this could be refactored to enable direct upload from clients to file storage services (S3, Azure and GCS all provide this) without hitting the Ghost server.

Things to worry about here that I see (probably there are more):

  • File hash naming scheme is prohibitively hard[1]
  • more places that connection failure/high latency/etc. can cause issues[2]

[1] You can get the hash from S3, but then you have to relate hashes to images somewhere, so you've added a lot of work and an extra database field, a least, to avoid a hopefully uncommon situation of double uploads of the same file.

[2] You have to get the filename from the client, check with the file store if it exists, or what the next available increment-appended name is, get an endpoint URL, trigger the upload from the client, verify the upload completed when the client says it did, and store the final URL somewhere.

Contributor

hugo commented Oct 21, 2013

Currently it looks like Ghost will act as a relay for any image storage service i.e. the image will be uploaded to Ghost and then to S3/Azure/etc. Is the plugin system flexible enough that eventually this could be refactored to enable direct upload from clients to file storage services (S3, Azure and GCS all provide this) without hitting the Ghost server.

Things to worry about here that I see (probably there are more):

  • File hash naming scheme is prohibitively hard[1]
  • more places that connection failure/high latency/etc. can cause issues[2]

[1] You can get the hash from S3, but then you have to relate hashes to images somewhere, so you've added a lot of work and an extra database field, a least, to avoid a hopefully uncommon situation of double uploads of the same file.

[2] You have to get the filename from the client, check with the file store if it exists, or what the next available increment-appended name is, get an endpoint URL, trigger the upload from the client, verify the upload completed when the client says it did, and store the final URL somewhere.

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Oct 22, 2013

refactoring image storage and implementing S3 storage
part of #635
- in order to crystallise whether local storage module was correct it was easiest to implement S3 module
- config is currently only in development section
- implemented as a config option rather than a plugin, as discussed
- may want to restructure config depending on what other storage comes along
- the S3 bucket must have read permissions as the link is direct to the bucket
- used aws-sdk for S3 interaction because:
	- it is by Amazon
	- it is active
	- it has all AWS services
	- it has built in retries
	- I have used it since early versions with no issues

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Oct 23, 2013

ErisDS added a commit to ErisDS/Ghost that referenced this issue Oct 24, 2013

Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module

ErisDS added a commit to ErisDS/Ghost that referenced this issue Oct 24, 2013

ErisDS added a commit to ErisDS/Ghost that referenced this issue Oct 24, 2013

jamesbloomer added a commit to jamesbloomer/Ghost that referenced this issue Oct 30, 2013

Allow setting of S3 credentials in config.js
part of #635
- refactor the structure of the config in the process

ErisDS added a commit that referenced this issue Oct 30, 2013

Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module

ErisDS added a commit that referenced this issue Oct 30, 2013

ErisDS added a commit that referenced this issue Oct 31, 2013

Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module

ErisDS added a commit that referenced this issue Oct 31, 2013

jptacek added a commit to jptacek/Ghost that referenced this issue Nov 4, 2013

Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module

jptacek added a commit to jptacek/Ghost that referenced this issue Nov 4, 2013

jptacek added a commit to jptacek/Ghost that referenced this issue Nov 4, 2013

@ghost ghost assigned ErisDS Nov 7, 2013

@ErisDS

This comment has been minimized.

Show comment
Hide comment
@ErisDS

ErisDS Nov 7, 2013

Member

Wow. This issue is being referenced to death 😲

Just wanted to drop a note on the end of here to say I'm currently doing further work on abstracting the filesystem in Ghost. The intention is to remove the assumption that we use the local file system, and instead use a module which proxies to the local file system if nothing else is defined.

Expect to see stuff change over the coming days.

Member

ErisDS commented Nov 7, 2013

Wow. This issue is being referenced to death 😲

Just wanted to drop a note on the end of here to say I'm currently doing further work on abstracting the filesystem in Ghost. The intention is to remove the assumption that we use the local file system, and instead use a module which proxies to the local file system if nothing else is defined.

Expect to see stuff change over the coming days.

ErisDS added a commit to ErisDS/Ghost that referenced this issue Nov 10, 2013

SQUASHED Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module
- contains other commits to add cleanup and fixes from @ErisDS and @halfdan

ErisDS added a commit to ErisDS/Ghost that referenced this issue Nov 10, 2013

SQUASHED Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module
- contains other commits to add cleanup and fixes from @ErisDS and @halfdan

Conflicts:
	core/server.js

ErisDS added a commit to ErisDS/Ghost that referenced this issue Nov 11, 2013

image upload controller refactor
issue #635

- upload controller shouldn't assume fs
- filesystem module proxies all the fs work
- proxies and exposes middleware for serving images
- creating a date based path and unique filename is a base object util
- unit tests updated
@ErisDS

This comment has been minimized.

Show comment
Hide comment
@ErisDS

ErisDS Nov 12, 2013

Member

Consider this complete for now.

Member

ErisDS commented Nov 12, 2013

Consider this complete for now.

@ErisDS ErisDS closed this Nov 12, 2013

@jamesbloomer

This comment has been minimized.

Show comment
Hide comment
@jamesbloomer

jamesbloomer Nov 14, 2013

Member

Depending on what your definition of proxy is on the previous comment...
When I started this I had a look at using Ghost to proxy to an external storage location eg. S3, rather than expose the image directly from S3 as a link. It's definitely possible but has the downside of double the bandwidth, the image from S3 to the Ghost server, then the image to the client. Worth bearing in mind.

Member

jamesbloomer commented Nov 14, 2013

Depending on what your definition of proxy is on the previous comment...
When I started this I had a look at using Ghost to proxy to an external storage location eg. S3, rather than expose the image directly from S3 as a link. It's definitely possible but has the downside of double the bandwidth, the image from S3 to the Ghost server, then the image to the client. Worth bearing in mind.

@thecolorblue

This comment has been minimized.

Show comment
Hide comment
@thecolorblue

thecolorblue Nov 14, 2013

Is anyone else working on another storage method? I might start on implementation for modulus.io.

thecolorblue commented Nov 14, 2013

Is anyone else working on another storage method? I might start on implementation for modulus.io.

@ErisDS

This comment has been minimized.

Show comment
Hide comment
@ErisDS

ErisDS Nov 15, 2013

Member

We haven't finished work on our App boilerplate yet, so just now there is no way to override the storage method other than hacking Ghost. Once the App stuff is officially launched adding this kind of thing will be MUCH easier, you can see progress in #1474.

Member

ErisDS commented Nov 15, 2013

We haven't finished work on our App boilerplate yet, so just now there is no way to override the storage method other than hacking Ghost. Once the App stuff is officially launched adding this kind of thing will be MUCH easier, you can see progress in #1474.

ErisDS added a commit that referenced this issue Nov 21, 2013

SQUASHED Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module
- contains other commits to add cleanup and fixes from @ErisDS and @halfdan

Conflicts:
	core/server.js

morficus pushed a commit to morficus/Ghost that referenced this issue Sep 4, 2014

Moving file system storage to a module
issue #635

- refactored file system storage into module
- convert save to return a promise
- convert admin controller to use storage module

morficus pushed a commit to morficus/Ghost that referenced this issue Sep 4, 2014

morficus pushed a commit to morficus/Ghost that referenced this issue Sep 4, 2014

morficus pushed a commit to morficus/Ghost that referenced this issue Sep 4, 2014

image upload controller refactor
issue #635

- upload controller shouldn't assume fs
- filesystem module proxies all the fs work
- proxies and exposes middleware for serving images
- creating a date based path and unique filename is a base object util
- unit tests updated
@bioball

This comment has been minimized.

Show comment
Hide comment
@bioball

bioball Jul 13, 2015

Has any progress been made on this front?

bioball commented Jul 13, 2015

Has any progress been made on this front?

@ErisDS

This comment has been minimized.

Show comment
Hide comment
@bioball

This comment has been minimized.

Show comment
Hide comment
@bioball

bioball Jul 14, 2015

Ah great, thanks!

bioball commented Jul 14, 2015

Ah great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment