New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filedepot: store file data as storage-configured blobs #361

Merged
merged 65 commits into from Jan 30, 2015

Conversation

Projects
None yet
5 participants
@tiberiuichim
Copy link
Contributor

tiberiuichim commented Dec 13, 2014

@tiberiuichim writes:
Open issues:

  • No work done on UploadedFile.public_url integration, this means that filedepot's middleware is not usable right now - because no links are generated that would use it. I have created the UploadedFileResponse class which can be used to stream files to the browser. I don't think that we should make the middleware a default for Kotti, that would mean all files are openly available, without any security. An option would be to use the UploadedFileResponse to send a redirection to the public_url, where it is set by the depot that stores the files. The problem is, the proper filedepot way to get a file's url is to call DepotManager.url_for(path), which calls DepotManager.get_middleware().url_for(path) where path is 'depotname/file_id'.
  • We need a kotti.depot_default configuration option. I'm not sattisfied with the way things are now: we're adding the default 'fs0' depot, but once in use, you can't change its backend (or those files would stop working without a migration). Because I've added that option to kotti's conf_defaults, you can't really get rid of it anyway - you can only change its backend class by adding another kotti.depot.fs0.backend option in the .ini file. I have assumed that the default for multiple configured backends could be controlled by the order in which they are specified in the .ini files, but it seems that pyramid likes to sort the settings, so that can't be relied.
  • Maybe the File.data column should be renamed to File._data. This way we can have a setter for it and do the conversion to cgi.FieldStorage for the incoming data
  • The following documentation sections need changes and updates:
  • Configuration http://diskotti.readthedocs.org/en/feature-rtd_theme_339/developing/basic/configuration.html
  • Blob-Storage http://diskotti.readthedocs.org/en/feature-rtd_theme_339/developing/advanced/blobstorage.html
  • Maybe a separate entry in the Developer manual, detailing how to create and serve new blob fields.
@dnouri

This comment has been minimized.

Copy link
Member

dnouri commented Dec 15, 2014

Hey Tibi -- thanks for this pull request. Would you mind updating the description with the remaining things that still need to be done or discussed?

@tiberiuichim

This comment has been minimized.

Copy link
Contributor

tiberiuichim commented Dec 15, 2014

Open issues: [...]
(Updated the description with this)

@dnouri dnouri changed the title Filedepot [WIP] Filedepot Dec 15, 2014

@dnouri dnouri modified the milestone: 1.1 Dec 15, 2014

@disko

This comment has been minimized.

Copy link
Member

disko commented Dec 15, 2014

Maybe the File.data column should be renamed to File._data. This way we can have a setter for it and do the conversion to cgi.FieldStorage for the incoming data.

👍 This is what I had done for the now removed blobstore fetaure: d01a6c7#diff-090c6d6f97cd0028f14840e7b7f6aa94L635

We need a kotti.depot_default configuration option. […] I have assumed that the default for multiple configured backends could be controlled by the order in which they are specified in the .ini files, but it seems that pyramid likes to sort the settings, so that can't be relied.

How about having a kotti.depots option instead that takes a list? List item order should be preserved IIRC. The list items could be some kind of URLs that would contain all information needed (to avoid multiple settings per depot).

No work done on UploadedFile.public_url integration […]

Piping everything through Kotti (especially to ensure permission checks) would be fine with me for the first iteration.

@tiberiuichim

This comment has been minimized.

Copy link
Contributor

tiberiuichim commented Dec 15, 2014

About the kotti.depot configuration: some of them would be a bit hard to translate to a URI schema without escaping, for example this bit of code from depot's gridfs tests:

GridFSStorage('mongodb://localhost/gridfs_example', 'testfs')

Some other options that I can think of:

  • kotti.depot points to a json or YAML file
  • kotti.depot points to a separate function, similar to a kotti.configurator
@tiberiuichim

This comment has been minimized.

Copy link
Contributor

tiberiuichim commented Dec 15, 2014

Piping everything through Kotti (especially to ensure permission checks) would be fine with me for the first iteration.

Yes, this is my favorite too. I think redirection from the UploadedFileResponse to the field's public_url would also be a good idea (and it's the best possible choice, with the current codebase, in the case of a cloud such as S3). The biggest concern, from my perspective - that of loading huge files in memory - has been solved through the streaming that the UploadingFileResponse does.

In case you have better ideas about naming classes and variables (or any other changes), please comment them on the pull request.

@tiberiuichim

This comment has been minimized.

Copy link
Contributor

tiberiuichim commented Dec 16, 2014

Let's take a decision regarding the configuration

No 1. The current sistem of kotti.depot.uniquename.backend + another separate kotti.default_depot. This assumes that there's a default fs0 configured.

Pros: easy to implement
Cons: Integrator need to understand how to configure it, also how to override default store

No 2. A configurator function

Pros: easy to implement. Easy to configure for integrators. Makes overriding default depot easier.
Cons: another piece of code for the integrators to write. This is configuration, it should sit in a configuration file - although kotti.configurators are a normal thing with Kotti

No 3. kotti.depot points to yml file

Pros: easy to configure for integrators. Easy to override default depot
Cons: separate file that needs to be taken care of, integrated to code generators

No 4. No default at all. Raise an error when starting. Use a kotti.depot.backend and kotti.depot.optionA configuration in ini files. Kotti will support only one active depot and it's the job of the integrator to migrate between storages.

So far, I like option 4 the best (or maybe a variation). The thing is, the possibility to upload to a different store at runtime is only theoretical - there's no integration done yet, so it makes sense to have only one depot, but take advantage of filedepot to offer flexibility in how those files are stored. It makes configuration easy - even if we include a default, it is easy to override.

@tiberiuichim

This comment has been minimized.

Copy link
Contributor

tiberiuichim commented Dec 17, 2014

Based on discussions with disko, this is what we want:

  • we want to have an open future, where we have multiple active depots (for example, configure a model to save its files in a different depot from the default).
  • in the kotti.default_config we will have:
kotti.depot.0.backend = kotti.filedepot.DBFileStorage
kotti.depot.0.name = dbdepot
  • any number of depots can be further configured, and any number of keyword options added to each:
kotti.depot.1.backend = depot.io.gridfs.GridFSStorage
kotti.depot.1.name = gridfs
kotti.depot.1.gridout = mongodb://localhost/gridfs_example
  • the default depot will always be configured using the depot configured at position 0 with kotti.depot.0[backend|name|option]
  • a migration tool can then be used to move files between storages and update the metadata in the UploadedFileField stored metadata. This may be a bit tricky because there's no defined protocol for a FileStorage to enumerate its files - so we would have to find all fields of type UploadedFileField, read their metadata and see if they match the source depot name.
kotti-migrate-files —from-storage=0 —to-storage=1 config.ini
@disko

This comment has been minimized.

Copy link
Member

disko commented Dec 17, 2014

a migration tool can then be used to move files between storages and update the metadata in the UploadedFileField stored metadata. This may be a bit tricky because there's no defined protocol for a FileStorage to enumerate its files - so we would have to find all fields of type UploadedFileField, read their metadata and see if they match the source depot name.

In Kotti itself we only have kotti.resources.File.data, so this could be the only one supported by the migration script, especially as all subclasses would be covered as well. If someone decided to have a UploadedFileField in their own models they'd need to take care of migration themselves. Would be good enough for me. Alternatively the script could have an option to specify the columns to be migrated (additionally), I see no need to autodiscover them. This should make implementation considerably easier.

@tiberiuichim

This comment has been minimized.

Copy link
Contributor

tiberiuichim commented Dec 19, 2014

Taking a look through the documentation available here: http://diskotti.readthedocs.org/en/feature-rtd_theme_339/index.html

(PR description updated)

@disko

This comment has been minimized.

Copy link
Member

disko commented Dec 20, 2014

The following documentation sections need changes and updates:

Configuration http://diskotti.readthedocs.org/en/feature-rtd_theme_339/developing/basic/configuration.html

Blob-Storage http://diskotti.readthedocs.org/en/feature-rtd_theme_339/developing/advanced/blobstorage.html

You're referring to an outdated version of the docs. The current version doesn't contain the sections / documents any more.

Maybe a separate entry in the Developer manual, detailing how to create and serve new blob fields.

Yes, please. I'd even say this is a must. :)

@tiberiuichim

This comment has been minimized.

Copy link
Contributor

tiberiuichim commented Dec 24, 2014

I've added documentation, fixed some stuff along the way. What else is needed? Also looking for feedback to the code itself.

@dnouri

This comment has been minimized.

Copy link
Member

dnouri commented Jan 3, 2015

Looking over the changes, I think you've done some great work here @tiberiuichim! @disko, anything else that needs to be done? Would we still want to include this in the 1.0 release?

a variety of ways: filesystem, GridFS, Amazon storage, etc.

By default :app:`Kotti` will store its blob data in the configured SQL
database, using :class:``~kotti.filedepot.DBFileStorage`` storage, but you can

This comment has been minimized.

@disko

disko Jan 5, 2015

Member

These need to be single ticks.


As always when dealing with migrations, make sure you backup your data first!

.. _filedepot: https://pypi.python.org/pypi/filedepot/

This comment has been minimized.

@disko

disko Jan 5, 2015

Member

Great docs!

@disko

This comment has been minimized.

Copy link
Member

disko commented Jan 5, 2015

I can only agree: great work @tiberiuichim, also on the docs! Please add a few lines to the changelist though. At first glance I cannot see anything missing.

However: I'd prefer to release 1.0.0 unchanged in the next days (if no more issues arise) and have a 1.1.0-alpha with this feature shortly after that. Let's stick to not adding features between alpha and final releases and release more often instead, shall we (@dnouri)?

@dnouri

This comment has been minimized.

Copy link
Member

dnouri commented Jan 5, 2015

@disko: Fully agree; let's do the 1.0 soon, and put this into 1.1.

@tiberiuichim tiberiuichim changed the title [WIP] Filedepot Filedepot: store file data as storage-configured blobs Jan 26, 2015

@disko

This comment has been minimized.

Copy link
Member

disko commented Jan 29, 2015

@tiberiuichim can you please try to rebase again (or merge if that fails ;))? TIA.

tiberiuichim added some commits Dec 15, 2014

Move last_modified modification after setting the data, as that
automatically sets the last_modified value
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jan 29, 2015

Coverage Status

Coverage increased (+0.34%) to 94.66% when pulling 2a5bbb0 on tiberiuichim:filedepot into 976f5e8 on Kotti:master.

disko added a commit that referenced this pull request Jan 30, 2015

Merge pull request #361 from tiberiuichim/filedepot
Filedepot: store file data as storage-configured blobs

@disko disko merged commit da4ca61 into Kotti:master Jan 30, 2015

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@dnouri

This comment has been minimized.

Copy link
Member

dnouri commented Jan 30, 2015

W00t, thanks!

@tonthon

This comment has been minimized.

Copy link
Contributor

tonthon commented Feb 21, 2015

impressive job has been made on this feature, congratulations !!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment