Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some suggestions for Gaufrette #8

Open
AntonStoeckl opened this issue Jul 8, 2011 · 18 comments

Comments

@AntonStoeckl
Copy link

commented Jul 8, 2011

Hi Antoine,

I recently found your nice project and I think I'll use it for further web projects (with Zend Framework).

Browsing through the sources I have some suggestions for you:

  1. Gaufrette\Adapter\Cache

The new source is read twice, which is an unnecessary overhead imho.
You could change that to:

public function read($key)
{
    if ($this->needsReload($key)) {
        $freshSource = $this->source->read($key);
        $this->cache->write($key, $freshSource);
        return $freshSource;
    }

    return $this->cache->read($key);
}
  1. Gaufrette\Adapter\AmazonS3

I have implemented some AmazonS3 stuff in running projects and from that experience I think you should add the possibility to add meta data in the write operation, like this (I'm using putFile there but putObject also accepts $meta):

    $meta = array(
        'Expires' => 'Fri, 5 Mar 2021 00:00:00 CET',
        'Cache-Control' => 'max-age=315360000',
            'x-amz-storage-class' => 'REDUCED_REDUNDANCY',
    );

    $this->_s3->putFile($file_path, $object_name, $meta);

I'm aware that this conflicts with the Adapter's Interface, as other stores don't have such meta data.
It could be an optional parameter in the Interface or you could add another Adapter AmazonS3Web that extends AmazonS3.
That way it would be possible to deliver media assets from S3 to the visitors of the webpage very easily. The 'x-amz-storage-class' meta is not tied to that usage, you simple want to store quite a lot of data with reduced redundancy because it is way cheaper on Amazon!

  1. Slow adapters in general:

I think there could to be another cache which stores if an object is available (and not expired) in the slow store. This would save many expensive connections to the slow store. I'm using a memcache backend with Zend_Cache in my project for that, but surely a file cache would also be possible.

Let me know that you think!

Best regards, Anton

@Herzult

This comment has been minimized.

Copy link
Contributor

commented Jul 22, 2011

Hi @AntonStoeckl,

First, thank you for the suggestions.

I'll work a bit on Gaufrette today and I think I'll especially try to find a clean way to allow you pass attributes such as the ones for AmazonS3.

It would be nice if you could work on the others and make PRs for them.

Sorry for the time it took me to answer.

@AntonStoeckl

This comment has been minimized.

Copy link
Author

commented Jul 24, 2011

Hi there,

thx for your reply!

I'll keep this message in my inbox and see when I find some time.
We just got a baby one week ago so my priorities surely are on our
little daughter. ;-)

Best regards, Anton

On 22.07.2011 09:33, Herzult wrote:

Hi @AntonStoeckl,

First, thank you for the suggestions.

I'll work a bit on Gaufrette today and I think I'll especially try to find a clean way to allow you pass attributes such as the ones for AmazonS3.

It would be nice if you could work on the others and make PRs for them.

Sorry for the time it took me to answer.

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Sep 15, 2011

Hi there @Herzult !

I just recently found out about this project and I must say I like the simple and understandable architecture. I've also been doing something similar (with GridFS and local filesystem support) for our previous PHP framework. However, now that we're using Symfony2 in our company, I think I might contribute a little on this one.

I noticed this thing doesn't support GridFS yet, so first thing I decided to do, was to make an Adapter for GridFS. Our fork with GridFS adapter is available here: https://github.com/Rohea/Gaufrette It should work at some level but it should be considered as early alpha version.

I also felt a strong need to implement some 'initial' metadata support because GridFS supports metadata and I think @AntonStoeckl has a point when he's saying that this kind of filesystem abstraction should support metadata for adapters that can support it. I had to tweak several other classes and implement at least one new method, namely 'boolean supportsMetadata()', for all adapters. But I'm quite certain I didn't break backwards compatibility.

I also added support for GridFS in KnpGaufretteBundle (for Symfony2). It's available here https://github.com/Rohea/KnpGaufretteBundle. Documentation about using the whole thing in Symfony2 is available there with the rest of the documentation

So.. These are my two cents. I'm probably going to improve my implementation during next few days. But when it's good enough, it's available for 'public use' and you can merge it to your master branch if you want.

Br, Tomi

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Sep 15, 2011

Oh, and I might also suggest some topics to be considered for future development:

  • Non-public file serving, how it is done? (though this might mostly be application level issue and not a library issue)
  • Querying files with partial key (at least GridFS supports reg_exps, I think)
@Herzult

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2011

Hi @TomiS, thank you for helping.

About the GridFS support, it's a really good news as it is one of the reasons for Gaufrette. So once you're finished, please open a PR; I think you will not be the only to be interested in such an adapter.

The metadata problem is a bit more complicated but I agree with you and @AntonStoeckl : it's necessary. So you're work on this will also be very appreciated.

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2011

Hi again @Herzult

There are a couple of questions that I'm pondering at the moment.

  1. In Adapter interface, there is a function keys(). Am I understanding it correctly? Should it really return all keys inserted in the filesystem? I'm just wondering, what if there are like million files in the filesystem? In that case, calling that function would practically crash the server. [EDIT: I might have figured out a way to do this for GridFS but I'm not sure about it yet]

  2. I think we would need some kind of a function for querying files, for example by using a partial key or metadata. For example: I often use following key structure: "/someContext/relatedObjectId/filename.extension/". This works at least for local filesystem paths and GridFS keys. I would like to be able to query all files for a given object i.e. "/someContext/relatedObjectId/*". I know GridFS supports both regular expressions (for key) and queries for metadata. I think Adapter interface would benefit from generic function for such queries. Would that be find($args) that returns file objects or what? Any ideas?

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2011

An idea about metadata and/or queries. How about making two new interfaces "MetadataInterface" and "QueryableInteface"? Then adapters compatible with one or both of them could just implement them without the need of expanding the original Adapter class.

@Herzult

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2011

Hi @TomiS,

At the beginning, the ->keys() method had an optional $pattern argument. I removed it because it was not working well on all the adapters. But I agree we can't continue listing the whole filesystem everytime...

As you said, we need to "query" the filesystems, so we could maybe introduce a Query that is passed to the ->keys() method of the filesystem. So the filesystem is responsible of returning the right result. Maybe we could also provide some emulations of unsupported query parts, depending on the current adapter (configurable because of perfs problems it could lead to...).

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2011

Hi again @Herzult

Since filesystem abstractions are quite complicated topic, I tried to approach it by writing some stuff down in somewhat structured form. This is what I think about things.

Facts (or at least I think they're facts)

F1: Building large assoc arrays from thousands of files requires lots of (= too much) runtime memory
F2: Requesting file's content is a heavy I/O operation that should be avoided
F3: Every database request is 'too heavy' in the context of filesystem abstraction so the amount of them should be minimal

Rules (some design rules based on facts above and my personal experience on filesystem abstractions):

R1: All adapters should behave as consistently as possible and return identical values for identical function calls
R2: Identical feature set for all adapters is not possible but differences must be clearly communicated
R3: Filesystems should mainly be used through Filesystem and File objects
R4: Keys for files should be somehow consistent between adapters
R5: All filedata should be retrieved using lazy-loading principle. However, (File object) data should also be as fully populated as possible when it's easily available.

Proposals and thoughts (this is just me thinking :)

  1. Users should request data from the filesystem mainly by using get() or the 'multi-alternative' of get, let's say: query($args) or what ever. This is because these return File object(s) and most of data manipulation and retrieval should be done through File objects

  2. All functions that return multiple File objects, should do it as a FileCursor. For example, GridFS returns always multiple files in a cursor instead of an array because cursor only consumes memory for the index pointer and current object whereas array always has to be fully prepared before it can be used. This is not a problem for small file request. But consider, for example, a command line maintenance script that requests all files from the file system to do something. Using arrays in these situations practically always causes 'memory exhausted' error. With cursors, this can be somewhat easily avoided.

  3. As far as I understand, the only absolutely common thing between all filesystem adapters is a key (or path) to the file. This leads to my most uncertain proposal. I would, however, like the idea, that in key generation we would encourage (but maybe not enforce) following pattern:
    /[context]/[1..N indentifiers]/filename.extension
    So, for example, following keys/paths would be valid:
    /userAvatars/12345/thumbnail.jpg
    /productImages/thumbs/width300/4e80b2096803fa3b0900006f/image.png
    I can say that this pattern would work for Local FS and GridFS. I don't know if it works for the rest of the adapters. Benefits of this approach would come from the fact that we could always assume that when exploding this string with delimiter '/', we could always find out what is the context (first item in array) and human-readable filename (last item in array).

  4. When accessing file's content (bytes), it should be done using lazy loading, namely, data should be read from the filesystem no earlier than the user actually requests it.
    This should never need to be done just to access file's name or metadata.

I've implemented some code (mostly GridFS) that follows these principles in our fork. If you have time, please view them and tell me what you think. At least check out FileCursor.php and Adapter/GridFS. There are also some changes in Filesystem.php and Adapter.php. As I said, I'm not sure how my solutions work for example for S3 or Rackspace.

@stof

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2011

@TomiS just some thoughts about your work:

  • you should follow the CS of the library (no `public`` keyword for interfaces, spaces for the indentation...) which are the Symfony2 CS (you can find the human-readable version in the doc and the CodeSniffer config here)
  • you should use a feature branch when implementing new features
  • you should not revert the S3 adapter to ZF as the Zend implementation is broken, which is the reason why it was switched to the official Amazon SDK

Also, if you want some review of your work, the easiest way is to send a PR as this will allow us to comment on it instead of having to comment on individual commits.

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2011

@stof Thanks. I'll try to fix the CS issues. And I'll send a PR after that. I'm quite new to Github so I didn't know it's the most convenient way to do this kind of reviewing/commenting.

Oh, and changes in S3 adapter are just me not knowing how to use git.. Didn't mean to do anything special to that file. I'm trying to work that out too.

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2011

About queries. I have been pondering the question of querying multiple files because I find it far from trivial. Let's break it down a little. We have at least following query possibilities:

1) Queries using partial key.

The case where query parameter is a sub string (beginning from the first character) of full key is relatively easy. It's a trivial to implement for GridFS and I think it's at least possible (if not easy) to implement it for local filesystem by using some kind of recursive iteration of folders starting from the last full segment of the file path (=key)

So the questions here are:

  • Do we need more complex queries, for example such that don't require user to provide key fragment that begins from the beginning of the full key?
  • Are all adapters able to implement at least this query type?

2) Queries using generally available file info

By generally available, I mean stuff like file size, creation date, mimetype etc. Should the user be able to query files using that data? Again, this is easy in GridFS but how about other adapters?

3) Queries using user-defined metadata

This one is clearly the one that cannot be supported by all adapters (e.g. Local FS). But do we need to worry about this at all?


My gut feeling is that, with clever key/path generation, most relevant queries can be made just using partial key beginning from the first character. So at least that should be as possible as possible, imo :)

@AntonStoeckl

This comment has been minimized.

Copy link
Author

commented Jan 24, 2012

Hi Antoine,

finally .. I have some time to have a look at this again.

I see that you have integrated metadata into the S3 Adapter!
But why does function "supportMetadata()" return false?
Is it now working / not completed yet?

Also wondering about my suggestion 1) .... that's a very simple change, is there a reason you did not implement that?
Or do you want me to commit that into my fork and send a pull request?

Best regards, Anton

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2012

Hi again @Herzult . I put this here because I don't want to start a new ticket for a question.

Is it possible to import a custom adapter to Gaufrette from outside? The case is following: We have a very specific custom adapter for our custom CDN. Would be nice to be able to use it through Gaufrette in application end but is it possible without forking the Gaufrette lib and/or KnpGaufretteBundle? Symfony2 Bundle inheritance didn't seem to help because we weren't able to figure out how to extend dependency injections.

Thanks.

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2012

@Herzult
Also on another topic. We've been pondering quite a lot about the topic of more advanced CDN features of a Filesystem abstraction in our company. I just red that you consider loosing all metadata features and such. We, however, feel that things like metadata are quite important for a filesystem abstraction because many CDNs offer those features and they are quite practical for many things (such as storing privacy information of the file, etc.). Do you think Gaufrette can be extended to include some level of advanced CDN features with, for example, some optional interface classes or something? Or are those strictly out of the scope?

We are willing to contribute but we must be sure the stuff we implement does not just disappear in the future. So what do you think? Should we just fork our library and include more complex features (and maybe loose some simpler adapters in the process) or do you think this stuff could fit into Gaufrette?

To be honest, I feel Gaufrette might become irrelevant if it cannot include any of the more advanced features we all need and use every day. Wouldn't like to see that happening, though, because I like Gaufrette and how it's been designed.

@Herzult

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2012

If we find a good robust solution that is generic enough, I'm ok.

@TomiS

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2012

@Herzult Great to hear that :) We'll try to figure out something and contact you when (if ever) we have any real ideas.

@Herzult

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2012

Cool :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.