Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First pass at defining some common collection interfaces #7941

Closed
wants to merge 1 commit into from

Conversation

ewinslow
Copy link
Contributor

Comparison of collection libraries/interfaces

https://docs.google.com/a/elgg.org/spreadsheets/d/1a7krO6pIQPm52Y8tKYr8JOPvOrrTlohty59W7MRz6Fk/edit#gid=0

Believe it or not, I've tried to keep this down to the minimum number of interfaces I think we'd need if we migrated all our array-based interfaces to be collections-based. Because of my philosophy of "promise the least API surface possible", we have lots of fine-grained interfaces which allow us to do just that, which makes this feel a more complicated than just using arrays, but the possibilities for performance and developer appeal are fantastic, I think, compared to arrays.

// no superfluous arguments
$recentBlogs = $entities->where([
  'type' => 'object',
  'subtype' => 'blog',
])->orderBy('time_created')->reverse();

echo "Count: " . count($recentBlogs);
foreach ($recentBlogs->slice($offset, $limit) as $blog) {
  echo $blog->getTitle();
}

I find this significantly cleaner that the current approach:

$options = [
  'type' => 'object',
  'subtype' => 'blog',
  // Was it order_by or sort_by??? No errors if you get it wrong.
  // Also, SQL formatting sneaking through.
  'order_by' => 'time_created desc',
  'count' => true,
  // Ignore when count = true
  'offset' => 0,
  'limit' => 10,
];

// returns a number
$count = elgg_get_entities($options);
$options['count'] = false;
// returns a different kind of response based on a string key argument... oy...
$blogs = elgg_get_entities($options);

TODOs:

  • Commit should mention Proposal for entity list system (Trac #2567) #2567 and Query object (Trac #4) #5071 ...
  • Add clear use-cases for each one of these interfaces.
  • Don't make Queue inherit from Sequence. This requires implementors to provide all the iteration helpers (filter/map/etc.), which we don't necessarily want to do (e.g. see Elgg\Queue\DatabaseQueue, which does not need all those things to do its job).

For discussion:

  • Most importantly, do we want to go down this path? Everything is private right now, so it shouldn't really be a huge huge deal, but as soon as we start leaking this API out to plugin devs (which we would eventually want to do, I would think), this will be a rather large library of things to deal with...
  • Generics support (i.e. how to place type restrictions on the items in a collection?)
    • Generics will be documentation-only until PHP supports them natively, not runtime enforced.
  • Should the sorting functions (e.g. orderBy) be available on Collection instead of just Sequence?
    • seems like you should be able to sort any collection of items and end up with a sequence, but then this creates a circular reference b/w Sequence and Collection interfaces, which feels wrong.
  • What should happen when an item is added to a Set that already contains it?
    • throw?
    • fail silently?
    • leave it up to implementations? <--- We'll do this, but throwing an exception is within the realm of possibilities, so clients will have to guard against this either way...
    • E.g. if you have an mutable ordered set, and you insert an item at a given index, but the item is already in the set, should you move the item from its position to the new position, or should you throw with an error like "only one instance allowed per set"?
  • What should the naming convention for Map accessors be?
    • get, getKey
    • containsKey, exists, isset, issetKey, keyExists, offsetExists, offsetIsset
    • delete, deleteKey, removeKey, unset, unsetKey, offsetUnset
    • set, setKey, put, putKey, offsetSet
    • Note that other interfaces already have remove($item), contains($item)
  • What should the naming convenion for Sequence accessors be?
    • getAt, getNth
    • removeAt, removeNth
    • insertAt, insertNth
    • replaceAt, replaceNth

*
* @return Collection
*/
public function filter(callable $filter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this moved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetizing

@ewinslow
Copy link
Contributor Author

@mrclay I know this is kind of a big one, but I'd appreciate your input.

@juho-jaakkola
Copy link
Member

Could you squash the commits? Would make it easier to read.

@mrclay
Copy link
Member

mrclay commented Feb 17, 2015

It would be helpful to have some idea of where you can imagine all of these being used. Do we pull in the whole thing regardless?

I'd lean towards implementations deciding how to handle duplicate insertions. I have cases where that makes sense and where it doesn't.

Generics... I think we get most of the win via static analysis, so I'm not all that into runtime checks.

I feel like sorting and filtering is often so custom (and operations are so frequently dependent on the underlying data storage API) that putting it in the main interfaces seem limiting. E.g. ElggList is going to be a link table. Would sorting it create a new list ID and a copy of all the elements? I can't imagine we'd want that.

@ewinslow
Copy link
Contributor Author

@juho-jaakkola -- you can just look at the Files Changed tab?

@ewinslow
Copy link
Contributor Author

@mrclay sorting an entity list does not mutate the list or create a new DB entry, it is just a virtual view over it. exactly analagous to using ORDER BY in SQL. The default order of lists would be specified by their items' position property in the DB.

@ewinslow
Copy link
Contributor Author

@mrclay Another option is to define alternative methods that are guaranteed not to throw when they fail to perform their intended operation (e.g. "offer" which returns false instead of "add" which throws).

I feel this bloats the API, though, so I'm not super keen on it.

@ewinslow
Copy link
Contributor Author

Brain dump: would also like a WritableRepository interface that extends Sequence. This would be one of the ultimate practical uses for the genetic collections APIs. If we can model our data storage as a collection, it will become trivial to write tests for it or swap out back ends.

get(string $id): T for fetching a particular item (Map<string,T>?)
create(array $props = []): T for generating new entities
delete(string $id) for removing an item from the repository
update(string $id, array $props) for persisting new properties to that item

T may need to extend a RepositoryEntry interface that implements a getId method... Otherwise there is no way to get the ids of new items...

@ewinslow
Copy link
Contributor Author

@mrclay to answer your question about what these would be used for, the high level thought is just "any time you would return an array, return a collection instead".

But just to be clear, some concrete examples:

  • Elgg\Queue* => Elgg\Structs\Queue
  • Elgg\Filesystem\Directory::getFiles. Return a collection of files instead of an array.
  • Views system -- map from string to view
  • Database -- model entities as a collection instead of a bunch of sql tables. Same with relationships.
  • Entity lists -- these could be modeled as mutable, ordered sets.
function addTranslation($country_code, $language_array) {

        if (!isset($this->CONFIG->translations)) {
            $this->CONFIG->translations = array();
        }

        $country_code = strtolower($country_code);
        $country_code = trim($country_code);
        if (is_array($language_array) && sizeof($language_array) > 0 && $country_code != "") {
            if (!isset($this->CONFIG->translations[$country_code])) {
                $this->CONFIG->translations[$country_code] = $language_array;
            } else {
                $this->CONFIG->translations[$country_code] = $language_array + $this->CONFIG->translations[$country_code];
            }
            return true;
        }
        return false;
    }

With a standard collections API and some trivial implementations, this code could almost be simplified to:

addTranslation($country_code, Map $language_array) {
  $this->translations->get($country_code)->addAll($language_array);
}

@ewinslow
Copy link
Contributor Author

Latest idea: what if I just put all this stuff in its own repo? I can see these being very useful outside of Elgg, and having them in a separate repo makes that much easier... Then we can just include them in Elgg/Elgg via composer if/when we want...

@mrclay
Copy link
Member

mrclay commented Feb 24, 2015

Is it worth looking into the other popular collections repos? https://packagist.org/search/?q=colle

@ewinslow
Copy link
Contributor Author

Great point. I actually did look around a bit already. It would be good to
document this, and hopefully avoid doing too much custom work. Main
observations:

  • Doctrine -- No read-only interface. All collections are maps
    (basically just OO version of PHP array, but we want more
    flexibility). "Criteria" API is verbose.
  • Xi-collections -- immutability, yay, but not flexible enough to
    support database-backed collections (E.g. how does Collections::create
    static method work for a database class?) Also, hasn't been touched in 2
    years.
  • yvoyer/collection -- extends Doctrine API. Has many methods we don't
    need (e.g. collections with a concept of a currently selected item).
  • php-collection -- Apache 2 licensed -- is this a problem?, mutable by
    default. No sets support. Don't like certain apis like get/remove($index).
    Should be more explicit.
  • ebidtech/collection -- Traits-centric. Does not provide functional
    programming methods like filter/map.
  • italolelis_/_collections -- extends Spl* collections :(. No
    read-only interfaces.

Happy to use any of these as an implementation detail, but I really feel
like exposing the minimum possible interface is important to give us
maximum flexibility going forward.

Happy to put my research into a spreadsheet or something, just to be
thorough.

On Tue Feb 24 2015 at 1:07:33 PM Steve Clay notifications@github.com
wrote:

Is it worth looking into the other popular collections repos?
https://packagist.org/search/?q=colle


Reply to this email directly or view it on GitHub
#7941 (comment).

@ewinslow
Copy link
Contributor Author

I also looked at https://github.com/IcecaveStudios/collections and they
have a nice read-only/mutable distinction, but their API is completely
bloated. We would not be able to implement their interfaces.

At the end of the day I think what I want is total control over the
interfaces we expose to the outside world. How these things are implemented
is not important and should be able to change on a whim. I am totally for
using someone else's hard work to implement whatever interfaces we decide
on.

So I take back the idea of putting these in a separate repo. I don't want
these to be open for implementation/extension by any classes except the
ones in Elgg/Elgg.

On Tue Feb 24 2015 at 2:53:06 PM Evan Winslow evan@elgg.org wrote:

Great point. I actually did look around a bit already. It would be good to
document this, and hopefully avoid doing too much custom work. Main
observations:

  • Doctrine -- No read-only interface. All collections are maps
    (basically just OO version of PHP array, but we want more
    flexibility). "Criteria" API is verbose.
  • Xi-collections -- immutability, yay, but not flexible enough to
    support database-backed collections (E.g. how does Collections::create
    static method work for a database class?) Also, hasn't been touched in 2
    years.
  • yvoyer/collection -- extends Doctrine API. Has many methods we don't
    need (e.g. collections with a concept of a currently selected item).
  • php-collection -- Apache 2 licensed -- is this a problem?, mutable
    by default. No sets support. Don't like certain apis like
    get/remove($index). Should be more explicit.
  • ebidtech/collection -- Traits-centric. Does not provide functional
    programming methods like filter/map.
  • italolelis_/_collections -- extends Spl* collections :(. No
    read-only interfaces.

Happy to use any of these as an implementation detail, but I really feel
like exposing the minimum possible interface is important to give us
maximum flexibility going forward.

Happy to put my research into a spreadsheet or something, just to be
thorough.

On Tue Feb 24 2015 at 1:07:33 PM Steve Clay notifications@github.com
wrote:

Is it worth looking into the other popular collections repos?
https://packagist.org/search/?q=colle


Reply to this email directly or view it on GitHub
#7941 (comment).

@mrclay
Copy link
Member

mrclay commented Feb 25, 2015

Agreed.

@ewinslow
Copy link
Contributor Author

ewinslow commented Mar 1, 2015

One obvious use-case for Map in case I haven't yet mentioned it is i18n stuff. Basically this is a Map of locales to message bundles, where message bundles are maps of strings to Messages. Messages can be formatted with arguments to produce fully localized strings.

@ewinslow
Copy link
Contributor Author

ewinslow commented Mar 9, 2015

Added a spreadsheet which is the start of further research into the existing collection interfaces that are out there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants