Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to sort the entries of PlexusIoFileResourceCollection #18

Closed
plamentotev opened this issue Jun 3, 2019 · 8 comments
Closed
Assignees
Milestone

Comments

@plamentotev
Copy link
Member

Currently PlexusIoFileResourceCollection does not return the entries in any particular order. For some use cases (such as reproducible builds) it is useful if some predictable order could be set.

@plamentotev plamentotev changed the title Add option to sort the entries of PlexusIoFileResourceCollection Add option to sort the entries of PlexusIoFileResourceCollection Jun 3, 2019
@michael-o
Copy link
Member

This should be solved with a discrete implementatoin, shouldn't it?

@plamentotev
Copy link
Member Author

@michael-o I'm not sure if I understood your question. You mean that we should have separate class and not do it in PlexusIoFileResourceCollection?

@michael-o
Copy link
Member

Yes, I guess so. Sorting is overhead.

@plamentotev
Copy link
Member Author

@michael-o I agree. The default definitely should be as it is now - not sorted. The idea is to have some way of telling the order(probably by providing comparator). Either in PlexusIoFileResourceCollection or some other class. I'm still looking where in the hierarchy it will fit best as it could be useful not only for file resources.

@michael-o
Copy link
Member

Maybe the collection should not be touched at all the consumer has to sort while processing because the collection is intermediate. Or we provide the option to have configurable iterators which sort only when necessary.

@plamentotev
Copy link
Member Author

Maybe the collection should not be touched at all the consumer has to sort while processing because the collection is intermediate.

This is an option. But then the archiver has to re-arrange the entries of the archive. In the general case this means additional IO operations. Unfortunately the archiver could not just get all entries and sort them before creating the archive. Some PlexusIoResourceCollection implementations such as PlexusIoTarFileResourceCollection does not have random access to the element of the collection. For example tar files does not have index or table of content so in order to get the files (even just their names) in it, the whole file should be read. That means iterating twice (once to get all elements and once to add them to archive) and that mean additional IO as well. Opening the stream for those implementation also means sequential scan of the file so this is not an option. Of course one may create an index for the tar files while reading it but there still be overhead and needlessly added complexity. And it won't solve the problem in the general case for all PlexusIoResourceCollection implementations.

That is why I don't think sorting should be added to all PlexusIoResourceCollection. Only for those that it makes sense.

Or we provide the option to have configurable iterators which sort only when necessary.

I don't think we have such option right now. But as sorting does not make sense for all PlexusIoResourceCollection implementations I'm not sure if we should add this to the iterator. If we configure the iterator (when getting it) that means "sortable" collections should be handled differently that "non-sortable" and I would rather avoid this. There are collection that does not keep their entries in any order and there are variants of the same collections that keep the entries in order (HashMap and TreeMap for example). So maybe we can either take this approach and have two different classes for file collections (one sorted and one not) or have one class that may take compoarator and sorts the entries only if the compoarator is present.

@michael-o
Copy link
Member

Thanks, so you are saying this should be completely delegated to the client consuming those collections, i.e, sorting it and leaving PlexusIoResourceCollections as-is?

@hboutemy hboutemy self-assigned this Aug 25, 2019
@hboutemy hboutemy added this to the 3.2.0 milestone Aug 25, 2019
@hboutemy
Copy link
Member

option added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants