Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds asSet() to Bag for #726 #797

Closed
wants to merge 3 commits into from

Conversation

victornoel
Copy link
Contributor

@victornoel victornoel commented Feb 15, 2020

⚠️ Edit: all of this is outdated, see #869 for the latest design ⚠️

Finally had the time to finish my PR for #726.

I made the following design decisions:

  • The different Bag interfaces exposes asSet() which is a view on the unique elements of the bag
  • They are either a SetIterable or a MutableSet
  • I didn't tackle returning sorted o immutable variants because
  • implementing it was way too complex
  • this is not the primary use case for accessing a Bag as a Set IMHO
  • The synchronized, unmodifiable and multi reader variants are implemented though
  • Only for ImmutableArrayBag and ImmutableSortedBagImpl, I had to implement it by subclassing AbstractImmutableSet inline: since they are both based on arrays, I only implemented the minimum because I don't think it would have much impact on performance to do more.

One open question is: should the inline subclassing be extracted in a separate class that wraps an array maybe?

Copy link
Contributor

@motlin motlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main feedback is that returning mutable views breaks bag invariants.

@victornoel
Copy link
Contributor Author

@motlin any feedbacks on the best way to tackle the problems discussed above?

@motlin
Copy link
Contributor

motlin commented Mar 1, 2020

@victornoel my recommendation would still be to just use SetIerable for now. That way the interface doesn't promise to return a MutableSet in places where you'd expect to be able to mutate but we can't yet support it. It's a step in the right direction and we can always support mutation in another major version.

@victornoel
Copy link
Contributor Author

@motlin yes, I will go with that then, thanks

@victornoel
Copy link
Contributor Author

@motlin one more advice request: if we use SetIterable everywhere, it becomes difficult to take advantage of existing Synchronized and Unmodifiable wrappers for mutable sets, what do you recommend? Writing some for SetIterable itself?

@victornoel
Copy link
Contributor Author

Okay, so I pushed a final version with:

  • read only sets as result of asSet:
    • SetIterable for all various MutableBag
    • corresponding Immutable and Sorted and so on for all the non mutable bags.
  • for multireader and synchronized bags, I had to add SynchronizedSetIterable and SynchronizedParallelSetIterable classes

@victornoel victornoel force-pushed the 726-bag-as-set branch 2 times, most recently from 918c869 to f07cec3 Compare March 14, 2020 16:54
{
return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems out of place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@motlin why is that? Maybe we don't want to test for being in instance of Set and let AbstractCollectionAdapter define equals directly?

{
return false;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this catch exceptions? Can we have hashcode here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@motlin some of the implementations of equals that rely on Collection.containsAll do catch those exception because Collection.containsAll document them and if they are thrown, it's not an error, it means the element are not contained (because of different types or things like that).

I can add hashCode, I didn't because I expected that all subclass should be more relevant to implement it, but if I add it, I'm not sure what is the best default implementation? An hashCode of all elements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@motlin concerning hashCode, actually findbugs is complaining about it, so I moved it to the two subclasses that needed it.

{
return false;
}

return this.delegate.equals(obj);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems out of place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@motlin what do you mean? Should I move equals and hashCode to AbstractCollectionAdapter maybe? But I wouldn't be able to test for instance of Set though

@motlin
Copy link
Contributor

motlin commented Mar 14, 2020 via email

@motlin
Copy link
Contributor

motlin commented Mar 14, 2020

Using inner class views makes me nervous, especially in the immutable collections. I was imagining a class like SetFromBagAdapter that's almost like SetAdapter, but implemented by delegating to a Bag. As long as it doesn't delegate to a MutableBag, we can have high confidence that it never allows mutation as well as unsynchronized access.

@victornoel victornoel force-pushed the 726-bag-as-set branch 2 times, most recently from 5325b39 to b3e7fbb Compare March 14, 2020 19:31
@victornoel
Copy link
Contributor Author

@motlin trying back to work on this, you said:

Using inner class views makes me nervous, especially in the immutable collections. I was imagining a class like SetFromBagAdapter that's almost like SetAdapter, but implemented by delegating to a Bag. As long as it doesn't delegate to a MutableBag, we can have high confidence that it never allows mutation as well as unsynchronized access.

How would that work? In order to implement SetFromBagAdapter by delegating to a Bag I would need to be able to access the internal of the bag... or do you mean that it is created by passing to it the underlying datastructure used inside the bag? the array or the map, depending on the implementation?

@victornoel
Copy link
Contributor Author

victornoel commented Apr 12, 2020

@motlin I have pushed some changes that I think all together make sense, here is a rundown of the changes that could raise discussion and rationale for the choices I made:

  • I extracted the internal classes into their own file as you proposed
    • ImmutableSetAdapter wraps any UnsortedSetIterable and is used by ImmutableHashBag
      • I tried to override what was needed to have best performances
    • ImmutableSetFromArrayBagAdapter wraps the array of a ImmutableArrayBag directly
      • this could also be moved to set package and called ImmutableArraySetAdapter or something like that but I feel like it's better if it stays hidden next to the class using it
    • same for ImmutableSortedSetFromArrayBagAdapter.
    • we can't just have a simple ImmutableSetFromBagAdapter because it would need asSet() to be implemented (and that would lead to risk of strange infinite loops :)
    • I'm not sure how the serialization stuffs work in there, so I may have made mistakes...
  • MutableBag.asSet() returns MutableSet
    • they all are unmodifiable for now: we should create a new issue to tackle this later.
    • it is mainly needed to elegantly take advantage of all the MutableSet decorators for synchronized, unmodifiable, multi reader, etc without pain. I tried many times in the previous iterations to do without it, but there always was a problem somewhere so I think it's best to return MutableSet and for now not honour the mutability contract of it.
  • MutableSortedBag does not return MutableSortedSet
    • because of MutableSortedMap.keySet contract
    • I added comments there for now but maybe there is something else to be done so that this can go on?
    • this is the only place where it would be needed to change the signatures of interface methods in the future
  • MultiReaderBag returns MultiReaderSet
    • I added a MultiReaderSetAdapter that share most of its implementation with MultiReaderUnifiedSet (shared in in AbstractMultiReaderSet).
    • I'm not sure how the serialization stuffs work in there, so I may have made mistakes...

@victornoel victornoel force-pushed the 726-bag-as-set branch 4 times, most recently from c5cca0f to 92a25b8 Compare April 12, 2020 13:09
@motlin
Copy link
Contributor

motlin commented Apr 12, 2020

we can't just have a simple ImmutableSetFromBagAdapter because it would need asSet() to be implemented (and that would lead to risk of strange infinite loops :)

Can't it return this?

MutableSet.asSet() returns MutableBag

Something sounds backwards in this sentence

ImmutableSetAdapter wraps any UnsortedSetIterable

This isn't what I meant - I'll have to take a closer look. Why does the wrapper need access to the Bag internals?

@victornoel
Copy link
Contributor Author

@motlin

Can't it return this?

what I mean is that if you wrap a bag, to implement some of the set method, you need to have access to the underlying datatstructure, for example to implement iterator, and in our situation there are two cases when this happens:

  • for ImmutableArrayBag, in this case you need access to the array
    • That's why I introduced ImmutableSetFromArrayBagAdapter and also ImmutableSortedSetFromArrayBagAdapter
  • for ImmutableHashBag, in this case you can indeed have an hypothetical ImmutableSetFromBagAdapter that takes the hidden mutable HashBag and then call asSet() on it, but if you pass it this, it will go into infinite loop of course
    • That's why I introduced ImmutableSetAdapter that introduces no risks but allow to achieve the same objective in a more generic way.

MutableSet.asSet() returns MutableBag

Something sounds backwards in this sentence

it was, I will fix my comment :)

@motlin
Copy link
Contributor

motlin commented Apr 12, 2020

I tried to write a set-from-bag adapter to see how feasible it is. It's nearly possible, it would need one new method on Bags for a distinct keys view. Take a look at the iterator() method to see what I mean.

public class SetFromBagAdapter<T>
        extends AbstractImmutableSet<T>
{
    private final Bag<T> bag;

    public SetFromBagAdapter(Bag<T> bag)
    {
        this.bag = bag;
    }

    @Override
    public int size()
    {
        return this.bag.sizeDistinct();
    }

    @Override
    public T getFirst()
    {
        return this.bag.getFirst();
    }

    @Override
    public T getLast()
    {
        return this.bag.getLast();
    }

    @Override
    public void each(Procedure<? super T> procedure)
    {
        this.bag.forEachWithOccurrences((each, parameter) -> procedure.accept(each));
    }

    @Override
    public Iterator<T> iterator()
    {
        // This works but is inefficient. It really creates the map in O(n) time and space.
        Iterator<T> iterator = this.bag.toMapOfItemToCount().keysView().iterator();

        // This doesn't work today, but it could without too much effort.
        return this.bag.distinctView().iterator();
    }
}

@motlin
Copy link
Contributor

motlin commented Apr 12, 2020

The in places like AbstractHashBag, it's pretty easy to implement the distinct view.

    @Override
    public RichIterable<T> distinctView()
    {
        return this.items.keysView();
    }

@victornoel
Copy link
Contributor Author

@motlin if you look back at the issue #726 that spawned this whole PR, this keyView is the reason it exists :) so yes, a keyView() would be useful, but implementing means doing all that was done here.

@motlin
Copy link
Contributor

motlin commented Apr 12, 2020

I did forget that context but I think this shows we can implement both with just a few lines of code?

@victornoel
Copy link
Contributor Author

@motlin no, because of all the other implementation of Bag that do not rely on a Map as in your example. For example ImmutableArrayBag is such an example. Let's also not forget that we want to have a O(1) method asSet, if I just want to be able to iterate on the distinct keys, as you said, I can always use toMapOfItemToCount but it's not efficient.

The purpose of this PR is to:

  • implement such distinct key method, we decided in Distinct element view of Bag  #726 that it was a Set view of the map
  • that it behave in O(1), that's why the name of the method starting with as

Hence this proposal for asSet.

Honestly I've tried many alternative to arrive to this solution, with various iterations that can we witnessed here, I think we did go around all the possibilities and that the design proposed here is the best. Maybe to better understand that, it would make sense for you to play with the code and experience with the alternative you propose to either see why it doesn't work or to show me how it would work.

- Prevent useless locking
- Prevent potentially costly delegation

Signed-off-by: Victor Noël <victor.noel@brennus-analytics.com>
Better protect against exceptions in some cases

Signed-off-by: Victor Noël <victor.noel@brennus-analytics.com>
This is limited to returning immutable or unmodifiable sets.

Signed-off-by: Victor Noël <victor.noel@brennus-analytics.com>
@victornoel
Copy link
Contributor Author

Closing this as this will most certainly never be merged and based on very outdated code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants