CHAMP algorithm is current state of the art for immutable/persistent collections #6

Open
mikehearn opened this Issue Jul 21, 2016 · 12 comments

Comments

Projects
None yet
10 participants
@mikehearn

I have nothing against using the PCollections library but the CHAMP algorithm is the current state of the art for JVM based immutable collection implementations:

https://blog.acolyer.org/2015/11/27/hamt/

You can find an implementation here:

https://github.com/usethesource/capsule

The footprint improvements are quite impressive. It'd be a shame if Kotlin shipped a library that wasn't using the current best research into the field, especially as an implementation already exists. It'd be a nice competitive advantage over other languages.

@hastebrot

This comment has been minimized.

Show comment
Hide comment
@hastebrot

hastebrot Jul 21, 2016

This would be very great.

Just for reference: A while ago I found this older implementation of a HAMT
in Kotlin: https://github.com/KenBarclay/KData I'm not quite sure if this
uses the CHAMP algorithm.

On Jul 21, 2016 12:36 PM, "Mike Hearn" notifications@github.com wrote:

I have nothing against using the PCollections library but the CHAMP
algorithm is the current state of the art for JVM based immutable
collection implementations:

https://blog.acolyer.org/2015/11/27/hamt/

You can find an implementation here:

https://github.com/usethesource/capsule

The footprint improvements are quite impressive. It'd be a shame if Kotlin
shipped a library that wasn't using the current best research into the
field, especially as an implementation already exists. It'd be a nice
competitive advantage over other languages.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#6, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAeSf6vaCm6aJxeP6QnTF94Eui-8dVBNks5qX0uvgaJpZM4JRqbW
.

This would be very great.

Just for reference: A while ago I found this older implementation of a HAMT
in Kotlin: https://github.com/KenBarclay/KData I'm not quite sure if this
uses the CHAMP algorithm.

On Jul 21, 2016 12:36 PM, "Mike Hearn" notifications@github.com wrote:

I have nothing against using the PCollections library but the CHAMP
algorithm is the current state of the art for JVM based immutable
collection implementations:

https://blog.acolyer.org/2015/11/27/hamt/

You can find an implementation here:

https://github.com/usethesource/capsule

The footprint improvements are quite impressive. It'd be a shame if Kotlin
shipped a library that wasn't using the current best research into the
field, especially as an implementation already exists. It'd be a nice
competitive advantage over other languages.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#6, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAeSf6vaCm6aJxeP6QnTF94Eui-8dVBNks5qX0uvgaJpZM4JRqbW
.

@ilya-g

This comment has been minimized.

Show comment
Hide comment
@ilya-g

ilya-g Jul 21, 2016

Member

@mikehearn as far as I understand, CHAMP is a data structure suitable for implementing hash maps and sets.

Do you know, is it possible to use CHAMP for ordered sets and maps (those that preserve element insertion order) and indexed lists?

Member

ilya-g commented Jul 21, 2016

@mikehearn as far as I understand, CHAMP is a data structure suitable for implementing hash maps and sets.

Do you know, is it possible to use CHAMP for ordered sets and maps (those that preserve element insertion order) and indexed lists?

@pniederw

This comment has been minimized.

Show comment
Hide comment
@pniederw

pniederw Jul 22, 2016

Last time I checked PCollections, it seemed to be considerably less state-of-the-art than Clojure's and Scala's persistent collections.

@ilya-g The author of capsule answered my questions on the scope of the project here: usethesource/capsule#2

At this time capsule feels more like a research prototype to me, and the project hasn't had much activity lately. However the author will deliver a talk at the upcoming JVMLS.

The most state-of-the-art persistent collections library implemented in Java and (supposedly) production-ready that I could find is https://github.com/GlenKPeterson/UncleJim. The starting point for this library was Clojure's persistent collections implementation. The author is also working on a new and improved persistent vector: https://github.com/GlenKPeterson/UncleJim/tree/2016-05-22_RRB-Tree
The author's own comparison to PCollections is here: https://github.com/GlenKPeterson/UncleJim/wiki/UncleJim-vs.-PCollections

pniederw commented Jul 22, 2016

Last time I checked PCollections, it seemed to be considerably less state-of-the-art than Clojure's and Scala's persistent collections.

@ilya-g The author of capsule answered my questions on the scope of the project here: usethesource/capsule#2

At this time capsule feels more like a research prototype to me, and the project hasn't had much activity lately. However the author will deliver a talk at the upcoming JVMLS.

The most state-of-the-art persistent collections library implemented in Java and (supposedly) production-ready that I could find is https://github.com/GlenKPeterson/UncleJim. The starting point for this library was Clojure's persistent collections implementation. The author is also working on a new and improved persistent vector: https://github.com/GlenKPeterson/UncleJim/tree/2016-05-22_RRB-Tree
The author's own comparison to PCollections is here: https://github.com/GlenKPeterson/UncleJim/wiki/UncleJim-vs.-PCollections

@mikehearn

This comment has been minimized.

Show comment
Hide comment
@mikehearn

mikehearn Jul 27, 2016

I don't know the answer to that, sorry.

I don't know the answer to that, sorry.

@msteindorfer

This comment has been minimized.

Show comment
Hide comment
@msteindorfer

msteindorfer Aug 11, 2016

As the author of the CHAMP paper and the usethesource/capsule library, I want to jump in an answer some questions you're having.

@ilya-g, yes CHAMP can be used to implement hash-sets or hash-maps that preserve element insertion order. I implemented such a data structure last year for another project and now started to back port that data structure to usethesource/capsule. You can expect the implementation to appear in the codebase by September.

@ilya-g, @mikehearn, indeed, CHAMP is suitable for implementing hashed data structures. For implementing immutable indexed lists, the best candidate is currently the RRB Vector. Sorted data structures (e.g., that take a comparator) are probably still implemented best by a balanced binary tree (e.g., such as a red-black tree).

@pniederw et al., usethesource/capsule is more than a research prototype. It powers the builtin data structures of the http://www.rascal-mpl.org programming language. But were you're right is that it still needs polishing and work to make it more appealing to wider audience. From September on, I'll have more time for engineering than for research, therefore you can expected a lot of activity on the project.

Overall the aim of usethesource/capsule is to be a vehicle for implementing cutting edge research results on immutable data structures, and the current focus lies on hashed collections. I'm happy to discuss my roadmap with you and to see how common interests can be aligned.

As the author of the CHAMP paper and the usethesource/capsule library, I want to jump in an answer some questions you're having.

@ilya-g, yes CHAMP can be used to implement hash-sets or hash-maps that preserve element insertion order. I implemented such a data structure last year for another project and now started to back port that data structure to usethesource/capsule. You can expect the implementation to appear in the codebase by September.

@ilya-g, @mikehearn, indeed, CHAMP is suitable for implementing hashed data structures. For implementing immutable indexed lists, the best candidate is currently the RRB Vector. Sorted data structures (e.g., that take a comparator) are probably still implemented best by a balanced binary tree (e.g., such as a red-black tree).

@pniederw et al., usethesource/capsule is more than a research prototype. It powers the builtin data structures of the http://www.rascal-mpl.org programming language. But were you're right is that it still needs polishing and work to make it more appealing to wider audience. From September on, I'll have more time for engineering than for research, therefore you can expected a lot of activity on the project.

Overall the aim of usethesource/capsule is to be a vehicle for implementing cutting edge research results on immutable data structures, and the current focus lies on hashed collections. I'm happy to discuss my roadmap with you and to see how common interests can be aligned.

@GlenKPeterson

This comment has been minimized.

Show comment
Hide comment
@GlenKPeterson

GlenKPeterson Aug 15, 2016

I really think the Clojure collections should be considered. My understanding is that Scala collections grew out of Rich Hickey's implementations of Okasaki's Purely Functional Data Structures and Bagwell/Rompf's papers. Hickey had many critical insights and brought Okasaki's ideas into the mainstream (at least for the JVM world) by adding some of his own and providing a highly performant, working implementation.

We've been using Paguro's type-safe versions of the Clojure collections in production for about 2 years now. When we find bugs, it's in the code that doesn't use these collections. I've been using Paguro in Kotlin as-is for about 3 or 4 months now. We could add some NotNull annotations, but that's about all I see necessary for ideal Kotlin use right now.

Even if you aren't interested in Paguro, I think the original Clojure collections deserve mention on the front page of this project. Without them, there might not be the Scala ones. And Clojure's transformation api is beautifully simple.

GlenKPeterson commented Aug 15, 2016

I really think the Clojure collections should be considered. My understanding is that Scala collections grew out of Rich Hickey's implementations of Okasaki's Purely Functional Data Structures and Bagwell/Rompf's papers. Hickey had many critical insights and brought Okasaki's ideas into the mainstream (at least for the JVM world) by adding some of his own and providing a highly performant, working implementation.

We've been using Paguro's type-safe versions of the Clojure collections in production for about 2 years now. When we find bugs, it's in the code that doesn't use these collections. I've been using Paguro in Kotlin as-is for about 3 or 4 months now. We could add some NotNull annotations, but that's about all I see necessary for ideal Kotlin use right now.

Even if you aren't interested in Paguro, I think the original Clojure collections deserve mention on the front page of this project. Without them, there might not be the Scala ones. And Clojure's transformation api is beautifully simple.

@norswap

This comment has been minimized.

Show comment
Hide comment
@norswap

norswap Sep 21, 2016

I'm adding my vote for CHAMP.

I implemented a immutable map in Kotlin using the approach:
https://github.com/norswap/triemap

It's a learning project: It doesn't have the niceties that capsule nor compatibility with the Java collections API. But it should give you a sense of what's involved. Once you've grasped the basics, the implementation isn't terribly hard.

norswap commented Sep 21, 2016

I'm adding my vote for CHAMP.

I implemented a immutable map in Kotlin using the approach:
https://github.com/norswap/triemap

It's a learning project: It doesn't have the niceties that capsule nor compatibility with the Java collections API. But it should give you a sense of what's involved. Once you've grasped the basics, the implementation isn't terribly hard.

@danieldietrich

This comment has been minimized.

Show comment
Hide comment
@danieldietrich

danieldietrich Mar 20, 2017

Hi, Daniel chiming in :)

Disclaimer: I'm the creator of Javaslang

The ideas presented here are very interesting.

I'm not sure if the best way to provide interoperability with standard Java collections is to implement java.util.* collection interfaces. The mutator methods (i.e. the ones that return void) can only throw at runtime - which is unsafe. Javaslang's persistent collections for example will provide views for that purpose.

Javaslang was created to bring a big portion of the persistent/immutable part of Scala to Java (8). We benchmark our collections against Scala, Clojure, PCollection, Functional Java, Eclipse Collections and Java mutable collections. Here is how our Vector performs for example (> 1 := faster than).

We based our Hash* collections on Hash Array Mapped Tree (HAMT), the Sorted* collections on a Red/Back-Tree and the Vector on a Bit Mapped Trie (BMT). This is the collection hierarchy so far (upcoming 2.1.0):

javaslang-2-1-0-alpha

Here is a comparison of the APIs of two persistent Vector implementations.

Left: Javaslang / Right: PCollections

javaslang-vs-pcollections

danieldietrich commented Mar 20, 2017

Hi, Daniel chiming in :)

Disclaimer: I'm the creator of Javaslang

The ideas presented here are very interesting.

I'm not sure if the best way to provide interoperability with standard Java collections is to implement java.util.* collection interfaces. The mutator methods (i.e. the ones that return void) can only throw at runtime - which is unsafe. Javaslang's persistent collections for example will provide views for that purpose.

Javaslang was created to bring a big portion of the persistent/immutable part of Scala to Java (8). We benchmark our collections against Scala, Clojure, PCollection, Functional Java, Eclipse Collections and Java mutable collections. Here is how our Vector performs for example (> 1 := faster than).

We based our Hash* collections on Hash Array Mapped Tree (HAMT), the Sorted* collections on a Red/Back-Tree and the Vector on a Bit Mapped Trie (BMT). This is the collection hierarchy so far (upcoming 2.1.0):

javaslang-2-1-0-alpha

Here is a comparison of the APIs of two persistent Vector implementations.

Left: Javaslang / Right: PCollections

javaslang-vs-pcollections

@paplorinc

This comment has been minimized.

Show comment
Hide comment
@paplorinc

paplorinc Mar 20, 2017

Here is how our Vector performs for example (> 1 := faster than).

To make it clear what I meant by PCollections being slower:

Operation   Ratio                                                 10      100     1026 
Create      slang_persistent/pcollections_persistent          29.70×  114.82×  243.27×
Head        slang_persistent/pcollections_persistent           1.75×    2.43×    2.81×
Tail        slang_persistent/pcollections_persistent           4.38×    6.00×    7.59×
Get         slang_persistent/pcollections_persistent           4.86×    6.08×    8.99×
Update      slang_persistent/pcollections_persistent           1.73×    2.34×    3.07×
Prepend     slang_persistent/pcollections_persistent           0.32×    0.74×    1.31×
Append      slang_persistent/pcollections_persistent           0.51×    0.74×    1.09×
Slice       slang_persistent/pcollections_persistent           0.53×    0.38×    0.40×
Iterate     slang_persistent/pcollections_persistent          13.31×   10.68×    8.02×

i.e. except for slice (which has a memory leak in Java, Clojure and PCollections, i.e. a wrapper over all elements), Javaslang is several times faster, consistently.

I think Kotlin is a very nice new language, we would be glad to incorporate techniques from CHAMP into Javaslang (where applicable), if that would help its Kotlin integration :)

Here is how our Vector performs for example (> 1 := faster than).

To make it clear what I meant by PCollections being slower:

Operation   Ratio                                                 10      100     1026 
Create      slang_persistent/pcollections_persistent          29.70×  114.82×  243.27×
Head        slang_persistent/pcollections_persistent           1.75×    2.43×    2.81×
Tail        slang_persistent/pcollections_persistent           4.38×    6.00×    7.59×
Get         slang_persistent/pcollections_persistent           4.86×    6.08×    8.99×
Update      slang_persistent/pcollections_persistent           1.73×    2.34×    3.07×
Prepend     slang_persistent/pcollections_persistent           0.32×    0.74×    1.31×
Append      slang_persistent/pcollections_persistent           0.51×    0.74×    1.09×
Slice       slang_persistent/pcollections_persistent           0.53×    0.38×    0.40×
Iterate     slang_persistent/pcollections_persistent          13.31×   10.68×    8.02×

i.e. except for slice (which has a memory leak in Java, Clojure and PCollections, i.e. a wrapper over all elements), Javaslang is several times faster, consistently.

I think Kotlin is a very nice new language, we would be glad to incorporate techniques from CHAMP into Javaslang (where applicable), if that would help its Kotlin integration :)

@zvozin

This comment has been minimized.

Show comment
Hide comment
@zvozin

zvozin Apr 9, 2017

@danieldietrich I, for one, have been using Javaslang in Kotlin a while now as is and loving it. A wrapper that turns static utility methods like

Try<Seq<T>> sequence(Iterable<? extends Try<? extends T>> values)

into extensions is all that I would wish for, and would in fact volunteer to come up with such a package myself - I have half of it spread across my projects already anyway :).

zvozin commented Apr 9, 2017

@danieldietrich I, for one, have been using Javaslang in Kotlin a while now as is and loving it. A wrapper that turns static utility methods like

Try<Seq<T>> sequence(Iterable<? extends Try<? extends T>> values)

into extensions is all that I would wish for, and would in fact volunteer to come up with such a package myself - I have half of it spread across my projects already anyway :).

@paplorinc

This comment has been minimized.

Show comment
Hide comment
@paplorinc

paplorinc Apr 9, 2017

@zvozin, I think that would be a very valuable contribution! :)

@zvozin, I think that would be a very valuable contribution! :)

@danieldietrich

This comment has been minimized.

Show comment
Hide comment
@danieldietrich

danieldietrich Apr 10, 2017

@zvozin That sounds really great! I will setup an empty Github repository javaslang-kotlin and invite you (with write privileges).

danieldietrich commented Apr 10, 2017

@zvozin That sounds really great! I will setup an empty Github repository javaslang-kotlin and invite you (with write privileges).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment