GitHub - clojure/data.int-map: A map optimized for integer keys

CLI/deps.edn dependency information:

org.clojure/data.int-map {:mvn/version "1.3.0"}

Leiningen:

[org.clojure/data.int-map "1.3.0"]

This library has special implementations of immutable maps and sets, optimized for integer keys. They are both faster in both updates and lookups than normal Clojure data structures, but also more memory efficient, sometimes significantly.

Maps

> (require '[clojure.data.int-map :as i])
nil
> (i/int-map)
{}

These maps support transient/persistent semantics, and also provide special merge, merge-with, update, and update! methods that provide significantly faster performance than their normal Clojure counterparts. The elements must be in the range [Long/MIN_VALUE, Long/MAX_VALUE]. They can be used to represent normal maps which have integral keys, or sparse vectors.

The fact that int-maps are mergeable means that they can be used very effectively with Clojure's reducer mechanism. For instance, consider populating a data structure. Typically, we'd use into:

> (into {} [[1 2] [3 4]])
{1 2, 3 4}

Under the covers, into looks something like this:

> (persistent!
    (reduce conj!
      (transient {})
      [[1 2] [3 4]]))

This makes use of Clojure's transients, but is still inherently sequential. This is because merging together standard Clojure maps is an O(N) operation, so any parallel work would still result in a linear walk of the map's entries. However, int-maps merges are typically much faster, which means we can build sub-maps on parallel threads, and then cheaply merge them together. We can use the fold method in clojure.core.reducers to easily express this:

> (require '[clojure.core.reducers :as r])
nil
> (r/fold i/merge conj entries)
...

If entries is a data structure that fold can split, such as a vector or hash-map, the performance benefits of this are significant. Consider this table, which gives the times on a four-core system for populating a map with a million entries, with all values in milliseconds:

	unsorted entries	sorted entries
`(into {} ...)`	630	500
`(into (sorted-map) ...)`	2035	1080
`(into (i/int-map) ...)`	529	187
`(fold i/merge conj ...)`	273	53

As we can see, the int-map implementation is faster in all cases, and an entire order of magnitude faster when using fold on ordered entries.

Sets

> (require '[clojure.data.int-map :as i])
nil
> (i/int-set)
#{}
> (i/dense-int-set)
#{}

There are special union, intersection and difference operators for these sets, which are significantly faster than those in clojure.set. The elements must be in the range [Long/MIN_VALUE, Long/MAX_VALUE].

dense-int-set behaves the same as int-set, the difference is only in their memory efficiency. Consider a case where we create a set of all numbers between one and one million:

(def s (range 1e6))

(into #{} s)              ; ~100mb
(into (int-set) s)        ; ~1mb
(into (dense-int-set) s)  ; ~150kb

Both of these are significantly smaller than the standard set, but the dense int-set is almost an order of magnitude smaller than the normal int-set. This is because the dense int-set allocates larger contiguous chunks, which is great if the numbers are densely clustered. However, if the numbers are sparse:

(def s (map (partial * 1e6) (range 1e6)))

(into #{} s)               ; ~100mb
(into (int-set) s)         ; ~130mb
(into (dense-int-set) s)   ; ~670mb

In this case, the dense int-set is much less efficient than the standard set, while the normal int-set is equivalently large. So as a rule of thumb, use dense-int-set where the elements are densely clustered (each element has multiple elements within +/- 1000), and int-set for everything else.

Developer information

data.int-map is being developed as a Clojure Contrib project, see the What is Clojure Contrib page for details. Patches will only be accepted from developers who have signed the Clojure Contributor Agreement.

License

Distributed under the Eclipse Public License, the same as Clojure.

Name		Name	Last commit message	Last commit date
Latest commit History 112 Commits
.github/workflows		.github/workflows
src		src
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
epl.html		epl.html
pom.xml		pom.xml
project.clj		project.clj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Maps

Sets

Developer information

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 8

Languages

License

clojure/data.int-map

Folders and files

Latest commit

History

Repository files navigation

Maps

Sets

Developer information

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 8

Languages

Packages