Skip to content
A fast & memory-efficient data structure that cat tell if it saw a string before
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


Elephant Logo

A fast & memory-efficient data structure that cat tell if it saw a string before


In case you need a fast, memory-efficient data structure that would remember whether or not it had seen a string before, for example you are running through a list of filename with possible duplicates that you do not want to double-process, or through long list of emails/GUIDs with possible duplicate that you wanna deal with, here is the module for you.

The Node.js/c++ binding is 2/3 times faster, and uses 6 times less memory than a JavaScript implementation. The pure c++ is even faster and uses 8 times less memory than the nodejs binding, that is 200% faster than pure JavaScript, and 50 times (5000%) more memory-efficient*. Check the Time Table for details.

* For some reason arguments passed from JavaScript to c++ lingers in memory, if you have a way to set them free let me know


You will need google sparsehash, under Ubuntu you can do sudo apt-get install sparsehash.


Check the test/time-binding.js file for an example usage.


var elephant = require('elephant')

var obj = new elephant.Elephant();

console.log(obj.memorize("hello"));	// false: first time to be seen, increases stats_added by 1
console.log(obj.memorize("hello"));	// true: seen before, increases stats_duplicates by 1
console.log(obj.remember("hello"));	// true: remembered, told before to memorize it
console.log(obj.remember("hi"));	// false: not remembered, as has not been told to memorize it
console.log(obj.remember("hi"));	// false: still not remembered, of course

console.log(obj.stats());			// { added: 1, duplicates: 1 }

console.log('Success, because you can read this : )');


  • bool .memorize("string") tells the elephant to memorize a string, returns true if the string was seen before, and false otherwise.
  • bool .remember("string") ask the elephant whether it remembers a string (has seen, aka told to memorize that string before), returns true if the string was seen before, and false otherwise.
  • object .stats() returns some stats.

Time Table

Numbers between (brackets) refer to the difference between current cell value and the one to its left.

Command being timed"build/time-native 5000000""node test/time-binding.js 5000000""node test/time-node.js 5000000"
User time (seconds)10.2714.6 (142%)22.18 (152%)
System time (seconds)00.1 (Infinity%)0.41 (410%)
Percent of CPU this job got99%100%100%
Elapsed (wall clock) time (h:mm:ss or m:ss)0:10.310:14.650:22.53
Average shared text size (kbytes)00 (=)0 (=)
Average unshared data size (kbytes)00 (=)0 (=)
Average stack size (kbytes)00 (=)0 (=)
Average total size (kbytes)00 (=)0 (=)
Maximum resident set size (kbytes)44576364368 (817%)2188960 (601%)
Average resident set size (kbytes)00 (=)0 (=)
Major (requiring I/O) page faults00 (=)0 (=)
Minor (reclaiming a frame) page faults299723614 (788%)167480 (709%)
Voluntary context switches114790 (1479000%)19914 (135%)
Involuntary context switches8721253 (144%)2008 (160%)
Swaps00 (=)0 (=)
File system inputs00 (=)0 (=)
File system outputs00 (=)0 (=)
Socket messages sent00 (=)0 (=)
Socket messages received00 (=)0 (=)
Signals delivered00 (=)0 (=)
Page size (bytes)40964096 (=)4096 (=)
Exit status00 (=)0 (=)


  • Add more hashes options, besides the default 32-bit MurmurHash3, probably FNV-1 32-bit & 64-bit..

Copyright & License

© 2013 Hasan Arous. All rights reserved.

Used libraries are copyrighted to their owners, as seen in files comments.

Mozilla Public License Version 2.0

You can’t perform that action at this time.