Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
A fast & memory-efficient data structure that cat tell if it saw a string before
C++ JavaScript C
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
doc
inc
src
test
.gitignore
.npmignore
Makefile
README.md
binding.gyp
package.json

README.md

node-elephant

Elephant Logo

A fast & memory-efficient data structure that cat tell if it saw a string before

Summary

In case you need a fast, memory-efficient data structure that would remember whether or not it had seen a string before, for example you are running through a list of filename with possible duplicates that you do not want to double-process, or through long list of emails/GUIDs with possible duplicate that you wanna deal with, here is the module for you.

The Node.js/c++ binding is 2/3 times faster, and uses 6 times less memory than a JavaScript implementation. The pure c++ is even faster and uses 8 times less memory than the nodejs binding, that is 200% faster than pure JavaScript, and 50 times (5000%) more memory-efficient*. Check the Time Table for details.

* For some reason arguments passed from JavaScript to c++ lingers in memory, if you have a way to set them free let me know

Prerequisites

You will need google sparsehash, under Ubuntu you can do sudo apt-get install sparsehash.

Usage

Check the test/time-binding.js file for an example usage.

Example

var elephant = require('elephant')

var obj = new elephant.Elephant();

console.log(obj.memorize("hello")); // false: first time to be seen, increases stats_added by 1
console.log(obj.memorize("hello")); // true: seen before, increases stats_duplicates by 1
console.log(obj.remember("hello")); // true: remembered, told before to memorize it
console.log(obj.remember("hi"));    // false: not remembered, as has not been told to memorize it
console.log(obj.remember("hi"));    // false: still not remembered, of course

console.log(obj.stats());           // { added: 1, duplicates: 1 }

console.log('Success, because you can read this : )');

API

  • bool .memorize("string") tells the elephant to memorize a string, returns true if the string was seen before, and false otherwise.
  • bool .remember("string") ask the elephant whether it remembers a string (has seen, aka told to memorize that string before), returns true if the string was seen before, and false otherwise.
  • object .stats() returns some stats.

Time Table

Numbers between (brackets) refer to the difference between current cell value and the one to its left.

Command being timed "build/time-native 5000000" "node test/time-binding.js 5000000" "node test/time-node.js 5000000"
User time (seconds) 10.27 14.6 (142%) 22.18 (152%)
System time (seconds) 0 0.1 (Infinity%) 0.41 (410%)
Percent of CPU this job got 99% 100% 100%
Elapsed (wall clock) time (h:mm:ss or m:ss) 0:10.31 0:14.65 0:22.53
Average shared text size (kbytes) 0 0 (=) 0 (=)
Average unshared data size (kbytes) 0 0 (=) 0 (=)
Average stack size (kbytes) 0 0 (=) 0 (=)
Average total size (kbytes) 0 0 (=) 0 (=)
Maximum resident set size (kbytes) 44576 364368 (817%) 2188960 (601%)
Average resident set size (kbytes) 0 0 (=) 0 (=)
Major (requiring I/O) page faults 0 0 (=) 0 (=)
Minor (reclaiming a frame) page faults 2997 23614 (788%) 167480 (709%)
Voluntary context switches 1 14790 (1479000%) 19914 (135%)
Involuntary context switches 872 1253 (144%) 2008 (160%)
Swaps 0 0 (=) 0 (=)
File system inputs 0 0 (=) 0 (=)
File system outputs 0 0 (=) 0 (=)
Socket messages sent 0 0 (=) 0 (=)
Socket messages received 0 0 (=) 0 (=)
Signals delivered 0 0 (=) 0 (=)
Page size (bytes) 4096 4096 (=) 4096 (=)
Exit status 0 0 (=) 0 (=)

Roadmap

  • Add more hashes options, besides the default 32-bit MurmurHash3, probably FNV-1 32-bit & 64-bit..

Copyright & License

© 2013 Hasan Arous. All rights reserved.

Used libraries are copyrighted to their owners, as seen in files comments.

Mozilla Public License Version 2.0

Something went wrong with that request. Please try again.