Skip to content

Data Structures

dan-eyles edited this page Jul 25, 2013 · 34 revisions

Algorithms + Data Structures = Programs

SculeJS is built from the ground up on general purpose data structures, in fact you can probably just think of it as data structures with a rich query interface dropped on top. All of the structures used to support SculeJS collections and indices can also be used in any other sort of project you might be working on.

Let's take a look at what data structures are available in SculeJS. For the purposes of this article we'll assume that you're using the web based version of SculeJS rather than the NodeJS version, all example code will be written for the web unless otherwise stated. JSDoc detailing the complete API for all of the classes described below can be generated directly from the SculeJS source code.

Table of contents:

Linked Lists

Linked lists in SculeJS are null terminated data structures used for linear access to data in worst case O(n) time. If you need more information read the wikipedia article on the subject.

To instantiate a new, empty linked list simply make a call to getLinkedList function on the datastructures namespace - in the web based version of SculeJS the call would look like:

var list = Scule.getLinkedList();

Where as the NodeJS version would look like:

var datastructures = require('com.scule.datastructures');
var list = datastructures.getLinkedList();

Linked lists are handy for a number of reasons - they allow O(n) time sequential access to data with O(1) time insertion and deletion. Sorted lists usually provide sub-linear search time, keep that in mind.

SculeJS linked lists include some handy utility functions, explained below:

split( idx[Number] )

Splits the list at the specified index, returning the resulting linked list. This operation is in place, meaning that the list being split is modified and will include the 0 - n portion.

middle

Returns the middle index of the list as an integer value, the algorithm used to resolve the middle index uses two pointers, one fast and one slow, meaning it should complete in O(n/2) time.

reverse

Reverses the linked list in place.

forEach( callback[Function] )

An iterator style interface similar to the function of the same name on the Array prototype. This function provides sequential iteration on the contents of the linked list, executing the provided closure at each step with the current element as the sole argument.

var list = Scule.getLinkedList();
for(var i=0; i < 10000; i++) { 
   var n = Scule.global.functions.randomFromTo(1, 10000);
   list.insert(n, n);
}
list.forEach(function(element) {
   console.log(element.key + ' => ' + element.value);
});

sort ( cmp[Function], key[String] )

Sorts the list in place, optionally using the provided comparison function and field name - meaning you can actually change the behaviour of the sort by defining your own comparison logic. The merge sort algorithm is used internally.

var list = Scule.getLinkedList();
for(var i=0; i < 10000; i++) { 
   var n = Scule.global.functions.randomFromTo(1, 10000);
   list.insert(n, n);
}
list.sort(); // default key sort
list.clear();

for(var i=0; i < 10000; i++) { 
   var n = Scule.global.functions.randomFromTo(1, 10000);
   list.insert(n, {a:n, b:i});
}
list.clear();
list.sort(null, 'a'); // default sort on field 'a'

for(var i=0; i < 10000; i++) { 
   var n = Scule.global.functions.randomFromTo(1, 10000);
   list.insert(n, {a:n, b:i});
}
list.clear();
list.sort(function(a, b) {
   return a - b;
}, 'a'); // custom sort on field 'a'

SculeJS also includes a DoublyLinkedList class and a proxy class called CachingLinkedList that adds an LRU cache on top of either a doubly or singly linked list to speed up search time.

var list = Scule.getCachingLinkedList(30, 'a');
for(var i=0; i < 10000; i++) { 
   var n = Scule.global.functions.randomFromTo(1, 10000);
   list.insert(n, {a:n, b:i});
}
var value = list.search(300);

LRU Cache

SculeJS includes a general purpose, tunable LRU cache implementation. Internally the cache is built around a hash table with pointers to elements in a doubly linked list. The list acts as a priority queue and allows invalidation in constant time at the cost of using a little more space.

To instantiate a new, empty LRU cache simply make a call to getLRUCache function on the datastructures namespace - in the web based version of SculeJS the call would look like:

var cache = Scule.getLRUCache(30); // the cache threshold is 30 - entries will be trimmed once n >= 30
cache.put('myobject', {foo:'bar'});
var object = cache.get('myobject');
object.contains('myobject'); // true
object.remove('myobject');
object.contains('myobject'); // false

Stacks and Queues

SculeJS includes several stack classes - all of them descending from the LinkedList class. The stack implementations included are:

  • LIFO (last in first out) stack
  • FIFO (first in first out) stack
  • Queue

Queues are more or less FIFO stacks with enqueue and dequeue functions. Below is some example code showing how to instantiate and use stacks in SculeJS:

var stack = Scule.getLIFOStack();
stack.push(1);
stack.push(2);
stack.push(3);
var value = stack.pop(); // value is 3

stack = Scule.getFIFOStack();
stack.push(1);
stack.push(2);
stack.push(3);
value = stack.pop(); // value is 1

var queue = Scule.getQueue();
queue.enqueue(1);
queue.enqueue(2);
queue.enqueue(3);
value = queue.dequeue(); // value is 1

Hash Tables

SculeJS includes two hash table classes:

  • HashTable - a wrapper around a JavaScript associative array providing standard hash table functions
  • HashMap - an actual hash table written in JavaScript implementing the same class contract as the HashTable class

Which is the best one to use? Frankly, HashTable is much faster for insertion although both data structures have similar search performance. I wrote HashMap because I hadn't implemented a hash table since my data structures class back in university and I was interested in seeing what the performance characteristics would be like.

Internally the HashMap class uses the Jenkins hash function, however you can substitute your hash function of choice by setting the hash pointer on the HashMap function prototype.

var map = Scule.getHashMap(1000);
map.hash = function (key, size) { // the elf hash function
  var h = 0, g;
  for(var i=0; i < key.length; i++) {
     h = ( h << 4 ) + key.charCodeAt(i);
     if ( g = h & 0xF0000000 ) {
        h ^= g >> 24;
     }
     h &= ~g;
  }
  return h % size;        
};

The class also includes an implementation of the djb2 hash function. Based on cursory testing both functions are performant and provide an adequately even distribution of outputs. I've included murmur2/murmur3 and elf hash function implementations.

The HashMap class uses separate chaining to handle collisions, with buckets being instances of the Binary Search Tree class.

The data structure will resize itself geometrically using a factor of 2 once the load factor reaches approximately 0.7. This means that search time remains roughly constant, however the performance of inserts is affected. For best performance it's a good rule of thumb to allocate a table 1.5 times larger than the data set you expect to store in it to avoid resizing during initial insertion.

var table = Scule.getHashTable();
for(var i=0; i < 10000; i++) {
   table.put('foo' + i, 'bar' + i);
}
table.get('foo1000');

var map = Scule.getHashMap(1000);
for(var i=0; i < 10000; i++) {
   map.put('foo' + i, 'bar' + i);
}
map.get('foo1000');

Binary Search Tree

This is pretty self explanatory - it's a binary search tree. The structure should provide worst case O(log(n)) time complexity for search, insertion, and deletion. The SculeJS BST implementation isn't self balancing, however you can manually balance the tree by calling the balance function on the BinarySearchTree class.

var tree = Scule.getBinarySearchTree();
for(var i=0; i < 10000; i++) {
   var n = Scule.global.functions.randomFromTo(1, 10000);
   tree.insert(n, n);
   if(i%10 == 0) {
     tree.balance();
   }
}
tree.search(1000); // should be 1000
tree.remove(1000);
tree.search(1000); // should be null

Bit Set

SculeJS includes a bit set (bit array) implementation that uses bitwise operations on 32-bit integer words to perform peeks and pokes on individual bits. Bit sets are great for compactly representing series of data in memory (for example - bit map indices).

bitmap = Scule.getBitSet(1024);
bitmap.set(512);
bitmap.get(512); // true
bitmap.get(500); // false
bitmap.clear(512);
bitmap.get(512); // true

Bloom Filter

Bloom filters are an extension on Bit Sets and represent a family probabilistic data structures that allow efficient look ups on series data. I won't go into the explanation of how Bloom filters work, if you want to read more on the subject go check out the wikipedia article.

The SculeJS Bloom filter is based on the general pattern described by Burton Howard Bloom back in 1970. Internally the class uses the Fowler–Noll–Vo hash function with two random input values to create k permutations of the function.

filter = Scule.getBloomFilter(1024, 15); // 1024 bits with 15 hash functions
filter.add('key1');
filter.add('key2');
filter.add('key3');
filter.query('key2'); // true
filter.query('key4'); // false - maybe?

Counters

Last but not least, SculeJS includes a simple atomic counter implementation.

var counter = Scule.getAtomicCounter(0);
counter.increment(1);
counter.getCount(); // 1
counter.increment(10);
counter.getCount(); // 11
counter.decrement(5);
counter.getCount(); // 6