Skip to content
This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

Frequently Asked Questions

derknorton edited this page Dec 18, 2014 · 5 revisions

Why is this project called a collection framework?

The main goal of the project is to provide a simple to use set of the typical collection classes that needed by most programming projects. But a secondary goal is to do this in a way that makes it easy to define your own special purpose collection classes as well. The abstract classes (abstractions) provided in this project define a framework within which to define both the typical collection classes which come with the framework but also any additional collection classes that the user might want to define themselves. This project even takes it one step further by having the abstractions implement a set of interfaces also provided by the project. The interfaces define the different views a collection might want to expose and can be used to define additional collection frameworks or extend the existing framework.

Why aren't the standard Java collections good enough?

This is obviously a great question to ask since so many projects already use the standard Java collections without too many issues and they are therefore very well tested. Nevertheless, they do suffer from "old age". They were developed nearly two decades ago at a time when people weren't as familiar with good (and more importantly pragmatic) object oriented techniques and were initially done void of any coherent framework. This makes them hard to understand and a bit tedious to use. There are also newer, better performing algorithms available now that weren't available when the original classes were first implemented tying them to older approaches for backward compatibility.

Why do the collection indexes start with one instead of zero?

There is a long standing tradition of using zero-based indexing when accessing arrays and collections. This dates back to the days of C where the indexes were referring directly to offsets in memory. The need for this convention has long passed, and it is now time to follow a more intuitive indexing strategy that makes sense to humans and fits in better with the additional reversed indexing style supported by many of the newer scripting languages. Humans number things using ordinal numbers "first", "second", "third", or 1, 2, 3. This project uses the same approach for the indexes in its collections.

How do the collections "self optimize"?

The primitive collections that underlie the List, Dictionary, Map, Queue, and Stack collections manage the size of the arrays that are used to store their elements. When an array gets full, its size is doubled. When an array gets to 1/4 full, its size is halved. This approach ensures that memory is reclaimed as the size of the collections shrink. The implementations also originally optimized the underlying structure between an array and a linked list based on the ratio of reads to writes but performance analysis found that the array is significantly better in very nearly all cases, so now a linked list is only used to order the associations in the Map and Dictionary classes, with a hash table being used to index each link in the list of associations.

I know what interfaces and classes are, but what are abstractions?

We us the term abstractions for abstract classes that collectively define a framework. They abstract away the details of how things are done but still enforce constraints or invariants on when and under what circumstances these things can be done. In the case of the collections framework the abstract classes pull together the interfaces that define what it means to be an open or closed collection, or an ordered or sortable collection. They also implement the invariant methods for each abstraction.

Why are the method names so long?

Naming things in programs with short names is another throwback to a time when programmers wrote all code by typing it into a terminal (or worse, onto punch cards). These days most programmers use an IDE to help them write code. Any IDE worth its salt does auto-completion when typing names, so long names are no longer a hindrance, in fact they become an asset by allowing the code to be more self-documenting.

Why are all sets (and bags) ordered?

Well, technically all collections have some sort of order, even if it looks random to the user. But for higher level collections, having a random order rarely provides any benefits. For sets and bags the order is determined implicitly based on some predefined comparison mechanism. By default, the elements in a set or bag are maintained in their natural order as defined by the compareTo(E element) method defined on the element type. Alternative comparison mechanisms (including random) can be specified as well.

What is the deference between an ordered collection and a sortable collection?

Both ordered and sortable collections maintain their elements in a specific order. The difference is whether the order is determined explicitly by the user or implicitly by a predefined comparison mechanism. The order of the elements in a sortable collection is determined by the user. The user can insert the elements anywhere in the collection and can adjust or sort them as needed later on. The order of the elements in an ordered collection is determined by applying a predefined comparison mechanism to each element to make sure they stay in that predefined order. The user cannot explicitly change the order of the elements in an ordered collection.

What is the difference between a dictionary and a map?

A dictionary is just a specialization of a general map that always uses strings as its key type. Maps on the other hand may have any type of object as its key type.

What do the acronyms FIFO and LIFO mean?

These two acronyms define the order in which elements are added and removed from a list. FIFO stands for "first in first out" and is usually associated with a queue. LIFO stands for "last in first out" and is usually associated with a stack.

Why isn't there a linked list among the collection classes?

A linked list is a lower level structure that can be used to implement higher level collections. From a performance perspective though, it is rather rare that a linked list will be better suited for an implementation than a well written dynamic array.

Why is a randomized binary tree better than other tree implementations like red-black, splay, or treap?

It of course depends on what criteria is used to compare the approaches, but here are some of the advantages we see:

  • A randomized binary tree does not require rebalancing but rather spreads the structure modification out over all the update calls on the tree.
  • The randomized binary tree approach was developed to address some of the problems with the treap approach. It is simpler to implement and avoids a rare index collision that can occur with treap.
  • The splay approach has a problem where the tree height can become linear when all of its elements are accessed in increasing order.
  • The red-black approach does not amortize the rebalancing over all operations so performance is unpredictable.

Here is a summary of the worst case big O analysis of the different algorithms:

Algorithm Space Search Insert Delete
randomized O(n) amortized O(log n) amortized O(log n) amortized O(log n)
treap O(n) amortized O(log n) amortized O(log n) amortized O(log n)
splay O(n) amortized O(log n) amortized O(log n) amortized O(log n)
red-black O(n) O(log n) O(log n) O(log n)