# Priority Queues

A **priority queue** generalizes the idea behind sorting into a more flexible data structure that you can then use for other applications. Instead of removing the most recently added item (stack), the least recently added item (queue), or a random item (randomized queue), it removes the **largest** (or **smallest**) item. The items must be comparable.

Some applications for priority queues are:

- Event-driven simulations (customers in a line, colliding particles)
- Numerical computation (reducing roundoff errors)
- Data compression (Huffman codes)
- Graph searching (Dijkstra's algorithm, Prim's algorithm)
- Number theory (sum of powers)
- Artificial intelligence (A\* search)
- Statistics (maintain largest $M$ values in a sequence)
- Operating systems (load balancing, interrupt handling)
- Discrete optimization (bin packing, scheduling)
- Spam filtering (Bayesian spam filter)

A priority queue generalizes the stack, queue, and randomized queue: all four are collections that differ only in which item gets removed.

Client example: finding the largest $M$ items in a stream of $N$ items. This applies to fraud detection (you want to isolate the largest dollar transactions) or file maintenance (you want to find the biggest files or directories). Keep the $M$ largest items seen so far in a min-oriented priority queue: insert each new item, and whenever the queue holds more than $M$ items, remove the smallest.
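The top-$M$ client can be sketched in a few lines. This is a minimal illustration, assuming Python's standard `heapq` module as the min-oriented priority queue; the function name `largest_m` and the sample transaction amounts are made up for the example.

```python
import heapq


def largest_m(stream, m):
    """Return the m largest items of an iterable, in descending order."""
    heap = []  # min-heap holding the largest items seen so far
    for item in stream:
        heapq.heappush(heap, item)
        if len(heap) > m:
            heapq.heappop(heap)  # discard the smallest of the m + 1 candidates
    return sorted(heap, reverse=True)


# Hypothetical transaction amounts, for illustration only.
transactions = [120, 45, 980, 310, 15, 770, 560]
print(largest_m(transactions, 3))  # [980, 770, 560]
```

The queue never holds more than $M + 1$ items, so the memory used depends on $M$ rather than on the length of the stream.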

There are several ways to implement a priority queue. One is to tack new items onto a linked list, then scan the entire list to find the maximum when needed (insert takes constant time, but finding or removing the maximum takes time proportional to $N$). Another is to keep the items in order as they are added (insert takes time proportional to $N$, while finding or removing the maximum takes constant time). The goal is an implementation in which no operation takes more than time proportional to $\log N$.
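As a rough sketch of the first (unordered) approach, the class below appends on insert and scans on removal. It is illustrative only: it uses a Python list rather than a linked list, and the class and method names (`UnorderedMaxPQ`, `insert`, `del_max`, `is_empty`) are assumptions made for this example.

```python
class UnorderedMaxPQ:
    """Max-oriented priority queue backed by an unordered Python list."""

    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def insert(self, item):
        # Constant time (amortized): append without keeping any order.
        self._items.append(item)

    def del_max(self):
        # Linear time: scan every item to locate the largest one.
        items = self._items
        i = max(range(len(items)), key=items.__getitem__)
        # Swap the maximum to the end so it can be popped in constant time.
        items[i], items[-1] = items[-1], items[i]
        return items.pop()


pq = UnorderedMaxPQ()
for key in [3, 9, 1]:
    pq.insert(key)
print(pq.del_max())  # 9
```

The ordered variant flips the costs: it does the work at insert time so that the maximum is always at one end.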

## Binary Heaps

A **binary heap** is a simple data structure based on a complete binary tree. A **binary tree** is either empty or a node with links to left and right binary trees. A complete tree is perfectly balanced except possibly for the bottom level (every level above the bottom one is full). The height of a complete tree with $N$ nodes is $\lfloor \log_2 N \rfloor$ and only increases when $N$ is a power of two.
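The height claim is easy to check numerically. The snippet below is a small sketch, assuming height is counted in edges from the root to the deepest node; the function name `complete_tree_height` is hypothetical. It uses the fact that $\lfloor \log_2 N \rfloor$ is one less than the number of bits in $N$.

```python
def complete_tree_height(n):
    """Height, in edges, of a complete binary tree with n >= 1 nodes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n.bit_length() - 1  # equals floor(log2(n)) for positive integers


# The height steps up exactly when n reaches a power of two.
for n in range(1, 10):
    print(n, complete_tree_height(n))
```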

In the array representation, indices start at 1 (index 0 is unused) and the nodes are stored in level order. In a **heap-ordered binary tree**, the keys are in the nodes and each parent's key is no smaller than its children's keys. The largest key, `a[1]`, is the root of the binary tree. You can use the array indices to move through the tree: the parent of the node at $k$ is at $k/2$ (integer division), and the children of the node at $k$ are at $2k$ and $2k + 1$.
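The index arithmetic can be written down directly. The following is a minimal sketch, assuming a Python list with a `None` placeholder at index 0; the helper names (`parent`, `left`, `right`, `is_max_heap`) and the sample keys are chosen for illustration.

```python
def parent(k):
    return k // 2  # integer division


def left(k):
    return 2 * k


def right(k):
    return 2 * k + 1


def is_max_heap(a):
    """Check heap order for a 1-indexed array; a[0] is ignored."""
    n = len(a) - 1
    # Every node other than the root must be no larger than its parent.
    return all(a[parent(k)] >= a[k] for k in range(2, n + 1))


# Sample keys in level order, with index 0 unused.
a = [None, "T", "S", "R", "P", "N", "O", "A", "E", "I", "H", "G"]
print(is_max_heap(a))               # True
print(a[1])                         # T, the largest key
print(a[left(2)], a[right(2)])      # P N, the children of a[2]
```

Because the tree is complete, no explicit links are needed: the array positions alone encode the shape of the tree.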