---

# Garbage Collection
**[Wentao Sun](mailto:sunw53@mcmaster.ca), McMaster University, June 2024**

---

### Introduction

Manual memory management:
In C, memory is reserved with `malloc` and must be released with `free` once it is no longer needed. Forgetting to call `free` causes a memory leak, and a long-running program that keeps leaking eventually runs out of available memory. An automatic memory management system works out when the program has finished using a piece of memory and reclaims it for reuse, so the programmer only allocates and never calls `free` explicitly.


Almost all modern programming languages make use of `dynamic memory allocation`. This allows objects to be allocated and deallocated even if their total size was not known at the time that the program was compiled, and if their lifetime may exceed that of the subroutine activation that allocated them. A dynamically allocated object is stored in a heap, rather than on the stack (in the activation record or stack frame of the procedure that allocated it) or statically (whereby the name of an object is bound to a storage location known at compile or link time). 

Heap allocation is particularly important because it allows the programmer to:

    1. choose dynamically the size of new objects (thus avoiding program failure through exceeding hard-coded limits on arrays);
    2. define and use recursive data structures such as lists, trees and maps;
    3. return newly created objects to the parent procedure (allowing, for example, factory methods);
    4. return a function as the result of another function (for example, closures or suspensions in functional languages, or futures or streams in concurrent ones).

The goal of an ideal garbage collector is to reclaim the space used by every object that will no longer be used by the program. Any automatic memory management system has three tasks and they're not independent:

    1. to allocate space for new objects;
    2. to identify live objects as well as dead ones; 
    3. to reclaim the space occupied by dead objects.

### Some Terminology

`Roots:` References held in locations that the program can access directly, without following a pointer through the heap, e.g. registers, thread stacks, and global variables. The objects they refer to serve as starting points for the garbage collector to traverse the object graph and determine which objects are reachable and which are not. `Any object reference from a variable is a root.`

`Dangling references: ` Pointers or references in a program that point to memory locations that have been deallocated, released, or otherwise invalidated. In simpler terms, they are references that are left "hanging" without a valid target. 

Here are three possible scenarios:

1. Deallocated Memory: If memory allocated for an object is freed or deallocated, but there are still pointers or references pointing to that memory location, those pointers become dangling references.

2. Out-of-Scope References: In languages like C or C++, if a function returns (or stores elsewhere) a pointer to one of its local variables, that variable's storage is released when the function returns. Any later use of the pointer is a dangling reference, because the stack memory it points to may have been reused for other purposes.

3. Object Ownership: In languages without garbage collection, if an object is manually deallocated while other parts of the program still hold references to it, those references become dangling.
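
Python itself does not expose dangling references, so the first scenario can only be simulated. The sketch below uses a hypothetical `ToyHeap` class (not part of any library) in which an "address" is just a slot index; it shows how a stale pointer silently observes whatever the allocator places in the reused slot.

```python
class ToyHeap:
    """A toy heap of numbered slots; an 'address' is just a slot index."""

    def __init__(self, size):
        self.slots = [None] * size
        self.free_list = list(range(size))

    def malloc(self, value):
        # Hand out the most recently freed (or lowest-numbered) slot
        addr = self.free_list.pop(0)
        self.slots[addr] = value
        return addr

    def free(self, addr):
        # Return the slot to the allocator; copies of addr held elsewhere are not updated
        self.slots[addr] = None
        self.free_list.insert(0, addr)

    def load(self, addr):
        return self.slots[addr]


heap = ToyHeap(size=4)
p = heap.malloc("account balance")
q = p                          # a second pointer to the same slot
heap.free(p)                   # q is now a dangling reference
r = heap.malloc("password")    # the allocator reuses the freed slot
stale = heap.load(q)           # q reads data that belongs to r
print(q == r, stale)
```

In a real C program the same mistake is undefined behaviour rather than a predictable read, which is what makes dangling references hard to debug.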

`Strong Reference:` A strong reference is a direct reference to an object that prevents it from being garbage-collected. As long as there is at least one strong reference to an object, the garbage collector considers it "alive" and will not reclaim its memory. Most references in programming are strong references by default. For example, in languages like Java and C#, simply declaring and assigning an object to a variable creates a strong reference. These references ensure that the object remains in memory throughout its use, preventing it from being collected prematurely.

`Weak Reference:` A weak reference, on the other hand, does not prevent the garbage collector from reclaiming the object it points to. This means that an object with only weak references is eligible for garbage collection and can be collected at any time when the system runs a garbage collection cycle. Weak references are useful in situations where you need a reference to an object without preventing it from being garbage-collected, which is especially valuable in scenarios involving caching, listeners, or large data structures where memory management is crucial.
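
The difference between strong and weak references can be observed with Python's standard `weakref` module. This is a minimal sketch; the `Image` class and the file name are made up for illustration.

```python
import gc
import weakref


class Image:
    """Stand-in for a large object we might want to cache."""

    def __init__(self, name):
        self.name = name


strong = Image("logo.png")
weak = weakref.ref(strong)        # does not keep the Image alive

alive_before = weak() is strong   # calling the weak reference yields the object while it lives

del strong                        # drop the only strong reference
gc.collect()                      # not needed on CPython; may help on other runtimes

alive_after = weak() is not None  # the weak reference has been cleared
print(alive_before, alive_after)
```

A cache built this way never prevents its entries from being reclaimed; the price is that every lookup has to handle the case where the weak reference has already been cleared.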

`Unowned reference`: They are similar to weak references, but they are used when the other instance has the same or longer lifetime. An unowned reference is expected not to become nil once assigned, unlike a weak reference.

`Mutators:` These are essentially the application threads that run the main logic of the program. A mutator modifies the heap memory by allocating new objects and potentially making some objects unreachable. The term "mutator" is used because these threads mutate the state of the memory, continually changing which objects are accessible and which are not.

`Collectors:` These are part of the garbage collection system. A collector's job is to identify memory that is no longer in use and can be reclaimed. The collector often runs in its own thread or set of threads, separate from those of the application, although a simple stop-the-world collector may instead run on the mutator thread that triggered it. It scans the memory, identifies unused objects, and eventually deallocates or recycles this memory, making it available for future use.





### 1. Reference Counting

When allocating memory, store a small counter in the object header that records how many references to the object currently exist. Creating a new reference increments the counter; local variables and formal parameters cause decrements at the end of their lifetime, and `assigning nil causes a decrement` as well. Reference counting maintains a simple invariant: an object is presumed to be live if and only if the number of references to that object is greater than zero. Reference counting therefore associates a reference count with each object managed; typically this count is stored as an additional slot in the object's header.
<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="1.png"></img> 

In contrast to the tracing algorithms like mark&sweep, **Algorithm 5.1** redefines all pointer `Read` and `Write` operations in order to manipulate reference counts. Even non-destructive operations such as iteration require the reference counts of each element in the list to be incremented and then decremented as a pointer moves across a data structure such as a list.
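
CPython keeps such a count in every object header and exposes it through `sys.getrefcount`, which makes the invariant observable. The sketch below is specific to CPython; the absolute numbers vary between versions (the call itself may add a temporary reference), so only the differences are compared.

```python
import sys

payload = ["some", "data"]           # one reference: the name `payload`
base = sys.getrefcount(payload)      # absolute value is implementation-specific

alias = payload                      # a second name: the count goes up by one
with_alias = sys.getrefcount(payload)

alias = None                         # assigning None drops that reference again
after_clear = sys.getrefcount(payload)

print(with_alias - base, after_clear - base)
```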

#### Algorithm 5.1

In [2]:
class Reference:
    def __init__(self):
        self.count = 0

def allocate():
    # This function would typically allocate a resource or memory
    return Reference()

def add_reference(ref):
    if ref is not None:
        ref.count += 1

def delete_reference(ref):
    if ref is not None:
        ref.count -= 1
        if ref.count == 0:
            free(ref)

def free(ref):
    # Here you would typically release the allocated resource
    print("Resource freed.")

class Memory:
    def __init__(self):
        self.references = []

    def new(self):
        ref = allocate()
        if ref is None:
            print("Out of memory")
            return None
        ref.count = 0
        return ref

    def write(self, src, i, ref):
        add_reference(ref)
        if i < len(src) and src[i] is not None:
            delete_reference(src[i])
        if i >= len(src):
            src.append(ref)
        else:
            src[i] = ref

This is a `direct collection method`: direct algorithms determine the liveness of an object from the object alone, without recourse to tracing.

This method could have some problems: 

`1. Incrementing a counter and forgetting to decrement it: ` The most direct consequence of not decrementing a reference count when you should is a memory leak. If a reference count is never decremented properly after the last usage of the object, the counter never reaches zero. This prevents the garbage collector or memory management system from reclaiming the object’s allocated memory, even though it's no longer in use. Over time, these leaks can consume significant amounts of memory, degrading system performance and stability. This typically happens when counts are managed manually, especially along complex control flow or exception handling paths.

`2. Performance issues: ` From a performance point of view, it is particularly undesirable to add overhead to operations that manipulate registers or thread stack slots. For this reason alone, this naive algorithm is impractical for use as a general purpose, high volume, high performance memory manager.

`3. Can’t deal with cycles:` Cycles occur when two or more objects reference each other in a circular manner. In such cases, even though there are no external references to the objects involved in the cycle, their reference counts will never reach zero because they are still referencing each other. Because of this, memory occupied by cyclically referenced objects will never be deallocated using reference counting alone, leading to what's known as a memory leak. <img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="2.png"></img>  
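
The cycle problem can be reproduced in CPython by temporarily switching off its supplementary cycle collector, leaving only reference counting. This is a sketch under that assumption; `Node` and `probe` are illustrative names, and the weak reference is used only to observe whether the object still exists.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                   # switch off CPython's cycle collector for the demo
a, b = Node(), Node()
a.other, b.other = b, a        # a and b now reference each other
probe = weakref.ref(a)         # lets us observe whether a is still allocated

del a, b                       # no external references remain
leaked = probe() is not None   # the counts never reached zero, so the cycle survives

gc.collect()                   # run the tracing cycle collector by hand
reclaimed = probe() is None
gc.enable()
print(leaked, reclaimed)
```

With pure reference counting the two nodes would stay allocated for the rest of the program; it takes a tracing pass to discover that nothing outside the cycle refers to them.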

Applications of Reference Counting:

    better suited for real-time programming;
    used in distributed systems, where tracing all pointers is impractical;
    used in the Unix file system;
    used in the Swift language: strong, weak/unowned references are distinguished



#### Swift Code Example

Here's an example where we have a `Person` class and a `CreditCard` class. A person may or may not have a credit card, and each credit card must be associated with a person. To avoid a retain cycle (where each instance holds a strong reference to the other, preventing both from ever being deallocated), the CreditCard class holds an `unowned reference` to a Person.

In [None]:
%%writefile example.swift
class Person {
    let name: String
    var creditCard: CreditCard?
    
    init(name: String) {
        self.name = name
    }
    
    deinit {
        print("\(name) is being deinitialized")
    }
}

class CreditCard {
    let number: UInt64
    unowned let owner: Person
    
    init(number: UInt64, owner: Person) {
        self.number = number
        self.owner = owner
    }
    
    deinit {
        print("Card \(number) is being deinitialized")
    }
}

// Example usage:
var john: Person? = Person(name: "John Doe")
if let person = john {
    john?.creditCard = CreditCard(number: 1234567890123456, owner: person)
}

// Break the strong reference to see deinitialization in action
john = nil  // This should trigger deinitialization of both `Person` and `CreditCard` instances.


In this example:

`john` is a `strong reference` to a new Person instance.

Person has an `optional strong reference` to a CreditCard. However, the card has an `unowned reference` back to the person who owns it.

Setting `john` to nil removes the only strong reference to the Person instance, so it is deallocated. The Person held the only strong reference to the CreditCard, and the card's `unowned reference` back to the Person never kept the Person alive, so the card is deallocated as well. This demonstrates how unowned references help prevent retain cycles without the need for the reference to become nil (which is necessary for weak references).

#### Languages Using Reference Counting

Several programming languages use reference counting, either as the primary method of memory management or as part of a hybrid system:

`Python`: This is perhaps the most well-known language that uses reference counting as a primary mechanism to manage memory. Python complements reference counting with a cyclic garbage collector to deal with reference cycles, which reference counting alone cannot handle.

`Objective-C`: Before the adoption of Automatic Reference Counting (ARC), Objective-C used manual reference counting. ARC itself is a form of compiler-enforced reference counting, automating much of the manual counting that was previously required.

`Perl`: Perl uses reference counting to manage its memory, cleaning up memory when variables go out of scope and their reference counts fall to zero.

`PHP`: PHP uses reference counting to manage memory in its zval (value) data structures, although it also includes a cycle collector to manage reference cycles.

#### Historical Origin

The concept of reference counting as a form of garbage collection dates back to the early days of computer science. One of the earliest systems to use reference counting was developed by George E. Collins in 1960. The technique was refined further by L. Peter Deutsch and Daniel G. Bobrow in their 1976 paper "An Efficient, Incremental, Automatic Garbage Collector", which also introduced the deferred variant described in the next section. (See more in bibliography)

### 2. Deferred Reference Counting

Manipulating all reference counts is expensive compared with the cost to the mutator of simple tracing algorithms, so most high-performance reference counting systems use deferred reference counting. The overwhelming majority of pointer loads are to local and temporary variables, that is, to registers or stack slots. A way to reduce this overhead is to keep only an approximate count: pointer stores to local variables do not update the count, `only stores to the heap do`; the count then reflects the number of heap pointers only. The deferred aspect means that reclaiming an object whose count drops to zero is not performed immediately but is `deferred` to a later time, which can help performance, especially in `concurrent execution environments`. If the count reaches zero, the stack has to be scanned to check whether local variables still point to the object before it can be recycled. This scan can be postponed by first placing objects with a zero count in a zero count table (ZCT) and processing that table periodically, as shown in **Algorithm 5.2**.

#### Algorithm 5.2

In [None]:
class Reference:
    def __init__(self):
        self.count = 0  # Initial reference count is set to zero.

def allocate():
    return Reference()

def add_reference(ref):
    if ref is not None:
        ref.count += 1

def delete_reference_to_zct(ref, zct):
    if ref is not None:  # Empty slots (None) carry no count
        ref.count -= 1
        if ref.count == 0:
            zct.add(ref)

def free(ref):
    print("Resource freed.")

class ZeroCountTable:
    """Class to manage the Zero Count Table (ZCT), which tracks objects with zero references."""
    def __init__(self):
        self.items = set()  # Use a set to manage unique items with zero counts.
    
    def add(self, item):
        self.items.add(item)
    
    def remove(self, item):
        self.items.remove(item)
    
    def is_empty(self):
        return len(self.items) == 0

    def pop(self):
        return self.items.pop()

def collect(zct, roots):
    """Collection cycle: count root references, sweep the ZCT, then undo the counts."""
    for ref in roots:
        add_reference(ref)  # Temporarily count references held by the roots
    sweep_zct(zct)
    for ref in roots:
        delete_reference_to_zct(ref, zct)  # Remove the temporary counts again

def sweep_zct(zct):
    """Sweep through the ZCT and free resources with zero effective references."""
    while not zct.is_empty():
        ref = zct.pop()
        if ref.count == 0:
            free(ref)

class Memory:
    """Simulates memory management with deferred reference counting."""
    def __init__(self):
        self.references = []
        self.zct = ZeroCountTable()
        self.roots = []  # Simulated root set (stack slots); a list so it can be indexed

    def new(self):
        """Allocate a new resource and manage out of memory by attempting garbage collection."""
        ref = allocate()
        if ref is None:
            collect(self.zct, self.roots)  # Attempt to collect garbage if allocation fails
            ref = allocate()
            if ref is None:
                print("Out of memory")
                return None
        ref.count = 0
        self.zct.add(ref)  # A new object has no heap references yet
        return ref

    def write(self, src, i, ref):
        """Write a reference into slot i of src; only heap stores update counts."""
        if src is not self.roots:
            # Heap store: count the new reference and release the old one
            add_reference(ref)
            if i < len(src) and src[i] is not None:
                delete_reference_to_zct(src[i], self.zct)
        # Stores to the roots fall through without touching any count
        if i >= len(src):
            src.append(ref)
        else:
            src[i] = ref
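
To see the deferral in action, here is a compact, self-contained variant of the same idea. It is a simplified sketch and not Algorithm 5.2 itself: instead of temporarily incrementing counts for the roots, it skips ZCT entries that are referenced from the stack. The names `DeferredRC`, `write_heap`, `stack` and `freed` are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.count = 0         # counts heap references only
        self.fields = []       # outgoing heap references


class DeferredRC:
    def __init__(self):
        self.zct = set()       # zero count table
        self.stack = []        # roots: stores here never touch a count
        self.freed = []        # names of reclaimed objects, for inspection

    def new(self, name):
        obj = Obj(name)
        self.zct.add(obj)      # a fresh object has no heap references yet
        return obj

    def write_heap(self, src, ref):
        # Heap stores are the only operations that update counts
        ref.count += 1
        self.zct.discard(ref)
        src.fields.append(ref)

    def collect(self):
        # A ZCT entry is garbage only if no stack slot refers to it
        on_stack = {id(o) for o in self.stack}
        pending = [o for o in self.zct if id(o) not in on_stack]
        while pending:
            obj = pending.pop()
            self.zct.discard(obj)
            self.freed.append(obj.name)
            for child in obj.fields:       # freeing obj removes its heap references
                child.count -= 1
                if child.count == 0:
                    self.zct.add(child)
                    if id(child) not in on_stack:
                        pending.append(child)
            obj.fields = []


rc = DeferredRC()
a, b, c = rc.new("a"), rc.new("b"), rc.new("c")
rc.stack.append(a)             # a is referenced from the stack only
rc.write_heap(a, b)            # b is referenced from the heap (from a)
rc.collect()
first = sorted(rc.freed)       # only c was unreferenced

rc.stack.clear()               # a's stack slot dies; no count is updated
rc.collect()
second = sorted(rc.freed)      # now a and, transitively, b are reclaimed
print(first, second)
```

Note that `a` keeps a count of zero for its whole life: the stack reference is never counted, and only the scan at collection time protects it.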

#### Languages Using Deferred Reference Counting

Deferred reference counting is used less frequently as a primary memory management technique in mainstream programming languages. However, it has been implemented in various systems and experimental languages, often to optimize performance in specific scenarios:

`Python`: Python’s CPython implementation doesn't use deferred reference counting by default, but there have been experimental modifications and proposals to incorporate deferred reference counting to optimize its performance.

`Real-Time Systems`: Some real-time programming environments may implement deferred reference counting to reduce the frequency of reference count updates, minimizing the impact on the system's real-time performance.

`Research Languages`: Various research-focused programming languages or experimental language extensions have explored deferred reference counting to study its effects on memory management and system performance.

#### Historical Origin

The concept of deferred reference counting goes back to Deutsch and Bobrow's 1976 work and has been refined in later research. One key challenge with standard reference counting is the performance cost associated with incrementing and decrementing the reference count each time a reference is made or destroyed. By deferring these updates, a system can potentially batch these operations during less critical times, reducing the immediate computational overhead.

Deferred reference counting was discussed in academic papers as a way to optimize garbage collection without the need for a full garbage collector, particularly in systems where consistent performance is critical, such as in embedded or real-time systems. (See more in bibliography)

### 3. Mark & Sweep

Garbage collection works in two phases (mark and sweep). We assume that each object on the heap has one spare bit for marking and unmarking:

`Mark Phase:`
The mark phase involves traversing the entire object graph starting from the roots (objects directly accessible by the program) and marking each object encountered as reachable.

Objects that are reachable are typically marked using a flag or a bit in their header to indicate that they are still in use.
During the traversal, the algorithm follows references from one object to another, marking each object as it encounters them. Such a traversal is called `tracing`.
This phase ensures that all objects reachable from the roots are identified and marked as live.

`Sweep Phase:`
The sweep phase involves traversing the entire heap (memory space allocated for objects) and deallocating memory for objects that are not marked as reachable.
The algorithm scans through each memory block, checking whether the corresponding object is marked. If it's marked, it means the object is still in use, so the mark is cleared. If it's not marked, it means the object is no longer reachable and can be safely deallocated.
After sweeping through all memory blocks, the algorithm frees the memory associated with unmarked objects, making it available for future allocations.
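
The two phases described above can be sketched in a few lines. This is a minimal illustration in which the "heap" is a plain Python list and `Obj`, `refs` and the object names are hypothetical; it also shows that an unreachable cycle is reclaimed, because nothing traced from the roots ever marks it.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.refs = []             # outgoing references


def mark(roots):
    # Mark phase: trace from the roots with an explicit work list
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if not obj.marked:
            obj.marked = True
            worklist.extend(obj.refs)


def sweep(heap):
    # Sweep phase: keep marked objects (clearing their mark); the rest is garbage
    live, dead = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            live.append(obj)
        else:
            dead.append(obj)
    return live, dead


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)                   # a -> b, reachable from the root
c.refs.append(d)                   # c -> d
d.refs.append(c)                   # d -> c: an unreachable cycle
heap = [a, b, c, d]

mark(roots=[a])
heap, garbage = sweep(heap)
print([o.name for o in heap], [o.name for o in garbage])
```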

The mark-sweep interface with the mutator is very simple. If a thread is unable to allocate a new object, the collector is called and the allocation request is retried (**Algorithm 2.1**). To emphasize that the collector operates in stop-the-world mode, without concurrent execution of the mutator threads, the original pseudocode marks the collect routine with the `atomic` keyword; the Python version has no such keyword and simply assumes that nothing else runs during `collect`. If there's still insufficient memory available to meet the allocation request, then heap memory is exhausted. Often this is a fatal error. However, in some languages, `new` may raise an exception in this circumstance that the programmer may be able to catch. If memory can be released by deleting references (for example, to cached data structures which could be recreated later if necessary), then the allocation request could be repeated.

#### Algorithm 2.1

In [None]:
HEAP_LIMIT = 4  # Illustrative limit, chosen small so that the heap fills up quickly
heap = []       # Simulated heap: every allocated object is recorded here

class Reference:
    """Class to simulate an object that can be managed by garbage collection."""
    def __init__(self, data=None):
        self.data = data if data is not None else []  # Outgoing references
        self.marked = False

def allocate():
    """Simulate memory allocation. We limit the number of allocations to simulate a full heap."""
    if len(heap) < HEAP_LIMIT:
        ref = Reference()
        heap.append(ref)
        return ref
    else:
        return None

def mark_from_roots(roots):
    """Mark all reachable objects starting from the root set."""
    for root in roots:
        if not root.marked:
            mark(root)

def mark(ref):
    """Recursively mark all references reachable from this reference."""
    ref.marked = True
    for child_ref in ref.data:  # ref.data is a list of outgoing references
        if isinstance(child_ref, Reference) and not child_ref.marked:
            mark(child_ref)

def sweep(start, end):
    """Sweep through the heap and free all unmarked objects.

    start and end mirror the pseudocode; this simulation always sweeps the whole heap.
    """
    global heap
    heap = [ref for ref in heap if ref.marked]
    for ref in heap:
        ref.marked = False  # Reset the mark for the next GC cycle

class MemoryManager:
    def __init__(self):
        self.roots = set()

    def new(self):
        """Allocate a new reference and perform garbage collection if allocation fails."""
        ref = allocate()
        if ref is None:
            self.collect()
            ref = allocate()
            if ref is None:
                raise MemoryError("Out of memory")
        return ref

    def collect(self):
        """Perform the mark-and-sweep garbage collection."""
        mark_from_roots(self.roots)
        sweep(0, len(heap))  # Simplified as we're not using actual memory addresses
Before traversing the object graph, the collector must first prime the marker's work list with starting points for the traversal (markFromRoots in **Algorithm 2.2**). Each root object is marked and then added to the work list.

#### Algorithm 2.2


In [None]:
class Node:
    def __init__(self):
        self.marked = False
        self.pointers = []

def mark_from_roots(roots):
    worklist = []
    initialise(worklist)
    for root in roots:
        if root is not None and not root.marked:
            set_marked(root)
            worklist.append(root)
            mark(worklist)

def initialise(worklist):
    worklist.clear()

def mark(worklist):
    while worklist:
        ref = worklist.pop(0)  # Assuming remove() takes the first item from the list
        for child in ref.pointers:
            if child is not None and not child.marked:
                set_marked(child)
                worklist.append(child)

def set_marked(node):
    node.marked = True

The sweep phase returns unmarked nodes to the allocator (**Algorithm 2.3**). Typically, the collector sweeps the heap linearly, starting from the bottom, freeing unmarked nodes and resetting the mark bits of marked nodes in preparation for the next collection cycle. Note that we can avoid the cost of resetting the mark bit of live objects if the sense of the bit is switched between one collection and the next.
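
The trick of switching the sense of the mark bit can be sketched as follows. This is an illustrative toy, not part of the algorithms in this section: the hypothetical `Heap` class stores objects as dictionaries and keeps a flag `live_sense` saying which bit value currently means "marked".

```python
class Heap:
    def __init__(self):
        self.objects = []
        self.live_sense = True     # the value of "mark" that currently means "marked"

    def new(self, name):
        # New objects start out unmarked under the current sense
        obj = {"name": name, "mark": not self.live_sense, "refs": []}
        self.objects.append(obj)
        return obj

    def collect(self, roots):
        worklist = list(roots)
        while worklist:
            obj = worklist.pop()
            if obj["mark"] != self.live_sense:
                obj["mark"] = self.live_sense
                worklist.extend(obj["refs"])
        # Sweep: drop unmarked objects, but leave the survivors' bits untouched
        self.objects = [o for o in self.objects if o["mark"] == self.live_sense]
        # Flipping the sense makes every survivor count as unmarked again
        self.live_sense = not self.live_sense


heap = Heap()
a, b = heap.new("a"), heap.new("b")
heap.collect(roots=[a])                          # b is unreachable
after_first = [o["name"] for o in heap.objects]

c = heap.new("c")
heap.collect(roots=[c])                          # now a is unreachable
after_second = [o["name"] for o in heap.objects]
print(after_first, after_second)
```

The sweep never writes to a surviving object, which saves one memory write per live object in each cycle.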

#### Algorithm 2.3

In [None]:
class Node:
    def __init__(self):
        self.marked = False
        self.next = None  # Points to the next node in memory, simulating memory layout

def sweep(start, end):
    scan = start
    while scan != end:
        if scan.marked:
            unset_marked(scan)
        else:
            free(scan)
        scan = next_object(scan)

def unset_marked(node):
    node.marked = False

def free(node):
    print(f"Freeing node: {node}")

def next_object(node):
    return node.next

This is an `indirect collection algorithm`, meaning it doesn't detect garbage per se, but rather identifies all the live objects and then concludes that anything else must be garbage. Note that it needs to recalculate its estimate of the set of live objects at each invocation. Not all garbage collection algorithms behave like this.

Its `drawbacks` include fragmentation of memory and pauses during the garbage collection process, which can impact the performance of real-time or interactive applications. Because liveness is approximated by reachability, a single forgotten pointer (for example, in a Java program) can keep millions of otherwise unused objects alive, taking up a lot of space. In addition, a recursive marker needs space on the stack for an activation frame per call. When marking a long singly linked list, the recursion depth equals the list length, so the stack may need more memory than the list itself!
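
The stack cost of recursive marking is easy to provoke in Python, whose default recursion limit is far below the length of the list used here. The sketch assumes that default limit has not been raised; `Cell`, `make_list` and the list length are illustrative.

```python
class Cell:
    def __init__(self):
        self.marked = False
        self.next = None


def make_list(n):
    # Build a singly linked list of n cells and return its head
    head = None
    for _ in range(n):
        cell = Cell()
        cell.next = head
        head = cell
    return head


def mark_recursive(cell):
    # One activation frame per list cell
    if cell is not None and not cell.marked:
        cell.marked = True
        mark_recursive(cell.next)


def mark_iterative(cell):
    # Constant extra space for a singly linked list
    while cell is not None and not cell.marked:
        cell.marked = True
        cell = cell.next


try:
    mark_recursive(make_list(10_000))
    overflowed = False
except RecursionError:
    overflowed = True

head = make_list(10_000)
mark_iterative(head)
print(overflowed, head.marked)
```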

#### Languages Using Mark & Sweep

Mark and sweep is used either directly or as part of a more complex garbage collection system in several programming languages:

`Java`: Java's garbage collection is perhaps the most well-known user of mark and sweep, though it combines this method with others, such as generational and concurrent mark-sweep, in its various garbage collectors like the Concurrent Mark Sweep (CMS) and G1 (Garbage First).

`JavaScript`: Modern JavaScript engines like V8 (used in Chrome) and SpiderMonkey (used in Firefox) employ variations of the mark and sweep algorithm, often optimized with techniques like incremental and generational collection to improve performance.

`Ruby`: Ruby's reference implementation (MRI) uses a mark and sweep collector; newer versions added generational and incremental marking to reduce pause times.

`Python`: While CPython primarily uses reference counting, it also runs a generational cycle collector that traces container objects to find and reclaim cyclic garbage that reference counting cannot handle.

#### Historical Origin

The mark and sweep algorithm was first developed and described by John McCarthy around 1960 in connection with the development of the Lisp programming language, which required efficient memory management due to its use of symbolic expressions and frequent object creation. McCarthy's work on Lisp included the development of the first mark and sweep garbage collector to automatically manage memory, a crucial development for the advancement of programming languages that support complex data structures and dynamic memory allocation.

Mark and sweep was a significant improvement over earlier manual memory management techniques, where programmers had to explicitly allocate and free memory, a process prone to errors such as memory leaks and dangling pointers. The adoption of mark and sweep and its variations has been central to the development of high-level programming languages that abstract away much of the complexity of memory management, allowing developers to focus more on program logic rather than memory handling. (See more in bibliography)

### 4. In-Place Garbage Collection (Deutsch-Schorr-Waite)

This algorithm modifies pointer fields in objects during a `depth-first search`, which lets the mark phase run without a recursion stack and makes it suitable when stack space is scarce. It's a non-recursive algorithm that updates the pointer fields during traversal to maintain a link back to the parent object from the current object. This algorithm utilizes two pointers, named `current` and `previous`, to establish these backward links. Each object in this context is structured with two pointer fields and an additional field that records the index (either 0 or 1) of the pointer that points to the parent.
<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="3.png"></img>  
<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="4.png"></img>  

One big disadvantage of this algorithm is that it can't run `concurrently` with anything. The DSW algorithm fundamentally modifies the data structures it traverses. It temporarily changes pointers in the data structure to point backwards rather than forwards, thereby eliminating the need for an explicit stack. These modifications mean that the data structure is in a transient, inconsistent state during traversal. Running the DSW algorithm concurrently with other operations that either read or modify the same set of nodes can lead to several concurrency hazards:

Race Conditions: If another process is trying to read or modify the data structure while it is being altered by the DSW algorithm, it might encounter incorrect or unexpected pointers, leading to errors or crashes.

Deadlocks: Since the DSW algorithm involves complex manipulations of pointers, running it in parallel with other processes that lock the same resources could lead to deadlocks where each process waits for the other to release resources indefinitely.
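
The pointer-reversal traversal described at the start of this section can be sketched as follows, for objects with exactly two pointer fields. This is a minimal sketch and not a reference implementation; `Node`, `ptr`, `index` and `dsw_mark` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.ptr = [None, None]    # two pointer fields
        self.index = 0             # field being processed; while descending it holds the parent link


def dsw_mark(root):
    current, previous = root, None
    if current is None:
        return
    current.marked = True
    current.index = 0
    while True:
        if current.index < 2:
            i = current.index
            child = current.ptr[i]
            if child is not None and not child.marked:
                # Descend: reverse field i so that it points back to the parent
                current.ptr[i] = previous
                previous, current = current, child
                current.marked = True
                current.index = 0
            else:
                current.index += 1
        else:
            # Both fields done: retreat to the parent and restore its pointer
            if previous is None:
                return
            parent = previous
            i = parent.index
            previous = parent.ptr[i]   # the reversed field holds the grandparent
            parent.ptr[i] = current    # restore the original forward pointer
            parent.index += 1
            current = parent


def snapshot(nodes):
    return {n.name: [p.name if p else None for p in n.ptr] for n in nodes}


a, b, c, d, e = (Node(n) for n in "abcde")
a.ptr = [b, c]
b.ptr = [d, a]                     # back edge to a
c.ptr = [d, None]                  # d is shared between b and c
e.ptr = [a, None]                  # e is not reachable from a
nodes = [a, b, c, d, e]

before = snapshot(nodes)
dsw_mark(a)
after = snapshot(nodes)
marked = [n.name for n in nodes if n.marked]
print(marked, before == after)
```

While the traversal is in progress the fields along the path from the root to `current` point backwards, which is exactly the transient state that makes concurrent access unsafe.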

`Copying Collection:` The heap is divided into `from-space` and `to-space`. Local and global variables - the roots - point to the from-space:
<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="5.png"></img>  
`free` points to the next free location; if it reaches `limit`, garbage collection is initiated, copying all reachable objects to the to-space.

<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="6.png"></img> 

`Advantages` of Copying Collection:

Allocation of new objects only involves incrementing the free pointer; no need to maintain lists of free blocks.

Copying collection also performs compaction; there is no possibility of fragmentation.

Only reachable objects are copied, so the cost is proportional to the amount of live data; depending on the programming language, many objects are not reachable.
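
A copying collection can be sketched as below. The text above does not fix a traversal order; this sketch assumes a Cheney-style breadth-first scan, in which the to-space itself serves as the work queue. `Obj`, `forward`, `copy` and `collect` are illustrative names, and appending to a Python list stands in for bumping the `free` pointer.

```python
class Obj:
    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields if fields is not None else []
        self.forward = None            # forwarding pointer, set once the object is copied


def copy(obj, to_space):
    # Copy obj into to-space on first visit; afterwards return the existing copy
    if obj.forward is None:
        obj.forward = Obj(obj.name, list(obj.fields))
        to_space.append(obj.forward)   # plays the role of incrementing `free`
    return obj.forward


def collect(roots):
    to_space = []
    new_roots = [copy(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):        # objects between scan and the end still hold old pointers
        obj = to_space[scan]
        obj.fields = [copy(f, to_space) for f in obj.fields]
        scan += 1
    return new_roots, to_space


a, b, c, junk = Obj("a"), Obj("b"), Obj("c"), Obj("junk")
a.fields = [b, c]
c.fields = [a]                         # cycle back to a
junk.fields = [a]                      # junk is not reachable from the roots
from_space = [junk, a, b, c]

roots, to_space = collect([a])
names = [o.name for o in to_space]
print(names)
```

The unreachable `junk` object is never visited at all, which illustrates why the work done is proportional to the live data rather than to the size of the from-space.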

`Main Drawback` is that only half of the memory can be used. This can be reduced by dividing the memory into n blocks, selecting one to-space and one from-space among those, and sliding the window. However, objects must be smaller than the block size:

<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="7.png"></img> 

#### Languages Using In-Place Garbage Collection

While the Deutsch-Schorr-Waite algorithm isn't commonly specified in the documentation of mainstream high-level programming languages' garbage collectors, it is a concept studied in computer science and could be implemented in systems where memory efficiency and the conservation of stack space are particularly crucial:

Custom implementations: Systems that require efficient use of memory and stack might implement this algorithm specifically. It's more commonly discussed in academic contexts or in systems programming circles where custom garbage collection strategies are being designed.

Educational tools: The algorithm is often implemented in educational settings where the intricacies of garbage collection are taught, helping students understand in-place marking techniques and pointer manipulation.

#### Historical Origin

The Deutsch-Schorr-Waite algorithm is named after L. Peter Deutsch and after Herbert Schorr and William M. Waite, who published it in 1967; Deutsch is credited with discovering the same technique independently. It's an enhancement of the basic mark and sweep algorithm and was designed to overcome the limitation of excessive stack usage. By modifying the pointers of the objects in the heap itself during the traversal, it avoids the need for a separate stack to track the path during the marking phase of garbage collection. This modification is reversed once the object is processed, returning the pointers to their original state.

This algorithm is an excellent example of early innovations in garbage collection techniques that sought to optimize the use of limited computing resources, such as memory and CPU cycles. The Deutsch-Schorr-Waite algorithm showcases how critical efficient garbage collection was, even in the early days of programming, for enabling more complex and resource-intensive applications.

### 5. Mark-Compact Garbage Collection

Mark-compact collection is similar to mark-sweep, but does compaction like copying collection. Initially, starting from the root set, all live objects are `marked`, like in mark-sweep:
<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="8.png"></img> 

In the `second` phase, a sweep over the heap, the new address of each live object is calculated and stored in its forwarding field; the new address is the sum of the sizes of all live objects encountered so far.
<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="9.png"></img> 

In the `third` phase, all pointers are updated to point to the new locations.

In the `fourth` phase, the objects are moved to their new locations; the pointers are left unchanged. During this pass, all objects are unmarked and all forwarding pointers become unused again:
<img style="width:15em;display: block;margin-left: auto;margin-right: auto" src="10.png"></img> 
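The four phases can be sketched on a toy heap. This is only an illustration under simplifying assumptions: the heap is modelled as a dictionary from addresses to objects, pointers are plain integer addresses, and the names `Obj` and `mark_compact` are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Obj:
    name: str
    size: int
    refs: List[int] = field(default_factory=list)  # addresses of referenced objects
    marked: bool = False
    forward: Optional[int] = None                  # forwarding field


def mark_compact(heap: Dict[int, Obj], roots: List[int]):
    # Phase 1: mark every object reachable from the root set.
    stack = list(roots)
    while stack:
        obj = heap[stack.pop()]
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

    # Phase 2: sweep in address order and compute forwarding addresses;
    # each new address is the sum of the sizes of the live objects seen so far.
    free = 0
    for addr in sorted(heap):
        obj = heap[addr]
        if obj.marked:
            obj.forward = free
            free += obj.size

    # Phase 3: update all pointers (object fields and roots) to the new addresses.
    for obj in heap.values():
        if obj.marked:
            obj.refs = [heap[a].forward for a in obj.refs]
    new_roots = [heap[a].forward for a in roots]

    # Phase 4: move live objects, unmark them and clear the forwarding fields.
    new_heap: Dict[int, Obj] = {}
    for addr in sorted(heap):
        obj = heap[addr]
        if obj.marked:
            new_heap[obj.forward] = obj
            obj.marked = False
            obj.forward = None
    return new_heap, new_roots, free


# A and C reference each other, E is a root, B and D are garbage.
heap = {
    0: Obj("A", 2, [6]),
    2: Obj("B", 4),
    6: Obj("C", 3, [0]),
    9: Obj("D", 1),
    10: Obj("E", 2),
}
new_heap, new_roots, free = mark_compact(heap, roots=[0, 10])
for addr in sorted(new_heap):
    print(addr, new_heap[addr].name, new_heap[addr].refs)
```

After the call, the survivors sit contiguously at the bottom of the heap in their original order, and `free` is the address at which the next allocation would start.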

`Advantage` of this algorithm:
Like copying collection, allocation of new objects only involves incrementing the free pointer; no need to maintain free lists.

One extra pointer for forwarding is needed in each object; mark-sweep `does not need extra space`, while copying collection needs `twice as much space` (the `from` and `to` spaces).

As with mark-sweep, the passes that follow marking traverse the whole heap; if the heap is larger than main memory, swapping occurs. Copying collection only traverses live objects; sometimes only 5% are live.

Compaction preserves the original order of objects, unlike copying collection; this supports `locality` (objects that were allocated together stay together and the gaps between them are removed, which helps both spatial and temporal locality) and `better caching`.

Mark-compact tends to accumulate long-lived objects at the `bottom of the heap` and does not move them again. Copying collection always moves objects, whether small or large.
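The first point in the list above, allocation by incrementing a free pointer, can be illustrated with a minimal sketch. The class name `BumpAllocator` and the convention of returning `None` when the heap is exhausted are assumptions made for this example; a real runtime would trigger a collection at that point.

```python
class BumpAllocator:
    """Toy allocator for a compacted heap: all free space lies above `free`."""

    def __init__(self, heap_size: int):
        self.heap_size = heap_size
        self.free = 0  # address of the first unused cell

    def allocate(self, size: int):
        # No free list to search: check the bound, then bump the pointer.
        if self.free + size > self.heap_size:
            return None  # out of space; a collector would run here
        addr = self.free
        self.free += size
        return addr


allocator = BumpAllocator(heap_size=16)
first = allocator.allocate(4)
second = allocator.allocate(8)
too_big = allocator.allocate(8)   # does not fit in the remaining 4 cells
third = allocator.allocate(4)
print(first, second, too_big, third)
```

After a compaction, the collector would simply reset `free` to the end of the live data, which is why no free list has to be maintained.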


#### Languages Using Mark-Compact Garbage Collection

Several programming languages and runtime environments utilize the mark-compact algorithm, either as a primary method of garbage collection or as one option among several:

`Java`: The Garbage First (G1) collector, introduced in Java 7, reclaims most space by evacuating (copying) live objects out of selected regions, which compacts them as a side effect; when that cannot keep up, it falls back to a full collection that marks and compacts the whole heap to remove fragmentation.

`Haskell`: The Glasgow Haskell Compiler (GHC) uses a generational garbage collector that copies by default; it can be told to use a compacting algorithm for the oldest generation (the `-c` runtime-system option), which lowers the memory needed by long-running programs at the cost of slower collections.

`.NET Framework`: Microsoft's .NET runtime has various garbage collectors, and versions like the server GC can use a mark-compact strategy during full garbage collections to optimize memory usage and access speed.

#### Historical Origin

The mark-compact algorithm has its origins in the early days of automated memory management research, emerging as an important technique during the 1960s and 1970s. One of the key motivations for developing this method was to address the fragmentation issues associated with the mark-sweep approach, where memory could become increasingly fragmented over time, leading to inefficient use of space and potential program failure due to insufficient contiguous memory blocks.

The development of the mark-compact algorithm is often associated with efforts to optimize garbage collection for systems with limited memory resources where maximizing the utilization of available memory was crucial. Its introduction allowed for more efficient and reliable performance in systems that handle large data or have long runtimes, making it a valuable addition to the repertoire of garbage collection techniques. (See more in bibliography)

### 6. Generational Garbage Collection

Most objects die shortly after they are created: an object is either created as a "local" variable by the program or created to hold the result of an expression evaluation. Almost all objects that are collected were created since the last collection. `Conversely, once an object survives its first collection, it is likely to survive subsequent ones.`

Generational collection divides the heap into a number of generations, say an old and a young generation. Different strategies are used for different generations:

`A tracing collection like mark-compact is used on the old generation.`

`A copying collector is used on the new generation.`

Once the heap is full, a minor collection on the youngest generation is performed; if that does not reclaim sufficient memory, the next older generation is collected.
<img style="width:40em;display: block;margin-left: auto;margin-right: auto" src="15.png"></img> 
The young generation is collected by following the pointers from the root set but ignoring those to the old generation.
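A minimal sketch of such a minor trace follows, assuming objects are identified by names, `refs` maps each object to the objects it points to, and `young` is the set of objects in the young generation. All of these names are hypothetical.

```python
def trace_young(roots, refs, young):
    """Return the young objects reachable from roots without entering the old generation."""
    live = set()
    stack = [r for r in roots if r in young]
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        # Follow only pointers that stay inside the young generation.
        stack.extend(t for t in refs.get(obj, []) if t in young)
    return live


young = {"y1", "y2", "y3"}
refs = {
    "y1": ["o1", "y2"],  # young object pointing to an old and a young object
    "o1": ["y3"],        # old object pointing into the young generation
}
roots = ["y1", "o1"]

live = trace_young(roots, refs, young)
# Treating the target of the old-to-young pointer as an additional root:
live_with_old_pointers = trace_young(roots + ["y3"], refs, young)
print(sorted(live), sorted(live_with_old_pointers))
```

In this example `y3` is reachable only through the old object `o1`, so the plain trace misses it; it is found only when the old-to-young pointer is supplied as an extra root.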

However, pointers from the old generation to the new generation can exist and have to be treated as belonging to the root set. Several options exist:

`Remembered set:` whenever a pointer to a new object is stored in an old object, the pointer is added to a remembered set; whenever a new object becomes old, all its pointers to new objects are placed in the remembered set as well.

`Card marking:` the heap is divided into cards of equal size and a vector with a bit for every card is maintained. Whenever a pointer is stored in an object, the bit for the card containing the location of the pointer is set. A minor collection then checks all the marked cards in the old generation. (Used by JDK 1.4.)

`Page marking:` if the card size is the page size of the virtual memory, the dirty bit of the page can be used for marking; this assumes that the dirty bit is available to user programs!
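The first two options differ mainly in what the write barrier records. The sketch below runs both barriers on the same stores so that they can be compared. It assumes a flat address space in which the old generation occupies the addresses below `OLD_END`, and the constants and function names are chosen for illustration only; a real collector would use one scheme, not both.

```python
HEAP_SIZE = 4096
OLD_END = 2048    # assumed layout: [0, OLD_END) is old, the rest is young
CARD_SIZE = 512   # assumed card size in bytes


def is_old(addr):
    return addr < OLD_END


# Remembered set: records the exact slots in old objects that point to young objects.
remembered = set()


def barrier_remembered_set(slot_addr, target_addr):
    if is_old(slot_addr) and not is_old(target_addr):
        remembered.add(slot_addr)


# Card marking: one flag per card, set unconditionally on every pointer store.
cards = [False] * (HEAP_SIZE // CARD_SIZE)


def barrier_card_marking(slot_addr):
    cards[slot_addr // CARD_SIZE] = True


def cards_to_scan():
    # A minor collection only looks at marked cards in the old generation.
    return [i for i, dirty in enumerate(cards) if dirty and is_old(i * CARD_SIZE)]


def store(slot_addr, target_addr):
    """Simulate writing a pointer to target_addr into the slot at slot_addr."""
    barrier_remembered_set(slot_addr, target_addr)
    barrier_card_marking(slot_addr)


store(100, 3000)   # old -> young
store(600, 200)    # old -> old
store(2500, 2600)  # young -> young
print(sorted(remembered), cards_to_scan())
```

The remembered set holds only the one slot that really contains an old-to-young pointer, while card marking has a cheaper barrier but leaves the collector to scan every marked old card, including one that holds no such pointer.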

#### Languages Using Generational Garbage Collection

`Java`: Java is one of the most prominent users of generational garbage collection. Java virtual machines (JVMs) typically use a young generation where most new objects are allocated and a copying collector is employed because it's efficient for managing short-lived objects. For the old generation, which houses objects that have survived multiple garbage collection cycles, more thorough methods like mark-compact or mark-sweep are used because these objects are less likely to be garbage and more expensive to collect.

`C#/.NET`: .NET uses a similar generational approach in its garbage collector, employing a small-object heap divided into three generations. It uses a combination of mark-and-sweep and compacting strategies, particularly in the older generations.

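`Python`: The CPython implementation combines reference counting with a generational cycle collector. Reference counting frees most objects as soon as their count drops to zero; the cycle collector, exposed through the `gc` module, tracks container objects in generations (traditionally three) and reclaims reference cycles that reference counting alone cannot free.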

`Ruby`: Ruby also uses generational garbage collection, particularly in the form of its "RGenGC" (Ruby Generational Garbage Collector), which was introduced to improve the performance of its previously existing mark-sweep collector by adding generational semantics.
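For the CPython entry above, the interplay between reference counting and the cycle collector can be observed with the standard `gc` and `weakref` modules. The sketch assumes CPython; other Python implementations manage memory differently. The class `Node` is a made-up example type.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                      # keep the automatic collector out of the experiment
a, b = Node(), Node()
a.other, b.other = b, a           # build a reference cycle
probe = weakref.ref(a)            # lets us observe whether the object still exists
del a, b                          # the cycle keeps both reference counts above zero

alive_before = probe() is not None
unreachable_found = gc.collect()  # force a full collection of all generations
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after, unreachable_found)
print(gc.get_threshold())         # allocation thresholds that trigger collections
```

The cycle survives the `del` because reference counting alone cannot free it, and it disappears only once `gc.collect()` has run.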

#### Historical Origin

The concept of generational garbage collection was first formulated and put into practice in the 1980s. One of the seminal works that proposed the use of generational collection was by Lieberman and Hewitt in 1983. They suggested that most objects die young, which was a pivotal observation that led to the development of this technique. The idea was to optimize garbage collection by collecting the young generation more frequently and using faster, less intensive collection methods due to the high mortality rate of young objects, while employing more thorough collection methods for the older generation where objects are less likely to be garbage and more costly in terms of performance to collect.

The adoption of generational garbage collection has significantly impacted the development of programming languages and applications, allowing for more efficient memory management, reduced GC pause times, and overall better performance, especially in memory-intensive applications. This method reflects an ongoing evolution in garbage collection technologies, aimed at balancing performance with efficient resource management in complex computing environments. (See more in bibliography)

### Bibliography


<div class="csl-bib-body" style="line-height: 1.35; margin-left: 2em; text-indent:-2em;">
<a id='RJ'></a><div class="csl-entry">Richard Jones, Antony Hosking, Eliot Moss. 
    <i>The Garbage Collection Handbook: The Art of Automatic Memory Management 2nd Edition</i> <a href="https://learning.oreilly.com/library/view/the-garbage-collection/9781000883688">https://learning.oreilly.com/library/view/the-garbage-collection/9781000883688/</a>.</div>
<a id='LPD'></a><div class="csl-entry">L. Peter Deutsch and Daniel G. Bobrow.
    <i>An Efficient, Incremental, Automatic Garbage Collector</i> <a href="https://dl.acm.org/doi/pdf/10.1145/360336.360345">https://dl.acm.org/doi/pdf/10.1145/360336.360345</a>.</div>
<a id='DA'></a><div class="csl-entry">Daniel Anderson, Guy E. Blelloch, Yuanhao Wei.
    <i>Concurrent Deferred Reference Counting with Constant-Time Overhead</i> <a href="https://dl.acm.org/doi/pdf/10.1145/3453483.3454060">https://dl.acm.org/doi/pdf/10.1145/3453483.3454060</a>.</div>
<a id='DRE'></a><div class="csl-entry">Daniel R. Edelson.
    <i>A Mark-and-Sweep Collector for C++</i> <a href="https://dl.acm.org/doi/pdf/10.1145/143165.143178">https://dl.acm.org/doi/pdf/10.1145/143165.143178</a>.</div>
<a id='AL'></a><div class="csl-entry">Alexey Loginov, Thomas Reps, and Mooly Sagiv.
    <i>Automated Verification of the Deutsch-Schorr-Waite Tree-Traversal Algorithm</i> <a href="https://www.researchgate.net/publication/221477267_Automated_Verification_of_the_Deutsch-Schorr-Waite_Tree-Traversal_Algorithm">https://www.researchgate.net/publication/221477267_Automated_Verification_of_the_Deutsch-Schorr-Waite_Tree-Traversal_Algorithm</a>.</div>
<a id='SY'></a><div class="csl-entry">Siqiu Yao.
    <i>Immix: a Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance</i> <a href="https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/immix/">https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/immix/</a>.</div>
<a id='MJ'></a><div class="csl-entry">Malina Jiang.
    <i>Java Garbage Collection: Analysis of GC Algorithms</i> <a href="https://stanford-cs242.github.io/f17/assets/projects/2017/malinaj.pdf">https://stanford-cs242.github.io/f17/assets/projects/2017/malinaj.pdf</a>.</div>