Skip to content

Latest commit

 

History

History
660 lines (479 loc) · 24.3 KB

scanning.rst

File metadata and controls

660 lines (479 loc) · 24.3 KB

single: scanning; introduction single: object format; scan method

Scanning

Scanning <scan> is the process of identifying the references in a block of memory and "fixing" <fix> them. It's the process at the heart of the Memory Pool System, and the most critical of the memory management functions that have to be implemented by the client program.

Scanning is used to carry out three tasks:

  1. During tracing <trace>, blocks are scanned in order to follow references, and so determine which blocks are reachable and which are not.
  2. After objects have been moved in memory, blocks are scanned in order to identify references that need to be updated to point to the new locations of these objects.
  3. When iterating over allocated blocks in a pool using :cmps_pool_walk, blocks are scanned in order to keep data structures consistent when references are updated.

All these tasks use the same protocol, described here.

single: scanning; protocol

Scanning protocol

There are several types of scanning functions (the scan method in an object format, of type :cmps_fmt_scan_t, and root scanning functions of various types) but all take a scan state argument of type :cmps_ss_t, and a description of a region to be scanned. They must carry out the following steps:

  1. Call the macro :cMPS_SCAN_BEGIN on the scan state.
  2. For each reference in the region:
    1. Call :cMPS_FIX1, passing the scan state and the reference.
    2. If :cMPS_FIX1 returns false, the reference is not of interest to the MPS. Proceed to the next reference in the region.
    3. If :cMPS_FIX1 returns true, the reference is of interest to the MPS. Call :cMPS_FIX2, passing the scan state and a pointer to a location containing the reference.
    4. If :cMPS_FIX2 returns a result code other than :cMPS_RES_OK, return this result code from the scanning function as soon as practicable.
    5. If :cMPS_FIX2 returns :cMPS_RES_OK, it may have updated the reference. Make sure that the updated reference is stored back into the region being scanned.
  3. Call the macro :cMPS_SCAN_END on the scan state.
  4. Return :cMPS_RES_OK.

This description of the protocol simplifies a number of important details, which are covered in the following sections.

pair: scanning; tagged reference

Tagged references

If your references are tagged <tagged reference> (or otherwise "encrypted"), then you must remove the tag (or decrypt them) before passing them to :cMPS_FIX1 and :cMPS_FIX2.

The reference passed to :cMPS_FIX2 must be the address of the base of the block referred to (unless the referent belongs to an object format with in-band headers, in which case it must be a reference to the address just after the header).

However, :cMPS_FIX1 allows some leeway: if you pass it a reference to the interior of an allocated block, then :cMPS_FIX1 correctly determines whether a reference to the block is of interest to the MPS.

This means that if your tag is in the low bits of the reference, you may not have to remove it before calling :cMPS_FIX1. For example, if you use three tag bits, then your reference is at most base + 7, and if your objects are at least 8 bytes long, then the reference is within the object and need not be stripped. So your code might look like this:

if (MPS_FIX1(ss, obj->ref)) {
    /* strip the tag */
    mps_addr_t p = obj->ref & ~0x7;
    mps_res_t res = MPS_FIX2(ss, &p);
    if (res != MPS_RES_OK) return res;
    /* restore the tag and update reference */
    mps_word_t tag = obj->ref & 0x7;
    obj->ref = (obj_t)((char *)p + tag);
}

This saves the cost of stripping the tag in the case that obj->ref is not of interest to the MPS.

Similarly, if you use interior pointers, you do not need to convert them to base pointers before calling :cMPS_FIX1 (or, indeed, before calling :cMPS_FIX2, if the target of the referent belongs to an object format with in-band headers).

pair: scanning; critical path

Critical path

Scanning is an operation on the critical path of the MPS and so it is vital that it runs fast. The scanning protocol is designed to ensure that as much of the scanning code can be run inline in the client program as possible. In particular, the macro :cMPS_FIX1 does not need to call into the MPS.

The purpose of :cMPS_FIX1 is to provide a fast check as to whether a reference is "of interest" to the MPS. It is legitimate to call this on any word: it does not even have to be an address. So if you have a mixture of references and non-references, it might turn out to be faster to call :cMPS_FIX1 on each word before you even determine whether or not the word is a reference.

Whether this is in fact an optimization depends on the proportion of references to non-references, on how often genuine references turn out to be "of interest", and what kind of code the compiler has generated. There is no substitute for measurement.

See design-critical-path.

Note

In one application with a high proportion of unboxed values, it turned out to be fastest to check the tag and reject non-references before calling :cMPS_FIX1.

Warning

If you passed a word that might not be a reference to :cMPS_FIX1, and it returned true, this might be a false positive. You must be certain that the alleged reference is genuine as well as "of interest" before passing it to :cMPS_FIX2.

Another technique that can speed up scanning is to segregate objects into pools whose object formats contain different scan methods. In particular, if you can segregate objects that do not contain any references into leaf object pools like pool-amcz, these objects do not need to be scanned at all.

pair: scanning; ambiguous reference

Ambiguous references

If the references in the object being scanned are ambiguous <ambiguous reference> then :cMPS_FIX2 does not update the reference (because it can't know if it's a genuine reference). The MPS handles an ambiguous reference by pinning the block pointed to so that it cannot move.

You could use this fact to optimize the scan by avoiding the need to reassemble and store the updated reference after calling :cMPS_FIX2.

Note

The MPS currently has no pools that support ambiguous references, so this cannot arise for the scan method in an object format, but root scanning functions may encounter this case.

pair: scanning; unfixed reference

Unfixed references

The MPS does not require you to fix all your references. But if a reference is not fixed:

  1. it does not keep its target alive (this might be acceptable if you know that the target is being kept alive for another reason, for example if it is in a manually managed <manual memory management> pool, or if there is always another reference to the target that is fixed);
  2. it does not get updated if the target moves (this might be acceptable if you know that the target cannot move, for example if it is in a non-moving <non-moving memory manager> pool, or if it is pinned <pinning> by an ambiguous reference).

These optimizations can be tricky to make correct, and can make the system fragile (for example, it may break if you start using a different pool class), so it is usually safest to fix all references.

single: scanning; example single: Scheme; scanning

Example: Scheme objects

Scanning tends to be a repetitive procedure and so you'll find it is usually helpful to define macros to reduce the size of the source code. The MPS provides a convenience macro :cMPS_FIX12 for the common case of calling :cMPS_FIX1 and then immediately calling :cMPS_FIX2 if the reference is "of interest".

Note

Some compilers generate better code if you use :cMPS_FIX12, and some if you use :cMPS_FIX1 and :cMPS_FIX2. There's no substitute for measurement.

Here's the macro FIX defined by the toy Scheme interpreter:

#define FIX(ref)                                                        \
    do {                                                                \
        mps_addr_t _addr = (ref); /* copy to local to avoid type pun */ \
        mps_res_t res = MPS_FIX12(ss, &_addr);                          \
        if (res != MPS_RES_OK) return res;                              \
        (ref) = _addr;                                                  \
    } while(0)

Note

The comment refers to a temptation to write non-portable code that presents itself here. :cMPS_FIX2 takes a pointer to a location containing the reference (an argument of type mps_addr_t *). It is tempting to take the address of the reference and cast it to this type. The behaviour of such a cast is not defined by the C standard. See topic-interface-pun.

Here's the Scheme scanner:

static mps_res_t obj_scan(mps_ss_t ss, mps_addr_t base, mps_addr_t limit)
{
    MPS_SCAN_BEGIN(ss) {
        while (base < limit) {
            obj_t obj = base;
            switch (obj->type.type) {
                case TYPE_PAIR:
                    FIX(obj->pair.car);
                    FIX(obj->pair.cdr);
                    base = (char *)base + ALIGN(sizeof(pair_s));
                    break;
                case TYPE_VECTOR: {
                    size_t i;
                    for (i = 0; i < obj->vector.length; ++i)
                        FIX(obj->vector.vector[i]);
                    base = (char *)base +
                        ALIGN(offsetof(vector_s, vector) +
                              obj->vector.length * sizeof(obj->vector.vector[0]));
                    break;
                }
                /* ... and so on for the other types ... */
                default:
                    assert(0);
                    fprintf(stderr, "Unexpected object on the heap\n");
                    abort();
                    return MPS_RES_FAIL;
            }
        }
    } MPS_SCAN_END(ss);
    return MPS_RES_OK;
}

Note

This scanner is a simple example intended to make the process clear to the reader. The scanning code and the object layout are not at all optimized.

single: scanning; interface

Scanning interface

single: scanning; fixing single: fixing; interface

Fixing interface

single: scanning; area scanners single: area; scanning

Area scanners

An area scanner scans an area of memory for references. Various functions in the MPS interface, such as :cmps_root_create_thread_tagged, accept area scanners as arguments so that the client program can specify how to scan special areas such as the control stack.

The MPS provides some area scanners for common situations (such as an area which is a vector of words with references identified by tag bits <tag>) but the client program can provide its own.

If you want to develop your own area scanner you can start by adapting the scanners, found in scan.c in the MPS source code.