Skip to content

Latest commit

 

History

History
582 lines (427 loc) · 22.5 KB

File metadata and controls

582 lines (427 loc) · 22.5 KB

JIT Optimization Strategies

SpiderMonkey's optimizing JIT, IonMonkey, uses a number of different optimization strategies to speed up various operations. The most commonplace operations that are relevant for fast program execution are property accesses and function calls.

Optimization information is currently collected for the following operations:

At each operation site, IonMonkey tries a battery of strategies, from the most optimized but most restrictive to the least optimized but least restrictive. For each strategy attempted, its outcome is tracked. An outcome is either success or a reason why the strategy failed to apply.

This page documents the various optimization strategies and their outcomes. It provides information on what they attempt to do, what general level of speed-up they provide, what kind of program characteristics can prevent them from being used, and common ways to enable the engine to utilize that optimization.

BinaryArith

BinaryArith_Concat

Attempts to optimize an addition to string concatenation. The types of the operands are checked to contain a hint of being a concatenation. Both have to be string or one has to be a string and the other a type that easily can get converted to string (like numbers).

BinaryArith_SpecializedTypes

Attempts to do a numeric arithmetic based on operand types. One of the inputs need to be a numeric type and the other a simple arithmetic type.

BinaryArith_SpecializedOnBaselineTypes

Just like BinrayArith_SpecializedTypes tries to do a numeric arithmetic, but based on the observed types in the Baseline Compiler. If it succeeds it will roughly give same optimization as BinaryArith_SpecializedTypes, but the deduction is more fragile. In the best case one should try to get this optimization based on the input types.

BinaryArith_SharedCache

Attempts to optimize a binary arithmetic using inline cache.

BinaryArith_Call

Last resort which cannot get specialized at all and takes the slow path to do the arithmetic.

GetProperty

GetProp_ArgumentsLength

Attempts to optimize an arguments.length property access. This optimization only works if the arguments object is used in well-understood ways within the function. The function containing the arguments.length is allowed to use the arguments object in the following ways without disabling this optimization:

  • Access arguments.length
  • Access arguments.callee
  • Access individual args using arguments[i]
  • Save arguments into variables, as long as those variables cannot be accessed by any nested function, and as long as there exists no eval nywhere within the function or nested function definitions.
  • Call a function using f.apply(obj, arguments)

If the function contains any use of the arguments object that falls out of the cases defined above, this optimization will be suppressed. In particular, arguments cannot be returned from the function, or passed as an argument into calls (except for the apply case above).

GetProp_ArgumentsCallee

Attempts to optimize an arguments.callee property access. This optimization only works if the arguments object is used in well-understood ways within the function. The function containing the arguments.callee is allowed to use the arguments object in the following ways without disabling this optimization:

  • Access arguments.length
  • Access arguments.callee
  • Access individual args using arguments[i]
  • Save arguments into variables, as long as those variables cannot be accessed by any nested function, and as long as there exists no eval nywhere within the function or nested function definitions.
  • Call a function using f.apply(obj, arguments)

If the function contains any use of the arguments object that falls out of the cases defined above, this optimization will be suppressed. In particular, arguments cannot be returned from the function, or passed as an argument into calls (except for the apply example listed above).

GetProp_InferredConstant

Attempts to optimize an access to a property that seems to be a constant. It applies to property accesses on objects which are global-like in that there is only one instance of them per program. This includes global objects, object literals defined at the top-level of a script, and top-level function objects.

This optimization makes the assumption that a property that has not changed after it was first assigned, is likely a constant property. It then directly inlines the value of the property into hot code that accesses it. For example, in the following code:

var Constants = {};
Constants.N = 100;

function testArray(array) {
  for (var i = 0; i < array.length; i++) {
    if (array[i] > Constants.N)
      return true;
  }
  return false;
}

Will have the loop compiled into the following when testArray gets hot.

for (var i = 0; i < array.length; i++) {
  if (array[i] > 100)
    return true;
}

When this optimization is successful, property access is eliminated entirely and replaced with an inline constant.

GetProp_Constant

Attempts to optimize reading a property that contains a uniquely-typed (or "singleton") object. With uniquely-typed objects, it is guaranteed that no other object has that same type. Unique (or "singleton") types are assigned to certain kinds of objects, like global objects, top-level functions, and object literals declared at the top-level of a script. If a property has always contained the same uniquely-typed object, then the engine can use the unique type to map back to a specific object, and eliminate the property access, replacing it with a constant reference to the object.

When this optimization is successful, property access is eliminated entirely and replaced with an inline constant. The different success and failure conditions are documented below:

GetProp_StaticName

Attempts to optimize a property access on window which refers to a property on the global object.

GetProp_TypedObject

Optimizes accesses to properties on TypedObjects.

GetProp_DefiniteSlot

Optimizes access to a well-known regular property on an object. For this optimization to succeed, the property needs to be well-defined on the object. For objects constructed by constructor functions, this means that the property needs to be defined in the constructor, before any complex logic occurs within the constructor.

This is the best case for a regular "field" type property that is not turned into a constant. It compiles down to a single CPU-level load instruction.

function SomeConstructor() {
    this.x = 10;    // x is a definite slot property
    this.y = 10;    // y is a definite slot property
    someComplicatedFunctionCall();
    this.z = 20;    // z is not a definite slot property.
}

In the above example, the properties x and y can be determined to always exist on any instance of SomeConstructor at definite locations, allowing the engine to deterministically infer the position of x without a shape guard.

This optimization can fail for a number of reasons. If the types observed at the property access are polymorphic (more than one type), this optimization cannot succeed. Furthermore, even if the object type is monomorphic, the optimization will fail if the property being accessed is not a definite slot as described above.

GetProp_Unboxed

Similar to GetProp_DefiniteSlot. Unboxed property reads are possible on properties which satisfy all the characteristics of a definite slot, and additionally have been observed to only store values of one kind of value.

Consider the following constructor:

function Point(x, y) {
    this.x = x;
    this.y = y;
}

If only integers are ever stored in the x and y properties, then the instances of Point will be represented in an "unboxed" mode - with the property values stored as raw 4-byte values within the object.

Objects which have the unboxed optimization are more compact.

GetProp_CommonGetter

Optimizes access to properties which are implemented by a getter function, where the getter is shared between multiple types.

This optimization applies most often when the property access site is polymorphic, but all the object types are derived variants of a single base class, where the property access refers to a getter on the base class.

Consider the following example: function Base() {} Base.prototype = { get x() { return 3; } };

function Derived1() {}
Derived1.prototype = Object.create(Base.prototype);

function Derived2() {}
Derived1.prototype = Object.create(Base.prototype);

If a property access for d.x sees only instances of both Derived1 and Derived2 for d, it can optimize the access to x to a call to the getter function defined on Base.

This optimization applies to shared getters on both pure JS objects as well as DOM objects.

GetProp_InlineAccess

Optimizes a polymorphic property access where there are only a few different types of objects seen, and the property on all of the different types is determinable through a shape-check.

If a property access is monomorphic and the property's location is determinable from the object's shape, but the property is not definite (see: GetProp_DefiniteProperty), then this optimization may be used.

Alternatively, if the property access is polymorphic, but only has a few different shapes observed at the access site, this optimization may be used.

This optimization compiles down to one-or more shape-guarded direct loads from the object. The following pseudocode describes the kind of machine code generated by this optimization:

if obj.shape == Shape1 then
  obj.slots[0]
elif obj.shape == Shape2 then
  obj.slots[5]
elif obj.shape == Shape3 then
  obj.slots[2]
...
end

GetProp_Innerize

Attempts to optimize a situation where a property access of the form window.PROP can be directly translated into a property access on the inner global object.

This optimization will always fail on property accesses which are not on the window object.

It is useful because accessing global names via the 'window' object is a common idiom in web programming.

GetProp_InlineCache

This is the worst-case scenario for a property access optimization. This strategy is used when all the others fail. The engine simply inserts an inline cache at the property access site.

Inline caches start off as a jump to a separate piece of code called a "fallback". The fallback calls into the interpreter VM (which is very slow) to perform the operation, and then decides if the operation can be optimized in that particular case. If so, it generates a new "stub" (or freestanding piece of jitcode) and changes the inline cache to jump to the stub. The stub attempts to optimize further occurrences of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization strategies, and are an indication that the type inference engine has failed to collect enough information to guide the optimization process.

SetProperty

SetProp_CommonSetter

Optimizes access to properties which are implemented by a setter function, where the setter is shared between multiple types.

This optimization applies most often when the property access site is polymorphic, but all the object types are derived variants of a single base class, where the property access refers to a setter on the base class.

Consider the following example: function Base() {} Base.prototype = { set x(val) { ... } };

function Derived1() {}
Derived1.prototype = Object.create(Base.prototype);

function Derived2() {}
Derived1.prototype = Object.create(Base.prototype);

If a property write for d.x = val sees only instances of both Derived1 and Derived2 for d, it can optimize the write to x to a call to the setter function defined on Base.

This optimization applies to shared setters on both pure JS objects as well as DOM objects.

SetProp_TypedObject

Optimizes accesses to properties on TypedObjects.

SetProp_DefiniteSlot

Optimizes a write to a well-known regular property on an object. For this optimization to succeed, the property needs to be well-defined on the object. For objects constructed by constructor functions, this means that the property needs to be defined in the constructor, before any complex logic occurs within the constructor.

This is the best case for a regular "field" type property that is not turned into a constant. It compiles down to a single CPU-level load instruction.

function SomeConstructor() {
    this.x = 10;    // x is a definite slot property
    this.y = 10;    // y is a definite slot property
    someComplicatedFunctionCall();
    this.z = 20;    // z is not a definite slot property.
}

In the above example, the properties x and y can be determined to always exist on any instance of SomeConstructor at definite locations, allowing the engine to deterministically infer the position of x without a shape guard.

This optimization can fail for a number of reasons. If the types observed at the property access are polymorphic (more than one type), this optimization cannot succeed. Furthermore, even if the object type is monomorphic, the optimization will fail if the property being written is not a definite slot as described above.

SetProp_Unboxed

Similar to SetProp_DefiniteSlot. Unboxed property writes are possible on properties which satisfy all the characteristics of a definite slot, and additionally have been observed to only store values of one kind of value.

Consider the following constructor:

function Point(x, y) {
    this.x = x;
    this.y = y;
}

If only integers are ever stored in the x and y properties, then the instances of Point will be represented in an "unboxed" mode - with the property values stored as raw 4-byte values within the object.

Objects which have the unboxed optimization are more compact.

SetProp_InlineAccess

Optimizes a polymorphic property write where there are only a few different types of objects seen, and the property on all of the different types is determinable through a shape-check.

If a property write is monomorphic and the property's location is determinable from the object's shape, but the property is not definite (see: GetProp_DefiniteProperty), then this optimization may be used.

Alternatively, if the property write is polymorphic, but only has a few different shapes observed at the access site, this optimization may be used.

This optimization compiles down to one-or more shape-guarded direct stores to the object. The following pseudocode describes the kind of machine code generated by this optimization:

if obj.shape == Shape1 then
  obj.slots[0] = val
elif obj.shape == Shape2 then
  obj.slots[5] = val
elif obj.shape == Shape3 then
  obj.slots[2] = val
...
end

SetProp_InlineCache

This is the worst-case scenario for a property access optimization. This strategy is used when all the others fail. The engine simply inserts an inline cache at the property write site.

Inline caches start off as a jump to a separate piece of code called a "fallback". The fallback calls into the interpreter VM (which is very slow) to perform the operation, and then decides if the operation can be optimized in that particular case. If so, it generates a new "stub" (or freestanding piece of jitcode) and changes the inline cache to jump to the stub. The stub attempts to optimize further occurrences of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization strategies, and are an indication that the type inference engine has failed to collect enough information to guide the optimization process.

GetElement

GetElem_TypedObject

Attempts to optimized element accesses on array Typed Objects.

GetElem_Dense

Attempts to optimize element accesses on densely packed array objects. Dense arrays are arrays which do not have any 'holes'. This means that the array has valid values for all indexes from 0 to length-1.

GetElem_TypedStatic

Attempts to optimize element accesses on a typed array that can be determined to always refer to the same array object. If this optimization succeeds, the 'array' object is treated as a constant, and is not looked up or retrieved from a variable.

GetElem_TypedArray

Attempts to optimize element accesses on a typed array.

GetElem_String

Attempts to optimize element accesses on a string.

GetElem_Arguments

Attempts to optimize element accesses on the arguments special object available in functions. This optimization only works if the arguments object is used in well-understood ways within the function. The function containing the arguments.length is allowed to use the arguments object in the following ways without disabling this optimization:

  • Access arguments.length
  • Access arguments.callee
  • Access individual args using arguments[i]
  • Save arguments into variables, as long as those variables cannot be accessed by any nested function, and as long as there exists no eval anywhere within the function or nested function definitions.
  • Call a function using f.apply(obj, arguments)

If the function contains any use of the arguments object that falls out of the cases defined above, this optimization will be suppressed. In particular, arguments cannot be returned from the function, or passed as an argument into calls (except for the apply case above).

GetElem_ArgumentsInlined

Similar to GetEelem_Arguments, but optimizes cases where the access on arguments is happening within an inlined function. In these cases, an access of the form arguments[i] can be directly translated into a direct reference to the corresponding argument value in the inlined call.

Consider the following: function foo(arg) { return bar(arg, 3); } function bar() { return arguments[0] + arguments[1]; }

In the above case, if foo is compiled with Ion, and the call to bar is inlined, then this optimization can transform the entire procedure to the following pseudo-code:

compiled foo(arg):
    // inlined call to bar(arg, 3) {
    return arg + 3;
    // }

GetElem_InlineCache

This is the worst-case scenario for a element access optimization. This strategy is used when all the others fail. The engine simply inserts an inline cache at the property write site.

Inline caches start off as a jump to a separate piece of code called a "fallback". The fallback calls into the interpreter VM (which is very slow) to perform the operation, and then decides if the operation can be optimized in that particular case. If so, it generates a new "stub" (or freestanding piece of jitcode) and changes the inline cache to jump to the stub. The stub attempts to optimize further occurrences of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization strategies, and are an indication that the type inference engine has failed to collect enough information to guide the optimization process.

SetElement

SetElem_TypedObject

Attempts to optimized element writes on array Typed Objects.

SetElem_TypedStatic

Attempts to optimize element writes on a typed array that can be determined to always refer to the same array object. If this optimization succeeds, the 'array' object is treated as a constant, and is not looked up or retrieved from a variable.

SetElem_TypedArray

Attempts to optimize element writes on a typed array.

SetElem_Dense

Attempts to optimize element writes on densely packed array objects. Dense arrays are arrays which do not have any 'holes'. This means that the array has valid values for all indexes from 0 to length-1.

SetElem_Arguments

Attempts to optimize element writes to the arguments special object available in functions. This optimization only works if the arguments object is used in well-understood ways within the function. The function containing the arguments.length is allowed to use the arguments object in the following ways without disabling this optimization:

  • Access arguments.length
  • Access arguments.callee
  • Access individual args using arguments[i]
  • Save arguments into variables, as long as those variables cannot be accessed by any nested function, and as long as there exists no eval anywhere within the function or nested function definitions.
  • Call a function using f.apply(obj, arguments)

If the function contains any use of the arguments object that falls out of the cases defined above, this optimization will be suppressed. In particular, arguments cannot be returned from the function, or passed as an argument into calls (except for the apply case above).

SetElem_InlineCache

This is the worst-case scenario for a element write optimization. This strategy is used when all the others fail. The engine simply inserts an inline cache at the property write site.

Inline caches start off as a jump to a separate piece of code called a "fallback". The fallback calls into the interpreter VM (which is very slow) to perform the operation, and then decides if the operation can be optimized in that particular case. If so, it generates a new "stub" (or freestanding piece of jitcode) and changes the inline cache to jump to the stub. The stub attempts to optimize further occurrences of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization strategies, and are an indication that the type inference engine has failed to collect enough information to guide the optimization process.

Call

Call_Inline

A function call f(x) usually pushes a frame onto the call stack. Inlining a call site conceptually copies the body of the callee function and pastes it in place of the call site and avoids pushing a new execution frame. Usually, hot functions do well to be inlined. This is one of the most important optimizations the JIT performs.

Ion inlines both interpreted (i.e., written in JavaScript) functions and native (i.e., built-ins such as Math.sin implemented in C++).

A successfully inlined call site has the outcome Inlined.

Failure to inline comes in two flavors: unable (e.g., unable to determine exact callee) and unwilling (e.g., heuristics concluded that the time-space tradeoff will not pay off).