
Performance and Lua tables


Everything written in this article depends entirely on your Lua environment. It is written specifically for the Lua environment of Supreme Commander: Forged Alliance.

A table

A Lua table consists of three parts: a header, an array part and a hash part. The header is required for managing the table and consumes 40 bytes. We'll use this section to provide context for the array part and the hash part of the table.
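As a minimal illustration, using the same debug.allocatedsize function as the examples below, an empty table already consumes the 40 bytes of its header:

local t = { }
LOG(debug.allocatedsize(t))
-- > 40 bytes: just the header, no array or hash part has been allocated yet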

For more details we highly recommend reading chapter 4 of The Implementation of Lua 5.0.

Array part

The array part is accessed using numbers (indices). When populating the array part you need to fill the indices sequentially, without gaps. As an example:

Inline:
local a = { 
    'array', 
    'array', 
    'array', 
    'array',
    'array'
}
LOG(debug.allocatedsize(a)) 
-- > 40 + 40 = 80 bytes

Using `table.insert`:
local a = { }
table.insert(a, 'array')
table.insert(a, 'array')
table.insert(a, 'array')
table.insert(a, 'array')
table.insert(a, 'array')
LOG(debug.allocatedsize(a)) 
-- > 40 + 40 = 80 bytes

Manual approach:
local a = { }
a[1] = 'array'
a[2] = 'array'
a[3] = 'array'
a[4] = 'array'
a[5] = 'array'
LOG(debug.allocatedsize(a)) 
-- > 40 + 40 = 80 bytes

Manual approach with a mistake:
local a = { }
a[1] = 'array'
a[2] = 'array'
-- note that a[3] is missing
a[4] = 'hash'
a[5] = 'hash'
LOG(debug.allocatedsize(a)) 
-- > 40 + 16 + 80 = 136 bytes

This is why table.insert exists: it guarantees that the array part is populated sequentially and that the value is stored in the array part. The array part acts like an array in a language such as C: each element consumes 8 bytes, and accessing the array part is almost as fast as it is in C.

Hash part

The hash part is accessed using keys. A key can be anything but nil; usually keys are strings. Note that using tables or functions as keys will desync the game: they use their memory reference as the key, which is different for each player, and as a result the order of a for loop is no longer in sync. For the programmer the hash part acts as a key-value map: you give it a key and the table returns you a value.

As a programmer you do not need to know how the Lua environment returns the value that belongs to a key. But you do need to understand that there is a cost involved. Each element consumes 20 bytes in the hash part. Accessing the hash part involves a hash operation: a function that converts the key (a string) to a number.

A hash operation occurs each time you write a . or a key between square brackets [] when the key is not in the array part. Usually this is the case when the key is not a number. As a few examples:

local mesh = self.Blueprint.Display.Mesh
--               ^         ^       ^     -> three hash operations

local cats = categories.TECH1 * categories.UEF * categories.STRUCTURE
--                     ^                  ^                ^          -> three hash operations

local ui = self._visible["Horz"]
--             ^        ^^^^^^^^ -> two hash operations

All of these table accesses have the cost mentioned earlier.

Table functions

Some table functions only interact with the array part. We'll summarize a non-exhaustive list:

  • (1) table.insert
  • (2) table.getn
  • (3) table.random
  • (4) table.remove
  • (5) table.sort

Other table functions that operate on both parts:

  • (6) table.getsize
  • (7) table.empty

We mention this because these functions may not act as you'd expect if you do not know the distinction. As an example: table.getn does not take into account elements in the hash part, and may therefore be inaccurate if you expect them to be included.
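As a minimal sketch of that distinction, assuming the behavior described above:

local t = { 'a', 'b', 'c' } -- three elements in the array part
t.key = 'd'                 -- one element in the hash part

LOG(table.getn(t))          -- > 3, only the array part is counted
LOG(table.getsize(t))       -- > 4, both parts are counted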

Table in memory

A Lua table is an expensive data structure in comparison to data structures available in other languages. The header consumes 40 bytes, regardless of whether the table has any elements in it. Each element in the array part consumes 8 bytes. Each element in the hash part consumes 20 bytes.

We'll make a comparison to put into perspective how much more expensive these data structures are. For simplicity we'll compare against structs from C, starting with a data structure that represents a vector or a point. Such a data structure has two entries, one for each axis.

// occupies 8 bytes
struct Vector {
  float x;
  float y;
};
-- occupies 40 + 16 = 56 bytes
local vector1 = { 0, 0 }

-- occupies 40 + 80 = 120 bytes
local vector2 = { x = 0, y = 0 }

This initial comparison on a simple data structure shows the memory overhead of Lua tables: in comparison to a struct in C, a Lua table can occupy 7 to 15 times the amount of memory. A consequence of this is that we need to be careful how and when we use tables.

Performance and Lua tables

Throughout this section we'll use examples to clarify the techniques you can apply to improve the performance of Lua code that works with tables. The techniques are ordered from least intrusive to most intrusive. We'll use commits and pull requests as examples: they show you what changes we made to write code that achieves the same result, but with (a lot) less overhead.

Before you apply these techniques it is important to understand that you should only try to optimize code when it meets the following conditions:

  • (1) The code needs to be functional: you can't optimize code that doesn't exist or that is incomplete. Any attempt at optimizing before the code is functional is premature.
  • (2) The (functionality of the) code needs to be testable: you'll introduce significant changes to your code in order to optimize it. This will introduce bugs. It is therefore vital to be able to continuously test your changes as you progress.
  • (3) The (performance criteria of the) code needs to be measurable: no matter how good you (think you) are, if you can't measure it then you're blind. And if you're blind then you may be spending a lot of time on something that is irrelevant.

This is best explained in the explainer of premature performance improvements by xkcd. This article cannot help you achieve point (1) above. For point (2) you can think in terms of being able to run the sequence you're trying to optimize by pressing a button. And for point (3) you can measure time using the global GetSystemTimeSecondsOnlyForProfileUse and you can measure the (complete) size of a table using the function ToBytes as introduced by #4523.
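As a minimal sketch of such a measurement, where DoSomething is a placeholder for the sequence you are trying to optimize:

local start = GetSystemTimeSecondsOnlyForProfileUse()

for k = 1, 1000 do
    DoSomething() -- placeholder: the sequence you are trying to optimize
end

local duration = GetSystemTimeSecondsOnlyForProfileUse() - start
LOG('Duration: ' .. duration .. ' seconds')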

1. Only use tables when it is required

Often tables are used not because they are strictly required, but because they help us structure our code. We'll use two examples here:

CalculateBallisticAcceleration #4064

The function CalculateBallisticAcceleration allows bombers to properly compute their bomb trajectory. It is a perfect example of this behavior: the author used tables for their semantics. A table naturally groups values together, but functionally it is not required here. As a few examples, take these local definitions:

  • local proj = {pos=projectile:GetPosition(), vel=VMult(Vector(launcher:GetVelocity()), 10)}
  • local dist = {pos=XZDist(proj.pos, target.pos), vel=XZDist(proj.vel, target.vel)}
  • target.tpos = {target.pos[1] + time * target.vel[1], 0, target.pos[3] + time * target.vel[3]}

These definitions do not leave the scope of the function (they are not returned). Their size is constant - it does not depend on an input parameter. Therefore they solely have a semantic purpose. Yet technically we allocate various tables, use them once or twice, and then hand them to the garbage collector again. It is a good example of table trashing. The refactored code removes these semantics and instead uses more locals. As an example:

local time = distPos / distVel
local targetNewPosX = targetPosX + time * targetVelX
local targetNewPosZ = targetPosZ + time * targetVelZ
local targetNewPosY = GetSurfaceHeight(targetNewPosX, targetNewPosZ)

The refactored function no longer uses tables semantically. By doing so it saves almost a kilobyte worth of memory every time this function is called.

RotateTowardsEnemy f1bdf61 and 30cf907

The function RotateTowardsEnemy allows point defenses to rotate towards the nearest threat as they are created. It is another great example of this behavior: the author noticed that the complexity of the function was higher than it needed to be. In turn, the code was refactored and the result can be found in f1bdf61. Each 'better' threat would run the MakeTarget function, whose sole purpose is to return a new table. Each table uses 3 hash entries and therefore we're allocating a 120-byte table for each 'better' threat! A few months later the same author noticed this issue and refactored it in 30cf907: instead of allocating a new table for each threat, the same table is re-used.
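As a hypothetical sketch of the re-use pattern (the MakeTarget signature and field names shown here are illustrative, not the actual code of the commit):

-- allocate the table once, outside of the hot path ...
local target = { }

local function MakeTarget(position, distance, threat)
    -- ... and overwrite its fields for each 'better' threat, instead of
    -- allocating a new 120-byte table every time
    target.Position = position
    target.Distance = distance
    target.Threat = threat
    return target
end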

One has to wonder however: why do we need that table at all? The table is not returned and therefore only used within the scope of the function. We might as well declare three locals - like we did with the first example. It would save dozens of hash operations too, which coincidentally is our next topic 😄 !
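A hypothetical sketch of that alternative, where the best candidate is tracked with plain locals instead of a table (the function name, the threat format and the use of VDist2 are illustrative, not the actual code):

local function FindNearestThreat(threats, unitX, unitZ)
    -- three locals replace the per-threat table
    local bestX, bestZ, bestDistance
    for _, threat in ipairs(threats) do
        local px, pz = threat[1], threat[3]
        local distance = VDist2(px, pz, unitX, unitZ)
        if (not bestDistance) or distance < bestDistance then
            bestX, bestZ, bestDistance = px, pz, distance
        end
    end
    return bestX, bestZ, bestDistance
end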

2. Cache the hash operation

When tables are required we often end up running the same hash operations multiple times. These are not cached by Lua - and as we know by now, the hash operation has a price tag. We'll use two examples here:

Caching of categories
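As a generic sketch of the pattern (the category expression and the brain:GetListOfUnits call are merely illustrative):

-- uncached: three hash operations into categories, and a category multiplication,
-- every time the function runs
local function GetStructures(brain)
    return brain:GetListOfUnits(categories.TECH1 * categories.UEF * categories.STRUCTURE, false)
end

-- cached: the hash operations and the multiplication happen once, when the module loads
local UEFTech1Structures = categories.TECH1 * categories.UEF * categories.STRUCTURE
local function GetStructuresCached(brain)
    return brain:GetListOfUnits(UEFTech1Structures, false)
end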

Heapify #4399

The Heapify function allows us to restore the heap property, which keeps values stored in a specific order. A heap is a data structure that acts like a priority queue; it is used for path finding. These changes are a great example of caching the hash operation, as the performance of the code improved by more than 60%! For ease of understanding we'll put the changes next to each other. We highly recommend comparing them carefully.

No caching:
Heapify = function(self)
    local index = 1
    local left = self:ToLeftChild(index)
    local right = self:ToRightChild(index)

    while self.Heap[left] do
        local min = left

        if self.Heap[right] and 
            (self.Heap[right].TotalCosts < self.Heap[left].TotalCosts) 
        then
            min = right
        end
        
        if self.Heap[min].TotalCosts > self.Heap[index].TotalCosts then
            return
        end

        self:Swap(index, min)

        index = min
        left = self:ToLeftChild(index)
        right = self:ToRightChild(index)
    end
end,

Caching:
Heapify = function(self)
    -- cache of often used table entries
    local heap = self.Heap
    local heap_size = self.HeapSize

    local index = 1
    local left = 2 * index
    local right = 2 * index + 1

    while left <= heap_size do
        local min = left

        if heap[right] and 
            (heap[right].TotalCosts < heap[left].TotalCosts) 
        then
            min = right
        end

        if heap[min].TotalCosts > heap[index].TotalCosts then
            return
        end

        local tmp = heap[min]
        heap[min] = heap[index]
        heap[index] = tmp

        index = min
        left = 2 * index
        right = 2 * index + 1
    end
end,

You may notice that there are other changes too - Zjonn mentioned that caching the hash operations was primarily responsible for the performance improvements. This is a relatively simple change and it is safe when (see the sketch after this list):

  • The value you cache is read-only
  • The value you cache is a table, as tables are passed by reference
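
A minimal sketch of why the second condition matters: tables are cached by reference, while plain values such as numbers are copied:

local unit = { Health = 100 }

local cached = unit        -- a reference: both names point at the same table
unit.Health = 50
LOG(cached.Health)         -- > 50, the cached reference sees the change

local health = unit.Health -- a copy: the local holds the value itself
unit.Health = 25
LOG(health)                -- > 50, the local is now stale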

Re-use tables

Abuse tables

Re-organize tables