Shadow map atlas for maintainers

N8n5h edited this page Feb 20, 2021 · 3 revisions

Shadow Map Atlases Technical Article (WIP)

Introduction

This technical article gives contributors some insight into the implementation details of this feature. The sections are in no particular order. For an article that explains how to use this feature and its configuration, see this instead.

  • Introduction to shadow maps

Like most modern game engines, Armory uses shadow maps to draw shadows in a scene. The technique determines whether a pixel should be lit or in shadow by using the information from images called depth maps, which store the depth of the scene from the perspective of a light.

The technique varies slightly depending on the type of light, but the gist remains the same. Spot and directional lights require 1 image each (more for directional lights if cascading is enabled), since they "emit" light in a particular direction. Point lights require 6 for cubemap implementations, because they light the scene in all directions. The technique is an efficient way to draw shadows, but it has a few problems and limitations.
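
The depth comparison at the heart of the technique can be sketched as follows. This is an illustrative Python sketch and not Armory code: the function name is_lit, the bias value and the tiny 2x2 depth map are assumptions made for the example.

```python
def is_lit(depth_map, u, v, fragment_depth, bias=0.005):
    """Return True if the fragment is lit, False if it is in shadow.

    depth_map is a square list of rows holding the closest depth seen by
    the light; (u, v) are light-space coordinates in [0.0, 1.0].
    """
    size = len(depth_map)
    # Nearest-texel lookup, clamped to the edge of the map.
    x = min(int(u * size), size - 1)
    y = min(int(v * size), size - 1)
    closest = depth_map[y][x]
    # The fragment is lit when nothing closer to the light occludes it.
    return fragment_depth - bias <= closest


depth_map = [
    [0.30, 1.00],
    [1.00, 1.00],
]
print(is_lit(depth_map, 0.25, 0.25, 0.80))  # an occluder at depth 0.30 is in front
print(is_lit(depth_map, 0.75, 0.25, 0.80))  # nothing is in front of the fragment
```

The small bias is a common way to avoid self-shadowing caused by limited depth precision.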

  • Limited dynamic scenes

To handle many shadow maps, they are bundled into arrays when they are passed as uniforms to render the final image. The problem is that older OpenGL and WebGL standards don't allow dynamic indexing of ShadowMapSampler: the indices used to access the array must be known at compile time, so the accesses have to be unrolled into if/else if branches. This has a considerable effect on performance, because GPU cycles are wasted determining the correct sampler, and it also requires hard-coding a fixed number of samplers.

  • Shadow Map Atlasing

To solve this, shadow map atlases were introduced. The gist of the feature is that, instead of giving every light its own unique shadow map, all lights share a common shadow map (the atlas). This removes the need for arrays of shadow map samplers and for dynamically indexing them, since there is just a single big texture. The dynamic part moves to other parts of the shaders, namely the array of mat4 for spot lights and the array of vec4 for point lights. Besides solving the problem mentioned earlier, atlasing is also more efficient because it requires far fewer texture switches than the default path, which needs at least N (number of lights) texture switches and can be a lot slower. Shadow map atlases have these big advantages, but they also have a few problems.

Determining the correct UV coordinates in the atlas costs a few GPU cycles, and the cost is amplified for point lights, which require transforming between different faces because textures don't wrap with atlasing. Lastly, atlases also need some kind of management solution, which may be complicated if lights and scenes are expected to stay dynamic.

  • Implementation

So for the shader part, the premise was simple enough: use a big shadow map texture and do some manipulations to obtain the correct UV. For the Haxe part, however, deciding where each light's shadow maps should go in an atlas can be tricky. To solve this, ShadowMapAtlas and ShadowMapTile were created.

These classes provide the minimum functionality needed for this: an abstraction that prevents shadow maps from overlapping, finding available space, assigning that space efficiently, locking and freeing it, etc. This functionality is used by drawShadowMapAtlas(), which is a modified version of drawShadowMap().

  • Atlas for the implementation

Shadow map atlases are represented as perfectly balanced 4-ary trees, where each node represents a tile or slice that results from "dividing" a square image into 4 slices or sub-images. The root of the tree is a reference to a ShadowMapTile; this tile is "divided" into 4 slices, and the process repeats depth times. If depth is 1, the slices are just the initial 4 tiles of maximum size, which is the default size of the shadow map. arm_shadowmap_atlas_lod controls whether the code that supports more depth levels is included when compiling.
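
The subdivision can be illustrated with a small Python sketch. It is not the ShadowMapAtlas implementation: the function name subdivide, the (x, y, size) tuples and the 4096 atlas size are assumptions chosen for the example.

```python
def subdivide(x, y, size, depth):
    """Return the tiles (x, y, size) obtained by splitting a square `depth` times."""
    if depth == 0:
        return [(x, y, size)]
    half = size // 2
    tiles = []
    # Split the square into 4 equal sub-squares and recurse into each one.
    for dy in (0, half):
        for dx in (0, half):
            tiles.extend(subdivide(x + dx, y + dy, half, depth - 1))
    return tiles


# A 4096 atlas with depth 1 holds 4 tiles of 2048 each.
print(subdivide(0, 0, 4096, 1))
# With depth 2 every one of those tiles is split again: 16 tiles of 1024.
print(len(subdivide(0, 0, 4096, 2)))
```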

  • Tiles for the implementation

The tiles that populate an atlas tile tree are simple data structures that contain a reference to the light they are linked to, a tile array in case LOD is enabled, the coordinates where the tile starts in the atlas (from 0 to Shadow Map Size), and a reference to a linked tile for LOD. This simple definition gives tiles a theoretically small memory footprint, but in turn some functionality that arguably should be the responsibility of tiles (for example, knowing whether they overlap) falls on the code that uses them instead. This is to be revised in future iterations of this feature.

  • Rendering Process Explained

The process starts with drawShadowMapAtlas(), which is a modified version of drawShadowMap() from Inc. The original version of this function, explained simply, handles the logic for drawing shadow maps for the render path. To do so, it iterates over all lights of the active scene, and for each light it draws all meshes into a depth map, repeating the process if there are multiple faces to be rendered.

drawShadowMapAtlas() follows the same principle, but instead of iterating over all lights, it will instead iterate over all the atlases from ShadowMapAtlas and subsequently iterate over their main tiles. This key difference eases control of the order of how lights are rendered, since different faces of a light's shadow map are mapped to different tiles.

The process begins by iterating over all lights to check whether any of them can be added to the atlas: lights that were added recently or lights that need to be added to the atlas again. Candidates that meet the conditions are passed to addLight(light), which takes care of adding them to some atlas.

Right after that, the data for computing uvs for point lights in shaders will be updated by calling updatePointLightAtlasData().

Following that, the main iteration starts, which will go through all the atlases that ShadowMapAtlas has. Before going to the actual looping of tiles, some preparation is required: we start by getting the render target by calling getShadowMapAtlas(atlas), which is a modified version of getShadowMap(). After we get the proper render target, we initiate the stream of rendering by passing it to setTargetStream(shadowmap) of the active render path to prepare rendering and then we immediately clear the render target to avoid artifacts from previous renderings.

And then we reach the second iteration, where we go through all the active tiles of the atlas. For each tile, we first check whether the light linked to it is suitable for drawing shadows. If it's not, the tile is freed and the light unlocked, which allows the light to be considered again when going through all the lights at the beginning of the function.

Immediately after, if LOD is enabled, we check whether the tile size must change by calling getTileSize() and comparing the returned size with the initial one. If it must, we first check whether the new size is 0, which means the tile should be promptly removed from the loop; otherwise we store the new size in newTileSize for later and push the tile to tilesToChangeSize. The tile needs a new size, but we render it anyway: making sure tiles only change size after rendering avoids artifacts.

We then update the tileOffsetX, tileOffsetY and tileScale of each linked tile.
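
The tile loop described above, including the deferred size change, can be sketched roughly like this. It is a simplified Python sketch and not the Haxe code of drawShadowMapAtlas(): the Tile and Atlas classes, render_atlas() and its callback parameters are hypothetical, and resizing is reduced to overwriting a field, whereas the real code hands the tiles back to assignTiles().

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    light: str
    size: int
    new_size: int = 0


@dataclass
class Atlas:
    active_tiles: list = field(default_factory=list)


def render_atlas(atlas, wanted_size, casts_shadows):
    """Render every active tile once and return what was drawn this frame."""
    drawn = []
    tiles_to_change_size = []
    for tile in list(atlas.active_tiles):
        if not casts_shadows(tile.light):
            # Free the tile; the light can be added again on a later frame.
            atlas.active_tiles.remove(tile)
            continue
        size = wanted_size(tile.light)
        if size == 0:
            # A size of 0 means the tile leaves the loop right away.
            atlas.active_tiles.remove(tile)
            continue
        if size != tile.size:
            tile.new_size = size
            tiles_to_change_size.append(tile)
        drawn.append((tile.light, tile.size))  # still drawn at the old size
    # Size changes are applied only after all tiles were rendered.
    for tile in tiles_to_change_size:
        tile.size = tile.new_size
    return drawn


atlas = Atlas([Tile("A", 1024), Tile("B", 1024), Tile("C", 1024)])
wanted = {"A": 512, "C": 1024}
drawn = render_atlas(atlas, wanted.get, lambda light: light != "B")
print(drawn)
```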

Implementation details of different components related to shadow map atlases:

Completely optional: arm_shadowmap_atlas

Because this feature is brand new and thus experimental, it's completely optional and disabled by default.

To keep it optional, the custom compiler flag arm_shadowmap_atlas guards any code related to atlases. If this flag is omitted when building a project, the code related to this feature is ignored at compile time. This guarantees that any issue related to this feature is completely isolated, and consequently provides a safe choice for projects. In the future this will probably change, and atlases may become the default once the feature is robust enough.

addLight(light)

This function allows adding lights to atlases. The function itself decides which atlas a light is added to, and it creates an atlas for the light if none exists.

It then tries to assignTiles() to this light. If that succeeds, the tile returned by assignTiles() (if it's not null) is pushed to the activeTiles of the atlas so that it can enter the render loop.

If tiles couldn't be assigned to the light for some reason, the light is unlocked so that it can be considered for the waiting list again, a process that's repeated from the beginning of the function, which does the same if the light was culled.

assignTiles(light)

This function tries to assign tiles to a specific light. This is done in a few steps, detailed below:

  • First it obtains the size of the tiles for the light, either from the newTileSize attribute if LOD is enabled or via computeTileSize(). If the size is 0, the process stops.
  • It then invokes findCreateTiles() to find and create new tiles on demand, which is explained in more detail in its own section.
  • After findCreateTiles() is done, the tiles it returned are locked to the light passed to assignTiles() by calling their lockTile() function.
  • The tiles are then linked together through their linkedTile attribute in linkTiles().
  • At the end, it returns the value returned by linkTiles(), which is a reference to the first tile of the chain if findCreateTiles() was successful, and null otherwise.
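
The steps above can be sketched in Python as follows. This is a simplified illustration rather than the actual assignTiles(): the Tile class, the flat free_tiles list that stands in for findCreateTiles(), and the chain_length() helper are all hypothetical.

```python
class Tile:
    def __init__(self, size):
        self.size = size
        self.light = None        # light this tile is locked to, if any
        self.linked_tile = None  # next tile of the same light, if any


def assign_tiles(light, faces, tile_size, free_tiles):
    """Lock `faces` tiles of `tile_size` to `light` and return the first one."""
    if tile_size == 0:
        return None  # a size of 0 stops the process
    # Stand-in for findCreateTiles(): pick unlocked tiles of the right size.
    candidates = [t for t in free_tiles if t.light is None and t.size == tile_size]
    if len(candidates) < faces:
        return None
    chosen = candidates[:faces]
    for tile in chosen:
        tile.light = light  # lock the tile to the light
    for current, following in zip(chosen, chosen[1:]):
        current.linked_tile = following  # link the tiles into a chain
    return chosen[0]


def chain_length(first):
    count = 0
    while first is not None:
        count += 1
        first = first.linked_tile
    return count


pool = [Tile(1024) for _ in range(6)]
first = assign_tiles("point", 6, 1024, pool)
print(chain_length(first))
```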

linkedTile

Light types like point lights and the sun with cascading require multiple faces of their shadow map to be rendered. This introduces an implicit relation between tiles that are tied to the same light but not necessarily tied among themselves, which can introduce complications, e.g. when rendering them if it's not done in order. To solve this, tiles that share this relationship are linked together via their linkedTile attribute.

The linkedTile attribute allows linking a sequence of related tiles, by simply referencing the next tile of a given "chain" of tiles that represent multiple faces of a light.

The way it's currently designed, a tile acts as the front-end to every action that affects a chain of tiles, like rendering them, freeing them, or simply iterating them to write uniform values. The function forEachLinkedTile() allows iterating over a sequence of linked tiles while hiding the implementation details of this structure for the most part.
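
A chain of linked tiles behaves like a singly linked list. The following Python sketch illustrates the idea; the Tile class and the function names are stand-ins written for this example, not the Haxe implementation of linkTiles() or forEachLinkedTile().

```python
class Tile:
    def __init__(self, name):
        self.name = name
        self.linked_tile = None  # next tile of the chain, or None at the end


def link_tiles(tiles):
    """Chain the tiles in order and return the first one (None if empty)."""
    for current, following in zip(tiles, tiles[1:]):
        current.linked_tile = following
    return tiles[0] if tiles else None


def for_each_linked_tile(first, callback):
    """Call `callback` on every tile of the chain, starting at `first`."""
    tile = first
    while tile is not None:
        callback(tile)
        tile = tile.linked_tile


# Six faces of a point light, visited in order through the first tile only.
faces = [Tile(f"face{i}") for i in range(6)]
first = link_tiles(faces)
visited = []
for_each_linked_tile(first, lambda tile: visited.append(tile.name))
print(visited)
```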

freeTile(), lockTile(light), activeSubTiles

These serve several purposes, but mainly they control whether a tile can be linked to a light and, through lightInAtlas, whether a light can be considered again for inclusion in an atlas.

  • activeSubTiles: a variable that counts how many locked child tiles a given tile has. This lets findTilesNaive() know when to discard a tile with locked children. It's similar to reference counting.
  • lockTile(light) links a tile with a light, and increments the activeSubTiles counter if dynamic tiles are enabled.
  • freeTile() frees a tile .... . Also it unlinks a chain of tiles.
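
The reference-counting idea behind activeSubTiles can be sketched as follows. This Python sketch makes assumptions that the text does not confirm: that every ancestor of a locked tile has its counter updated, and the names Tile, lock(), free() and is_available() are hypothetical.

```python
class Tile:
    def __init__(self, parent=None):
        self.parent = parent
        self.light = None
        self.active_sub_tiles = 0  # locked tiles somewhere below this one

    def lock(self, light):
        self.light = light
        # Tell every ancestor that one more tile below it is in use.
        node = self.parent
        while node is not None:
            node.active_sub_tiles += 1
            node = node.parent

    def free(self):
        if self.light is None:
            return  # already free, nothing to undo
        self.light = None
        node = self.parent
        while node is not None:
            node.active_sub_tiles -= 1
            node = node.parent

    def is_available(self):
        # A tile can be handed out only if it and everything below it is free.
        return self.light is None and self.active_sub_tiles == 0


root = Tile()
child = Tile(root)
grandchild = Tile(child)
grandchild.lock("spot")
print(root.is_available())  # the root cannot be used while a sub tile is locked
grandchild.free()
print(root.is_available())
```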

findTile(), findTilesNaive()

tileOffsetX, tileOffsetY, tileScale and how atlasing works

These three arrays are key for atlasing to work correctly in the shaders. tileOffsetX and tileOffsetY store, for each tile, the offset where it starts in the atlas texture in (0.0, 1.0) coordinates, and tileScale stores the factor that scales each tile's coordinates from (0.0, 1.0) to (0.0, tileSize / atlasSize). So if an atlas is 4096 and some tile is 1024, the scale factor is 0.25 and coordinates go from (0.0, 1.0) to (0.0, 0.25), assuming the tile starts at (0.0, 0.0).
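
As a worked example of the scale and offset arithmetic, here is a minimal Python sketch. The function names tile_uv_params() and to_atlas_uv() are made up for the illustration, and the tile position (1024, 0) is an arbitrary choice.

```python
def tile_uv_params(tile_x, tile_y, tile_size, atlas_size):
    """Return (offset_x, offset_y, scale) in normalized atlas coordinates."""
    return (tile_x / atlas_size, tile_y / atlas_size, tile_size / atlas_size)


def to_atlas_uv(u, v, offset_x, offset_y, scale):
    """Map a UV in the (0.0, 1.0) range of one tile to its place in the atlas."""
    return (u * scale + offset_x, v * scale + offset_y)


# A 1024 tile that starts at pixel (1024, 0) of a 4096 atlas.
offset_x, offset_y, scale = tile_uv_params(1024, 0, 1024, 4096)
print(offset_x, offset_y, scale)
print(to_atlas_uv(0.0, 0.0, offset_x, offset_y, scale))  # first corner of the tile
print(to_atlas_uv(1.0, 1.0, offset_x, offset_y, scale))  # opposite corner
```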

These values are used in two different places depending on the light type.

  • For spot lights and the sun, they are used when computing the light matrices for the uniforms that are at the "end" of the rendering process, to obtain depth values from a shadow map (like in deferred.frag): _biasLightWorldViewProjectionProjectionMatrix (used for sun), and _biasLightWorldViewProjectionMatrixSpotArray for arrays of spot lights.

The values are used to form a matrix that transforms coordinates from (0.0, 1.0) to (0.0, tileScale) and then offsets them from (0.0, tileScale) to (tileOffsetStart, tileScale + tileOffsetStart). This is done in one step, by setting the _00 and _11 values of the matrix for the scale and the _30 and _31 values to displace them in X and Y respectively. This matrix is then multiplied (added to the composition) after the bias matrix, the one that transforms coordinates from (-1.0, 1.0) to (0.0, 1.0) for texture coordinates. So local space coordinates go directly to "tile offset and scale" coordinates when sampling depth from the atlas.
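
The composition can be checked with a small Python sketch using plain nested lists. It assumes column vectors with the translation in the last column (the slots the text calls _30 and _31) and a bias matrix that also remaps Z. The helper names are made up for the example, and the light's world view projection part is left out.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, point):
    """Apply a 4x4 matrix to a 3D point with w = 1."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Bias matrix: maps (-1.0, 1.0) to (0.0, 1.0).
BIAS = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]


def tile_matrix(offset_x, offset_y, scale):
    """Scale X and Y by the tile scale, then move them to the tile offset."""
    return [
        [scale, 0.0, 0.0, offset_x],
        [0.0, scale, 0.0, offset_y],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


# The tile transform is applied after the bias, so it goes on the left.
atlas_from_clip = mat_mul(tile_matrix(0.25, 0.5, 0.25), BIAS)
print(transform(atlas_from_clip, (-1.0, -1.0, 0.0)))  # lower corner of the tile
print(transform(atlas_from_clip, (1.0, 1.0, 1.0)))    # upper corner of the tile
```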

  • For point lights, the values need to be passed raw as a vec4 array so that the transformation is done directly in the shader. This is because of the way point light shadow mapping works, which is explained in more detail in its own section, but the logic is the same as for spot lights: scale and offset the texture coordinates accordingly.

There is an additional step that converts "tile offset and scale" coordinates to make them Direct3D compatible, since Direct3D textures are sampled "inverted" on the Y coordinate when looking at them from an OpenGL perspective. This is done by simply setting the Y coordinate to 1.0 - uv.y.

Illustration of transformation process

drawShadowMapAtlas()

Similar to its non-atlas counterpart, this function provides a means to draw shadow maps for all lights in the active scene. The key difference is that instead of looping over lights directly, drawShadowMapAtlas() loops over all atlases and their activeTiles.




Shaders

_ShadowMapAtlas

This is the flag that encloses all the code related to shadowmap atlases in the shaders.

LWVPSpotArray

Light world view projection matrices for spot lights are now passed as an array of mat4 whose size is maxLightsCluster. The key difference is that light.glsl uses dynamic indexing to get the correct matrix for a particular light index, without having to branch into a lot of ifs to find it. updateLWVPMatrixArray() is responsible for keeping this array up to date.

It's important to mention that this array is coupled to how updateClusters() and updateLightsArray() iterate over lights, and that it shares the same conditionals to discard lights in the loop. This is necessary so that a matrix can be indexed easily and without errors by the light index parameter of sampleLight() in light.glsl, and it avoids coupling the shader to how matrices are stored for the lights. For example, if there are 3 point lights and 1 spot light in the scene, and maxLightsCluster is 5 for the sake of the example, then the array would look something like this: [empty, empty, empty, matrix-for-spot-1, empty], since that is the same way lightsArray would store data for it, and the light index saved in the clusters would be 3, which matches the index in the matrices array.
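
The index alignment can be illustrated with a short Python sketch. It is only a model of the idea: build_spot_matrix_array() is a hypothetical name, lights are reduced to (kind, matrix) pairs, and None stands for an empty slot.

```python
def build_spot_matrix_array(lights, max_lights_cluster):
    """Place each spot light matrix at the same index the light has in the loop."""
    array = [None] * max_lights_cluster
    for index, (kind, matrix) in enumerate(lights[:max_lights_cluster]):
        if kind == "spot":
            array[index] = matrix
    return array


lights = [
    ("point", None),
    ("point", None),
    ("point", None),
    ("spot", "matrix-for-spot-1"),
]
matrices = build_spot_matrix_array(lights, 5)
print(matrices)
# The light index stored in the clusters (3 here) indexes the array directly.
print(matrices[3])
```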

The same occurs for pointLightDataArray and LWVArray.

deferred_light fragment shader deferred_light.frag.glsl

There are no notable changes other than the new uniforms (shadowMapAtlas, shadowMapAtlasSun, shadowMapAtlasSpot, shadowMapAtlasPoint) and the logic that makes them override the original shadow maps depending on the flags _ShadowMapAtlas and _SingleAtlas.

The logic that detects whether a light is a spot light was changed, because spot lights could be detected when there were really none in the scene, due to how clustering saves spot light info. This change required another modification in updateLightsArray(), explained in its own section.

to make sure that the value is correctly set to 0.0 in any case and then if it's spot it will be changed accordingly.

light.glsl

Notable changes: simplified spot light wvp matrix accessing, by not having branching and simply dynamically indexing with the light index to obtain the correct matrix.

shadows.glsl

PCFFakeCube()

The shader now supports fake cubemap lookups, so that all the faces of a cubemap can be stored in a single image, or in an atlas in this case. The world coordinates that are normally passed to the cubemap texture sampler are instead passed to sampleCube(), which is implemented following the OpenGL 2.0 specification. Behind the scenes it obtains the face index and the UV coordinates needed to sample from the correct 2D image / face.

The faceIndex obtained from sampleCube() is then used to obtain the tile offset and scale that correspond to that face in the atlas.

sampleCube()

This function is an implementation of the specification for sampling cubemaps in the OpenGL 2.0 document. It simply obtains a face and UV coordinates from the direction vector that goes from the center of the light to the pixel position in world coordinates.

Because filtering is lost when doing a manual cubemap lookup, seams appear in the shadow where the edges of the faces are. To remedy this, the UV coordinates are downscaled by a small factor before they are returned. This is a really cheap way to hide those seams, and the difference in the size of the shadow is almost unnoticeable.
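
The face selection can be sketched in Python following the major-axis table of the OpenGL cube map specification. This is not the GLSL code from shadows.glsl: the face order (+X, -X, +Y, -Y, +Z, -Z as 0 to 5), the seam_scale parameter and the exact place where the downscale is applied are assumptions made for the illustration.

```python
def sample_cube(direction, seam_scale=1.0):
    """Return (face, u, v) for a non-zero direction vector.

    Faces are numbered +X, -X, +Y, -Y, +Z, -Z as 0 to 5.
    """
    rx, ry, rz = direction
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    # The component with the largest magnitude selects the face.
    if ax >= ay and ax >= az:
        face, ma = (0 if rx > 0 else 1), ax
        sc, tc = ((-rz, -ry) if rx > 0 else (rz, -ry))
    elif ay >= az:
        face, ma = (2 if ry > 0 else 3), ay
        sc, tc = ((rx, rz) if ry > 0 else (rx, -rz))
    else:
        face, ma = (4 if rz > 0 else 5), az
        sc, tc = ((rx, -ry) if rz > 0 else (-rx, -ry))
    # Shrinking sc/tc toward the face center keeps lookups away from the edges.
    u = 0.5 * (sc * seam_scale / ma + 1.0)
    v = 0.5 * (tc * seam_scale / ma + 1.0)
    return face, u, v


print(sample_cube((1.0, 0.0, 0.0)))   # center of the +X face
print(sample_cube((0.0, 0.0, -2.0)))  # center of the -Z face
print(sample_cube((1.0, -1.0, 0.0), seam_scale=0.98))  # pulled in from the edge
```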

transformOffsetedUV()

This is a helper function used when applying PCF to point light shadows. Because of the nature of atlases, textures don't wrap. So instead we need to manually transform coordinates that fall outside a face to the proper face, depending on the direction of the UV coordinates.




arm_shadowmap_atlas_lod: LOD for shadowmaps

The arm_shadowmap_atlas_lod compiler flag encloses all the code that makes the atlas algorithm support tile subdivision and the assignment of such tiles to lights that need a shadow map of a different size.

In summary, LOD works by using the data from clustering to determine whether a light needs a shadow map of a different size. This data is then used to mark tiles for a "change" of size when looping over them during the rendering of an atlas. After the tiles are rendered, those that require a size change are looped over and passed to assignTiles() as a parameter, so that they are taken into account when assigning new tiles to the light they are tied to.

Detailed steps:




Iron

  • setTargetStream(), drawMeshesStream(), endStream()

These functions were created as an alternative that allows rendering continuously to the same render target, hence the Stream in the names. They are modified versions of setTarget() and drawMeshes(). There are some differences, but the most important one is that the logic that ends the rendering process was moved to its own function, so endStream() must be called manually to close it properly. The motivation was the observation that drawMeshes() closes the rendering process for you when you call it, which is fine when rendering to a normal render target, but for a render target that is rendered to over and over this adds overhead that can be avoided by closing the process manually.

drawMeshesStream() also drops some of the logic of the original, specifically the logic that detects whether a light should be skipped, since this is already done in drawShadowMapAtlas() and was wasted computation, and the logic that configures the viewport for cascading, because it didn't make sense to couple this class with atlasing logic.

  • zToShadowMapScale(), shadowMapScale, culledLight

These are some of the few changes introduced to updateClusters() in order to support LOD and view culling of lights shadows for atlasing. View culling works by detecting incorrect states of clustering which are deemed to result when lights fall out of view.

LOD support works by taking advantage of the clustering algorithm: the minZ value used when computing clusters is passed to zToShadowMapScale() to arbitrarily determine the size of the shadow map for a light.

zToShadowMapScale() is a custom function that maps [1, 16] (cluster Z slices) to [1.0, 0.0] (shadow map scale). The results were observed to be good enough, and cheaper than the solution proposed in the clustered shadows algorithm (determining the angle for each cluster).
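
The text only gives the end points of this mapping, so the following Python sketch assumes a plain linear interpolation between them. The real curve used by zToShadowMapScale() may differ, and the slices parameter is a name introduced for the example.

```python
def z_to_shadow_map_scale(z_slice, slices=16):
    """Map a cluster Z slice in [1, slices] to a scale in [1.0, 0.0].

    Assumption: the mapping is linear; only the end points come from the text.
    """
    z = min(max(z_slice, 1), slices)  # clamp to the valid slice range
    return 1.0 - (z - 1) / (slices - 1)


print(z_to_shadow_map_scale(1))    # closest slice: full size shadow map
print(z_to_shadow_map_scale(8.5))  # halfway through the slices
print(z_to_shadow_map_scale(16))   # farthest slice: scale 0.0
```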

  • discardLight(), discardLightCulled()

These functions were added to make more obvious the dependency on looping over (and discarding) lights under exactly the same conditions; otherwise the code is prone to issues because the uniform data for clusters would differ from the lights array data. A better solution would probably be to centralize the looping instead of just the conditions to discard lights, but that would require some thought, since the loops for clustering are different from the ones for the lights array.

  • pointLightsData

This array is updated by Inc from Armory, and it's used to provide the data to compute UVs for point lights in the atlas. It didn't make sense to couple Uniforms nor LightObject with atlasing logic, so the array is just stored in LightObject to avoid coupling.

  • updateLWVPMatrixArray(), LWVPMatrixArray

These allow passing world view projection matrices as uniform arrays. The UV coordinates of the specific tile in the atlas are composed directly into the transformation, which saves GPU cycles.

computeTileSize