diff --git a/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/_index.md b/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/_index.md index e94238425a..c89cdc852c 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/_index.md +++ b/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/_index.md @@ -1,21 +1,17 @@ --- -title: Optimizing graphics vertex efficiency for Arm GPUs - -draft: true -cascade: - draft: true +title: Optimize graphics vertex efficiency for Arm GPUs minutes_to_complete: 10 -who_is_this_for: This is an advanced topic for Android graphics application developers. +who_is_this_for: This is an advanced topic for Android graphics application developers aiming to enhance GPU performance through smarter vertex optimization. learning_objectives: - - Optimize vertex representations on Arm GPUs - - How to interpret Vertex Memory Efficiency in Arm Frame Advisor + - Optimize vertex representations on Arm GPUs. + - Analyze Vertex Memory Efficiency using Arm Frame Advisor. prerequisites: - - An understanding of vertex attributes - - Familiarity with Arm Frame Advisor, part of Arm Performance Studio + - Understanding of vertex attributes. + - Familiarity with Arm Frame Advisor (part of Arm Performance Studio). 
author: - Andrew Kilroy @@ -43,13 +39,17 @@ further_reading: link: https://developer.arm.com/documentation/102693/latest/ type: documentation - resource: - title: Analyse a Frame with Frame Advisor + title: Analyze a Frame with Frame Advisor link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/analyze_a_frame_with_frame_advisor/ type: blog - resource: title: Arm Performance Studio link: https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio%20for%20Mobile type: website + - resource: + title: Attribute Layouts + link: https://developer.arm.com/documentation/101897/0304/Vertex-shading/Attribute-layout + type: website diff --git a/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/vme-learning-path.md b/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/vme-learning-path.md index 7df536257d..3eac65d19f 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/vme-learning-path.md +++ b/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/vme-learning-path.md @@ -6,45 +6,39 @@ weight: 5 layout: learningpathall --- -# Optimizing graphics vertex efficiency for Arm GPUs +## Diagnosing poor Vertex Memory Efficiency with Frame Advisor -You are writing a graphics application targeting an Arm Immortalis -GPU, and not hitting your desired performance. When running the Arm -Frame Advisor tool, you spot that the draw calls in your shadow map -creation pass have poor Vertex Memory Efficiency (VME) scores. How -should you go about improving this? +Imagine you're developing a graphics application targeting an Arm Immortalis GPU, but you're seeing subpar performance. -![Frame Advisor screenshot](fa-found-bad-vme-in-content-metrics.png) +After profiling your frame with Arm Frame Advisor, you might notice that the shadow map draw calls have low Vertex Memory Efficiency (VME), as shown in the image below. 
-In this Learning Path, you will learn about a common source of rendering -inefficiency, how to spot the issue using Arm Frame Advisor, and how -to rectify it. +This raises an important question: what's causing the inefficiency, and how can you fix it? +![Frame Advisor screenshot#center](fa-found-bad-vme-in-content-metrics.png "Arm Frame Advisor showing poor Vertex Memory Efficiency (VME) in shadow map draw calls.") + +## Finding a solution + +This Learning Path shows you how to address this problem by demonstrating: + +* Common sources of rendering inefficiencies. +* How to identify and rectify issues using Arm Frame Advisor. ## Shadow mapping -In this scenario, draw calls in the shadow map render pass are the -source of our poor VME scores. Let's start by reviewing exactly what -these draws are doing. +In this scenario, draw calls in the shadow map render pass are responsible for the low Vertex Memory Efficiency (VME) scores. To understand why, let's begin by reviewing what these draws are doing. -Shadow mapping is the mechanism that decides, for every visible pixel, -whether it is lit or in shadow. A shadow map is a texture that is -created as the first part of this process. It is rendered from the -point of view of the light source, and stores the distance to all of -the objects that light can see. Parts of a surface that are visible -to the light are lit, and any part that is occluded must be in shadow. +*Shadow mapping* is the mechanism that decides whether each visible pixel is lit or in shadow. The process begins by rendering a shadow map, a texture drawn from the point of view of the light source. This texture stores the distance to the nearest surfaces visible to the light. + +During the final render pass, the GPU compares each pixel's distance from the light to the corresponding value in the shadow map. If the pixel is farther away than what the light "sees," it’s considered occluded and rendered in shadow.
Otherwise, it is lit. ## Mesh layout -The primary input into shadow map creation is the object geometry for -all of the objects that cast shadows. In this scenario, let's -assume that the vertex data for each object is stored in memory as an -array structure, which is a commonly used layout in many applications: +The primary input for shadow map creation is the geometry of all objects that cast shadows. In this scenario, assume that each object’s vertex data is stored in memory as an array of structures, a layout commonly used in many applications: ``` C++ struct Vertex { float position[3]; - float color[3]. + float color[3]; float normal[3]; }; @@ -54,59 +48,38 @@ std::vector mesh { ``` -This would give the mesh the following layout in memory: +This gives the mesh the following layout in memory: -![Initial memory layout](initial-memory-layout.png) +![Initial memory layout#center](initial-memory-layout.png "Initial memory layout") -## Why is this sub-optimal? +## Why is this suboptimal? -This looks like a standard way of passing mesh data into a GPU, +At first glance, this looks like a standard way of passing mesh data into a GPU, so where is the inefficiency coming from? -The vertex data that is defined contains all of the attributes that -you need for your object, including those that are needed to compute -color in the main lighting pass. When generating the shadow map, -you only need to compute the position of the object, so most -of your vertex attributes will be unused by the shadow map generation -draw calls. - -The inefficiency comes from how hardware gets the data it needs from -main memory so that computation can proceed. Processors do not fetch -single values from DRAM, but instead fetch a small neighborhood of -data, because this is the most efficient way to read from DRAM. For Arm -GPUs, the hardware will read an entire 64 byte cache line at a time.
+The vertex data that is defined contains all of the attributes that you need for your object, including those that are needed to compute color in the main lighting pass. When generating the shadow map, you only need to compute the position of the object, so most of your vertex attributes will be unused by the shadow map generation draw calls. -In this example, an attempt to fetch a vertex position during shadow -map creation would also load the nearby color and normal values, -even though you do not need them. +The inefficiency comes from how GPUs fetch vertex data from main memory. GPUs don't retrieve individual values from DRAM. Instead, they fetch a small neighborhood of data at once, which is more efficient for memory access. On Arm GPUs, this typically means reading an entire 64-byte cache line at a time. +In this example, fetching a vertex position for shadow map rendering also loads the adjacent color and normal attributes into cache, even though they're not needed. This wastes memory bandwidth and contributes to poor Vertex Memory Efficiency (VME). -## Detecting a sub-optimal layout +## Detecting a suboptimal layout -Arm Frame Advisor analyzes the attribute memory layout for each draw -call the application makes, and provides the Vertex Memory Efficiency -(VME) metric to show how efficiently that attribute layout is working. +Arm Frame Advisor analyzes the vertex attribute memory layout for each draw call and reports a Vertex Memory Efficiency (VME) metric to show how efficiently the GPU accesses vertex data. -![Location of vertex memory efficiency in FA](fa-navigate-to-call.png) +![Location of vertex memory efficiency in FA#center](fa-navigate-to-call.png "Location of vertex memory efficiency in Frame Advisor") -A VME of 1.0 would indicate that the draw call is making an optimal -use of the memory bandwidth, with no unnecessary data fetches. 
+A VME of 1.0 indicates that the draw call is making optimal use of the memory bandwidth, with no unnecessary data fetches. -A VME of less than one indicates that unnecessary data is being loaded -from memory, wasting bandwidth on data that is not being used in the -computation on the GPU. +A VME score below 1.0 indicates that unnecessary data is being loaded from memory, wasting bandwidth on attributes not being used in the computation on the GPU. -In this mesh layout you are only using 12 bytes for the `position` -field, out of a total vertex size of 36 bytes, so your VME score would -be only 0.33. +In this mesh layout you are only using 12 bytes for the `position` field, out of a 36-byte vertex, resulting in a VME score of 0.33. +## Fixing a suboptimal layout -## Fixing a sub-optimal layout +Shadow mapping only needs to load position, so to fix this issue you need to use a memory layout that allows position to be fetched in isolation from the other data. It is still preferable to leave the other attributes interleaved. -Shadow mapping only needs to load position, so to fix this issue you -need to use a memory layout that allows position to be fetched in -isolation from the other data. It is still preferable to leave the -other attributes interleaved. On the CPU, this would look like the following: +On the CPU, this looks like the following: ``` C++ struct VertexPart1 { @@ -114,7 +87,7 @@ struct VertexPart1 { }; struct VertexPart2 { - float color[3]. + float color[3]; float normal[3]; }; @@ -127,35 +100,14 @@ std::vector mesh { }; ``` -This allows the shadow map creation pass to read only useful position -data, without any waste. The main lighting pass that renders the full -object will then read from both memory regions. - -The good news is that this technique is actually a useful one to apply -all of the time, even for the main lighting pass! Many mobile GPUs, -including Arm GPUs, process geometry in two passes.
The first pass -computes only the primitive position, and second pass will process -the remainder of the vertex shader only for the primitives that are -visible after primitive culling has been performed. By splitting -the position attributes into a separate stream, you avoid wasting -memory bandwidth fetching non-position data for primitives that are -ultimately discarded by primitive culling tests. +This allows the shadow map creation pass to read only useful position data, without any waste. The main lighting pass that renders the full object will then read from both memory regions. +The good news is that this technique is actually a useful one to apply all of the time, even for the main lighting pass! Many mobile GPUs, including Arm GPUs, process geometry in two passes: an initial pass that computes only primitive positions, followed by a second pass that runs the full vertex shader only for primitives that survive culling. By placing position data in a separate buffer or stream, you reduce memory bandwidth wasted on fetching attributes like color or normals for primitives that are ultimately discarded. -# Conclusion +## Conclusion -Arm Frame Advisor can give you actionable metrics that can identify -specific inefficiencies in your application to optimize. +Arm Frame Advisor provides actionable metrics that can help identify specific inefficiencies in your graphics application. The Vertex Memory Efficiency metric measures how efficiently you are using your input vertex memory bandwidth, indicating what proportion of the input data is actually consumed by the shader program. You can improve VME by adjusting your vertex memory layout to separate attribute data into distinct streams, ensuring that each render pass only loads the data it needs. Avoid packing unused attributes into memory regions accessed by draw calls, as this wastes bandwidth and reduces performance. 
-The VME metric shows how efficiently you are using your input -vertex memory bandwidth, indicating what proportion of the input -data is actually used by the shader program. VME can be improved by -changing vertex memory layout to separate the different streams of -data such that only the data needed for type of computation is packed -together. Try not to mix data in that a computation would not use. -# Other links -Arm's advice on [attribute layouts][2] -[2]: https://developer.arm.com/documentation/101897/0304/Vertex-shading/Attribute-layout
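
The 0.33 figure quoted in the changed text can be reproduced with a little arithmetic. The following is a minimal sketch, assuming 4-byte floats with no struct padding and assuming that VME can be approximated as the bytes a pass consumes divided by the per-vertex bytes in the stream it reads. The helper names `estimate_vme`, `shadow_pass_vme_interleaved`, and `shadow_pass_vme_split` are hypothetical and are not part of Frame Advisor or the Learning Path; the way Frame Advisor computes the metric internally may differ.

```cpp
#include <cassert>
#include <cstddef>

// Interleaved layout from the Learning Path: all attributes in one struct.
struct Vertex {
    float position[3];
    float color[3];
    float normal[3];
};

// Split layout from the Learning Path: position in its own stream,
// with the remaining attributes left interleaved.
struct VertexPart1 {
    float position[3];
};

struct VertexPart2 {
    float color[3];
    float normal[3];
};

// Assumption: 4-byte floats and no padding, matching the 12- and 36-byte
// figures used in the text.
static_assert(sizeof(Vertex) == 36, "expected a 36-byte interleaved vertex");
static_assert(sizeof(VertexPart1) == 12, "expected a 12-byte position stream");
static_assert(sizeof(VertexPart2) == 24, "expected a 24-byte attribute stream");

// Hypothetical helper: fraction of each vertex's bytes in a stream that the
// pass actually consumes. This is a simplified stand-in for the VME metric.
inline double estimate_vme(std::size_t used_bytes, std::size_t stride_bytes) {
    assert(stride_bytes > 0 && used_bytes <= stride_bytes);
    return static_cast<double>(used_bytes) / static_cast<double>(stride_bytes);
}

// Shadow pass reading positions out of the interleaved struct: 12 / 36.
inline double shadow_pass_vme_interleaved() {
    return estimate_vme(sizeof(Vertex::position), sizeof(Vertex));
}

// Shadow pass reading positions from the dedicated stream: 12 / 12.
inline double shadow_pass_vme_split() {
    return estimate_vme(sizeof(VertexPart1::position), sizeof(VertexPart1));
}
```

Under these assumptions, `shadow_pass_vme_interleaved()` evaluates to roughly 0.33 and `shadow_pass_vme_split()` to 1.0, which matches the before-and-after behavior the revised text describes for the shadow map pass.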