-
Notifications
You must be signed in to change notification settings - Fork 0
/
PERFORMANCE.txt
168 lines (137 loc) · 5.32 KB
/
PERFORMANCE.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
GTX 970M
Test is initial start-up scene for Sponza, no moving of mouse
500 point lights
Seed = 0
Averaged values over a couple seconds
====================================
12/19
Release: Render Pass Pipeline, v1.0
Triangles drawn: 985334
Bits per pixel for deferred texturing:
RGB is not supported on my GPU which then converts them to RGBA
Position : RGB32F
- RGBA32F is 128 bits
Normal : RGB16F
- RGBA16f is 64 bits
Diffuse : RGB8
- RGBA8 is 32 bits
Specular : RGBA8
- RGBA8 is 32 bits
Depth : GL_DEPTH_COMPONENT
- Looks like this is probably 16bit
Total : 272 bits per pixel
Timings (microseconds):
FPS - 24.25368
Triangles: 982454
Frustum culling - 274.4117647
Shadow Map - 496.4818824
Deferred To Texture - 1158.057647
Deferred Lighting - 28563.24118
Transparent Time - 7845.497059
Post Process - 191.1284706
Total time : 38528.818 (38.5ms)
======================================
12/19
Getting rid of position texture (calculating using depth instead) and combining
diffuse and specular into 1 texture using bit offsets
Triangles drawn: 985334
Bits per pixel for deferred texturing:
RGB is not supported on my GPU which then converts them to RGBA
Normal : RGB16F
- RGBA16f is 64 bits
Diffuse/Specular/SpecEXP : RGBA16UI
- RGBA16 is 64 bits
Depth : GL_DEPTH_COMPONENT32
- 32bit
Total : 160 bits
Timings (microseconds):
FPS - 32.93074615
Frustum culling - 246.2307692
Shadow Map - 498.0504615
Deferred To Texture - 657.4695385
Deferred Lighting - 17941.93077
Transparent Time - 7852.306923
Post Process - 195.5569231
Total time : 27391.54538 (27.4ms)
======================================
12/29
Implementing octahedron encoding to optimize normals
Triangles drawn: 982454
Bits per pixel for deferred texturing:
Normal : RGB8
- This is 32 bits with the forced alpha
Diffuse/Specular/SpecEXP : RGBA16UI
- RGBA16 is 64 bits
Depth : GL_DEPTH_COMPONENT24
- 32 bits (24 + alpha 8) so that we can do a stencil buffer eventually
Total bits: 128
Timings (microseconds):
FPS - 33.67545714
Frustum culling - 277.2380952
Shadow Map - 494.4761905
Deferred To Texture - 576.2407619
Deferred Lighting - 17376.44762
Transparent Time - 7845.277143
Post Process - 192.5302857
Total time : 26762.2101 (26.8ms)
========================================
1/12/2020
Implementing custom memory manager to take better advantage of cache.
No meaningful performance difference from this implementation. Need to take a look at why that is.
Possibly due to too large of memory arrays (for storage of mesh data, which maybe isn't necessary
to be optimally stored since all that happens is it is sent to the GPU...)
Triangles drawn: 982454
Bits are same
Timings (microseconds):
FPS - 34.07703529
Frustum Culling - 184.4705882
Shadow Map - 494.4150588
Deferred To Texture - 582.4715294
Deferred Lighting - 17336.34706
Transparent Time - 7968.757647
Post Processing - 191.3185882
Total time : 26757.78047 (26.8ms)
======================================
1/25/2020
Discored NVidia Nsight. Did a bunch of performance analyzing and found some issues:
- Terrible L2 cache and texture fetch performance during light volumes
- - Makes sense since each light volume is in a different stop and thus acceses a random texture coordinate. Could be improved in tiled deferred but is worth taking a look now
- Stalling in shaders mainly due to texture look-ups
- - Find way to streamline these. Could also be due to above
=======================================
2/5/2020
Implemented a depth pre-pass to try and speed up my deferred to texture pass. Also combined
depth attachments for gbuffer and my intermediary framebuffer to avoid a slow, non-asynchronous
Blit during the transparent pass. We can't do a depth pre-pass for transparent geometry
as that will cause an issue with transparent objects and depth being just one value.
The boost here is the avoidance of using a Blit in the transparent pass (1.6ms) and pretty
much a wash on the depth prepass. Saved 0.2ms in the deferred to texture step, while
spending 0.13 on the pre-pass... so a 0.07ms boost
Triangles drawn: 1209781(2x opaque geometry, 1.23x overall due to light volumes)
The increase here is due to drawing all of our opaque geometry twice, once for the depth
pre-pass and once for gbuffer stuff.
Timings (microseconds):
FPS - 35.07676471
Frustum Culling - 199.9
Opaque Depth Pre-Pass - 131.1776
Shadow Map - 497.208
Deferred To Texture - 386.5392
Deferred Lighting - 17369.895
Transparent Time - 6367.4645
Post Processing - 188.2096
Total time : 25140.3939 (25.1ms)
I also tried different heuristics for sorting my light volumes to try and take advantage of
the texture cache as that is my biggest bottleneck, but to no avail.
Tiled-Deferred, here I come.
=======================================
5/9/2020
Fully implemented Tiled Deferred and Forward+
Huge boost in performance.
TLDL : Tiled Deferred performs 4x faster than Deferred Light Volumes. Forward+ can be anywhere from 2x as fast
to infinitely faster.
More analysis of results here:
https://docs.google.com/spreadsheets/d/16UaY-6x-lz7JujAXQqsjRXVe5sOM0ai-WGoXz7aaRy4/edit?usp=sharing
Biggest places for optimization in these two additions are finding a better light attenuation
algorithm for my point lights. Currently, they get really large and thus don't take full
advantage of Forward+. Also, finding a more accurate culling method than just tile frustums.
There are some false positives that should be taken care of.