Inline/Optimize SpriteBatch.DrawInternal() & SpriteFont.DrawInto() #5401

Merged
merged 21 commits into from Feb 16, 2017

Conversation

Projects
None yet
6 participants
@nkast
Contributor

nkast commented Jan 6, 2017

Inline SpriteBatch.DrawInternal() & SpriteFont.DrawInto().
Remove temporary Variables and unnecessary Computations.
#5347

Method Develop this PR difference
Draw(Texture2D, Vector2, Color) 0.18652 0.14050 75.3%
Draw(Texture2D, Rectangle, Color) 0.21533 0.12461 57.9%
Draw(Texture2D, Vector2, Rectangle, Color) 0.23807 0.17247 72.4%
Draw(Texture2D, Rectangle, Rectangle, Color) 0.25178 0.15590 61.9%
Draw(Texture2D, Vector2, Rectangle?, Color, float, Vector2, Vector2, SpriteEffects, float) 0.31193 0.29355 94.1%
Draw(Texture2D, Rectangle, Rectangle?, Color, float, Vector2, SpriteEffects, float) 0.31487 0.28922 91.9%
DrawString(SpriteFont, string, Vector2, Color) 0.48899 0.37718 77.1%
DrawString(SpriteFont, string, Vector2, Color, float, Vector2, Vector2, SpriteEffects, float) 0.69743 0.40235 57.7%

@nkast nkast changed the title from Inline/Optimize Inline DrawInternal() & SpriteFont.DrawInto() to Inline/Optimize SpriteBatch.DrawInternal() & SpriteFont.DrawInto() Jan 6, 2017

+ item.Texture = texture;
+
+ // set SortKey based on SpriteSortMode.
+ switch (_sortMode)

This comment has been minimized.

@tomspilman

tomspilman Jan 7, 2017

Member

Wouldn't this be easier to read (and maybe faster) as:

item.SortKey = _sortMode == SpriteSortMode.Texture ? texture.SortingKey : 0;
@tomspilman

tomspilman Jan 7, 2017

Member

Wouldn't this be easier to read (and maybe faster) as:

item.SortKey = _sortMode == SpriteSortMode.Texture ? texture.SortingKey : 0;
+
+ if (sourceRectangle.HasValue)
+ {
+ _tempRect = sourceRectangle.Value;

This comment has been minimized.

@tomspilman

tomspilman Jan 7, 2017

Member

I've always disliked this _tempRect stuff... are we sure using this temporary really saves any performance?

@tomspilman

tomspilman Jan 7, 2017

Member

I've always disliked this _tempRect stuff... are we sure using this temporary really saves any performance?

This comment has been minimized.

@nkast

nkast Jan 10, 2017

Contributor

I don't think so. I am going to remove it and use a local temp.
Although if I do the same for _texCoordTL & _texCoordBR we might get a small hit because C# always initialize variables to their default value.

@nkast

nkast Jan 10, 2017

Contributor

I don't think so. I am going to remove it and use a local temp.
Although if I do the same for _texCoordTL & _texCoordBR we might get a small hit because C# always initialize variables to their default value.

+ item.Texture = spriteFont.Texture;
+
+ // set SortKey based on SpriteSortMode.
+ switch (_sortMode)

This comment has been minimized.

@tomspilman

tomspilman Jan 7, 2017

Member

This test can go outside the loop entirely.

@tomspilman

tomspilman Jan 7, 2017

Member

This test can go outside the loop entirely.

This comment has been minimized.

@nkast

nkast Jan 10, 2017

Contributor

Missed that one...

@nkast

nkast Jan 10, 2017

Contributor

Missed that one...

@tomspilman

This comment has been minimized.

Show comment
Hide comment
@tomspilman

tomspilman Jan 9, 2017

Member

I checked and it seems there is some missing code coverage in the SpriteBatch tests. As part of this PR we should look to fix that to ensure we don't break any functionality.

Member

tomspilman commented Jan 9, 2017

I checked and it seems there is some missing code coverage in the SpriteBatch tests. As part of this PR we should look to fix that to ensure we don't break any functionality.

@nkast

This comment has been minimized.

Show comment
Hide comment
@nkast

nkast Jan 10, 2017

Contributor

I checked and it seems there is some missing code coverage in the SpriteBatch tests. As part of this PR we should look to fix that to ensure we don't break any functionality.

Yeah, the drawString() with StringBuilder I imagine. How should we tackle this one?Create a copy of every test in SpriteFontTest.cs? public void Rotated_sb(), public void Scaled_sb() etc?

Those methods are pretty similar, should I redirect them to Internal DrawString(SpriteFont, CharacterSource ,...) or is it too early to talk about recombining code?

Contributor

nkast commented Jan 10, 2017

I checked and it seems there is some missing code coverage in the SpriteBatch tests. As part of this PR we should look to fix that to ensure we don't break any functionality.

Yeah, the drawString() with StringBuilder I imagine. How should we tackle this one?Create a copy of every test in SpriteFontTest.cs? public void Rotated_sb(), public void Scaled_sb() etc?

Those methods are pretty similar, should I redirect them to Internal DrawString(SpriteFont, CharacterSource ,...) or is it too early to talk about recombining code?

@tomspilman

This comment has been minimized.

Show comment
Hide comment
@tomspilman

tomspilman Jan 11, 2017

Member

How should we tackle this one?Create a copy of every test in SpriteFontTest.cs?

Yeah... copy the most relevant tests and modify it to be sure each draw method is tested.

Those methods are pretty similar, should I redirect them

Combining code means we're back to sacrificing a little performance. In particular the non-rotated versions of the functions which are much much cheaper.

The StringBuilder one in particular is bad to share with the string versions because the whole goal of it is to avoid an allocation of converting from StringBuilder to string.

I think we need to depend on the unit tests to ensure we don't make any mistakes in the duplicated code to get the most performance we can out of it.

Member

tomspilman commented Jan 11, 2017

How should we tackle this one?Create a copy of every test in SpriteFontTest.cs?

Yeah... copy the most relevant tests and modify it to be sure each draw method is tested.

Those methods are pretty similar, should I redirect them

Combining code means we're back to sacrificing a little performance. In particular the non-rotated versions of the functions which are much much cheaper.

The StringBuilder one in particular is bad to share with the string versions because the whole goal of it is to avoid an allocation of converting from StringBuilder to string.

I think we need to depend on the unit tests to ensure we don't make any mistakes in the duplicated code to get the most performance we can out of it.

@tomspilman

This comment has been minimized.

Show comment
Hide comment
@tomspilman

tomspilman Jan 11, 2017

Member

@nkast - Also noticed looking at the DrawString tests they don't seem to verify that newlines or carriage returns work correctly. We should use some of the same strings from the measure string tests:

https://github.com/MonoGame/MonoGame/blob/develop/Test/Framework/Graphics/SpriteFontTest.cs#L124

To verify that text drawing is working correctly.

Member

tomspilman commented Jan 11, 2017

@nkast - Also noticed looking at the DrawString tests they don't seem to verify that newlines or carriage returns work correctly. We should use some of the same strings from the measure string tests:

https://github.com/MonoGame/MonoGame/blob/develop/Test/Framework/Graphics/SpriteFontTest.cs#L124

To verify that text drawing is working correctly.

@nkast

This comment has been minimized.

Show comment
Hide comment
@nkast

nkast Jan 13, 2017

Contributor

@tomspilman Ready.

Contributor

nkast commented Jan 13, 2017

@tomspilman Ready.

+ float w, h;
+ if (sourceRectangle.HasValue)
+ {
+ var srcRect = sourceRectangle.Value;

This comment has been minimized.

@MichaelDePiazzi

MichaelDePiazzi Jan 15, 2017

Contributor

Since this is all about micro-optimisations, I thought I'd mention that GetValueOrDefault() is actually slightly faster than Value - https://ericlippert.com/2012/12/20/nullable-micro-optimizations-part-one/

@MichaelDePiazzi

MichaelDePiazzi Jan 15, 2017

Contributor

Since this is all about micro-optimisations, I thought I'd mention that GetValueOrDefault() is actually slightly faster than Value - https://ericlippert.com/2012/12/20/nullable-micro-optimizations-part-one/

nkast added some commits Dec 3, 2016

Refactor Draw(Texture2D, Vector2, Color) & Draw(Texture2D, Rectangle,…
… Color)

Draw(Texture2D, Vector2, Color)
Average Draw(ms): 0.18652 (develop)
Average Draw(ms): 0.14050 (this PR) (75.3%)

Draw(Texture2D, Rectangle, Color)
Average Draw(ms): 0.21533 (develop)
Average Draw(ms): 0.12461 (this PR) (57.9%)
Refactor Draw(Texture2D, Vector2, Rectangle?, Color) & Refactor Draw(…
…Texture2D, Rectangle, Rectangle?, Color)

Draw(Texture2D, Vector2, Rectangle, Color)
Average Draw(ms): 0.23807 (develop)
Average Draw(ms): 0.17247 (this PR) (72.4%)

Draw(Texture2D, Rectangle, Rectangle, Color)
Average Draw(ms): 0.25178 (develop)
Average Draw(ms): 0.15590 (this PR) (61.9%)
Refactor Draw(Texture2D, Rectangle, Rectangle?, Color, float, Vector2…
…, SpriteEffects, float)

Draw(Texture2D, Rectangle, Rectangle?, Color, float, Vector2,
SpriteEffects, float)
Average Draw(ms): 0.31487 (develop)
Average Draw(ms): 0.28922 (this PR) (91.9%)
Refactor Draw(Texture2D, Vector2, Rectangle?, Color, float, Vector2, …
…Vector2, SpriteEffects, float)

Draw(Texture2D, Vector2, Rectangle?, Color, float, Vector2, Vector2,
SpriteEffects, float)
Average Draw(ms): 0.31193 (develop)
Average Draw(ms): 0.29355 (this PR) (94.1%)
refactor DrawString(SpriteFont, string, Vector2, Color) & DrawString(…
…SpriteFont, StringBuilder, Vector2, Color)

DrawString(SpriteFont, string, Vector2, Color)
Average Draw(ms): 0.48899 (develop)
Average Draw(ms): 0.38432 (this PR) (78.6%)
inline DrawString(SpriteFont, string, Vector2, Color, float, Vector2,…
… Vector2 scale, SpriteEffects, float) & DrawString(SpriteFont, ,,,

...StringBuilder, Vector2, Color, float, Vector2, Vector2 scale,
SpriteEffects, float)
simplify DrawString(SpriteFont, string, Vector2, Color, float, Vector…
…2,Vector2 scale, SpriteEffects, float) &

DrawString(SpriteFont, StringBuilder, Vector2, Color, float, Vector2,
Vector2 scale, SpriteEffects, float)
refactor DrawString(SpriteFont, string, Vector2, Color, float, Vector…
…2, Vector2, SpriteEffects, float) &

DrawString(SpriteFont, StringBuilder, Vector2, Color, float, Vector2, Vector2, SpriteEffects, float)

DrawString(SpriteFont, string, Vector2, Color, float, Vector2, Vector2, SpriteEffects, float)
Average Draw(ms): 0.69743 (develop)
Average Draw(ms): 0.60097 (this PR) (86.2%)
optimize rotation for DrawString(SpriteFont, string, Vector2, Color, …
…float, Vector2, Vector2, SpriteEffects, float) &

DrawString(SpriteFont, StringBuilder, Vector2, Color, float, Vector2, Vector2, SpriteEffects, float)

DrawString(SpriteFont, string, Vector2, Color, float, Vector2, Vector2, SpriteEffects, float)
Average Draw(ms): 0.69743 (develop)
Average Draw(ms): 0.41846 (this commit) (60%)
refactor DrawString(SpriteFont, string, Vector2, Color) & DrawString(…
…SpriteFont, StringBuilder, Vector2, Color)

DrawString(SpriteFont, string, Vector2, Color)
Average Draw(ms): 0.48899 (develop)
Average Draw(ms): 0.37718 (this PR) (77.1%)
refactor DrawString(SpriteFont, string, Vector2, Color, float, Vector…
…2, Vector2, SpriteEffects, float) &

DrawString(SpriteFont, StringBuilder, Vector2, Color, float, Vector2,
Vector2, SpriteEffects, float)

DrawString(SpriteFont, string, Vector2, Color, float, Vector2, Vector2,
SpriteEffects, float)
Average Draw(ms): 0.69743 (develop)
Average Draw(ms): 0.40235 (this commit) (57.7%)
@KonajuGames

This comment has been minimized.

Show comment
Hide comment
@KonajuGames

KonajuGames Feb 16, 2017

Contributor

Were @tomspilman's comments on Jan 11 addressed?

Contributor

KonajuGames commented Feb 16, 2017

Were @tomspilman's comments on Jan 11 addressed?

@KonajuGames

This comment has been minimized.

Show comment
Hide comment
@KonajuGames

KonajuGames Feb 16, 2017

Contributor

I would like to merge this, but want to make sure the raised comments were addressed first.

Contributor

KonajuGames commented Feb 16, 2017

I would like to merge this, but want to make sure the raised comments were addressed first.

@nkast

This comment has been minimized.

Show comment
Hide comment
@nkast

nkast Feb 16, 2017

Contributor

@KonajuGames

Were @tomspilman's comments on Jan 11 addressed?

Commit DrawString tests ad80453 improve code coverage of DrawString(), including newlines.
ex: https://github.com/nkast/MonoGame/blob/ad80453829c50e70c5a089166153964b12301543/Test/Framework/Graphics/SpriteFontTest.cs#L495-L503

@KonajuGames , @tomspilman
Would you like to move the Tests on another PR or is it good that way?

Contributor

nkast commented Feb 16, 2017

@KonajuGames

Were @tomspilman's comments on Jan 11 addressed?

Commit DrawString tests ad80453 improve code coverage of DrawString(), including newlines.
ex: https://github.com/nkast/MonoGame/blob/ad80453829c50e70c5a089166153964b12301543/Test/Framework/Graphics/SpriteFontTest.cs#L495-L503

@KonajuGames , @tomspilman
Would you like to move the Tests on another PR or is it good that way?

@KonajuGames

This comment has been minimized.

Show comment
Hide comment
@KonajuGames

KonajuGames Feb 16, 2017

Contributor

I think the tests here are fine as they are related. Merging. Thanks @nkast

Contributor

KonajuGames commented Feb 16, 2017

I think the tests here are fine as they are related. Merging. Thanks @nkast

@KonajuGames KonajuGames merged commit feb98d6 into MonoGame:develop Feb 16, 2017

5 checks passed

Build Mac, iOS, and Linux Finished TeamCity Build MonoGame :: Build Mac : Running
Details
Build Windows, Web, Android, and OUYA Finished TeamCity Build MonoGame :: Build Windows : Running
Details
Package Mac and Linux Finished TeamCity Build MonoGame :: Package Mac and Linux : Running
Details
Package Windows SDK Finished TeamCity Build MonoGame :: Package Windows : Running
Details
Test Windows Finished TeamCity Build MonoGame :: Test Windows : Tests passed: 1071, ignored: 10
Details
@nkast

This comment has been minimized.

Show comment
Hide comment
@nkast

nkast Feb 16, 2017

Contributor

Thanks @KonajuGames

Contributor

nkast commented Feb 16, 2017

Thanks @KonajuGames

@nkast nkast deleted the nkast:tncSpriteBatchDraw branch Feb 16, 2017

@tomspilman

This comment has been minimized.

Show comment
Hide comment
@tomspilman

tomspilman Feb 21, 2017

Member

@nkast @MichaelDePiazzi

So reviewing the current state of SpriteBatch I think the next optimization we could do would be changing SpriteBatchItem to a structure. This would require us changing things to use unsafe SpriteBatchItem* CreateBatchItem(). It would also mean sorting would change a bit... we would sort an index array that is then used to read from the _batchItemList in the correct order.

The main benefit of all this is that _batchItemList is a contiguous chunk of memory improving performance via cache coherency.

The other optimization I already did for PS4... just have to work out how to support it in the public repo. Basically using a pre-initialized IndexBuffer along with DrawUserIndexedPrimitives. This requires a new overload of DrawUserIndexedPrimitives that doesn't take an index array. The main benefit of this is avoiding the memory copy of index data to the GPU on each draw.

Any other thoughts?

Member

tomspilman commented Feb 21, 2017

@nkast @MichaelDePiazzi

So reviewing the current state of SpriteBatch I think the next optimization we could do would be changing SpriteBatchItem to a structure. This would require us changing things to use unsafe SpriteBatchItem* CreateBatchItem(). It would also mean sorting would change a bit... we would sort an index array that is then used to read from the _batchItemList in the correct order.

The main benefit of all this is that _batchItemList is a contiguous chunk of memory improving performance via cache coherency.

The other optimization I already did for PS4... just have to work out how to support it in the public repo. Basically using a pre-initialized IndexBuffer along with DrawUserIndexedPrimitives. This requires a new overload of DrawUserIndexedPrimitives that doesn't take an index array. The main benefit of this is avoiding the memory copy of index data to the GPU on each draw.

Any other thoughts?

@nkast

This comment has been minimized.

Show comment
Hide comment
@nkast

nkast Feb 21, 2017

Contributor

I think the next optimization we could do would be changing SpriteBatchItem to a structure.
The main benefit of all this is that _batchItemList is a contiguous chunk of memory improving performance via cache coherency.

That would be preferred. Having thousand of small objects is a waste of memory, put a pressure on the GC to iterate them and also are not packed close together in memory. Still I don't know how we can implement this, especially sorting. My previous attempts didn't yield good results, I have to try this again.
We already discussed the possibility where CreateBatchItem() returns a struct that's just an index to an array and a pointer to that array.
Some more thoughts on this: When we draw with Sorting, we already lose the benefit of cache coherency. When we draw with Deferred it's probably faster to write directly into the Vertex buffer. SpriteBatcher & SpriteBatchItem already provide a good abstraction/isolation, so we could try a strategy pattern with abstract classes or Interfaces and have concrete SpriteBatchers specialized for each SortMode.

Basically using a pre-initialized IndexBuffer along with DrawUserIndexedPrimitives. This requires a new overload of DrawUserIndexedPrimitives that doesn't take an index array.

I've done something similar for DX, instead of changing DrawUserIndexedPrimitives() I have a VertexBuffer inside SpriteBatcher, the code is similar to what DrawUserIndexedPrimitives() is doing internally. (Additionally, I map the buffer and write directly to the mapped memory, saving one memcopy, and also batch the entire buffer at once instead of flushing on every texture change).
The problem is that on OpenGL the DrawUserIndexedPrimitives path is faster that going through a VertexBuffer & IndexBuffer. My attempts to unify those using partial classes and PlatformX() methods results in code that's more complicated than I would like and not that good in performance.

Some more ideas for SpriteBatch:
-Keep Glyphs in an array and use an index to lookup and access Glyph data instead of Dictionary<char, Glyph> and .TryGetValue(c, out currentGlyph). This will save us the time spend to copy the Glyph struct from the dictionary and probably make Dictionary lookup faster. Other data structures might perform better in that scenario (SortedList<char,int>, HashTable, or a Binary search operating directly on the array).
-Cache texture coordinates TL/BR in Glyph.
-Batch up to 16 textures in one Flush, combined with a new technique in SpriteEffect. This one makes Draw ~x5.5 times faster on DX and ~x7 times faster on GL when you draw different textures with Deferred. Other SortModes including SortMode.Texture will see major boost as well. The catch is that I think it will perform terribly on GPU with SM2.0 and that it will override all texture slots after .End(). Also I don't know what happens when you push overlapping primitives with the same depth, does GPU streams somehow take into account the order of each primitive on IndexBuffer?

Contributor

nkast commented Feb 21, 2017

I think the next optimization we could do would be changing SpriteBatchItem to a structure.
The main benefit of all this is that _batchItemList is a contiguous chunk of memory improving performance via cache coherency.

That would be preferred. Having thousand of small objects is a waste of memory, put a pressure on the GC to iterate them and also are not packed close together in memory. Still I don't know how we can implement this, especially sorting. My previous attempts didn't yield good results, I have to try this again.
We already discussed the possibility where CreateBatchItem() returns a struct that's just an index to an array and a pointer to that array.
Some more thoughts on this: When we draw with Sorting, we already lose the benefit of cache coherency. When we draw with Deferred it's probably faster to write directly into the Vertex buffer. SpriteBatcher & SpriteBatchItem already provide a good abstraction/isolation, so we could try a strategy pattern with abstract classes or Interfaces and have concrete SpriteBatchers specialized for each SortMode.

Basically using a pre-initialized IndexBuffer along with DrawUserIndexedPrimitives. This requires a new overload of DrawUserIndexedPrimitives that doesn't take an index array.

I've done something similar for DX, instead of changing DrawUserIndexedPrimitives() I have a VertexBuffer inside SpriteBatcher, the code is similar to what DrawUserIndexedPrimitives() is doing internally. (Additionally, I map the buffer and write directly to the mapped memory, saving one memcopy, and also batch the entire buffer at once instead of flushing on every texture change).
The problem is that on OpenGL the DrawUserIndexedPrimitives path is faster that going through a VertexBuffer & IndexBuffer. My attempts to unify those using partial classes and PlatformX() methods results in code that's more complicated than I would like and not that good in performance.

Some more ideas for SpriteBatch:
-Keep Glyphs in an array and use an index to lookup and access Glyph data instead of Dictionary<char, Glyph> and .TryGetValue(c, out currentGlyph). This will save us the time spend to copy the Glyph struct from the dictionary and probably make Dictionary lookup faster. Other data structures might perform better in that scenario (SortedList<char,int>, HashTable, or a Binary search operating directly on the array).
-Cache texture coordinates TL/BR in Glyph.
-Batch up to 16 textures in one Flush, combined with a new technique in SpriteEffect. This one makes Draw ~x5.5 times faster on DX and ~x7 times faster on GL when you draw different textures with Deferred. Other SortModes including SortMode.Texture will see major boost as well. The catch is that I think it will perform terribly on GPU with SM2.0 and that it will override all texture slots after .End(). Also I don't know what happens when you push overlapping primitives with the same depth, does GPU streams somehow take into account the order of each primitive on IndexBuffer?

@tomspilman

This comment has been minimized.

Show comment
Hide comment
@tomspilman

tomspilman Feb 21, 2017

Member

Still I don't know how we can implement this, especially sorting.

You don't sort the SpriteBatchItem array. You keep a separate array which contains both an int index and the sort key. That should be a small struct of just 8 bytes. That is the thing you sort. You then use that array to index into the SpriteBatchItem array.

CreateBatchItem() returns a struct that's just an index to an array and a pointer to that array.

Actually CreateBatchItem() should just return an unsafe pointer to the item to fill in. This makes the code easier to follow and write and also eliminates extra bounds checking on the array.

When we draw with Sorting, we already lose the benefit of cache coherency.

Not at all. The struct array is way more cache friendly even if you are jumping around in that array.

Compared to an array of object pointers... you have the pointer array in one location in memory and each object possibly different places in memory. Each could be so far away it takes one cache hit per item. The second your game releases memory you make gaps which the allocator could try to fill when you allocate a new SpriteBatchItem. Each could be dozens of MBs away from each other. It is total chaos when it comes to the cache.

The it vastly better with the struct array.

When we draw with Deferred it's probably faster to write directly into the Vertex buffer.

Except we would have to flush if we run out of space in the vertex array... breaking from XNA behavior.

IMO just making things cache friendly is enough here.

Member

tomspilman commented Feb 21, 2017

Still I don't know how we can implement this, especially sorting.

You don't sort the SpriteBatchItem array. You keep a separate array which contains both an int index and the sort key. That should be a small struct of just 8 bytes. That is the thing you sort. You then use that array to index into the SpriteBatchItem array.

CreateBatchItem() returns a struct that's just an index to an array and a pointer to that array.

Actually CreateBatchItem() should just return an unsafe pointer to the item to fill in. This makes the code easier to follow and write and also eliminates extra bounds checking on the array.

When we draw with Sorting, we already lose the benefit of cache coherency.

Not at all. The struct array is way more cache friendly even if you are jumping around in that array.

Compared to an array of object pointers... you have the pointer array in one location in memory and each object possibly different places in memory. Each could be so far away it takes one cache hit per item. The second your game releases memory you make gaps which the allocator could try to fill when you allocate a new SpriteBatchItem. Each could be dozens of MBs away from each other. It is total chaos when it comes to the cache.

The it vastly better with the struct array.

When we draw with Deferred it's probably faster to write directly into the Vertex buffer.

Except we would have to flush if we run out of space in the vertex array... breaking from XNA behavior.

IMO just making things cache friendly is enough here.

@willmotil

This comment has been minimized.

Show comment
Hide comment
@willmotil

willmotil Feb 21, 2017

Contributor

-Keep Glyphs in an array and use an index to lookup and access Glyph data instead of Dictionary<char, Glyph> and .TryGetValue(c, out currentGlyph).

I thought about trying to do this but how does this work under localization remember chars are 2 bytes not 1. An 'e' in English is i doubt the same as 'e' in japanese '絵'. To say how are you going to assign it as a index to be looked up directly without a big array ?

Can you simply say var charindex = byte(text[i]); then glyph[charindex] or use a pointer to it.
What if the greek 'e' ends up having its defining byte in the higher order byte and the low order is just 0.

Im not familiar enough with the standard to say that all letter e's are always in the lower byte but maybe they are. Just saying this is something that gave me pause when i was entertaining the thought.

Contributor

willmotil commented Feb 21, 2017

-Keep Glyphs in an array and use an index to lookup and access Glyph data instead of Dictionary<char, Glyph> and .TryGetValue(c, out currentGlyph).

I thought about trying to do this but how does this work under localization remember chars are 2 bytes not 1. An 'e' in English is i doubt the same as 'e' in japanese '絵'. To say how are you going to assign it as a index to be looked up directly without a big array ?

Can you simply say var charindex = byte(text[i]); then glyph[charindex] or use a pointer to it.
What if the greek 'e' ends up having its defining byte in the higher order byte and the low order is just 0.

Im not familiar enough with the standard to say that all letter e's are always in the lower byte but maybe they are. Just saying this is something that gave me pause when i was entertaining the thought.

@KonajuGames

This comment has been minimized.

Show comment
Hide comment
@KonajuGames

KonajuGames Feb 21, 2017

Contributor
Contributor

KonajuGames commented Feb 21, 2017

@willmotil

This comment has been minimized.

Show comment
Hide comment
@willmotil

willmotil Feb 21, 2017

Contributor

I dunno but i think dictionary in this case is directly mapping via hashcode from the underlying chars base object.hashcode. So i think in this case dictionary lookup's would in fact be O(1) (if i said that right) and that's tough to beat already. Though it might be slowed a little thru its call chain im not too sure how fast it is here i would profile it first before attempting to beat it.

You don't sort the SpriteBatchItem array. You keep a separate array which contains both an int index and the sort key. That should be a small struct of just 8 bytes. That is the thing you sort. You then use that array to index into the SpriteBatchItem array.

Swapping the items in place via a indexed primitive drawing array or some sort of pointer swapping ?
Would this require the new overload your talking about ?

Contributor

willmotil commented Feb 21, 2017

I dunno but i think dictionary in this case is directly mapping via hashcode from the underlying chars base object.hashcode. So i think in this case dictionary lookup's would in fact be O(1) (if i said that right) and that's tough to beat already. Though it might be slowed a little thru its call chain im not too sure how fast it is here i would profile it first before attempting to beat it.

You don't sort the SpriteBatchItem array. You keep a separate array which contains both an int index and the sort key. That should be a small struct of just 8 bytes. That is the thing you sort. You then use that array to index into the SpriteBatchItem array.

Swapping the items in place via a indexed primitive drawing array or some sort of pointer swapping ?
Would this require the new overload your talking about ?

@theZMan

This comment has been minimized.

Show comment
Hide comment
@theZMan

theZMan Feb 22, 2017

Contributor

.NET uses 16-bit chars throughout. This covers the first 65536 character
codes in the Unicode range.

.Net strings are UTF-16 which doesn't mean a 16 bit char. Some characters could take 2 16 bit values to represent. See https://en.wikipedia.org/wiki/UTF-16#Description for the painful truth. Its unlikely many people will use encodings outside the BMP (https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane) but you can't assume 65536 characters.

Contributor

theZMan commented Feb 22, 2017

.NET uses 16-bit chars throughout. This covers the first 65536 character
codes in the Unicode range.

.Net strings are UTF-16 which doesn't mean a 16 bit char. Some characters could take 2 16 bit values to represent. See https://en.wikipedia.org/wiki/UTF-16#Description for the painful truth. Its unlikely many people will use encodings outside the BMP (https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane) but you can't assume 65536 characters.

@KonajuGames

This comment has been minimized.

Show comment
Hide comment
@KonajuGames

KonajuGames Feb 22, 2017

Contributor
Contributor

KonajuGames commented Feb 22, 2017

@willmotil

This comment has been minimized.

Show comment
Hide comment
@willmotil

willmotil Feb 22, 2017

Contributor

Edit ok my bad thats wrong thats nibbles.
So what does it break then thats the next question.

Supplementary Multilingual Plane.
Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts, and also reform orthographies like Shavian and Deseret. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; Emoji and other pictographic sets; and game symbols for playing cards, Mah Jongg, and dominoes.

Next question is it worth it ?

Symbols and notations include historic and modern musical notation; mathematical alphanumerics; Emoji and other pictographic sets;

People have already requested emoji support ect.
Remember you will be able to make the font just not use that text in your strings it would be culled.

Well i dunno here's dictionary though.

https://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,5f7cea043f17efa9

Contributor

willmotil commented Feb 22, 2017

Edit ok my bad thats wrong thats nibbles.
So what does it break then thats the next question.

Supplementary Multilingual Plane.
Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts, and also reform orthographies like Shavian and Deseret. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; Emoji and other pictographic sets; and game symbols for playing cards, Mah Jongg, and dominoes.

Next question is it worth it ?

Symbols and notations include historic and modern musical notation; mathematical alphanumerics; Emoji and other pictographic sets;

People have already requested emoji support ect.
Remember you will be able to make the font just not use that text in your strings it would be culled.

Well i dunno here's dictionary though.

https://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,5f7cea043f17efa9

@MichaelDePiazzi

This comment has been minimized.

Show comment
Hide comment
@MichaelDePiazzi

MichaelDePiazzi Feb 22, 2017

Contributor

I think the next optimization we could do would be changing SpriteBatchItem to a structure.

This definitely sounds worth trying.

-Keep Glyphs in an array and use an index to lookup and access Glyph data instead of Dictionary<char, Glyph> and .TryGetValue(c, out currentGlyph). This will save us the time spend to copy the Glyph struct from the dictionary and probably make Dictionary lookup faster. Other data structures might perform better in that scenario (SortedList<char,int>, HashTable, or a Binary search operating directly on the array).
-Cache texture coordinates TL/BR in Glyph.

I think spending some more time on optimising DrawString is definitely worthwhile. Text drawing can still be one of the more costly areas in my game. I've personally been meaning to spend some time on this for a while. So if I come up with something, I'll be sure to submit a PR.

Contributor

MichaelDePiazzi commented Feb 22, 2017

I think the next optimization we could do would be changing SpriteBatchItem to a structure.

This definitely sounds worth trying.

-Keep Glyphs in an array and use an index to lookup and access Glyph data instead of Dictionary<char, Glyph> and .TryGetValue(c, out currentGlyph). This will save us the time spend to copy the Glyph struct from the dictionary and probably make Dictionary lookup faster. Other data structures might perform better in that scenario (SortedList<char,int>, HashTable, or a Binary search operating directly on the array).
-Cache texture coordinates TL/BR in Glyph.

I think spending some more time on optimising DrawString is definitely worthwhile. Text drawing can still be one of the more costly areas in my game. I've personally been meaning to spend some time on this for a while. So if I come up with something, I'll be sure to submit a PR.

@theZMan

This comment has been minimized.

Show comment
Hide comment
@theZMan

theZMan Feb 22, 2017

Contributor

Other data structures might perform better in that scenario (SortedList<char,int>,
HashTable, or a Binary search operating directly on the array).

Dictionary is the generic replacement for HashTable - I hope we never consider using that. I dont know why the framework didn't deprecate it years ago.

There's no way that a SortedList or any kind of binary search is going to be faster than a hash table. A good hash table implementation gives close to O(1) time for accessing (See https://msdn.microsoft.com/en-us/library/xfhwa508.aspx) and since we know the input set at load time we should be able to preallocate the dictionary efficiently too (though it happens at Load so who cares ;-)) I suspect using Char as the Key is also super efficient - the default hash function on that should be trivial.

Really we need to profile here rather than just guessing what needs doing.

Contributor

theZMan commented Feb 22, 2017

Other data structures might perform better in that scenario (SortedList<char,int>,
HashTable, or a Binary search operating directly on the array).

Dictionary is the generic replacement for HashTable - I hope we never consider using that. I dont know why the framework didn't deprecate it years ago.

There's no way that a SortedList or any kind of binary search is going to be faster than a hash table. A good hash table implementation gives close to O(1) time for accessing (See https://msdn.microsoft.com/en-us/library/xfhwa508.aspx) and since we know the input set at load time we should be able to preallocate the dictionary efficiently too (though it happens at Load so who cares ;-)) I suspect using Char as the Key is also super efficient - the default hash function on that should be trivial.

Really we need to profile here rather than just guessing what needs doing.

@willmotil

This comment has been minimized.

Show comment
Hide comment
@willmotil

willmotil Feb 22, 2017

Contributor

I think their might even be a question of is it worth it to ditch dictionary because in my humble opinion it would have to be a big gain for losing functionality. As well as a possible breaking change to existing apps.
Not to mention as dropping the idea of any future extensiblity. Even if you did map this 1 to 1 with a array which is O(1) its not clear how much you actually gain.
Im not sure what dictionarys trick is exactly that allows it to avoid bucket collisions when it maps keys to values, but if you look into that code there appears to be some prime numbering magic going on. Along with pulling the reference id of the instance of the instances base.object i.e. in referring earlier to it using the object.hashcode opposed to a hashtable.

I agree that not only would you need a profiling that showed it was a big speed up, but also...
Then you have to justify ditching dictionary itself because it can use anything as a key you might find later on that this is something we end up regretting. As well it would be a possibly breaking change.

Contributor

willmotil commented Feb 22, 2017

I think their might even be a question of is it worth it to ditch dictionary because in my humble opinion it would have to be a big gain for losing functionality. As well as a possible breaking change to existing apps.
Not to mention as dropping the idea of any future extensiblity. Even if you did map this 1 to 1 with a array which is O(1) its not clear how much you actually gain.
Im not sure what dictionarys trick is exactly that allows it to avoid bucket collisions when it maps keys to values, but if you look into that code there appears to be some prime numbering magic going on. Along with pulling the reference id of the instance of the instances base.object i.e. in referring earlier to it using the object.hashcode opposed to a hashtable.

I agree that not only would you need a profiling that showed it was a big speed up, but also...
Then you have to justify ditching dictionary itself because it can use anything as a key you might find later on that this is something we end up regretting. As well it would be a possibly breaking change.

@theZMan

This comment has been minimized.

Show comment
Hide comment
@theZMan

theZMan Feb 22, 2017

Contributor

The prime number 'magic' is deciding how many buckets to use based on the number of items you want to store in the collection. That's how it tries to keep the number of items per bucket low.

Otherwise there is no magic, when there is a collision it does a quick loop to find the right one. That's still 'close to' O(1) unless your hashing algorithm is putting far too many in 1 bucket. If you hash function is return 0 then you will find all your accesses become O(N) because everything is in 1 bucket.

And I don't mean use profiling to show that it speeds up I mean use profiling to show what the current hot spots are for a realistic usage in a real game. It pretty easy to make fake examples that will show issues in specific places and then massive gains by fixing those issues but then it doesn't translate into real world gains because it wasn't a real problem.

Contributor

theZMan commented Feb 22, 2017

The prime number 'magic' is deciding how many buckets to use based on the number of items you want to store in the collection. That's how it tries to keep the number of items per bucket low.

Otherwise there is no magic, when there is a collision it does a quick loop to find the right one. That's still 'close to' O(1) unless your hashing algorithm is putting far too many in 1 bucket. If you hash function is return 0 then you will find all your accesses become O(N) because everything is in 1 bucket.

And I don't mean use profiling to show that it speeds up I mean use profiling to show what the current hot spots are for a realistic usage in a real game. It pretty easy to make fake examples that will show issues in specific places and then massive gains by fixing those issues but then it doesn't translate into real world gains because it wasn't a real problem.

@willmotil

This comment has been minimized.

Show comment
Hide comment
@willmotil

willmotil Feb 22, 2017

Contributor

Well in this case specifically were talking about text and the glyphs of a font so im sure if you directly convert the char to byte then map the byte to a array of values you can get a slight speed up in specific or general cases in a real application one that simply has a lot of text, which of course some do.

The only point im trying to make really ... is that there is a price for dropping dictionary beyond performance. Not only in possibly breaking changes but also for future improvements or functionality as in the case of, emoji's, word wrapping to rectangle areas, scientific notation. That's just off the top of my head im sure there are plenty more.
The alternative to this with direct array mapping which of course for a 256 index array, will in all cases be O(1). Is then later to use a very large array per font to get that functionality back.

Which would probably end up taking you back to the dictionary where you started.
Which i suppose is the point im really trying to convey, that should also be considered.

that is all i want to say on that point.

Well one more thing, I haven't actually brought this up yet, Maybe some others have thought of it and not mentioned it, But we should at some time consider creation of a drawstring overload that takes a previously calculated set of spritebatch items for reuse as this could potentially offer a enormous speed boost possibly even beyond regular sprite drawing. This might even be instanced or a set of areas in a instance. Dynamic text would still need to be drawn seperately to position but it could provide a way to have static text draw to screen at maximum speed. For this idea simply consider a spritebatch draw that returns a array of sets it has just calculated and then another draw that takes that array to actually directly draw, Consider this array to be a vertex array. While the first call need only be done one time.

Contributor

willmotil commented Feb 22, 2017

Well in this case specifically were talking about text and the glyphs of a font so im sure if you directly convert the char to byte then map the byte to a array of values you can get a slight speed up in specific or general cases in a real application one that simply has a lot of text, which of course some do.

The only point im trying to make really ... is that there is a price for dropping dictionary beyond performance. Not only in possibly breaking changes but also for future improvements or functionality as in the case of, emoji's, word wrapping to rectangle areas, scientific notation. That's just off the top of my head im sure there are plenty more.
The alternative to this with direct array mapping which of course for a 256 index array, will in all cases be O(1). Is then later to use a very large array per font to get that functionality back.

Which would probably end up taking you back to the dictionary where you started.
Which i suppose is the point im really trying to convey, that should also be considered.

that is all i want to say on that point.

Well one more thing, I haven't actually brought this up yet, Maybe some others have thought of it and not mentioned it, But we should at some time consider creation of a drawstring overload that takes a previously calculated set of spritebatch items for reuse as this could potentially offer a enormous speed boost possibly even beyond regular sprite drawing. This might even be instanced or a set of areas in a instance. Dynamic text would still need to be drawn seperately to position but it could provide a way to have static text draw to screen at maximum speed. For this idea simply consider a spritebatch draw that returns a array of sets it has just calculated and then another draw that takes that array to actually directly draw, Consider this array to be a vertex array. While the first call need only be done one time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment