Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize memory usage with DBString and DBArray #379

Merged
merged 37 commits into from Sep 1, 2020

Conversation

fmatthew5876
Copy link
Contributor

@fmatthew5876 fmatthew5876 commented Aug 16, 2020

Depends on: EasyRPG/Player#2280

DBString

  • Intended for storing read only database data.
  • Optimized for space, sizeof(DBString) == sizeof(char*)
  • Optimized for the case of many empty strings, no small string optimization as used in C++11 std::string.
  • Stores the size in the buffer, pascal style
  • Custom memory allocator for compact memory usage, fast loading, and optimal cache locality (coming soon)

DBarray

  • Same storage policy as DBString
  • Replaces std::vector

Performance Measurements

bench_readldb max memory usage

Game Ldb Size Master PR w/o allocator PR w/ allocator
HH3 16MB 135.7MB 94.7MB TBD
Heroes Realm 27MB 301.6MB 204.3MB TBD
Violated Heroine 11MB 75.9MB 54.6MB TBD
Yume 2kki 1.4MB 8.0MB 6.0MB TBD

bench_readldb average runtime

Game Master PR w/o allocator PR w/ allocator
HH3 0.522s 0.522s TBD
Heroes Realm 1.09s 1.09s TBD
Violated Heroine 0.323s 0.323s TBD
Yume 2kki 0.045s 0.046s TBD

Notes about Editor

DBString is optimal for read-only database usage of Player but not necessarily for Editor. For editor, maybe we want to still use std::string for all strings. Perhaps an EDITOR build mode of liblcf?

Given that Editor is not being worked on, I'm not super interested in putting too much effort on this side.

Todo

  • Write unit tests for string types
  • Retest liblcf bit-perfect read-writing
  • Final review what should use DBString and what should not.
  • Test Player
  • Write a custom memory allocator for DBString and benchmark it
  • Should we use a third party string_view like gsl or nonstd?
  • Best way to handle libfmt support / hacks?
  • Make DBString move only, or at least make copies reference same underlying string using allocator?
  • DBString from database should be read only and exist for the lifetime of the game, maps however come and go. How to reconcile with allocator? Maybe DB and Map arenas?
  • Add a "use std::string" for everything build mode for editor and other interactive tools.

@Ghabry
Copy link
Member

Ghabry commented Aug 16, 2020

Will it still be possible to allocate a new DBString and overwrite an existing DBString?
Otherwise the editor will have a huge problem because it can't write data anymore ^^'.
(lets ignore any speed impacts here, the editor will not run on slow hardware)

@fmatthew5876
Copy link
Contributor Author

fmatthew5876 commented Aug 16, 2020

Will it still be possible to allocate a new DBString and overwrite an existing DBString?
Otherwise the editor will have a huge problem because it can't write data anymore ^^'.
(lets ignore any speed impacts here, the editor will not run on slow hardware)

Right now DBString just does malloc() so yes, but once I put in the custom allocator that won't be the case.

That's why I'm suggesting either a separate build mode, or some template way of creating a std::string only database, or just auto generate 2 copies of all the lcf::rpg:: classes.

Essentially we need "fast read-only player mode" and "slow read/write editor mode" for liblcf somehow.

@fmatthew5876
Copy link
Contributor Author

Maybe the editor thing should be a separate physical library + headers somehow deps the base liblcf.{a|so} library.

I don't like the idea of BUILD modes. I would prefer if people install a binary liblcf package, they get everything they need for both Player and Editor.

As much as I hate adding a crap ton more auto generated headers to liblcf this is probably the way to go. For player we should only compile against and link to the fast implementation. We don't want to pay compile or link time costs for the slow one.

For editor we don't care. One might even want the fast implementation available so we can provide expected memory usage statistics in the editor.

@fmatthew5876
Copy link
Contributor Author

fmatthew5876 commented Aug 17, 2020

I pushed an experiment with deduplication. Basically each string is refcounted and we look it up in a hash table.

Unfortunately the results are not great. We get a meager savings in memory and a slower load time. For the memory, I can assume that even though a lot of strings are repeated, the memory overhead of the hash table dwarfs the savings. For the runtime, I could probably improve things by using a faster hash table.

Game Max Mem Usage Runtime
Yume 2kki 6Mb 0.73s
HH3 90.7MB 0.578s
HeroesRealm 1992.MB 1.18s

Overall looks like deduplication of strings is not worth it 👎

@fmatthew5876
Copy link
Contributor Author

fmatthew5876 commented Aug 17, 2020

And now lets try taking out the hash table deduplication, but keeping the reference counting mechanism for copies.

Game Max Mem Usage
Yume 2kki 6.1Mb
HH3 95.2MB
HeroesRealm 203.8MB

Memory usage is slightly up across the board. This is a also a loser. Not surprising, since we have a lot of strings and an extra 4 bytes for each one still adds up. 👎

Pre C++11 would have had a profile like this, as std::string used to be copy on write reference counted.

@fmatthew5876
Copy link
Contributor Author

I tried playing with the custom allocator, and it didn't produce much in the way of results. Seems like malloc does a pretty good job these days.

So this DBString class just being a very naive string which is allocated each time and doesn't support modification seems to fit the bill.

This also means for now we don't need to bifurcate liblcf. While this DBString is not super convenient to use like std::string, you can read and write to it from Editor just fine.

@fmatthew5876
Copy link
Contributor Author

fmatthew5876 commented Aug 17, 2020

One more important test. What are the memory usage differences if we take events out of the picture?

Max memory usage Toop::event_commands removed

Game Master PR
HH3 10.9MB 8.7MB
Heroes Realm 9.3MB 8.1MB
Violated Heroine 73.9MB 52.8MB
Yume 2kki 8.0MB 6.0MB

Max memory usage Toop::event_commands and CommonEvent::event_commands removed

Game Master PR
HH3 4.9MB 4.5MB
Heroes Realm 6.3MB 6.0MB
Violated Heroine 5.6MB 5.0MB
Yume 2kki 2.4MB 2.0MB

Event without events, this string type has memory savings for the rest of the database.

The conclusion here is that for real wins, we have to either compress or not keep all these event data into memory at once. As has been suggested in the past, I agree probably some caching mechanism would make sense.

For battles this is easy, we have several frames of battle transition where we could load the battle info from disk and keep it in an LRU cache.

For common events not so much, since they can trigger at any time and even 1 frame delay could break a game.

As was also suggested before, compressing the data would help a lot. Basically keeping the event code in it's binary form and decoding it on the fly as the events execute.

@Ghabry
Copy link
Member

Ghabry commented Aug 17, 2020

Very nice tests! From an editor pov that the allocator doesn't work is good news ;).
Working with DBString is worth the savings.

Nice work!
I would be curious about 32 bit tests as they give a better picture how most ports will behave. Though likely easier to do on Windows 🤔

@Ghabry
Copy link
Member

Ghabry commented Aug 17, 2020

How are the savings when you keep the RPG::eventcommand as is and only make the parameter vector uint8_t with the raw BER data? Should also safe some memory this way and keep the API mostly the same.

Another 4 bytes: Will it break games when the code and indent are int16_t? The values would all fit.

@fmatthew5876
Copy link
Contributor Author

fmatthew5876 commented Aug 18, 2020

How are the savings when you keep the RPG::eventcommand as is and only make the parameter vector uint8_t with the raw BER data? Should also safe some memory this way and keep the API mostly the same.

I did that here in #255

But check out the new post if you replace the entire event stream by uint8_t ...

Another 4 bytes: Will it break games when the code and indent are int16_t? The values would all fit.

I think this would be fine, but I would use uint16_t.

Not sure what the indent limit is in RPG_RT, but 255 indents seems like way too many.

A better option could be 24 bytes for the code and 8 bytes for the indent. 255 is the sentinal for subcommand_path.. That leaves the code space wide open millions of commands. A complex game building lots of custom functions could get stuck on 64K and we might want to use namespace like gaps in numbers for different patchsets, giving them room to expand without collision.

Another way to reduce 16 bytes, add a DBArray class which is like a std::vector that has the size embedded in the buffer like DBString. On 64 bit platforms we would reduce from a 24 byte std::vector to an 8 byte DBArray. This data structure would also help reduce memory size of items and animations, which also contain lots of sub-arrays.

Even better, make some EventRunner class. Store the entire array of all event command parameters in a single vector. Then use a int32_t offset in EventCommand to reference into it. Not only saving 4 bytes, but also saving all the memory allocation overhead and cache misses of all the individual parameter arrays.

Optimizing the event parsing for speed could have measurable effects on script heavy games like Frozen Triggers and Smash Bros.

@fmatthew5876
Copy link
Contributor Author

fmatthew5876 commented Aug 18, 2020

I've switched StringView to use this implementation:

https://github.com/martinmoene/string-view-lite

We now distribute include/lcf/third_party for third party header files we use in liblcf. The intention is that all third party types we put in here get wrapped in lcf:: namespace.

Also wrote unit tests.

I've decided to use DBString for ldb, lmt, and lmu files for now. We can always change this later if we want to and maybe we will just use DBString for all strings.

I'm really not thrilled about this:
https://github.com/fmatthew5876/Player/blob/string/src/string_view.h#L33

Basically, enable fmtlib printing support without including more than <fmt/core.h> you need to write a to_string_view() function which returns the fmt::string_view type. Since liblcf doesn't depend on fmt, I hacked it into the player header.

It's ugly, but it works and probably necessary if we don't want to add the dependency at the lcf level.

Player PR is up to date and building against this one.

This is ready for review.

@fmatthew5876 fmatthew5876 marked this pull request as ready for review August 18, 2020 06:04
// string-view lite, a C++17-like string_view for C++98 and later.
// For more information see https://github.com/martinmoene/string-view-lite
//
// Distributed under the Boost Software License, Version 1.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest mentioning it on README.md as we do with inih.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Makefile.am Outdated Show resolved Hide resolved
Terms,file,f,DBString,0x94,'',1,0,String
Terms,exit_game_message,f,DBString,0x97,'',1,0,String
Terms,yes,f,DBString,0x98,'',1,0,String
Terms,no,f,DBString,0x99,'',1,0,String
Music,name,f,String,0x01,"(OFF)",1,0,String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are music name and sound name the only non-save chunks shared with savegames?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, and maybe we'll just convert all the save data too at some point


typedef Traits traits_type;
typedef CharT value_type;
using char_type = value_type; // <- LIBLCF HACK: to workaround bug in older versions of fmtlib for Player: https://github.com/fmtlib/fmt/issues/1539
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting this. It's one line I had to add to this implementation to workaround bugs in old versions of fmtlib.

@fmatthew5876 fmatthew5876 changed the title Optimize memory usage with custom string types Optimize memory usage with DBString and DBArray Aug 19, 2020
@fmatthew5876 fmatthew5876 changed the title Optimize memory usage with DBString and DBArray Optimize memory usage with DBString Aug 19, 2020
@fmatthew5876 fmatthew5876 changed the title Optimize memory usage with DBString Optimize memory usage with DBString and DBArray Aug 26, 2020
@fmatthew5876
Copy link
Contributor Author

I've added DBArray and only support for EventCommand::parameters because it doesn't require refactoring liblcf and provides almost all the benefit. It also includes a refactor of DBString to unify both classes under the same allocator.

I've reproduced the performance numbers here.

Game Ldb Size std::string + std::vector DBString + std::vector DBString + DBArray
HH3 16MB 135.7MB 94.7MB 68.7 MB
Heroes Realm 27MB 301.6MB 204.3MB 143.3 MB
Violated Heroine 11MB 75.9MB 54.6MB 41.4 MB
Yume 2kki 1.4MB 8.0MB 6.0MB 4.9 MB

This is as far as this PR will go.

@fmatthew5876 fmatthew5876 marked this pull request as ready for review August 26, 2020 04:42
@fmatthew5876
Copy link
Contributor Author

fmatthew5876 commented Aug 26, 2020

Some numbers for HH3 memory usage in Player at the title screen in Windows 10 release build.

Branch Mem Usage
Master 178MB
DBString w/o empty string optimization 163MB
DBString 142MB
DBString + DBArray 120MB

One of the optimizations I do in this implementation matters a lot

That is for empty strings, you still need to create a 1 character buffer to the store the null terminator. In my implementation, I create a single const uint32_t[2] = { 0, 0} global variable which is used to store the "empty array". The first uint32_t is the 0 size, the second is to store up to 4 bytes of null terminator. Anytime you create an empty DBString or DBArray instead of allocating, it will just point it's buffer at this global.

Without this, we do a ton of 5 byte mallocs for lots of empty strings

bench_readldb on hh3's database will do 2448428 total DBString mallocs without the optimization and 309767 with the optimization. The table above shows 21MB of memory overhead from allocating all these empty strings.

This also greatly improves performance for empty strings, as they all point to the same memory and thus keeps empty strings hot in cpu cache.

This gives an easy way to disable reading some chunks,
for memory profiling
@Ghabry
Copy link
Member

Ghabry commented Aug 29, 2020

This PR adds some DBArray and #383 adds more DBArray?

@fmatthew5876
Copy link
Contributor Author

This PR adds some DBArray and #383 adds more DBArray?

Yes, this adds it only for EventCommand::parameters as that is the largest benefit and requires minimal lcf refactoring. #383 will apply it everywhere and include necessary refactoring for multiple array types.

@Ghabry Ghabry added this to the 0.6.3 milestone Sep 1, 2020
@Ghabry Ghabry merged commit b76e739 into EasyRPG:master Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants