Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Savestates: Use LZ4 algorithm for faster decompression #12217

Merged
merged 2 commits into from Oct 10, 2023

Conversation

malleoz
Copy link
Contributor

@malleoz malleoz commented Oct 1, 2023

Compared to LZO, LZ4 has near-identical compression ratio and compression time. However, LZ4 boasts substantial decompression time improvement. While compression time doesn't matter much for TASers, repeated savestate loading requires decompression to be fast, in order to improve efficiency during TAS sessions. While their website shows benchmarks that produce a 67% speedup in decompression time, in practice I found that for savestates it produces typical speedups of about 72%. This improvement is larger when the Internal Resolution is turned up higher, as this increases the typical savestate filesize.

LZ4 can compress at most LZ4_MAX_INPUT_SIZE ~= 2.11GB at a time. To handle the upper-bound, I compress LZ4_MAX_INPUT_SIZE at a time, looping each time so long as we have more bytes to compress. This scenario is very unlikely to affect anyone, as I had to increase my Internal Resolution to about 20x for this to occur on my end.

Note that Netplay also uses LZO for packet transfer of the memcard between players. Given that I have no experience with this feature, and to prevent possible regression from limited testing due to my lack of experience, I've chosen to leave Netplay untouched. Note that if we change Netplay to use LZ4, then the LZO dependency can be removed from the project.

Support has been kept for forwards/backwards compatibility. These changes support reading the version info for legacy LZO savestates in order to show the user what version the incompatible state was made on. Loading new LZ4 states on older versions of Dolphin results in the version string still being shown to the user in an error message.

These changes aim to future-proof the ability the add more uncompressed data to the header. We can adjust the size of StateExtendedHeader by incrementing StateExtendedBaseHeader.payload_offset. Thus we can support the serialization of any new fields, including char arrays.

@AdmiralCurtiss
Copy link
Contributor

Please use a submodule for lz4 and break out adding it into its own commit.

@AdmiralCurtiss
Copy link
Contributor

(I guess I could be convinced to have lz4 as not-submodule if you reduce it down to just the source + license without any of the extra files, it's pretty small and doesn't update all that often. But definitely break it out to its own commit.)

@malleoz
Copy link
Contributor Author

malleoz commented Oct 2, 2023

(I guess I could be convinced to have lz4 as not-submodule if you reduce it down to just the source + license without any of the extra files, it's pretty small and doesn't update all that often. But definitely break it out to its own commit.)

There's a lot of tinkering in regards to the CMakeLists.txt that I wouldn't say I'm comfortable safely trimming excess code from, so I will adjust this to be a submodule.

@Pokechu22
Copy link
Contributor

Though Dolphin stores the emulator version in the savestate, so as to disallow savestate loads across different versions, the version is stored in the COMPRESSED part of the savestate. Thus, changing the compression algorithm means that this will not be read before decompression is attempted.

Does this apply to the STATE_VERSION field?

// Don't forget to increase this after doing changes on the savestate system
constexpr u32 STATE_VERSION = 162; // Last changed in PR 11767

I'd hope that that's in an uncompressed header since that's the main thing that determines incompatibility (if the exact emulator version doesn't match it'll still try to load, but if the state version doesn't match it'll reject it). I'm not sure if it actually is or not though.

@malleoz
Copy link
Contributor Author

malleoz commented Oct 2, 2023

Though Dolphin stores the emulator version in the savestate, so as to disallow savestate loads across different versions, the version is stored in the COMPRESSED part of the savestate. Thus, changing the compression algorithm means that this will not be read before decompression is attempted.

Does this apply to the STATE_VERSION field?

// Don't forget to increase this after doing changes on the savestate system
constexpr u32 STATE_VERSION = 162; // Last changed in PR 11767

I'd hope that that's in an uncompressed header since that's the main thing that determines incompatibility (if the exact emulator version doesn't match it'll still try to load, but if the state version doesn't match it'll reject it). I'm not sure if it actually is or not though.

The check against STATE_VERSION is performed after decompression occurs sadly. Although I guess I might as well update the state version in this PR as well, for posterity's sake.

@malleoz malleoz force-pushed the savestate_lz4 branch 6 times, most recently from 42c3820 to 4d31bd4 Compare October 2, 2023 06:19
@malleoz
Copy link
Contributor Author

malleoz commented Oct 2, 2023

Sorry for the hectic rebases. OK, this is now a submodule, split into a separate commit. I've swapped from new buffer allocs to make_unique, and I've split the compression/decompression into separate functions. So I should be up-to-date on the current feedback!

@Pokechu22
Copy link
Contributor

Hmm. I'd prefer it if there was a backwards-compatible way of doing things (in that trying to load the savestate on an older version gives a suitable error message).

I think this could theoretically be done by either LZO-compressing only the version number or by disabling compression on the version number entirely (since a size of 0 in the header is treated as not compressed). I think we could have whatever data we want after it detects an incorrect version, but I'm not 100% sure. There are two reserved fields in the header, so it would be possible to use them for a new format identifier or something. I'm not entirely sure what that would look like and it would require testing against older versions to make sure things work; maybe the format this PR currently uses gives a good enough error message.

One other thing: Dolphin currently includes Zstd since the rvz format uses it. This is supposed to be very fast for decompression; it might be better to use it instead of adding a new algorithm. If LZ4 is faster or about the same in practice though then an extra dependency is probably fine.

@malleoz
Copy link
Contributor Author

malleoz commented Oct 2, 2023

I think this could theoretically be done by either LZO-compressing only the version number or by disabling compression on the version number entirely (since a size of 0 in the header is treated as not compressed).

If I understand correctly, lzo decompresses in chunks of 65536 bytes. To retrieve the version number you'd have to decompress that entire block, which seems wasteful.

There are two reserved fields in the header, so it would be possible to use them for a new format identifier or something. I'm not entirely sure what that would look like and it would require testing against older versions to make sure things work; maybe the format this PR currently uses gives a good enough error message.

Storing the revision in the header would be a good idea, but there'd definitely need to be discussion on how we would define this.

One other thing: Dolphin currently includes Zstd

They are in the same family of compression algorithms, but lz4 outperforms zstd in terms of decompression speeds. I don't have exact metrics, but here's one source: https://indico.fnal.gov/event/16264/contributions/36466/attachments/22610/28037/Zstd__LZ4.pdf

@malleoz
Copy link
Contributor Author

malleoz commented Oct 2, 2023

Rebased because I realized I had applied CMake-related changes in the second commit accidentally. This should be properly separated now.

@JosJuice
Copy link
Member

JosJuice commented Oct 2, 2023

I think we could have whatever data we want after it detects an incorrect version, but I'm not 100% sure.

That should be possible, but make sure it's after the version string, not just after the version number.

If I understand correctly, lzo decompresses in chunks of 65536 bytes. To retrieve the version number you'd have to decompress that entire block, which seems wasteful.

Then you could disable compression. The part of the file that needs to be readable by old Dolphin versions is only 20 bytes or so.

CMake/FindLZ4.cmake Outdated Show resolved Hide resolved
@AdmiralCurtiss
Copy link
Contributor

AdmiralCurtiss commented Oct 2, 2023

In case you missed this in the conversation you resolved, your CMake definitions are a bit wrong in general and don't work correctly with a system-installed lz4, I've fixed this here: AdmiralCurtiss@8f82639

@malleoz
Copy link
Contributor Author

malleoz commented Oct 2, 2023

In case you missed this in the conversation you resolved, your CMake definitions are a bit wrong in general and don't work correctly with a system-installed lz4, I've fixed this here: AdmiralCurtiss@8f82639

Thank you immensely for doing this. I'll apply these changes on my next rebase

@malleoz malleoz force-pushed the savestate_lz4 branch 3 times, most recently from 17290b4 to 745d1d2 Compare October 2, 2023 20:47
@malleoz
Copy link
Contributor Author

malleoz commented Oct 2, 2023

Updated my commits as follows:

  • Apply changes from @AdmiralCurtiss to fix CMake
  • Tuck compression/decompression logic inside helper functions, who loop while we still have bytes to compress/decompress
  • Migrate to using a different field in the header for size, leaving the previous field zero'd out
  • Leave version info uncompressed
    • Because the old size field is now zero'd out, old Dolphin will treat these new savestates as uncompressed. Thus it will read in the entire file, including the version info. Loading a new savestate format on older Dolphin versions will detect version mismatch and bail out.

I guess the last thing to debate is if we want to reserve empty space after the version info for future changes... I believe we don't need to do this. Any new data occurring after the version info and before the compressed data would result in the following:

  • lz4 tries to decompress starting at this new data
  • the compression fails, so we bail out
  • The buffer has been written to, so in State::DoState, the version check will occur
  • The version check will fail, so the state will not load

@malleoz malleoz force-pushed the savestate_lz4 branch 3 times, most recently from c395b8c to 9b75a46 Compare October 4, 2023 05:10
@malleoz
Copy link
Contributor Author

malleoz commented Oct 4, 2023

Alrighty, I've appended the version information to the original header. Note that because the version string is a char*, some arithmetic on reading/writing the header is uglier than before. I've added the extended header, factoring in future possibility of adjusting the payload_offset for more uncompressed data. I've tested the following scenarios which all work fine:

  • Loading super old savestates will access s_old_versions and correctly display the version range
  • Loading savestates with a different version string will error, showing the user that version string
  • Loading current savestates (compressed/uncompressed) succeed, including scenarios where we exceed LZ4_MAX_INPUT_SIZE
  • Loading current savestates on older versions of Dolphin shows the version string

Note that, like I said in my previous comment, I do not include the compressed size in the extended header. Instead I continuously read bytes so long as we have not read uncompressed_size bytes.

I appreciate any feedback, as I imagine with changes this large, there are a few improvements I could make that I don't immediately see.

Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.h Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Show resolved Hide resolved
@malleoz
Copy link
Contributor Author

malleoz commented Oct 4, 2023

Updated to include @AdmiralCurtiss latest suggestions.

  • Allow compressing/decompressing beyond 4GB by using u64 instead of u32
  • Validate we are serializing a 4 byte int for the length of each payload
  • Fixed illegal fmt expression
  • Broke headers into sub-structs to more clearly be able to write each header and determine flattened data length
  • Add bounds checking prior to doing some memcpy operations
  • Add more checks for failing to read from file
  • Tweak error messages to be a bit more helpful (differentiate between file not found and no more bytes to read)

@malleoz
Copy link
Contributor Author

malleoz commented Oct 4, 2023

Updated to remove <memory> import since we changed the structs to not store a unique_ptr anymore

Copy link
Contributor

@AdmiralCurtiss AdmiralCurtiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of small things... This is looking pretty good in terms of general structure to me now though.

Source/Core/Core/State.h Outdated Show resolved Hide resolved
Source/Core/Core/State.h Outdated Show resolved Hide resolved
Source/Core/Core/State.h Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
@malleoz
Copy link
Contributor Author

malleoz commented Oct 7, 2023

Updated to address all suggestions that I marked as resolved! Thanks for the thorough review. I left a reply to clarify the remaining comments from @AdmiralCurtiss

@malleoz
Copy link
Contributor Author

malleoz commented Oct 7, 2023

Updated to remove the unnecessary comment I had left. I guess the only thing I'd like to clear up that remains is whether to keep the buffer_data pointer in CompressAndDumpState() as const. I think it makes sense to store it as const and just use reinterpret_cast<char*>(const_cast<u8*>(raw_buffer)) but I don't know if this is best practice or not.

Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
Source/Core/Core/State.cpp Outdated Show resolved Hide resolved
@malleoz
Copy link
Contributor Author

malleoz commented Oct 9, 2023

Updated again! Thank you for the continued reviews. All suggestions should now be handled. Also I changed a few error messages from Core::DisplayMessage to PanicAlertFmtT. I believe anything pertaining to file read/parsing issues should use PanicAlertFmtT as this grabs the user's attention more to indicate savestate issues, although I'm not sure if this is the consensus.

Copy link
Contributor

@AdmiralCurtiss AdmiralCurtiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine now.

@crashGG
Copy link

crashGG commented Mar 24, 2024

After many years, updating zstd to the latest upstream version will become attractive enough. According to official data, the encoding and decoding performance of the latest version of zstd has been objectively improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants