Full ram disk cache for ISO decompression #1213
Comments
Do you think there is any relevant speed gain? I tested once with ISOs on a RAM disk vs a SATA HDD and didn't notice a difference.
Depends on the algorithm (gzip vs 7z) and the block size. It would allow using massive blocks, 32MB/128MB. Well, the idea is to have a big cache. The RAM disk part isn't important by itself; I guess a temporary file could work too. However, it might be easier to handle a file in the RAM disk. You can easily check the memory of the computer (just read a global variable). The file is automatically deleted and the RAM freed when you quit the application.
I don't know how/if the CSO implementation uses caching, but for gzip, the cache is configured as 200M in chunks of up to 256K (less at the end of the data), and it's searched linearly, where each cache hit moves the chunk to the top of the list (MRU). So I think cache misses are relatively expensive, at 800 searches (and misses). Though that's most probably still way faster than actual disk access. The config is here: https://github.com/PCSX2/pcsx2/blob/master/pcsx2/CDVD/GzippedFileReader.h#L24 . GZFILE_READ_CHUNK_SIZE is the amount we're decompressing at each zlib access, and that's also the cache chunk size. As for decompressing once and storing the entire data, be it on the heap or a ramdisk, the problem with this IMO is that it can take quite a lot of time to decompress the entire file. For a rough estimate, it's about the time it takes to create the gz index (since the entire file is decompressed and memory snapshots of the decompressor are saved at regular intervals; those snapshots are the index).
Well, a bigger block size helps the compression ratio but greatly hurts random access, since you typically have to decompress a full block to get any data in it. Also, this is, IMO, orthogonal to the cache size. We could easily increase the cache to 512M (maybe even 1G) without any negative consequences IMO; just change the config I pointed to earlier. If it turns out measurably slower due to the linear search, then we could use hashes or some other method to speed up the search.
Thanks for the info. Well, it isn't mandatory to unzip the full file at once. You can still decompress a block only on its first read. The only difference is that the cache would be infinite and directly mapped, so you never trash the content (except if you restart the game/emulator). Actually, one question: why isn't uncompressed data stored in a temp file? A bigger cache means less memory for the GSdx texture cache (or other parts of the emulator that need a cache too). Some games still require too much memory (well, it could be a VRAM limit too) :( The shared memory trick allows using physical memory without any impact on virtual memory. Got the idea from an Nvidia driver changelog.
It would be much more viable if we only had CD-sized games, but typically a PS2 game is 2 to 4GB. So yeah, such a system would have to copy/decompress into one (or several) file(s) in RAM in the background. The PCSX2 file reader would first look in that cache, and fetch from the source file directly if the data isn't already there. In that case the background service should stop prefetching for a short while, to keep disk access latency normal. Generally, this could be a good thing for smoother gameplay.
Note: decompressing the full ISO was one possibility, but it isn't mandatory. The general idea was to have a very large cache (from 1GB to 8GB) without requiring extra virtual memory.
What about the other supported compressed formats (like CSO)? Don't you have to emulate CD seek delays anyway, and doesn't that negate any regular speed gains (except for compressed ISOs, I guess, where decompression is a CPU drain if the game streams data)?
Anecdotal evidence here: I was reading some threads about Vulkan support because I wanted to try to get better performance in Juiced. When I read that PCSX2 wouldn't benefit from 64-bit words because of the amount of time spent on I/O, I had the idea of putting both PCSX2 and my .iso file on a ramdisk, so I tried it.
When I first created the ramdisk on Windows with ImDisk, I copied in the entire emulator folder (totaling 4.29GB); it immediately started at 800MB/s and finished within a few seconds. So it will be a manual process, and your memory card file should be somewhere neutral outside of what you copy into the ramdisk, or you will lose your files if you forget to copy them back out after you save. It might just be something that the user has to enable and manage, with a hearty cautionary statement up front, if they're playing any of the problematic games or are on some specific hardware combination that would benefit from what I'm about to explain.
Again, anecdotal evidence, but here is what I noticed with Juiced. Now, I understand this is one of the tougher ISOs to run. I tuned the settings back when I first began and got it very usable, but I usually noticed lag spikes and associated slowdowns when using nitrous. After copying everything (including the emulator) into the ramdisk, I didn't see any general speed increases, not even in the loading screens, which is a little interesting, but I noticed there were far fewer lag spikes! I believe PCSX2 is still running slightly below full speed, which didn't really appear to change, but the removal of the lag spikes was a nice improvement. After typing this out, I think I may be able to squeeze out a little more power by re-tuning the graphics settings and making a profile for ramdisk usage. It could behave differently and benefit from different settings now that the I/O speed has massively increased. I also don't know if anyone has tried tuning the settings while running in a ramdisk, nor do I know if there is any actual way for the emulator to benefit from this. It doesn't seem like there are any people out there other than those in this thread who have even thought of the idea, but I just wanted to report what I saw when I tried it. If I remember, I'll try to tune the settings sometime and report back if I can get better performance than I could the last time I made adjustments.
As far as I understand, ISO blocks are decompressed into a small cache (< 100MB). When the cache is full, decompressed data is evicted. The cache must remain small due to the limited virtual address space of 32-bit applications. This is annoying because decompression can be costly.
On a 64-bit OS, extra physical RAM is often available (50% of the market has 8GB+). The idea would be to create a ram disk file that is partially memory mapped. On Linux, you can create a pseudo file like
/dev/shm/PCSX2.iso. Disclaimer: the file is potentially limited to 4GB, so big ISOs might need 2 files. When block N is read, we check the status of the block (already decompressed into the ram disk file or not).
I suspect that mmap is rather fast, so we could potentially keep a single block in the in-memory cache, or at least reduce the cache size.
This way, decompression is done only once per block. Note: it would also be possible to support full decompression at startup, to avoid index generation. Maybe in a background thread, so you can read the start of the ISO while the end is still being decompressed.
@avih @ramapcsx2 any thought?