
Full ram disk cache for ISO decompression #1213

Open
gregory38 opened this issue Mar 3, 2016 · 9 comments

Comments


@gregory38 gregory38 commented Mar 3, 2016

As far as I understand, ISO blocks are decompressed into a small cache (< 100 MB). When the cache is full, the decompressed data is discarded. The cache must remain small due to the limited virtual address space of a 32-bit application. This is annoying because decompression can be costly.

On a 64-bit OS, extra physical RAM is often available (50% of the market has 8 GB+). The idea would be to create a RAM-disk file that is partially memory mapped. On Linux, you can create a pseudo-file such as /dev/shm/PCSX2.iso. Disclaimer: the file is potentially limited to 4 GB, so big ISOs might need two files.
When block N is read, we check the status of the block:

  • not valid => decompress the data, munmap the old block, and mmap the new one
  • valid but not mapped => munmap the old block and mmap the new one
  • valid and mapped => nothing to do

I suspect that mmap is rather fast, so we could potentially keep a single block in the cache, or at least reduce the size of the cache.
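A minimal sketch of that three-state check, assuming a file created under /dev/shm and a hypothetical decompressor (faked here by filling each block with a repeated letter):

```cpp
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <vector>

// Illustrative sketch, not PCSX2 code: decompressed blocks live in a file on
// /dev/shm (so they stay cached in physical RAM), but only one block at a
// time is mapped into the process, keeping virtual address usage tiny.
struct BlockCache {
    int fd;                       // file under /dev/shm
    size_t block_size;            // must be a multiple of the page size
    std::vector<bool> valid;      // block already decompressed into the file?
    void* mapped = nullptr;       // currently mapped block, if any
    size_t mapped_idx = SIZE_MAX;

    const char* read_block(size_t n) {
        if (mapped_idx == n)                       // valid and mapped
            return static_cast<const char*>(mapped);
        if (mapped) {                              // munmap the old block
            munmap(mapped, block_size);
            mapped = nullptr;
        }
        if (!valid[n]) {                           // not valid: decompress
            // A real implementation would decompress block n here; we fake
            // the payload with one repeated letter per block.
            std::vector<char> buf(block_size, char('A' + n % 26));
            pwrite(fd, buf.data(), buf.size(), off_t(n * block_size));
            valid[n] = true;
        }
        mapped = mmap(nullptr, block_size, PROT_READ, MAP_SHARED, fd,
                      off_t(n * block_size));      // mmap the new block
        mapped_idx = n;
        return static_cast<const char*>(mapped);
    }
};
```

shm_open or memfd_create would be the more robust way to create the backing file; a plain open() on a /dev/shm path is enough to show the idea.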

Decompression would then be done only once. Note that it would also be possible to fully decompress at startup to avoid index generation, perhaps in a background thread, so you can read the start of the ISO while the end is still being unzipped.
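The background pass could look roughly like this (hypothetical names; decompress_block stands in for the real decoder). The worker walks the ISO front to back while the reader can still force any block on demand, and an atomic flag per block guarantees each one is decompressed exactly once:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Illustrative sketch, not PCSX2 code.
class BackgroundDecompressor {
public:
    BackgroundDecompressor(size_t num_blocks,
                           std::function<void(size_t)> decompress_block)
        : done_(num_blocks) {
        worker_ = std::thread([this, num_blocks, decompress_block] {
            for (size_t n = 0; n < num_blocks; ++n)  // front-to-back pass
                if (!done_[n].exchange(true))        // claim block n
                    decompress_block(n);
        });
    }
    // Called by the reader: decompress on demand if the background pass
    // has not reached block n yet.
    void ensure_block(size_t n, std::function<void(size_t)> decompress_block) {
        if (!done_[n].exchange(true)) decompress_block(n);
    }
    ~BackgroundDecompressor() { worker_.join(); }
private:
    std::vector<std::atomic<bool>> done_;  // per-block "already decompressed"
    std::thread worker_;
};
```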

@avih @ramapcsx2, any thoughts?


@willkuer willkuer commented Mar 3, 2016

Do you think there is any relevant speed gain? I tested once with ISOs on a RAM disk vs. a SATA HDD and didn't notice a difference.


@gregory38 gregory38 commented Mar 3, 2016

It depends on the algorithm (gzip vs. 7z) and the size of the block. It would allow using massive blocks (32 MB/128 MB). Well, the idea is to have a big cache. The RAM-disk part isn't important by itself; I guess a temporary file could work too. However, it might be easier to handle a file on the RAM disk: you can easily check the memory of the computer (just read a global variable), and the file is automatically deleted and the RAM freed when you quit the application.


@avih avih commented Mar 3, 2016

I don't know how/if the CSO implementation uses caching, but for gzip the caching config is 200 M in chunks of up to 256 K (less at the end of the data), and it's searched linearly, where each cache hit moves the chunk to the top of the list (MRU). So I think cache misses are relatively expensive, with up to 800 searches (and misses), though that's most probably still way faster than actual disk access. The config is here: https://github.com/PCSX2/pcsx2/blob/master/pcsx2/CDVD/GzippedFileReader.h#L24 . GZFILE_READ_CHUNK_SIZE is the amount we're decompressing at each zlib access, and that's also the cache chunk size.

As for decompressing once and storing the entire data, be it on the heap or a ramdisk, the problem with this IMO is that it can take quite a lot of time to decompress the entire file. For a rough estimate, that's the duration it takes to create the gz index (since the entire file is decompressed and memory snapshots of the decompressor are saved at regular intervals; that's the index).


@avih avih commented Mar 3, 2016

Depends on the algorithm (gzip vs. 7z), and the size of the block. It would allow using massive blocks (32 MB/128 MB). Well, the idea is to have a big cache.

Well, a bigger block size helps the compression ratio, but greatly hurts random access, since you typically have to decompress a full block to get at any data in it.

Also, this is IMO orthogonal to the cache size. We could easily increase the cache to 512 M (maybe even 1 G) without any negative consequences IMO; just change the config I pointed to earlier. If it turns out measurably slower due to the linear search, then we could use hashes or some other method to speed up the search.


@gregory38 gregory38 commented Mar 3, 2016

Thanks for the info.

Well, it isn't mandatory to unzip the full file at once. You can still unzip a block on its first read. The only difference is that the cache would be unbounded and directly mapped, so you never trash the contents (unless you restart a new game/emulator). Actually, one question: why isn't uncompressed data stored in a temp file?

A bigger cache means less memory for the GSdx texture cache (or other parts of the emulator that need a cache too). Some games still require too much memory (well, it could be a VRAM limit too) :( The shared-memory trick allows using physical memory without any impact on virtual memory.

I got the idea from the Nvidia driver changelog:

    Added a new system memory allocation mechanism for large allocations in the OpenGL driver. This mechanism allows unmapping the allocation from the process when it is not in use, making more virtual address space available to the application. It is enabled by default on 32 bit OpenGL applications with Linux 3.11+ and glibc 2.19+. Memory allocated this way will consume space in /dev/shm.

@ramapcsx2 ramapcsx2 commented Mar 14, 2016

It would be much more viable if we only had CD-sized games, but typically a PS2 game is 2 to 4 GB.
Even a low-priority background service would still hog the I/O system for minutes.
We also don't know whether the user is going to play for a while or just test a couple of different games.
We can solve this by making the cache/ramdisk optional, though.

So yeah, such a system would have to copy/decompress into one (or several) file(s) in RAM in the background. The PCSX2 file reader would first look in that cache, and fetch from the source file directly if it's not already in there. In that case the background service should stop prefetching for a short while, to keep disk access latency normal.

Generally, this can be a good thing for smoother gameplay.
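The prefetch-with-backoff behaviour described above can be sketched like this (hypothetical names and timing constant): the worker copies blocks into the RAM cache front to back, but sleeps whenever the foreground reader has touched the source file in the last few tens of milliseconds:

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Illustrative sketch, not PCSX2 code.
class Prefetcher {
public:
    using Clock = std::chrono::steady_clock;

    Prefetcher(size_t blocks, std::function<void(size_t)> fetch)
        : last_read_(Clock::now().time_since_epoch().count()) {
        worker_ = std::thread([this, blocks, fetch] {
            for (size_t n = 0; n < blocks; ++n) {
                // Back off while a foreground read happened < 50 ms ago,
                // so prefetching does not compete for disk bandwidth.
                while (Clock::now() - last_access() <
                       std::chrono::milliseconds(50))
                    std::this_thread::sleep_for(std::chrono::milliseconds(5));
                fetch(n);  // copy/decompress block n into the RAM cache
            }
        });
    }
    // Called by the emulator's file reader on every cache miss.
    void note_foreground_read() {
        last_read_ = Clock::now().time_since_epoch().count();
    }
    ~Prefetcher() { worker_.join(); }
private:
    Clock::time_point last_access() const {
        return Clock::time_point(Clock::duration(last_read_.load()));
    }
    std::atomic<Clock::rep> last_read_;
    std::thread worker_;
};
```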


@gregory38 gregory38 commented Mar 21, 2016

Note: unzipping the full ISO was one possibility, but it isn't mandatory.

The general idea was to have a very large cache (from 1 GB to 8 GB) without requiring extra virtual memory.


@i30817 i30817 commented Mar 21, 2016

What about the other supported compressed formats (like CSO)?

Don't you have to emulate delays from CD seeking anyway, and doesn't that negate any regular speed gains (except for compressed ISOs, I guess, where I'd guess it's a CPU drain if the game streams data)?
Isn't this a bit too slow anyway (uncompressing the full game before booting)?


@cyleleghorn cyleleghorn commented Jun 30, 2020

Anecdotal evidence here: I was reading some threads about Vulkan support because I wanted to try to get better performance in Juiced. I had the idea of putting both PCSX2 and my .iso file on a ramdisk when I read that PCSX2 wouldn't benefit from 64-bit words because of the amount of time spent on I/O, so I tried it.

We also don't know whether the user is going to play for a while or just test a couple of different games.

When I first created the ramdisk on Windows with ImDisk, I copied in the entire emulator folder (totaling 4.29 GB); the copy immediately ran at 800 MB/s and finished within a few seconds. So it would be a manual process, and your memory card file should be somewhere neutral, outside of what you copy into the ramdisk, or you will lose your saves if you forget to copy them back out. It might just be something the user has to enable and manage, with a hearty cautionary statement up front, if they're playing any of the problematic games or are on some specific hardware combination that would benefit from what I'm about to explain.

Do you think there is any relevant speed gain? I tested once with ISOs on a RAM disk vs. a SATA HDD and didn't notice a difference.

Again, anecdotal evidence, but here is what I noticed with Juiced. I understand this is one of the tougher ISOs to run. I tuned the settings back when I first began and got it very usable, but I usually noticed lag spikes and associated slowdowns when using nitrous. After copying everything (including the emulator) into the ramdisk, I didn't see any general speed increase, not even on the loading screens, which is a little interesting, but there were far fewer lag spikes!

I believe PCSX2 is still running slightly below full speed, which didn't really appear to change, but the removal of the lag spikes was a nice improvement. After typing this out, I think I may be able to squeeze out a little more performance by re-tuning the graphics settings and making a profile for ramdisk usage; it could behave differently and benefit from different settings now that the I/O speed has massively increased. I also don't know if anyone has tried tuning the settings while running from a ramdisk, nor do I know if there is any actual way for the emulator to benefit from this. It doesn't seem like anyone outside this thread has even thought of the idea, but I just wanted to report what I saw when I tried it. If I remember, I'll try to tune the settings sometime and report back if I can get better performance than the last time I made adjustments.
