New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SpooledTemporaryFile in MemoryFS? #307
Comments
If you know you're going to be creating files larger than available memory, why not use a TempFS in the first place? IMHO given the way this would change the operation of MemoryFS, if it's added it should default to |
In my case, unit tests. If you've got 1000+ unit tests, holding as much as you can in RAM makes a noticeable difference in performance. If only a few overrun RAM, then you've got to use TempFS for everything and the performance hit adds up. Theoretically in our case we could use multiple fixtures but that's going to add complexity and more cognitive load on devs.
I completely agree. Fortunately,
Allow suffixes? Like default unit is bytes but the user can use "128K" instead of "131072" which is far more obvious to someone looking at the code. A little bit more work for us but I think it'd be worth it. |
Hmmm? The docs say "This function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds max_size", and .... Hmm, although checking the source, it seems to imply otherwise https://github.com/python/cpython/blob/master/Lib/tempfile.py#L650 Also, https://github.com/python/cpython/blob/master/Lib/tempfile.py#L748 does a comparison directly against
IMHO adding suffixes to allow specifying filesize might be a bit of a slippery slope - would we need to allow each of "128K", "128KB", "128k", "128kb", "128Kb", etc? I was suggesting a .... Although on second thoughts, I guess this still wouldn't work if the MemoryFS constructor specified a max_size of 5MB, and then you went ahead and created a million 4MB files? Perhaps the ideal behaviour would be for MemoryFS to track the size of its entire FS, and then do a "similar trick" to SpooledTemporaryFile to start converting its BytesIO objects to TemporaryFile objects when the entire FS's max_size got exceeded. Perhaps a SpooledMemoryFS subclass of MemoryFS makes sense? 😉 |
Rrrgh crap you're right. I'd still like to think this would be helpful, but yeah, good point.
I think case-insensitivity would make sense, but we'd only allow
That sounds really hard to do accurately, though I think you might be onto something there. |
It's not a bad idea. I probably wouldn't extend MemoryFS. It may be tidier to have another filesystem with this functionality. That said, I'm not entirely convinced its necessary. If you were to use a TempFS the OS should cache most of it in memory, it may not even touch the physical disk at all. So MemoryFS might not be much faster than TempFS. It's worth doing a little profiling before implementing this. @dargueta Are you able to switch your MemoryFS objects to TempFS objects and compare performance? |
I'll see what I can do. It's pedal to the metal at work at the moment so probably not for some time. |
Just realized #308 might've been an exacerbating factor. |
I'm going to close this since it seems to have too many holes in the idea. Might think on it some more, but I also don't want to leave a ticket hanging around. |
Currently MemoryFS maintains all files as
io.BytesIO
streams, which works fine until you have a lot of files or some particularly large ones. Eventually your code runs out of memory and crashes.To alleviate this, I was wondering if we could modify
MemoryFS
to usetempfile.SpooledTemporaryFile
which starts off as aBytesIO
, but once it reaches a predefined it size gets flushed out to an actual temporary file on the disk. The change won't have any effect on performance for files under the limit we choose, and it'll allow a MemoryFS file system to scale better.The maximum file size is configurable so I think we can either hard-code a sane limit (say 128 MiB or whatever) or allow the user to pass one in as an argument to the opener.
Thoughts?
The text was updated successfully, but these errors were encountered: