
Consider opening files by mmapping a read only file into memory #212

Open
NinnOgTonic opened this issue Feb 1, 2015 · 6 comments

@NinnOgTonic

By mmapping files into memory you can optimise the IO greatly in many cases.

I wonder if there is any reason not to do this?
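
For reference, a minimal sketch of the idea — mapping the file read-only and scanning it in place. Error handling is trimmed and the line-counting body is just a stand-in for real parsing:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a log file read-only and count its lines in a single pass. */
int count_lines_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    int lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            lines++;

    munmap(data, st.st_size);
    return lines;
}
```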

@NinnOgTonic NinnOgTonic changed the title Consider opening files by mmapping a R Consider opening files by mmapping a read only file into memory Feb 1, 2015
@allinurl
Owner

allinurl commented Feb 2, 2015

I can certainly look into this; it sounds like it could give a good performance boost.

@aphorise

I'd argue there are more reasons not to do this than to consider it — particularly as logs are not processed by sample or in isolated discrete sets. Furthermore, in most production and practical cases logs will inevitably exceed the available memory of the machine.

Some recent logs have demonstrated themselves exceeding 20 GB; at even a 25% memory requirement to process, plus 20 GB of RAM to hold the mapping, we can already see the unreasonable memory / space requirements.

Also, with fixed / deterministic metrics it can be more of a waste to map the file into memory for a single process / pass / phase read. Am I correct in thinking that this is currently the case — that logs are in fact parsed in linear order, read start-to-end, without repositioning for another read or re-read?

The only cases I can imagine this being helpful are ones where the results cannot be determined up front and can only be extrapolated through deeper comparative re-reads / re-parsing of the log.
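
To illustrate the point: if parsing really is one linear pass, plain buffered reads plus a readahead hint should serve as well as a mapping, without address-space cost proportional to the log size. A minimal sketch, where `parse_line()` is a hypothetical stand-in for the real parser:

```c
#include <fcntl.h>
#include <stdio.h>

/* Single-pass, start-to-end parse using ordinary buffered I/O.
 * POSIX_FADV_SEQUENTIAL asks the kernel for aggressive readahead,
 * giving much of mmap's benefit without mapping the whole file. */
int parse_sequential(const char *path) {
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    posix_fadvise(fileno(fp), 0, 0, POSIX_FADV_SEQUENTIAL);

    char line[4096];
    while (fgets(line, sizeof(line), fp)) {
        /* parse_line(line);  -- hypothetical per-line parser */
    }

    fclose(fp);
    return 0;
}
```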

@allinurl is this request still under serious consideration for releases before Q3 2016? If not, then IMO this ticket ought to be closed.

@NinnOgTonic
Author

This issue was simply an enhancement request that fits my use case, where I have a large set of <2GB files that have to be parsed individually.

I believe you might be right that in your case 20+ GB logs are not to be parsed this way, but perhaps mmap could be applied only below a given size threshold, if it proves valuable in some use cases?
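
Something along these lines is what I have in mind — a hypothetical size check that opts into mmap only for files under a cutoff (the 2 GB figure just mirrors my use case and is illustrative):

```c
#include <sys/stat.h>

/* Hypothetical policy: only mmap regular files below a size cutoff
 * and fall back to buffered reads for anything bigger. */
#define MMAP_THRESHOLD ((off_t)(2LL << 30)) /* 2 GB, illustrative */

int should_mmap(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return 0; /* pipes and stdin cannot be mapped anyway */
    return st.st_size > 0 && st.st_size < MMAP_THRESHOLD;
}
```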

@aphorise

So why not just store your log in memory on a mounted ramfs? I am assuming you have the memory for it and are on a Linux / Unix OS?
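
(On Linux that is typically just `mount -t ramfs ramfs /mnt/logcache` — or a tmpfs with a `size=` limit — and then copying the log over; the mount point here is illustrative.)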

@NinnOgTonic
Author

Great input, I will consider that for further development. Though I wonder whether having this in goaccess itself might be both simpler and a better fit for other use cases.

@allinurl
Owner

I implemented a prototype for this request a while back; however, for some reason, performance wasn't great. Though, I have to admit that I did not look into the details of it. As @aphorise mentioned, there are probably better ways of handling this, so it's not top priority right now.

As soon as I have a chance, I'll push a quick implementation of this to a different branch where you can test it and see if it's worth adding as a build option.
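
One possible explanation for the lackluster numbers — speculation only, since the prototype's details weren't examined: a plain mmap faults pages in with only modest readahead, so a purely sequential scan can lose to buffered reads. Hinting the access pattern to the kernel may close the gap:

```c
#include <sys/mman.h>

/* Advise the kernel about how a fresh mapping will be accessed.
 * MADV_SEQUENTIAL enables aggressive readahead and early reclaim
 * of pages already scanned; MADV_WILLNEED prefetches up front. */
static void hint_sequential_mapping(void *addr, size_t len) {
    madvise(addr, len, MADV_SEQUENTIAL);
    madvise(addr, len, MADV_WILLNEED);
}
```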
