
v10 MAP_POPULATE speed-up: possible explanation #1

Open
Pluvie opened this issue Apr 6, 2024 · 3 comments

Comments

@Pluvie

Pluvie commented Apr 6, 2024

Hi @Theldus!

First of all, let me thank you for the very enjoyable read this repo gave me.

You did an excellent write-up, and it was very nice to see all the changes discussed along with their improvements!

Speaking of which, I have an idea about what is causing the speed-up in v10, where the MAP_POPULATE flag was removed.

From the mmap man page:

MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file
mapping, this causes read-ahead on the file. This will
help to reduce blocking on page faults later.

Since you are mapping a file, the MAP_POPULATE flag causes the entire mapping to be prefaulted (with read-ahead on the file) inside the mmap() call itself. It is true that the memory will eventually be completely faulted in anyway (since the whole file is read), but the actual reading is performed in the threads, while the MAP_POPULATE prefaulting happens serially in the main thread only.

So my guess is that letting the page faults happen concurrently in the threads is what causes the speed-up.
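To make the two strategies concrete, here is a minimal C sketch (not the repo's actual code) under simplifying assumptions: the per-thread work is just summing bytes, the file is split into equal contiguous slices, and `sum_file`, `sum_chunk`, `struct chunk` and `MAX_THREADS` are hypothetical names. With `populate` set, all prefaulting happens inside `mmap()` on the calling thread before any worker exists; without it, each worker takes the page faults for its own slice.

```c
#define _GNU_SOURCE /* MAP_POPULATE is a Linux extension */
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

enum { MAX_THREADS = 64 }; /* hypothetical cap, keeps the arrays on the stack */

/* Hypothetical work item: one contiguous slice of the mapping. */
struct chunk {
    const unsigned char *base;
    size_t len;
    uint64_t sum;
};

/* Worker: reads every byte of its slice. Without MAP_POPULATE, the page
 * faults for first accesses happen here, spread across the workers. */
static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    uint64_t s = 0;
    for (size_t i = 0; i < c->len; i++)
        s += c->base[i];
    c->sum = s;
    return NULL;
}

/* Sums the bytes of `path` with `nthreads` workers over a read-only mapping.
 * With `populate` set, mmap() itself prefaults the whole file in the calling
 * thread before any worker starts. Returns 0 on success, -1 on error. */
int sum_file(const char *path, int populate, int nthreads, uint64_t *out)
{
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    if (size == 0) { /* mmap() rejects a zero length */
        close(fd);
        *out = 0;
        return 0;
    }

    int flags = MAP_PRIVATE | (populate ? MAP_POPULATE : 0);
    unsigned char *p = mmap(NULL, size, PROT_READ, flags, fd, 0);
    close(fd); /* the mapping stays valid after closing the descriptor */
    if (p == MAP_FAILED)
        return -1;

    pthread_t tid[MAX_THREADS];
    struct chunk ch[MAX_THREADS];
    int started[MAX_THREADS];
    size_t per = size / (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        ch[t].base = p + (size_t)t * per;
        ch[t].len = (t == nthreads - 1) ? size - (size_t)t * per : per;
        started[t] = pthread_create(&tid[t], NULL, sum_chunk, &ch[t]) == 0;
        if (!started[t])
            sum_chunk(&ch[t]); /* fall back to summing this slice inline */
    }

    uint64_t total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += ch[t].sum;
    }
    munmap(p, size);
    *out = total;
    return 0;
}
```

Built with `-pthread`, calling `sum_file(path, 1, n, &s)` and `sum_file(path, 0, n, &s)` under the same page-cache conditions (e.g. file already cached in both runs) is one way to check the hypothesis; this sketch does not by itself show that it explains the v10 numbers.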

What do you think about that?
Thanks again, and have a wonderful day!

@Theldus
Owner

Theldus commented Apr 7, 2024

Hi @Pluvie, thank you very much!

There are faster implementations than mine, but I gave my best in this challenge, had a lot of fun, and also tried to document what I did, both for myself and for others, and that's what matters to me after all.

Let me see if I understood: you suggest that the old approach (with MAP_POPULATE) has a bottleneck/overhead since the 'prefault/read-ahead' only occurs in the main thread?

I used MAP_POPULATE because I saw other implementations using it and thought it could be useful, but considering that the other threads already read the file concurrently, perhaps there really is no gain and this flag might hinder more than help.

Anyway, there isn't much documentation on how MAP_POPULATE works (which is a shame, and honestly makes it less useful...).

But thank you very much for the feedback and I'm glad you liked the repo =).

@Pluvie
Author

Pluvie commented Apr 8, 2024

> Let me see if I understood: you suggest that the old approach (with MAP_POPULATE) has a bottleneck/overhead since the 'prefault/read-ahead' only occurs in the main thread?

Yes, exactly. As far as I understand, when you call mmap without the MAP_POPULATE flag, the Linux kernel only sets up the mapping in the process's virtual address space; it does not allocate any physical memory or read any file data yet. This is a feature of the Linux kernel, which basically allocates memory only when the process reads or writes to it -- also known as "page faulting".
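A small Linux experiment can make this lazy allocation visible. The sketch below assumes an anonymous mapping rather than a file, to keep fault counts simple (file mappings add read-ahead and fault-around), and `faults_while_touching` is a hypothetical helper. It counts minor faults via `getrusage()`: without MAP_POPULATE the write loop should take roughly one fault per page on a typical configuration, while with MAP_POPULATE those faults are taken inside `mmap()` instead, so the loop should take few or none.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS and MAP_POPULATE on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0;
    return ru.ru_minflt;
}

/* Maps `npages` pages of anonymous memory (with MAP_POPULATE if `populate`
 * is set), writes one byte to each page, and returns how many minor faults
 * the writing loop caused. Returns -1 if mmap() fails. */
long faults_while_touching(size_t npages, int populate)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * pg;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | (populate ? MAP_POPULATE : 0);
    unsigned char *m = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    volatile unsigned char *p = m; /* keep the compiler from dropping the writes */
    long before = minor_faults();
    for (size_t i = 0; i < npages; i++)
        p[i * pg] = 1; /* first access to each page */
    long after = minor_faults();

    munmap(m, len);
    return after - before;
}
```

Keeping `npages` small (e.g. 32 pages, well under 2 MiB) avoids transparent huge pages collapsing many page faults into one.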

As you said, though, these features are not well documented. Also, in my opinion, programming with the kernel / syscalls is much less fun than programming with the CPU / hardware: it makes all the experience more obscure. Sometimes I wish we could just skip the OS and tap directly all the bare metal power 😄

@Theldus
Owner

Theldus commented Apr 8, 2024

> ... This is a feature of the Linux kernel, which basically allocates memory only when the process reads or writes to it -- also known as "page faulting".

Yes, I know... only the obscurity of MAP_POPULATE left me in doubt...

> As you said, though, these features are not well documented. Also, in my opinion, programming with the kernel / syscalls is much less fun than programming with the CPU / hardware: it makes all the experience more obscure.

Definitely... at least on Linux there is still the possibility of looking at the source and getting some idea of what is going on, or at least asking someone more experienced (who will certainly do the same).

> ... Sometimes I wish we could just skip the OS and tap directly all the bare metal power 😄

It is possible... just... not trivial. Handling multi-core and the like on bare-metal x86_64 is quite complicated; maybe one day I'll try to do something like that, just for fun.

However, there may not even be much of a performance gain; the Linux kernel is surprisingly well optimized.
