v10 MAP_POPULATE speed up possible explanation #1

Pluvie · 2024-04-06T23:05:37Z

First of all, let me thank you for the very enjoyable read that you gave me with this repo.

You did an excellent write-up and it was very nice to see all the changes discussed and their improvements!

Speaking of that, I have an idea about what is causing the speedup in v10, after the removal of the MAP_POPULATE flag.

From the mmap man page:

MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file
mapping, this causes read-ahead on the file. This will
help to reduce blocking on page faults later.

Since you are mapping a file, the MAP_POPULATE flag causes the entire mapping to be prefaulted. It is true that the memory will eventually be faulted in completely (since the whole file is read), but the actual reading is done by the worker threads, whereas the prefault triggered by MAP_POPULATE happens inside the mmap call, in the main thread only.

So apparently, the concurrency of the page faults happening in the worker threads is what causes the speedup.
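To make the two variants concrete, here is a minimal sketch in C, not the repo's actual code. It assumes Linux with glibc and POSIX threads; `sum_file`, `sum_chunk`, `make_sample_file`, and `MAX_THREADS` are hypothetical names made up for illustration. With `populate` set, the whole mapping is prefaulted inside `mmap` on the calling thread; without it, each worker takes the page faults for its own chunk.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes MAP_POPULATE on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_THREADS 64         /* hypothetical cap so the arrays fit on the stack */

struct chunk { const unsigned char *base; size_t len; uint64_t sum; };

/* Worker: sum the bytes of one chunk. Without MAP_POPULATE, the first
 * access to each page in this loop is where the page fault is taken. */
static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    uint64_t s = 0;
    for (size_t i = 0; i < c->len; i++)
        s += c->base[i];
    c->sum = s;
    return NULL;
}

/* Map `path` read-only and sum its bytes with `nthreads` workers.
 * populate != 0 adds MAP_POPULATE, so prefaulting happens inside mmap(),
 * on the calling thread, before any worker starts.
 * Returns 0 on success, -1 on error. */
static int sum_file(const char *path, int populate, int nthreads, uint64_t *out)
{
    struct stat st;
    struct chunk chunks[MAX_THREADS];
    pthread_t tids[MAX_THREADS];
    int started = 0, rc = 0;

    assert(out != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }

    size_t len = (size_t)st.st_size;
    int flags = MAP_PRIVATE | (populate ? MAP_POPULATE : 0);
    const unsigned char *p = mmap(NULL, len, PROT_READ, flags, fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    size_t per = len / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        chunks[i].base = p + (size_t)i * per;
        /* the last worker also takes the remainder */
        chunks[i].len = (i == nthreads - 1) ? len - (size_t)i * per : per;
        chunks[i].sum = 0;
        if (pthread_create(&tids[i], NULL, sum_chunk, &chunks[i]) != 0) { rc = -1; break; }
        started++;
    }
    *out = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        *out += chunks[i].sum;
    }
    munmap((void *)p, len);
    return rc;
}

/* Hypothetical helper: create a temp file of `nbytes` bytes, all `value`. */
static int make_sample_file(char *path_template, size_t nbytes, unsigned char value)
{
    unsigned char buf[4096];
    int fd = mkstemp(path_template);
    if (fd < 0) return -1;
    memset(buf, value, sizeof buf);
    while (nbytes > 0) {
        size_t n = nbytes < sizeof buf ? nbytes : sizeof buf;
        ssize_t w = write(fd, buf, n);
        if (w <= 0) { close(fd); return -1; }
        nbytes -= (size_t)w;
    }
    return close(fd);
}
```

Both variants compute the same result; only where the faults are taken differs. Timing `sum_file(path, 1, n, &s)` against `sum_file(path, 0, n, &s)` on a large file (for example with `clock_gettime`) would be one way to check whether the effect described above shows up on a given machine. Older glibc versions may need `-pthread` at link time.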

What do you think about that?
Thanks again, wish you a wonderful day!

Theldus · 2024-04-07T01:44:47Z

Hi @Pluvie, thank you very much,

There are faster implementations than mine, but I gave my best in this challenge, had a lot of fun, and also tried to document what I did, both for myself and for others, and that's what matters to me after all.

Let me see if I understood: you suggest that the old approach (with MAP_POPULATE) has a bottleneck/overhead since the 'prefault/read-ahead' only occurs in the main thread?

I used MAP_POPULATE because I saw other implementations using it and thought it could be useful, but considering that the threads already read the file concurrently, perhaps there are really no gains and the flag might hinder more than help.

Anyway, there isn't much documentation on how MAP_POPULATE works (which is a shame, and honestly makes it less useful...).

But thank you very much for the feedback and I'm glad you liked the repo =).

Pluvie · 2024-04-08T07:41:26Z

> Let me see if I understood: you suggest that the old approach (with MAP_POPULATE) has a bottleneck/overhead since the 'prefault/read-ahead' only occurs in the main thread?

Yes, exactly. As far as I understand, when you call mmap without the MAP_POPULATE flag, the Linux kernel just sets up the mapping in the process's virtual address space, but does not allocate any physical memory or read anything from the file yet. This is a feature of the Linux kernel, which basically allocates memory only when the process reads or writes to it -- also known as "page faulting". (The usual name for this lazy scheme is demand paging; the page fault is the mechanism that triggers it.)
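The lazy behavior can be observed with `mincore(2)`, which reports which pages of a mapping are resident. The following is a minimal sketch assuming Linux with glibc; `resident_pages` and `demo_resident` are hypothetical helper names. It uses an anonymous mapping rather than a file, because for file mappings `mincore` also counts pages that are merely in the page cache, which would hide the effect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes MAP_POPULATE, MAP_ANONYMOUS, mincore on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the resident pages in [addr, addr + len) using mincore(2).
 * Returns -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long count = 0;

    if (!vec) return -1;
    if (mincore(addr, len, vec) != 0) { free(vec); return -1; }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;   /* bit 0 set means the page is resident */
    free(vec);
    return count;
}

/* Map `npages` anonymous pages, optionally with MAP_POPULATE, optionally
 * write one byte to every page, and return how many pages are resident.
 * Returns -1 on error. */
static long demo_resident(size_t npages, int populate, int touch)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | (populate ? MAP_POPULATE : 0);

    assert(npages > 0);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED) return -1;

    if (touch)
        for (size_t off = 0; off < len; off += page)
            p[off] = 1;        /* first write to each page triggers a page fault */

    long n = resident_pages(p, len);
    munmap(p, len);
    return n;
}
```

On a typical Linux system, `demo_resident(16, 0, 0)` is expected to report no resident pages, since nothing has been touched yet, while both `demo_resident(16, 0, 1)` and `demo_resident(16, 1, 0)` should report all 16: in the first case the pages were faulted in one by one, in the second the kernel prefaulted them during `mmap`.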

As you said, though, these features are not well documented. Also, in my opinion, programming with the kernel / syscalls is much less fun than programming with the CPU / hardware: it makes the whole experience more obscure. Sometimes I wish we could just skip the OS and tap directly into all the bare-metal power 😄

Theldus · 2024-04-08T11:26:34Z

> ... This is a feature of the Linux kernel, which basically allocates memory only when the process reads or writes to it -- also known as "page faulting".

Yes, I know... only the obscurity of MAP_POPULATE left me in doubt...

> As you said, though, these features are not well documented. Also, in my opinion, programming with the kernel / syscalls is much less fun than programming with the CPU / hardware: it makes the whole experience more obscure.

Definitely... at least on Linux there is still the possibility of looking at the source and getting some idea of what is going on, or at least asking someone more experienced (who will certainly do the same).

> ... Sometimes I wish we could just skip the OS and tap directly into all the bare-metal power 😄

It is possible... just... not trivial. On bare metal, x86_64 is quite complicated to deal with once you want multi-core and so on. Maybe one day I'll try to do something like that, just for fun.

However, there may not even be that much of a performance gain: the Linux kernel is surprisingly well optimized.
