diff --git a/04_Memory_Management/01_Overview.md b/04_Memory_Management/01_Overview.md index bbd6052d..5cbe95dc 100644 --- a/04_Memory_Management/01_Overview.md +++ b/04_Memory_Management/01_Overview.md @@ -43,11 +43,11 @@ Usually this is the lowest level of allocation, and only the kernel should acces ## Paging -Although Paging and VMM are strongly tied, let's split this topic into two parts: with paging we refer to the hardware paging mechanism, that usually involeves tables, and registers and address translation, while the VMM it refers to the higher level (usually architecture independant). +Although Paging and VMM are strongly tied, let's split this topic into two parts: with paging we refer to the hardware paging mechanism, that usually involves tables, and registers and address translation, while the VMM it refers to the higher level (usually architecture independent). While writing the support for paging, independently there are few future choices we need to think about now: -* Are we going to have a single or mulitple address spaces (i.e. every task will have its own address space)? If yes in this case we need to keep in mind that when mapping addresses we need to make sure they are done on the right Virtual Memory Space. So usually a good idea is to add an extra parameter to the mapping/unmapping functions that contains the pointer to the root page table (for _x86\_64 architecture is the PML4 table). +* Are we going to have a single or multiple address spaces (i.e. every task will have its own address space)? If yes in this case we need to keep in mind that when mapping addresses we need to make sure they are done on the right Virtual Memory Space. So usually a good idea is to add an extra parameter to the mapping/unmapping functions that contains the pointer to the root page table (for _x86\_64 architecture is the PML4 table). * Are we going to support User and Supervisor mode? In this case we need to make sure that the correct flag is set in the table entries. ## VMM - Virtual Memory Manager @@ -71,7 +71,7 @@ Similarly to paging there are some things we need to consider depending on our f ## Heap Allocator -There is a disntiction to be made here, between the kernel heap and the program heap. Many characteristic are similar between each other, although different algorithm can be used. +There is a distinction to be made here, between the kernel heap and the program heap. Many characteristic are similar between each other, although different algorithm can be used. Usually there is just one kernel heap, while every program will have its own userspace heap. - At least one per process/running program, and one for the kernel. @@ -99,7 +99,7 @@ char *a = alloc(5); What happens under the hood? 1. The alloc request the heap for pointer to an area of 5 bytes. -2. The heap allocator searches for a region big enough for 5 bytes, if available in the current heap. If so, no need to dig down further, just return what was found. However if the current heap doesn't contain an area of 5 bytes that can be returned, it will need to expand. So it asks for more space from the VMM. Remember: the *addresses returned by the heap are all virtual*. +2. The heap allocator searches for a region big enough for 5 bytes, if available in the current heap. If so, no need to dig down further, just return what was found. However, if the current heap doesn't contain an area of 5 bytes that can be returned, it will need to expand. So it asks for more space from the VMM. 
Remember: the *addresses returned by the heap are all virtual*.
3. The VMM will allocate a region of virtual memory big enough for the new heap expansion. It then asks the physical memory manager for a new physical page to map there.
4. Lastly a new physical page from the PMM will be mapped to the VMM (using paging for example). Now the VMM will provide the heap with the extra space it needed, and the heap can return an address using this new space.

diff --git a/04_Memory_Management/02_Physical_Memory.md b/04_Memory_Management/02_Physical_Memory.md
index fc37e388..599ba503 100644
--- a/04_Memory_Management/02_Physical_Memory.md
+++ b/04_Memory_Management/02_Physical_Memory.md
@@ -34,23 +34,23 @@ So marking a memory location as free or used is just matter of setting clearing

### Returning An Address

-But how do we mark a page as taken or free? We need to translate row/column in an address, or the address in row/column. Let's assume that we asked fro a free page and we found the first available bit at row 0 and column 3, how we translate it to address, well for that we need few extra info:
+But how do we mark a page as taken or free? We need to translate a row/column into an address, or an address into a row/column. Let's assume that we asked for a free page and we found the first available bit at row 0 and column 3; how do we translate that into an address? Well, for that we need a few extra pieces of information:

-* The page size (we should know what is the size of the page we are using), Let's call it `PAGE_SIZE`
-* How many bits are in a row (it's up to us to decide it, in this example we are using an unsigned char, but most probably in real life it is going to be a `uint32_t` for 32bit OS or `uint64_t` for 64bit os) let's call it `BITS_PER_ROW`
+* The page size (we should know the size of the pages we are using); let's call it `PAGE_SIZE`
+* How many bits are in a row (it's up to us to decide; in this example we are using an unsigned char, but most probably in real life it is going to be a `uint32_t` for a 32-bit OS or a `uint64_t` for a 64-bit OS); let's call it `BITS_PER_ROW`

To get the address we just need to do:

* `bit_number = (row * BITS_PER_ROW) + column`
* `address = bit_number * PAGE_SIZE`

-Let's pause for a second, and have a look at `bit_number`, what it represent? Maybe it is not straightforward what it is, but consider that the memory is just a linear space of consecutive addresses (just like a long tape of bits grouped in bytes), so when we declare an array we just reserve *NxSizeof(chosendatatype)* contiguous addresses of this space, so the reality is that our array is just something like:
+Let's pause for a second, and have a look at `bit_number`: what does it represent? Maybe it is not straightforward, but consider that memory is just a linear space of consecutive addresses (like a long tape of bits grouped into bytes), so when we declare an array we just reserve *NxSizeof(chosendatatype)* contiguous addresses of this space, so the reality is that our array is just something like:

| bit_number | 0 | 1 | 2 | ... | *8* | ... | 31 | *32* | ... | 63 |
|------------|---|---|---|-----|-----|-----|----|------|-----|----|
| \*bitmap | 1 | 1 | 1 | ... | *0* | ... | 0 | *0* | ... | 0 |

-It just represent the offset in bit from `&bitmap` (the starting address of the bitmap).
+It just represents the offset in bits from `&bitmap` (the starting address of the bitmap).

In our example with *row=0 column=3* (and page size of 4k) we get:

@@ -66,7 +66,7 @@ But what about the opposite way?
Given an address compute the bitmap location? S $$bitmap_{location}=\frac{address}{4096}$$ -In this way we know the "page" index into an hypoteteical array of Pages. But we need row and columns, how do we compute them? That depends on the variable size used for the bitmap, let's stick to 8 bits, in this case: +In this way we know the "page" index into a hypothetical array of Pages. But we need row and columns, how do we compute them? That depends on the variable size used for the bitmap, let's stick to 8 bits, in this case: * The row is given by `bitmap_location / 8` * The column is given by: `bitmap_location % 8` diff --git a/04_Memory_Management/03_Paging.md b/04_Memory_Management/03_Paging.md index 2a20f966..4cca86f3 100644 --- a/04_Memory_Management/03_Paging.md +++ b/04_Memory_Management/03_Paging.md @@ -2,7 +2,7 @@ ## What is Paging? -Paging is a memory management scheme that introduces the concept of **_logical addresses_** (virtual address) and **_virtual memory_**. On x86_\* architectures this is achieved via hardware. Paging enables a layer of translation between virtual and physical addresses, and virtual and physical address spaces, as well as adding a few extra features (like access protection, priviledge level protection). +Paging is a memory management scheme that introduces the concept of **_logical addresses_** (virtual address) and **_virtual memory_**. On x86_\* architectures this is achieved via hardware. Paging enables a layer of translation between virtual and physical addresses, and virtual and physical address spaces, as well as adding a few extra features (like access protection, privilege level protection). It introduces a few new concepts that are explained below. @@ -18,7 +18,7 @@ These are the basic blocks of paging. Depending on the architecture (and request What are those directories and tables? Let's start from the tables: -* **Page Table** contains the information about a single page of memory, an entry in a page table represents the starting physical memory addresss for this page. +* **Page Table** contains the information about a single page of memory, an entry in a page table represents the starting physical memory address for this page. * **Page Directory** an entry in a page directory can point to depending on the page size selected: - another page directory - a page table @@ -33,7 +33,7 @@ Sometimes CR3 (although technically it's just the data from bits 12+) is referre ### Virtual (or Logical) Address -A virtual address is what a running program sees. Thats any program: a driver, user application or the kernel itself. +A virtual address is what a running program sees. That's any program: a driver, user application or the kernel itself. Sometime in the kernel, a virtual address will map to the same physical address, this scenario it is called `identity mapping`, but this is not always the case though, we can also have the same physical address that maps to different virtual addresses. @@ -41,16 +41,16 @@ A virtual address is usually a composition of entry numbers for each level of ta ![Address Translation](/Images/addrtranslation.png) -The _memory page_ in the picture refers to a physical memory page (the picture above doesn't refer to any existing hardware paging, is just an example scenario). Using logical address and paging, we can introduce a whole new address space that can be much bigger of the available physical memory. 
+The _memory page_ in the picture refers to a physical memory page (the picture above doesn't refer to any existing hardware paging, it is just an example scenario). Using logical addresses and paging, we can introduce a whole new address space that can be much bigger than the available physical memory. -For example we can have that: +For example, we can have that: ```c phys(0x123'456) = virt(0xFFF'F234'5235) ``` -Meaning that the virtual address `0xFFFF2345235` refers to the phyisical address `0x123456`. +Meaning that the virtual address `0xFFFF2345235` refers to the physical address `0x123456`. This mapping is usually achieved through the usage of several hierarchical tables, with each item in one level pointing to the next level table. As already mentioned above a virtual address is a composition of _entry numbers_ for each level of the tables. Now let's assume for example that we have 3 levels paging, 32 bits addressing and the address translation mechanism used is the one in the picture above, and we have the virtual address below: @@ -77,7 +77,7 @@ The above example is just an imaginary translation mechanism, we'll discuss the ## Paging in Long Mode In 64 bit mode we have up to 4 levels of page tables. The number depends on the size we want to assign to each page. -It's worth noting that newer cpus do support a feature called _la57_ (large addressing using 57-bits), this just adds another layer of page tables on top the existing 4 to allow for a larger address space. It's a cool feature, but not really required unless we're using crazy amounts of memory. +It's worth noting that newer CPUs do support a feature called _la57_ (large addressing using 57-bits), this just adds another layer of page tables on top the existing 4 to allow for a larger address space. It's a cool feature, but not really required unless we're using crazy amounts of memory. There are 3 possible scenarios: @@ -85,7 +85,7 @@ There are 3 possible scenarios: * 2Mib Pages: in this case we only need 3 page levels. * 1Gib Pages: Only 2 levels are needed. -To implement paging, is strongly reccomended to have already implemented interrupts too, specifically handling #PF (vector 0xd). +To implement paging, is strongly recommended to have already implemented interrupts too, specifically handling #PF (vector 0xd). The 4 levels of page directories/tables are: @@ -106,13 +106,13 @@ But before proceeding with the details let's see some of the characteristics com * The size of all table type is fixed and is 4k. * Every table has exactly 512 entries. * Every entry has the size of 64 bits. -* The tables have a hierarchy, and every item in a table higher in the hierachy point to a lower hierachy one (with some exceptions explained later). The page table points to a memory area. +* The tables have a hierarchy, and every item in a table higher in the hierarchy point to a lower hierarchy one (with some exceptions explained later). The page table points to a memory area. The hierarchy of the tables is: * PML4 is the root table (this is the one that is contained in the PDBR register) and is loaded for the actual address translation (see the next paragraph). Each of its entries point a PDPR table. * PDPR, the next level down. Each entry points to a single page directory. 
-* Page directory (PD): depending of the value of the PS bit (page size) an entry in this table can point to: +* Page directory (PD): depending on the value of the PS bit (page size) an entry in this table can point to: * a page table if the PS bit is clear (this means we are using 4k pages) * 2 MB memory area if the PS bit is set * Page table (PT): every entry in the page table points to a 4k memory page. @@ -123,7 +123,7 @@ In the following paragraphs we will have a look with more detail at how the pagi ### Loading the root table and enable paging -Until now we have explained how address translation works now let's see how the Root Table is loaded (in `x86_64` is PML4), this is done by loading the special register `CR3`, also known as `PDBR`, we introduced it at the beginning of the chapter, and is contents is basically the base address of our PML4 table. This can be easily done with two lines of assembly: +Until now, we have explained how address translation works now let's see how the Root Table is loaded (in `x86_64` is PML4), this is done by loading the special register `CR3`, also known as `PDBR`, we introduced it at the beginning of the chapter, and is content is basically the base address of our PML4 table. This can be easily done with two lines of assembly: ```x86asm mov eax, PML4_BASE_ADDRESS @@ -193,20 +193,20 @@ In the next section we will go through the fields of an entry. Below is a list of all the fields present in the table entries, with an explanation of the most commonly used. -* **P** (Present): If set this tells the CPU that this entry is valid, and can be used for translation. Otherwise translation stops here, and results in a page fault. -* **R/W** (Read/Write): Pages are always readable, setting this flag allows writing to memory via this virtual address. Otherwise an attempt to write to memory while this bit is cleared results in a page fault. Reminder that these bits also affect the child tables. So if a pml4 entry is marked as read-only, any address that gets translated through that will be read only, even if the entries in the tables below it have this bit set. +* **P** (Present): If set this tells the CPU that this entry is valid, and can be used for translation. Otherwise, translation stops here, and results in a page fault. +* **R/W** (Read/Write): Pages are always readable, setting this flag allows writing to memory via this virtual address. Otherwise, an attempt to write to memory while this bit is cleared results in a page fault. Reminder that these bits also affect the child tables. So if a pml4 entry is marked as read-only, any address that gets translated through that will be read only, even if the entries in the tables below it have this bit set. * **User/Supervisor**: It describes the privilege level required to access this address. If clear the page has the supervisor level, while if it is set the level is user. The cpu identifies supervisor/user level by checking the CPL (current protection level, set by the segment registers). If it is less than 3 then the accesses are made in supervisor mode, if it's equal to 3 they are made in user mode. * **PWT** (Page Level Write Through): Controls the caching policy (write-through or write-back). I usually leave it to 0, for more information refer to the Intel Developer Manuals. * **PCD** (Page Level Cache Disable): Controls the caching of individual pages or tables. I usually leave it to 0, for more information refer to the Intel Developer Manuals. 
* **A** (Accessed): This value is set by the CPU, if is 0 it means the page hasn't been accessed yet. It's set when the page (or page teble) has been accessed since this bit was last cleared. -* **D** (Dirty): If set, indicates that a page has been written to since last cleared. This flag is supposed to only apply to page tables, but some emulators will set it on other levels as well. This flag and the accessed flag are provided for being use by the memory management software, the CPU only set it when its value is 0. Otherwise is up to the operating system's memory manager to decide if it has to be cleared or not. Ignoring them is also fine. +* **D** (Dirty): If set, indicates that a page has been written to since last cleared. This flag is supposed to only apply to page tables, but some emulators will set it on other levels as well. This flag and the accessed flag are provided for being use by the memory management software, the CPU only set it when its value is 0. Otherwise, is up to the operating system's memory manager to decide if it has to be cleared or not. Ignoring them is also fine. * **PS** (Page Size): Reserved in the pml4, if set on the PDPR it means address translation stops at this level and is mapping a 1GB page. Check for 1gb page support before using this. More commonly this can be set on the PD entry to stop translation at that level, and map a 2MB page. * **PAT** (Page Attribute Table Index) only for the page table: It selects the PAT entry (in combination with the PWT and PCD bits above), refer to the Intel Manual for a more detailed explanation. * **G** (Global): If set it indicates that when CR3 is loaded or a task switch occurs that this particular entry should not be ejected. This feature is not architectural, and should be checked for before using. * **PK** (Protection Key): A 4-bit value used to control supervisor & user level accesses for a virtual address. If bit 22 (PKE) is set in CR4, the PKRU register will be used to control access rights for user level accesses based on the PK, and if bit 24 (PKS) is set, same will happen but for supervisor level accesses with the PKRS register. **Note**: This value is ignored on older CPUs, which means those bits are marked as available on them. If you want to use the protection key, make sure to check for its existence using CPUID, and of course to set the corresponding bits for it in the CR4 register. * **XD**: Also known as NX, the execute disable bit is only available if supported by the CPU (can be checked wit CPUID), otherwise reserved. If supported, and after enabling this feature in EFER (see the intel manual for this), attempting to execute code from a page with this bit set will result in a page fault. -Note about PWT and PCD, the definiton of those bits depends on whether PAT (page attribute tables) are in use or not. For a better understanding of those two bits please refer to the most updated intel documentation (is in the Paging section of the intel Software Developer Manual vol.3) +Note about PWT and PCD, the definition of those bits depends on whether PAT (page attribute tables) are in use or not. 
For a better understanding of those two bits please refer to the most updated intel documentation (is in the Paging section of the intel Software Developer Manual vol.3) ## Address translation @@ -250,10 +250,10 @@ Every table has 512 elements, so we have an address space of: $2^{512}*2^{512}*2 ## Page Fault -A page fault (exception 14, triggers the interrupt of the same number) is raised when address translation fails for any reason. An error code is pushed on to the stack before calling the interrupt handler describing the situation when the fault occured. Note that these bits describe was what was happening, not why the fault occured. If the user bit is set, it does not necessarily mean it was a priviledge violation. The `CR2` register also contains the address that caused the fault. +A page fault (exception 14, triggers the interrupt of the same number) is raised when address translation fails for any reason. An error code is pushed on to the stack before calling the interrupt handler describing the situation when the fault occurred. Note that these bits describe was what was happening, not why the fault occurred. If the user bit is set, it does not necessarily mean it was a privilege violation. The `CR2` register also contains the address that caused the fault. The idea of the page fault handler is to look at the error code and faulting address, and do one of several things: -- If the program is accessing memory that it should have, but hasnt been mapped: map that memory as initially requested. +- If the program is accessing memory that it should have, but hasn't been mapped: map that memory as initially requested. - If the program is attempting to access memory it should not, terminate the program. The error code has the following structure: @@ -267,10 +267,10 @@ The meanings of these bits are expanded below: * Bits 31...4 are reserved. * Bit 4: set if the fault was an instruction fetch. -* Bit 3: set if the attempted translation encuntered a reserved bit being set to 1 (at *some* level in the paging structure). +* Bit 3: set if the attempted translation encountered a reserved bit being set to 1 (at *some* level in the paging structure). * Bit 2: set if the access was a user mode access, otherwise it was supervisor mode. * Bit 1: set if the false was caused by a write, otherwise it was a read. -* Bit 0: set if a protection violation caused the fault, otherwise it means translation failed due to a non present page. +* Bit 0: set if a protection violation caused the fault, otherwise it means translation failed due to a non-present page. ## Accessing Page Tables and Physical Memory @@ -280,12 +280,12 @@ One of the problems that we face while enabling _paging_ is of how to access the There are two ways to achieve it: -* Having all the phyisical memory mapped somewhere in the virtual addressing space (probably in the _Higher Half_, in this case we should be able to retrieve all the tables easily, by just adding a prefix to the physical address of the table. -* Using a tecnique called _recursion_, where access the tables using special virtual addresses. +* Having all the physical memory mapped somewhere in the virtual addressing space (probably in the _Higher Half_, in this case we should be able to retrieve all the tables easily, by just adding a prefix to the physical address of the table). +* Using a technique called _recursion_, where access the tables using special virtual addresses. 
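As a quick preview of what those "special virtual addresses" look like in practice, here is a minimal sketch (not taken from the original text) for 4-level paging with 4K pages. It assumes the recursive entry has been installed in PML4 slot 510, as in the worked examples that follow, and the helper names are made up for illustration:

```c
#include <stdint.h>

/* Pack four 9-bit table indices into a canonical virtual address (4-level
 * paging, 4K pages). How many of the leading indices equal the recursive
 * slot decides which paging structure the resulting address exposes. */
static inline void *table_index_to_vaddr(uint64_t pml4_i, uint64_t pdpr_i,
                                         uint64_t pd_i, uint64_t pt_i)
{
    uint64_t addr = (pml4_i << 39) | (pdpr_i << 30) | (pd_i << 21) | (pt_i << 12);
    if (addr & (1ull << 47))
        addr |= 0xFFFFull << 48; /* sign-extend bit 47 to keep the address canonical */
    return (void *)addr;
}

/* With the recursive entry at slot 510, read the page table entry that maps
 * `va` (assuming the paging structures for `va` already exist). */
static inline uint64_t read_pte_for(uint64_t va)
{
    uint64_t *pt = table_index_to_vaddr(510, (va >> 39) & 0x1FF,
                                        (va >> 30) & 0x1FF, (va >> 21) & 0x1FF);
    return pt[(va >> 12) & 0x1FF];
}
```

For instance, `table_index_to_vaddr(510, 510, 0, 5)` evaluates to `0xffffff7f80005000`, which is the `0xff7f80005000` example used below with the canonical sign-extension prefix included.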
-To use the recursion the only thing we need to do, is reserve an entry in the _root_ page directory (`PML4` in our case) and make its base address to point to the directory itsef. +To use the recursion the only thing we need to do, is reserve an entry in the _root_ page directory (`PML4` in our case) and make its base address to point to the directory itself. -A good idea is to pick a number high enough, that will not interfer with other kernel/hardware special addresses. For example let's use the entry `510` for the recurisve item +A good idea is to pick a number high enough, that it will not interfere with other kernel/hardware special addresses. For example let's use the entry `510` for the recursive item. Creating the self reference is pretty straightforward, we just need to use the directory physical address as the base address for the entry being created: @@ -301,12 +301,12 @@ Now as we have seen above address translation will split the `virtual address` i virt_addr = 0xff7f80005000 ``` -The entries in this address are: 510 for PML4, 510 for PDPR, 0 for PD and 5 for PT (we are using 4k pages for this example). Now let's see what appens from the point of view of the address translation: +The entries in this address are: 510 for PML4, 510 for PDPR, 0 for PD and 5 for PT (we are using 4k pages for this example). Now let's see what happens from the point of view of the address translation: * First the `510th` PML4 entry is loaded, that is the pointer to the PDPR, and in this case its content is PML4 itself. * Now it get the next entry from the address, to load the PD, that is again the `510th`, and is again PML4 itself, so it is loaded as PD too. * It is time for the third entry the PT, and in this case we have `0`, so it loads the first entry from the Page Directory loaded, that in this case is still PML4, so it loads the PDPR table -* Finally the PT entry is loaded, that is `5`, and since the current PD loaded for translation is actually a PDPR we are going to get the `5th` item of the page directory. +* Finally, the PT entry is loaded, that is `5`, and since the current PD loaded for translation is actually a PDPR we are going to get the `5th` item of the page directory. * Now the last part of the address is the offset, this can be used then to access the entries of the directory/table loaded. This means that by carefully using the recursive item from PML4 we can access all the tables. @@ -315,22 +315,22 @@ Few more examples of address translation: * PML4: 511 (hex: 1ff) - PDPR: 510 (hex: 1fe) - PD 0 (hex: 0) using 2mb pages translates to: `0xFFFF'FFFF'8000'0000`. * Let's assume we mapped PML4 into itself at entry 510, - - If we want to access the content of the PML4 page itself, using the recursion we need to build a special address using the entries: _PML4: 510, PDPR: 510, PD: 510, PT: 510_, now keep in mind that the 510th entry of PML4 is PML4 itself, so this means that when the processor loads that entry, it loads PML4 itself instead of PDPR, but now the value for the PDPR entry is still 510, that is still PML4 then, the table loaded is PML4 again, repat this process for PD and PT with page number equals to 510, and we got access to the PML4 table. - - Now using a similar approach we can get acces to other tables, for example the following values: _PML4: 510, PDPR:510, PD: 1, PT: 256_, will give access at the Page Directory PD at entry number 256 in PDPR that is contained in the first PML4 entry. 
+ - If we want to access the content of the PML4 page itself, using the recursion we need to build a special address using the entries: _PML4: 510, PDPR: 510, PD: 510, PT: 510_. Keep in mind that the 510th entry of PML4 is PML4 itself, so when the processor loads that entry it loads PML4 again instead of a PDPR; the value of the PDPR entry is still 510, which is PML4 once more, so the table loaded is again PML4. Repeat this process for the PD and PT entries (both equal to 510), and we end up with access to the PML4 table.
+ - Now using a similar approach we can get access to other tables, for example the following values: _PML4: 510, PDPR: 510, PD: 1, PT: 256_, will give access to the Page Directory at entry number 256 in the PDPR that is contained in the first PML4 entry.

This technique makes it easy to access page tables in the current address space, but it falls apart for accessing data in other address spaces. For that purpose, we'll need to either use a different technique or switch to that address space, which can be quite costly.

### Direct Map

-Another technique for modifying page tables is a 'direct map' (similar to an identity map). As we know an identity map is when a page's physical address is the same as its virtual address, and we could describe it as: `paddr = vaddr`. A direct map is sometimes referred to as an _offset map_ because it introduces an offset, which gives us some flexibility. We're using to have a global variable containing the offset for our map called `dmap_base`. Typically we'll set this to some address in the higher half so that the lower half of the address space is completely free for userspace programs. This also makes other parts of the kernel easier later on.
+Another technique for modifying page tables is a 'direct map' (similar to an identity map). As we know an identity map is when a page's physical address is the same as its virtual address, and we could describe it as: `paddr = vaddr`. A direct map is sometimes referred to as an _offset map_ because it introduces an offset, which gives us some flexibility. We're going to have a global variable containing the offset for our map, called `dmap_base`. Typically, we'll set this to some address in the higher half so that the lower half of the address space is completely free for userspace programs. This also makes other parts of the kernel easier later on.

-How does the direct map actually work though? It's simple enough, we just map all of physical memory at the same virtual address *plus the dmap_base offset*: `paddr = vaddr - dmap_base`. Now in order to access a physical page (from our PMM for example) we just add `dmap_base` to it and we can read and write to it as normal.
+How does the direct map actually work though? It's simple enough, we just map all of physical memory at the same virtual address *plus the dmap_base offset*: `paddr = vaddr - dmap_base`. Now in order to access a physical page (from our PMM for example) we just add `dmap_base` to it, and we can read and write to it as normal.

The direct map does require a one-time setup early in your kernel, as you do need to map all usable physical memory starting at `dmap_base`. This is no more work than creating an identity map though. What address should you use for the base address of the direct map? Well you can put it at the lowest address in the higher half, which depends on how many levels of page tables you have. For 4 level paging this will `0xffff'8000'0000'0000`.
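To make the offset arithmetic concrete, here is a minimal sketch of the two helpers a direct map usually boils down to. The names `phys_to_virt`/`virt_to_phys` and the exact value of `dmap_base` are illustrative assumptions, not a required interface:

```c
#include <stdint.h>

/* Illustrative value: the lowest higher-half address for 4-level paging. */
static const uint64_t dmap_base = 0xffff800000000000ull;

/* Access a physical address through the direct map: vaddr = paddr + dmap_base. */
static inline void *phys_to_virt(uint64_t paddr)
{
    return (void *)(uintptr_t)(paddr + dmap_base);
}

/* Recover the physical address of a pointer that lives inside the direct map. */
static inline uint64_t virt_to_phys(void *vaddr)
{
    return (uint64_t)(uintptr_t)vaddr - dmap_base;
}
```

With these in place, a page frame handed out by the PMM (a physical address) can be zeroed, read, or treated as a page table simply by going through `phys_to_virt()`, with no temporary mappings needed.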
-While recursive paging only requires using a single page table entry at the highest level, a direct map consumes a decent chunk of address space. A direct map is also more flexible as it allows the kernel to access arbitrary parts of physical memory as needed, . Direct mapping is only really possible in 64-bit kernels due to the large address space made available, 32-bit kernels should opt to use recursive mapping to reduce the amount of address space used. +While recursive paging only requires using a single page table entry at the highest level, a direct map consumes a decent chunk of address space. A direct map is also more flexible as it allows the kernel to access arbitrary parts of physical memory as needed. Direct mapping is only really possible in 64-bit kernels due to the large address space made available, 32-bit kernels should opt to use recursive mapping to reduce the amount of address space used. The real potential of this technique will unveil when we have multiple address spaces to handle, when the kernel may need to update data in different address spaces (especially the paging data structures), in this case using the direct map it can access any data in any address space, by only knowing its physical address. It will also help when we will start to work on device drivers (out of the scope of this book) where the kernel may need to access the DMA buffers, that are stored by their physical addresses. diff --git a/04_Memory_Management/04_Virtual_Memory_Manager.md b/04_Memory_Management/04_Virtual_Memory_Manager.md index 019078ec..3aa9f6ff 100644 --- a/04_Memory_Management/04_Virtual_Memory_Manager.md +++ b/04_Memory_Management/04_Virtual_Memory_Manager.md @@ -4,7 +4,7 @@ At first a virtual memory manager might not seem like necessary when we have paging, but the VMM serves as an abstraction on top of paging (or whatever memory management hardware our platform has), as well as abstracting away other things like memory mapping files or even devices. -As mentioned before, a simple kernel only requires a simple VMM which may end up being a glorified page-table manager. However as our kernel grows more complex, so will the VMM. +As mentioned before, a simple kernel only requires a simple VMM which may end up being a glorified page-table manager. However, as our kernel grows more complex, so will the VMM. ### Virtual Memory @@ -19,10 +19,10 @@ Virtual memory can be imagined as how the program views memory, as opposed to ph Now that we have a layer between how a program views memory and how memory is actually laid out, we can do some interesting things: - Making all of physical memory available as virtual memory somewhere is a common use. You'll need this to be able to modify page tables. The common ways are to create an identity map, or to create an identity map but shift it into the higher half (so the lower half is free for userspace later on). -- Place things in memory at near-impossible addresses. Higher half kernels are commonly placed at -2GB as this allows for certain compiler optimizations. On a 64-bit machine -2GB is `0xFFFF'FFFF'8000'0000`. Placing the kernel at that address without virtual memory would require an insane amount of physical memory to be present. This can also be extended to do things like place MMIO at more convinient locations. -- We can protect regions of memory. 
Later on once we reach userspace, we'll still need the kernel loaded in virtual memory to handle interrupts and provide system calls, but we don't want the user program to arbitarily access kernel memory. +- Place things in memory at near-impossible addresses. Higher half kernels are commonly placed at -2GB as this allows for certain compiler optimizations. On a 64-bit machine -2GB is `0xFFFF'FFFF'8000'0000`. Placing the kernel at that address without virtual memory would require an insane amount of physical memory to be present. This can also be extended to do things like place MMIO at more convenient locations. +- We can protect regions of memory. Later on once we reach userspace, we'll still need the kernel loaded in virtual memory to handle interrupts and provide system calls, but we don't want the user program to arbitrarily access kernel memory. -We can also add more advanced features later on, like demand paging. Typically when a program (including the kernel) asks the VMM for memory, and the VMM can successfully allocate it, physical memory is mapped there right away. *Immediately backing* like this has advantages in that it's very simple to implement, and can be very fast. The major downside is that we trust the program to only allocate what it needs, and if it allocates more (which is very common) that extra physical memory is wasted. In contrast, *demand paging* does not back memory right away, instead relying on the program to cause a page fault when it accesses the virtual memory it just allocated. At this point the VMM now backs that virtual memory with some physical memory, usually a few pages at a time (to save overhead on page-faults). The benefits of demand-paging are that it can reduce physical memory usage, but it can slow down programs if not implemented carefully. It also requires a more complex VMM, and the ability to handle page faults properly. +We can also add more advanced features later on, like demand paging. Typically, when a program (including the kernel) asks the VMM for memory, and the VMM can successfully allocate it, physical memory is mapped there right away. *Immediately backing* like this has advantages in that it's very simple to implement, and can be very fast. The major downside is that we trust the program to only allocate what it needs, and if it allocates more (which is very common) that extra physical memory is wasted. In contrast, *demand paging* does not back memory right away, instead relying on the program to cause a page fault when it accesses the virtual memory it just allocated. At this point the VMM now backs that virtual memory with some physical memory, usually a few pages at a time (to save overhead on page-faults). The benefits of demand-paging are that it can reduce physical memory usage, but it can slow down programs if not implemented carefully. It also requires a more complex VMM, and the ability to handle page faults properly. On the topic of advanced VMM features, it can also do other things like caching files in memory, and then mapping those files into the virtual address space somewhere (this is what the `mmap` system call does). @@ -33,7 +33,7 @@ A lot of these features are not needed in the beginning, but hopefully the uses As it might be expected, there are many VMM designs out there. We're going to look at a simple one that should provide all the functionality needed for now. First we'll need to introduce a new concept: a *virtual memory object*, sometimes called a *virtual memory range*. 
This is just a struct that represents part of the virtual address space, so it will need a base address and length, both of these are measured in bytes and will be page-aligned. This requirement to be page-aligned comes from the mechanism used to manage virtual memory: paging. On `x86` the smallest page we can manage is `4K`, meaning that all of our VM objects must be aligned to this. -In addition we might want to store some flags in the *vm object*, they are like the flags used in the page tables, we could technically just store them there, but having them as part of the object makes looking them up faster, since we don't need to manually traverse the paging structure. It also allows us to store flags that the are not relevant to paging. +In addition, we might want to store some flags in the *vm object*, they are like the flags used in the page tables, we could technically just store them there, but having them as part of the object makes looking them up faster, since we don't need to manually traverse the paging structure. It also allows us to store flags that are not relevant to paging. Here's what our example virtual memory object looks like: @@ -53,7 +53,7 @@ typedef struct { The `flags` field is actually a bitfield, and we've defined some macros to use with it. -These don't correspond to the bits in the page table, but having them separate like this means they are platform-agnostic. We can port our kernel to any cpu architecture that supports some kind of MMU and most of the code won't need to change, we'll just need a short function that converts our vm flags into page table flags. This is especially convenient for oddities like `x86` and its ` nx-bit`, where all memory is executable by default, and it must specified if the memory *don't* want to be executable. +These don't correspond to the bits in the page table, but having them separate like this means they are platform-agnostic. We can port our kernel to any cpu architecture that supports some kind of MMU and most of the code won't need to change, we'll just need a short function that converts our vm flags into page table flags. This is especially convenient for oddities like `x86` and its ` nx-bit`, where all memory is executable by default, and it must be specified if the memory *shouldn't* be to be executable. Having it like this allows that to be abstracted away from the rest of our kernel. For `x86_64` our translation function would look like the following: @@ -90,12 +90,12 @@ This is where design and reality collide, because our high level VMM needs to pr void* vmm_pt_root; ``` -This variable can be placed anywhere, this depend on our design decisions, there is not correct answer, but a good idea is to reserve some space in the VM space to be used by the VMM to store its data. Usually a good idea is to place this space somewhere in the higher half area probably anywhere below the kernel. +This variable can be placed anywhere, this depends on our design decisions, there is no correct answer, but a good idea is to reserve some space in the VM space to be used by the VMM to store its data. Usually a good idea is to place this space somewhere in the higher half area probably anywhere below the kernel. Once we got the address, this needs to be mapped to an existing physical address, so we will need to do two things: * Allocate a physical page for the `vmm_pt_root` pointer (at this point a function to do that should be present) -* Map the phyiscal address into the virtual address `vmm_pt_root`. 
+* Map the physical address into the virtual address `vmm_pt_root`. It is important to keep in mind that the all the addresses must be page aligned. @@ -115,7 +115,7 @@ The `length` field is how many bytes we want. Internally we will round this **up The final argument is unused for the moment, but will be used to pass data for more exotic allocations. We'll look at an example of this later on. -The function will return a virtual address, it doesn't have necessarily to be already mapped and present, it just need to be an available address. Again the question is: where is that address? The answer again is that it depends on the design decisions. So we need to decide where we want the virtual memory range to be returned is, and use it as starting address. It can be the same space used for the vmm data strutctures, or another area, that is up to us, of course this decision will have an impact on the design of the algorithm. +The function will return a virtual address, it doesn't have necessarily to be already mapped and present, it just needs to be an available address. Again the question is: where is that address? The answer again is that it depends on the design decisions. So we need to decide where we want the virtual memory range to be returned is, and use it as starting address. It can be the same space used for the vmm data structures, or another area, that is up to us, of course this decision will have an impact on the design of the algorithm. For the example code we're going to assume we have a function to modify page tables that looks like the following: @@ -173,7 +173,7 @@ else latest->next = current; ``` -What happens next depends on the design of the VMM. We're going to use immediate backing to keep things simple, meaning we will immedately map some physical memory to the virtual memory we've allocated. +What happens next depends on the design of the VMM. We're going to use immediate backing to keep things simple, meaning we will immediately map some physical memory to the virtual memory we've allocated. ```c //immediate backing: map physical pages right away. @@ -190,9 +190,9 @@ We're not handling errors here to keep the focus on the core code, but they shou What about that extra argument that's gone unused? Right now it serves no purpose, but we've only looked at one use of the VMM: allocating working memory. -Working memory is called anonymous memory in the unix world and refers to what most programs think of as just 'memory'. It's a temporary data store for while the program is running, it's not persistent between runs and only the current program can access it. Currently this is all our VMM supports. +Working memory is called anonymous memory in the unix world and refers to what most programs think of as just 'memory'. It's a temporary data store for while the program is running, it's not persistent between runs and only the current program can access it. Currently, this is all our VMM supports. -The next thing we should add support for is mapping MMIO (memory mapped I/O). Plenty of modern devices will expose their interfaces via mmio, like the APICs, PCI config space or NVMe controllers. MMIO is usually some physical addresses we can interact with, that are redirected to the internal registers of the device. The trick is that MMIO requires us to access *specific* physical addresses, see the issue with our current design? +The next thing we should add support for is mapping MMIO (memory mapped I/O). 
Plenty of modern devices will expose their interfaces via MMIO, like the APICs, PCI config space or NVMe controllers. MMIO is usually some physical addresses we can interact with, that are redirected to the internal registers of the device. The trick is that MMIO requires us to access *specific* physical addresses, see the issue with our current design? This is easily solved however! We can add a new `VM_FLAG` that specifies we're allocating a virtual object for MMIO, and pass the physical address in the extra argument. If the VMM sees this flag, it will know not to allocate (and later on, not to free) the mapped physical address. This is important because there's not any physical memory there, so we don't want to try free it. @@ -214,7 +214,7 @@ else map_memory(vmm_pt_root, phys, (void*)obj->base, convert_x86_64_vm_flags(flags)); ``` -Now we have check for whether an object is MMIO or not. If it is, we don't allocate physical memory to back it. Instead we just modify the page tables to point to the physical address we want it too. +Now we have to check for whether an object is MMIO or not. If it is, we don't allocate physical memory to back it. Instead, we just modify the page tables to point to the physical address we want it too. At this point our VMM can allocate any object types we'll need for now, and hopefully we can start to see the purpose of the VMM. @@ -224,7 +224,7 @@ As mentioned previously a more advanced design could allow for memory mapping fi We've looked at allocating virtual memory, how about freeing it? This is quite simple! To start with, we'll need to find the VM object that represents the memory we want to free: this can be done by searching through the list of objects until we find the one we want. -If we don't find a VM object with a matching base address, something has gone wrong and error for debugging should be emitted. Otherwise the VM object can be safely removed from the linked list. +If we don't find a VM object with a matching base address, something has gone wrong and error for debugging should be emitted. Otherwise, the VM object can be safely removed from the linked list. At this point the object's flags need to be inspected to determine how to handle the physical addresses that are mapped. If the object represents MMIO, it will only need to remove the mappings from the page tables. If the object is working (anonymous) memory, which is indicated by the `VM_FLAG_MMIO` bit being cleared, the physical addresses in the page tables are page frames. The physical memory manager should be informed that these frames are now free after removing the mappings. @@ -240,7 +240,7 @@ Now that we have a virtual memory manager, let's take a look at how we might use ### Example 1: Allocating A Temporary Buffer -Traditionally `malloc()` or a variable-length array for something like this should be used. However there isn't a heap yet (see the next chapter), and allocating from the VMM directly like this gives few guarentees, we might want, like the memory always being page-aligned. +Traditionally `malloc()` or a variable-length array for something like this should be used. However, there isn't a heap yet (see the next chapter), and allocating from the VMM directly like this gives few guarantees, we might want, like the memory always being page-aligned. 
```c void* buffer = vmm_alloc(buffer_length, VM_FLAG_WRITE, NULL); @@ -274,7 +274,7 @@ We've looked a basic VMM implementation, and discussed some advanced concepts to - A function to get the physical address (if any) of a virtual address. This is essentially just walking the page tables in software, with extra logic to ensure a VM object exists at that address. We could add the ability to check if a VM object has specific flags as well. - A way to copy data between separate VMMs (with separate address spaces). There are a number of ways to do this, it can be an interesting problem to solve. We'll actually look at some roundabout ways of doing this later on when we look at IPC. - Cleaning up a VMM that is no longer in use. When a program exits, we'll want to destroy the VMM associated with it to reclaim some memory. -- Adding upper and lower bounds to where `vmm_alloc` will search. This can be useful for debugging, or it want to a split higher half VMM/lower half VMM design like mentioned previously. +- Adding upper and lower bounds to where `vmm_alloc` will search. This can be useful for debugging, or it wants to a split higher half VMM/lower half VMM design like mentioned previously. ## Final Notes diff --git a/04_Memory_Management/05_Heap_Allocation.md b/04_Memory_Management/05_Heap_Allocation.md index 2b909029..7a3d7d3c 100644 --- a/04_Memory_Management/05_Heap_Allocation.md +++ b/04_Memory_Management/05_Heap_Allocation.md @@ -2,14 +2,14 @@ ## Introduction -Welcome to the last layer of memory allocation, the heap, this is where usually the various alloc functions are implemented. This layer is usually built on top of the other layers of memory management (PMM and VMM), but a heap can be built on top of anything, even another heap! Since different imeplementations have different charactistics, they may be favoured for certain things. We will describe a way of building a heap allocator that is easy to understand, piece by piece. The final form will be a linked list. +Welcome to the last layer of memory allocation, the heap, this is where usually the various alloc functions are implemented. This layer is usually built on top of the other layers of memory management (PMM and VMM), but a heap can be built on top of anything, even another heap! Since different implementations have different characteristics, they may be favoured for certain things. We will describe a way of building a heap allocator that is easy to understand, piece by piece. The final form will be a linked list. We'll focus on three things: allocating memory (`alloc()`), freeing memory (`free()`) and the data structure needed for those to work. ### To Avoid Confusion -The term 'heap' has a few meanings, and if coming from a computer science course the first though might be the data structure (specialized tree). That can be used to implement a heap allocator (hence the name), but its not what we're talking about here. +The term 'heap' has a few meanings, and if coming from a computer science course the first though might be the data structure (specialized tree). That can be used to implement a heap allocator (hence the name), but it's not what we're talking about here. This term when used in a memory management/osdev environment has a different meaning, and it usually refers to the code where memory is _dynamically allocated_ (`malloc()` and friends). @@ -41,7 +41,7 @@ A heap allocator exposes two main functions: * `void *alloc(size_t size);` To request memory of size bytes. 
* `void free(void *ptr);` To free previously allocated memory. -In user space these are the well known `malloc()/free()` functions. However the kernel will also need its own heap (we don't want to put data where user programs can access it!). The kernel heap usually exposes functions called `kmalloc()/kfree()`. Functionally these heaps can be the same. +In user space these are the well known `malloc()/free()` functions. However, the kernel will also need its own heap (we don't want to put data where user programs can access it!). The kernel heap usually exposes functions called `kmalloc()/kfree()`. Functionally these heaps can be the same. So let's get started with describing the allocation algorithm. @@ -139,7 +139,7 @@ Now we're going to build a new allocator based on the one we just implemented. T Now the problem is: how do we keep track of this information? -For this example let's keep things extermely simple: place the size just before the pointer. Whenever we make an allocation we write the size to the address pointed by `cur_heap_position`, increment the pointer and return that address. The updated code should look like the following: +For this example let's keep things extremely simple: place the size just before the pointer. Whenever we make an allocation we write the size to the address pointed by `cur_heap_position`, increment the pointer and return that address. The updated code should look like the following: ```c uint8_t *heap_start = 0; @@ -162,7 +162,7 @@ This new function potentially fixes one of the problems we listed above: it can *Authors note: just a reminder that the pointer is a uint8_t pointer, so when we are storing the size, the memory cell pointed by cur_heap_position will be of type *uint8_t*, that means that in this example and the followings, the size stored can be maximum 255. In a real allocator we want to support bigger allocations, so using at least a `uint32_t` or even `size_t` is recommended.* -In this example, the number indicates the size of the allocated block. There have already been 2 memory allocations, with the first of 2 bytes and the second of 7 bytes. Now if we want to iterate from the first to the last item allocated the code will looks like: +In this example, the number indicates the size of the allocated block. There have already been 2 memory allocations, with the first of 2 bytes and the second of 7 bytes. Now if we want to iterate from the first to the last item allocated the code will look like: ```c uint8_t *cur_pointer = start_pointer; @@ -176,14 +176,14 @@ But are we able to reclaim unused memory with this approach? The answer is no. Y ### Part 3: Actually Adding Free() -So to solve this issue we need to keep track of a new information: whether a chunk of memory is used or free. +So to solve this issue we need to keep track of the new information: whether a chunk of memory is used or free. So now everytime we will make an allocation we will keep track of: * the allocated size * the status (free or used) -At this point our new heap allocation will looks like: +At this point our new heap allocation will look like: | 0000 | 0001 | 0002 | 0003 | 0004 | ... | 0011 | 0011 | 0013 | ... | 00100 | |------|------|------|-------|-------|-----|-------|------|------|-----|-------| @@ -232,8 +232,8 @@ Yeah, that's it! We just need to change the status, and the allocator will be ab Now that we can free, we should add support for returning from this freed memory. 
How the new `alloc()` works is as follows: -* Alloc will start from the beginning of the heap, traversing it until the latest address allocated (the current end of the heap) looking for a chunk who's size is bigger than the requested size. -* If found mark that chunk as USED. The size doesn't need to be updated since it's not changing, so assuming that `cur_pointer` is pointing to the first metatata byte of the location to be returned (the size in our example) the code to update and return the current block will be pretty simple: +* Alloc will start from the beginning of the heap, traversing it until the latest address allocated (the current end of the heap) looking for a chunk whose size is bigger than the requested size. +* If found mark that chunk as USED. The size doesn't need to be updated since it's not changing, so assuming that `cur_pointer` is pointing to the first metadata byte of the location to be returned (the size in our example) the code to update and return the current block will be pretty simple: ```c cur_pointer = cur_pointer + 1; //remember cur_pointer is pointing to the size byte, and is different from current_heap end @@ -288,14 +288,14 @@ void *third_alloc(size_t size) { } ``` -If we are returning a previously allocated address, we don't need move `cur_heap_position`, since we are reusing an area of memory that is before the end of the heap. +If we are returning a previously allocated address, we don't need to move `cur_heap_position`, since we are reusing an area of memory that is before the end of the heap. Now we have a decent and working function that can free previously allocated memory, and is able to reuse it. It is still not perfect and there are several major problems: * There is a lot of potential waste of space, for example if we are allocating 10 bytes, and the heap has two holes big enough the first is 40 bytes, the second 14, the algorithm will pick the first one free so the bigger one with a waste of 26 bytes. There can be different solution to this issue, but is out of the purpose of this tutorial (and eventually left as an exercise) -* It can suffer of fragmentation. Basically there can be a lot of small freed areas that the allocator will not be able to use because of their size. A partial solution to this problem is described in the next paragraph. +* It can suffer from fragmentation. Basically there can be a lot of small freed areas that the allocator will not be able to use because of their size. A partial solution to this problem is described in the next paragraph. -Another thing worth doing to improve readability of the code is replace the direct pointer access with a more elegant data structure. This lets us add more fields (as we will in the next paragraph) as needed. +Another thing worth doing is to improve readability of the code by replacing the direct pointer access with a more elegant data structure. This lets us add more fields (as we will in the next paragraph) as needed. So far our allocator needs to keep track of just the size of the block returned and its status The data structure for this could look like the following: @@ -312,12 +312,12 @@ That's it! That's what we need to clean up the code and replace the pointers in So now we have a basic memory allocator (woo hoo), and we are nearing the end of our memory journey. -In this part we'll see how to help mitigate the *fragmentation* problem. It is not a definitive solution, but this let us to reuse memory in a more efficient way. Before proceeding let's recap what we've done so far. 
-In this part we'll see how to help mitigate the *fragmentation* problem. It is not a definitive solution, but this let us to reuse memory in a more efficient way. Before proceeding let's recap what we've done so far.
+In this part we'll see how to help mitigate the *fragmentation* problem. It is not a definitive solution, but this lets us reuse memory in a more efficient way. Before proceeding, let's recap what we've done so far.

-We started from a simple pointer to the latest allocated location, and added information in order to keep track of what was previously allocated and how big it was, needed to reuse the freed memory. We've basically created a list of memory regions that we can traverse to find the next/prev region.
+We started from a simple pointer to the latest allocated location, and added information in order to keep track of what was previously allocated and how big it was, which is needed to reuse the freed memory. We've basically created a list of memory regions that we can traverse to find the next/prev region.

-Lets look at fragmentation a little more closely, in the following example. We assume that we have a heap limited to 25 bytes:
+Let's look at fragmentation a little more closely, in the following example. We assume that we have a heap limited to 25 bytes:

```c
a = third_alloc(6);
...
```

@@ -335,25 +335,25 @@ What the heap will look like after the code above?

| 6 | F | X | .. | X | 6 | F | X | .. | X | 6 | F | .. | X | | |

-Now, all of the memory in the heap is available to allocate (except for the overhead used to store the status of each chunk), and everything looks perfectly fine. But now the code keeps executing and it will arrive at the following instruction:
+Now, all of the memory in the heap is available to allocate (except for the overhead used to store the status of each chunk), and everything looks perfectly fine. But now the code keeps executing, and it will arrive at the following instruction:

```c
alloc(7);
```

-Pretty small allocation and we have plenty of space... no wait. The heap is mostly empty but we can't allocate just 7 bytes because all the free blocks are too small. That is _fragmentation_ in a nutshell.
+Pretty small allocation, and we have plenty of space... no wait. The heap is mostly empty, but we can't allocate just 7 bytes because all the free blocks are too small. That is _fragmentation_ in a nutshell.

-How do we solve this issue? The idea is pretty straightforward, every time a memory location is being freed, we do the following:
+How do we solve this issue? The idea is pretty straightforward: every time a memory location is being freed, we do the following:

-* First check if it is adjacent to to other free locations (both directions: previous and next)
-  * If `ptr_to_free + ptr_to_free_size == next_node` then merge the two nodes and create a single node of `ptr_to_free_size + next_node_size` (notice we don't ned to add the size of `Heap_node` because `ptr` should be the address immediately after the struct).
+* First check if it is adjacent to other free locations (both directions: previous and next)
+  * If `ptr_to_free + ptr_to_free_size == next_node` then merge the two nodes and create a single node of `ptr_to_free_size + next_node_size` (notice we don't need to add the size of `Heap_node` because `ptr` should be the address immediately after the struct).
  * If `prev_node_address + prev_node_size + sizeof(Heap_Node) == ptr_to_free` then merge the two nodes and create a single node of `prev_node_size + ptr_to_free_size`
  * If not just mark this location as free.
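
To make the two adjacency checks above concrete, here is a hedged sketch written exactly as the conditions state them. The key assumption: `ptr_to_free` and `prev_node_address` are *data* addresses, i.e. they point just past their own `Heap_Node` header (the function names are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { size_t size; int status; } Heap_Node; // same shape as before

// The freed chunk ends exactly where the next node's header begins, which
// is why no extra sizeof(Heap_Node) shows up in this condition.
bool is_adjacent_to_next(uintptr_t ptr_to_free, size_t ptr_to_free_size,
                         uintptr_t next_node) {
    return ptr_to_free + ptr_to_free_size == next_node;
}

// The previous chunk's data ends where our header begins; adding that
// header's size lands exactly on ptr_to_free.
bool is_adjacent_to_prev(uintptr_t prev_node_address, size_t prev_node_size,
                         uintptr_t ptr_to_free) {
    return prev_node_address + prev_node_size + sizeof(Heap_Node) == ptr_to_free;
}
```
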
There are different ways to implement this:

-* Adding a `next` and `prev` pointer to the node structure. This is the way we'll use in the rest of this chapter. This makes checking the next and previous nodes for mergability very easy. It does dramatically increase the memeory overhead. Checking if a node can be merged can be done via `(cur_node->prev).status = FREE` and `(next_node->next).status = FREE`.
-* Otherwise without adding the next and prev pointer to the node, we can scan the heap from the start until the node before `ptr_to_free`, and if is free we can merge. For the next node instead things are easier: we just need to check if the node starting at `ptr_to_free + ptr_size` if it is free is possible to merge. By comparison this increases the runtime overhead of `free()`.
+* Adding a `next` and `prev` pointer to the node structure. This is the way we'll use in the rest of this chapter. This makes checking the next and previous nodes for merge-ability very easy. It does dramatically increase the memory overhead. Checking if a node can be merged can be done via `cur_node->prev->status == FREE` and `cur_node->next->status == FREE`.
+* Otherwise, without adding the next and prev pointer to the node, we can scan the heap from the start until the node before `ptr_to_free`, and if it is free we can merge. For the next node things are easier: we just need to check the node starting at `ptr_to_free + ptr_size`; if it is free, it is possible to merge. By comparison this increases the runtime overhead of `free()`.

Both solutions have their own pros and cons, like previously mentioned we'll go with the first one for these examples. Adding the `prev` and `next` pointers to the heap node struct leaves us with:

@@ -373,7 +373,7 @@ So now our heap node will look like the following in memory:

|----|------|-------|------|-----|
| 6 | F/U | PREV | NEXT | X |

-As mentioned earlier using the double linked list the check for mergeability is more straightforward. For example to check if we can merge with the left node we just need to check the status of the node pointed by the prev field, if it is free than they can be merged. To merge with the previous node would apply the logic below to `node->prev`:
+As mentioned earlier, using the doubly linked list the check for merge-ability is more straightforward. For example, to check if we can merge with the left node we just need to check the status of the node pointed to by the prev field: if it is free then they can be merged. To merge with the previous node we would apply the logic below to `node->prev`:

-* Update the `size` its, adding to it the size of cur_node
+* Update its `size`, adding to it the size of cur_node
 * Update the `next` pointer to point to cur_node->next

@@ -386,7 +386,7 @@ Of course merging with the right node is the opposite (update the size and the p

-**Important note:** We always want to merge in the order of `current + next` and then `prev + current` as if the prev node absorbs current, what happens to the memory owned by the next node when merged with it? Nothing, it's simply lost. It can be avoided with clever and careful logic, but the simpler solution is to simply merge in the right order.
+**Important note:** We always want to merge in the order of `current + next` and then `prev + current`, because if the prev node absorbs current first, what happens to the memory owned by the next node when merged with it? Nothing, it's simply lost. It can be avoided with clever and careful logic, but the simpler solution is to simply merge in the right order.

-Below a pseudo-code example of how to merge left:
+Below is a pseudocode example of how to merge left:

```c
-Heap_Node *prev_node = cur_node->prev //cur_pointer is the node we want to check if can be merged
+Heap_Node *prev_node = cur_node->prev; //cur_node is the node we want to check if it can be merged

@@ -405,7 +405,7 @@ What we're describing here is the left node being "swallowed" by the right one, 

![Heap initial status](/Images/heapexample.png)

-Basically the heap starts from address 0, the first node is marked as free and the next two nodes are both used. Now imagine that `free()` is called on the second address (for this exammple we consider size of the heap node structure to be just of 2 bytes):
+Basically the heap starts from address 0, the first node is marked as free and the next two nodes are both used. Now imagine that `free()` is called on the second address (for this example we consider the size of the heap node structure to be just 2 bytes):

```c
free(0x27); //Remember the overhead
```
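
Continuing the example, here is a hedged sketch of the `current + next` merge that such a `free()` performs, using the `size`/`status`/`prev`/`next` fields introduced above; running the same logic on `cur_node->prev` afterwards gives the `prev + current` merge. One assumption to call out: this sketch also reclaims the absorbed node's `sizeof(Heap_Node)` header as usable space, mirroring the header subtracted when a node is split later on; if your design keeps the header out of the size on both paths, drop that term:

```c
#include <stddef.h>

typedef enum { FREE, USED } Block_Status;

typedef struct Heap_Node {
    size_t size;
    Block_Status status;
    struct Heap_Node *prev;
    struct Heap_Node *next;
} Heap_Node;

// Merge cur_node with the free node that follows it ("current + next").
void merge_with_next(Heap_Node *cur_node) {
    Heap_Node *next_node = cur_node->next;
    if (next_node == NULL || next_node->status != FREE || cur_node->status != FREE)
        return; // nothing to merge

    // Absorb the next node's data area plus its now-unused header
    // (see the assumption about the header term in the text above).
    cur_node->size += next_node->size + sizeof(Heap_Node);

    // Unlink next_node from the doubly linked list.
    cur_node->next = next_node->next;
    if (next_node->next != NULL)
        next_node->next->prev = cur_node;
}
```
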
@@ -422,7 +422,7 @@ The fields in bold are the fields that are changed. 
The exact implementation of 

Now we have a way to help reduce fragmentation, on to the next major issue: wasted memory from allocating chunks that are too big. In this part we will see how to mitigate this.

-Imagine our memory manager is allocating and freeing memory for a while and we arrive at a moment in time where we have just three nodes:
+Imagine our memory manager is allocating and freeing memory for a while, and we arrive at a moment in time where we have just three nodes:

* The first node Free, size of 150 bytes (the heap start).
* The second node Used size of 50 bytes.

@@ -440,10 +440,10 @@ The allocator is going to look for the first node it can return that is at least

The workflow will be the following:

* Find the first node that is big enough to contain the incoming request.
-* Create a new node at the address `(uintptr_t)cur_node + requested_bytes`. Set this node's size to `cur_node->size - requested_bytes - sizeof(Heap_Node)`, we're substracting the size of the Heap_Node struct here because we're going to use some memory in the heap to store this new node. This is the process of inserting into the heap.
+* Create a new node at the address `(uintptr_t)cur_node + requested_bytes`. Set this node's size to `cur_node->size - requested_bytes - sizeof(Heap_Node)`; we're subtracting the size of the Heap_Node struct here because we're going to use some memory in the heap to store this new node. This is the process of inserting into the heap.
-* `cut_node->size` should now be the requested size.
+* `cur_node->size` should now be the requested size.
* In our example we're using a doubly-linked list (i.e. both forward and back), so we'll need to update the current node and the next node's pointers to include this new node (update its pointers too).
-* One edge case to be aware of here is if node that was split was the last node of the heap, The `heap_tail` variable should be updated as well, if it is being used (this depend on design decisions).
+* One edge case to be aware of here is if the node that was split was the last node of the heap. The `heap_tail` variable should be updated as well, if it is being used (this depends on design decisions). A sketch of this whole splitting process is shown below.

After that the allocator can compute the address to return using `(uintptr_t)cur_node + sizeof(Heap_node)`, since we want to return the memory *after* the node, not the node itself (otherwise the program would put data there and overwrite what we've stored there!).
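
Here is a hedged sketch of that splitting workflow. `MIN_ALLOC_SIZE` and `heap_tail` are illustrative names (the minimum chunk size and optional tail pointer discussed around here), and note the new header is placed after the current header *and* the requested bytes, which keeps it consistent with the return address just computed:

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_ALLOC_SIZE 0x20  // illustrative minimum usable chunk size

typedef enum { FREE, USED } Block_Status;
typedef struct Heap_Node {
    size_t size;
    Block_Status status;
    struct Heap_Node *prev, *next;
} Heap_Node;                 // same node as in the earlier sketches

extern Heap_Node *heap_tail; // optional: the last node of the heap, if tracked

// Split cur_node so it keeps exactly requested_bytes of usable space;
// the remainder becomes a new free node inserted right after it.
void split_node(Heap_Node *cur_node, size_t requested_bytes) {
    // Only split if the leftover can hold a header plus a usable chunk,
    // otherwise the new node could never be allocated.
    if (cur_node->size < requested_bytes + sizeof(Heap_Node) + MIN_ALLOC_SIZE)
        return;

    Heap_Node *new_node =
        (Heap_Node *)((uintptr_t)cur_node + sizeof(Heap_Node) + requested_bytes);
    new_node->size = cur_node->size - requested_bytes - sizeof(Heap_Node);
    new_node->status = FREE;

    // Wire the new node into the doubly linked list.
    new_node->prev = cur_node;
    new_node->next = cur_node->next;
    if (cur_node->next != NULL)
        cur_node->next->prev = new_node;
    cur_node->next = new_node;
    cur_node->size = requested_bytes;

    // Edge case: if we split the last node, the tail moves.
    if (heap_tail == cur_node)
        heap_tail = new_node;
}
```
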
@@ -451,7 +451,7 @@ After that the allocator can compute the address to return using `(uintptr_t)cur

Before wrapping up there's a few things worth pointing out about implementing splitting:

* Remember that every node has some overhead, so when splitting we shouldn't have nodes smaller (or equal to) than `sizeof(Heap_Node)`, because otherwise they will never be allocated.
-* It's a good idea to have a minimum size for the memory a chunk can contain, to avoid having a large number of nodes and for easy alignment later on. For example if the minimum_allocable_size is 0x20 bytes, and we want to allocate 5 bytes, we will still receive a memory block of `0x20` bytes. The program may not know it was returned `0x20` bytes, but that is okay. What exactly value should be used for it is implementatin specific, values of `0x10` and `0x20` are popular.
+* It's a good idea to have a minimum size for the memory a chunk can contain, to avoid having a large number of nodes and for easy alignment later on. For example if the `minimum_allocatable_size` is 0x20 bytes, and we want to allocate 5 bytes, we will still receive a memory block of `0x20` bytes. The program may not know it was returned `0x20` bytes, but that is okay. What exact value should be used is implementation specific; values of `0x10` and `0x20` are popular.
* Always remember that there is the memory footprint of `sizeof(Heap_Node)` bytes while computing sizes that involve multiple nodes. If we decide to include the overhead size in the node's size, remember to also subtract it when checking for suitable nodes.

And that's it!

@@ -479,26 +479,26 @@ void initialize_heap() {

Now the question is, how do we choose the starting address? This really is arbitrary. We can pick any address that we like, but there are a few constraints that we should follow:

-* Some memory is used by the kernel, we don't want to overwrite anything with our heap, so let's keep sure that the area we are going is free.
+* Some memory is used by the kernel; we don't want to overwrite anything with our heap, so let's make sure that the area we are going to use is free.
-* Usually when paging is enabled, in many case the kernel is moved to one half of the memory space (usually referred as to HIGHER_HALF and LOWER_HALF) so when deciding the initial address we should place it in the correct half, so if the kernel is placed in the HIGHER and we are implementing the kernel heap it should go on the HIGHER Half and if it is for the user space heap it will goes on the LOWER half.
+* Usually when paging is enabled, in many cases the kernel is moved to one half of the memory space (usually referred to as the HIGHER_HALF and LOWER_HALF), so when deciding the initial address we should place it in the correct half: if the kernel is placed in the HIGHER half and we are implementing the kernel heap, the heap should go in the HIGHER half; if it is for the user space heap, it will go in the LOWER half.

-For the kernel heap, a good place for it to start is immediately following the kernel binary in memory. If the kernel is loaded at `0xFFFFFFFF80000000` as is common for higher half kernels, and the kernel is `0x4321` bytes long. It round up to the nearest page and then add another page (`0x4321` gets rounded to `0x5000`, add `0x1000` now we're at `0x6000`). Therefore our kernel heap would start at `0xFFFFFFFF80006000`.
+For the kernel heap, a good place for it to start is immediately following the kernel binary in memory. Say the kernel is loaded at `0xFFFFFFFF80000000`, as is common for higher half kernels, and is `0x4321` bytes long: we round up to the nearest page and then add another page (`0x4321` gets rounded to `0x5000`, add `0x1000` and now we're at `0x6000`). Therefore, our kernel heap would start at `0xFFFFFFFF80006000`.

-The reason for the empty page is that it can be left unmapped, and then any buggy code that attempts to access memory *before* the heap will likely cause a page fault, rather then returning bits of the kernel.
+The reason for the empty page is that it can be left unmapped, and then any buggy code that attempts to access memory *before* the heap will likely cause a page fault, rather than returning bits of the kernel.

And that's it, that is how the heap is initialized with a single node. The first allocation will trigger a split from that node... and so on...
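
As a sketch of that arithmetic, assuming a `_kernel_end` symbol provided by the linker script to mark the end of the kernel binary (both that name and `heap_start_address` are illustrative):

```c
#include <stdint.h>

#define PAGE_SIZE 0x1000

extern char _kernel_end; // assumed: defined in the linker script

// Round an address up to the next page boundary.
static uintptr_t align_up(uintptr_t addr) {
    return (addr + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1);
}

// End of the kernel rounded up to a page, plus one unmapped guard page:
// with the kernel ending at 0xFFFFFFFF80004321 this yields 0xFFFFFFFF80006000.
uintptr_t heap_start_address(void) {
    return align_up((uintptr_t)&_kernel_end) + PAGE_SIZE;
}
```
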
### Part 8: Heap Expansion

-One final part that we will explained briefly, is what happens when we reach the end of the heap. Imagine the following scenario we have done a lot of allocations, most of the heap nodes are used and the few usable nodes are small. The next allocation request will fail to find a suitable node because the requested size is bigger than any free node available. Now the allocator has searched through the heap, and reached the end without success. What happens next? Time to expand the heap by adding more memory to the end of it.
+One final part that we will explain briefly is what happens when we reach the end of the heap. Imagine the following scenario: we have done a lot of allocations, most of the heap nodes are used and the few usable nodes are small. The next allocation request will fail to find a suitable node because the requested size is bigger than any free node available. Now the allocator has searched through the heap, and reached the end without success. What happens next? Time to expand the heap by adding more memory to the end of it.

-Here is where the virtual memory manager will join the game. Roughly what will is:
+Here is where the virtual memory manager will join the game. Roughly what will happen is:

* The heap allocator will first check if we have reached the end of the address space available (unlikely).
-* If not it will ask to the VMMmanager to map a number of pages (exact number depends on implementation) at the address starting from `heap_end + heap_end->size + sizeof(heap_node)`.
+* If not, it will ask the VMM to map a number of pages (exact number depends on implementation) at the address starting from `heap_end + heap_end->size + sizeof(heap_node)`.
-* If the mapping fail, the allocation will fail as well (i.e. out of memory/OOM. This is an issue to solve in its own right).
+* If the mapping fails, the allocation will fail as well (i.e. out of memory/OOM. This is an issue to solve in its own right).
-* If the mapping is succesfull, then we have just created a new node to be appended to the current end of the heap. Once this is done we can proceed with the split if needed.
+* If the mapping is successful, then we have just created a new node to be appended to the current end of the heap. Once this is done we can proceed with the split if needed. A sketch of this expansion is shown below.
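
A hedged sketch of this expansion could look like the following, where `vmm_map_pages()` stands in for whatever interface your VMM exposes for mapping a number of pages at a virtual address (that name, like `heap_end`, is illustrative):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000

typedef enum { FREE, USED } Block_Status;
typedef struct Heap_Node {
    size_t size;
    Block_Status status;
    struct Heap_Node *prev, *next;
} Heap_Node;                                              // same node as before

extern Heap_Node *heap_end;                               // last node of the heap
extern bool vmm_map_pages(uintptr_t vaddr, size_t count); // hypothetical VMM call

bool expand_heap(size_t required_size) {
    // The expansion starts right past the last node, as described above.
    uintptr_t start = (uintptr_t)heap_end + sizeof(Heap_Node) + heap_end->size;

    // Enough pages for the request plus the new node's header.
    size_t pages = (required_size + sizeof(Heap_Node) + PAGE_SIZE - 1) / PAGE_SIZE;
    if (!vmm_map_pages(start, pages))
        return false; // mapping failed: the allocation fails too (OOM)

    // Build a new free node in the fresh pages and append it to the list;
    // the caller can then split it if it is bigger than needed.
    Heap_Node *new_node = (Heap_Node *)start;
    new_node->size = pages * PAGE_SIZE - sizeof(Heap_Node);
    new_node->status = FREE;
    new_node->prev = heap_end;
    new_node->next = NULL;
    heap_end->next = new_node;
    heap_end = new_node;
    return true;
}
```
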
-And with that we're just written a fairly complete heap allocator. A final note: in these examples we're not zeroing the memory returned by the heap, which languages like C++ may expect when `new` and `delete` operators are used. This can lead to non-deterministic bugs where objects may be initialized with left over values from previous allocations (if the memory has been used before), and suddenly default construction is not doing what is expected.
+And with that we've just written a fairly complete heap allocator. A final note: in these examples we're not zeroing the memory returned by the heap, which languages like C++ may expect when `new` and `delete` operators are used. This can lead to non-deterministic bugs where objects may be initialized with leftover values from previous allocations (if the memory has been used before), and suddenly default construction is not doing what is expected.

-Doing a `memset()` on each block of memory returned does cost cpu time, so its a trade off, a decision to be made for your specific implementation.
+Doing a `memset()` on each block of memory returned does cost CPU time, so it's a trade-off, a decision to be made for your specific implementation.

diff --git a/04_Memory_Management/README.md b/04_Memory_Management/README.md
index 4caaf54b..6881b822 100644
--- a/04_Memory_Management/README.md
+++ b/04_Memory_Management/README.md
@@ -4,7 +4,7 @@ This part will cover all the topic related on how to build a memory management m

Below the list of chapters:

-* [Overview](01_Overview.md) It introduces the basic concepts of memory management, and provide an high level overview of all the layers that are part of it.
+* [Overview](01_Overview.md) It introduces the basic concepts of memory management, and provides a high-level overview of all the layers that are part of it.
* [Physical Memory Manager](02_Physical_Memory.md) The lowest layer, the physical memory manager, it deals with "real memory".
-* [Paging](03_Paging.md) Paging will provide a separation between a physical memory address and a virtual address. This mean that the kernel we will be able to access much more addresses than the ones available.
+* [Paging](03_Paging.md) Paging will provide a separation between a physical memory address and a virtual address. This means that the kernel will be able to access many more addresses than the ones physically available.
-* [Virtual Memory Manager](04_Virtual_Memory_Manager.md) It sits between the heap and the physical memory manager, it is similar to the Physical Memory Manager, but for the virtual space.
+* [Virtual Memory Manager](04_Virtual_Memory_Manager.md) It sits between the heap and the physical memory manager; it is similar to the Physical Memory Manager, but for the virtual space.

diff --git a/99_Appendices/I_Acknowledgments.md b/99_Appendices/I_Acknowledgments.md
index 475d3d34..ec4ca821 100644
--- a/99_Appendices/I_Acknowledgments.md
+++ b/99_Appendices/I_Acknowledgments.md
@@ -19,3 +19,4 @@ In no particular order:
 - @Moldytzu ([https://github.com/Moldytzu](https://github.com/Moldytzu))
 - @AnErrupTion ([https://github.com/AnErrupTion](https://github.com/AnErrupTion))
 - @MRRcode979 ([https://github.com/MRRcode979](https://github.com/MRRcode979))
+- @Hqnnqh ([https://github.com/Hqnnqh](https://github.com/Hqnnqh))