drm/ttm: new TT backend allocation pool
This replaces the spaghetti code in the two existing page pools.

First of all, depending on the allocation size, it is between 3 (1GiB) and
5 (1MiB) times faster than the old implementation.

It makes better use of buddy pages to allow for larger physically contiguous
allocations, which should result in better TLB utilization, at least for
amdgpu.
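
As a rough illustration of preferring large buddy chunks, the following
userspace sketch splits a request into power-of-two chunks, largest first.
All names (sketch_fls, sketch_split_orders, SKETCH_MAX_ORDER) are hypothetical
and not taken from the patch, and the sketch ignores what happens when a
high-order allocation fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap, standing in for the kernel's maximum buddy order. */
#define SKETCH_MAX_ORDER 11

/* Index of the highest set bit; n must be non-zero. */
static unsigned int sketch_fls(unsigned long n)
{
	unsigned int bit = 0;

	assert(n != 0);
	while (n >>= 1)
		bit++;
	return bit;
}

/*
 * Split num_pages into power-of-two chunks, largest first.
 * Stores the order of each chunk in orders[] and returns the chunk count.
 * Stops early if orders[] (max_chunks entries) is full.
 */
static size_t sketch_split_orders(unsigned long num_pages,
				  unsigned int *orders, size_t max_chunks)
{
	size_t count = 0;

	while (num_pages && count < max_chunks) {
		unsigned int order = sketch_fls(num_pages);

		/* Never ask for more than the largest supported order. */
		if (order > SKETCH_MAX_ORDER - 1)
			order = SKETCH_MAX_ORDER - 1;

		orders[count++] = order;
		num_pages -= 1UL << order;
	}
	return count;
}
```

Under these assumptions a 13 page request becomes chunks of order 3, 2 and 0
(8 + 4 + 1 pages) instead of 13 single pages.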

Instead of the completely braindead approach of filling the pool on one CPU
while another one is trying to shrink it, only freed pages are now put into
the pool.

This also results in much less lock contention and a trylock-free MM
shrinker callback, so we can guarantee that pages are given back to the system
when needed.
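
The idea of a pool that is filled only by frees and shrunk under a plain
blocking lock can be sketched in userspace as follows. This is a simplified
model, not the kernel code: malloc()/free() stand in for the page allocator, a
pthread mutex stands in for the kernel lock, and every sketch_* name is
hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* A pooled page doubles as its own list node. */
struct sketch_page {
	struct sketch_page *next;
};

struct sketch_pool {
	pthread_mutex_t lock;
	struct sketch_page *free_list;
	unsigned long count;
};

static void sketch_pool_init(struct sketch_pool *pool)
{
	pthread_mutex_init(&pool->lock, NULL);
	pool->free_list = NULL;
	pool->count = 0;
}

/* Pop one page from the pool, or return NULL if the pool is empty. */
static struct sketch_page *sketch_pool_take(struct sketch_pool *pool)
{
	struct sketch_page *p;

	pthread_mutex_lock(&pool->lock);
	p = pool->free_list;
	if (p) {
		pool->free_list = p->next;
		pool->count--;
	}
	pthread_mutex_unlock(&pool->lock);
	return p;
}

/* Allocate: reuse a pooled page if there is one, else ask the system. */
static void *sketch_pool_alloc(struct sketch_pool *pool)
{
	struct sketch_page *p = sketch_pool_take(pool);

	return p ? (void *)p : malloc(SKETCH_PAGE_SIZE);
}

/* Free: the only path that ever adds pages to the pool. */
static void sketch_pool_free(struct sketch_pool *pool, void *mem)
{
	struct sketch_page *p = mem;

	pthread_mutex_lock(&pool->lock);
	p->next = pool->free_list;
	pool->free_list = p;
	pool->count++;
	pthread_mutex_unlock(&pool->lock);
}

/* Shrink: blocking lock, no trylock; returns the number of pages released. */
static unsigned long sketch_pool_shrink(struct sketch_pool *pool,
					unsigned long nr)
{
	unsigned long freed = 0;
	struct sketch_page *p;

	while (freed < nr && (p = sketch_pool_take(pool)) != NULL) {
		free(p);
		freed++;
	}
	return freed;
}
```

Because nothing pre-fills the pool, the first allocations always go to the
system, and the shrinker never has to compete with a refill path for the lock.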

The downside of this is that, with many small allocations, it takes longer
until the pool is filled up. We could address this, but I couldn't find a use
case where this actually matters. And we don't bother freeing large chunks of
pages any more.

The sysfs files are replaced with a single module parameter, allowing users to
override how many pages should be globally pooled in TTM. This unfortunately
breaks the UAPI slightly, but as far as we know nobody ever depended on this.
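
For a sense of scale, the ttm_memory.c hunk in this commit passes
glob->zone_kernel->max_mem/(2*PAGE_SIZE) to ttm_pool_mgr_init(). The sketch
below only reproduces that arithmetic in userspace; it assumes this value acts
as the initial global page limit, and SKETCH_PAGE_SIZE and
sketch_default_pool_pages are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical page size; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Mirror the expression max_mem / (2 * PAGE_SIZE) from the diff:
 * half of the kernel zone's memory, expressed in pages.
 */
static uint64_t sketch_default_pool_pages(uint64_t max_mem_bytes)
{
	return max_mem_bytes / (2 * SKETCH_PAGE_SIZE);
}
```

With an 8 GiB kernel zone and 4 KiB pages this evaluates to 1048576 pages,
i.e. 4 GiB worth of pooled pages.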

Zeroing memory coming from the pool was handled inconsistently. The
alloc_pages() based pool was zeroing it, the dma_alloc_attrs() based one wasn't.
The new implementation doesn't zero pages from the pool either and only sets
the __GFP_ZERO flag when necessary.

The implementation has only 753 lines of code compared to over 2600 in the
old one, and also allows for removing quite a bit of code in the drivers, since
we no longer need specialized handling there based on the kernel config.

In addition to all of that, there was a neat bug with IOMMU, coherent DMA
mappings and huge pages, which is now fixed in the new code as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Christian König authored and intel-lab-lkp committed Oct 25, 2020
1 parent fa3eefc commit 618def1
Showing 5 changed files with 759 additions and 1 deletion.
2 changes: 1 addition & 1 deletion drivers/gpu/drm/ttm/Makefile
@@ -5,7 +5,7 @@
 ttm-y := ttm_memory.o ttm_tt.o ttm_bo.o \
 ttm_bo_util.o ttm_bo_vm.o ttm_module.o \
 ttm_execbuf_util.o ttm_page_alloc.o ttm_range_manager.o \
-ttm_resource.o
+ttm_resource.o ttm_pool.o
 ttm-$(CONFIG_AGP) += ttm_agp_backend.o
 ttm-$(CONFIG_DRM_TTM_DMA_PAGE_POOL) += ttm_page_alloc_dma.o

3 changes: 3 additions & 0 deletions drivers/gpu/drm/ttm/ttm_memory.c
@@ -38,6 +38,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
+#include <drm/ttm/ttm_pool.h>

 #define TTM_MEMORY_ALLOC_RETRIES 4

@@ -453,6 +454,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
 }
 ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
 ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
+ttm_pool_mgr_init(glob->zone_kernel->max_mem/(2*PAGE_SIZE));
 return 0;
 out_no_zone:
 ttm_mem_global_release(glob);
@@ -467,6 +469,7 @@ void ttm_mem_global_release(struct ttm_mem_global *glob)
 /* let the page allocator first stop the shrink work. */
 ttm_page_alloc_fini();
 ttm_dma_page_alloc_fini();
+ttm_pool_mgr_fini();

 flush_workqueue(glob->swap_queue);
 destroy_workqueue(glob->swap_queue);