vfio: do not merge contiguous areas
[ upstream commit 016763c ]

In order to save DMA entries, which the kernel limits both for external
memory and for hugepage memory, an attempt was made to map physically
contiguous memory in one go. This cannot be done, because VFIO IOMMU
type1 does not support partially unmapping a previously mapped memory
region, while the heap can request multi-page mappings and partial
unmappings.
Hence, to go back to the old method of mapping/unmapping at memseg
granularity, this commit reverts
commit d1c7c0c ("vfio: map contiguous areas in one go").

Also add documentation on the module parameter that needs to be used
to increase the per-container DMA mapping limit for VFIO.

Fixes: d1c7c0c ("vfio: map contiguous areas in one go")

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
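
The failure mode the message describes can be reproduced directly with the VFIO type1 ioctls. The sketch below is illustrative, not DPDK code: map_region(), unmap_region() and demo() are hypothetical helpers, and the container fd is assumed to already have a group attached with VFIO_TYPE1_IOMMU selected. It maps two hugepages as a single DMA entry and then attempts to unmap only the first one, which type1 rejects rather than splitting the entry.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

#define HUGEPAGE_SZ (2ULL << 20) /* assume 2 MB hugepages */

static int map_region(int container, void *vaddr, uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_map dma_map = {
		.argsz = sizeof(dma_map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)vaddr,
		.iova = iova,
		.size = size,
	};
	return ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
}

static int unmap_region(int container, uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_unmap dma_unmap = {
		.argsz = sizeof(dma_unmap),
		.iova = iova,
		.size = size,
	};
	return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &dma_unmap);
}

void demo(int container, void *buf, uint64_t iova)
{
	/* map two physically contiguous hugepages with a single DMA entry */
	if (map_region(container, buf, iova, 2 * HUGEPAGE_SZ) < 0)
		perror("VFIO_IOMMU_MAP_DMA");

	/*
	 * Unmapping only the first hugepage bisects the entry; type1 does
	 * not split mappings, so this is expected to fail (EINVAL on the
	 * kernels the commit targets) instead of shrinking the mapping.
	 */
	if (unmap_region(container, iova, HUGEPAGE_SZ) < 0)
		fprintf(stderr, "partial unmap rejected: %s\n", strerror(errno));
}
```

This is exactly why a heap that frees single pages out of a larger allocation cannot sit on top of merged mappings: the only unmap that succeeds is one that matches a prior map call.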
nithind1988 authored and cpaelzer committed May 11, 2021
1 parent 78bc278 commit b92fd56
Showing 2 changed files with 18 additions and 51 deletions.
10 changes: 10 additions & 0 deletions doc/guides/linux_gsg/linux_drivers.rst
@@ -72,6 +72,16 @@ Note that in order to use VFIO, your kernel must support it.
 VFIO kernel modules have been included in the Linux kernel since version 3.6.0 and are usually present by default,
 however please consult your distribution's documentation to make sure that is the case.
 
+The VFIO interface is used for DMA mapping of both external memory and
+hugepages. VFIO does not support partially unmapping a previously mapped
+region, so DPDK memory is mapped at hugepage or system-page granularity.
+The number of DMA mappings is limited by the kernel: for system/hugepage
+memory it is bounded by the locked-memory limit of the process (rlimit).
+Kernel 5.1 added another per-container limit, applicable to both external
+memory and system memory, defined by the VFIO module parameter
+``dma_entry_limit`` with a default value of 64K.
+When an application runs out of DMA entries, these limits need to be raised.
+
 Also, to use VFIO, both kernel and BIOS must support and be configured to use IO virtualization (such as Intel® VT-d).
 
 .. note::
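For illustration, the current limit can be inspected at runtime before deciding to raise it. This is a minimal sketch, assuming the vfio_iommu_type1 module is loaded and exports dma_entry_limit at the conventional sysfs path; it is not part of the patch. Raising the limit itself is done when loading the module (a dma_entry_limit=<n> module parameter).

```c
#include <stdio.h>

int main(void)
{
	/* assumed sysfs path; present once vfio_iommu_type1 is loaded
	 * on a kernel new enough (5.1+) to have the parameter */
	const char *path =
		"/sys/module/vfio_iommu_type1/parameters/dma_entry_limit";
	unsigned int limit;
	FILE *f = fopen(path, "r");

	if (f == NULL) {
		perror(path); /* module not loaded, or older kernel */
		return 1;
	}
	if (fscanf(f, "%u", &limit) != 1) {
		fprintf(stderr, "unexpected format in %s\n", path);
		fclose(f);
		return 1;
	}
	fclose(f);
	printf("per-container DMA entry limit: %u\n", limit);
	return 0;
}
```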
59 changes: 8 additions & 51 deletions lib/librte_eal/linux/eal/eal_vfio.c
@@ -514,11 +514,9 @@ static void
 vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 		void *arg __rte_unused)
 {
-	rte_iova_t iova_start, iova_expected;
 	struct rte_memseg_list *msl;
 	struct rte_memseg *ms;
 	size_t cur_len = 0;
-	uint64_t va_start;
 
 	msl = rte_mem_virt2memseg_list(addr);
 
@@ -547,63 +545,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 #endif
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
-
-	/*
-	 * This memory is not guaranteed to be contiguous, but it still could
-	 * be, or it could have some small contiguous chunks. Since the number
-	 * of VFIO mappings is limited, and VFIO appears to not concatenate
-	 * adjacent mappings, we have to do this ourselves.
-	 *
-	 * So, find contiguous chunks, then map them.
-	 */
-	va_start = ms->addr_64;
-	iova_start = iova_expected = ms->iova;
 	while (cur_len < len) {
-		bool new_contig_area = ms->iova != iova_expected;
-		bool last_seg = (len - cur_len) == ms->len;
-		bool skip_last = false;
-
-		/* only do mappings when current contiguous area ends */
-		if (new_contig_area) {
-			if (type == RTE_MEM_EVENT_ALLOC)
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 1);
-			else
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 0);
-			va_start = ms->addr_64;
-			iova_start = ms->iova;
-		}
 		/* some memory segments may have invalid IOVA */
 		if (ms->iova == RTE_BAD_IOVA) {
 			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
 					ms->addr);
-			skip_last = true;
 			goto next;
 		}
-		iova_expected = ms->iova + ms->len;
+		if (type == RTE_MEM_EVENT_ALLOC)
+			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+					ms->iova, ms->len, 1);
+		else
+			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+					ms->iova, ms->len, 0);
 next:
 		cur_len += ms->len;
 		++ms;
-
-		/*
-		 * don't count previous segment, and don't attempt to
-		 * dereference a potentially invalid pointer.
-		 */
-		if (skip_last && !last_seg) {
-			iova_expected = iova_start = ms->iova;
-			va_start = ms->addr_64;
-		} else if (!skip_last && last_seg) {
-			/* this is the last segment and we're not skipping */
-			if (type == RTE_MEM_EVENT_ALLOC)
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 1);
-			else
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 0);
-		}
 	}
 #ifdef RTE_ARCH_PPC_64
 	cur_len = 0;
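
For reference, the strategy the revert restores can be condensed into a self-contained sketch: one DMA entry per memseg, so any later unmap request matches an existing mapping exactly. struct memseg, BAD_IOVA and dma_map_one() below are simplified stand-ins, not the real EAL types.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BAD_IOVA UINT64_MAX

struct memseg {
	uint64_t addr; /* virtual address of the segment */
	uint64_t iova; /* IO virtual address, or BAD_IOVA */
	size_t len;    /* segment length (page granularity) */
};

/* stand-in for one VFIO map (do_map = 1) or unmap (do_map = 0) ioctl */
static int dma_map_one(uint64_t addr, uint64_t iova, size_t len, int do_map)
{
	printf("%s va=0x%" PRIx64 " iova=0x%" PRIx64 " len=%zu\n",
	       do_map ? "map" : "unmap", addr, iova, len);
	return 0;
}

static void mem_event(const struct memseg *ms, size_t len, int do_map)
{
	size_t cur_len = 0;

	/* memsegs covering the affected range are contiguous in the array */
	while (cur_len < len) {
		/* segments without a valid IOVA are simply skipped */
		if (ms->iova != BAD_IOVA)
			dma_map_one(ms->addr, ms->iova, ms->len, do_map);
		cur_len += ms->len;
		ms++;
	}
}

int main(void)
{
	struct memseg segs[] = {
		{ 0x100000000, 0x1000,    0x200000 },
		{ 0x100200000, BAD_IOVA,  0x200000 },
		{ 0x100400000, 0x400000,  0x200000 },
	};
	/* allocation spanning three memsegs: independent DMA entries, so
	 * freeing any one segment later unmaps exactly one entry */
	mem_event(segs, 3 * 0x200000, 1);
	return 0;
}
```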
