Skip to content

accel/amdxdna: Support BO usage query#1197

Merged
maxzhen merged 2 commits intoamd:mainfrom
maxzhen:bo
Mar 20, 2026
Merged

accel/amdxdna: Support BO usage query#1197
maxzhen merged 2 commits intoamd:mainfrom
maxzhen:bo

Conversation

@maxzhen
Copy link
Copy Markdown
Collaborator

@maxzhen maxzhen commented Mar 19, 2026

No description provided.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for querying per-process buffer-object (BO) memory usage via the amdxdna GET_ARRAY UAPI, and wires up driver-side accounting to report heap/total/internal usage.

Changes:

  • Introduces a new UAPI query (DRM_AMDXDNA_BO_USAGE) and struct (amdxdna_drm_bo_usage) for BO usage reporting.
  • Adds per-client BO usage counters and updates GEM open/close/heap alloc/free paths to maintain those counters.
  • Hooks the new query into the AIE2 get_array implementation and adjusts one shim test’s driver-version gating.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/shim_test/shim_test.cpp Updates the “create and free internal bo” test’s driver version gating.
include/uapi/drm/amdxdna_accel.h Adds UAPI struct for BO usage plus a new GET_ARRAY param ID and documentation.
drivers/accel/amdxdna/amdxdna_pci_drv.h Adds per-client usage counters used for BO usage aggregation.
drivers/accel/amdxdna/amdxdna_pci_drv.c Adjusts client cleanup ordering around mm_lock destruction and heap release.
drivers/accel/amdxdna/amdxdna_gem.h Extends GEM object state for usage tracking (open_ref, internal) and declares usage query helper.
drivers/accel/amdxdna/amdxdna_gem.c Implements accounting updates and the BO usage GET_ARRAY handler.
drivers/accel/amdxdna/aie2_pci.c Wires new BO usage query into aie2_get_array and includes heap usage in hwctx status.
Comments suppressed due to low confidence (1)

drivers/accel/amdxdna/amdxdna_gem.c:599

  • amdxdna_gem_obj_open() increments open_ref and may add BO usage before amdxdna_iommu_map_bo(). If mapping fails, the function returns without rolling back open_ref/client attachment and without undoing the usage update, leaving inconsistent state and leaked accounting. Roll back the refcount/usage (and client pointer if this was the first open) on error before returning.
	guard(mutex)(&abo->lock);
	abo->open_ref++;

	if (abo->open_ref == 1) {
		/* Attached to the client when first opened by it. */
		abo->client = filp->driver_priv;
		amdxdna_gem_add_bo_usage(abo);
	}
	if (amdxdna_iova_on(xdna)) {
		ret = amdxdna_iommu_map_bo(xdna, abo);
		if (ret)
			return ret;
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1069 to +1076
if (args->num_element != 1)
return -EINVAL;

if (copy_from_user(&tmp, buf, min_sz))
return -EFAULT;

if (!tmp.pid)
return -EINVAL;
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amdxdna_drm_get_bo_usage() requires args->num_element == 1 and doesn't update args->element_size/num_element on return, unlike other DRM_AMDXDNA_GET_ARRAY handlers (e.g., async error sets both). Consider accepting num_element >= 1, setting args->num_element = 1, and setting args->element_size to the actual element size returned. Also validate element_size is at least sizeof(pid) before reading tmp.pid, otherwise tmp.pid may be partially uninitialized when element_size < sizeof(__s64).

Copilot uses AI. Check for mistakes.
* Returns last async error.
*
* %DRM_AMDXDNA_BO_USAGE:
* Returns usage of heap/internal/external BOs.
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation says "%DRM_AMDXDNA_BO_USAGE: Returns usage of heap/internal/external BOs", but struct amdxdna_drm_bo_usage doesn't include an explicit external_usage field. Either add the missing field or adjust the wording to clarify that external usage is derived (e.g., total_usage - internal_usage) rather than returned directly.

Suggested change
* Returns usage of heap/internal/external BOs.
* Returns usage of heap and internal BOs. External usage can be
* derived from the returned data (for example, total_usage -
* internal_usage).

Copilot uses AI. Check for mistakes.
Comment thread include/uapi/drm/amdxdna_accel.h Outdated
* AMDXDNA_BO_CMD - is counted for internal.
* AMDXDNA_BO_DEV - is counted by heap_usage only, not internal
* or external. It does not add to the total memory
* foot print since its mem comes from heap which is
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in UAPI comment: "foot print" should be "footprint".

Suggested change
* foot print since its mem comes from heap which is
* footprint since its mem comes from heap which is

Copilot uses AI. Check for mistakes.
Comment on lines +543 to +553
static void
amdxdna_gem_del_bo_usage(struct amdxdna_gem_obj *abo)
{
struct amdxdna_client *client = abo->client;

guard(mutex)(&client->mm_lock);

client->total_bo_usage -= abo->mem.size;
if (abo->internal)
client->total_int_bo_usage -= abo->mem.size;
}
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amdxdna_gem_del_bo_usage() unconditionally subtracts abo->mem.size, but amdxdna_gem_add_bo_usage() skips imported BOs (and AMDXDNA_BO_DEV). For imported BOs this will underflow total_bo_usage/total_int_bo_usage because they were never incremented. Make del follow the same exclusion rules as add (and/or only call del if the BO was actually accounted).

Copilot uses AI. Check for mistakes.
Comment thread drivers/accel/amdxdna/amdxdna_gem.c Outdated
mutex_unlock(&tmp_client->mm_lock);
}

return copy_to_user(buf, &tmp, min_sz);
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amdxdna_drm_get_bo_usage() returns copy_to_user()'s raw return value. copy_to_user returns the number of bytes not copied (>=0), so partial copies would surface as a positive ioctl return instead of -EFAULT. Convert nonzero copy_to_user() to -EFAULT and return 0 on success (like other ioctls here).

Suggested change
return copy_to_user(buf, &tmp, min_sz);
if (copy_to_user(buf, &tmp, min_sz))
return -EFAULT;
return 0;

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 595 to +623
@@ -562,6 +609,20 @@ static int amdxdna_gem_obj_open(struct drm_gem_object *gobj, struct drm_file *fi
return 0;
}

static void amdxdna_gem_obj_close(struct drm_gem_object *gobj, struct drm_file *filp)
{
struct amdxdna_gem_obj *abo = to_xdna_obj(gobj);

guard(mutex)(&abo->lock);
abo->open_ref--;

if (abo->open_ref == 0) {
amdxdna_gem_del_bo_usage(abo);
/* Detach from the client when last closed by it. */
abo->client = NULL;
}
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BO usage accounting is tied to GEM handle open/close (open_ref) and is decremented when the last handle is closed. A GEM object can remain alive without any handles (e.g., after PRIME export where the dma-buf keeps the object referenced), which will cause total_bo_usage/total_int_bo_usage to under-report while memory is still allocated. Consider accounting based on object lifetime (create/free) or otherwise handling PRIME-exported references so usage stays accurate until the GEM object is actually freed.

Copilot uses AI. Check for mistakes.
Comment on lines 870 to +873
tmp->command_submissions = hwctx->priv->seq;
tmp->command_completions = hwctx->priv->completed;
tmp->pasid = hwctx->client->pasid;
tmp->heap_usage = hwctx->client->heap_usage;
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp->heap_usage = hwctx->client->heap_usage; reads heap_usage without holding client->mm_lock, while updates happen under that mutex. This can race (and can be a torn read on 32-bit). Consider taking mm_lock around the read or using READ_ONCE() with corresponding WRITE_ONCE()/atomic updates.

Copilot uses AI. Check for mistakes.
Comment on lines 40 to +46
* 0.5: Support getting telemetry data
* 0.6: Support preemption
* 0.7: Support getting power and utilization data
* 0.8: Support BO usage query
*/
#define AMDXDNA_DRIVER_MAJOR 0
#define AMDXDNA_DRIVER_MINOR 7
#define AMDXDNA_DRIVER_MINOR 8
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driver version history comment lists BO usage query as "0.7", but the minor version is being bumped to 8. Update the history entry to "0.8" to match the actual version change.

Copilot uses AI. Check for mistakes.
TEST_POSITIVE, dev_filter_is_aie4_or_npu4, TEST_io_gemm, {}
},
test_case{ "create and free internal bo", {~0U, ~0U},
test_case{ "create and free internal bo", {},
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case’s drv_ver was changed from {~0U, ~0U} to {} (0.0), which makes it treated as expected-to-pass for any driver version when -k is used. Since this PR bumps the driver minor to 0.8 for BO usage support, consider setting drv_ver to {0, 8} so older drivers are still treated as not expected to pass.

Suggested change
test_case{ "create and free internal bo", {},
test_case{ "create and free internal bo", {0, 8},

Copilot uses AI. Check for mistakes.
Comment on lines +599 to +622
/**
* struct amdxdna_drm_bo_usage - all types of BO usage
* BOs managed by XRT/SHIM/driver is counted as internal.
* Others are counted as external which are managed by applications.
*
* Among all types of BOs:
* AMDXDNA_BO_DEV_HEAP - is counted for internal.
* AMDXDNA_BO_SHARE - is counted for external.
* AMDXDNA_BO_CMD - is counted for internal.
* AMDXDNA_BO_DEV - is counted by heap_usage only, not internal
* or external. It does not add to the total memory
* footprint since its mem comes from heap which is
* already counted as internal.
*/
struct amdxdna_drm_bo_usage {
/** @pid: The ID of the process to query from. */
__s64 pid;
/** @total_usage: Total BO size used by process. */
__u64 total_usage;
/** @internal_usage: Total internal BO size used by process. */
__u64 internal_usage;
/** @heap_usage: Total device BO size used by process. */
__u64 heap_usage;
};
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UAPI comment says external BOs are tracked and that %DRM_AMDXDNA_BO_USAGE returns usage of heap/internal/external BOs, but struct amdxdna_drm_bo_usage doesn’t include an external_usage field. Either clarify in the comment that external usage is derivable (e.g., total_usage - internal_usage), or add an explicit field so the API matches the documentation. Also, minor grammar nit: "BOs ... is counted" -> "are counted".

Copilot uses AI. Check for mistakes.
maxzhen added 2 commits March 19, 2026 19:42
Add support for querying per-process buffer object (BO) memory
usage through the amdxdna GET_ARRAY UAPI.

Introduce a new query type, DRM_AMDXDNA_BO_USAGE, along with
struct amdxdna_drm_bo_usage to report BO memory usage statistics,
including heap, total, and internal usage.

Track BO memory usage on a per-client basis by maintaining counters
in GEM open/close and heap allocation/free paths. This ensures the
reported statistics reflect the current memory footprint of each
process.

Wire the new query into the GET_ARRAY implementation to expose
the usage information to userspace.

Signed-off-by: Max Zhen <max.zhen@amd.com>
Signed-off-by: Max Zhen <max.zhen@amd.com>
Copilot AI review requested due to automatic review settings March 20, 2026 02:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1081 to +1083
if (args->num_element != 1)
return -EINVAL;

Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET_ARRAY ioctls treat element_size/num_element as in/out and typically return the actual values (e.g., other handlers set args->num_element and args->element_size). amdxdna_drm_get_bo_usage() currently leaves args->{element_size,num_element} unchanged; please set them to the returned element size (sizeof(struct amdxdna_drm_bo_usage)) and 1.

Copilot uses AI. Check for mistakes.
Comment on lines +605 to 609
amdxdna_gem_add_bo_usage(abo);
}
if (amdxdna_iova_on(xdna)) {
ret = amdxdna_iommu_map_bo(xdna, abo);
if (ret)
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amdxdna_gem_obj_open() updates BO usage/open_ref before the IOMMU map, but returns immediately on amdxdna_iommu_map_bo() failure. That can leave usage counters/open_ref/client attachment in an inconsistent state if the open/handle-create path fails. Please add rollback on the error path or reorder so accounting happens only after the map succeeds.

Copilot uses AI. Check for mistakes.
char __user *buf = u64_to_user_ptr(args->buffer);
struct amdxdna_dev *xdna = to_xdna_dev(dev);
struct amdxdna_client *tmp_client;
struct amdxdna_drm_bo_usage tmp;
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp is an uninitialized stack struct and min(args->element_size, sizeof(...)) allows userspace to provide an element_size smaller than sizeof(tmp.pid). In that case copy_from_user() won’t fully initialize tmp.pid, and the later pid checks/comparisons can use uninitialized bytes. Please zero-init tmp and/or require args->element_size >= sizeof(tmp.pid) before reading pid.

Suggested change
struct amdxdna_drm_bo_usage tmp;
struct amdxdna_drm_bo_usage tmp = { 0 };

Copilot uses AI. Check for mistakes.
@maxzhen
Copy link
Copy Markdown
Collaborator Author

maxzhen commented Mar 20, 2026

retest this please

@maxzhen
Copy link
Copy Markdown
Collaborator Author

maxzhen commented Mar 20, 2026

retest this please

@maxzhen maxzhen merged commit ea62ff8 into amd:main Mar 20, 2026
5 checks passed
@maxzhen maxzhen deleted the bo branch March 23, 2026 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants