@maxzhen maxzhen commented Sep 29, 2025

In a virtio environment, the root user in the VM is not allowed to set global configuration on the device, so the test cases cannot enable force preemption. These test cases can only be run on the host.
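A minimal sketch of this kind of gating, written in Python for brevity. The names `ShimCase`, `host_only`, and `select_tests` are hypothetical, and how the real test harness detects a virtio guest is not shown:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ShimCase:
    name: str
    run: Callable[[], None]
    # Hypothetical flag: the case needs global device configuration,
    # which a VM root user is not allowed to change.
    host_only: bool = False


def select_tests(cases: List[ShimCase], in_virtio_guest: bool) -> List[ShimCase]:
    """Drop host-only cases when running inside a virtio guest."""
    if not in_virtio_guest:
        return list(cases)
    return [c for c in cases if not c.host_only]
```

On the host every case is kept; inside a guest only the cases that do not need global configuration remain.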

Signed-off-by: Max Zhen <max.zhen@amd.com>
@maxzhen maxzhen requested a review from NishadSaraf September 29, 2025 22:52
@maxzhen maxzhen merged commit 7c0fc72 into amd:main Sep 29, 2025
1 check passed
@maxzhen maxzhen deleted the preempt branch October 1, 2025 22:45
xdavidz added a commit that referenced this pull request Oct 9, 2025
* Unify DPT (Debug/Profile/Trace) firmware debug across generations (#733)

- Introduce a single DPT-based infrastructure for firmware
  debug/profile/trace across current and future devices. Remove legacy
  event-trace/DRAM logging to cut redundancy.
- Improve ring handling by extending 32-bit FW pointers to 64-bit
  in-driver, making wrap and tail tracking robust and transparent.
- Enable firmware logging by default at ERROR level. Simplify
  usage via a boolean debugfs node (dump_fw_log) to toggle printing
  to dmesg.
- Provide module params for advanced control (fw_log_level, fw_log_size,
  poll_fw_log).
- Establish common management-DMA helpers for buffer allocation/handling.
  More consolidation to follow.
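
The pointer extension mentioned in the list above can be sketched as follows. This is an illustration only, assuming the firmware exposes a free-running 32-bit write pointer and that fewer than 2**32 bytes are written between two samples; `Extended64` and `ring_offset` are hypothetical names, not the driver's:

```python
MASK32 = 0xFFFFFFFF


class Extended64:
    """Track a free-running 32-bit counter as a monotonically growing value."""

    def __init__(self) -> None:
        self.value = 0  # 64-bit view of the firmware pointer

    def update(self, raw32: int) -> int:
        # Distance moved since the last sample, modulo 2**32. This stays
        # correct across a 32-bit wrap as long as the assumption above holds.
        delta = (raw32 - self.value) & MASK32
        self.value += delta
        return self.value


def ring_offset(ptr64: int, ring_size: int) -> int:
    """Map the extended pointer back to an offset inside the ring buffer."""
    return ptr64 % ring_size
```

Because the extended value never wraps in practice, the amount of unread data is simply the difference between the extended head and tail.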

Signed-off-by: Nishad Saraf <nishads@amd.com>

* Fix coverity use after free (#744)

Fix coverity use after free.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* shim changes for dump log pr (#738)


Signed-off-by: advanaik <advanaik@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Co-authored-by: advanaik <advanaik@amd.com>

* VTD submodule removal (#745)

* VTD submod removal

Signed-off-by: Akshay Tondak <aktondak@amd.com>

* subdir removal

Signed-off-by: Akshay Tondak <aktondak@amd.com>

---------

Signed-off-by: Akshay Tondak <aktondak@amd.com>

* Fix hangs while querying HW context report (#741)

Fix hangs while querying HW context report on platforms that do not
support app health.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* XRT, VTD submodule update and shim changes (#737)

* XRT, VTD submodule update and shim changes

Signed-off-by: Akshay Tondak <aktondak@amd.com>

* VTD update

Signed-off-by: Akshay Tondak <aktondak@amd.com>

* use last firmware for verbose in-memory log (#739)

Signed-off-by: David Zhang <yidong.zhang@amd.com>

* Report addition

Signed-off-by: Akshay Tondak <aktondak@amd.com>

---------

Signed-off-by: Akshay Tondak <aktondak@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Signed-off-by: Nishad Saraf <nishads@amd.com>
Signed-off-by: advanaik <advanaik@amd.com>
Co-authored-by: David Zhang <50243230+xdavidz@users.noreply.github.com>
Co-authored-by: Nishad Saraf <nishads@amd.com>
Co-authored-by: advanaik <advanaik@amd.com>

* fix ubuf failure when iommu_mode=1 (#742)

Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>

* Validate changes required for telluride (#736)

Signed-off-by: Manoj Takasi <mtakasi@amd.com>

* Added proper cleanup function for imported bos (#743)

Signed-off-by: Manoj Takasi <mtakasi@amd.com>

* General housekeeping (#749)

General housekeeping.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* App health test wait forever and expect TDR (#751)

App health test wait forever and expect TDR.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* Make FW log parser device specific (#752)

FW log buffer format may vary based on the device generation. Make the
parser logic device specific.
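
One way to picture a device-specific parser is a lookup table keyed by device generation. The sketch below is hypothetical: the generation names and both buffer formats are invented for illustration and do not describe the real firmware log layout:

```python
from typing import Callable, Dict, List

Parser = Callable[[bytes], List[str]]


def parse_newline_text(buf: bytes) -> List[str]:
    # Hypothetical format A: newline-separated ASCII messages.
    text = buf.decode("ascii", errors="replace")
    return [line for line in text.split("\n") if line]


def parse_length_prefixed(buf: bytes) -> List[str]:
    # Hypothetical format B: one length byte followed by the payload;
    # a zero length byte ends the log.
    out: List[str] = []
    i = 0
    while i < len(buf):
        n = buf[i]
        if n == 0:
            break
        out.append(buf[i + 1:i + 1 + n].decode("ascii", errors="replace"))
        i += 1 + n
    return out


# Hypothetical generation names mapped to their parser.
PARSERS: Dict[str, Parser] = {
    "gen_a": parse_newline_text,
    "gen_b": parse_length_prefixed,
}


def parse_fw_log(device: str, buf: bytes) -> List[str]:
    try:
        parser = PARSERS[device]
    except KeyError:
        raise ValueError(f"no FW log parser for device {device!r}") from None
    return parser(buf)
```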

Signed-off-by: Nishad Saraf <nishads@amd.com>

* add dbg bo sync (#753)

Signed-off-by: Max Zhen <max.zhen@amd.com>

* Added bo export support in VE2 (#757)

Signed-off-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>
Co-authored-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>

* Update XRT package to 202520.2.20.152 (#756)

* Add <iostream> header when it is required

Due to the XRT package update, the <iostream> include was removed from
some XRT header files. We should not rely on XRT headers to include
<iostream>; wherever it is required, include it explicitly.

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* xrt: update xrt version to 202520.2.20.152

Update XRT package version to 202520.2.20.152

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

---------

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* Add xrt_test option for vf concurrency test (#732)

Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>

* XDNA driver cache last async error and provide ioctl to enable user to get the async error (#740)

* amdxdna: aie2_smu: remove busy wait on SMU_RESP_REG before submitting commands

aie2_smu_exec() is the only function in the XDNA driver that submits SMU
commands, and it waits until each command has finished. It also holds a
lock around the SMU register accesses, and the NPU SMU is used only by
xdna. It is therefore unnecessary to poll SMU_RESP_REG until it is cleared
before committing a new SMU command, so remove that busy wait.
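
The reasoning can be modelled with a small sketch. It is not the driver code: `FakeSmuRegs`, the register fields, and the exact write sequence are assumptions made for illustration. The point is that a single entry path which holds a lock and returns only after completion never sees a command in flight on entry:

```python
import threading


class FakeSmuRegs:
    """Hypothetical stand-in for the SMU mailbox registers."""

    def __init__(self) -> None:
        self.cmd = 0
        self.arg = 0
        self.resp = 0

    def ring_doorbell(self) -> None:
        # The fake device completes every command immediately.
        self.resp = 1


class Smu:
    def __init__(self, regs: FakeSmuRegs) -> None:
        self._regs = regs
        self._lock = threading.Lock()

    def exec(self, cmd: int, arg: int, max_polls: int = 1000) -> int:
        with self._lock:
            # No poll before submitting: the previous exec() returned only
            # after its response arrived, and nobody else uses the mailbox.
            self._regs.resp = 0
            self._regs.arg = arg
            self._regs.cmd = cmd
            self._regs.ring_doorbell()
            # Wait for the response to this command only.
            for _ in range(max_polls):
                if self._regs.resp != 0:
                    return self._regs.resp
            raise TimeoutError("SMU command timed out")
```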

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* xdna: driver: cache last async error and add ioctl to get last async error

Cache the last async error received from the device, and implement an
ioctl that returns it with the encoded error code defined in the XRT
layer and the timestamp, in microseconds, of when the driver received
the event.
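
A rough model of such a cache, assuming a single slot protected by a lock. `AsyncError`, `LastErrorCache`, and the field names are hypothetical; the real encoding of the error code is defined in the XRT layer and is not reproduced here:

```python
import threading
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AsyncError:
    err_code: int  # encoded error code (encoding not modelled here)
    ts_us: int     # when the driver received the event, in microseconds


class LastErrorCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last: Optional[AsyncError] = None

    def record(self, err_code: int, ts_us: Optional[int] = None) -> None:
        """Called from the (hypothetical) async event handler."""
        if ts_us is None:
            ts_us = time.monotonic_ns() // 1000
        with self._lock:
            # Only the most recent error is kept; older ones are overwritten.
            self._last = AsyncError(err_code, ts_us)

    def get_last(self) -> Optional[AsyncError]:
        """Called from the query path; None means nothing is cached."""
        with self._lock:  # released on every return path
            return self._last
```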

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* shim: implement xocl_errors query to get last async error

Implement xocl_errors query to get the last async error from device through
XDNA driver ioctl.

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* test: shim: add async error verification

Add async error verification that tests an async event generated
by hardware and uses the get array ioctl to get the last async
error.

Because phx firmware behaves differently in when it clears async
errors, limit the test to npu4.

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

---------

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* Change print to debug level (#758)

Change print to debug level.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* driver: amdxdna: error: get last error add missing unlock (#760)

Add missing unlock when there is no cached error.

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* moving to latest xrt for import/export bo issues across xdna & zocl (#761)

Co-authored-by: Ch Vamshi Krishna <chvamshi@xcochvamshi40x.xlnx.xilinx.com>

* Create support for FLR (#746)

Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>

* switch to latest umq version (#759)

Signed-off-by: David Zhang <yidong.zhang@amd.com>

* Updated bo allocation with AMDXDNA_BO_SHARE type (#763)

Signed-off-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>
Co-authored-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>

* Fix return value for aie-partitions query (#766)

Fix return value for aie-partitions query.


(cherry picked from commit 86e76e6)

Signed-off-by: Nishad Saraf <nishads@amd.com>

* create cma bo with carvedout fashion for ve2 (#748)

Signed-off-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>
Co-authored-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>

* [XRT-SMI] Archive migration (#764)

Signed-off-by: Akshay Tondak <aktondak@amd.com>

* fix uninitialized variable (#769)

Signed-off-by: Max Zhen <max.zhen@amd.com>

* Removing redundant files (#768)

Signed-off-by: Akshay Tondak <aktondak@amd.com>

* Remove device specific mgmt buffer APIs (#771)

Remove device specific mgmt buffer APIs.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* test: shim: add read async error multi times in multi threads (#772)

Add test case to read async errors multiple times in multiple
threads.

To test reading async errors when there are no errors, we need
to run the test after the xdna module is probed and before any runs
are launched on the hardware.
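
The shape of such a test might look like the sketch below. It does not talk to the driver: `read_last_error` is a hypothetical stand-in for the real query, and the module-level state models the "no error yet" condition right after probe:

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

_lock = threading.Lock()
_last_error: Optional[int] = None  # nothing cached, as right after module probe


def read_last_error() -> Optional[int]:
    """Hypothetical stand-in for the driver query."""
    with _lock:
        return _last_error


def read_many(threads: int, reads_per_thread: int) -> List[Optional[int]]:
    """Issue the query repeatedly from several threads and collect results."""

    def worker(_index: int) -> List[Optional[int]]:
        return [read_last_error() for _ in range(reads_per_thread)]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        chunks = list(pool.map(worker, range(threads)))
    return [r for chunk in chunks for r in chunk]
```

With no error cached, every read from every thread should report "no error" without blocking or crashing.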

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* Updated xdna_bo.cpp to maintain bo ref count if we create in the same process (#774)

Signed-off-by: Manoj Takasi <mtakasi@amd.com>

* Fix uninitialized app health report pointer (#775)

Fix uninitialized app health report pointer.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* Free buffer on failure and minor fixes (#776)

Free buffer on failure and minor fixes.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* test: shim: add instruction code invalid address access test (#773)

Add a test case in which instruction code accesses an invalid address.
We should expect a timeout; if we then start a good run, the good run
should finish properly.
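
The flow of this test can be sketched as below. `FakeContext` is a hypothetical stand-in for a hardware context, and signalling the timeout with `TimeoutError` is an assumption of the sketch rather than the shim's actual behaviour:

```python
class FakeContext:
    """Hypothetical stand-in for a hardware context."""

    def run(self, instr_addr_valid: bool, timeout_s: float) -> str:
        if not instr_addr_valid:
            # Model the bad run: the command never completes.
            raise TimeoutError("command did not complete")
        return "completed"


def bad_then_good(ctx: FakeContext) -> str:
    # Step 1: the run with an invalid address must time out.
    try:
        ctx.run(instr_addr_valid=False, timeout_s=1.0)
    except TimeoutError:
        pass
    else:
        raise AssertionError("bad run unexpectedly completed")
    # Step 2: a following good run must still finish properly.
    return ctx.run(instr_addr_valid=True, timeout_s=1.0)
```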

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* Telluride opensrc (#770)

* Adding temporal_sharing code and other latest changes into xdna repo

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Fixed the review comments

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Fixed the review comments

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Fixing the coding style issue

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Fixed code style issue for ve2_mgmt.c file

Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Fixed code style issue for ve2_mgmt.c file 1

Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Fixed code style issue for ve2_hwctx.c file v1

Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Fixed code style issue for ve2  file v1

Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Fixed code style issue for ve2  file v1

Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Fixed code style issue for ve2  file v2

Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Fixed code style issue for ve2  file v3

Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Fixed coding style issues

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Fixed issues after amdxdna_cma memory allocation introduced

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Fixed review comments

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Fixed one coding style issue

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Just a dummy change in amdxdna files to force the driver to build again

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Remove volatile keyword from ve2 driver

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

* Removed another codingsty_check

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>

---------

Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>
Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>

* Properly support exporting and importing BO in the same process (#778)

Signed-off-by: Max Zhen <max.zhen@amd.com>

* fixing qos property in ve2 (#779)

Co-authored-by: Ch Vamshi Krishna <chvamshi@xcochvamshi40x.xlnx.xilinx.com>

* disable force preemption test in virtio environment (#782)

Signed-off-by: Max Zhen <max.zhen@amd.com>

* Add debug prints and fix typo (#784)

Add debug prints and fix typo.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* Update get async error ioctl to return only 0 for success case (#783)

* amdxdna: get async error returns only 0 for success

This patch makes the following changes:
* return 0 only when getting the async error succeeds
* restructure the aie2 get array ioctl implementation to split the
  get async error information implementation from the get other
  hardware context information implementation
* remove the aie2_error get async error implementation, as it is just
  a very thin wrapper that is not necessary.
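
The return convention in the first bullet can be illustrated with a small model. This is a sketch under assumptions: the function name, the tuple return, and treating "nothing cached" as success with an empty result are all hypothetical and not taken from the driver:

```python
import errno
from typing import List, Optional, Tuple


def get_last_async_error(cached: Optional[dict],
                         num_element: int) -> Tuple[int, List[dict]]:
    """Return (ret, elements): ret is 0 on success, a negative errno otherwise."""
    if num_element < 1:
        return -errno.EINVAL, []
    if cached is None:
        # Assumption: no cached error is still a successful query.
        return 0, []
    # The element count travels in the result, never in the return code.
    return 0, [cached]
```

A caller can then test `ret == 0` for success instead of treating any non-negative value as success.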

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* shim: specify only one element to get async error information

The information array for getting the last async error holds only one
element, the last error, so change the number of elements to 1.

---------

Signed-off-by: Wendy Liang <wendy.liang@amd.com>

* Add debugfs node to dump raw fw log buffer (#785)

Add debugfs node to dump raw fw log buffer.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* Fix NULL pointer dereference for invalid sequence number (#789)

Fix NULL pointer dereference for invalid sequence number.

Signed-off-by: Nishad Saraf <nishads@amd.com>

* Fix SMU power off issue (#787)

* Fix SMU power off issue

* Fix SMU power off issue

* Fix SMU power off issue

* Revert "Fix SMU power off issue"

This reverts commit af1867a.

* Fix SMU power off issue

* update testcase timeout value and check (#777)

* timeout value change; compare and dump ofm after timeout

Signed-off-by: advanaik <advanaik@amd.com>

* dump ofm to file after timeout

Signed-off-by: advanaik <advanaik@amd.com>

* yolov3 host code

Signed-off-by: advanaik <advanaik@amd.com>

---------

Signed-off-by: advanaik <advanaik@amd.com>

* dophine pass

Signed-off-by: David Zhang <yidong.zhang@amd.com>

* pasid fix

Signed-off-by: David Zhang <yidong.zhang@amd.com>

---------

Signed-off-by: Nishad Saraf <nishads@amd.com>
Signed-off-by: advanaik <advanaik@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Signed-off-by: Akshay Tondak <aktondak@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Manoj Takasi <mtakasi@amd.com>
Signed-off-by: Max Zhen <max.zhen@amd.com>
Signed-off-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>
Signed-off-by: Wendy Liang <wendy.liang@amd.com>
Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>
Signed-off-by: Saifuddin Kaijar <saifuddin.kaijar@amd.com>
Signed-off-by: Kaijar, Saifuddin <saifuddin.kaijar@amd.com>
Co-authored-by: Nishad Saraf <nishads@amd.com>
Co-authored-by: advanaik <advanaik@amd.com>
Co-authored-by: Akshay Tondak <aktondak@amd.com>
Co-authored-by: Lizhi Hou <36547078+houlz0507@users.noreply.github.com>
Co-authored-by: Manoj Takasi <133196374+ManojTakasi@users.noreply.github.com>
Co-authored-by: Max Zhen <40219623+maxzhen@users.noreply.github.com>
Co-authored-by: Bikash Singha <138746529+bisingha-xilinx@users.noreply.github.com>
Co-authored-by: Bikash Singha <bisingha@xcobisingha40x.xlnx.xilinx.com>
Co-authored-by: Wendy Liang <wendy.liang@amd.com>
Co-authored-by: Hayden Laccabue <hlaccabu@amd.com>
Co-authored-by: Ch Vamshi Krishna <40261882+chvamshi-xilinx@users.noreply.github.com>
Co-authored-by: Ch Vamshi Krishna <chvamshi@xcochvamshi40x.xlnx.xilinx.com>
Co-authored-by: Saifuddin Kaijar <54270703+saifuddin-xilinx@users.noreply.github.com>
Co-authored-by: amd-kirkirov <kirkirov@amd.com>
Co-authored-by: AdvaitNaik <65186453+AdvaitNaik@users.noreply.github.com>