Skip to content

Commit

Permalink
Merge tag 'misc-habanalabs-next-2021-08-19' of https://git.kernel.org…
Browse files Browse the repository at this point in the history
…/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v5.15:

- Add a new uAPI (under the memory ioctl) to request from the driver
  to export a DMA-BUF object that represents a memory region on
  the device's DRAM. This is needed to enable peer-to-peer over PCIe
  between habana device and an RDMA adapter (e.g. mlnx5 adapter).

- Add a new uAPI (under the cs ioctl) to enable to user to reserve
  signals and signal them from within its workloads, while the driver
  performs the waiting. This allows finer granularity of pipelining
  between the different engines and resource utilization.

- Add a new uAPI (under the wait_for_cs ioctl) to allow waiting
  on multiple command submissions (workloads) at the same time. This
  is an optimization for the user process so it won't need to call
  multiple times to the wait_for_cs ioctl.

- Add new feature of "state dump", which can be triggered through new
  debugfs node. This is a similar concept to the kernel panic dump.
  This new mechanism retrieves information from the device in case
  one of the workloads that was sent by the user got stuck. This is
  very helpful for debugging the hang.

- Add a new debugfs node to perform lookup of user pointers that are
  mapped to habana device's pmmu.

- Fix to the tracking of user process when running inside a container.

- Allow user to map more than 4GB of memory to the device MMU in single
  IOCTL call.

- Minimize number of register reads done in GAUDI during user operation.

- Allow user to retrieve the device's server type that the device is
  connected to.

- Several fixes to the code of waiting on interrupts on behalf of the
  user.

- Fixes and improvements to the hint mechanism in our VA allocation.

- Update the firmware header files to the latest version while
  maintaining backward compatibility with older firmware versions.

- Multiple fixes to various bugs.

* tag 'misc-habanalabs-next-2021-08-19' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (55 commits)
  habanalabs/gaudi: invalidate PMMU mem cache on init
  habanalabs/gaudi: size should be printed in decimal
  habanalabs/gaudi: define DC POWER for secured PMC
  habanalabs/gaudi: unmask out of bounds SLM access interrupt
  habanalabs: add userptr_lookup node in debugfs
  habanalabs/gaudi: fetch TPC/MME ECC errors from F/W
  habanalabs: modify multi-CS to wait on stream masters
  habanalabs/gaudi: add monitored SOBs to state dump
  habanalabs/gaudi: restore user registers when context opens
  habanalabs/gaudi: increase boot fit timeout
  habanalabs: update to latest firmware headers
  habanalabs/gaudi: minimize number of register reads
  habanalabs: fix mmu node address resolution in debugfs
  habanalabs: save pid per userptr
  habanalabs/gaudi: move scrubbing to late init
  habanalabs/gaudi: scrub HBM to a specific value
  habanalabs: add validity check for event ID received from F/W
  habanalabs: clear msg_to_cpu_reg to avoid misread after reset
  habanalabs: make set_pci_regions asic function
  habanalabs: add support for dma-buf exporter
  ...
  • Loading branch information
gregkh committed Aug 19, 2021
2 parents b215918 + a3f369d commit 1bd35e7
Show file tree
Hide file tree
Showing 27 changed files with 4,370 additions and 759 deletions.
19 changes: 19 additions & 0 deletions Documentation/ABI/testing/debugfs-driver-habanalabs
Expand Up @@ -215,6 +215,17 @@ Description: Sets the skip reset on timeout option for the device. Value of
"0" means device will be reset in case some CS has timed out,
otherwise it will not be reset.

What: /sys/kernel/debug/habanalabs/hl<n>/state_dump
Date: Oct 2021
KernelVersion: 5.15
Contact: ynudelman@habana.ai
Description: Gets the state dump occurring on a CS timeout or failure.
State dump is used for debug and is created each time in case of
a problem in a CS execution, before reset.
Reading from the node returns the newest state dump available.
Writing an integer X discards X state dumps, so that the
next read would return X+1-st newest state dump.

What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
Date: Mar 2020
KernelVersion: 5.6
Expand All @@ -230,6 +241,14 @@ Description: Displays a list with information about the currently user
pointers (user virtual addresses) that are pinned and mapped
to DMA addresses

What: /sys/kernel/debug/habanalabs/hl<n>/userptr_lookup
Date: Aug 2021
KernelVersion: 5.15
Contact: ogabbay@kernel.org
Description: Allows to search for specific user pointers (user virtual
addresses) that are pinned and mapped to DMA addresses, and see
their resolution to the specific dma address.

What: /sys/kernel/debug/habanalabs/hl<n>/vm
Date: Jan 2019
KernelVersion: 5.1
Expand Down
1 change: 1 addition & 0 deletions drivers/misc/habanalabs/Kconfig
Expand Up @@ -8,6 +8,7 @@ config HABANA_AI
depends on PCI && HAS_IOMEM
select GENERIC_ALLOCATOR
select HWMON
select DMA_SHARED_BUFFER
help
Enables PCIe card driver for Habana's AI Processors (AIP) that are
designed to accelerate Deep Learning inference and training workloads.
Expand Down
3 changes: 2 additions & 1 deletion drivers/misc/habanalabs/common/Makefile
Expand Up @@ -10,4 +10,5 @@ HL_COMMON_FILES := common/habanalabs_drv.o common/device.o common/context.o \
common/asid.o common/habanalabs_ioctl.o \
common/command_buffer.o common/hw_queue.o common/irq.o \
common/sysfs.o common/hwmon.o common/memory.o \
common/command_submission.o common/firmware_if.o
common/command_submission.o common/firmware_if.o \
common/state_dump.o
2 changes: 1 addition & 1 deletion drivers/misc/habanalabs/common/command_buffer.c
Expand Up @@ -552,7 +552,7 @@ int hl_cb_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma)

vma->vm_private_data = cb;

rc = hdev->asic_funcs->cb_mmap(hdev, vma, cb->kernel_address,
rc = hdev->asic_funcs->mmap(hdev, vma, cb->kernel_address,
cb->bus_address, cb->size);
if (rc) {
spin_lock(&cb->lock);
Expand Down

0 comments on commit 1bd35e7

Please sign in to comment.