Skip to content

fixup! drm/asahi: Add the Asahi driver for Apple AGX GPUs#135

Closed
jannau wants to merge 562 commits intoAsahiLinux:gpu/rust-wipfrom
jannau:gpu-gem-bind-log-fix
Closed

fixup! drm/asahi: Add the Asahi driver for Apple AGX GPUs#135
jannau wants to merge 562 commits intoAsahiLinux:gpu/rust-wipfrom
jannau:gpu-gem-bind-log-fix

Conversation

@jannau
Copy link
Copy Markdown
Member

@jannau jannau commented Apr 11, 2023

No description provided.

ojeda and others added 30 commits March 5, 2023 23:16
Upgrade the Rust version from 1.62.0 to 1.66.0.

The overwhelming majority of the commit is about upgrading our `alloc`
fork to the new version from upstream [1]. License has not changed [2][3]
(there were changes in the `COPYRIGHT` file, but unrelated to `alloc`).

As in the previous version upgrades (done out of tree so far), upgrading
`alloc` requires checking that our small additions (`try_*`) still match
their original (non`-try_*`) versions.

With this version upgrade, the following unstable Rust features were
stabilized: `bench_black_box` (1.66.0), `const_ptr_offset_from` (1.65.0),
`core_ffi_c` (1.64.0) and `generic_associated_types` (1.65.0). Thus
remove them.

This also implies that only two unstable features remain allowed for
non-`rust/` code: `allocator_api` and `const_refs_to_cell`.

There are some new Clippy warnings that we are triggering (i.e.
introduced since 1.62.0), as well as a few others that were not
triggered before, thus allow them in this commit and clean up or
remove them as needed later on:

  - `borrow_deref_ref` (new in 1.63.0).
  - `explicit_auto_deref` (new in 1.64.0).
  - `bool_to_int_with_if` (new in 1.65.0).
  - `needless_borrow`.
  - `type_complexity`.
  - `unnecessary_cast` (allowed only on `CONFIG_ARM`).

Furthermore, `rustdoc` lint `broken_intra_doc_links` is triggering on
links pointing to `macro_export` `macro_rules` defined in the same
module (i.e. appearing in the crate root).

However, even if the warning appears, the link still gets generated
like in previous versions, thus it is a bit confusing. An issue has
been opened upstream [4], and it appears that the link still being
generated was a compatibility measure, thus we will need to adapt
to the new behavior (done in the next patch).

In addition, there is an added `#[const_trait]` attribute in
`RawDeviceId`, due to 1.66.0's PR "Require `#[const_trait]` on `Trait`
for `impl const Trait`") [5].

Finally, the `-Aunused-imports` was added for compiling `core`. This was
fixed upstream for 1.67.0 in PR "Fix warning when libcore is compiled
with no_fp_fmt_parse" [6], and prevented for the future for that `cfg`
in PR "core: ensure `no_fp_fmt_parse builds` are warning-free" [7].

Reviewed-by: Björn Roy Baron <bjorn3_gh@protonmail.com>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Tested-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: Alice Ferrazzi <alice.ferrazzi@miraclelinux.com>
Tested-by: Alice Ferrazzi <alice.ferrazzi@miraclelinux.com>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Tested-by: Neal Gompa <neal@gompa.dev>
Link: Rust-for-Linux#947
Link: https://github.com/rust-lang/rust/tree/1.66.0/library/alloc/src [1]
Link: https://github.com/rust-lang/rust/blob/1.66.0/library/alloc/Cargo.toml#L4 [2]
Link: https://github.com/rust-lang/rust/blob/1.66.0/COPYRIGHT [3]
Link: rust-lang/rust#106142 [4]
Link: rust-lang/rust#100982 [5]
Link: rust-lang/rust#105434 [6]
Link: rust-lang/rust#105811 [7]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Commit reference: 3dfc5eb

Signed-off-by: Asahi Lina <lina@asahilina.net>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Commit reference: 3dfc5eb

Signed-off-by: Asahi Lina <lina@asahilina.net>
This abstraction enables Rust drivers to walk Device Tree nodes and
query their properties.

Signed-off-by: Asahi Lina <lina@asahilina.net>
…ation

In order for modpost to work and correctly generate module aliases from
device ID tables, it needs those tables to exist as global symbols with
a specific name. Additionally, modpost checks the size of the symbol, so
it cannot contain trailing data.

To support this, split IdArrayIds out of IdArray. The former contains
just the IDs. Then split out the device table definition macro from the
macro that defines the device table for a given bus driver, and add
another macro to declare a device table as a module device table.
Drivers can now define their ID table once, and then specify that it
should be used for both the driver and the module:

// Generic OF Device ID table.
kernel::define_of_id_table! {ASAHI_ID_TABLE, &'static hw::HwConfig, [
    (of::DeviceId::Compatible(b"apple,agx-t8103"), Some(&hw::t8103::HWCONFIG)),
    (of::DeviceId::Compatible(b"apple,agx-t8112"), Some(&hw::t8112::HWCONFIG)),
    // ...
]}

/// Platform Driver implementation for `AsahiDriver`.
impl platform::Driver for AsahiDriver {
    /// Data associated with each hardware ID.
    type IdInfo = &'static hw::HwConfig;

    // Assign the above OF ID table to this driver.
    kernel::driver_of_id_table!(ASAHI_ID_TABLE);

    // ...
}

// Export the OF ID table as a module ID table, to make modpost/autoloading work.
kernel::module_of_id_table!(MOD_TABLE, ASAHI_ID_TABLE);

Signed-off-by: Asahi Lina <lina@asahilina.net>
This patch adds a logic similar to `devm_platform_ioremap_resource`
function adding:
  - `IoResource` enumerated type that groups the `IORESOURCE_*` macros.
  - `get_resource()` method that is a binding of `platform_get_resource`
  - `ioremap_resource` that is newly written method similar to
    `devm_platform_ioremap_resource`.

Lina: Removed `bit` dependency and rebased

Co-developed-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Maciej Falkowski <m.falkowski@samsung.com>
Allows drivers to configure the DMA masks for a device. Implemented
here, not in device, because it requires a mutable platform device
reference this way (device::Device is a safely clonable reference).

Signed-off-by: Asahi Lina <lina@asahilina.net>
Apple SoCs require non-posted mappings for MMIO, and this is
automatically handled by devm_ioremap_resource() and friends via a
resource flag. Implement the same logic in kernel::io_mem, so it can
work the same way.

Signed-off-by: Asahi Lina <lina@asahilina.net>
TODO: This isn't abstracted properly yet

Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Apple Silicon SoCs (M1, M2, etc.) have a GPU with an ARM64 firmware
coprocessor. The firmware and the GPU share page tables in the standard
ARM64 format (the firmware literally sets the base as its TTBR0/1
registers). TTBR0 covers the low half of the address space and is
intended to be per-GPU-VM (GPU user mappings and kernel-managed
buffers), while TTBR1 covers the upper half and is global (firmware
code, data, management structures shared with the AP, and a few
GPU-accessible data structures).

In typical Apple fashion, the permissions are interpreted differently
from traditional ARM PTEs. By default, firmware mappings use Apple SPRR
permission remapping. The firmware only uses that for its own
code/data/MMIO mappings, and those pages are not accessible by the GPU
hardware. We never need to touch/manage these mappings, so this patch
does not support them.

When a specific bit is set in the PTEs, permissions switch to a
different scheme which supports various combinations of firmware/GPU
access. This is the mode intended to be used by AP GPU drivers, and what
we implement here.

The prot bits are interpreted as follows:

- IOMMU_READ and IOMMU_WRITE have the usual meaning.

- IOMMU_PRIV creates firmware-only mappings (no GPU access)
- IOMMU_NOEXEC creates GPU-only structures (no FW access)
- Otherwise structures are accessible by both GPU and FW

- IOMMU_MMIO creates Device mappings for firmware
- IOMMU_CACHE creates Normal-NC mappings for firmware (cache-coherent
  from the point of view of the AP, but slower)
- Otherwise creates Normal mappings for firmware (this requires manual
  cache management on the firmware side, as it is not coherent with the
  SoC fabric)

GPU-only mappings (textures/etc) are expected to use IOMMU_CACHE and are
seemingly coherent with the CPU (or otherwise the firmware/GPU already
issue the required cache management operations when correctly
configured).

There is a GPU-RO/FW-RW mode, but it is not currently implemented (it
doesn't seem to be very useful for the driver). There seems to be no
real noexec control (i.e. for shaders) on the GPU side. All of these
mappings are implicitly noexec for the firmware.

Drivers are expected to fully manage per-user (TTBR0) page tables, but
ownership of shared kernel (TTBR1) page tables is shared between the
firmware and the AP OS. We handle this by simply using a smaller IAS to
drop down one level of page tables, so the driver can install a PTE in
the top-level (firmware-initialized) page table directly and just add an
offset to the VAs passed into the io_pgtable code. This avoids having to
have any special handling for this here. The firmware-relevant data
structures are small, so we do not expect to ever require more VA space
than one top-level PTE covers (IAS=36 for the next level, 64 GiB).

Only 16K page mode is supported. The coprocessor MMU supports huge pages
as usual for ARM64, but the GPU MMU does not, so we do not enable them.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Asahi Lina <lina@asahilina.net>
DRM drivers need to be able to declare which driver-specific ioctls they
support. This abstraction adds the required types and a helper macro to
generate the ioctl definition inside the DRM driver.

Note that this macro is not usable until further bits of the
abstraction are in place (but it will not fail to compile on its own, if
not called).

Signed-off-by: Asahi Lina <lina@asahilina.net>
Add the initial abstractions for DRM drivers and devices. These go
together in one commit since they are fairly tightly coupled types.

A few things have been stubbed out, to be implemented as further bits of
the DRM subsystem are introduced.

Signed-off-by: Asahi Lina <lina@asahilina.net>
A DRM File is the DRM counterpart to a kernel file structure,
representing an open DRM file descriptor. Add a Rust abstraction to
allow drivers to implement their own File types that implement the
DriverFile trait.

Signed-off-by: Asahi Lina <lina@asahilina.net>
The DRM GEM subsystem is the DRM memory management subsystem used by
most modern drivers. Add a Rust abstraction to allow Rust DRM driver
implementations to use it.

Signed-off-by: Asahi Lina <lina@asahilina.net>
There doesn't seem to be a way for the Rust bindings to get a
compile-time constant reference to drm_gem_shmem_vm_ops, so we need to
duplicate that structure in Rust... this isn't nice...

Signed-off-by: Asahi Lina <lina@asahilina.net>
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 2 times, most recently from b3002e4 to 5c5368d Compare August 3, 2023 10:24
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 2 times, most recently from 77166d0 to 9c6bcb8 Compare September 9, 2023 03:33
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 2 times, most recently from 499c582 to 5ccaf76 Compare January 17, 2024 08:25
marcan pushed a commit that referenced this pull request Jan 19, 2024
syzkaller report:

 kernel BUG at net/core/skbuff.c:3452!
 invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135
 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452)
 Call Trace:
 icmp_glue_bits (net/ipv4/icmp.c:357)
 __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165)
 ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341)
 icmp_push_reply (net/ipv4/icmp.c:370)
 __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772)
 ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577)
 __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295)
 ip_output (net/ipv4/ip_output.c:427)
 __ip_queue_xmit (net/ipv4/ip_output.c:535)
 __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
 __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387)
 tcp_retransmit_skb (net/ipv4/tcp_output.c:3404)
 tcp_retransmit_timer (net/ipv4/tcp_timer.c:604)
 tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716)

The panic issue was trigered by tcp simultaneous initiation.
The initiation process is as follows:

      TCP A                                            TCP B

  1.  CLOSED                                           CLOSED

  2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...

  3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT

  4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED

  5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...

  // TCP B: not send challenge ack for ack limit or packet loss
  // TCP A: close
	tcp_close
	   tcp_send_fin
              if (!tskb && tcp_under_memory_pressure(sk))
                  tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
           TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN;  // set FIN flag

  6.  FIN_WAIT_1  --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ...

  // TCP B: send challenge ack to SYN_FIN_ACK

  7.               ... <SEQ=301><ACK=101><CTL=ACK>   <-- SYN-RECEIVED //challenge ack

  // TCP A:  <SND.UNA=101>

  8.  FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic

	__tcp_retransmit_skb  //skb->len=0
	    tcp_trim_head
		len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
		    __pskb_trim_head
			skb->data_len -= len // skb->len=-1, wrap around
	    ... ...
	    ip_fragment
		icmp_glue_bits //BUG_ON

If we use tcp_trim_head() to remove acked SYN from packet that contains data
or other flags, skb->len will be incorrectly decremented. We can remove SYN
flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
can fix the problem mentioned above.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 3 times, most recently from 22f3e5c to 1213cf2 Compare May 10, 2024 12:25
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 2 times, most recently from 38f6258 to 01f6f6f Compare July 17, 2024 10:22
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 2 times, most recently from 3c4cf32 to 0b31a16 Compare December 9, 2024 21:40
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 2 times, most recently from 4915ecf to f6f35c0 Compare January 20, 2025 20:06
@hoshinolina hoshinolina force-pushed the gpu/rust-wip branch 5 times, most recently from 387ff31 to 9598175 Compare February 3, 2025 00:12
@furkanmustafa
Copy link
Copy Markdown

furkanmustafa commented Mar 18, 2025

@jannau is this PR outdated and contents has already shipped to this tree in another way?
or are there still reasons for this to be left alive here?

I have a feeling about the latter, since it seems to be rebased just last month. (it was the target branch, my mistake)

Maybe it would be nice to reflect the current status and context to the description.

Thanks.

@jannau jannau closed this Apr 11, 2025
jannau pushed a commit that referenced this pull request Mar 1, 2026
commit 87e4b04 upstream.

If the role change while we are suspended, the cdns3 driver switches to the
new mode during resume. However, switching to host mode in this context
causes a NULL pointer dereference.

The host role's start() operation registers a xhci-hcd device, but its
probe is deferred while we are in the resume path. The host role's resume()
operation assumes the xhci-hcd device is already probed, which is not the
case, leading to the dereference. Since the start() operation of the new
role is already called, the resume operation can be skipped.

So skip the resume operation for the new role if a role switch occurs
during resume. Once the resume sequence is complete, the xhci-hcd device
can be probed in case of host mode.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000208
Mem abort info:
...
Data abort info:
...
[0000000000000208] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1]  SMP
Modules linked in:
CPU: 0 UID: 0 PID: 146 Comm: sh Not tainted
6.19.0-rc7-00013-g6e64f4aabfae-dirty #135 PREEMPT
Hardware name: Texas Instruments J7200 EVM (DT)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usb_hcd_is_primary_hcd+0x0/0x1c
lr : cdns_host_resume+0x24/0x5c
...
Call trace:
 usb_hcd_is_primary_hcd+0x0/0x1c (P)
 cdns_resume+0x6c/0xbc
 cdns3_controller_resume.isra.0+0xe8/0x17c
 cdns3_plat_resume+0x18/0x24
 platform_pm_resume+0x2c/0x68
 dpm_run_callback+0x90/0x248
 device_resume+0x100/0x24c
 dpm_resume+0x190/0x2ec
 dpm_resume_end+0x18/0x34
 suspend_devices_and_enter+0x2b0/0xa44
 pm_suspend+0x16c/0x5fc
 state_store+0x80/0xec
 kobj_attr_store+0x18/0x2c
 sysfs_kf_write+0x7c/0x94
 kernfs_fop_write_iter+0x130/0x1dc
 vfs_write+0x240/0x370
 ksys_write+0x70/0x108
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x10c
 el0_svc_common.constprop.0+0x40/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x34/0x108
 el0t_64_sync_handler+0xa0/0xe4
 el0t_64_sync+0x198/0x19c
Code: 52800003 f9407ca5 d63f00a0 17ffffe4 (f9410401)
---[ end trace 0000000000000000 ]---

Cc: stable <stable@kernel.org>
Fixes: 2cf2581 ("usb: cdns3: add power lost support for system resume")
Signed-off-by: Thomas Richard (TI) <thomas.richard@bootlin.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://patch.msgid.link/20260130-usb-cdns3-fix-role-switching-during-resume-v1-1-44c456852b52@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jannau pushed a commit that referenced this pull request Mar 1, 2026
commit 87e4b04 upstream.

If the role change while we are suspended, the cdns3 driver switches to the
new mode during resume. However, switching to host mode in this context
causes a NULL pointer dereference.

The host role's start() operation registers a xhci-hcd device, but its
probe is deferred while we are in the resume path. The host role's resume()
operation assumes the xhci-hcd device is already probed, which is not the
case, leading to the dereference. Since the start() operation of the new
role is already called, the resume operation can be skipped.

So skip the resume operation for the new role if a role switch occurs
during resume. Once the resume sequence is complete, the xhci-hcd device
can be probed in case of host mode.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000208
Mem abort info:
...
Data abort info:
...
[0000000000000208] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1]  SMP
Modules linked in:
CPU: 0 UID: 0 PID: 146 Comm: sh Not tainted
6.19.0-rc7-00013-g6e64f4aabfae-dirty #135 PREEMPT
Hardware name: Texas Instruments J7200 EVM (DT)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usb_hcd_is_primary_hcd+0x0/0x1c
lr : cdns_host_resume+0x24/0x5c
...
Call trace:
 usb_hcd_is_primary_hcd+0x0/0x1c (P)
 cdns_resume+0x6c/0xbc
 cdns3_controller_resume.isra.0+0xe8/0x17c
 cdns3_plat_resume+0x18/0x24
 platform_pm_resume+0x2c/0x68
 dpm_run_callback+0x90/0x248
 device_resume+0x100/0x24c
 dpm_resume+0x190/0x2ec
 dpm_resume_end+0x18/0x34
 suspend_devices_and_enter+0x2b0/0xa44
 pm_suspend+0x16c/0x5fc
 state_store+0x80/0xec
 kobj_attr_store+0x18/0x2c
 sysfs_kf_write+0x7c/0x94
 kernfs_fop_write_iter+0x130/0x1dc
 vfs_write+0x240/0x370
 ksys_write+0x70/0x108
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x10c
 el0_svc_common.constprop.0+0x40/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x34/0x108
 el0t_64_sync_handler+0xa0/0xe4
 el0t_64_sync+0x198/0x19c
Code: 52800003 f9407ca5 d63f00a0 17ffffe4 (f9410401)
---[ end trace 0000000000000000 ]---

Cc: stable <stable@kernel.org>
Fixes: 2cf2581 ("usb: cdns3: add power lost support for system resume")
Signed-off-by: Thomas Richard (TI) <thomas.richard@bootlin.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://patch.msgid.link/20260130-usb-cdns3-fix-role-switching-during-resume-v1-1-44c456852b52@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.