rust: sync: Make Lock::locked_data() return raw pointer #166

fbq · 2021-04-04T07:20:45Z

@wedsonaf This PR is mostly for discussion purposes.

Currently Lock::locked_data() returns &UnsafeCell<T>. I think we can change it to return *mut T, for the following reasons:

1. The sole purpose of &UnsafeCell<T> is to get a *mut T, so why not return *mut T directly?
2. Returning a &UnsafeCell<T> means we store the underlying data in an UnsafeCell, and I think this is an implementation detail that we don't want to expose at the API level.
3. Besides, say we want to access data outside the Rust world (e.g. a static C variable), we cannot put it in a Rust UnsafeCell, right? Because of that, we cannot write a proper Lock implementation for it.

An example for the third point: __cpu_online_mask is protected by cpus_write_lock() and cpus_write_unlock(). If we were to implement a CPUOnlineLock, we could write it as follows:

pub struct CPUOnlineLock {}

impl Lock for CPUOnlineLock {
    type Inner = bindings::cpumask;
    fn lock_noguard(&self) {
        unsafe {
            bindings::cpus_write_lock();
        }
    }
    unsafe fn unlock(&self) {
        bindings::cpus_write_unlock();
    }

    fn locked_data(&self) -> &UnsafeCell<Self::Inner> {
        // What are we going to return here?
        // If the API returned `*mut Self::Inner`, we could return
        // `unsafe { core::ptr::addr_of_mut!(bindings::__cpu_online_mask) }`
        // here.
    }
}
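For illustration, here is a user-space sketch of the proposed shape, in which locked_data() returns *mut Self::Inner and the storage lives outside any UnsafeCell. It assumes std Rust rather than the kernel crate. CpuMask, CPU_ONLINE_MASK, cpus_write_lock(), cpus_write_unlock() and set_online() are hypothetical stand-ins (a plain static mut and an AtomicBool spin lock) for the C-side cpumask, __cpu_online_mask and the hotplug lock, and the trait is a trimmed-down version of the one discussed here, not the kernel's definition.

```rust
use std::ptr::addr_of_mut;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for C's `struct cpumask`.
#[repr(C)]
pub struct CpuMask {
    pub bits: [u64; 1],
}

// Hypothetical stand-in for the C-owned `__cpu_online_mask`: a plain static,
// deliberately NOT wrapped in an UnsafeCell.
static mut CPU_ONLINE_MASK: CpuMask = CpuMask { bits: [0] };

// Hypothetical stand-ins for cpus_write_lock()/cpus_write_unlock().
static CPUS_LOCKED: AtomicBool = AtomicBool::new(false);

fn cpus_write_lock() {
    while CPUS_LOCKED
        .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        std::hint::spin_loop();
    }
}

fn cpus_write_unlock() {
    CPUS_LOCKED.store(false, Ordering::Release);
}

/// Simplified trait with the proposed raw-pointer return type.
pub trait Lock {
    type Inner;
    fn lock_noguard(&self);
    /// # Safety
    /// The caller must currently hold the lock.
    unsafe fn unlock(&self);
    fn locked_data(&self) -> *mut Self::Inner;
}

pub struct CpuOnlineLock;

impl Lock for CpuOnlineLock {
    type Inner = CpuMask;

    fn lock_noguard(&self) {
        cpus_write_lock();
    }

    unsafe fn unlock(&self) {
        cpus_write_unlock();
    }

    fn locked_data(&self) -> *mut CpuMask {
        // Taking the address of a `static mut` creates no reference.
        unsafe { addr_of_mut!(CPU_ONLINE_MASK) }
    }
}

/// Hypothetical helper: marks `cpu` online and returns the resulting mask word.
pub fn set_online(lock: &CpuOnlineLock, cpu: u32) -> u64 {
    lock.lock_noguard();
    let p = lock.locked_data();
    // SAFETY: the lock is held, and `p` points to valid, initialized storage.
    let word = unsafe {
        (*p).bits[0] |= 1u64 << cpu;
        (*p).bits[0]
    };
    // SAFETY: we acquired the lock above.
    unsafe { lock.unlock() };
    word
}

fn main() {
    let lock = CpuOnlineLock;
    let w = set_online(&lock, 0);
    assert_eq!(w & 0b1, 0b1);
    let w = set_online(&lock, 2);
    assert_eq!(w & 0b101, 0b101);
}
```

Note that nothing in this shape ties the raw pointer's use to the lock being held; that discipline is left entirely to the caller.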

Thoughts?

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

jabedude · 2021-04-06T15:21:05Z

> Besides, say we want to access data outside the Rust world (e.g. a static C variable), we cannot put it in a Rust UnsafeCell, right?

I think that is true, but I don't see that as a limitation on UnsafeCell. If a driver author has data that is accessed from C, they would simply get the internal mutable pointer (via UnsafeCell::get()).

fbq · 2021-04-06T16:41:12Z

>> Besides, say we want to access data outside the Rust world (e.g. a static C variable), we cannot put it in a Rust UnsafeCell, right?
>
> I think that is true, but I don't see that as a limitation on UnsafeCell. If a driver author has data that is accessed from C, they would simply get the internal mutable pointer (via UnsafeCell::get()).

What I mean is: we have a static variable in the C world:

static struct cpumask __cpu_online_mask;

And we cannot wrap it in a Rust UnsafeCell whose reference is returned by some locked_data() implementation.

wedsonaf · 2021-04-07T13:43:13Z

@fbq I think you're right in that there's no harm in returning a raw pointer, in addition to enabling the scenario you describe where C controls the data storage.

Let me think about it tonight to try and convince myself. The thing that isn't clear yet is whether removing UnsafeCell may enable the compiler to do optimisations that shouldn't be allowed.

197g · 2021-04-07T15:15:43Z

> The sole purpose of &UnsafeCell<T> is to get a *mut T, so why not return *mut T directly?

There is a semantic gap between &UnsafeCell<T> and a raw pointer. The latter guarantees nothing beyond being an address, while the former says that there definitely is a valid T here that does not overlap with any other unique reference. Different code may still share access to the inner contents, those contents may be mutated through different references at the same time, and it is entirely up to the programmer to ensure that every access is correct. Being a reference, and thus having a lifetime, also nudges the programmer toward borrow checking where possible, such as ensuring that lock guards borrow from the original lock.
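To make the lifetime point concrete, here is a user-space sketch of the current shape, where locked_data() returns &UnsafeCell<Self::Inner> and the guard borrows the lock. SpinLock, Guard and this trimmed-down Lock trait are assumptions for illustration (std Rust with an AtomicBool spin lock), not the kernel crate's definitions. Because Guard<'a, L> holds a &'a L, the borrow checker rejects any use of a guard after its lock is gone, which a bare *mut T cannot express.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

/// Simplified trait in its current shape (returns `&UnsafeCell`).
pub trait Lock {
    type Inner;
    fn lock_noguard(&self);
    /// # Safety
    /// The caller must currently hold the lock.
    unsafe fn unlock(&self);
    fn locked_data(&self) -> &UnsafeCell<Self::Inner>;
}

/// Hypothetical spin lock that owns its data inside an UnsafeCell.
pub struct SpinLock<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: all access to `data` is serialized by `locked`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub fn new(data: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }

    /// The returned guard borrows `self`, so it cannot outlive the lock.
    pub fn lock(&self) -> Guard<'_, Self> {
        self.lock_noguard();
        Guard { lock: self }
    }
}

impl<T> Lock for SpinLock<T> {
    type Inner = T;

    fn lock_noguard(&self) {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    unsafe fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }

    fn locked_data(&self) -> &UnsafeCell<T> {
        &self.data
    }
}

/// Hypothetical guard: the lock is held for as long as the guard exists.
pub struct Guard<'a, L: Lock> {
    lock: &'a L,
}

impl<L: Lock> Deref for Guard<'_, L> {
    type Target = L::Inner;
    fn deref(&self) -> &L::Inner {
        // SAFETY: the lock is held while the guard is alive.
        unsafe { &*self.lock.locked_data().get() }
    }
}

impl<L: Lock> DerefMut for Guard<'_, L> {
    fn deref_mut(&mut self) -> &mut L::Inner {
        // SAFETY: as above; `&mut self` ensures only one `&mut` at a time.
        unsafe { &mut *self.lock.locked_data().get() }
    }
}

impl<L: Lock> Drop for Guard<'_, L> {
    fn drop(&mut self) {
        // SAFETY: a guard only exists while its lock is held.
        unsafe { self.lock.unlock() }
    }
}

fn main() {
    let lock = SpinLock::new(0u32);
    {
        let mut g = lock.lock();
        *g += 1;
    } // `g` is dropped here, releasing the lock.
    assert_eq!(*lock.lock(), 1);
}
```

The guard only ever calls get() on the &UnsafeCell, so the raw pointer still appears; the difference is that the reference's lifetime flows into the guard's type.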

> And we cannot wrap it in a Rust UnsafeCell whose reference is returned by some locked_data() implementation.

Rust's memory model isn't typed, and UnsafeCell changes neither the ABI nor the layout of the wrapped type (UnsafeCell is declared with the repr(transparent) attribute), so declaring the static as an UnsafeCell instead of the contained type must work correctly:

extern "C" {
    static __cpu_online_mask: UnsafeCell<cpumask>;
}

It might also be fine to cast a pointer to the static to a &UnsafeCell<cpumask> if declaring it as such an extern static isn't feasible for some reason.
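The layout argument can be checked in user space. The sketch below assumes a made-up #[repr(C)] CpuMask and uses a Rust static mut named RAW_MASK as a stand-in for storage defined on the C side; mask_cell() is likewise a hypothetical name. It checks that UnsafeCell<CpuMask> has the same size and alignment as CpuMask, then views the plain static as a &UnsafeCell<CpuMask> through a pointer cast (the fallback described in the paragraph above). The standard library documents that UnsafeCell<T> has the same in-memory representation as T, which is what makes the cast valid.

```rust
use std::cell::UnsafeCell;
use std::mem::{align_of, size_of};
use std::ptr::addr_of_mut;

// Hypothetical stand-in for C's `struct cpumask`.
#[repr(C)]
pub struct CpuMask {
    pub bits: [u64; 4],
}

// Hypothetical stand-in for storage owned by C (e.g. `__cpu_online_mask`).
static mut RAW_MASK: CpuMask = CpuMask { bits: [0; 4] };

/// Views the plain static as `&UnsafeCell<CpuMask>` via a pointer cast.
pub fn mask_cell() -> &'static UnsafeCell<CpuMask> {
    let raw: *mut CpuMask = unsafe { addr_of_mut!(RAW_MASK) };
    // SAFETY: UnsafeCell<T> has the same in-memory representation as T, and
    // `raw` points to a valid, initialized static that lives forever. No other
    // reference to RAW_MASK is created; all access goes through `get()`.
    unsafe { &*raw.cast::<UnsafeCell<CpuMask>>() }
}

fn main() {
    // The layout guarantee that the cast relies on.
    assert_eq!(size_of::<UnsafeCell<CpuMask>>(), size_of::<CpuMask>());
    assert_eq!(align_of::<UnsafeCell<CpuMask>>(), align_of::<CpuMask>());

    let cell = mask_cell();
    // SAFETY: single-threaded example; no other access to the data is live.
    unsafe { (*cell.get()).bits[0] |= 1 << 3 };
    assert_eq!(unsafe { (*cell.get()).bits[0] } & (1 << 3), 1 << 3);
}
```

In the kernel, the extern "C" declaration quoted above would replace RAW_MASK and the cast entirely.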

fbq · 2021-04-07T16:26:21Z

> Rust's memory model isn't typed, and UnsafeCell changes neither the ABI nor the layout of the wrapped type (UnsafeCell is declared with the repr(transparent) attribute), so declaring the static as an UnsafeCell instead of the contained type must work correctly:
>
> extern "C" {
>     static __cpu_online_mask: UnsafeCell<cpumask>;
> }

Cool! I must admit I didn't know this works, thanks a lot!

Soveu · 2021-04-07T20:00:27Z

Also, dereferencing a *mut into a &mut is very dangerous, because LLVM may assume (via the noalias attribute) that it is a unique reference that nothing else can modify.
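One way to keep a raw pointer from turning into overlapping &mut references is to never hand out a &mut with an unbounded lifetime. The classic mistake is a helper like `fn get(&self) -> &mut T { unsafe { &mut *ptr } }`, which lets two callers hold a &mut to the same data at once. The sketch below instead lends the &mut only to a closure while the lock is held. Locked and with() are hypothetical names, and this is std-only user-space Rust, not the kernel crate's API.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical lock that only lends out `&mut T` inside a closure.
pub struct Locked<T> {
    held: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: access to `data` is serialized by `held`.
unsafe impl<T: Send> Sync for Locked<T> {}

impl<T> Locked<T> {
    pub fn new(data: T) -> Self {
        Locked { held: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }

    /// Runs `f` with exclusive access to the data.
    ///
    /// `f` must accept a `&mut T` of *any* lifetime, so the reference cannot
    /// escape into `R`; at most one `&mut T` derived from the raw pointer is
    /// live at a time. (For brevity, the lock is not released if `f` panics.)
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        while self
            .held
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // SAFETY: the lock is held, so no other `&mut T` to the data exists.
        let result = f(unsafe { &mut *self.data.get() });
        self.held.store(false, Ordering::Release);
        result
    }
}

fn main() {
    let counter = Locked::new(0u64);
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    counter.with(|n| *n += 1);
                }
            });
        }
    });
    assert_eq!(counter.with(|n| *n), 4000);
}
```

A lock guard tied to the lock's lifetime achieves the same goal; the closure form is shown here only because it makes the "no escaping &mut" property visible in a single signature.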

fbq · 2021-04-08T03:39:03Z

> Also, dereferencing a *mut into a &mut is very dangerous, because LLVM may assume (via the noalias attribute) that it is a unique reference that nothing else can modify.

Yeah, @wedsonaf mentioned a similar concern, and I was actually worried about this too. Now that it seems we have a solution (thanks to @HeroicKatora), I will close this. Thank you all.

The IPA BCM resource ("IP0") on sc7180 was moved to the clk-rpmh driver in commit bcd63d2 ("clk: qcom: rpmh: Add IPA clock for SC7180") and modeled as a clk, but this interconnect driver still had it modeled as an interconnect. This was mostly OK because nobody used the interconnect definition, until the interconnect framework started dropping bandwidth requests on unused interconnects via the sync_state callback in commit 7d3b0b0 ("interconnect: qcom: Use icc_sync_state"). Once that patch was applied, the IP0 resource was going to be controlled from two places: the clk framework and the interconnect framework. Even then, things were probably going to be OK, because commit b95b668 ("interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate") was needed to actually drop bandwidth requests on unused interconnects, and the IPA was one of the interconnects that wasn't getting dropped to zero.

Combining the three commits leads to bad behavior: the interconnect framework disables the IP0 resource because it has no users, while the clk framework thinks the IP0 resource is on because its only user, the IPA driver, has turned it on via clk_prepare_enable(). Depending on when sync_state is called, we can get into a situation like this:

  IPA driver probes
  IPA driver gets notified modem started
  runtime PM get()
  IPA clk enabled -> IP0 resource is ON
  sync_state runs
  interconnect zeroes out the IP0 resource -> IP0 resource is off
  IPA driver tries to access a register and blows up

The crash is an unclocked access that manifests as an SError.

  SError Interrupt on CPU0, code 0xbe000011 -- SError
  CPU: 0 PID: 3595 Comm: mmdata_mgr Not tainted 5.17.1+ #166
  Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
  pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : mutex_lock+0x4c/0x80
  lr : mutex_lock+0x30/0x80
  sp : ffffffc00da9b9c0
  x29: ffffffc00da9b9c0 x28: 0000000000000000 x27: 0000000000000000
  x26: ffffffc00da9bc90 x25: ffffff80c2024010 x24: ffffff80c2024000
  x23: ffffff8083100000 x22: ffffff80831000d0 x21: ffffff80831000a8
  x20: ffffff80831000a8 x19: ffffff8083100070 x18: 00000000ffff0a00
  x17: 000000002f7254f1 x16: 0000000000000100 x15: 0000000000000000
  x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  x11: 000000000001f0b8 x10: ffffffc00931f0b8 x9 : 0000000000000000
  x8 : 0000000000000000 x7 : fefefefefeff2f60 x6 : 0000808080808080
  x5 : 0000000000000000 x4 : 8080808080800000 x3 : ffffff80d2d4ee28
  x2 : ffffff808c1d6e40 x1 : 0000000000000000 x0 : ffffff8083100070
  Kernel panic - not syncing: Asynchronous SError Interrupt
  CPU: 0 PID: 3595 Comm: mmdata_mgr Not tainted 5.17.1+ #166
  Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
  Call trace:
   dump_backtrace+0xf4/0x114
   show_stack+0x24/0x30
   dump_stack_lvl+0x64/0x7c
   dump_stack+0x18/0x38
   panic+0x150/0x38c
   nmi_panic+0x88/0xa0
   arm64_serror_panic+0x74/0x80
   do_serror+0x0/0x80
   do_serror+0x58/0x80
   el1h_64_error_handler+0x34/0x4c
   el1h_64_error+0x78/0x7c
   mutex_lock+0x4c/0x80
   __gsi_channel_start+0x50/0x17c
   gsi_channel_start+0x54/0x90
   ipa_endpoint_enable_one+0x34/0xc0
   ipa_open+0x4c/0x120

Remove all IP0 resource management from the interconnect driver so that clk-rpmh is the sole owner. This fixes the issue by preventing the interconnect driver from overwriting the IP0 resource data that the clk-rpmh driver wrote.

Cc: Alex Elder <elder@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Taniya Das <quic_tdas@quicinc.com>
Cc: Mike Tipton <quic_mdtipton@quicinc.com>
Fixes: b95b668 ("interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate")
Fixes: bcd63d2 ("clk: qcom: rpmh: Add IPA clock for SC7180")
Fixes: 7d3b0b0 ("interconnect: qcom: Use icc_sync_state")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Alex Elder <elder@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220412220033.1273607-2-swboyd@chromium.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>

When running with return thunks enabled under 32-bit EFI, the system crashes with:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle page fault for address: 000000005bc02900
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0011) - permissions violation
  PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
  Oops: 0011 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:0x5bc02900
  Code: Unable to access opcode bytes at RIP 0x5bc028d6.
  RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
  RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
  RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
  R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
  R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
  FS: 0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
  CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
  Call Trace:
   ? efi_set_virtual_address_map+0x9c/0x175
   efi_enter_virtual_mode+0x4a6/0x53e
   start_kernel+0x67c/0x71e
   x86_64_start_reservations+0x24/0x2a
   x86_64_start_kernel+0xe9/0xf4
   secondary_startup_64_no_verify+0xe5/0xeb

That's because it cannot jump to the return thunk from the 32-bit code. Using a naked RET and marking it as safe allows the system to proceed booting.

Fixes: aa3d480 ("x86: Use return-thunk in asm code")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: <stable@vger.kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

If a relocatable kernel is loaded at an address that is not 2MB aligned and told not to relocate to zero, the kernel can crash due to mark_rodata_ro() incorrectly changing some read-write data to read-only.

The misalignment can occur when the kernel is loaded by kdump or when using the RELOCATABLE_TEST config option.

Example crash with the kernel loaded at 5MB:

  Run /sbin/init as init process
  BUG: Unable to handle kernel data access on write at 0xc000000000452000
  Faulting instruction address: 0xc0000000005b6730
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
  NIP: c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
  REGS: c000000004503250 TRAP: 0300 Not tainted (6.2.0-rc1-00011-g349188be4841)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44288480 XER: 00000000
  CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
  ...
  NIP memset+0x68/0x104
  LR zero_user_segments.constprop.0+0xa8/0xf0
  Call Trace:
   ext4_mpage_readpages+0x7f8/0x830
   ext4_readahead+0x48/0x60
   read_pages+0xb8/0x380
   page_cache_ra_unbounded+0x19c/0x250
   filemap_fault+0x58c/0xae0
   __do_fault+0x60/0x100
   __handle_mm_fault+0x1230/0x1a40
   handle_mm_fault+0x120/0x300
   ___do_page_fault+0x20c/0xa80
   do_page_fault+0x30/0xc0
   data_access_common_virt+0x210/0x220

This happens because mark_rodata_ro() tries to change permissions on the range _stext..__end_rodata, but _stext sits in the middle of the 2MB page from 4MB to 6MB:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)

The logic that changes the permissions assumes the linear mapping was split correctly at boot, so it marks the entire 2MB page read-only. That leads to the write fault above.

To fix it, the boot-time mapping logic needs to consider that if the kernel is running at a non-zero address, then _stext is a boundary where it must split the mapping.

That leads to the mapping being split correctly, allowing the rodata permission change to happen correctly, with no spillover:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
  radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
  radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)

If the kernel is loaded at a 2MB aligned address, the mapping continues to use 2MB pages as before:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages

Fixes: c55d7b5 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
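The splitting rule in the powerpc fix can be modeled in a few lines. The sketch below is a simplified model, not the powerpc radix code: map_range() and map_with_split() are hypothetical names, it ignores the exec/permission flags, and it assumes the only choices are 2MB and 64K pages, with the 2MB-aligned middle of a range mapped with 2MB pages and any unaligned head or tail mapped with 64K pages. Splitting 0x400000..0x2400000 at _stext = 0x500000 (kernel loaded at 5MB) yields the same page sizes as the corresponding lines of the fixed mapping log.

```rust
const SZ_64K: u64 = 0x1_0000;
const SZ_2M: u64 = 0x20_0000;

/// Maps [start, end): the 2MB-aligned middle uses 2MB pages and any
/// unaligned head or tail uses 64K pages. Returns (start, end, page_size).
fn map_range(start: u64, end: u64) -> Vec<(u64, u64, u64)> {
    let round_up = |x: u64| (x + SZ_2M - 1) / SZ_2M * SZ_2M;
    let round_down = |x: u64| x / SZ_2M * SZ_2M;

    let big_start = round_up(start).min(end);
    let big_end = round_down(end).max(big_start);

    let mut chunks = Vec::new();
    if start < big_start {
        chunks.push((start, big_start, SZ_64K));
    }
    if big_start < big_end {
        chunks.push((big_start, big_end, SZ_2M));
    }
    if big_end < end {
        chunks.push((big_end, end, SZ_64K));
    }
    chunks
}

/// Like map_range(), but forces a split at `boundary` (e.g. _stext) when it
/// falls strictly inside the range.
fn map_with_split(start: u64, end: u64, boundary: u64) -> Vec<(u64, u64, u64)> {
    if start < boundary && boundary < end {
        let mut chunks = map_range(start, boundary);
        chunks.extend(map_range(boundary, end));
        chunks
    } else {
        map_range(start, end)
    }
}

fn main() {
    // Kernel loaded at 5MB: _stext splits the 4MB..6MB block.
    for (s, e, sz) in map_with_split(0x40_0000, 0x240_0000, 0x50_0000) {
        let kind = if sz == SZ_2M { "2.00 MiB" } else { "64.0 KiB" };
        println!("Mapped {s:#018x}-{e:#018x} with {kind} pages");
    }
}
```

With a 2MB-aligned boundary (e.g. 0x400000) no split is forced, and the whole range stays on 2MB pages, mirroring the aligned-load case.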

rust: sync: Make Lock::locked_data() return raw pointer

af0cd05

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

fbq closed this Apr 8, 2021
