IORING_OP_PROVIDE_BUFFERS range validation issue #726

lano1106 · 2022-11-10T13:40:20Z

the problem is definitely in my code but I think that this should definitely be addressed as this might be a security issue allowing someone to make the kernel write pretty much anywhere.

I provide to io_uring an 8 buffers long GID. So bid valid range should be 0-7.

My code is storing -1 into a the cqe associated user structure unsigned short bid var (65535) when it is processing a recv() error. It should not return the invalid bid value back to io_uring in that situation but it did mistakenly.

io_uring did accept back this invalid bid but more worrisome, it did handle another recv() by writing into this reused invalid bid...
I am not sure if the kernel did succeed in writing there but my app did SEGV when trying to access the memory...

I assume that the kernel did succeed to write there fine or else recv call would have failed but I caught the problem by analyzing a core dump following a SEGV while processing a successful recv() cqe containing the bid 65535!

buffers base address was: 0x556c3a9f5000
invalid bid buffer address outside the process virtual space: 0x556c8a9f0000
individual bufsz: 20480

IMHO, this is clearly hacking material...

lano1106 · 2022-11-10T13:41:16Z

ps. it did happen with kernel v6.0.7 but this is certainly there for every io_uring version...

also, this is through the legacy buffer provisioning setup. Not with the newer buffer ring system...

axboe · 2022-11-10T15:09:48Z

I'll take a look, do you have the code you used? Will save me some time. In general, we should be doing access_ok() for any provided buffer. If the index is missed and we allowed the recv or write, then presumably that buffer index ended up with valid memory for that process. But I'll double check.

lano1106 · 2022-11-10T15:45:17Z

very trivial to reproduce IMHO... I initialized buffers with those parameters:
[2022-11-09 12:56:44] INFO WSCONT/initGroupBuffer Provide 8 20480 bytes buffers to gid 2 from address 0x0x556c3a9f5000 (163840)

I believe that you do not even need to start using the buffers to do:
io_uring_prep_provide_buffers(sqe, 0x0x556c3a9f5000, 20480, 1, 2, 65535);

next and check what io_uring will do with it...

My confidence level that there is a real problem concerning this is very high...

axboe · 2022-11-10T15:56:52Z

Can you please describe in detail what you are doing. It may be obvious to you since you've been looking at it, but snippets of details do not provide a full picture for me.

Looking at provide buffers, the first thing we do is:

        if (check_mul_overflow((unsigned long)p->len, (unsigned long)p->nbufs,
				&size))
		return -EOVERFLOW;
	if (check_add_overflow((unsigned long)p->addr, size, &tmp_check))
		return -EOVERFLOW;

	size = (unsigned long)p->len * p->nbufs;
	if (!access_ok(u64_to_user_ptr(p->addr), size))
		return -EFAULT;

which will check if that range is valid for the process. This is just a basic check, once you do recv/read later on to use the buffer, then we may indeed fail writing to it. Which should trigger an -EFAULT cqe->res or similar.

It's very possible that I'm missing something here, which is why I asked for the details on the issue.

lano1106 · 2022-11-10T16:00:17Z

You are really near the problem. Notice that the offset (aka seq->off or bid) is not applied before calling access_ok() !!

axboe · 2022-11-10T16:07:29Z

What offset?

Edit: the offset is just the buffer group ID, that doesn't factor into the range at all. If you disagree, please be explicit and let's avoid guessing games on my part...

lano1106 · 2022-11-10T16:09:56Z

seq->off

the buffer address becomes:
sqe->addr+seq->len*seq->off

io_provide_buffers_prep() is testing the base address of the group buffers. not the bid buffer address

axboe · 2022-11-10T16:14:30Z

The range provided is addr + nbufs * len_of_each_buf. sqe->off is the buffer group ID where the buffers will be stored, selected by the program setting IOSQE_BUFFER_SELECT and picking that ->buf_group for it.

For the provided buffer, sqe->off has nothing to do with the address, it's just the group indexing.

lano1106 · 2022-11-10T16:17:23Z

seq->off contains the bid.

p->bgid = READ_ONCE(sqe->buf_group);
tmp = READ_ONCE(sqe->off);
	if (tmp > USHRT_MAX)
		return -E2BIG;
	p->bid = tmp;

maybe I did misuse the API but I have always provided the group base address when returning individual BID buffer and it does work perfectly fine...

axboe · 2022-11-10T16:20:50Z

When you issue a read/recv, you just provide the buffer group, but the buffer itself. And yes, sqe->off will be the address + offset for the operation on the selected buffer. For the provide buffer operation itself, sqe->off is which group to provide the buffer in.

Since we kind of keep dancing around what the issue is here (not sure why you're not just stating it?!), here's what I did:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <liburing.h>

int main(int argc, char *argv[])
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	unsigned long *ptr;
	char foo[64];
	int fds[2];

	io_uring_queue_init(8, &ring, 0);
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_provide_buffers(sqe, 0x556c3a9f5000, 20480, 1, 2, 65535);
	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	printf("pbuf cqe res %d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);

	if (pipe(fds) < 0) {
		perror("pipe");
		return 1;
	}

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fds[0], NULL, 20480, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = 2;

	io_uring_submit(&ring);

	memset(foo, 0x89, sizeof(foo));
	write(fds[1], foo, sizeof(foo));

	io_uring_wait_cqe(&ring, &cqe);
	printf("read cqe res %d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);

	return 0;
}

which returns:

axboe@m1pro-kvm ~> ./ol
pbuf cqe res 0
read cqe res -14

which is what I'd expect. The access_ok() is just a basic check, the actual read hits -EFAULT when trying to use a buffer in that group. In the spirit of not wasting time, please be explicit in a) what you are running exactly, and b) what you see in terms of output.

lano1106 · 2022-11-10T16:43:04Z

First, your test will obviously not work since you provide a random address. I got 0x556c3a9f5000 from posix_memalign().

Next, I do pass a valid but wrong address. I started to mix recv() with recvmsg() to use kTLS with TLSv1.2 and I didn't know if recvmsg() could use provided buffers too. So I have static buffer for recvmsg and I set bid to 65535 when handling those recvmsg to differentiate them from recv() results. The mess up the code did was to return the recvmsg buffer with bid 65535 in a very specific context.

but what is not validated by io_provide_buffers_prep() is the BID value. So when io_uring does return me the invalid BID value and when I recalculate the BID address from the group base address, I get the SEGV address.

IOW, the returned buffer, yet valid, did not belong to the group buffer.

this is totally optional but maybe a max_bid value could stored when the buffer list is created so that the bid value could be validated when individual buffers are returned...

Beside that io_uring is all fine.

axboe · 2022-11-10T16:48:52Z

If allocate memory and write beyond that memory in the application, that may not be an immediate segfault. But it's undefined behavior, depends on what the heap looks like. Is it possible that this is what you are seeing? Doesn't really matter if you or the kernel writes to it in that regard.

I agree that if BID rolls around for provide buffer prep, that should be an EINVAL condition. If you pass 65535 then you should only be able to add a single buffer, which would then have ID 65535. I don't think we need to store that, since 64k-1 is always going to be the max value.

But I don't think there are any security concerns here, just confusing since we don't EINVAL if you attempt to add something where "start ID + nbufs" > 64k-1.

axboe · 2022-11-10T16:52:36Z

Something like this:

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 25cd724ade18..e2c46889d5fa 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -346,6 +346,8 @@ int io_provide_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 	tmp = READ_ONCE(sqe->off);
 	if (tmp > USHRT_MAX)
 		return -E2BIG;
+	if (tmp + p->nbufs >= USHRT_MAX)
+		return -EINVAL;
 	p->bid = tmp;
 	return 0;
 }

Can you confirm this fixes the case for you? If so, I'm going to queue it up and add attribution to you as the reporter. Is Olivier Langlois <olivier@trillion01.com> the right name and email to use?

lano1106 · 2022-11-10T16:55:58Z

you are 100% right about finally not being a security problem.

I know that sometimes my explanations are not 100% clear. Let me try to reword the situation. io_uring does not validate the provided bid value.

So if I create 8 buffers group.

If I return to io_uring a bogus individual buffer with a bid value outside the valid range, io_uring will accept it as long as the buffer address is valid... This will in turn make the application explode when the invalid bid is returned...

This is totally arguable whether or not io_uring should/could shield the user about this but that was surprising to me... I just wanted to report this discovery...

Yes, the email address you have is good

axboe · 2022-11-10T16:56:20Z

I guess you could argue that the wrap is expected, so when you do:

Add 128 buffers starting at 65535

You get the following range of BIDs: [65535, 0, 1, 2, ...] and this patch would break that use case, which very well might be a valid use case.

lano1106 · 2022-11-10T17:14:27Z

you are right again about breaking a valid use case.

I am not sure whether or not it would be a good thing to validate the bid value.

if it is, you cannot validate it solely from the info passed along the request imho. Some info about the group would need to be stored when the group is created. Not much would be needed. Only the group base address and the max_bid would be required.

With this info, you could validate that

The passed address is inside the group space
bid is within the group

axboe · 2022-11-10T17:28:50Z

You can validate the the ID is unique, as that's the only problematic case. Having the values wrap might be confusing, but it could also be a valid use case. There's no max group buffer ID being different, they are all 64k-1 max. That's irrespective of what you first asked to store in there with your first IORING_OP_PROVIDE_BUFFERS command.

This leaves me confused about your 1 and 2 points here. There's no need for the buffers within a group to have any relation to each other, and bid is always within the group as it's 0..64k-1 always.

lano1106 · 2022-11-10T17:38:02Z

I see that there is still a misunderstanding about what we each are saying. It is not impossible that it is caused by me not using correctly the API.

To me, there are 2 steps performed by the IORING_OP_PROVIDE_BUFFERS request.

An initial call to create a buffer group - io_uring_prep_provide_buffers(sqe, 0x556c3a9f5000, 20480, 8, 2, 0);
Return an individual buffer to an existing group io_uring_prep_provide_buffers(sqe, 0x556c3a9f5000+3*20480, 20480, 1, 2, 3)

If calling io_uring_prep_provide_buffers() in context no.2 you could do:

io_uring_prep_provide_buffers(sqe, any_random_valid_address, 20480, 1, 2, 65535)

and this would currently work.

I guess that what you are saying is that you could technically add new additional buffers to an existing group and that would be perfectly a valid operation... I have never used the API that way... My problem is when I return a buffer to the group after io_uring did return it to its user... Maybe there is no easy way to deal wit that...

axboe · 2022-11-10T17:43:17Z

I think your misunderstanding is in thinking that when the buffer group is created in step 1, that defines the limit of that group. There is no such thing, the limits are ALWAYS 0..65535 for the BID in any group. This is why the group creation isn't a separate thing from adding buffers, the fact that it gets done automatically when you add buffers to a group that doesn't currently exist is purely implementation detail and in no way defines the limits of the group.

Not sure where this came from, it's surely not documented in any way that an initial buffer add would state anything about future limits of that group.

axboe · 2022-11-10T17:43:45Z

That said, clearly it would be a good idea to note this in the man page explicitly.

lano1106 · 2022-11-10T17:46:44Z

AFAIK, it is stated nowhere. At least when I did write that code. The doc has tremendously improved since then.

Buffer range is always 0..65535 regardless of how the group was created, and the kernel may error any value or range that wraps. Link: #726 Signed-off-by: Jens Axboe <axboe@kernel.dk>

axboe · 2022-11-10T18:01:19Z

Sent out the patch and added a man page comment as well on the range, hope that helps. I'll close this one.

We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: axboe/liburing#726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit 3851d25 upstream. We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: axboe/liburing#726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: axboe/liburing#726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit be8b93b5cc7d533eb8c9b0590cdac055ecafe13a Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Wed Nov 16 10:04:15 2022 +0100 Linux 6.0.9 Link: https://lore.kernel.org/r/20221114124458.806324402@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Zan Aziz <zanaziz313@gmail.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Rudi Heitbaum <rudi@heitbaum.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Tested-by: Allen Pais <apais@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit fca8b392a9b7fefbf501b94e0239ccc82bbd8cfa Author: Borislav Petkov <bp@suse.de> Date: Mon Nov 14 12:44:01 2022 +0100 x86/cpu: Restore AMD's DE_CFG MSR after resume commit 2632daebafd04746b4b96c2f26a6021bc38f6209 upstream. DE_CFG contains the LFENCE serializing bit, restore it on resume too. This is relevant to older families due to the way how they do S3. Unify and correct naming while at it. Fixes: e4d0e84e4907 ("x86/cpu/AMD: Make LFENCE a serializing instruction") Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit d583257e94596fbb62347da7cf63d4d30db72f58 Author: Takashi Iwai <tiwai@suse.de> Date: Sat Nov 12 09:47:18 2022 +0100 ALSA: memalloc: Try dma_alloc_noncontiguous() at first commit 9d8e536d36e75e76614fe09ffab9a1df95b8b666 upstream. The latest fix for the non-contiguous memalloc helper changed the allocation method for a non-IOMMU system to use only the fallback allocator. This should have worked, but it caused a problem sometimes when too many non-contiguous pages are allocated that can't be treated by HD-audio controller. As a quirk workaround, go back to the original strategy: use dma_alloc_noncontiguous() at first, and apply the fallback only when it fails, but only for non-IOMMU case. We'll need a better fix in the fallback code as well, but this workaround should paper over most cases. Fixes: 9736a325137b ("ALSA: memalloc: Don't fall back for SG-buffer with IOMMU") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wgSH5ubdvt76gNwa004ooZAEJL_1Q-Fyw5M2FDdqL==dg@mail.gmail.com Link: https://lore.kernel.org/r/20221112084718.3305-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 128e284c6cccf5875261569fa3bb07558870c17f Author: Philip Yang <Philip.Yang@amd.com> Date: Thu Sep 8 17:56:09 2022 -0400 drm/amdkfd: Migrate in CPU page fault use current mm commit 3a876060892ba52dd67d197c78b955e62657d906 upstream. migrate_vma_setup shows below warning because we don't hold another process mm mmap_lock. We should use current vmf->vma->vm_mm instead, the caller already hold current mmap lock inside CPU page fault handler. WARNING: CPU: 10 PID: 3054 at include/linux/mmap_lock.h:155 find_vma Call Trace: walk_page_range+0x76/0x150 migrate_vma_setup+0x18a/0x640 svm_migrate_vram_to_ram+0x245/0xa10 [amdgpu] svm_migrate_to_ram+0x36f/0x470 [amdgpu] do_swap_page+0xcfe/0xec0 __handle_mm_fault+0x96b/0x15e0 handle_mm_fault+0x13f/0x3e0 do_user_addr_fault+0x1e7/0x690 Fixes: e1f84eef313f ("drm/amdkfd: handle CPU fault on COW mapping") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 1ae939320a6fc5e9a9b07ca11e86383dc950c794 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:49 2022 +0300 dmaengine: at_hdmac: Check return code of dma_async_device_register commit c47e6403fa099f200868d6b106701cb42d181d2b upstream. dma_async_device_register() can fail, check the return code and display an error. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-16-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit d2012bfd6612948c731d4283770b251479c4a531 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:48 2022 +0300 dmaengine: at_hdmac: Fix impossible condition commit 28cbe5a0a46a6637adbda52337d7b2777fc04027 upstream. The iterator can not be greater than ATC_MAX_DSCR_TRIALS, as the for loop will stop when i == ATC_MAX_DSCR_TRIALS. While here, use the common "i" name for the iterator. Fixes: 93dce3a6434f ("dmaengine: at_hdmac: fix residue computation") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-15-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit de7493265bd065b6d3ad519ce72daf12d439618d Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:47 2022 +0300 dmaengine: at_hdmac: Don't allow CPU to reorder channel enable commit 580ee84405c27d6ed419abe4d2b3de1968abdafd upstream. at_hdmac uses __raw_writel for register writes. In the absence of a barrier, the CPU may reorder the register operations. Introduce a write memory barrier so that the CPU does not reorder the channel enable, thus the start of the transfer, without making sure that all the pre-required register fields are already written. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-14-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 833da33da9e52252fc15de8bbc2ad0b950e2e341 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:46 2022 +0300 dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors commit ef2cb4f0ce479f77607b04c4b0414bf32f863ee8 upstream. In case the controller detected an error, the code took the chance to move all the queued (submitted) descriptors to the active (issued) list. This was wrong as if there were any descriptors in the submitted list they were moved to the issued list without actually issuing them to the controller, thus a completion could be raised without even fireing the descriptor. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-13-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 7d533e5a2192a361a9d87078b9e2f953e426b20a Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:45 2022 +0300 dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware commit ba2423633ba646e1df20e30cb3cf35495c16f173 upstream. As it was before, the descriptor was issued to the hardware without adding it to the active (issued) list. This could result in a completion of other descriptor, or/and in the descriptor never being completed. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-12-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ece84823eaab0f293d36f48b87797c88a82b0575 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:44 2022 +0300 dmaengine: at_hdmac: Fix concurrency over the active list commit 03ed9ba357cc78116164b90b87f45eacab60b561 upstream. The tasklet (atc_advance_work()) did not held the channel lock when retrieving the first active descriptor, causing concurrency problems if issue_pending() was called in between. If issue_pending() was called exactly after the lock was released in the tasklet (atc_advance_work()), atc_chain_complete() could complete a descriptor for which the controller has not yet raised an interrupt. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-11-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit f4daa0bf7d34bd94280bed49d22a2c353acb8877 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:43 2022 +0300 dmaengine: at_hdmac: Free the memset buf without holding the chan lock commit 6ba826cbb57d675f447b59323204d1473bbd5593 upstream. There's no need to hold the channel lock when freeing the memset buf, as the operation has already completed. Free the memset buf without holding the channel lock. Fixes: 4d112426c344 ("dmaengine: hdmac: Add memset capabilities") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-10-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit c8bd54b1e31f7545d8ea4e4ca49225677126f7c2 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:42 2022 +0300 dmaengine: at_hdmac: Fix concurrency over descriptor commit 06988949df8c3007ad82036d3606d8ae72ed9000 upstream. The descriptor was added to the free_list before calling the callback, which could result in reissuing of the same descriptor and calling of a single callback for both. Move the decriptor to the free list after the callback is invoked. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-9-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 4ff5adee87a27a3aa6eddbc9da58fbd717f791e8 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:41 2022 +0300 dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all() commit c6babed879fbe82796a601bf097649e07382db46 upstream. atc_complete_all() had concurrency bugs, thus remove it: 1/ atc_complete_all() in its entirety was buggy, as when the atchan->queue list (the one that contains descriptors that are not yet issued to the hardware) contained descriptors, it fired just the first from the atchan->queue, but moved all the desc from atchan->queue to atchan->active_list and considered them all as fired. This could result in calling the completion of a descriptor that was not yet issued to the hardware. 2/ when in tasklet at atc_advance_work() time, atchan->active_list was queried without holding the lock of the chan. This can result in atchan->active_list concurrency problems between the tasklet and issue_pending(). Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-8-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 2f454e5b54f00077388f0e9b090cb883af76858f Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:40 2022 +0300 dmaengine: at_hdmac: Protect atchan->status with the channel lock commit 6e5ad28d16f082efeae3d0bd2e31f24bed218019 upstream. Now that the complete callback call was removed from device_terminate_all(), we can protect the atchan->status with the channel lock. The atomic bitops on atchan->status do not substitute proper locking on the status, as one could still modify the status after the lock was dropped in atc_terminate_all() but before the atomic bitops were executed. Fixes: 078a6506141a ("dmaengine: at_hdmac: Fix deadlocks") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-7-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 880b1c3a28f748cf32a2bad652230bafc9ac4453 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:39 2022 +0300 dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all commit f645f85ae1104f8bd882f962ac0a69a1070076dd upstream. The method was wrong because it violated the dmaengine API. For aborted transfers the complete callback should not be called. Fix the behavior and do not call the complete callback on device_terminate_all. Fixes: 808347f6a317 ("dmaengine: at_hdmac: add DMA slave transfers") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-6-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 30a1cb46b95bbb963b96452575f4086dd679dfc2 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:38 2022 +0300 dmaengine: at_hdmac: Fix premature completion of desc in issue_pending commit fcd37565efdaffeac179d0f0ce980ac79bfdf569 upstream. Multiple calls to atc_issue_pending() could result in a premature completion of a descriptor from the atchan->active list, as the method always completed the first active descriptor from the list. Instead, issue_pending() should just take the first transaction descriptor from the pending queue, move it to active_list and start the transfer. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-5-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit f0cee31b5b39fb0e0cd77b026f8bc6deaa651860 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:37 2022 +0300 dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending commit 8a47221fc28417ff8a32a4f92d4448a56c3cf7e1 upstream. Cyclic channels must too call issue_pending in order to start a transfer. Start the transfer in issue_pending regardless of the type of channel. This wrongly worked before, because in the past the transfer was started at tx_submit level when only a desc in the transfer list. Fixes: 53830cc75974 ("dmaengine: at_hdmac: add cyclic DMA operation support") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-4-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 326edc955400312b6f0c993a020ae85b56dd02fa Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:36 2022 +0300 dmaengine: at_hdmac: Don't start transactions at tx_submit level commit 7176a6a8982d311e50a7c1168868d26e65bbba19 upstream. tx_submit is supposed to push the current transaction descriptor to a pending queue, waiting for issue_pending() to be called. issue_pending() must start the transfer, not tx_submit(), thus remove atc_dostart() from atc_tx_submit(). Clients of at_xdmac that assume that tx_submit() starts the transfer must be updated and call dma_async_issue_pending() if they miss to call it. The vdbg print was moved to after the lock is released. It is desirable to do the prints without the lock held if possible, and because the if statement disappears there's no reason why to do the print while holding the lock. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Reported-by: Peter Rosin <peda@axentia.se> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.se/ Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-3-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit be287ef040ffcef9ab41f359d40b0a55d501e181 Author: Tudor Ambarus <tudor.ambarus@microchip.com> Date: Tue Oct 25 12:02:35 2022 +0300 dmaengine: at_hdmac: Fix at_lli struct definition commit f1171bbdd2ba2a50ee64bb198a78c268a5baf5f1 upstream. Those hardware registers are all of 32 bits, while dma_addr_t ca be of type u64 or u32 depending on CONFIG_ARCH_DMA_ADDR_T_64BIT. Force u32 to comply with what the hardware expects. Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Cc: stable@vger.kernel.org Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20221025090306.297886-2-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 386c49fe31ee748e053860b3bac7794a933ac9ac Author: Oliver Hartkopp <socketcan@hartkopp.net> Date: Wed Nov 2 10:54:31 2022 +0100 can: dev: fix skb drop check commit ae64438be1923e3c1102d90fd41db7afcfaf54cc upstream. In commit a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") the priv->ctrlmode element is read even on virtual CAN interfaces that do not create the struct can_priv at startup. This out-of-bounds read may lead to CAN frame drops for virtual CAN interfaces like vcan and vxcan. This patch mainly reverts the original commit and adds a new helper for CAN interface drivers that provide the required information in struct can_priv. Fixes: a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") Reported-by: Dariusz Stojaczyk <Dariusz.Stojaczyk@opensynergy.com> Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Cc: Max Staudt <max@enpas.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/all/20221102095431.36831-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org # 6.0.x [mkl: patch pch_can, too] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 13da0e8be1feefe95f0dccf48bc96aa46ae6832b Author: Paolo Bonzini <pbonzini@redhat.com> Date: Mon Nov 7 05:14:27 2022 -0500 KVM: SVM: move guest vmsave/vmload back to assembly commit e61ab42de874c5af8c5d98b327c77a374d9e7da1 upstream. It is error-prone that code after vmexit cannot access percpu data because GSBASE has not been restored yet. It forces MSR_IA32_SPEC_CTRL save/restore to happen very late, after the predictor untraining sequence, and it gets in the way of return stack depth tracking (a retbleed mitigation that is in linux-next as of 2022-11-09). As a first step towards fixing that, move the VMCB VMSAVE/VMLOAD to assembly, essentially undoing commit fb0c4a4fee5a ("KVM: SVM: move VMLOAD/VMSAVE to C code", 2021-03-15). The reason for that commit was that it made it simpler to use a different VMCB for VMLOAD/VMSAVE versus VMRUN; but that is not a big hassle anymore thanks to the kvm-asm-offsets machinery and other related cleanups. The idea on how to number the exception tables is stolen from a prototype patch by Peter Zijlstra. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Link: <https://lore.kernel.org/all/f571e404-e625-bae1-10e9-449b2eb4cbd8@citrix.com/> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 9b173dae28afec62bbc51967a82a1b5816b7ac9b Author: Paolo Bonzini <pbonzini@redhat.com> Date: Mon Nov 7 04:17:29 2022 -0500 KVM: SVM: retrieve VMCB from assembly commit f6d58266d731fd7e63163790aad21e0dbb1d5264 upstream. Continue moving accesses to struct vcpu_svm to vmenter.S. Reducing the number of arguments limits the chance of mistakes due to different registers used for argument passing in 32- and 64-bit ABIs; pushing the VMCB argument and almost immediately popping it into a different register looks pretty weird. 32-bit ABI is not a concern for __svm_sev_es_vcpu_run() which is 64-bit only; however, it will soon need @svm to save/restore SPEC_CTRL so stay consistent with __svm_vcpu_run() and let them share the same prototype. No functional change intended. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 439741265f4c5ee7115f052a8a649d9298b1a606 Author: Peter Gonda <pgonda@google.com> Date: Fri Nov 4 07:22:20 2022 -0700 KVM: SVM: Only dump VMSA to klog at KERN_DEBUG level commit 0bd8bd2f7a789fe1dcb21ad148199d2f62d79873 upstream. Explicitly print the VMSA dump at KERN_DEBUG log level, KERN_CONT uses KERNEL_DEFAULT if the previous log line has a newline, i.e. if there's nothing to continuing, and as a result the VMSA gets dumped when it shouldn't. The KERN_CONT documentation says it defaults back to KERNL_DEFAULT if the previous log line has a newline. So switch from KERN_CONT to print_hex_dump_debug(). Jarkko pointed this out in reference to the original patch. See: https://lore.kernel.org/all/YuPMeWX4uuR1Tz3M@kernel.org/ print_hex_dump(KERN_DEBUG, ...) was pointed out there, but print_hex_dump_debug() should similar. Fixes: 6fac42f127b8 ("KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog") Signed-off-by: Peter Gonda <pgonda@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Harald Hoyer <harald@profian.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Message-Id: <20221104142220.469452-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit e43b8a2f5df0ce516d1e2d4a82e6fe1e894d62ea Author: Paolo Bonzini <pbonzini@redhat.com> Date: Fri Oct 28 17:30:07 2022 -0400 KVM: SVM: adjust register allocation for __svm_vcpu_run() commit f7ef280132f9bf6f82acf5aa5c3c837206eef501 upstream. 32-bit ABI uses RAX/RCX/RDX as its argument registers, so they are in the way of instructions that hardcode their operands such as RDMSR/WRMSR or VMLOAD/VMRUN/VMSAVE. In preparation for moving vmload/vmsave to __svm_vcpu_run(), keep the pointer to the struct vcpu_svm in %rdi. In particular, it is now possible to load svm->vmcb01.pa in %rax without clobbering the struct vcpu_svm pointer. No functional change intended. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 7e7b93cb61308d6378202ee09962dc97a5aea8c6 Author: Paolo Bonzini <pbonzini@redhat.com> Date: Fri Sep 30 14:14:44 2022 -0400 KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm commit 16fdc1de169ee0a4e59a8c02244414ec7acd55c3 upstream. Since registers are reachable through vcpu_svm, and we will need to access more fields of that struct, pass it instead of the regs[] array. No functional change intended. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 17ed142bdc0f56a2c8b5c07814bd564fb1aff11c Author: Paolo Bonzini <pbonzini@redhat.com> Date: Tue Nov 8 09:44:53 2022 +0100 KVM: x86: use a separate asm-offsets.c file commit debc5a1ec0d195ffea70d11efeffb713de9fdbc7 upstream. This already removes an ugly #include "" from asm-offsets.c, but especially it avoids a future error when trying to define asm-offsets for KVM's svm/svm.h header. This would not work for kernel/asm-offsets.c, because svm/svm.h includes kvm_cache_regs.h which is not in the include path when compiling asm-offsets.c. The problem is not there if the .c file is in arch/x86/kvm. Suggested-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 6b4a61b6639c7b228363679dd6c646de466a7dd8 Author: Like Xu <likexu@tencent.com> Date: Mon Sep 19 17:10:06 2022 +0800 KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet commit 8631ef59b62290c7d88e7209e35dfb47f33f4902 upstream. The SDM lists an architectural MSR IA32_CORE_CAPABILITIES (0xCF) that limits the theoretical maximum value of the Intel GP PMC MSRs allocated at 0xC1 to 14; likewise the Intel April 2022 SDM adds IA32_OVERCLOCKING_STATUS at 0x195 which limits the number of event selection MSRs to 15 (0x186-0x194). Limiting the maximum number of counters to 14 or 18 based on the currently allocated MSRs is clearly fragile, and it seems likely that Intel will even place PMCs 8-15 at a completely different range of MSR indices. So stop at the maximum number of GP PMCs supported today on Intel processors. There are some machines, like Intel P4 with non Architectural PMU, that may indeed have 18 counters, but those counters are in a completely different MSR address range and are not supported by KVM. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Fixes: cf05a67b68b8 ("KVM: x86: omit "impossible" pmu MSRs from MSR list") Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ee2fa56f07826bfbc3accbde27c72d0f8e8cc424 Author: Sean Christopherson <seanjc@google.com> Date: Fri Nov 11 00:18:41 2022 +0000 KVM: x86/mmu: Block all page faults during kvm_zap_gfn_range() commit 6d3085e4d89ad7e6c7f1c6cf929d903393565861 upstream. When zapping a GFN range, pass 0 => ALL_ONES for the to-be-invalidated range to effectively block all page faults while the zap is in-progress. The invalidation helpers take a host virtual address, whereas zapping a GFN obviously provides a guest physical address and with the wrong unit of measurement (frame vs. byte). Alternatively, KVM could walk all memslots to get the associated HVAs, but thanks to SMM, that would require multiple lookups. And practically speaking, kvm_zap_gfn_range() usage is quite rare and not a hot path, e.g. MTRR and CR0.CD are almost guaranteed to be done only on vCPU0 during boot, and APICv inhibits are similarly infrequent operations. Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range") Reported-by: Chao Peng <chao.p.peng@linux.intel.com> Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221111001841.2412598-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit b064431630d09c52e7896a8f642fe819bdcc8cad Author: Geert Uytterhoeven <geert+renesas@glider.be> Date: Fri Oct 28 12:06:45 2022 +0200 can: rcar_canfd: Add missing ECC error checks for channels 2-7 commit 8b043dfb3dc7c32f9c2c0c93e3c2de346ee5e358 upstream. When introducing support for R-Car V3U, which has 8 instead of 2 channels, the ECC error bitmask was extended to take into account the extra channels, but rcar_canfd_global_error() was not updated to act upon the extra bits. Replace the RCANFD_GERFL_EEF[01] macros by a new macro that takes the channel number, fixing R-Car V3U while simplifying the code. Fixes: 45721c406dcf50d4 ("can: rcar_canfd: Add support for r8a779a0 SoC") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://lore.kernel.org/all/4edb2ea46cc64d0532a08a924179827481e14b4f.1666951503.git.geert+renesas@glider.be Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 3cb476cf834edca47f4470c276feb0f519401fb7 Author: Oliver Hartkopp <socketcan@hartkopp.net> Date: Fri Nov 4 15:25:51 2022 +0100 can: isotp: fix tx state handling for echo tx processing commit 866337865f3747c68a3e7bb837611e39cec1ecd6 upstream. In commit 4b7fe92c0690 ("can: isotp: add local echo tx processing for consecutive frames") the data flow for consecutive frames (CF) has been reworked to improve the reliability of long data transfers. This rework did not touch the transmission and the tx state changes of single frame (SF) transfers which likely led to the WARN in the isotp_tx_timer_handler() catching a wrong tx state. This patch makes use of the improved frame processing for SF frames and sets the ISOTP_SENDING state in isotp_sendmsg() within the cmpxchg() condition handling. A review of the state machine and the timer handling additionally revealed a missing echo timeout handling in the case of the burst mode in isotp_rcv_echo() and removes a potential timer configuration uncertainty in isotp_rcv_fc() when the receiver requests consecutive frames. Fixes: 4b7fe92c0690 ("can: isotp: add local echo tx processing for consecutive frames") Link: https://lore.kernel.org/linux-can/CAO4mrfe3dG7cMP1V5FLUkw7s+50c9vichigUMQwsxX4M=45QEw@mail.gmail.com/T/#u Reported-by: Wei Chen <harperchen1110@gmail.com> Cc: stable@vger.kernel.org # v6.0 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20221104142551.16924-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 2719f82ad5d8199cf5f346ea8bb3998ad5323b72 Author: Oliver Hartkopp <socketcan@hartkopp.net> Date: Fri Nov 4 08:50:00 2022 +0100 can: j1939: j1939_send_one(): fix missing CAN header initialization commit 3eb3d283e8579a22b81dd2ac3987b77465b2a22f upstream. The read access to struct canxl_frame::len inside of a j1939 created skbuff revealed a missing initialization of reserved and later filled elements in struct can_frame. This patch initializes the 8 byte CAN header with zero. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/linux-can/20221104052235.GA6474@pengutronix.de Reported-by: syzbot+d168ec0caca4697e03b1@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20221104075000.105414-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit c50396a8228139a0802cb33f708c4eb695ae3a7a Author: Peter Xu <peterx@redhat.com> Date: Wed Nov 2 14:41:52 2022 -0400 mm/shmem: use page_mapping() to detect page cache for uffd continue commit 93b0d9178743a68723babe8448981f658aebc58e upstream. mfill_atomic_install_pte() checks page->mapping to detect whether one page is used in the page cache. However as pointed out by Matthew, the page can logically be a tail page rather than always the head in the case of uffd minor mode with UFFDIO_CONTINUE. It means we could wrongly install one pte with shmem thp tail page assuming it's an anonymous page. It's not that clear even for anonymous page, since normally anonymous pages also have page->mapping being setup with the anon vma. It's safe here only because the only such caller to mfill_atomic_install_pte() is always passing in a newly allocated page (mcopy_atomic_pte()), whose page->mapping is not yet setup. However that's not extremely obvious either. For either of above, use page_mapping() instead. Link: https://lkml.kernel.org/r/Y2K+y7wnhC4vbnP2@x1n Fixes: 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 83f89c90b159449ef8d58638a99120f80c6ecb72 Author: Pankaj Gupta <pankaj.gupta@amd.com> Date: Wed Nov 2 11:07:28 2022 -0500 mm/memremap.c: map FS_DAX device memory as decrypted commit 867400af90f1f953ff9e10b1b87ecaf9369a7eb8 upstream. virtio_pmem use devm_memremap_pages() to map the device memory. By default this memory is mapped as encrypted with SEV. Guest reboot changes the current encryption key and guest no longer properly decrypts the FSDAX device meta data. Mark the corresponding device memory region for FSDAX devices (mapped with memremap_pages) as decrypted to retain the persistent memory property. Link: https://lkml.kernel.org/r/20221102160728.3184016-1-pankaj.gupta@amd.com Fixes: b7b3c01b19159 ("mm/memremap_pages: support multiple ranges per invocation") Signed-off-by: Pankaj Gupta <pankaj.gupta@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 599e798e3bda8b958cdb72451f835428080fe910 Author: SeongJae Park <sj@kernel.org> Date: Mon Nov 7 16:50:00 2022 +0000 mm/damon/dbgfs: check if rm_contexts input is for a real context commit 1de09a7281edecfdba19b3a07417f6d65243ab5f upstream. A user could write a name of a file under 'damon/' debugfs directory, which is not a user-created context, to 'rm_contexts' file. In the case, 'dbgfs_rm_context()' just assumes it's the valid DAMON context directory only if a file of the name exist. As a result, invalid memory access could happen as below. Fix the bug by checking if the given input is for a directory. This check can filter out non-context inputs because directories under 'damon/' debugfs directory can be created via only 'mk_contexts' file. This bug has found by syzbot[1]. [1] https://lore.kernel.org/damon/000000000000ede3ac05ec4abf8e@google.com/ Link: https://lkml.kernel.org/r/20221107165001.5717-2-sj@kernel.org Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: syzbot+6087eafb76a94c4ac9eb@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> [5.15.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 7eb5008a5c23dca97845b6784b5cc91413b3eedb Author: Fenghua Yu <fenghua.yu@intel.com> Date: Fri Oct 14 15:25:41 2022 -0700 dmaengine: idxd: Do not enable user type Work Queue without Shared Virtual Addressing commit 0ec8ce07394442d722806fe61b901a5b2b17249d upstream. When the idxd_user_drv driver is bound to a Work Queue (WQ) device without IOMMU or with IOMMU Passthrough without Shared Virtual Addressing (SVA), the application gains direct access to physical memory via the device by programming physical address to a submitted descriptor. This allows direct userspace read and write access to arbitrary physical memory. This is inconsistent with the security goals of a good kernel API. Unlike vfio_pci driver, the IDXD char device driver does not provide any ways to pin user pages and translate the address from user VA to IOVA or PA without IOMMU SVA. Therefore the application has no way to instruct the device to perform DMA function. This makes the char device not usable for normal application usage. Since user type WQ without SVA cannot be used for normal application usage and presents the security issue, bind idxd_user_drv driver and enable user type WQ only when SVA is enabled (i.e. user PASID is enabled). Fixes: 448c3de8ac83 ("dmaengine: idxd: create user driver for wq 'device'") Cc: stable@vger.kernel.org Suggested-by: Arjan Van De Ven <arjan.van.de.ven@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221014222541.3912195-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit d51c525d63ad6bbea9b7dc9eb5b6de7d8001bb7f Author: Vasily Gorbik <gor@linux.ibm.com> Date: Wed Nov 2 19:09:17 2022 +0100 mm: hugetlb_vmemmap: include missing linux/moduleparam.h commit db5e8d84319bcdb51e1d3cfa42b410291d6d1cfa upstream. The kernel test robot reported build failures with a 'randconfig' on s390: >> mm/hugetlb_vmemmap.c:421:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0); ^ Link: https://lore.kernel.org/linux-mm/202210300751.rG3UDsuc-lkp@intel.com/ Link: https://lkml.kernel.org/r/patch.git-296b83ca939b.your-ad-here.call-01667411912-ext-5073@work.hours Fixes: 30152245c63b ("mm: hugetlb_vmemmap: replace early_param() with core_param()") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit bccc10be65e365ba8a3215cb702e6f57177eea07 Author: Naoya Horiguchi <naoya.horiguchi@nec.com> Date: Mon Nov 7 11:10:10 2022 +0900 arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging commit 1fdbed657a4726639c4f17841fd2a0fb646c746e upstream. The following bug is reported to be triggered when starting X on x86-32 system with i915: [ 225.777375] kernel BUG at mm/memory.c:2664! [ 225.777391] invalid opcode: 0000 [#1] PREEMPT SMP [ 225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86 [ 225.777415] Hardware name: /8I865G775-G, BIOS F1 08/29/2006 [ 225.777421] EIP: __apply_to_page_range+0x24d/0x31c [ 225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76 [ 225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00 [ 225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80 [ 225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202 [ 225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0 [ 225.777479] Call Trace: [ 225.777486] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777684] apply_to_page_range+0x21/0x27 [ 225.777694] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777870] remap_io_mapping+0x49/0x75 [i915] [ 225.778046] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.778220] ? mutex_unlock+0xb/0xd [ 225.778231] ? i915_vma_pin_fence+0x6d/0xf7 [i915] [ 225.778420] vm_fault_gtt+0x2a9/0x8f1 [i915] [ 225.778644] ? lock_is_held_type+0x56/0xe7 [ 225.778655] ? lock_is_held_type+0x7a/0xe7 [ 225.778663] ? 0xc1000000 [ 225.778670] __do_fault+0x21/0x6a [ 225.778679] handle_mm_fault+0x708/0xb21 [ 225.778686] ? mt_find+0x21e/0x5ae [ 225.778696] exc_page_fault+0x185/0x705 [ 225.778704] ? doublefault_shim+0x127/0x127 [ 225.778715] handle_exception+0x130/0x130 [ 225.778723] EIP: 0xb700468a Recently pud_huge() got aware of non-present entry by commit 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") to handle some special states of gigantic page. However, it's overlooked that pud_none() always returns false when running with 2-level paging, and as a result pud_huge() can return true pointlessly. Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this. Link: https://lkml.kernel.org/r/20221107021010.2449306-1-naoya.horiguchi@linux.dev Fixes: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit eaa40230966f1f378d49da3b8f39e59b16f8518f Author: Mika Westerberg <mika.westerberg@linux.intel.com> Date: Tue Oct 25 09:28:00 2022 +0300 spi: intel: Use correct mask for flash and protected regions commit 92a66cbf6b30eda5719fbdfb24cd15fb341bba32 upstream. The flash and protected region mask is actually 0x7fff (30:16 and 14:0) and not 0x3fff so fix this accordingly. While there use GENMASK() instead. Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20221025062800.22357-1-mika.westerberg@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ac79001b8e603226fab17240a79cb9ef679d3cd9 Author: ZhangPeng <zhangpeng362@huawei.com> Date: Wed Nov 9 01:35:42 2022 +0000 udf: Fix a slab-out-of-bounds write bug in udf_find_entry() commit c8af247de385ce49afabc3bf1cf4fd455c94bfe8 upstream. Syzbot reported a slab-out-of-bounds Write bug: loop0: detected capacity change from 0 to 2048 ================================================================== BUG: KASAN: slab-out-of-bounds in udf_find_entry+0x8a5/0x14f0 fs/udf/namei.c:253 Write of size 105 at addr ffff8880123ff896 by task syz-executor323/3610 CPU: 0 PID: 3610 Comm: syz-executor323 Not tainted 6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report+0xcd/0x100 mm/kasan/report.c:495 kasan_check_range+0x2a7/0x2e0 mm/kasan/generic.c:189 memcpy+0x3c/0x60 mm/kasan/shadow.c:66 udf_find_entry+0x8a5/0x14f0 fs/udf/namei.c:253 udf_lookup+0xef/0x340 fs/udf/namei.c:309 lookup_open fs/namei.c:3391 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x10e6/0x2df0 fs/namei.c:3710 do_filp_open+0x264/0x4f0 fs/namei.c:3740 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_creat fs/open.c:1402 [inline] __se_sys_creat fs/open.c:1396 [inline] __x64_sys_creat+0x11f/0x160 fs/open.c:1396 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7ffab0d164d9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe1a7e6bb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffab0d164d9 RDX: 00007ffab0d164d9 RSI: 0000000000000000 RDI: 0000000020000180 RBP: 00007ffab0cd5a10 R08: 0000000000000000 R09: 0000000000000000 R10: 00005555573552c0 R11: 0000000000000246 R12: 00007ffab0cd5aa0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Allocated by task 3610: kasan_save_stack mm/kasan/common.c:45 [inline] kasan_set_track+0x3d/0x60 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] __kasan_kmalloc+0x97/0xb0 mm/kasan/common.c:380 kmalloc include/linux/slab.h:576 [inline] udf_find_entry+0x7b6/0x14f0 fs/udf/namei.c:243 udf_lookup+0xef/0x340 fs/udf/namei.c:309 lookup_open fs/namei.c:3391 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x10e6/0x2df0 fs/namei.c:3710 do_filp_open+0x264/0x4f0 fs/namei.c:3740 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_creat fs/open.c:1402 [inline] __se_sys_creat fs/open.c:1396 [inline] __x64_sys_creat+0x11f/0x160 fs/open.c:1396 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff8880123ff800 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 150 bytes inside of 256-byte region [ffff8880123ff800, ffff8880123ff900) The buggy address belongs to the physical page: page:ffffea000048ff80 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x123fe head:ffffea000048ff80 order:1 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 ffffea00004b8500 dead000000000003 ffff888012041b40 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x0(), pid 1, tgid 1 (swapper/0), ts 1841222404, free_ts 0 create_dummy_stack mm/page_owner.c:67 [inline] register_early_stack+0x77/0xd0 mm/page_owner.c:83 init_page_owner+0x3a/0x731 mm/page_owner.c:93 kernel_init_freeable+0x41c/0x5d5 init/main.c:1629 kernel_init+0x19/0x2b0 init/main.c:1519 page_owner free stack trace missing Memory state around the buggy address: ffff8880123ff780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880123ff800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880123ff880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 ^ ffff8880123ff900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880123ff980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fix this by changing the memory size allocated for copy_name from UDF_NAME_LEN(254) to UDF_NAME_LEN_CS0(255), because the total length (lfi) of subsequent memcpy can be up to 255. CC: stable@vger.kernel.org Reported-by: syzbot+69c9fdccc6dd08961d34@syzkaller.appspotmail.com Fixes: 066b9cded00b ("udf: Use separate buffer for copying split names") Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221109013542.442790-1-zhangpeng362@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit cef34a856d204bb3f3fc266293ca5d874610eae6 Author: Brian Norris <briannorris@chromium.org> Date: Wed Oct 26 12:42:06 2022 -0700 mms: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI commit fb1dec44c6750bb414f47b929c8c175a1a127c31 upstream. [[ NOTE: this is completely untested by the author, but included solely because, as noted in commit df57d73276b8 ("mmc: sdhci-pci: Fix SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers"), "other drivers using CQHCI might benefit from a similar change, if they also have CQHCI reset by SDHCI_RESET_ALL." We've now seen the same bug on at least MSM, Arasan, and Intel hardware. ]] SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't tracking that properly in software. When out of sync, we may trigger various timeouts. It's not typical to perform resets while CQE is enabled, but this may occur in some suspend or error recovery scenarios. Include this fix by way of the new sdhci_and_cqhci_reset() helper. This patch depends on (and should not compile without) the patch entitled "mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI". Fixes: bb6e358169bf ("mmc: sdhci-esdhc-imx: add CMDQ support") Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221026124150.v4.4.I7d01f9ad11bacdc9213dee61b7918982aea39115@changeid Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 442fd1bfe599bc54d118775e9e1a4fe913e4b369 Author: Roger Quadros <rogerq@kernel.org> Date: Wed Nov 2 12:31:44 2022 +0200 net: ethernet: ti: am65-cpsw: Fix segmentation fault at module unload commit 1a0c016a4831ea29be09bbc8162d4a2a0690b4b8 upstream. Move am65_cpsw_nuss_phylink_cleanup() call to after am65_cpsw_nuss_cleanup_ndev() so phylink is still valid to prevent the below Segmentation fault on module remove when first slave link is up. [ 31.652944] Unable to handle kernel paging request at virtual address 00040008000005f4 [ 31.684627] Mem abort info: [ 31.687446] ESR = 0x0000000096000004 [ 31.704614] EC = 0x25: DABT (current EL), IL = 32 bits [ 31.720663] SET = 0, FnV = 0 [ 31.723729] EA = 0, S1PTW = 0 [ 31.740617] FSC = 0x04: level 0 translation fault [ 31.756624] Data abort info: [ 31.759508] ISV = 0, ISS = 0x00000004 [ 31.776705] CM = 0, WnR = 0 [ 31.779695] [00040008000005f4] address between user and kernel address ranges [ 31.808644] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 31.814928] Modules linked in: wlcore_sdio wl18xx wlcore mac80211 libarc4 cfg80211 rfkill crct10dif_ce phy_gmii_sel ti_am65_cpsw_nuss(-) sch_fq_codel ipv6 [ 31.828776] CPU: 0 PID: 1026 Comm: modprobe Not tainted 6.1.0-rc2-00012-gfabfcf7dafdb-dirty #160 [ 31.837547] Hardware name: Texas Instruments AM625 (DT) [ 31.842760] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 31.849709] pc : phy_stop+0x18/0xf8 [ 31.853202] lr : phylink_stop+0x38/0xf8 [ 31.857031] sp : ffff80000a0839f0 [ 31.860335] x29: ffff80000a0839f0 x28: ffff000000de1c80 x27: 0000000000000000 [ 31.867462] x26: 0000000000000000 x25: 0000000000000000 x24: ffff80000a083b98 [ 31.874589] x23: 0000000000000800 x22: 0000000000000001 x21: ffff000001bfba90 [ 31.881715] x20: ffff0000015ee000 x19: 0004000800000200 x18: 0000000000000000 [ 31.888842] x17: ffff800076c45000 x16: ffff800008004000 x15: 000058e39660b106 [ 31.895969] x14: 0000000000000144 x13: 0000000000000144 x12: 0000000000000000 [ 31.903095] x11: 000000000000275f x10: 00000000000009e0 x9 : ffff80000a0837d0 [ 31.910222] x8 : ffff000000de26c0 x7 : ffff00007fbd6540 x6 : ffff00007fbd64c0 [ 31.917349] x5 : ffff00007fbd0b10 x4 : ffff00007fbd0b10 x3 : ffff00007fbd3920 [ 31.924476] x2 : d0a07fcff8b8d500 x1 : 0000000000000000 x0 : 0004000800000200 [ 31.931603] Call trace: [ 31.934042] phy_stop+0x18/0xf8 [ 31.937177] phylink_stop+0x38/0xf8 [ 31.940657] am65_cpsw_nuss_ndo_slave_stop+0x28/0x1e0 [ti_am65_cpsw_nuss] [ 31.947452] __dev_close_many+0xa4/0x140 [ 31.951371] dev_close_many+0x84/0x128 [ 31.955115] unregister_netdevice_many+0x130/0x6d0 [ 31.959897] unregister_netdevice_queue+0x94/0xd8 [ 31.964591] unregister_netdev+0x24/0x38 [ 31.968504] am65_cpsw_nuss_cleanup_ndev.isra.0+0x48/0x70 [ti_am65_cpsw_nuss] [ 31.975637] am65_cpsw_nuss_remove+0x58/0xf8 [ti_am65_cpsw_nuss] Cc: <Stable@vger.kernel.org> # v5.18+ Fixes: e8609e69470f ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK") Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit c559a8b5cfa3db196ced0257b288f17027621348 Author: Johan Hovold <johan+linaro@kernel.org> Date: Wed Oct 26 18:21:16 2022 +0200 phy: qcom-qmp-combo: fix NULL-deref on runtime resume commit 04948e757148f870a31f4887ea2239403f516c3c upstream. Commit fc64623637da ("phy: qcom-qmp-combo,usb: add support for separate PCS_USB region") started treating the PCS_USB registers as potentially separate from the PCS registers but used the wrong base when no PCS_USB offset has been provided. Fix the PCS_USB base used at runtime resume to prevent dereferencing a NULL pointer on platforms that do not provide a PCS_USB offset (e.g. SC7180). Fixes: fc64623637da ("phy: qcom-qmp-combo,usb: add support for separate PCS_USB region") Cc: stable@vger.kernel.org # 5.20 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Link: https://lore.kernel.org/r/20221026162116.26462-1-johan+linaro@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 93beea337fdee2ea0cebaf55d78a92cc0e3cb5f5 Author: Jens Axboe <axboe@kernel.dk> Date: Thu Nov 10 10:50:55 2022 -0700 io_uring: check for rollover of buffer ID when providing buffers commit 3851d25c75ed03117268a8feb34adca5a843a126 upstream. We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: https://github.com/axboe/liburing/issues/726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 544f38a738343d7e75f104e5e9d1ade58d8b71bd Author: Johannes Thumshirn <johannes.thumshirn@wdc.com> Date: Fri Nov 4 07:12:34 2022 -0700 btrfs: zoned: initialize device's zone info for seeding commit a8d1b1647bf8244a5f270538e9e636e2657fffa3 upstream. When performing seeding on a zoned filesystem it is necessary to initialize each zoned device's btrfs_zoned_device_info structure, otherwise mounting the filesystem will cause a NULL pointer dereference. This was uncovered by fstests' testcase btrfs/163. CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ad88cabcec942c033f980cd1e28d56ecdaf5f3b8 Author: Johannes Thumshirn <johannes.thumshirn@wdc.com> Date: Fri Nov 4 07:12:33 2022 -0700 btrfs: zoned: clone zoned device info when cloning a device commit 21e61ec6d0bb786818490e926aa9aeb4de95ad0d upstream. When cloning a btrfs_device, we're not cloning the associated btrfs_zoned_device_info structure of the device in case of a zoned filesystem. Later on this leads to a NULL pointer dereference when accessing the device's zone_info for instance when setting a zone as active. This was uncovered by fstests' testcase btrfs/161. CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 455af64a72aa2507e77f94e115967f644b5ed5c0 Author: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Date: Tue Nov 1 10:53:54 2022 +0800 btrfs: selftests: fix wrong error check in btrfs_free_dummy_root() commit 9b2f20344d450137d015b380ff0c2e2a6a170135 upstream. The btrfs_alloc_dummy_root() uses ERR_PTR as the error return value rather than NULL, if error happened, there will be a NULL pointer dereference: BUG: KASAN: null-ptr-deref in btrfs_free_dummy_root+0x21/0x50 [btrfs] Read of size 8 at addr 000000000000002c by task insmod/258926 CPU: 2 PID: 258926 Comm: insmod Tainted: G W 6.1.0-rc2+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> …

We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: axboe/liburing#726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit 3851d25 upstream. We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: axboe/liburing#726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

BugLink: https://bugs.launchpad.net/bugs/1996785 commit 3851d25 upstream. We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: axboe/liburing#726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com>

axboe closed this as completed Nov 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IORING_OP_PROVIDE_BUFFERS range validation issue #726

IORING_OP_PROVIDE_BUFFERS range validation issue #726

lano1106 commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022

axboe commented Nov 10, 2022

IORING_OP_PROVIDE_BUFFERS range validation issue #726

IORING_OP_PROVIDE_BUFFERS range validation issue #726

Comments

lano1106 commented Nov 10, 2022 • edited Loading

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022 • edited Loading

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022 • edited Loading

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 • edited Loading

axboe commented Nov 10, 2022

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022

axboe commented Nov 10, 2022

lano1106 commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

axboe commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading

lano1106 commented Nov 10, 2022 •

edited

Loading