incorrect representation of immediate in traced CLCs #2

Closed
brooksdavis opened this issue Feb 29, 2016 · 1 comment

@brooksdavis (Member) commented:

The immediate values on CLCs seem to be wrong in text instruction traces. For example, this trace shows an immediate of 481:

0x00000001200df674:  clc        c3,s8,481(c11)
    Cap Memory Read [0000007fffffc4c0] = v:1 tps:00000000ffff00fa
    c:0000007fffffc5c0 b:0000007fffffc5c0 l:0000000000000020
    Write C03|v:1 s:0 p:7fff807d b:0000007fffffc5c0 l:0000000000000020
             |o:0000000000000000 t:0

Objdump reports:

  1200df674:   d86bf080        clc     c3,s8,128(c11)

@staceyson (Collaborator) commented:

Fixed with dbb6810

arichardson pushed a commit to arichardson/qemu that referenced this issue Oct 20, 2017
"name" is freed after visiting options, instead use the first NetClientState
name. Adds a few assert() for clarifying and checking some impossible states.

READ of size 1 at 0x602000000990 thread T0
    #0 0x7f6b251c570c  (/lib64/libasan.so.2+0x4770c)
    #1 0x5566dc380600 in qemu_find_net_clients_except net/net.c:824
    #2 0x5566dc39bac7 in net_vhost_user_event net/vhost-user.c:193
    #3 0x5566dbee862a in qemu_chr_be_event /home/elmarco/src/qemu/qemu-char.c:201
    #4 0x5566dbef2890 in tcp_chr_disconnect /home/elmarco/src/qemu/qemu-char.c:2790
    #5 0x5566dbef2d0b in tcp_chr_sync_read /home/elmarco/src/qemu/qemu-char.c:2835
    #6 0x5566dbee8a99 in qemu_chr_fe_read_all /home/elmarco/src/qemu/qemu-char.c:295
    #7 0x5566dc39b964 in net_vhost_user_watch net/vhost-user.c:180
    #8 0x5566dc5a06c7 in qio_channel_fd_source_dispatch io/channel-watch.c:70
    #9 0x7f6b1aa2ab87 in g_main_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3154
    #10 0x7f6b1aa2b9cb in g_main_context_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3769
    #11 0x5566dc475ed4 in glib_pollfds_poll /home/elmarco/src/qemu/main-loop.c:212
    #12 0x5566dc476029 in os_host_main_loop_wait /home/elmarco/src/qemu/main-loop.c:257
    #13 0x5566dc476165 in main_loop_wait /home/elmarco/src/qemu/main-loop.c:505
    #14 0x5566dbf08d31 in main_loop /home/elmarco/src/qemu/vl.c:1932
    #15 0x5566dbf16783 in main /home/elmarco/src/qemu/vl.c:4646
    #16 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f)
    #17 0x5566dbbf5348 in _start (/home/elmarco/src/qemu/x86_64-softmmu/qemu-system-x86_64+0x3f9348)

0x602000000990 is located 0 bytes inside of 5-byte region [0x602000000990,0x602000000995)
freed by thread T0 here:
    #0 0x7f6b2521666a in __interceptor_free (/lib64/libasan.so.2+0x9866a)
    #1 0x7f6b1aa332a4 in g_free /home/elmarco/src/gnome/glib/glib/gmem.c:189
    #2 0x5566dc5f416f in qapi_dealloc_type_str qapi/qapi-dealloc-visitor.c:134
    #3 0x5566dc5f3268 in visit_type_str qapi/qapi-visit-core.c:196
    #4 0x5566dc5ced58 in visit_type_Netdev_fields /home/elmarco/src/qemu/qapi-visit.c:5936
    #5 0x5566dc5cef71 in visit_type_Netdev /home/elmarco/src/qemu/qapi-visit.c:5960
    #6 0x5566dc381a8d in net_visit net/net.c:1049
    #7 0x5566dc381c37 in net_client_init net/net.c:1076
    #8 0x5566dc3839e2 in net_init_netdev net/net.c:1473
    #9 0x5566dc63cc0a in qemu_opts_foreach util/qemu-option.c:1112
    #10 0x5566dc383b36 in net_init_clients net/net.c:1499
    #11 0x5566dbf15d86 in main /home/elmarco/src/qemu/vl.c:4397
    #12 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f)
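A very rough sketch of the direction "use the first NetClientState name" points in; the lookup helper and variable names here are purely illustrative, not taken from the actual patch:

```c
/* Sketch only: take the name from a NetClientState that outlives the options,
 * instead of the QAPI-owned "name" string freed by the dealloc visitor above. */
NetClientState *nc = qemu_find_netdev(netdev_id);   /* hypothetical lookup by id */
const char *name = nc->name;                        /* owned by the NetClientState */
```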

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Nov 22, 2017
"nc" is freed after hotplug vhost-user, but the watcher is not removed.
The QEMU crash when the watcher access the "nc" when socket disconnects.

    Program received signal SIGSEGV, Segmentation fault.
    #0  object_get_class (obj=obj@entry=0x2) at qom/object.c:750
    #1  0x00007f9bb4180da1 in qemu_chr_fe_disconnect (be=<optimized out>) at chardev/char-fe.c:372
    #2  0x00007f9bb40d1100 in net_vhost_user_watch (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at net/vhost-user.c:188
    #3  0x00007f9baf97f99a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #4  0x00007f9bb41d7ebc in glib_pollfds_poll () at util/main-loop.c:213
    #5  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
    #6  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
    #7  0x00007f9bb3e266a7 in main_loop () at vl.c:1917
    #8  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4786
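A minimal sketch of the teardown this implies, assuming net/vhost-user.c-style names (NetVhostUserState and a `watch` field holding the GSource id) and that the fix is to drop the chardev watcher before the NetClientState goes away:

```c
/* Sketch only: field and function names are assumed, not taken from the actual patch. */
static void net_vhost_user_cleanup(NetClientState *nc)
{
    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);

    if (s->watch) {
        g_source_remove(s->watch);  /* stop the GSource before "nc" is freed */
        s->watch = 0;
    }
    /* ... existing vhost/chardev teardown ... */
}
```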

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jun 18, 2018
The instruction "ucvtf v0.4h, v04h, #2", with input 0x8000u,
overflows the intermediate float16 to infinity before we have a
chance to scale the output.  Use float64 as the intermediate type
so that no input argument (uint32_t in this case) can overflow
or round before scaling.  Given the declared argument, the signed
int32_t function has the same problem.

When converting from float16 to integer, using u/int32_t instead
of u/int16_t means that the bounding is incorrect.

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180502221552.3873-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
arichardson pushed a commit that referenced this issue Jun 18, 2018
After f771c54 it is possible to select the device and
head to take a screendump from. Even though we check that the
provided head number falls within range, the console may still
have no surface yet, leading to SIGSEGV:

  qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 \
    -qmp stdio \
    -device virtio-vga,id=video0,max_outputs=4

  {"execute":"qmp_capabilities"}
  {"execute":"screendump", "arguments":{"filename":"/tmp/screen.ppm", "device":"video0", "head":1}}
  Segmentation fault

 #0  0x00005628249dda88 in ppm_save (filename=0x56282826cbc0 "/tmp/screen.ppm", ds=0x0, errp=0x7fff52a6fae0) at ui/console.c:304
 #1  0x00005628249ddd9b in qmp_screendump (filename=0x56282826cbc0 "/tmp/screen.ppm", has_device=true, device=0x5628276902d0 "video0", has_head=true, head=1, errp=0x7fff52a6fae0) at ui/console.c:375
 #2  0x00005628247740df in qmp_marshal_screendump (args=0x562828265e00, ret=0x7fff52a6fb68, errp=0x7fff52a6fb60) at qapi/qapi-commands-ui.c:110

Here, @ds from frame #0 (or @surface from frame #1) is
dereferenced at the very beginning of ppm_save(), and because
it is NULL the crash happens.
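A minimal sketch of the obvious guard, assuming QEMU's qemu_console_surface() helper and that the check belongs in qmp_screendump() (using the QemuConsole `con` looked up earlier) before ppm_save() is reached:

```c
/* Sketch only; not the exact upstream patch. */
DisplaySurface *surface = qemu_console_surface(con);
if (!surface) {
    error_setg(errp, "no surface for the requested console/head");
    return;
}
ppm_save(filename, surface, errp);
```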

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: cb05bb1909daa6ba62145c0194aafa05a14ed3d1.1526569138.git.mprivozn@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
arichardson pushed a commit that referenced this issue Jun 18, 2018
…nnect

When cancelling migration during RDMA precopy, the source QEMU main thread sometimes hangs.

The backtrace is:
    (gdb) bt
    #0  0x00007f249eabd43d in write () from /lib64/libpthread.so.0
    #1  0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
    #2  0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296
    #3  0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
    #4  0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273
    #5  0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98
    #6  0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334
    #7  0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162
    #8  0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
    #9  0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
    #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436
    #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0)
        at util/async.c:261
    #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215
    #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263
    #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522
    #16 0x00000000005bc6a5 in main_loop () at vl.c:1944
    #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752

Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect.

According to the IB spec, once the active side sends a DREQ message it should wait for a
DREP message, and only once that has arrived should it trigger a DISCONNECT event. The
DREP message can be dropped due to network issues.
For that case the spec defines a DREP_timeout state in the CM state machine: if the DREP is
dropped we should get a timeout, and a TIMEWAIT_EXIT event will be triggered.
Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state,
so in the above scenario we will get neither a DISCONNECT nor a TIMEWAIT_EXIT event.

So it should not invoke rdma_get_cm_event, which may hang forever; the event channel
is also destroyed in qemu_rdma_cleanup.
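A rough sketch of that idea in qemu_rdma_cleanup(), with field names assumed from migration/rdma.c; the point is simply that the disconnect path no longer blocks on the CM event:

```c
/* Sketch only: not the literal patch. */
if (rdma->cm_id && rdma->connected) {
    rdma_disconnect(rdma->cm_id);
    /* Do NOT wait via rdma_get_cm_event() here: with the missing DREP_timeout
     * handling described above, the DISCONNECTED event may never arrive and
     * the source main thread would hang forever. */
    rdma->connected = false;
}
```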

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson added a commit that referenced this issue Oct 1, 2018
…er!)

The baseline is e38b03f.

Benchmark #1: /auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128 --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub
  Time (mean ± σ):     29.330 s ±  0.627 s    [User: 26.381 s, System: 0.415 s]
  Range (min … max):   28.588 s … 30.587 s

Benchmark #2: /auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128.before.log --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub
  Time (mean ± σ):     35.656 s ±  0.603 s    [User: 32.812 s, System: 0.370 s]
  Range (min … max):   34.855 s … 36.412 s

Summary
  '/auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128 --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub' ran
    1.22 ± 0.03 times faster than '/auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128.before.log --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub'
arichardson pushed a commit that referenced this issue Jan 29, 2020
ivq/dvq/svq/free_page_vq are not cleaned up in
virtio_balloon_device_unrealize; the memory leak stack is as follows:

Direct leak of 14336 byte(s) in 2 object(s) allocated from:
    #0 0x7f99fd9d8560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
    #1 0x7f99fcb20015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
    #2 0x557d90638437 in virtio_add_queue hw/virtio/virtio.c:2327
    #3 0x557d9064401d in virtio_balloon_device_realize hw/virtio/virtio-balloon.c:793
    #4 0x557d906356f7 in virtio_device_realize hw/virtio/virtio.c:3504
    #5 0x557d9073f081 in device_set_realized hw/core/qdev.c:876
    #6 0x557d908b1f4d in property_set_bool qom/object.c:2080
    #7 0x557d908b655e in object_property_set_qobject qom/qom-qobject.c:26
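A minimal sketch of the kind of unrealize-side cleanup this points to, assuming the virtio_delete_queue() helper and the virtio-balloon field names implied by the trace:

```c
/* Sketch only; the actual patch may differ. */
static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
{
    VirtIOBalloon *s = VIRTIO_BALLOON(dev);

    /* ... existing teardown ... */
    virtio_delete_queue(s->ivq);
    virtio_delete_queue(s->dvq);
    virtio_delete_queue(s->svq);
    if (s->free_page_vq) {
        virtio_delete_queue(s->free_page_vq);
    }
    virtio_cleanup(VIRTIO_DEVICE(dev));
}
```

The same pattern applies to the virtio-serial, virtio-9p, virtio-scsi and vhost-vsock leaks reported further down in this thread.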

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <1575444716-17632-2-git-send-email-pannengyuan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
ivqs/ovqs/c_ivq/c_ovq are not cleaned up in
virtio_serial_device_unrealize; the memory leak stack is as below:

Direct leak of 1290240 byte(s) in 180 object(s) allocated from:
    #0 0x7fc9bfc27560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
    #1 0x7fc9bed6f015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
    #2 0x5650e02b83e7 in virtio_add_queue hw/virtio/virtio.c:2327
    #3 0x5650e02847b5 in virtio_serial_device_realize hw/char/virtio-serial-bus.c:1089
    #4 0x5650e02b56a7 in virtio_device_realize hw/virtio/virtio.c:3504
    #5 0x5650e03bf031 in device_set_realized hw/core/qdev.c:876
    #6 0x5650e0531efd in property_set_bool qom/object.c:2080
    #7 0x5650e053650e in object_property_set_qobject qom/qom-qobject.c:26
    #8 0x5650e0533e14 in object_property_set_bool qom/object.c:1338
    #9 0x5650e04c0e37 in virtio_pci_realize hw/virtio/virtio-pci.c:1801

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Amit Shah <amit@kernel.org>
Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1575444716-17632-3-git-send-email-pannengyuan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Currently the SLOF firmware for pseries guests will disable/re-enable
a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it tends to
work with a single device at a time at various stages like probing
and running block/network bootloaders without doing a full reset
in-between.

In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
corresponding IOMMU memory region, so DMA accesses (including to vring
fields like idx/flags) will no longer undergo the necessary
translation. Normally we wouldn't expect this to happen since it would
be misbehavior on the driver side to continue driving DMA requests.

However, in the case of pseries, with iommu_platform=on, we trigger the
following sequence when tearing down the virtio-blk dataplane ioeventfd
in response to the guest unsetting PCI_COMMAND_MASTER:

  #2  0x0000555555922651 in virtqueue_map_desc (vdev=vdev@entry=0x555556dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
  #3  0x0000555555922a89 in virtqueue_pop (vq=vq@entry=0x555556dc8660, sz=sz@entry=184)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
  #4  0x00005555558d3eca in virtio_blk_get_request (vq=0x555556dc8660, s=0x555556dbcfb0)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
  #5  0x00005555558d3eca in virtio_blk_handle_vq (s=0x555556dbcfb0, vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
  #6  0x000055555591dd66 in virtio_queue_notify_aio_vq (vq=vq@entry=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
  #7  0x000055555591ecef in virtio_queue_notify_aio_vq (vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
  #8  0x000055555591ecef in virtio_queue_host_notifier_aio_poll (opaque=0x555556dc86c8)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
  #9  0x0000555555d02164 in run_poll_handlers_once (ctx=ctx@entry=0x55555688bfc0, timeout=timeout@entry=0x7fffe65844a8)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:520
  #10 0x0000555555d02d1b in try_poll_mode (timeout=0x7fffe65844a8, ctx=0x55555688bfc0)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:607
  #11 0x0000555555d02d1b in aio_poll (ctx=ctx@entry=0x55555688bfc0, blocking=blocking@entry=true)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:639
  #12 0x0000555555d0004d in aio_wait_bh_oneshot (ctx=0x55555688bfc0, cb=cb@entry=0x5555558d5130 <virtio_blk_data_plane_stop_bh>, opaque=opaque@entry=0x555556de86f0)
      at /home/mdroth/w/qemu.git/util/aio-wait.c:71
  #13 0x00005555558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
  #14 0x0000555555b906a1 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
  #15 0x0000555555b90dbb in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
  #16 0x0000555555b92a8e in virtio_pci_stop_ioeventfd (proxy=0x555556db4e40)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
  #17 0x0000555555b92a8e in virtio_write_config (pci_dev=0x555556db4e40, address=<optimized out>, val=1048832, len=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613

I.e. the calling code is only scheduling a one-shot BH for
virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
an additional virtqueue entry before we get there. This is likely due
to the following check in virtio_queue_host_notifier_aio_poll:

  static bool virtio_queue_host_notifier_aio_poll(void *opaque)
  {
      EventNotifier *n = opaque;
      VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
      bool progress;

      if (!vq->vring.desc || virtio_queue_empty(vq)) {
          return false;
      }

      progress = virtio_queue_notify_aio_vq(vq);

namely the call to virtio_queue_empty(). In this case, since no new
requests have actually been issued, shadow_avail_idx == last_avail_idx,
so we actually try to access the vring via vring_avail_idx() to get
the latest non-shadowed idx:

  int virtio_queue_empty(VirtQueue *vq)
  {
      bool empty;
      ...

      if (vq->shadow_avail_idx != vq->last_avail_idx) {
          return 0;
      }

      rcu_read_lock();
      empty = vring_avail_idx(vq) == vq->last_avail_idx;
      rcu_read_unlock();
      return empty;

but since the IOMMU region has been disabled we get a bogus value (0
usually), which causes virtio_queue_empty() to falsely report that
there are entries to be processed, which causes errors such as:

  "virtio: zero sized buffers are not allowed"

or

  "virtio-blk missing headers"

and puts the device in an error state.

This patch works around the issue by introducing virtio_set_disabled(),
which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
when bus-mastering is disabled. Since we'd check this flag at all the
same sites as vdev->broken, we replace those checks with an inline
function which checks for either vdev->broken or vdev->disabled.
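A sketch of what such an inline check could look like; the helper and field names are assumptions, not quoted from the patch:

```c
/* Sketch only: combines the existing vdev->broken check with the new flag. */
static inline bool virtio_device_disabled(VirtIODevice *vdev)
{
    return unlikely(vdev->disabled || vdev->broken);
}
```

Call sites that previously tested only `vdev->broken` would use this helper instead, so paths like virtio_queue_empty() bail out while bus-mastering is disabled.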

The 'disabled' flag is only migrated when set, which should be fairly
rare, but to maintain migration compatibility we disable its use for
older machine types. Users requiring the use of the flag in conjunction
with older machine types can set it explicitly as a virtio-device
option.

NOTES:

 - This leaves some other oddities in play, like the fact that
   DRIVER_OK also gets unset in response to bus-mastering being
   disabled, but not restored (however the device seems to continue
   working)
 - Similarly, we disable the host notifier via
   virtio_bus_stop_ioeventfd(), which seems to move the handling out
   of virtio-blk dataplane and back into the main IO thread, and it
   ends up staying there till a reset (but otherwise continues working
   normally)

Cc: David Gibson <david@gibson.dropbear.id.au>,
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20191120005003.27035-1-mdroth@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
This avoids a memory leak when qom-set is called to set throttle_group
limits. Here is an easy way to reproduce it:

1. run qemu-iotests as follow and check the result with asan:
       ./check -qcow2 184

Following is the ASAN output backtrace:
Direct leak of 912 byte(s) in 3 object(s) allocated from:
    #0 0xffff8d7ab3c3 in __interceptor_calloc   (/lib64/libasan.so.4+0xd33c3)
    #1 0xffff8d4c31cb in g_malloc0   (/lib64/libglib-2.0.so.0+0x571cb)
    #2 0x190c857 in qobject_input_start_struct  /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
    #3 0x19070df in visit_start_struct   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
    #4 0x1948b87 in visit_type_ThrottleLimits   qapi/qapi-visit-block-core.c:3759
    #5 0x17e4aa3 in throttle_group_set_limits   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/block/throttle-groups.c:900
    #6 0x1650eff in object_property_set     /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/object.c:1272
    #7 0x1658517 in object_property_set_qobject   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qobject.c:26
    #8 0x15880bb in qmp_qom_set  /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qmp-cmds.c:74
    #9 0x157e3e3 in qmp_marshal_qom_set  qapi/qapi-commands-qom.c:154

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: PanNengyuan <pannengyuan@huawei.com>
Message-id: 1574835614-42028-1-git-send-email-pannengyuan@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
The accel_list is not freed; the ASAN output:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff919331cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
    #1 0xffff913f7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
    #2 0xffff91413d9b in g_strsplit (/lib64/libglib-2.0.so.0+0x73d9b)
    #3 0xaaab42fb58e7 in configure_accelerators /qemu/vl.c:2777
    #4 0xaaab42fb58e7 in main /qemu/vl.c:4121
    #5 0xffff8f9b0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
    #6 0xaaab42fc1dab  (/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x8b1dab)

Indirect leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0xffff919331cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
    #1 0xffff913f7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
    #2 0xffff9141243b in g_strdup (/lib64/libglib-2.0.so.0+0x7243b)
    #3 0xffff91413e6f in g_strsplit (/lib64/libglib-2.0.so.0+0x73e6f)
    #4 0xaaab42fb58e7 in configure_accelerators /qemu/vl.c:2777
    #5 0xaaab42fb58e7 in main /qemu/vl.c:4121
    #6 0xffff8f9b0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
    #7 0xaaab42fc1dab  (/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x8b1dab)
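A minimal sketch of the fix this suggests, using the GLib calls already visible in the trace (the surrounding configure_accelerators() code and the `accel_names` variable are assumptions):

```c
/* Sketch only. */
gchar **accel_list = g_strsplit(accel_names, ":", 0);

for (gchar **tmp = accel_list; tmp && *tmp; tmp++) {
    /* ... try to initialize each accelerator ... */
}
g_strfreev(accel_list);  /* frees the array and every string g_strsplit() duplicated */
```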

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200108114207.58084-1-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Start a VM with libvirt; when the guest OS is running, enter the poweroff
command using the xhci keyboard. ASAN then shows the memory leak stack:

Direct leak of 80 byte(s) in 5 object(s) allocated from:
    #0 0xfffd1e6431cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
    #1 0xfffd1e107163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
    #2 0xaaad39051367 in qemu_sglist_init /qemu/dma-helpers.c:43
    #3 0xaaad3947c407 in pci_dma_sglist_init /qemu/include/hw/pci/pci.h:842
    #4 0xaaad3947c407 in xhci_xfer_create_sgl /qemu/hw/usb/hcd-xhci.c:1446
    #5 0xaaad3947c407 in xhci_setup_packet /qemu/hw/usb/hcd-xhci.c:1618
    #6 0xaaad3948625f in xhci_submit /qemu/hw/usb/hcd-xhci.c:1827
    #7 0xaaad3948625f in xhci_fire_transfer /qemu/hw/usb/hcd-xhci.c:1839
    #8 0xaaad3948625f in xhci_kick_epctx /qemu/hw/usb/hcd-xhci.c:1991
    #9 0xaaad3948f537 in xhci_doorbell_write /qemu/hw/usb/hcd-xhci.c:3158
    #10 0xaaad38bcbfc7 in memory_region_write_accessor /qemu/memory.c:483
    #11 0xaaad38bc654f in access_with_adjusted_size /qemu/memory.c:544
    #12 0xaaad38bd1877 in memory_region_dispatch_write /qemu/memory.c:1482
    #13 0xaaad38b1c77f in flatview_write_continue /qemu/exec.c:3167
    #14 0xaaad38b1ca83 in flatview_write /qemu/exec.c:3207
    #15 0xaaad38b268db in address_space_write /qemu/exec.c:3297
    #16 0xaaad38bf909b in kvm_cpu_exec /qemu/accel/kvm/kvm-all.c:2383
    #17 0xaaad38bb063f in qemu_kvm_cpu_thread_fn /qemu/cpus.c:1246
    #18 0xaaad39821c93 in qemu_thread_start /qemu/util/qemu-thread-posix.c:519
    #19 0xfffd1c8378bb  (/lib64/libpthread.so.0+0x78bb)
    #20 0xfffd1c77616b  (/lib64/libc.so.6+0xd616b)
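A rough sketch of the matching cleanup, assuming the QEMUSGList built in xhci_xfer_create_sgl() lives in the transfer object and that qemu_sglist_destroy() is the counterpart to the qemu_sglist_init() shown in the trace:

```c
/* Sketch only: the helper name and fields are assumed. */
static void xhci_xfer_unmap(XHCITransfer *xfer)
{
    usb_packet_unmap(&xfer->packet, &xfer->sgl);
    qemu_sglist_destroy(&xfer->sgl);  /* releases what qemu_sglist_init() allocated */
}
```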

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Message-id: 20200110105855.81144-1-kuhn.chenqun@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
One multifd channel will shut down all the other multifd channels' IOChannels when it
fails to receive an IOChannel. In this scenario, if some multifd channel had not
received its IOChannel yet, it would try to shut down a still-NULL IOChannel, which
could cause a null-pointer access in qio_channel_shutdown.

Here is the coredump stack:
    #0  object_get_class (obj=obj@entry=0x0) at qom/object.c:908
    #1  0x00005563fdbb8f4a in qio_channel_shutdown (ioc=0x0, how=QIO_CHANNEL_SHUTDOWN_BOTH, errp=0x0) at io/channel.c:355
    #2  0x00005563fd7b4c5f in multifd_recv_terminate_threads (err=<optimized out>) at migration/ram.c:1280
    #3  0x00005563fd7bc019 in multifd_recv_new_channel (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce00) at migration/ram.c:1478
    #4  0x00005563fda82177 in migration_ioc_process_incoming (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce30) at migration/migration.c:605
    #5  0x00005563fda8567d in migration_channel_process_incoming (ioc=0x556400255610) at migration/channel.c:44
    #6  0x00005563fda83ee0 in socket_accept_incoming_migration (listener=0x5563fff6b920, cioc=0x556400255610, opaque=<optimized out>) at migration/socket.c:166
    #7  0x00005563fdbc25cd in qio_net_listener_channel_func (ioc=<optimized out>, condition=<optimized out>, opaque=<optimized out>) at io/net-listener.c:54
    #8  0x00007f895b6fe9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #9  0x00005563fdc18136 in glib_pollfds_poll () at util/main-loop.c:218
    #10 0x00005563fdc181b5 in os_host_main_loop_wait (timeout=1000000000) at util/main-loop.c:241
    #11 0x00005563fdc183a2 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517
    #12 0x00005563fd8edb37 in main_loop () at vl.c:1791
    #13 0x00005563fd74fd45 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4473

To fix it up, let's check p->c before calling qio_channel_shutdown.
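A minimal sketch of that guard in the terminate-threads loop (structure and field names assumed from migration/ram.c):

```c
/* Sketch only. */
for (i = 0; i < migrate_multifd_channels(); i++) {
    MultiFDRecvParams *p = &multifd_recv_state->params[i];

    qemu_mutex_lock(&p->mutex);
    p->quit = true;
    if (p->c) {  /* this channel may never have received its IOChannel */
        qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
    }
    qemu_mutex_unlock(&p->mutex);
}
```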

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
…threads

One multifd thread will lock all the other multifds' IOChannel mutexes to inform them
to quit, by setting p->quit or shutting down p->c. In this scenario, if some
multifds had already been terminated and multifd_load_cleanup/multifd_save_cleanup
had destroyed their mutexes, it could cause destroyed-mutex access when trying
to lock them.

Here is the coredump stack:
    #0  0x00007f81a2794437 in raise () from /usr/lib64/libc.so.6
    #1  0x00007f81a2795b28 in abort () from /usr/lib64/libc.so.6
    #2  0x00007f81a278d1b6 in __assert_fail_base () from /usr/lib64/libc.so.6
    #3  0x00007f81a278d262 in __assert_fail () from /usr/lib64/libc.so.6
    #4  0x000055eb1bfadbd3 in qemu_mutex_lock_impl (mutex=0x55eb1e2d1988, file=<optimized out>, line=<optimized out>) at util/qemu-thread-posix.c:64
    #5  0x000055eb1bb4564a in multifd_send_terminate_threads (err=<optimized out>) at migration/ram.c:1015
    #6  0x000055eb1bb4bb7f in multifd_send_thread (opaque=0x55eb1e2d19f8) at migration/ram.c:1171
    #7  0x000055eb1bfad628 in qemu_thread_start (args=0x55eb1e170450) at util/qemu-thread-posix.c:502
    #8  0x00007f81a2b36df5 in start_thread () from /usr/lib64/libpthread.so.0
    #9  0x00007f81a286048d in clone () from /usr/lib64/libc.so.6

To fix it up, let's destroy the mutex only after all the other multifd threads have
been terminated.
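A sketch of the resulting cleanup ordering (per-channel field names assumed from migration/ram.c):

```c
/* Sketch only: join first, destroy afterwards. */
for (i = 0; i < migrate_multifd_channels(); i++) {
    MultiFDSendParams *p = &multifd_send_state->params[i];

    if (p->running) {
        qemu_thread_join(&p->thread);   /* no thread can touch the mutex after this */
    }
    qemu_mutex_destroy(&p->mutex);
    qemu_sem_destroy(&p->sem);
}
```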

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
v->vq is not cleaned up in virtio_9p_device_unrealize; the memory leak
stack is as follows:

Direct leak of 14336 byte(s) in 2 object(s) allocated from:
  #0 0x7f819ae43970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f819872f49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x55a3a58da624 (./x86_64-softmmu/qemu-system-x86_64+0x2c14624)  /mnt/sdb/qemu/hw/virtio/virtio.c:2327
  #3 0x55a3a571bac7 (./x86_64-softmmu/qemu-system-x86_64+0x2a55ac7)  /mnt/sdb/qemu/hw/9pfs/virtio-9p-device.c:209
  #4 0x55a3a58e7bc6 (./x86_64-softmmu/qemu-system-x86_64+0x2c21bc6)  /mnt/sdb/qemu/hw/virtio/virtio.c:3504
  #5 0x55a3a5ebfb37 (./x86_64-softmmu/qemu-system-x86_64+0x31f9b37)  /mnt/sdb/qemu/hw/core/qdev.c:876

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117060927.51996-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Acked-by: Greg Kurz <groug@kaod.org>
arichardson pushed a commit that referenced this issue Jan 29, 2020
This patch fixes memleaks when attaching/detaching a virtio-scsi device; the
memory leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912
  #4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924
  #5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224)  /mnt/sdb/qemu/hw/core/qdev.c:865

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117075547.60864-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Receive/transmit/event vqs are not cleaned up in vhost_vsock_unrealize. This
patch saves the receive/transmit vq pointers in realize() and cleans up the
vqs through those pointers in unrealize(). The leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f86a1356970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f86a09aa49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x5604852f85ca (./x86_64-softmmu/qemu-system-x86_64+0x2c3e5ca)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x560485356208 (./x86_64-softmmu/qemu-system-x86_64+0x2c9c208)  /mnt/sdb/qemu/hw/virtio/vhost-vsock.c:339
  #4 0x560485305a17 (./x86_64-softmmu/qemu-system-x86_64+0x2c4ba17)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #5 0x5604858e6b65 (./x86_64-softmmu/qemu-system-x86_64+0x322cb65)  /mnt/sdb/qemu/hw/core/qdev.c:865
  #6 0x5604861e6c41 (./x86_64-softmmu/qemu-system-x86_64+0x3b2cc41)  /mnt/sdb/qemu/qom/object.c:2102

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200115062535.50644-1-pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
When adding new devices implementing QOM interfaces, we might
forget to add the Kconfig dependency that pulls the required
objects in when building.

Since QOM dependencies are resolved at runtime, we don't get any
link-time failures, and QEMU aborts while starting:

  $ qemu ...
  Segmentation fault (core dumped)

  (gdb) bt
  #0  0x00007ff6e96b1e35 in raise () from /lib64/libc.so.6
  #1  0x00007ff6e969c895 in abort () from /lib64/libc.so.6
  #2  0x00005572bc5051cf in type_initialize (ti=0x5572be6f1200) at qom/object.c:323
  #3  0x00005572bc505074 in type_initialize (ti=0x5572be6f1800) at qom/object.c:301
  #4  0x00005572bc505074 in type_initialize (ti=0x5572be6e48e0) at qom/object.c:301
  #5  0x00005572bc506939 in object_class_by_name (typename=0x5572bc56109a) at qom/object.c:959
  #6  0x00005572bc503dd5 in cpu_class_by_name (typename=0x5572bc56109a, cpu_model=0x5572be6d9930) at hw/core/cpu.c:286

Since the caller has access to the qdev parent/interface names,
we can simply display them to avoid starting a debugger:

  $ qemu ...
  qemu: missing interface 'fancy-if' for object 'fancy-dev'
  Aborted (core dumped)

This commit is similar to e02bdf1 ("Display more helpful message
when an object type is missing").
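A sketch of the friendlier failure mode described above; the exact location and variable names inside qom/object.c's type_initialize() are assumptions:

```c
/* Sketch only. */
TypeImpl *iface = type_get_by_name(iface_name);
if (!iface) {
    error_report("missing interface '%s' for object '%s'",
                 iface_name, parent_name);
    abort();
}
```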

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200118162348.17823-1-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
This patch fixes memleaks when attaching/detaching a virtio-scsi device; the
memory leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912
  #4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924
  #5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224)  /mnt/sdb/qemu/hw/core/qdev.c:865

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200117075547.60864-2-pannengyuan@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
All paths that lead to bdrv_backup_top_drop(), except for the call
from backup_clean(), imply that the BDS AioContext has already been
acquired, so doing it there too can potentially lead to QEMU hanging
on AIO_WAIT_WHILE().

An easy way to trigger this situation is by issuing a two actions
transaction, with a proper and a bogus blockdev-backup, so the second
one will trigger a rollback. This will trigger a hang with an stack
trace like this one:

 #0  0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>,
     timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
 #1  0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
     at /usr/include/bits/poll2.h:77
 #2  0x000055e743386e09 in qemu_poll_ns
     (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336
 #3  0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true)
     at util/aio-posix.c:669
 #4  0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878
 #5  0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017
 #6  0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262
 #7  0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644
 #8  0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273
 #9  0x000055e74331461f in backup_job_create
     (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478
 #10 0x000055e74315bc52 in do_backup_common
     (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at blockdev.c:3580
 #11 0x000055e74315c37c in do_blockdev_backup
     (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492
 #12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018)
     at blockdev.c:1885
 #13 0x000055e743160152 in qmp_transaction
     (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340
 #14 0x000055e743287ff5 in qmp_marshal_transaction
     (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8)
     at qapi/qapi-commands-transaction.c:44
 #15 0x000055e74333de6c in do_qmp_dispatch
     (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #16 0x000055e74333de6c in qmp_dispatch
     (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>)
     at qapi/qmp-dispatch.c:175
 #17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>)
     at monitor/qmp.c:145
 #18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
 #19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117
 #20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117
 #21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459
 #22 0x000055e743385742 in aio_ctx_dispatch
     (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176
 #24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829
 #25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219
 #26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
 #27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
 #28 0x000055e74316a3c1 in main_loop () at vl.c:1828
 #29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
     at vl.c:4504

Fix this by not acquiring the AioContext there, and ensuring all paths
leading to it have it already acquired (backup_clean()).
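A sketch of the caller-side half of that change, assuming block/backup.c's backup_clean() and the usual AioContext helpers; it only illustrates "acquire in the one caller that didn't, instead of inside bdrv_backup_top_drop()":

```c
/* Sketch only. */
static void backup_clean(Job *job)
{
    BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);

    aio_context_acquire(aio_context);
    bdrv_backup_top_drop(s->backup_top);  /* no longer acquires the context itself */
    aio_context_release(aio_context);
}
```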

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Dirty map addition and removal functions are not acquiring the BDS
AioContext, while they may call code that expects it to be
acquired.

This may trigger a crash with a stack trace like this one:

 #0  0x00007f0ef146370f in __GI_raise (sig=sig@entry=6)
     at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007f0ef144db25 in __GI_abort () at abort.c:79
 #2  0x0000565022294dce in error_exit
     (err=<optimized out>, msg=msg@entry=0x56502243a730 <__func__.16350> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
 #3  0x00005650222950ba in qemu_mutex_unlock_impl
     (mutex=mutex@entry=0x5650244b0240, file=file@entry=0x565022439adf "util/async.c", line=line@entry=526) at util/qemu-thread-posix.c:108
 #4  0x0000565022290029 in aio_context_release
     (ctx=ctx@entry=0x5650244b01e0) at util/async.c:526
 #5  0x000056502221cd08 in bdrv_can_store_new_dirty_bitmap
     (bs=bs@entry=0x5650244dc820, name=name@entry=0x56502481d360 "bitmap1", granularity=granularity@entry=65536, errp=errp@entry=0x7fff22831718)
     at block/dirty-bitmap.c:542
 #6  0x000056502206ae53 in qmp_block_dirty_bitmap_add
     (errp=0x7fff22831718, disabled=false, has_disabled=<optimized out>, persistent=<optimized out>, has_persistent=true, granularity=65536, has_granularity=<optimized out>, name=0x56502481d360 "bitmap1", node=<optimized out>) at blockdev.c:2894
 #7  0x000056502206ae53 in qmp_block_dirty_bitmap_add
     (node=<optimized out>, name=0x56502481d360 "bitmap1", has_granularity=<optimized out>, granularity=<optimized out>, has_persistent=true, persistent=<optimized out>, has_disabled=false, disabled=false, errp=0x7fff22831718) at blockdev.c:2856
 #8  0x00005650221847a3 in qmp_marshal_block_dirty_bitmap_add
     (args=<optimized out>, ret=<optimized out>, errp=0x7fff22831798)
     at qapi/qapi-commands-block-core.c:651
 #9  0x0000565022247e6c in do_qmp_dispatch
     (errp=0x7fff22831790, allow_oob=<optimized out>, request=<optimized out>, cmds=0x565022b32d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #10 0x0000565022247e6c in qmp_dispatch
     (cmds=0x565022b32d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
 #11 0x0000565022166061 in monitor_qmp_dispatch
     (mon=0x56502450faa0, req=<optimized out>) at monitor/qmp.c:145
 #12 0x00005650221666fa in monitor_qmp_bh_dispatcher
     (data=<optimized out>) at monitor/qmp.c:234
 #13 0x000056502228f866 in aio_bh_call (bh=0x56502440eae0)
     at util/async.c:117
 #14 0x000056502228f866 in aio_bh_poll (ctx=ctx@entry=0x56502440d7a0)
     at util/async.c:117
 #15 0x0000565022292c54 in aio_dispatch (ctx=0x56502440d7a0)
     at util/aio-posix.c:459
 #16 0x000056502228f742 in aio_ctx_dispatch
     (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #17 0x00007f0ef5ce667d in g_main_dispatch (context=0x56502449aa40)
     at gmain.c:3176
 #18 0x00007f0ef5ce667d in g_main_context_dispatch
     (context=context@entry=0x56502449aa40) at gmain.c:3829
 #19 0x0000565022291d08 in glib_pollfds_poll () at util/main-loop.c:219
 #20 0x0000565022291d08 in os_host_main_loop_wait
     (timeout=<optimized out>) at util/main-loop.c:242
 #21 0x0000565022291d08 in main_loop_wait (nonblocking=<optimized out>)
     at util/main-loop.c:518
 #22 0x00005650220743c1 in main_loop () at vl.c:1828
 #23 0x0000565021f20a72 in main
     (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
     at vl.c:4504

Fix this by acquiring the AioContext at qmp_block_dirty_bitmap_add()
and at the corresponding bitmap-removal function.
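A minimal sketch of the pattern in the add path (blockdev.c structure assumed):

```c
/* Sketch only. */
AioContext *aio_context = bdrv_get_aio_context(bs);

aio_context_acquire(aio_context);
if (!bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp)) {
    aio_context_release(aio_context);
    return;
}
bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
aio_context_release(aio_context);
```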

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782175
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
external_snapshot_abort() calls bdrv_set_backing_hd(), which
returns state->old_bs to the main AioContext, as it's intended to be
used when the BDS is going to be released. As that's not the case when
aborting an external snapshot, return it to the AioContext it was in
before the call.
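A rough sketch of the restore step; the helper name is an assumption used purely for illustration:

```c
/* Sketch only. */
AioContext *tmp_context = bdrv_get_aio_context(state->old_bs);

/* ... bdrv_set_backing_hd()/bdrv_replace_node() may move old_bs to the main context ... */

if (bdrv_get_aio_context(state->old_bs) != tmp_context) {
    bdrv_try_set_aio_context(state->old_bs, tmp_context, NULL);
}
```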

This issue can be triggered by issuing a transaction with two actions,
a proper blockdev-snapshot-sync and a bogus one, so the second will
trigger a transaction abort. This results in a crash with a stack
trace like this one:

 #0  0x00007fa1048b28df in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007fa10489ccf5 in __GI_abort () at abort.c:79
 #2  0x00007fa10489cbc9 in __assert_fail_base
     (fmt=0x7fa104a03300 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=0x557224014d30 "block.c", line=2240, function=<optimized out>) at assert.c:92
 #3  0x00007fa1048aae96 in __GI___assert_fail
     (assertion=assertion@entry=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=file@entry=0x557224014d30 "block.c", line=line@entry=2240, function=function@entry=0x5572240b5d60 <__PRETTY_FUNCTION__.31620> "bdrv_replace_child_noperm") at assert.c:101
 #4  0x0000557223e631f8 in bdrv_replace_child_noperm (child=0x557225b9c980, new_bs=new_bs@entry=0x557225c42e40) at block.c:2240
 #5  0x0000557223e68be7 in bdrv_replace_node (from=0x557226951a60, to=0x557225c42e40, errp=0x5572247d6138 <error_abort>) at block.c:4196
 #6  0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1731
 #7  0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1717
 #8  0x0000557223d09013 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x557225cc7d70, errp=errp@entry=0x7ffe704c0c98) at blockdev.c:2360
 #9  0x0000557223e32085 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffe704c0d08) at qapi/qapi-commands-transaction.c:44
 #10 0x0000557223ee798c in do_qmp_dispatch (errp=0x7ffe704c0d00, allow_oob=<optimized out>, request=<optimized out>, cmds=0x5572247d3cc0 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #11 0x0000557223ee798c in qmp_dispatch (cmds=0x5572247d3cc0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
 #12 0x0000557223e06141 in monitor_qmp_dispatch (mon=0x557225c69ff0, req=<optimized out>) at monitor/qmp.c:120
 #13 0x0000557223e0678a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209
 #14 0x0000557223f2f366 in aio_bh_call (bh=0x557225b9dc60) at util/async.c:117
 #15 0x0000557223f2f366 in aio_bh_poll (ctx=ctx@entry=0x557225b9c840) at util/async.c:117
 #16 0x0000557223f32754 in aio_dispatch (ctx=0x557225b9c840) at util/aio-posix.c:459
 #17 0x0000557223f2f242 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #18 0x00007fa10913467d in g_main_dispatch (context=0x557225c28e80) at gmain.c:3176
 #19 0x00007fa10913467d in g_main_context_dispatch (context=context@entry=0x557225c28e80) at gmain.c:3829
 #20 0x0000557223f31808 in glib_pollfds_poll () at util/main-loop.c:219
 #21 0x0000557223f31808 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
 #22 0x0000557223f31808 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
 #23 0x0000557223d13201 in main_loop () at vl.c:1828
 #24 0x0000557223bbfb82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1779036
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
arichardson pushed a commit that referenced this issue Feb 2, 2020
If the multifd_send_threads are not created when migration fails,
multifd_save_cleanup would be called twice. In this scenario, the
multifd_send_state is accessed after it has been released; the result
is that the source VM crashes.

Here is the coredump stack:
    Program received signal SIGSEGV, Segmentation fault.
    0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012
    1012            MultiFDSendParams *p = &multifd_send_state->params[i];
    #0  0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012
    #1  0x00005629333ab8a9 in multifd_save_cleanup () at migration/ram.c:1028
    #2  0x00005629333abaea in multifd_new_send_channel_async (task=0x562935450e70, opaque=<optimized out>) at migration/ram.c:1202
    #3  0x000056293373a562 in qio_task_complete (task=task@entry=0x562935450e70) at io/task.c:196
    #4  0x000056293373a6e0 in qio_task_thread_result (opaque=0x562935450e70) at io/task.c:111
    #5  0x00007f475d4d75a7 in g_idle_dispatch () from /usr/lib64/libglib-2.0.so.0
    #6  0x00007f475d4da9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #7  0x0000562933785b33 in glib_pollfds_poll () at util/main-loop.c:219
    #8  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
    #9  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
    #10 0x00005629334c5acf in main_loop () at vl.c:1810
    #11 0x000056293334d7bb in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4471

So, if the multifd_send_threads are not created when migration fails,
don't call multifd_save_cleanup in multifd_new_send_channel_async.

Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson pushed a commit that referenced this issue Feb 11, 2020
It's not a big deal, but 'check qtest-ppc/ppc64' runs fail if sanitizers are enabled.
The memory leak stack is as follows:

Direct leak of 128 byte(s) in 4 object(s) allocated from:
    #0 0x7f11756f5970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f1174f2549d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x556af05aa7da in mm_fw_cfg_init /mnt/sdb/qemu/tests/libqos/fw_cfg.c:119
    #3 0x556af059f4f5 in read_boot_order_pmac /mnt/sdb/qemu/tests/boot-order-test.c:137
    #4 0x556af059efe2 in test_a_boot_order /mnt/sdb/qemu/tests/boot-order-test.c:47
    #5 0x556af059f2c0 in test_boot_orders /mnt/sdb/qemu/tests/boot-order-test.c:59
    #6 0x556af059f52d in test_pmac_oldworld_boot_order /mnt/sdb/qemu/tests/boot-order-test.c:152
    #7 0x7f1174f46cb9  (/lib64/libglib-2.0.so.0+0x73cb9)
    #8 0x7f1174f46b73  (/lib64/libglib-2.0.so.0+0x73b73)
    #9 0x7f1174f46b73  (/lib64/libglib-2.0.so.0+0x73b73)
    #10 0x7f1174f46f71 in g_test_run_suite (/lib64/libglib-2.0.so.0+0x73f71)
    #11 0x7f1174f46f94 in g_test_run (/lib64/libglib-2.0.so.0+0x73f94)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200203025935.36228-1-pannengyuan@huawei.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
arichardson pushed a commit that referenced this issue Feb 11, 2020
When removing dup_fd in the monitor_fdset_dup_fd_find_remove function,
we need to free mon_fdset_fd_dup. ASAN shows the memory leak stack:

Direct leak of 96 byte(s) in 3 object(s) allocated from:
    #0 0xfffd37b033b3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33b3)
    #1 0xfffd375c71cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb)
    #2 0xaaae25bf1c17 in monitor_fdset_dup_fd_add /qemu/monitor/misc.c:1724
    #3 0xaaae265cfd8f in qemu_open /qemu/util/osdep.c:315
    #4 0xaaae264e2b2b in qmp_chardev_open_file_source /qemu/chardev/char-fd.c:122
    #5 0xaaae264e47cf in qmp_chardev_open_file /qemu/chardev/char-file.c:81
    #6 0xaaae264e118b in qemu_char_open /qemu/chardev/char.c:237
    #7 0xaaae264e118b in qemu_chardev_new /qemu/chardev/char.c:964
    #8 0xaaae264e1543 in qemu_chr_new_from_opts /qemu/chardev/char.c:680
    #9 0xaaae25e12e0f in chardev_init_func /qemu/vl.c:2083
    #10 0xaaae26603823 in qemu_opts_foreach /qemu/util/qemu-option.c:1170
    #11 0xaaae258c9787 in main /qemu/vl.c:4089
    #12 0xfffd35b80b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
    #13 0xaaae258d7b63  (/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x8b7b63)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200115072016.167252-1-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
arichardson pushed a commit that referenced this issue Feb 11, 2020
If we call the QMP 'query-block' while QEMU is working on
'block-commit', it will cause memleaks; the memory leak stack is as
follows:

Indirect leak of 12360 byte(s) in 3 object(s) allocated from:
    #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29
    #3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427
    #4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #6 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #7 0x55ea958818ea in bdrv_block_device_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:56
    #8 0x55ea958879de in bdrv_query_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:392
    #9 0x55ea9588b58f in qmp_query_block /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:578
    #10 0x55ea95567392 in qmp_marshal_query_block qapi/qapi-commands-block-core.c:95

Indirect leak of 4120 byte(s) in 1 object(s) allocated from:
    #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29
    #3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427
    #4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #6 0x55ea9569f301 in bdrv_backing_attach /mnt/sdb/qemu-4.2.0-rc0/block.c:1064
    #7 0x55ea956a99dd in bdrv_replace_child_noperm /mnt/sdb/qemu-4.2.0-rc0/block.c:2283
    #8 0x55ea956b9b53 in bdrv_replace_node /mnt/sdb/qemu-4.2.0-rc0/block.c:4196
    #9 0x55ea956b9e49 in bdrv_append /mnt/sdb/qemu-4.2.0-rc0/block.c:4236
    #10 0x55ea958c3472 in commit_start /mnt/sdb/qemu-4.2.0-rc0/block/commit.c:306
    #11 0x55ea94b68ab0 in qmp_block_commit /mnt/sdb/qemu-4.2.0-rc0/blockdev.c:3459
    #12 0x55ea9556a7a7 in qmp_marshal_block_commit qapi/qapi-commands-block-core.c:407

Fixes: bb808d5
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-id: 20200116085600.24056-1-pannengyuan@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
trasz pushed a commit that referenced this issue Feb 28, 2020
gtk_widget_get_window() returns NULL if the widget's window is not
realized, and QEMU crashes. Example under gtk 3.22.30 (mate 1.20.1):

  qemu-system-x86_64: Gdk: gdk_window_get_origin: assertion 'GDK_IS_WINDOW (window)' failed
  (gdb) bt
  #0  0x00007ffff496cf70 in gdk_window_get_origin () from /usr/lib64/libgdk-3.so.0
  #1  0x00007ffff49582a0 in gdk_display_get_monitor_at_window () from /usr/lib64/libgdk-3.so.0
  #2  0x0000555555bb73e2 in gd_refresh_rate_millihz (window=0x5555579d6280) at ui/gtk.c:1973
  #3  gd_vc_gfx_init (view_menu=0x5555579f0590, group=0x0, idx=0, con=<optimized out>, vc=0x5555579d4a90, s=0x5555579d49f0) at ui/gtk.c:2048
  #4  gd_create_menu_view (s=0x5555579d49f0) at ui/gtk.c:2149
  #5  gd_create_menus (s=0x5555579d49f0) at ui/gtk.c:2188
  #6  gtk_display_init (ds=<optimized out>, opts=0x55555661ed80 <dpy>) at ui/gtk.c:2256
  #7  0x000055555583d5a0 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4358

Fixes: c4c0092 and 28b58f1 (display/gtk: get proper refreshrate)
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Jan Kiszka <jan.kiszka@web.de>
Message-id: 20200208161048.11311-3-f4bug@amsat.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
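A hedged sketch of the kind of guard that avoids this crash: check the GdkWindow before asking GDK for the monitor, and fall back to a default rate while the widget is not yet realized. The function below is only loosely modeled on ui/gtk.c and is not the actual patch.

```
/* Illustrative guard against an unrealized widget; uses only GTK3/GDK3
 * functions (gtk_widget_get_window, gdk_window_get_display,
 * gdk_display_get_monitor_at_window, gdk_monitor_get_refresh_rate). */
#include <gtk/gtk.h>

static int refresh_rate_millihz_or_default(GtkWidget *widget, int fallback)
{
    GdkWindow *win = gtk_widget_get_window(widget);

    if (win == NULL) {
        /* Widget not realized yet: gdk_display_get_monitor_at_window()
         * would trip its assertion, so bail out early instead. */
        return fallback;
    }

    GdkDisplay *dpy = gdk_window_get_display(win);
    GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win);
    return monitor ? gdk_monitor_get_refresh_rate(monitor) : fallback;
}
```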
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
These functions have assertions that are always enabled (even outside debug
builds), and those assertions show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
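For reference, a standalone sketch of what "the same operation without the assertion" looks like. The bodies mirror the documented semantics of QEMU's extract64()/deposit64() from bitops.h; the *_noassert names are made up for this illustration.

```
/* Assertion-free variants of the 64-bit field helpers; the hot path pays
 * only for the shifts and masks, not for an assert() on every call. */
#include <stdint.h>

static inline uint64_t extract64_noassert(uint64_t value, int start, int length)
{
    /* Caller guarantees 0 <= start and 0 < length <= 64 - start. */
    return (value >> start) & (~0ULL >> (64 - length));
}

static inline uint64_t deposit64_noassert(uint64_t value, int start, int length,
                                          uint64_t fieldval)
{
    uint64_t mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}
```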
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. Additionally, QEMU's deposit64() and extract64() have an
assertion that is evaluated on every call (this shows up in `perf report` traces).

This is another 1.08 times faster than CTSRD-CHERI#153
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -m 5
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):     10.644 s ±  0.051 s    [User: 10.203 s, System: 0.122 s]
  Range (min … max):   10.586 s … 10.701 s    5 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.852 s ±  0.091 s    [User: 9.159 s, System: 0.154 s]
  Range (min … max):    9.780 s …  9.965 s    5 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.08 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
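A hedged, standalone sketch of the trade-off being described: tracking one flag per register as a packed bit field (a shift, a mask and, with deposit64/extract64, an assertion per access) versus spending 32 bytes on a plain array that is indexed directly. The regs_packed/regs_bytes names are invented for illustration, not QEMU's.

```
#include <stdint.h>
#include <stdbool.h>

/* Packed representation: one bit per register in a 64-bit word.
 * Every access costs bit twiddling on the hot path. */
static inline bool packed_get(uint64_t regs_packed, unsigned regnum)
{
    return (regs_packed >> regnum) & 1;
}

static inline uint64_t packed_set(uint64_t regs_packed, unsigned regnum, bool v)
{
    uint64_t mask = UINT64_C(1) << regnum;
    return v ? (regs_packed | mask) : (regs_packed & ~mask);
}

/* Expanded representation: one byte per register (32 bytes for 32 registers).
 * Reads and writes become plain loads/stores, which is cheaper per access
 * even though it uses more memory. */
static inline bool bytes_get(const uint8_t regs_bytes[32], unsigned regnum)
{
    return regs_bytes[regnum];
}

static inline void bytes_set(uint8_t regs_bytes[32], unsigned regnum, bool v)
{
    regs_bytes[regnum] = v;
}
```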
arichardson added a commit that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. This results in a 1.21x speedup running the MFS_ROOT
kernel and a 1.08x speedup for the full purecap kernel+purecap userspace
boot. Benchmarks using a previous version of this patch indicated a 1.08x
speedup for the MFS_ROOT case. My assumption is that removing the TCG
global accesses to cpu_capreg_state now has a larger impact after
0c09763 removed the incorrect NO_RWG flag.

MFS_ROOT boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -w 1
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.499 s ±  0.027 s    [User: 8.581 s, System: 0.136 s]
  Range (min … max):    9.448 s …  9.539 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.281 s ±  0.029 s    [User: 8.260 s, System: 0.137 s]
  Range (min … max):    9.234 s …  9.326 s    10 runs

Benchmark #3: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      7.852 s ±  0.053 s    [User: 7.523 s, System: 0.182 s]
  Range (min … max):    7.793 s …  7.933 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.18 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
    1.21 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
Full purecap+purecap boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     227.351 s ±  0.484 s    [User: 213.193 s, System: 3.544 s]
  Range (min … max):   226.792 s … 227.646 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.156 s ±  3.601 s    [User: 197.448 s, System: 1.979 s]
  Range (min … max):   206.397 s … 213.575 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.08 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
These functions have assertions that are always enabled (even outside debug
builds), and those assertions show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. Additionally, QEMU's deposit64() and extract64() have an
assertion that is evaluated on every call (this shows up in `perf report` traces).

This is another 1.08 times faster than CTSRD-CHERI#153
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -m 5
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):     10.644 s ±  0.051 s    [User: 10.203 s, System: 0.122 s]
  Range (min … max):   10.586 s … 10.701 s    5 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.852 s ±  0.091 s    [User: 9.159 s, System: 0.154 s]
  Range (min … max):    9.780 s …  9.965 s    5 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.08 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit that referenced this issue Jul 7, 2021
These functions have assertions that are always enabled (even outside debug
builds), and those assertions show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. This results in a 1.21x speedup running the MFS_ROOT
kernel and a 1.08x speedup for the full purecap kernel+purecap userspace
boot. Benchmarks using a previous version of this patch indicated a 1.08x
speedup for the MFS_ROOT case. My assumption is that removing the TCG
global accesses to cpu_capreg_state now has a larger impact after
0c09763 removed the incorrect NO_RWG flag.

MFS_ROOT boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -w 1
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.499 s ±  0.027 s    [User: 8.581 s, System: 0.136 s]
  Range (min … max):    9.448 s …  9.539 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.281 s ±  0.029 s    [User: 8.260 s, System: 0.137 s]
  Range (min … max):    9.234 s …  9.326 s    10 runs

Benchmark #3: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      7.852 s ±  0.053 s    [User: 7.523 s, System: 0.182 s]
  Range (min … max):    7.793 s …  7.933 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.18 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
    1.21 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
Full purecap+purecap boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     227.351 s ±  0.484 s    [User: 213.193 s, System: 3.544 s]
  Range (min … max):   226.792 s … 227.646 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.156 s ±  3.601 s    [User: 197.448 s, System: 1.979 s]
  Range (min … max):   206.397 s … 213.575 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.08 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Jul 15, 2021
These functions are called extremely frequently when booting CheriBSD
(for every capability load/store), so the `switch (access_type)`
inside probe_access_internal() has a noticeable impact on overall
performance. By adding QEMU_ALWAYS_INLINE, we can avoid this switch
for all calls to probe_{write,read,cap_write}() and thereby speed up
the CheriBSD boot:

MFS_ROOT purecap:
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -m 5
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      8.501 s ±  0.601 s    [User: 8.014 s, System: 0.121 s]
  Range (min … max):    7.999 s …  9.194 s    5 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.739 s ±  0.030 s    [User: 8.973 s, System: 0.206 s]
  Range (min … max):    9.707 s …  9.774 s    5 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.15 ± 0.08 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```

Full disk image purecap kernel:
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     213.637 s ±  1.920 s    [User: 200.744 s, System: 2.847 s]
  Range (min … max):   211.734 s … 215.573 s    3 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     243.621 s ±  3.036 s    [User: 230.935 s, System: 2.485 s]
  Range (min … max):   240.935 s … 246.915 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.14 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
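A standalone sketch of why forcing inlining helps here: once the shared helper is inlined into wrappers that pass a compile-time-constant access type, the compiler can fold the `switch` away entirely. QEMU's real macro is QEMU_ALWAYS_INLINE; the helper and wrapper names below are illustrative, not QEMU's actual code.

```
#include <stddef.h>

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_CAP_WRITE } access_type_t;

#define ALWAYS_INLINE inline __attribute__((always_inline))

/* Shared helper: dispatches on the access type. */
static ALWAYS_INLINE void *probe_internal(void *addr, size_t size,
                                          access_type_t type)
{
    switch (type) {
    case ACCESS_READ:      /* ... check read permissions ... */       break;
    case ACCESS_WRITE:     /* ... check write permissions ... */      break;
    case ACCESS_CAP_WRITE: /* ... check capability-write state ... */ break;
    }
    return addr;
}

/* Because the helper is always inlined and `type` is a constant here,
 * the compiler drops the switch and keeps only the relevant branch. */
static void *probe_read(void *addr, size_t size)
{
    return probe_internal(addr, size, ACCESS_READ);
}

static void *probe_write(void *addr, size_t size)
{
    return probe_internal(addr, size, ACCESS_WRITE);
}
```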
arichardson added a commit that referenced this issue Jul 15, 2021
These functions have assertions that are enabled even in debug mode and
those assertion show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit that referenced this issue Jul 15, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. This results in a 1.21x speedup running the MFS_ROOT
kernel and a 1.08x speedup for the full purecap kernel+purecap userspace
boot. Benchmarks using a previous version of this patch indicated a 1.08x
speedup for the MFS_ROOT case. My assumption is that removing the TCG
global accesses to cpu_capreg_state now has a larger impact after
0c09763 removed the incorrect NO_RWG flag.

MFS_ROOT boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -w 1
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.499 s ±  0.027 s    [User: 8.581 s, System: 0.136 s]
  Range (min … max):    9.448 s …  9.539 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.281 s ±  0.029 s    [User: 8.260 s, System: 0.137 s]
  Range (min … max):    9.234 s …  9.326 s    10 runs

Benchmark #3: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      7.852 s ±  0.053 s    [User: 7.523 s, System: 0.182 s]
  Range (min … max):    7.793 s …  7.933 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.18 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
    1.21 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
Full purecap+purecap boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     227.351 s ±  0.484 s    [User: 213.193 s, System: 3.544 s]
  Range (min … max):   226.792 s … 227.646 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.156 s ±  3.601 s    [User: 197.448 s, System: 1.979 s]
  Range (min … max):   206.397 s … 213.575 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.08 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Jul 30, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
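The commit message does not show the code, but the general technique it describes can be sketched in a hedged, standalone way: keep the decoded form of a compressed field next to the compressed word and reuse it, instead of re-unpacking on every access. The struct and field names below are invented for this illustration and do not correspond to QEMU's actual state layout.

```
#include <stdint.h>
#include <stdbool.h>

/* Invented example: a register that stores a compressed word plus a cached,
 * already-decoded copy of the fields the hot path keeps asking for. */
typedef struct {
    uint64_t compressed;      /* packed representation, updated rarely */
    uint64_t cached_base;     /* decoded once, read many times */
    uint64_t cached_top;
    bool     cache_valid;
} cached_reg_t;

static void decode_fields(uint64_t compressed, uint64_t *base, uint64_t *top)
{
    /* Stand-in for the expensive unpacking step. */
    *base = compressed & 0xffffffffu;
    *top  = *base + (compressed >> 32);
}

static uint64_t reg_base(cached_reg_t *r)
{
    if (!r->cache_valid) {
        decode_fields(r->compressed, &r->cached_base, &r->cached_top);
        r->cache_valid = true;
    }
    return r->cached_base;    /* later calls skip the unpacking entirely */
}
```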
arichardson added a commit to arichardson/qemu that referenced this issue Jul 30, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Aug 4, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Sep 7, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Sep 7, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Sep 7, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Sep 29, 2021
Running perf shows me that with the morello changes a RISC-V purecap
CheriBSD boot spends 4.7% of total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Sep 29, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Sep 29, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Oct 1, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Oct 12, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
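
To make that concrete, here is a minimal sketch of the pattern, assuming a hypothetical tag-lookup helper (lookup_tag_for_host_address() is invented for illustration; only probe_read() is the real QEMU helper, and the signatures only approximate the CHERI-QEMU code):
```c
/*
 * Sketch only: shows "pass the host address down and skip the second
 * probe_read()".  Not the actual CHERI-QEMU code.
 */
static bool tag_get(CPUArchState *env, target_ulong vaddr, int mmu_idx,
                    uintptr_t retaddr, void *host_addr)
{
    if (host_addr == NULL) {
        /* Caller has not translated the address yet, so probe it here.
         * Previously this ran even when the caller had just probed. */
        host_addr = probe_read(env, vaddr, 1, mmu_idx, retaddr);
    }
    /* host_addr is known to be valid; no second MMU walk is needed. */
    return lookup_tag_for_host_address(env, host_addr); /* hypothetical */
}

/* The caller (load_cap_from_memory_raw_tag_mmu_idx()) already holds the
 * host pointer from its own probe_read() and simply passes it through. */
```
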
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```
arichardson added a commit to arichardson/qemu that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```


Before
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.probe_read_fix -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```
arichardson added a commit to arichardson/qemu that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```
arichardson added a commit that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```
arichardson pushed a commit that referenced this issue Sep 1, 2023
We can't know whether the caller read enough data into the memory pointed
to by ext_hdr to cast it as an ip6_ext_hdr_routing.
Declare rt_hdr on the stack and fill it again from the iovec.

Since we already checked there is enough data in the iovec buffer,
simply add an assert() call to consume the bytes_read variable.
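
Roughly, the fix reads a properly sized copy of the routing header back out of the iovec instead of casting the smaller ext_hdr buffer. The sketch below follows net/eth.c naming but is illustrative rather than the verbatim patch:
```c
/* Inside _eth_get_rss_ex_dst_addr(), sketched: */
struct ip6_ext_hdr_routing rt_hdr;
size_t input_size = iov_size(pkt, pkt_frags);
size_t bytes_read;

if (input_size < ext_hdr_offset + sizeof(rt_hdr)) {
    return false;
}

/* Re-read the routing header from the iovec rather than casting ext_hdr,
 * which only covers sizeof(struct ip6_ext_hdr) bytes. */
bytes_read = iov_to_buf(pkt, pkt_frags, ext_hdr_offset,
                        &rt_hdr, sizeof(rt_hdr));
/* The length was checked above, so the copy must be complete. */
assert(bytes_read == sizeof(rt_hdr));

if (rt_hdr.rtype != 2 || rt_hdr.segleft != 1) {
    return false;
}
```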

This fixes a 2-byte buffer overrun in eth_parse_ipv6_hdr() reported
by QEMU fuzzer:

  $ cat << EOF | ./qemu-system-i386 -M pc-q35-5.0 \
    -accel qtest -monitor none \
    -serial none -nographic -qtest stdio
  outl 0xcf8 0x80001010
  outl 0xcfc 0xe1020000
  outl 0xcf8 0x80001004
  outw 0xcfc 0x7
  write 0x25 0x1 0x86
  write 0x26 0x1 0xdd
  write 0x4f 0x1 0x2b
  write 0xe1020030 0x4 0x190002e1
  write 0xe102003a 0x2 0x0807
  write 0xe1020048 0x4 0x12077cdd
  write 0xe1020400 0x4 0xba077cdd
  write 0xe1020420 0x4 0x190002e1
  write 0xe1020428 0x4 0x3509d807
  write 0xe1020438 0x1 0xe2
  EOF
  =================================================================
  ==2859770==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdef904902 at pc 0x561ceefa78de bp 0x7ffdef904820 sp 0x7ffdef904818
  READ of size 1 at 0x7ffdef904902 thread T0
      #0 0x561ceefa78dd in _eth_get_rss_ex_dst_addr net/eth.c:410:17
      #1 0x561ceefa41fb in eth_parse_ipv6_hdr net/eth.c:532:17
      #2 0x561cef7de639 in net_tx_pkt_parse_headers hw/net/net_tx_pkt.c:228:14
      #3 0x561cef7dbef4 in net_tx_pkt_parse hw/net/net_tx_pkt.c:273:9
      #4 0x561ceec29f22 in e1000e_process_tx_desc hw/net/e1000e_core.c:730:29
      #5 0x561ceec28eac in e1000e_start_xmit hw/net/e1000e_core.c:927:9
      #6 0x561ceec1baab in e1000e_set_tdt hw/net/e1000e_core.c:2444:9
      #7 0x561ceebf300e in e1000e_core_write hw/net/e1000e_core.c:3256:9
      #8 0x561cef3cd4cd in e1000e_mmio_write hw/net/e1000e.c:110:5

  Address 0x7ffdef904902 is located in stack of thread T0 at offset 34 in frame
      #0 0x561ceefa320f in eth_parse_ipv6_hdr net/eth.c:486

    This frame has 1 object(s):
      [32, 34) 'ext_hdr' (line 487) <== Memory access at offset 34 overflows this variable
  HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
        (longjmp and C++ exceptions *are* supported)
  SUMMARY: AddressSanitizer: stack-buffer-overflow net/eth.c:410:17 in _eth_get_rss_ex_dst_addr
  Shadow bytes around the buggy address:
    0x10003df188d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df188e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df188f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18910: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  =>0x10003df18920:[02]f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18930: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Stack left redzone:      f1
    Stack right redzone:     f3
  ==2859770==ABORTING

Add the corresponding qtest case with the fuzzer reproducer.

FWIW GCC 11 similarly reported:

  net/eth.c: In function 'eth_parse_ipv6_hdr':
  net/eth.c:410:15: error: array subscript 'struct ip6_ext_hdr_routing[0]' is partly outside array bounds of 'struct ip6_ext_hdr[1]' [-Werror=array-bounds]
    410 |     if ((rthdr->rtype == 2) && (rthdr->segleft == 1)) {
        |          ~~~~~^~~~~~~
  net/eth.c:485:24: note: while referencing 'ext_hdr'
    485 |     struct ip6_ext_hdr ext_hdr;
        |                        ^~~~~~~
  net/eth.c:410:38: error: array subscript 'struct ip6_ext_hdr_routing[0]' is partly outside array bounds of 'struct ip6_ext_hdr[1]' [-Werror=array-bounds]
    410 |     if ((rthdr->rtype == 2) && (rthdr->segleft == 1)) {
        |                                 ~~~~~^~~~~~~~~
  net/eth.c:485:24: note: while referencing 'ext_hdr'
    485 |     struct ip6_ext_hdr ext_hdr;
        |                        ^~~~~~~

Cc: qemu-stable@nongnu.org
Buglink: https://bugs.launchpad.net/qemu/+bug/1879531
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Miroslav Rezanina <mrezanin@redhat.com>
Fixes: eb70002 ("net_pkt: Extend packet abstraction as required by e1000e functionality")
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
arichardson pushed a commit that referenced this issue Sep 1, 2023
Incoming enabled bitmaps are busy, because we do
bdrv_dirty_bitmap_create_successor() for them. But disabled bitmaps
being migrated are not marked busy, and the user can remove them during the
incoming migration. Then we may crash in cancel_incoming_locked() when we
try to remove a bitmap that was already removed by the user, like this:

 #0  qemu_mutex_lock_impl (mutex=0x5593d88c50d1, file=0x559680554b20
   "../block/dirty-bitmap.c", line=64) at ../util/qemu-thread-posix.c:77
 #1  bdrv_dirty_bitmaps_lock (bs=0x5593d88c0ee9)
   at ../block/dirty-bitmap.c:64
 #2  bdrv_release_dirty_bitmap (bitmap=0x5596810e9570)
   at ../block/dirty-bitmap.c:362
 #3  cancel_incoming_locked (s=0x559680be8208 <dbm_state+40>)
   at ../migration/block-dirty-bitmap.c:918
 #4  dirty_bitmap_load (f=0x559681d02b10, opaque=0x559680be81e0
   <dbm_state>, version_id=1) at ../migration/block-dirty-bitmap.c:1194
 #5  vmstate_load (f=0x559681d02b10, se=0x559680fb5810)
   at ../migration/savevm.c:908
 #6  qemu_loadvm_section_part_end (f=0x559681d02b10,
   mis=0x559680fb4a30) at ../migration/savevm.c:2473
 #7  qemu_loadvm_state_main (f=0x559681d02b10, mis=0x559680fb4a30)
   at ../migration/savevm.c:2626
 #8  postcopy_ram_listen_thread (opaque=0x0)
   at ../migration/savevm.c:1871
 #9  qemu_thread_start (args=0x5596817ccd10)
   at ../util/qemu-thread-posix.c:521
 #10 start_thread () at /lib64/libpthread.so.0
 #11 clone () at /lib64/libc.so.6

Note the bs pointer taken from the bitmap: it's clearly badly aligned. That's
because this is a use-after-free; the bitmap has already been freed.

So, let's make disabled bitmaps (being migrated) busy during incoming
migration.
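
In rough pseudocode, assuming QEMU's existing bdrv_dirty_bitmap_set_busy() helper (the precise call sites in migration/block-dirty-bitmap.c may differ):
```c
/* Incoming side, when a bitmap starts being received (sketch): */
if (!bdrv_dirty_bitmap_enabled(s->bitmap)) {
    /* Enabled bitmaps are already busy via their created successor;
     * disabled ones previously stayed non-busy and could be deleted by
     * the user mid-migration. */
    bdrv_dirty_bitmap_set_busy(s->bitmap, true);
}

/* ...and drop the busy flag once loading finishes or is cancelled: */
bdrv_dirty_bitmap_set_busy(s->bitmap, false);
```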

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210322094906.5079-2-vsementsov@virtuozzo.com>
arichardson pushed a commit that referenced this issue Sep 1, 2023
When building with --enable-sanitizers we get:

  Direct leak of 32 byte(s) in 2 object(s) allocated from:
      #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf)
      #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958)
      #2 0x561847f02ca2 in usb_packet_init hw/usb/core.c:531:5
      #3 0x561848df4df4 in usb_ehci_init hw/usb/hcd-ehci.c:2575:5
      #4 0x561847c119ac in ehci_sysbus_init hw/usb/hcd-ehci-sysbus.c:73:5
      #5 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9
      #6 0x56184a5bd955 in object_init_with_type qom/object.c:371:9
      #7 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5
      #8 0x56184a5a24d5 in object_initialize qom/object.c:536:5
      #9 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5
      #10 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10
      #11 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5
      #12 0x561849542d18 in npcm7xx_init hw/arm/npcm7xx.c:427:5

Similarly to commit d710e1e ("usb: ehci: fix memory leak in
ehci"), fix by calling usb_ehci_finalize() to free the USBPacket.

Fixes: 7341ea0
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210323183701.281152-1-f4bug@amsat.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
arichardson pushed a commit that referenced this issue Sep 1, 2023
When building with --enable-sanitizers we get:

  Direct leak of 16 byte(s) in 1 object(s) allocated from:
      #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf)
      #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958)
      #2 0x561847c2dcc9 in xlnx_dp_init hw/display/xlnx_dp.c:1259:5
      #3 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9
      #4 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5
      #5 0x56184a5a24d5 in object_initialize qom/object.c:536:5
      #6 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5
      #7 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10
      #8 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5
      #9 0x5618495aa431 in xlnx_zynqmp_init hw/arm/xlnx-zynqmp.c:273:5

The RX/TX FIFOs are created in xlnx_dp_init(), add xlnx_dp_finalize()
to destroy them.
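
Sketched out, the finalizer simply tears down the two FIFOs created in xlnx_dp_init() (names follow hw/display/xlnx_dp.c; illustrative, not the verbatim patch):
```c
static void xlnx_dp_finalize(Object *obj)
{
    XlnxDPState *s = XLNX_DP(obj);

    /* Destroy the AUX RX/TX FIFOs allocated with fifo8_create(), fixing
     * the leak reported by the sanitizer. */
    fifo8_destroy(&s->rx_fifo);
    fifo8_destroy(&s->tx_fifo);
}

/* Registered via .instance_finalize = xlnx_dp_finalize in the TypeInfo. */
```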

Fixes: 58ac482 ("introduce xlnx-dp")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210323182958.277654-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
arichardson pushed a commit that referenced this issue Sep 1, 2023
g_hash_table_add always retains ownership of the pointer passed in as
the key. Its return status merely indicates whether the added entry was
new, or replaced an existing entry. Thus key must never be freed after
this method returns.
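
A small standalone illustration of that ownership rule (generic GLib usage, not the module.c code itself):
```c
#include <glib.h>

int main(void)
{
    /* The table owns its keys and frees them with g_free(). */
    GHashTable *seen = g_hash_table_new_full(g_str_hash, g_str_equal,
                                             g_free, NULL);
    char *name = g_strdup("some-module");

    /* g_hash_table_add() takes ownership of 'name' whether or not the
     * entry was new; the boolean result is only a "was it new?" flag. */
    if (!g_hash_table_add(seen, name)) {
        /* WRONG here: g_free(name) - the table still references it, and
         * a later lookup would read freed memory, as in the trace above. */
    }

    g_hash_table_unref(seen);
    return 0;
}
```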

Spotted by ASAN:

==2407186==ERROR: AddressSanitizer: heap-use-after-free on address 0x6020003ac4f0 at pc 0x7ffff766659c bp 0x7fffffffd1d0 sp 0x7fffffffc980
READ of size 1 at 0x6020003ac4f0 thread T0
    #0 0x7ffff766659b  (/lib64/libasan.so.6+0x8a59b)
    #1 0x7ffff6bfa843 in g_str_equal ../glib/ghash.c:2303
    #2 0x7ffff6bf8167 in g_hash_table_lookup_node ../glib/ghash.c:493
    #3 0x7ffff6bf9b78 in g_hash_table_insert_internal ../glib/ghash.c:1598
    #4 0x7ffff6bf9c32 in g_hash_table_add ../glib/ghash.c:1689
    #5 0x5555596caad4 in module_load_one ../util/module.c:233
    #6 0x5555596ca949 in module_load_one ../util/module.c:225
    #7 0x5555596ca949 in module_load_one ../util/module.c:225
    #8 0x5555596cbdf4 in module_load_qom_all ../util/module.c:349

Typical C bug...

Fixes: 9062912 ("module: use g_hash_table_add()")
Cc: qemu-stable@nongnu.org
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210316134456.3243102-1-marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>