incorrect representation of immediate in traced CLCs #2

Closed
brooksdavis opened this issue Feb 29, 2016 · 1 comment

@brooksdavis (Member) commented:

The immediate values on CLCs seem to be wrong in text instruction traces. For example, this trace shows an immediate of 481:

0x00000001200df674:  clc        c3,s8,481(c11)
    Cap Memory Read [0000007fffffc4c0] = v:1 tps:00000000ffff00fa
    c:0000007fffffc5c0 b:0000007fffffc5c0 l:0000000000000020
    Write C03|v:1 s:0 p:7fff807d b:0000007fffffc5c0 l:0000000000000020
             |o:0000000000000000 t:0

Objdump reports:

  1200df674:   d86bf080        clc     c3,s8,128(c11)

@staceyson (Collaborator) commented:

Fixed with dbb6810

arichardson pushed a commit to arichardson/qemu that referenced this issue Oct 20, 2017
"name" is freed after visiting options, instead use the first NetClientState
name. Adds a few assert() for clarifying and checking some impossible states.

READ of size 1 at 0x602000000990 thread T0
    #0 0x7f6b251c570c  (/lib64/libasan.so.2+0x4770c)
    #1 0x5566dc380600 in qemu_find_net_clients_except net/net.c:824
    #2 0x5566dc39bac7 in net_vhost_user_event net/vhost-user.c:193
    #3 0x5566dbee862a in qemu_chr_be_event /home/elmarco/src/qemu/qemu-char.c:201
    #4 0x5566dbef2890 in tcp_chr_disconnect /home/elmarco/src/qemu/qemu-char.c:2790
    #5 0x5566dbef2d0b in tcp_chr_sync_read /home/elmarco/src/qemu/qemu-char.c:2835
    #6 0x5566dbee8a99 in qemu_chr_fe_read_all /home/elmarco/src/qemu/qemu-char.c:295
    #7 0x5566dc39b964 in net_vhost_user_watch net/vhost-user.c:180
    #8 0x5566dc5a06c7 in qio_channel_fd_source_dispatch io/channel-watch.c:70
    #9 0x7f6b1aa2ab87 in g_main_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3154
    #10 0x7f6b1aa2b9cb in g_main_context_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3769
    #11 0x5566dc475ed4 in glib_pollfds_poll /home/elmarco/src/qemu/main-loop.c:212
    #12 0x5566dc476029 in os_host_main_loop_wait /home/elmarco/src/qemu/main-loop.c:257
    #13 0x5566dc476165 in main_loop_wait /home/elmarco/src/qemu/main-loop.c:505
    #14 0x5566dbf08d31 in main_loop /home/elmarco/src/qemu/vl.c:1932
    #15 0x5566dbf16783 in main /home/elmarco/src/qemu/vl.c:4646
    #16 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f)
    #17 0x5566dbbf5348 in _start (/home/elmarco/src/qemu/x86_64-softmmu/qemu-system-x86_64+0x3f9348)

0x602000000990 is located 0 bytes inside of 5-byte region [0x602000000990,0x602000000995)
freed by thread T0 here:
    #0 0x7f6b2521666a in __interceptor_free (/lib64/libasan.so.2+0x9866a)
    #1 0x7f6b1aa332a4 in g_free /home/elmarco/src/gnome/glib/glib/gmem.c:189
    #2 0x5566dc5f416f in qapi_dealloc_type_str qapi/qapi-dealloc-visitor.c:134
    #3 0x5566dc5f3268 in visit_type_str qapi/qapi-visit-core.c:196
    #4 0x5566dc5ced58 in visit_type_Netdev_fields /home/elmarco/src/qemu/qapi-visit.c:5936
    #5 0x5566dc5cef71 in visit_type_Netdev /home/elmarco/src/qemu/qapi-visit.c:5960
    #6 0x5566dc381a8d in net_visit net/net.c:1049
    #7 0x5566dc381c37 in net_client_init net/net.c:1076
    #8 0x5566dc3839e2 in net_init_netdev net/net.c:1473
    #9 0x5566dc63cc0a in qemu_opts_foreach util/qemu-option.c:1112
    #10 0x5566dc383b36 in net_init_clients net/net.c:1499
    #11 0x5566dbf15d86 in main /home/elmarco/src/qemu/vl.c:4397
    #12 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f)
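A very rough sketch of the direction "use the first NetClientState name" points in; the lookup helper and variable names here are purely illustrative, not taken from the actual patch:

```c
/* Sketch only: take the name from a NetClientState that outlives the options,
 * instead of the QAPI-owned "name" string freed by the dealloc visitor above. */
NetClientState *nc = qemu_find_netdev(netdev_id);   /* hypothetical lookup by id */
const char *name = nc->name;                        /* owned by the NetClientState */
```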

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Nov 22, 2017
"nc" is freed after hotplug vhost-user, but the watcher is not removed.
The QEMU crash when the watcher access the "nc" when socket disconnects.

    Program received signal SIGSEGV, Segmentation fault.
    #0  object_get_class (obj=obj@entry=0x2) at qom/object.c:750
    #1  0x00007f9bb4180da1 in qemu_chr_fe_disconnect (be=<optimized out>) at chardev/char-fe.c:372
    #2  0x00007f9bb40d1100 in net_vhost_user_watch (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at net/vhost-user.c:188
    #3  0x00007f9baf97f99a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #4  0x00007f9bb41d7ebc in glib_pollfds_poll () at util/main-loop.c:213
    #5  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
    #6  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
    #7  0x00007f9bb3e266a7 in main_loop () at vl.c:1917
    #8  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4786
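A minimal sketch of the teardown this implies, assuming net/vhost-user.c-style names (NetVhostUserState and a `watch` field holding the GSource id) and that the fix is to drop the chardev watcher before the NetClientState goes away:

```c
/* Sketch only: field and function names are assumed, not taken from the actual patch. */
static void net_vhost_user_cleanup(NetClientState *nc)
{
    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);

    if (s->watch) {
        g_source_remove(s->watch);  /* stop the GSource before "nc" is freed */
        s->watch = 0;
    }
    /* ... existing vhost/chardev teardown ... */
}
```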

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jun 18, 2018
The instruction "ucvtf v0.4h, v04h, #2", with input 0x8000u,
overflows the intermediate float16 to infinity before we have a
chance to scale the output.  Use float64 as the intermediate type
so that no input argument (uint32_t in this case) can overflow
or round before scaling.  Given the declared argument, the signed
int32_t function has the same problem.

When converting from float16 to integer, using u/int32_t instead
of u/int16_t means that the bounding is incorrect.

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180502221552.3873-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
arichardson pushed a commit that referenced this issue Jun 18, 2018
After f771c54 it is possible to select the device and
head to take a screendump from. Even though we check that the
provided head number falls within range, the console may still
have no surface yet, leading to SIGSEGV:

  qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 \
    -qmp stdio \
    -device virtio-vga,id=video0,max_outputs=4

  {"execute":"qmp_capabilities"}
  {"execute":"screendump", "arguments":{"filename":"/tmp/screen.ppm", "device":"video0", "head":1}}
  Segmentation fault

 #0  0x00005628249dda88 in ppm_save (filename=0x56282826cbc0 "/tmp/screen.ppm", ds=0x0, errp=0x7fff52a6fae0) at ui/console.c:304
 #1  0x00005628249ddd9b in qmp_screendump (filename=0x56282826cbc0 "/tmp/screen.ppm", has_device=true, device=0x5628276902d0 "video0", has_head=true, head=1, errp=0x7fff52a6fae0) at ui/console.c:375
 #2  0x00005628247740df in qmp_marshal_screendump (args=0x562828265e00, ret=0x7fff52a6fb68, errp=0x7fff52a6fb60) at qapi/qapi-commands-ui.c:110

Here, @ds from frame #0 (or @surface from frame #1) is
dereferenced at the very beginning of ppm_save(), and because
it is NULL the crash happens.
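A minimal sketch of the obvious guard, assuming QEMU's qemu_console_surface() helper and that the check belongs in qmp_screendump() (using the QemuConsole `con` looked up earlier) before ppm_save() is reached:

```c
/* Sketch only; not the exact upstream patch. */
DisplaySurface *surface = qemu_console_surface(con);
if (!surface) {
    error_setg(errp, "no surface for the requested console/head");
    return;
}
ppm_save(filename, surface, errp);
```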

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: cb05bb1909daa6ba62145c0194aafa05a14ed3d1.1526569138.git.mprivozn@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
arichardson pushed a commit that referenced this issue Jun 18, 2018
…nnect

When cancelling migration during RDMA precopy, the source QEMU main thread sometimes hangs.

The backtrace is:
    (gdb) bt
    #0  0x00007f249eabd43d in write () from /lib64/libpthread.so.0
    #1  0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
    #2  0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296
    #3  0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
    #4  0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273
    #5  0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98
    #6  0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334
    #7  0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162
    #8  0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
    #9  0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
    #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436
    #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0)
        at util/async.c:261
    #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215
    #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263
    #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522
    #16 0x00000000005bc6a5 in main_loop () at vl.c:1944
    #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752

Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect.

According to the IB spec, once the active side sends a DREQ message it should wait for a
DREP message, and only once that has arrived should it trigger a DISCONNECT event. The
DREP message can be dropped due to network issues.
For that case the spec defines a DREP_timeout state in the CM state machine: if the DREP is
dropped we should get a timeout, and a TIMEWAIT_EXIT event will be triggered.
Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state,
so in the above scenario we will get neither a DISCONNECT nor a TIMEWAIT_EXIT event.

So it should not invoke rdma_get_cm_event, which may hang forever; the event channel
is also destroyed in qemu_rdma_cleanup.
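A rough sketch of that idea in qemu_rdma_cleanup(), with field names assumed from migration/rdma.c; the point is simply that the disconnect path no longer blocks on the CM event:

```c
/* Sketch only: not the literal patch. */
if (rdma->cm_id && rdma->connected) {
    rdma_disconnect(rdma->cm_id);
    /* Do NOT wait via rdma_get_cm_event() here: with the missing DREP_timeout
     * handling described above, the DISCONNECTED event may never arrive and
     * the source main thread would hang forever. */
    rdma->connected = false;
}
```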

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson added a commit that referenced this issue Oct 1, 2018
…er!)

The baseline is e38b03f.

Benchmark #1: /auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128 --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub
  Time (mean ± σ):     29.330 s ±  0.627 s    [User: 26.381 s, System: 0.415 s]
  Range (min … max):   28.588 s … 30.587 s

Benchmark #2: /auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128.before.log --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub
  Time (mean ± σ):     35.656 s ±  0.603 s    [User: 32.812 s, System: 0.370 s]
  Range (min … max):   34.855 s … 36.412 s

Summary
  '/auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128 --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub' ran
    1.22 ± 0.03 times faster than '/auto/homes/alr48/devel/cheribuild/test-scripts/test_boot.py --kernel /local/scratch/alr48/cheri/output/kernel.CHERI128_MALTA64_MFS_ROOT --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-cheri128.before.log --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub'
arichardson pushed a commit that referenced this issue Jan 29, 2020
ivq/dvq/svq/free_page_vq are not cleaned up in
virtio_balloon_device_unrealize; the memory leak stack is as follows:

Direct leak of 14336 byte(s) in 2 object(s) allocated from:
    #0 0x7f99fd9d8560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
    #1 0x7f99fcb20015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
    #2 0x557d90638437 in virtio_add_queue hw/virtio/virtio.c:2327
    #3 0x557d9064401d in virtio_balloon_device_realize hw/virtio/virtio-balloon.c:793
    #4 0x557d906356f7 in virtio_device_realize hw/virtio/virtio.c:3504
    #5 0x557d9073f081 in device_set_realized hw/core/qdev.c:876
    #6 0x557d908b1f4d in property_set_bool qom/object.c:2080
    #7 0x557d908b655e in object_property_set_qobject qom/qom-qobject.c:26
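A minimal sketch of the kind of unrealize-side cleanup this points to, assuming the virtio_delete_queue() helper and the virtio-balloon field names implied by the trace:

```c
/* Sketch only; the actual patch may differ. */
static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
{
    VirtIOBalloon *s = VIRTIO_BALLOON(dev);

    /* ... existing teardown ... */
    virtio_delete_queue(s->ivq);
    virtio_delete_queue(s->dvq);
    virtio_delete_queue(s->svq);
    if (s->free_page_vq) {
        virtio_delete_queue(s->free_page_vq);
    }
    virtio_cleanup(VIRTIO_DEVICE(dev));
}
```

The same pattern applies to the virtio-serial, virtio-9p, virtio-scsi and vhost-vsock leaks reported further down in this thread.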

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <1575444716-17632-2-git-send-email-pannengyuan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
ivqs/ovqs/c_ivq/c_ovq are not cleaned up in
virtio_serial_device_unrealize; the memory leak stack is as below:

Direct leak of 1290240 byte(s) in 180 object(s) allocated from:
    #0 0x7fc9bfc27560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
    #1 0x7fc9bed6f015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
    #2 0x5650e02b83e7 in virtio_add_queue hw/virtio/virtio.c:2327
    #3 0x5650e02847b5 in virtio_serial_device_realize hw/char/virtio-serial-bus.c:1089
    #4 0x5650e02b56a7 in virtio_device_realize hw/virtio/virtio.c:3504
    #5 0x5650e03bf031 in device_set_realized hw/core/qdev.c:876
    #6 0x5650e0531efd in property_set_bool qom/object.c:2080
    #7 0x5650e053650e in object_property_set_qobject qom/qom-qobject.c:26
    #8 0x5650e0533e14 in object_property_set_bool qom/object.c:1338
    #9 0x5650e04c0e37 in virtio_pci_realize hw/virtio/virtio-pci.c:1801

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Amit Shah <amit@kernel.org>
Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1575444716-17632-3-git-send-email-pannengyuan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Currently the SLOF firmware for pseries guests will disable/re-enable
a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it tends to
work with a single device at a time at various stages like probing
and running block/network bootloaders without doing a full reset
in-between.

In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
corresponding IOMMU memory region, so DMA accesses (including to vring
fields like idx/flags) will no longer undergo the necessary
translation. Normally we wouldn't expect this to happen since it would
be misbehavior on the driver side to continue driving DMA requests.

However, in the case of pseries, with iommu_platform=on, we trigger the
following sequence when tearing down the virtio-blk dataplane ioeventfd
in response to the guest unsetting PCI_COMMAND_MASTER:

  #2  0x0000555555922651 in virtqueue_map_desc (vdev=vdev@entry=0x555556dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
  #3  0x0000555555922a89 in virtqueue_pop (vq=vq@entry=0x555556dc8660, sz=sz@entry=184)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
  #4  0x00005555558d3eca in virtio_blk_get_request (vq=0x555556dc8660, s=0x555556dbcfb0)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
  #5  0x00005555558d3eca in virtio_blk_handle_vq (s=0x555556dbcfb0, vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
  #6  0x000055555591dd66 in virtio_queue_notify_aio_vq (vq=vq@entry=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
  #7  0x000055555591ecef in virtio_queue_notify_aio_vq (vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
  #8  0x000055555591ecef in virtio_queue_host_notifier_aio_poll (opaque=0x555556dc86c8)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
  #9  0x0000555555d02164 in run_poll_handlers_once (ctx=ctx@entry=0x55555688bfc0, timeout=timeout@entry=0x7fffe65844a8)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:520
  #10 0x0000555555d02d1b in try_poll_mode (timeout=0x7fffe65844a8, ctx=0x55555688bfc0)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:607
  #11 0x0000555555d02d1b in aio_poll (ctx=ctx@entry=0x55555688bfc0, blocking=blocking@entry=true)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:639
  #12 0x0000555555d0004d in aio_wait_bh_oneshot (ctx=0x55555688bfc0, cb=cb@entry=0x5555558d5130 <virtio_blk_data_plane_stop_bh>, opaque=opaque@entry=0x555556de86f0)
      at /home/mdroth/w/qemu.git/util/aio-wait.c:71
  #13 0x00005555558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
  #14 0x0000555555b906a1 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
  #15 0x0000555555b90dbb in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
  #16 0x0000555555b92a8e in virtio_pci_stop_ioeventfd (proxy=0x555556db4e40)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
  #17 0x0000555555b92a8e in virtio_write_config (pci_dev=0x555556db4e40, address=<optimized out>, val=1048832, len=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613

I.e. the calling code is only scheduling a one-shot BH for
virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
an additional virtqueue entry before we get there. This is likely due
to the following check in virtio_queue_host_notifier_aio_poll:

  static bool virtio_queue_host_notifier_aio_poll(void *opaque)
  {
      EventNotifier *n = opaque;
      VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
      bool progress;

      if (!vq->vring.desc || virtio_queue_empty(vq)) {
          return false;
      }

      progress = virtio_queue_notify_aio_vq(vq);

namely the call to virtio_queue_empty(). In this case, since no new
requests have actually been issued, shadow_avail_idx == last_avail_idx,
so we actually try to access the vring via vring_avail_idx() to get
the latest non-shadowed idx:

  int virtio_queue_empty(VirtQueue *vq)
  {
      bool empty;
      ...

      if (vq->shadow_avail_idx != vq->last_avail_idx) {
          return 0;
      }

      rcu_read_lock();
      empty = vring_avail_idx(vq) == vq->last_avail_idx;
      rcu_read_unlock();
      return empty;

but since the IOMMU region has been disabled we get a bogus value (0
usually), which causes virtio_queue_empty() to falsely report that
there are entries to be processed, which causes errors such as:

  "virtio: zero sized buffers are not allowed"

or

  "virtio-blk missing headers"

and puts the device in an error state.

This patch works around the issue by introducing virtio_set_disabled(),
which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
when bus-mastering is disabled. Since we'd check this flag at all the
same sites as vdev->broken, we replace those checks with an inline
function which checks for either vdev->broken or vdev->disabled.
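A sketch of what such an inline check could look like; the helper and field names are assumptions, not quoted from the patch:

```c
/* Sketch only: combines the existing vdev->broken check with the new flag. */
static inline bool virtio_device_disabled(VirtIODevice *vdev)
{
    return unlikely(vdev->disabled || vdev->broken);
}
```

Call sites that previously tested only `vdev->broken` would use this helper instead, so paths like virtio_queue_empty() bail out while bus-mastering is disabled.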

The 'disabled' flag is only migrated when set, which should be fairly
rare, but to maintain migration compatibility we disable its use for
older machine types. Users requiring the use of the flag in conjunction
with older machine types can set it explicitly as a virtio-device
option.

NOTES:

 - This leaves some other oddities in play, like the fact that
   DRIVER_OK also gets unset in response to bus-mastering being
   disabled, but not restored (however the device seems to continue
   working)
 - Similarly, we disable the host notifier via
   virtio_bus_stop_ioeventfd(), which seems to move the handling out
   of virtio-blk dataplane and back into the main IO thread, and it
   ends up staying there till a reset (but otherwise continues working
   normally)

Cc: David Gibson <david@gibson.dropbear.id.au>,
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20191120005003.27035-1-mdroth@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
This avoids a memory leak when qom-set is called to set throttle_group
limits. Here is an easy way to reproduce it:

1. run qemu-iotests as follow and check the result with asan:
       ./check -qcow2 184

Following is the ASAN output backtrace:
Direct leak of 912 byte(s) in 3 object(s) allocated from:
    #0 0xffff8d7ab3c3 in __interceptor_calloc   (/lib64/libasan.so.4+0xd33c3)
    #1 0xffff8d4c31cb in g_malloc0   (/lib64/libglib-2.0.so.0+0x571cb)
    #2 0x190c857 in qobject_input_start_struct  /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
    #3 0x19070df in visit_start_struct   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
    #4 0x1948b87 in visit_type_ThrottleLimits   qapi/qapi-visit-block-core.c:3759
    #5 0x17e4aa3 in throttle_group_set_limits   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/block/throttle-groups.c:900
    #6 0x1650eff in object_property_set     /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/object.c:1272
    #7 0x1658517 in object_property_set_qobject   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qobject.c:26
    #8 0x15880bb in qmp_qom_set  /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qmp-cmds.c:74
    #9 0x157e3e3 in qmp_marshal_qom_set  qapi/qapi-commands-qom.c:154

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: PanNengyuan <pannengyuan@huawei.com>
Message-id: 1574835614-42028-1-git-send-email-pannengyuan@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
The accel_list is not freed; the ASAN output:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff919331cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
    #1 0xffff913f7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
    #2 0xffff91413d9b in g_strsplit (/lib64/libglib-2.0.so.0+0x73d9b)
    #3 0xaaab42fb58e7 in configure_accelerators /qemu/vl.c:2777
    #4 0xaaab42fb58e7 in main /qemu/vl.c:4121
    #5 0xffff8f9b0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
    #6 0xaaab42fc1dab  (/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x8b1dab)

Indirect leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0xffff919331cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
    #1 0xffff913f7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
    #2 0xffff9141243b in g_strdup (/lib64/libglib-2.0.so.0+0x7243b)
    #3 0xffff91413e6f in g_strsplit (/lib64/libglib-2.0.so.0+0x73e6f)
    #4 0xaaab42fb58e7 in configure_accelerators /qemu/vl.c:2777
    #5 0xaaab42fb58e7 in main /qemu/vl.c:4121
    #6 0xffff8f9b0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
    #7 0xaaab42fc1dab  (/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x8b1dab)
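A minimal sketch of the fix this suggests, using the GLib calls already visible in the trace (the surrounding configure_accelerators() code and the `accel_names` variable are assumptions):

```c
/* Sketch only. */
gchar **accel_list = g_strsplit(accel_names, ":", 0);

for (gchar **tmp = accel_list; tmp && *tmp; tmp++) {
    /* ... try to initialize each accelerator ... */
}
g_strfreev(accel_list);  /* frees the array and every string g_strsplit() duplicated */
```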

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200108114207.58084-1-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Start a VM with libvirt; when the guest OS is running, enter the poweroff
command using the xhci keyboard. ASAN then shows the memory leak stack:

Direct leak of 80 byte(s) in 5 object(s) allocated from:
    #0 0xfffd1e6431cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
    #1 0xfffd1e107163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
    #2 0xaaad39051367 in qemu_sglist_init /qemu/dma-helpers.c:43
    #3 0xaaad3947c407 in pci_dma_sglist_init /qemu/include/hw/pci/pci.h:842
    #4 0xaaad3947c407 in xhci_xfer_create_sgl /qemu/hw/usb/hcd-xhci.c:1446
    #5 0xaaad3947c407 in xhci_setup_packet /qemu/hw/usb/hcd-xhci.c:1618
    #6 0xaaad3948625f in xhci_submit /qemu/hw/usb/hcd-xhci.c:1827
    #7 0xaaad3948625f in xhci_fire_transfer /qemu/hw/usb/hcd-xhci.c:1839
    #8 0xaaad3948625f in xhci_kick_epctx /qemu/hw/usb/hcd-xhci.c:1991
    #9 0xaaad3948f537 in xhci_doorbell_write /qemu/hw/usb/hcd-xhci.c:3158
    #10 0xaaad38bcbfc7 in memory_region_write_accessor /qemu/memory.c:483
    #11 0xaaad38bc654f in access_with_adjusted_size /qemu/memory.c:544
    #12 0xaaad38bd1877 in memory_region_dispatch_write /qemu/memory.c:1482
    #13 0xaaad38b1c77f in flatview_write_continue /qemu/exec.c:3167
    #14 0xaaad38b1ca83 in flatview_write /qemu/exec.c:3207
    #15 0xaaad38b268db in address_space_write /qemu/exec.c:3297
    #16 0xaaad38bf909b in kvm_cpu_exec /qemu/accel/kvm/kvm-all.c:2383
    #17 0xaaad38bb063f in qemu_kvm_cpu_thread_fn /qemu/cpus.c:1246
    #18 0xaaad39821c93 in qemu_thread_start /qemu/util/qemu-thread-posix.c:519
    #19 0xfffd1c8378bb  (/lib64/libpthread.so.0+0x78bb)
    #20 0xfffd1c77616b  (/lib64/libc.so.6+0xd616b)
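A rough sketch of the matching cleanup, assuming the QEMUSGList built in xhci_xfer_create_sgl() lives in the transfer object and that qemu_sglist_destroy() is the counterpart to the qemu_sglist_init() shown in the trace:

```c
/* Sketch only: the helper name and fields are assumed. */
static void xhci_xfer_unmap(XHCITransfer *xfer)
{
    usb_packet_unmap(&xfer->packet, &xfer->sgl);
    qemu_sglist_destroy(&xfer->sgl);  /* releases what qemu_sglist_init() allocated */
}
```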

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Message-id: 20200110105855.81144-1-kuhn.chenqun@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
One multifd channel will shut down all the other multifd channels' IOChannels when it
fails to receive an IOChannel. In this scenario, if some multifd channel had not
received its IOChannel yet, it would try to shut down a still-NULL IOChannel, which
could cause a null-pointer access in qio_channel_shutdown.

Here is the coredump stack:
    #0  object_get_class (obj=obj@entry=0x0) at qom/object.c:908
    #1  0x00005563fdbb8f4a in qio_channel_shutdown (ioc=0x0, how=QIO_CHANNEL_SHUTDOWN_BOTH, errp=0x0) at io/channel.c:355
    #2  0x00005563fd7b4c5f in multifd_recv_terminate_threads (err=<optimized out>) at migration/ram.c:1280
    #3  0x00005563fd7bc019 in multifd_recv_new_channel (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce00) at migration/ram.c:1478
    #4  0x00005563fda82177 in migration_ioc_process_incoming (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce30) at migration/migration.c:605
    #5  0x00005563fda8567d in migration_channel_process_incoming (ioc=0x556400255610) at migration/channel.c:44
    #6  0x00005563fda83ee0 in socket_accept_incoming_migration (listener=0x5563fff6b920, cioc=0x556400255610, opaque=<optimized out>) at migration/socket.c:166
    #7  0x00005563fdbc25cd in qio_net_listener_channel_func (ioc=<optimized out>, condition=<optimized out>, opaque=<optimized out>) at io/net-listener.c:54
    #8  0x00007f895b6fe9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #9  0x00005563fdc18136 in glib_pollfds_poll () at util/main-loop.c:218
    #10 0x00005563fdc181b5 in os_host_main_loop_wait (timeout=1000000000) at util/main-loop.c:241
    #11 0x00005563fdc183a2 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517
    #12 0x00005563fd8edb37 in main_loop () at vl.c:1791
    #13 0x00005563fd74fd45 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4473

To fix it up, let's check p->c before calling qio_channel_shutdown.
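A minimal sketch of that guard in the terminate-threads loop (structure and field names assumed from migration/ram.c):

```c
/* Sketch only. */
for (i = 0; i < migrate_multifd_channels(); i++) {
    MultiFDRecvParams *p = &multifd_recv_state->params[i];

    qemu_mutex_lock(&p->mutex);
    p->quit = true;
    if (p->c) {  /* this channel may never have received its IOChannel */
        qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
    }
    qemu_mutex_unlock(&p->mutex);
}
```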

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
…threads

One multifd thread will lock all the other multifds' IOChannel mutexes to inform them
to quit, by setting p->quit or shutting down p->c. In this scenario, if some
multifds had already been terminated and multifd_load_cleanup/multifd_save_cleanup
had destroyed their mutexes, it could cause destroyed-mutex access when trying
to lock them.

Here is the coredump stack:
    #0  0x00007f81a2794437 in raise () from /usr/lib64/libc.so.6
    #1  0x00007f81a2795b28 in abort () from /usr/lib64/libc.so.6
    #2  0x00007f81a278d1b6 in __assert_fail_base () from /usr/lib64/libc.so.6
    #3  0x00007f81a278d262 in __assert_fail () from /usr/lib64/libc.so.6
    #4  0x000055eb1bfadbd3 in qemu_mutex_lock_impl (mutex=0x55eb1e2d1988, file=<optimized out>, line=<optimized out>) at util/qemu-thread-posix.c:64
    #5  0x000055eb1bb4564a in multifd_send_terminate_threads (err=<optimized out>) at migration/ram.c:1015
    #6  0x000055eb1bb4bb7f in multifd_send_thread (opaque=0x55eb1e2d19f8) at migration/ram.c:1171
    #7  0x000055eb1bfad628 in qemu_thread_start (args=0x55eb1e170450) at util/qemu-thread-posix.c:502
    #8  0x00007f81a2b36df5 in start_thread () from /usr/lib64/libpthread.so.0
    #9  0x00007f81a286048d in clone () from /usr/lib64/libc.so.6

To fix it up, let's destroy the mutex only after all the other multifd threads have
been terminated.
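A sketch of the resulting cleanup ordering (per-channel field names assumed from migration/ram.c):

```c
/* Sketch only: join first, destroy afterwards. */
for (i = 0; i < migrate_multifd_channels(); i++) {
    MultiFDSendParams *p = &multifd_send_state->params[i];

    if (p->running) {
        qemu_thread_join(&p->thread);   /* no thread can touch the mutex after this */
    }
    qemu_mutex_destroy(&p->mutex);
    qemu_sem_destroy(&p->sem);
}
```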

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
v->vq is not cleaned up in virtio_9p_device_unrealize; the memory leak
stack is as follows:

Direct leak of 14336 byte(s) in 2 object(s) allocated from:
  #0 0x7f819ae43970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f819872f49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x55a3a58da624 (./x86_64-softmmu/qemu-system-x86_64+0x2c14624)  /mnt/sdb/qemu/hw/virtio/virtio.c:2327
  #3 0x55a3a571bac7 (./x86_64-softmmu/qemu-system-x86_64+0x2a55ac7)  /mnt/sdb/qemu/hw/9pfs/virtio-9p-device.c:209
  #4 0x55a3a58e7bc6 (./x86_64-softmmu/qemu-system-x86_64+0x2c21bc6)  /mnt/sdb/qemu/hw/virtio/virtio.c:3504
  #5 0x55a3a5ebfb37 (./x86_64-softmmu/qemu-system-x86_64+0x31f9b37)  /mnt/sdb/qemu/hw/core/qdev.c:876

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117060927.51996-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Acked-by: Greg Kurz <groug@kaod.org>
arichardson pushed a commit that referenced this issue Jan 29, 2020
This patch fixes memleaks when attaching/detaching a virtio-scsi device; the
memory leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912
  #4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924
  #5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224)  /mnt/sdb/qemu/hw/core/qdev.c:865

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117075547.60864-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Receive/transmit/event vqs are not cleaned up in vhost_vsock_unrealize. This
patch saves the receive/transmit vq pointers in realize() and cleans up the
vqs through those pointers in unrealize(). The leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f86a1356970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f86a09aa49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x5604852f85ca (./x86_64-softmmu/qemu-system-x86_64+0x2c3e5ca)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x560485356208 (./x86_64-softmmu/qemu-system-x86_64+0x2c9c208)  /mnt/sdb/qemu/hw/virtio/vhost-vsock.c:339
  #4 0x560485305a17 (./x86_64-softmmu/qemu-system-x86_64+0x2c4ba17)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #5 0x5604858e6b65 (./x86_64-softmmu/qemu-system-x86_64+0x322cb65)  /mnt/sdb/qemu/hw/core/qdev.c:865
  #6 0x5604861e6c41 (./x86_64-softmmu/qemu-system-x86_64+0x3b2cc41)  /mnt/sdb/qemu/qom/object.c:2102

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200115062535.50644-1-pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
When adding new devices implementing QOM interfaces, we might
forget to add the Kconfig dependency that pulls the required
objects in when building.

Since QOM dependencies are resolved at runtime, we don't get any
link-time failures, and QEMU aborts while starting:

  $ qemu ...
  Segmentation fault (core dumped)

  (gdb) bt
  #0  0x00007ff6e96b1e35 in raise () from /lib64/libc.so.6
  #1  0x00007ff6e969c895 in abort () from /lib64/libc.so.6
  #2  0x00005572bc5051cf in type_initialize (ti=0x5572be6f1200) at qom/object.c:323
  #3  0x00005572bc505074 in type_initialize (ti=0x5572be6f1800) at qom/object.c:301
  #4  0x00005572bc505074 in type_initialize (ti=0x5572be6e48e0) at qom/object.c:301
  #5  0x00005572bc506939 in object_class_by_name (typename=0x5572bc56109a) at qom/object.c:959
  #6  0x00005572bc503dd5 in cpu_class_by_name (typename=0x5572bc56109a, cpu_model=0x5572be6d9930) at hw/core/cpu.c:286

Since the caller has access to the qdev parent/interface names,
we can simply display them to avoid starting a debugger:

  $ qemu ...
  qemu: missing interface 'fancy-if' for object 'fancy-dev'
  Aborted (core dumped)

This commit is similar to e02bdf1 ("Display more helpful message
when an object type is missing").
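A sketch of the friendlier failure mode described above; the exact location and variable names inside qom/object.c's type_initialize() are assumptions:

```c
/* Sketch only. */
TypeImpl *iface = type_get_by_name(iface_name);
if (!iface) {
    error_report("missing interface '%s' for object '%s'",
                 iface_name, parent_name);
    abort();
}
```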

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200118162348.17823-1-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
This patch fixes memleaks when attaching/detaching a virtio-scsi device; the
memory leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912
  #4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924
  #5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224)  /mnt/sdb/qemu/hw/core/qdev.c:865

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200117075547.60864-2-pannengyuan@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
All paths that lead to bdrv_backup_top_drop(), except for the call
from backup_clean(), imply that the BDS AioContext has already been
acquired, so doing it there too can potentially lead to QEMU hanging
on AIO_WAIT_WHILE().

An easy way to trigger this situation is by issuing a two actions
transaction, with a proper and a bogus blockdev-backup, so the second
one will trigger a rollback. This will trigger a hang with an stack
trace like this one:

 #0  0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>,
     timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
 #1  0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
     at /usr/include/bits/poll2.h:77
 #2  0x000055e743386e09 in qemu_poll_ns
     (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336
 #3  0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true)
     at util/aio-posix.c:669
 #4  0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878
 #5  0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017
 #6  0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262
 #7  0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644
 #8  0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273
 #9  0x000055e74331461f in backup_job_create
     (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478
 #10 0x000055e74315bc52 in do_backup_common
     (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at blockdev.c:3580
 #11 0x000055e74315c37c in do_blockdev_backup
     (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492
 #12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018)
     at blockdev.c:1885
 #13 0x000055e743160152 in qmp_transaction
     (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340
 #14 0x000055e743287ff5 in qmp_marshal_transaction
     (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8)
     at qapi/qapi-commands-transaction.c:44
 #15 0x000055e74333de6c in do_qmp_dispatch
     (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #16 0x000055e74333de6c in qmp_dispatch
     (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>)
     at qapi/qmp-dispatch.c:175
 #17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>)
     at monitor/qmp.c:145
 #18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
 #19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117
 #20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117
 #21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459
 #22 0x000055e743385742 in aio_ctx_dispatch
     (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176
 #24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829
 #25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219
 #26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
 #27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
 #28 0x000055e74316a3c1 in main_loop () at vl.c:1828
 #29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
     at vl.c:4504

Fix this by not acquiring the AioContext there, and ensuring all paths
leading to it have it already acquired (backup_clean()).
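A sketch of the caller-side half of that change, assuming block/backup.c's backup_clean() and the usual AioContext helpers; it only illustrates "acquire in the one caller that didn't, instead of inside bdrv_backup_top_drop()":

```c
/* Sketch only. */
static void backup_clean(Job *job)
{
    BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);

    aio_context_acquire(aio_context);
    bdrv_backup_top_drop(s->backup_top);  /* no longer acquires the context itself */
    aio_context_release(aio_context);
}
```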

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
Dirty map addition and removal functions are not acquiring the BDS
AioContext, while they may call code that expects it to be
acquired.

This may trigger a crash with a stack trace like this one:

 #0  0x00007f0ef146370f in __GI_raise (sig=sig@entry=6)
     at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007f0ef144db25 in __GI_abort () at abort.c:79
 #2  0x0000565022294dce in error_exit
     (err=<optimized out>, msg=msg@entry=0x56502243a730 <__func__.16350> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
 #3  0x00005650222950ba in qemu_mutex_unlock_impl
     (mutex=mutex@entry=0x5650244b0240, file=file@entry=0x565022439adf "util/async.c", line=line@entry=526) at util/qemu-thread-posix.c:108
 #4  0x0000565022290029 in aio_context_release
     (ctx=ctx@entry=0x5650244b01e0) at util/async.c:526
 #5  0x000056502221cd08 in bdrv_can_store_new_dirty_bitmap
     (bs=bs@entry=0x5650244dc820, name=name@entry=0x56502481d360 "bitmap1", granularity=granularity@entry=65536, errp=errp@entry=0x7fff22831718)
     at block/dirty-bitmap.c:542
 #6  0x000056502206ae53 in qmp_block_dirty_bitmap_add
     (errp=0x7fff22831718, disabled=false, has_disabled=<optimized out>, persistent=<optimized out>, has_persistent=true, granularity=65536, has_granularity=<optimized out>, name=0x56502481d360 "bitmap1", node=<optimized out>) at blockdev.c:2894
 #7  0x000056502206ae53 in qmp_block_dirty_bitmap_add
     (node=<optimized out>, name=0x56502481d360 "bitmap1", has_granularity=<optimized out>, granularity=<optimized out>, has_persistent=true, persistent=<optimized out>, has_disabled=false, disabled=false, errp=0x7fff22831718) at blockdev.c:2856
 #8  0x00005650221847a3 in qmp_marshal_block_dirty_bitmap_add
     (args=<optimized out>, ret=<optimized out>, errp=0x7fff22831798)
     at qapi/qapi-commands-block-core.c:651
 #9  0x0000565022247e6c in do_qmp_dispatch
     (errp=0x7fff22831790, allow_oob=<optimized out>, request=<optimized out>, cmds=0x565022b32d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #10 0x0000565022247e6c in qmp_dispatch
     (cmds=0x565022b32d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
 #11 0x0000565022166061 in monitor_qmp_dispatch
     (mon=0x56502450faa0, req=<optimized out>) at monitor/qmp.c:145
 #12 0x00005650221666fa in monitor_qmp_bh_dispatcher
     (data=<optimized out>) at monitor/qmp.c:234
 #13 0x000056502228f866 in aio_bh_call (bh=0x56502440eae0)
     at util/async.c:117
 #14 0x000056502228f866 in aio_bh_poll (ctx=ctx@entry=0x56502440d7a0)
     at util/async.c:117
 #15 0x0000565022292c54 in aio_dispatch (ctx=0x56502440d7a0)
     at util/aio-posix.c:459
 #16 0x000056502228f742 in aio_ctx_dispatch
     (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #17 0x00007f0ef5ce667d in g_main_dispatch (context=0x56502449aa40)
     at gmain.c:3176
 #18 0x00007f0ef5ce667d in g_main_context_dispatch
     (context=context@entry=0x56502449aa40) at gmain.c:3829
 #19 0x0000565022291d08 in glib_pollfds_poll () at util/main-loop.c:219
 #20 0x0000565022291d08 in os_host_main_loop_wait
     (timeout=<optimized out>) at util/main-loop.c:242
 #21 0x0000565022291d08 in main_loop_wait (nonblocking=<optimized out>)
     at util/main-loop.c:518
 #22 0x00005650220743c1 in main_loop () at vl.c:1828
 #23 0x0000565021f20a72 in main
     (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
     at vl.c:4504

Fix this by acquiring the AioContext at qmp_block_dirty_bitmap_add()
and at the corresponding bitmap-removal function.
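A minimal sketch of the pattern in the add path (blockdev.c structure assumed):

```c
/* Sketch only. */
AioContext *aio_context = bdrv_get_aio_context(bs);

aio_context_acquire(aio_context);
if (!bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp)) {
    aio_context_release(aio_context);
    return;
}
bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
aio_context_release(aio_context);
```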

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782175
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
arichardson pushed a commit that referenced this issue Jan 29, 2020
external_snapshot_abort() calls bdrv_set_backing_hd(), which
returns state->old_bs to the main AioContext, as it's intended to be
used when the BDS is going to be released. As that's not the case when
aborting an external snapshot, return it to the AioContext it was in
before the call.
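A rough sketch of the restore step; the helper name is an assumption used purely for illustration:

```c
/* Sketch only. */
AioContext *tmp_context = bdrv_get_aio_context(state->old_bs);

/* ... bdrv_set_backing_hd()/bdrv_replace_node() may move old_bs to the main context ... */

if (bdrv_get_aio_context(state->old_bs) != tmp_context) {
    bdrv_try_set_aio_context(state->old_bs, tmp_context, NULL);
}
```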

This issue can be triggered by issuing a transaction with two actions,
a proper blockdev-snapshot-sync and a bogus one, so the second will
trigger a transaction abort. This results in a crash with a stack
trace like this one:

 #0  0x00007fa1048b28df in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007fa10489ccf5 in __GI_abort () at abort.c:79
 #2  0x00007fa10489cbc9 in __assert_fail_base
     (fmt=0x7fa104a03300 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=0x557224014d30 "block.c", line=2240, function=<optimized out>) at assert.c:92
 #3  0x00007fa1048aae96 in __GI___assert_fail
     (assertion=assertion@entry=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=file@entry=0x557224014d30 "block.c", line=line@entry=2240, function=function@entry=0x5572240b5d60 <__PRETTY_FUNCTION__.31620> "bdrv_replace_child_noperm") at assert.c:101
 #4  0x0000557223e631f8 in bdrv_replace_child_noperm (child=0x557225b9c980, new_bs=new_bs@entry=0x557225c42e40) at block.c:2240
 #5  0x0000557223e68be7 in bdrv_replace_node (from=0x557226951a60, to=0x557225c42e40, errp=0x5572247d6138 <error_abort>) at block.c:4196
 #6  0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1731
 #7  0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1717
 #8  0x0000557223d09013 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x557225cc7d70, errp=errp@entry=0x7ffe704c0c98) at blockdev.c:2360
 #9  0x0000557223e32085 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffe704c0d08) at qapi/qapi-commands-transaction.c:44
 #10 0x0000557223ee798c in do_qmp_dispatch (errp=0x7ffe704c0d00, allow_oob=<optimized out>, request=<optimized out>, cmds=0x5572247d3cc0 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #11 0x0000557223ee798c in qmp_dispatch (cmds=0x5572247d3cc0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
 #12 0x0000557223e06141 in monitor_qmp_dispatch (mon=0x557225c69ff0, req=<optimized out>) at monitor/qmp.c:120
 #13 0x0000557223e0678a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209
 #14 0x0000557223f2f366 in aio_bh_call (bh=0x557225b9dc60) at util/async.c:117
 #15 0x0000557223f2f366 in aio_bh_poll (ctx=ctx@entry=0x557225b9c840) at util/async.c:117
 #16 0x0000557223f32754 in aio_dispatch (ctx=0x557225b9c840) at util/aio-posix.c:459
 #17 0x0000557223f2f242 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #18 0x00007fa10913467d in g_main_dispatch (context=0x557225c28e80) at gmain.c:3176
 #19 0x00007fa10913467d in g_main_context_dispatch (context=context@entry=0x557225c28e80) at gmain.c:3829
 #20 0x0000557223f31808 in glib_pollfds_poll () at util/main-loop.c:219
 #21 0x0000557223f31808 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
 #22 0x0000557223f31808 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
 #23 0x0000557223d13201 in main_loop () at vl.c:1828
 #24 0x0000557223bbfb82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1779036
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
arichardson pushed a commit that referenced this issue Feb 2, 2020
If the multifd_send_threads are not created when migration fails,
multifd_save_cleanup would be called twice. In this scenario, the
multifd_send_state is accessed after it has been released; the result
is that the source VM crashes.

Here is the coredump stack:
    Program received signal SIGSEGV, Segmentation fault.
    0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012
    1012            MultiFDSendParams *p = &multifd_send_state->params[i];
    #0  0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012
    #1  0x00005629333ab8a9 in multifd_save_cleanup () at migration/ram.c:1028
    #2  0x00005629333abaea in multifd_new_send_channel_async (task=0x562935450e70, opaque=<optimized out>) at migration/ram.c:1202
    #3  0x000056293373a562 in qio_task_complete (task=task@entry=0x562935450e70) at io/task.c:196
    #4  0x000056293373a6e0 in qio_task_thread_result (opaque=0x562935450e70) at io/task.c:111
    #5  0x00007f475d4d75a7 in g_idle_dispatch () from /usr/lib64/libglib-2.0.so.0
    #6  0x00007f475d4da9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #7  0x0000562933785b33 in glib_pollfds_poll () at util/main-loop.c:219
    #8  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
    #9  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
    #10 0x00005629334c5acf in main_loop () at vl.c:1810
    #11 0x000056293334d7bb in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4471

So, if the multifd_send_threads are not created when migration fails,
don't call multifd_save_cleanup in multifd_new_send_channel_async.

Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
arichardson pushed a commit that referenced this issue Feb 11, 2020
It's not a big deal, but 'check qtest-ppc/ppc64' runs fail if sanitizers are enabled.
The memory leak stack is as follows:

Direct leak of 128 byte(s) in 4 object(s) allocated from:
    #0 0x7f11756f5970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f1174f2549d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x556af05aa7da in mm_fw_cfg_init /mnt/sdb/qemu/tests/libqos/fw_cfg.c:119
    #3 0x556af059f4f5 in read_boot_order_pmac /mnt/sdb/qemu/tests/boot-order-test.c:137
    #4 0x556af059efe2 in test_a_boot_order /mnt/sdb/qemu/tests/boot-order-test.c:47
    #5 0x556af059f2c0 in test_boot_orders /mnt/sdb/qemu/tests/boot-order-test.c:59
    #6 0x556af059f52d in test_pmac_oldworld_boot_order /mnt/sdb/qemu/tests/boot-order-test.c:152
    #7 0x7f1174f46cb9  (/lib64/libglib-2.0.so.0+0x73cb9)
    #8 0x7f1174f46b73  (/lib64/libglib-2.0.so.0+0x73b73)
    #9 0x7f1174f46b73  (/lib64/libglib-2.0.so.0+0x73b73)
    #10 0x7f1174f46f71 in g_test_run_suite (/lib64/libglib-2.0.so.0+0x73f71)
    #11 0x7f1174f46f94 in g_test_run (/lib64/libglib-2.0.so.0+0x73f94)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200203025935.36228-1-pannengyuan@huawei.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
arichardson pushed a commit that referenced this issue Feb 11, 2020
When removing dup_fd in the monitor_fdset_dup_fd_find_remove function,
we need to free mon_fdset_fd_dup. ASAN shows the memory leak stack:

Direct leak of 96 byte(s) in 3 object(s) allocated from:
    #0 0xfffd37b033b3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33b3)
    #1 0xfffd375c71cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb)
    #2 0xaaae25bf1c17 in monitor_fdset_dup_fd_add /qemu/monitor/misc.c:1724
    #3 0xaaae265cfd8f in qemu_open /qemu/util/osdep.c:315
    #4 0xaaae264e2b2b in qmp_chardev_open_file_source /qemu/chardev/char-fd.c:122
    #5 0xaaae264e47cf in qmp_chardev_open_file /qemu/chardev/char-file.c:81
    #6 0xaaae264e118b in qemu_char_open /qemu/chardev/char.c:237
    #7 0xaaae264e118b in qemu_chardev_new /qemu/chardev/char.c:964
    #8 0xaaae264e1543 in qemu_chr_new_from_opts /qemu/chardev/char.c:680
    #9 0xaaae25e12e0f in chardev_init_func /qemu/vl.c:2083
    #10 0xaaae26603823 in qemu_opts_foreach /qemu/util/qemu-option.c:1170
    #11 0xaaae258c9787 in main /qemu/vl.c:4089
    #12 0xfffd35b80b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
    #13 0xaaae258d7b63  (/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x8b7b63)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200115072016.167252-1-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
arichardson pushed a commit that referenced this issue Feb 11, 2020
If we call the QMP 'query-block' while QEMU is working on
'block-commit', it will cause memleaks; the memory leak stack is as
follows:

Indirect leak of 12360 byte(s) in 3 object(s) allocated from:
    #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29
    #3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427
    #4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #6 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #7 0x55ea958818ea in bdrv_block_device_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:56
    #8 0x55ea958879de in bdrv_query_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:392
    #9 0x55ea9588b58f in qmp_query_block /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:578
    #10 0x55ea95567392 in qmp_marshal_query_block qapi/qapi-commands-block-core.c:95

Indirect leak of 4120 byte(s) in 1 object(s) allocated from:
    #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29
    #3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427
    #4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
    #6 0x55ea9569f301 in bdrv_backing_attach /mnt/sdb/qemu-4.2.0-rc0/block.c:1064
    #7 0x55ea956a99dd in bdrv_replace_child_noperm /mnt/sdb/qemu-4.2.0-rc0/block.c:2283
    #8 0x55ea956b9b53 in bdrv_replace_node /mnt/sdb/qemu-4.2.0-rc0/block.c:4196
    #9 0x55ea956b9e49 in bdrv_append /mnt/sdb/qemu-4.2.0-rc0/block.c:4236
    #10 0x55ea958c3472 in commit_start /mnt/sdb/qemu-4.2.0-rc0/block/commit.c:306
    #11 0x55ea94b68ab0 in qmp_block_commit /mnt/sdb/qemu-4.2.0-rc0/blockdev.c:3459
    #12 0x55ea9556a7a7 in qmp_marshal_block_commit qapi/qapi-commands-block-core.c:407

Fixes: bb808d5
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-id: 20200116085600.24056-1-pannengyuan@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
trasz pushed a commit that referenced this issue Feb 28, 2020
gtk_widget_get_window() returns NULL if the widget's window is not
realized, and QEMU crashes. Example under gtk 3.22.30 (mate 1.20.1):

  qemu-system-x86_64: Gdk: gdk_window_get_origin: assertion 'GDK_IS_WINDOW (window)' failed
  (gdb) bt
  #0  0x00007ffff496cf70 in gdk_window_get_origin () from /usr/lib64/libgdk-3.so.0
  #1  0x00007ffff49582a0 in gdk_display_get_monitor_at_window () from /usr/lib64/libgdk-3.so.0
  #2  0x0000555555bb73e2 in gd_refresh_rate_millihz (window=0x5555579d6280) at ui/gtk.c:1973
  #3  gd_vc_gfx_init (view_menu=0x5555579f0590, group=0x0, idx=0, con=<optimized out>, vc=0x5555579d4a90, s=0x5555579d49f0) at ui/gtk.c:2048
  #4  gd_create_menu_view (s=0x5555579d49f0) at ui/gtk.c:2149
  #5  gd_create_menus (s=0x5555579d49f0) at ui/gtk.c:2188
  #6  gtk_display_init (ds=<optimized out>, opts=0x55555661ed80 <dpy>) at ui/gtk.c:2256
  #7  0x000055555583d5a0 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4358

Fixes: c4c0092 and 28b58f1 (display/gtk: get proper refreshrate)
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Jan Kiszka <jan.kiszka@web.de>
Message-id: 20200208161048.11311-3-f4bug@amsat.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
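A hedged sketch of the kind of guard that avoids this crash: check the GdkWindow before asking GDK for the monitor, and fall back to a default rate while the widget is not yet realized. The function below is only loosely modeled on ui/gtk.c and is not the actual patch.

```
/* Illustrative guard against an unrealized widget; uses only GTK3/GDK3
 * functions (gtk_widget_get_window, gdk_window_get_display,
 * gdk_display_get_monitor_at_window, gdk_monitor_get_refresh_rate). */
#include <gtk/gtk.h>

static int refresh_rate_millihz_or_default(GtkWidget *widget, int fallback)
{
    GdkWindow *win = gtk_widget_get_window(widget);

    if (win == NULL) {
        /* Widget not realized yet: gdk_display_get_monitor_at_window()
         * would trip its assertion, so bail out early instead. */
        return fallback;
    }

    GdkDisplay *dpy = gdk_window_get_display(win);
    GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win);
    return monitor ? gdk_monitor_get_refresh_rate(monitor) : fallback;
}
```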
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
These functions have assertions that are always enabled (even outside debug
builds), and those assertions show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
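For reference, a standalone sketch of what "the same operation without the assertion" looks like. The bodies mirror the documented semantics of QEMU's extract64()/deposit64() from bitops.h; the *_noassert names are made up for this illustration.

```
/* Assertion-free variants of the 64-bit field helpers; the hot path pays
 * only for the shifts and masks, not for an assert() on every call. */
#include <stdint.h>

static inline uint64_t extract64_noassert(uint64_t value, int start, int length)
{
    /* Caller guarantees 0 <= start and 0 < length <= 64 - start. */
    return (value >> start) & (~0ULL >> (64 - length));
}

static inline uint64_t deposit64_noassert(uint64_t value, int start, int length,
                                          uint64_t fieldval)
{
    uint64_t mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}
```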
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. Additionally, QEMU's deposit64() and extract64() have an
assertion that is evaluated on every call (this shows up in `perf report` traces).

This is another 1.08 times faster than CTSRD-CHERI#153
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -m 5
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):     10.644 s ±  0.051 s    [User: 10.203 s, System: 0.122 s]
  Range (min … max):   10.586 s … 10.701 s    5 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.852 s ±  0.091 s    [User: 9.159 s, System: 0.154 s]
  Range (min … max):    9.780 s …  9.965 s    5 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.08 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
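A hedged, standalone sketch of the trade-off being described: tracking one flag per register as a packed bit field (a shift, a mask and, with deposit64/extract64, an assertion per access) versus spending 32 bytes on a plain array that is indexed directly. The regs_packed/regs_bytes names are invented for illustration, not QEMU's.

```
#include <stdint.h>
#include <stdbool.h>

/* Packed representation: one bit per register in a 64-bit word.
 * Every access costs bit twiddling on the hot path. */
static inline bool packed_get(uint64_t regs_packed, unsigned regnum)
{
    return (regs_packed >> regnum) & 1;
}

static inline uint64_t packed_set(uint64_t regs_packed, unsigned regnum, bool v)
{
    uint64_t mask = UINT64_C(1) << regnum;
    return v ? (regs_packed | mask) : (regs_packed & ~mask);
}

/* Expanded representation: one byte per register (32 bytes for 32 registers).
 * Reads and writes become plain loads/stores, which is cheaper per access
 * even though it uses more memory. */
static inline bool bytes_get(const uint8_t regs_bytes[32], unsigned regnum)
{
    return regs_bytes[regnum];
}

static inline void bytes_set(uint8_t regs_bytes[32], unsigned regnum, bool v)
{
    regs_bytes[regnum] = v;
}
```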
arichardson added a commit that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. This results in a 1.21x speedup running the MFS_ROOT
kernel and a 1.08x speedup for the full purecap kernel+purecap userspace
boot. Benchmarks using a previous version of this patch indicated a 1.08x
speedup for the MFS_ROOT case. My assumption is that removing the TCG
global accesses to cpu_capreg_state now has a larger impact after
0c09763 removed the incorrect NO_RWG flag.

MFS_ROOT boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -w 1
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.499 s ±  0.027 s    [User: 8.581 s, System: 0.136 s]
  Range (min … max):    9.448 s …  9.539 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.281 s ±  0.029 s    [User: 8.260 s, System: 0.137 s]
  Range (min … max):    9.234 s …  9.326 s    10 runs

Benchmark #3: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      7.852 s ±  0.053 s    [User: 7.523 s, System: 0.182 s]
  Range (min … max):    7.793 s …  7.933 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.18 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
    1.21 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
Full purecap+purecap boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     227.351 s ±  0.484 s    [User: 213.193 s, System: 3.544 s]
  Range (min … max):   226.792 s … 227.646 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.156 s ±  3.601 s    [User: 197.448 s, System: 1.979 s]
  Range (min … max):   206.397 s … 213.575 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.08 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
These functions have assertions that are always enabled (even outside debug
builds), and those assertions show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit to arichardson/qemu that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. Additionally, QEMU's deposit64() and extract64() have an
assertion that is evaluated on every call (this shows up in `perf report` traces).

This is another 1.08 times faster than CTSRD-CHERI#153
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -m 5
Benchmark CTSRD-CHERI#1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):     10.644 s ±  0.051 s    [User: 10.203 s, System: 0.122 s]
  Range (min … max):   10.586 s … 10.701 s    5 runs

Benchmark CTSRD-CHERI#2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.852 s ±  0.091 s    [User: 9.159 s, System: 0.154 s]
  Range (min … max):    9.780 s …  9.965 s    5 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.08 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.after-cincoffset -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit that referenced this issue Jul 7, 2021
These functions have assertions that are always enabled (even outside debug
builds), and those assertions show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit that referenced this issue Jul 7, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. This results in a 1.21x speedup running the MFS_ROOT
kernel and a 1.08x speedup for the full purecap kernel+purecap userspace
boot. Benchmarks using a previous version of this patch indicated a 1.08x
speedup for the MFS_ROOT case. My assumption is that removing the TCG
global accesses to cpu_capreg_state now has a larger impact after
0c09763 removed the incorrect NO_RWG flag.

MFS_ROOT boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -w 1
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.499 s ±  0.027 s    [User: 8.581 s, System: 0.136 s]
  Range (min … max):    9.448 s …  9.539 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.281 s ±  0.029 s    [User: 8.260 s, System: 0.137 s]
  Range (min … max):    9.234 s …  9.326 s    10 runs

Benchmark #3: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      7.852 s ±  0.053 s    [User: 7.523 s, System: 0.182 s]
  Range (min … max):    7.793 s …  7.933 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.18 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
    1.21 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
Full purecap+purecap boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     227.351 s ±  0.484 s    [User: 213.193 s, System: 3.544 s]
  Range (min … max):   226.792 s … 227.646 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.156 s ±  3.601 s    [User: 197.448 s, System: 1.979 s]
  Range (min … max):   206.397 s … 213.575 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.08 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Jul 15, 2021
These functions are called extremely frequently when booting CheriBSD
(for every capability load/store), so the `switch (access_type)`
inside probe_access_internal() has a noticeable impact on overall
performance. By adding QEMU_ALWAYS_INLINE, we can avoid this switch
for all calls to probe_{write,read,cap_write}() and thereby speed up
the CheriBSD boot:

MFS_ROOT purecap:
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -m 5
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      8.501 s ±  0.601 s    [User: 8.014 s, System: 0.121 s]
  Range (min … max):    7.999 s …  9.194 s    5 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.739 s ±  0.030 s    [User: 8.973 s, System: 0.206 s]
  Range (min … max):    9.707 s …  9.774 s    5 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.15 ± 0.08 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```

Full disk image purecap kernel:
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     213.637 s ±  1.920 s    [User: 200.744 s, System: 2.847 s]
  Range (min … max):   211.734 s … 215.573 s    3 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     243.621 s ±  3.036 s    [User: 230.935 s, System: 2.485 s]
  Range (min … max):   240.935 s … 246.915 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.14 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-inline-probe-functions --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
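A standalone sketch of why forcing inlining helps here: once the shared helper is inlined into wrappers that pass a compile-time-constant access type, the compiler can fold the `switch` away entirely. QEMU's real macro is QEMU_ALWAYS_INLINE; the helper and wrapper names below are illustrative, not QEMU's actual code.

```
#include <stddef.h>

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_CAP_WRITE } access_type_t;

#define ALWAYS_INLINE inline __attribute__((always_inline))

/* Shared helper: dispatches on the access type. */
static ALWAYS_INLINE void *probe_internal(void *addr, size_t size,
                                          access_type_t type)
{
    switch (type) {
    case ACCESS_READ:      /* ... check read permissions ... */       break;
    case ACCESS_WRITE:     /* ... check write permissions ... */      break;
    case ACCESS_CAP_WRITE: /* ... check capability-write state ... */ break;
    }
    return addr;
}

/* Because the helper is always inlined and `type` is a constant here,
 * the compiler drops the switch and keeps only the relevant branch. */
static void *probe_read(void *addr, size_t size)
{
    return probe_internal(addr, size, ACCESS_READ);
}

static void *probe_write(void *addr, size_t size)
{
    return probe_internal(addr, size, ACCESS_WRITE);
}
```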
arichardson added a commit that referenced this issue Jul 15, 2021
These functions have assertions that are enabled even in debug mode and
those assertion show up while profiling QEMU booting CheriBSD.
Re-implementing them without assertions gives a small but measurable speedup:

```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.494 s ±  0.054 s    [User: 8.519 s, System: 0.178 s]
  Range (min … max):    9.443 s …  9.600 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.284 s ±  0.043 s    [User: 8.249 s, System: 0.135 s]
  Range (min … max):    9.234 s …  9.381 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.02 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
arichardson added a commit that referenced this issue Jul 15, 2021
The bitwise operations are more expensive than expanding the 64-bit value
to a 32-byte array. This results in a 1.21x speedup running the MFS_ROOT
kernel and a 1.08x speedup for the full purecap kernel+purecap userspace
boot. Benchmarks using a previous version of this patch indicated a 1.08x
speedup for the MFS_ROOT case. My assumption is that removing the TCG
global accesses to cpu_capreg_state now has a larger impact after
0c09763 removed the incorrect NO_RWG flag.

MFS_ROOT boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -w 1
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.499 s ±  0.027 s    [User: 8.581 s, System: 0.136 s]
  Range (min … max):    9.448 s …  9.539 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      9.281 s ±  0.029 s    [User: 8.260 s, System: 0.137 s]
  Range (min … max):    9.234 s …  9.326 s    10 runs

Benchmark #3: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      7.852 s ±  0.053 s    [User: 7.523 s, System: 0.182 s]
  Range (min … max):    7.793 s …  7.933 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.18 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.no-deposit-assertions -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
    1.21 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
```
Full purecap+purecap boot:
```
hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 3
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     227.351 s ±  0.484 s    [User: 213.193 s, System: 3.544 s]
  Range (min … max):   226.792 s … 227.646 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.156 s ±  3.601 s    [User: 197.448 s, System: 1.979 s]
  Range (min … max):   206.397 s … 213.575 s    3 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.08 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.v5.2.0-933-g0c09763123 --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Jul 30, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
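The commit message does not show the code, but the general technique it describes can be sketched in a hedged, standalone way: keep the decoded form of a compressed field next to the compressed word and reuse it, instead of re-unpacking on every access. The struct and field names below are invented for this illustration and do not correspond to QEMU's actual state layout.

```
#include <stdint.h>
#include <stdbool.h>

/* Invented example: a register that stores a compressed word plus a cached,
 * already-decoded copy of the fields the hot path keeps asking for. */
typedef struct {
    uint64_t compressed;      /* packed representation, updated rarely */
    uint64_t cached_base;     /* decoded once, read many times */
    uint64_t cached_top;
    bool     cache_valid;
} cached_reg_t;

static void decode_fields(uint64_t compressed, uint64_t *base, uint64_t *top)
{
    /* Stand-in for the expensive unpacking step. */
    *base = compressed & 0xffffffffu;
    *top  = *base + (compressed >> 32);
}

static uint64_t reg_base(cached_reg_t *r)
{
    if (!r->cache_valid) {
        decode_fields(r->compressed, &r->cached_base, &r->cached_top);
        r->cache_valid = true;
    }
    return r->cached_base;    /* later calls skip the unpacking entirely */
}
```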
arichardson added a commit to arichardson/qemu that referenced this issue Jul 30, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Aug 4, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Sep 7, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Sep 7, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark CTSRD-CHERI#2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Sep 7, 2021
Profiling showed that we were spending unnecessary amounts of time
unpacking these fields. This speeds up a QEMU purecap kernel boot by
about 5 seconds:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.e2d8499d9a,qemu-system-riscv64cheri' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     215.379 s ±  1.550 s    [User: 203.415 s, System: 2.209 s]
  Range (min … max):   212.726 s … 216.815 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     210.971 s ±  1.600 s    [User: 199.158 s, System: 2.056 s]
  Range (min … max):   208.997 s … 212.825 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.02 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.e2d8499d9a --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Sep 29, 2021
Running perf shows me that with the morello changes a RISC-V purecap
CheriBSD boot spends 4.7% of total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark CTSRD-CHERI#1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Sep 29, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Sep 29, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Oct 1, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit that referenced this issue Oct 12, 2021
Running perf shows that, with the Morello changes, a RISC-V purecap
CheriBSD boot spends 4.7% of the total time in this function.

Before this change:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     221.103 s ±  2.055 s    [User: 208.764 s, System: 2.578 s]
  Range (min … max):   218.424 s … 223.736 s    5 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     209.378 s ±  1.133 s    [User: 196.410 s, System: 3.062 s]
  Range (min … max):   207.857 s … 210.841 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.06 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

After:
```
hyperfine -L qemu 'qemu-system-riscv64cheri.morello-ldst-reg-inlined,qemu-system-riscv64cheri.dev-pre-morello' '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/{qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 5
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     206.196 s ±  1.410 s    [User: 194.016 s, System: 2.004 s]
  Range (min … max):   203.675 s … 206.932 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     203.379 s ±  0.572 s    [User: 190.759 s, System: 2.685 s]
  Range (min … max):   202.715 s … 203.860 s    5 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-pre-morello --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.01 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.morello-ldst-reg-inlined --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
arichardson added a commit to arichardson/qemu that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
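
To make that concrete, here is a minimal sketch of the pattern, assuming a hypothetical tag-lookup helper (lookup_tag_for_host_address() is invented for illustration; only probe_read() is the real QEMU helper, and the signatures only approximate the CHERI-QEMU code):
```c
/*
 * Sketch only: shows "pass the host address down and skip the second
 * probe_read()".  Not the actual CHERI-QEMU code.
 */
static bool tag_get(CPUArchState *env, target_ulong vaddr, int mmu_idx,
                    uintptr_t retaddr, void *host_addr)
{
    if (host_addr == NULL) {
        /* Caller has not translated the address yet, so probe it here.
         * Previously this ran even when the caller had just probed. */
        host_addr = probe_read(env, vaddr, 1, mmu_idx, retaddr);
    }
    /* host_addr is known to be valid; no second MMU walk is needed. */
    return lookup_tag_for_host_address(env, host_addr); /* hypothetical */
}

/* The caller (load_cap_from_memory_raw_tag_mmu_idx()) already holds the
 * host pointer from its own probe_read() and simply passes it through. */
```
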
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```
arichardson added a commit to arichardson/qemu that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```


Before
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.probe_read_fix -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```
arichardson added a commit to arichardson/qemu that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```
arichardson added a commit that referenced this issue Oct 13, 2021
…idx()

probe_read() is a very hot function when booting CheriBSD (6.42% of the
total time for a purecap MFS_ROOT /sbin/startup-benchmark.sh boot).
Looking at the perf report, we were calling probe_read() once inside
load_cap_from_memory_raw_tag_mmu_idx() and then calling it again in
cheri_tag_get(), even though we know that the second call cannot trigger
an MMU fault. Pass the host address as an argument to cheri_tag_get() and
skip the probe_read() call if the passed argument is non-NULL.

With this change probe_read() is down to 3.69% of the total.
```
hyperfine -L qemu qemu-system-riscv64cheri,qemu-system-riscv64cheri.dev-post-morello  '/local/scratch/alr48/cheri/output/sdk/bin/{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' -m 10
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.005 s ±  0.461 s    [User: 7.585 s, System: 0.151 s]
  Range (min … max):    7.741 s …  8.908 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci
  Time (mean ± σ):      8.599 s ±  0.530 s    [User: 7.816 s, System: 0.122 s]
  Range (min … max):    7.821 s …  8.989 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' ran
    1.07 ± 0.09 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci'
```

```
perf stat -r 5 /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci > /dev/null

 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8177.658418      task-clock (msec)         #    0.936 CPUs utilized            ( +-  0.71% )
             2,398      context-switches          #    0.293 K/sec                    ( +-  1.64% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
            12,858      page-faults               #    0.002 M/sec                    ( +-  0.59% )
    27,261,165,519      cycles                    #    3.334 GHz                      ( +-  0.17% )
    75,402,569,943      instructions              #    2.77  insn per cycle           ( +-  0.16% )
    11,183,417,524      branches                  # 1367.557 M/sec                    ( +-  0.15% )
       162,719,113      branch-misses             #    1.46% of all branches          ( +-  0.20% )

       8.738376962 seconds time elapsed                                          ( +-  3.03% )
```

Before this change:
```
 Performance counter stats for '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.dev-post-morello -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh -device virtio-rng-pci' (5 runs):

       8429.485885      task-clock (msec)         #    0.911 CPUs utilized            ( +-  0.35% )
             2,508      context-switches          #    0.298 K/sec                    ( +-  0.61% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 66.67% )
            12,867      page-faults               #    0.002 M/sec                    ( +-  0.79% )
    27,831,470,600      cycles                    #    3.302 GHz                      ( +-  0.08% )
    77,801,651,546      instructions              #    2.80  insn per cycle           ( +-  0.01% )
    11,223,140,290      branches                  # 1331.415 M/sec                    ( +-  0.01% )
       164,672,766      branch-misses             #    1.47% of all branches          ( +-  0.08% )

       9.256983251 seconds time elapsed                                          ( +-  0.23% )
```
arichardson pushed a commit that referenced this issue Sep 1, 2023
We can't know whether the caller read enough data into the memory pointed
to by ext_hdr to cast it as an ip6_ext_hdr_routing.
Declare rt_hdr on the stack and fill it again from the iovec.

Since we already checked there is enough data in the iovec buffer,
simply add an assert() call to consume the bytes_read variable.
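
Roughly, the fix reads a properly sized copy of the routing header back out of the iovec instead of casting the smaller ext_hdr buffer. The sketch below follows net/eth.c naming but is illustrative rather than the verbatim patch:
```c
/* Inside _eth_get_rss_ex_dst_addr(), sketched: */
struct ip6_ext_hdr_routing rt_hdr;
size_t input_size = iov_size(pkt, pkt_frags);
size_t bytes_read;

if (input_size < ext_hdr_offset + sizeof(rt_hdr)) {
    return false;
}

/* Re-read the routing header from the iovec rather than casting ext_hdr,
 * which only covers sizeof(struct ip6_ext_hdr) bytes. */
bytes_read = iov_to_buf(pkt, pkt_frags, ext_hdr_offset,
                        &rt_hdr, sizeof(rt_hdr));
/* The length was checked above, so the copy must be complete. */
assert(bytes_read == sizeof(rt_hdr));

if (rt_hdr.rtype != 2 || rt_hdr.segleft != 1) {
    return false;
}
```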

This fixes a 2-byte buffer overrun in eth_parse_ipv6_hdr() reported
by QEMU fuzzer:

  $ cat << EOF | ./qemu-system-i386 -M pc-q35-5.0 \
    -accel qtest -monitor none \
    -serial none -nographic -qtest stdio
  outl 0xcf8 0x80001010
  outl 0xcfc 0xe1020000
  outl 0xcf8 0x80001004
  outw 0xcfc 0x7
  write 0x25 0x1 0x86
  write 0x26 0x1 0xdd
  write 0x4f 0x1 0x2b
  write 0xe1020030 0x4 0x190002e1
  write 0xe102003a 0x2 0x0807
  write 0xe1020048 0x4 0x12077cdd
  write 0xe1020400 0x4 0xba077cdd
  write 0xe1020420 0x4 0x190002e1
  write 0xe1020428 0x4 0x3509d807
  write 0xe1020438 0x1 0xe2
  EOF
  =================================================================
  ==2859770==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdef904902 at pc 0x561ceefa78de bp 0x7ffdef904820 sp 0x7ffdef904818
  READ of size 1 at 0x7ffdef904902 thread T0
      #0 0x561ceefa78dd in _eth_get_rss_ex_dst_addr net/eth.c:410:17
      #1 0x561ceefa41fb in eth_parse_ipv6_hdr net/eth.c:532:17
      #2 0x561cef7de639 in net_tx_pkt_parse_headers hw/net/net_tx_pkt.c:228:14
      #3 0x561cef7dbef4 in net_tx_pkt_parse hw/net/net_tx_pkt.c:273:9
      #4 0x561ceec29f22 in e1000e_process_tx_desc hw/net/e1000e_core.c:730:29
      #5 0x561ceec28eac in e1000e_start_xmit hw/net/e1000e_core.c:927:9
      #6 0x561ceec1baab in e1000e_set_tdt hw/net/e1000e_core.c:2444:9
      #7 0x561ceebf300e in e1000e_core_write hw/net/e1000e_core.c:3256:9
      #8 0x561cef3cd4cd in e1000e_mmio_write hw/net/e1000e.c:110:5

  Address 0x7ffdef904902 is located in stack of thread T0 at offset 34 in frame
      #0 0x561ceefa320f in eth_parse_ipv6_hdr net/eth.c:486

    This frame has 1 object(s):
      [32, 34) 'ext_hdr' (line 487) <== Memory access at offset 34 overflows this variable
  HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
        (longjmp and C++ exceptions *are* supported)
  SUMMARY: AddressSanitizer: stack-buffer-overflow net/eth.c:410:17 in _eth_get_rss_ex_dst_addr
  Shadow bytes around the buggy address:
    0x10003df188d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df188e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df188f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18910: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  =>0x10003df18920:[02]f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18930: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Stack left redzone:      f1
    Stack right redzone:     f3
  ==2859770==ABORTING

Add the corresponding qtest case with the fuzzer reproducer.

FWIW GCC 11 similarly reported:

  net/eth.c: In function 'eth_parse_ipv6_hdr':
  net/eth.c:410:15: error: array subscript 'struct ip6_ext_hdr_routing[0]' is partly outside array bounds of 'struct ip6_ext_hdr[1]' [-Werror=array-bounds]
    410 |     if ((rthdr->rtype == 2) && (rthdr->segleft == 1)) {
        |          ~~~~~^~~~~~~
  net/eth.c:485:24: note: while referencing 'ext_hdr'
    485 |     struct ip6_ext_hdr ext_hdr;
        |                        ^~~~~~~
  net/eth.c:410:38: error: array subscript 'struct ip6_ext_hdr_routing[0]' is partly outside array bounds of 'struct ip6_ext_hdr[1]' [-Werror=array-bounds]
    410 |     if ((rthdr->rtype == 2) && (rthdr->segleft == 1)) {
        |                                 ~~~~~^~~~~~~~~
  net/eth.c:485:24: note: while referencing 'ext_hdr'
    485 |     struct ip6_ext_hdr ext_hdr;
        |                        ^~~~~~~

Cc: qemu-stable@nongnu.org
Buglink: https://bugs.launchpad.net/qemu/+bug/1879531
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Miroslav Rezanina <mrezanin@redhat.com>
Fixes: eb70002 ("net_pkt: Extend packet abstraction as required by e1000e functionality")
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
arichardson pushed a commit that referenced this issue Sep 1, 2023
Incoming enabled bitmaps are busy, because we do
bdrv_dirty_bitmap_create_successor() for them. But disabled bitmaps
being migrated are not marked busy, and the user can remove them during the
incoming migration. Then we may crash in cancel_incoming_locked() when we
try to remove a bitmap that was already removed by the user, like this:

 #0  qemu_mutex_lock_impl (mutex=0x5593d88c50d1, file=0x559680554b20
   "../block/dirty-bitmap.c", line=64) at ../util/qemu-thread-posix.c:77
 #1  bdrv_dirty_bitmaps_lock (bs=0x5593d88c0ee9)
   at ../block/dirty-bitmap.c:64
 #2  bdrv_release_dirty_bitmap (bitmap=0x5596810e9570)
   at ../block/dirty-bitmap.c:362
 #3  cancel_incoming_locked (s=0x559680be8208 <dbm_state+40>)
   at ../migration/block-dirty-bitmap.c:918
 #4  dirty_bitmap_load (f=0x559681d02b10, opaque=0x559680be81e0
   <dbm_state>, version_id=1) at ../migration/block-dirty-bitmap.c:1194
 #5  vmstate_load (f=0x559681d02b10, se=0x559680fb5810)
   at ../migration/savevm.c:908
 #6  qemu_loadvm_section_part_end (f=0x559681d02b10,
   mis=0x559680fb4a30) at ../migration/savevm.c:2473
 #7  qemu_loadvm_state_main (f=0x559681d02b10, mis=0x559680fb4a30)
   at ../migration/savevm.c:2626
 #8  postcopy_ram_listen_thread (opaque=0x0)
   at ../migration/savevm.c:1871
 #9  qemu_thread_start (args=0x5596817ccd10)
   at ../util/qemu-thread-posix.c:521
 #10 start_thread () at /lib64/libpthread.so.0
 #11 clone () at /lib64/libc.so.6

Note the bs pointer taken from the bitmap: it's clearly badly aligned. That's
because this is a use-after-free; the bitmap has already been freed.

So, let's make disabled bitmaps (being migrated) busy during incoming
migration.
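
In rough pseudocode, assuming QEMU's existing bdrv_dirty_bitmap_set_busy() helper (the precise call sites in migration/block-dirty-bitmap.c may differ):
```c
/* Incoming side, when a bitmap starts being received (sketch): */
if (!bdrv_dirty_bitmap_enabled(s->bitmap)) {
    /* Enabled bitmaps are already busy via their created successor;
     * disabled ones previously stayed non-busy and could be deleted by
     * the user mid-migration. */
    bdrv_dirty_bitmap_set_busy(s->bitmap, true);
}

/* ...and drop the busy flag once loading finishes or is cancelled: */
bdrv_dirty_bitmap_set_busy(s->bitmap, false);
```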

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210322094906.5079-2-vsementsov@virtuozzo.com>
arichardson pushed a commit that referenced this issue Sep 1, 2023
When building with --enable-sanitizers we get:

  Direct leak of 32 byte(s) in 2 object(s) allocated from:
      #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf)
      #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958)
      #2 0x561847f02ca2 in usb_packet_init hw/usb/core.c:531:5
      #3 0x561848df4df4 in usb_ehci_init hw/usb/hcd-ehci.c:2575:5
      #4 0x561847c119ac in ehci_sysbus_init hw/usb/hcd-ehci-sysbus.c:73:5
      #5 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9
      #6 0x56184a5bd955 in object_init_with_type qom/object.c:371:9
      #7 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5
      #8 0x56184a5a24d5 in object_initialize qom/object.c:536:5
      #9 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5
      #10 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10
      #11 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5
      #12 0x561849542d18 in npcm7xx_init hw/arm/npcm7xx.c:427:5

Similarly to commit d710e1e ("usb: ehci: fix memory leak in
ehci"), fix by calling usb_ehci_finalize() to free the USBPacket.

Fixes: 7341ea0
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210323183701.281152-1-f4bug@amsat.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
arichardson pushed a commit that referenced this issue Sep 1, 2023
When building with --enable-sanitizers we get:

  Direct leak of 16 byte(s) in 1 object(s) allocated from:
      #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf)
      #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958)
      #2 0x561847c2dcc9 in xlnx_dp_init hw/display/xlnx_dp.c:1259:5
      #3 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9
      #4 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5
      #5 0x56184a5a24d5 in object_initialize qom/object.c:536:5
      #6 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5
      #7 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10
      #8 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5
      #9 0x5618495aa431 in xlnx_zynqmp_init hw/arm/xlnx-zynqmp.c:273:5

The RX/TX FIFOs are created in xlnx_dp_init(), add xlnx_dp_finalize()
to destroy them.
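
Sketched out, the finalizer simply tears down the two FIFOs created in xlnx_dp_init() (names follow hw/display/xlnx_dp.c; illustrative, not the verbatim patch):
```c
static void xlnx_dp_finalize(Object *obj)
{
    XlnxDPState *s = XLNX_DP(obj);

    /* Destroy the AUX RX/TX FIFOs allocated with fifo8_create(), fixing
     * the leak reported by the sanitizer. */
    fifo8_destroy(&s->rx_fifo);
    fifo8_destroy(&s->tx_fifo);
}

/* Registered via .instance_finalize = xlnx_dp_finalize in the TypeInfo. */
```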

Fixes: 58ac482 ("introduce xlnx-dp")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210323182958.277654-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
arichardson pushed a commit that referenced this issue Sep 1, 2023
g_hash_table_add always retains ownership of the pointer passed in as
the key. Its return status merely indicates whether the added entry was
new, or replaced an existing entry. Thus key must never be freed after
this method returns.
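
A small standalone illustration of that ownership rule (generic GLib usage, not the module.c code itself):
```c
#include <glib.h>

int main(void)
{
    /* The table owns its keys and frees them with g_free(). */
    GHashTable *seen = g_hash_table_new_full(g_str_hash, g_str_equal,
                                             g_free, NULL);
    char *name = g_strdup("some-module");

    /* g_hash_table_add() takes ownership of 'name' whether or not the
     * entry was new; the boolean result is only a "was it new?" flag. */
    if (!g_hash_table_add(seen, name)) {
        /* WRONG here: g_free(name) - the table still references it, and
         * a later lookup would read freed memory, as in the trace above. */
    }

    g_hash_table_unref(seen);
    return 0;
}
```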

Spotted by ASAN:

==2407186==ERROR: AddressSanitizer: heap-use-after-free on address 0x6020003ac4f0 at pc 0x7ffff766659c bp 0x7fffffffd1d0 sp 0x7fffffffc980
READ of size 1 at 0x6020003ac4f0 thread T0
    #0 0x7ffff766659b  (/lib64/libasan.so.6+0x8a59b)
    #1 0x7ffff6bfa843 in g_str_equal ../glib/ghash.c:2303
    #2 0x7ffff6bf8167 in g_hash_table_lookup_node ../glib/ghash.c:493
    #3 0x7ffff6bf9b78 in g_hash_table_insert_internal ../glib/ghash.c:1598
    #4 0x7ffff6bf9c32 in g_hash_table_add ../glib/ghash.c:1689
    #5 0x5555596caad4 in module_load_one ../util/module.c:233
    #6 0x5555596ca949 in module_load_one ../util/module.c:225
    #7 0x5555596ca949 in module_load_one ../util/module.c:225
    #8 0x5555596cbdf4 in module_load_qom_all ../util/module.c:349

Typical C bug...

Fixes: 9062912 ("module: use g_hash_table_add()")
Cc: qemu-stable@nongnu.org
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210316134456.3243102-1-marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>