Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pretender beta #3

Merged
merged 52 commits into from
Sep 17, 2019
Merged

Pretender beta #3

merged 52 commits into from
Sep 17, 2019

Conversation

mariusmue
Copy link
Member

This PR has all changes done for pretender. However, it looks like the githistory got a bit messed up, containing commits from before divergence. Squash+Merge should do it.

@mariusmue mariusmue merged commit 77566ff into master Sep 17, 2019
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
When adding the generic PCA955xClass in commit 736132e, we
forgot to set the class_size field. Fill it now to avoid:

  (gdb) run -machine mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  Starting program: ../../qemu/qemu/arm-softmmu/qemu-system-arm -machine mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  double free or corruption (!prev)
  Thread 1 "qemu-system-arm" received signal SIGABRT, Aborted.
  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  (gdb) where
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  avatartwo#1  0x00007ffff75d8859 in __GI_abort () at abort.c:79
  avatartwo#2  0x00007ffff76433ee in __libc_message
      (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff776d285 "%s\n")
      at ../sysdeps/posix/libc_fatal.c:155
  avatartwo#3  0x00007ffff764b47c in malloc_printerr
      (str=str@entry=0x7ffff776f690 "double free or corruption (!prev)")
      at malloc.c:5347
  avatartwo#4  0x00007ffff764d12c in _int_free
      (av=0x7ffff779eb80 <main_arena>, p=0x5555567a3990, have_lock=<optimized out>) at malloc.c:4317
  avatartwo#5  0x0000555555c906c3 in type_initialize_interface
      (ti=ti@entry=0x5555565b8f40, interface_type=0x555556597ad0, parent_type=0x55555662ca10) at qom/object.c:259
  avatartwo#6  0x0000555555c902da in type_initialize (ti=ti@entry=0x5555565b8f40)
      at qom/object.c:323
  avatartwo#7  0x0000555555c90d20 in type_initialize (ti=0x5555565b8f40)
      at qom/object.c:1028

  $ valgrind --track-origins=yes qemu-system-arm -M mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  ==77479== Memcheck, a memory error detector
  ==77479== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
  ==77479== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
  ==77479== Command: qemu-system-arm -M mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  ==77479==
  ==77479== Invalid write of size 2
  ==77479==    at 0x6D8322: pca9552_class_init (pca9552.c:424)
  ==77479==    by 0x844D1F: type_initialize (object.c:1029)
  ==77479==    by 0x844D1F: object_class_foreach_tramp (object.c:1016)
  ==77479==    by 0x4AE1057: g_hash_table_foreach (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x8453A4: object_class_foreach (object.c:1038)
  ==77479==    by 0x8453A4: object_class_get_list (object.c:1095)
  ==77479==    by 0x556194: select_machine (vl.c:2416)
  ==77479==    by 0x556194: qemu_init (vl.c:3828)
  ==77479==    by 0x40AF9C: main (main.c:48)
  ==77479==  Address 0x583f108 is 0 bytes after a block of size 200 alloc'd
  ==77479==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
  ==77479==    by 0x4AF8D30: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x844258: type_initialize.part.0 (object.c:306)
  ==77479==    by 0x844D1F: type_initialize (object.c:1029)
  ==77479==    by 0x844D1F: object_class_foreach_tramp (object.c:1016)
  ==77479==    by 0x4AE1057: g_hash_table_foreach (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x8453A4: object_class_foreach (object.c:1038)
  ==77479==    by 0x8453A4: object_class_get_list (object.c:1095)
  ==77479==    by 0x556194: select_machine (vl.c:2416)
  ==77479==    by 0x556194: qemu_init (vl.c:3828)
  ==77479==    by 0x40AF9C: main (main.c:48)

Fixes: 736132e ("hw/misc/pca9552: Add generic PCA955xClass")
Reported-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Message-id: 20200629074704.23028-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
tries to fix a leak detected when building with --enable-sanitizers:
./i386-softmmu/qemu-system-i386
Upon exit:
==13576==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1216 byte(s) in 1 object(s) allocated from:
    #0 0x7f9d2ed5c628 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5)
    avatartwo#1 0x7f9d2e963500 in g_malloc (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.)
    avatartwo#2 0x55fa646d25cc in object_new_with_type /tmp/qemu/qom/object.c:686
    avatartwo#3 0x55fa63dbaa88 in qdev_new /tmp/qemu/hw/core/qdev.c:140
    avatartwo#4 0x55fa638a533f in pc_pflash_create /tmp/qemu/hw/i386/pc_sysfw.c:88
    avatartwo#5 0x55fa638a54c4 in pc_system_flash_create /tmp/qemu/hw/i386/pc_sysfw.c:106
    avatartwo#6 0x55fa646caa1d in object_init_with_type /tmp/qemu/qom/object.c:369
    avatartwo#7 0x55fa646d20b5 in object_initialize_with_type /tmp/qemu/qom/object.c:511
    avatartwo#8 0x55fa646d2606 in object_new_with_type /tmp/qemu/qom/object.c:687
    avatartwo#9 0x55fa639431e9 in qemu_init /tmp/qemu/softmmu/vl.c:3878
    avatartwo#10 0x55fa6335c1b8 in main /tmp/qemu/softmmu/main.c:48
    avatartwo#11 0x7f9d2cf06e0a in __libc_start_main ../csu/libc-start.c:308
    avatartwo#12 0x55fa6335f8e9 in _start (/tmp/qemu/build/i386-softmmu/qemu-system-i386)

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20200701145231.19531-1-alxndr@bu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
…tart

When the disconnect event is triggered in the connecting stage,
the tcp_chr_disconnect_locked may be called twice.

The first call:
    #0  qemu_chr_socket_restart_timer (chr=0x55555582ee90) at chardev/char-socket.c:120
    avatartwo#1  0x000055555558e38c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
    avatartwo#2  0x000055555558e3cd in tcp_chr_disconnect (chr=0x55555582ee90) at chardev/char-socket.c:497
    avatartwo#3  0x000055555558ea32 in tcp_chr_new_client (chr=chr@entry=0x55555582ee90, sioc=sioc@entry=0x55555582f0b0) at chardev/char-socket.c:892
    avatartwo#4  0x000055555558eeb8 in qemu_chr_socket_connected (task=0x55555582f300, opaque=<optimized out>) at chardev/char-socket.c:1090
    avatartwo#5  0x0000555555574352 in qio_task_complete (task=task@entry=0x55555582f300) at io/task.c:196
    avatartwo#6  0x00005555555745f4 in qio_task_thread_result (opaque=0x55555582f300) at io/task.c:111
    avatartwo#7  qio_task_wait_thread (task=0x55555582f300) at io/task.c:190
    avatartwo#8  0x000055555558f17e in tcp_chr_wait_connected (chr=0x55555582ee90, errp=0x555555802a08 <error_abort>) at chardev/char-socket.c:1013
    avatartwo#9  0x0000555555567cbd in char_socket_client_reconnect_test (opaque=0x5555557fe020 <client8unix>) at tests/test-char.c:1152
The second call:
    #0  0x00007ffff5ac3277 in raise () from /lib64/libc.so.6
    avatartwo#1  0x00007ffff5ac4968 in abort () from /lib64/libc.so.6
    avatartwo#2  0x00007ffff5abc096 in __assert_fail_base () from /lib64/libc.so.6
    avatartwo#3  0x00007ffff5abc142 in __assert_fail () from /lib64/libc.so.6
    avatartwo#4  0x000055555558d10a in qemu_chr_socket_restart_timer (chr=0x55555582ee90) at chardev/char-socket.c:125
    avatartwo#5  0x000055555558df0c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
    avatartwo#6  0x000055555558df4d in tcp_chr_disconnect (chr=0x55555582ee90) at chardev/char-socket.c:497
    avatartwo#7  0x000055555558e5b2 in tcp_chr_new_client (chr=chr@entry=0x55555582ee90, sioc=sioc@entry=0x55555582f0b0) at chardev/char-socket.c:892
    avatartwo#8  0x000055555558e93a in tcp_chr_connect_client_sync (chr=chr@entry=0x55555582ee90, errp=errp@entry=0x7fffffffd178) at chardev/char-socket.c:944
    avatartwo#9  0x000055555558ec78 in tcp_chr_wait_connected (chr=0x55555582ee90, errp=0x555555802a08 <error_abort>) at chardev/char-socket.c:1035
    avatartwo#10 0x000055555556804b in char_socket_client_test (opaque=0x5555557fe020 <client8unix>) at tests/test-char.c:1023

Run test/test-char to reproduce this issue.

test-char: chardev/char-socket.c:125: qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.

Signed-off-by: Li Feng <fengli@smartx.com>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200522025554.41063-1-fengli@smartx.com>
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
With a reconnect socket, qemu_char_open() will start a background
thread. It should keep a reference on the chardev.

Fixes invalid read:
READ of size 8 at 0x6040000ac858 thread T7
    #0 0x5555598d37b8 in unix_connect_saddr /home/elmarco/src/qq/util/qemu-sockets.c:954
    avatartwo#1 0x5555598d4751 in socket_connect /home/elmarco/src/qq/util/qemu-sockets.c:1109
    avatartwo#2 0x555559707c34 in qio_channel_socket_connect_sync /home/elmarco/src/qq/io/channel-socket.c:145
    avatartwo#3 0x5555596adebb in tcp_chr_connect_client_task /home/elmarco/src/qq/chardev/char-socket.c:1104
    avatartwo#4 0x555559723d55 in qio_task_thread_worker /home/elmarco/src/qq/io/task.c:123
    avatartwo#5 0x5555598a6731 in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:519
    avatartwo#6 0x7ffff40d4431 in start_thread (/lib64/libpthread.so.0+0x9431)
    avatartwo#7 0x7ffff40029d2 in __clone (/lib64/libc.so.6+0x1019d2)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200420112012.567284-1-marcandre.lureau@redhat.com>
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
"tmp.tls_hostname" and "tmp.tls_creds" allocated by migrate_params_test_apply()
is forgot to free at the end of qmp_migrate_set_parameters(). Fix that.

The leak stack:
Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   avatartwo#1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   avatartwo#2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   avatartwo#3 0xaaaac52447fb in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1377)
   avatartwo#4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   avatartwo#5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c:165)
   avatartwo#6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   avatartwo#7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   avatartwo#8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   avatartwo#9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   avatartwo#10 0xaaaac55cae0f in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   avatartwo#11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   avatartwo#12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   avatartwo#13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   #14 0xaaaac47f45ef(/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   #15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   #16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   avatartwo#1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   avatartwo#2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   avatartwo#3 0xaaaac5244893 in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1382)
   avatartwo#4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   avatartwo#5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c)
   avatartwo#6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   avatartwo#7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   avatartwo#8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   avatartwo#9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   avatartwo#10 0xaaaac55cae0f in in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   avatartwo#11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   avatartwo#12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   avatartwo#13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   #14 0xaaaac47f45ef (/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   #15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   #16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: KeQian Zhu <zhukeqian1@huawei.com>
Reviewed-by: HaiLiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O
request until it has completed. ENOMEDIUM requests are special because
there is no BlockDriverState when the drive has no medium!

Define a .get_aio_context() function for BlkAioEmAIOCB requests so that
bdrv_aio_cancel() can find the AioContext where the completion BH is
pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM
requests!

libFuzzer triggered the following assertion:

  cat << EOF | qemu-system-i386 -M pc-q35-5.0 \
    -nographic -monitor none -serial none \
    -qtest stdio -trace ide\*
  outl 0xcf8 0x8000fa24
  outl 0xcfc 0xe106c000
  outl 0xcf8 0x8000fa04
  outw 0xcfc 0x7
  outl 0xcf8 0x8000fb20
  write 0x0 0x3 0x2780e7
  write 0xe106c22c 0xd 0x1130c218021130c218021130c2
  write 0xe106c218 0x15 0x110010110010110010110010110010110010110010
  EOF
  ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7
  ide_reset IDEstate 0x56170a77a340
  Aborted (core dumped)

  (gdb) bt
  avatartwo#1  0x00007ffff4f93895 in abort () at /lib64/libc.so.6
  avatartwo#2  0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745
  avatartwo#3  0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546
  avatartwo#4  0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318
  avatartwo#5  0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422
  avatartwo#6  0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650
  avatartwo#7  0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360
  avatartwo#8  0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513
  avatartwo#9  0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...) at softmmu/memory.c:483

Looking at bdrv_aio_cancel:

2728 /* async I/Os */
2729
2730 void bdrv_aio_cancel(BlockAIOCB *acb)
2731 {
2732     qemu_aio_ref(acb);
2733     bdrv_aio_cancel_async(acb);
2734     while (acb->refcnt > 1) {
2735         if (acb->aiocb_info->get_aio_context) {
2736             aio_poll(acb->aiocb_info->get_aio_context(acb), true);
2737         } else if (acb->bs) {
2738             /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
2739              * assert that we're not using an I/O thread.  Thread-safe
2740              * code should use bdrv_aio_cancel_async exclusively.
2741              */
2742             assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context());
2743             aio_poll(bdrv_get_aio_context(acb->bs), true);
2744         } else {
2745             abort();     <===============
2746         }
2747     }
2748     qemu_aio_unref(acb);
2749 }

Fixes: 02c50ef ("block: Add bdrv_aio_cancel_async")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Buglink: https://bugs.launchpad.net/qemu/+bug/1878255
Originally-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200720100141.129739-1-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
To make deallocating partially constructed objects work, the
visit_type_STRUCT() need to succeed without doing anything when passed
a null object.

Commit cdd2b22 "qapi: Smooth visitor error checking in generated
code" broke that.  To reproduce, run tests/test-qobject-input-visitor
with AddressSanitizer:

    ==4353==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 16 byte(s) in 1 object(s) allocated from:
	#0 0x7f192d0c5d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
	avatartwo#1 0x7f192cd21b10 in g_malloc0 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x51b10)
	avatartwo#2 0x556725f6bbee in visit_next_list qapi/qapi-visit-core.c:86
	avatartwo#3 0x556725f49e15 in visit_type_UserDefOneList tests/test-qapi-visit.c:474
	avatartwo#4 0x556725f4489b in test_visitor_in_fail_struct_in_list tests/test-qobject-input-visitor.c:1086
	avatartwo#5 0x7f192cd42f29  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72f29)

    SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).

Test case /visitor/input/fail/struct-in-list feeds a list with a bad
element to the QObject input visitor.  Visiting that element duly
fails, and aborts the visit with the list only partially constructed:
the faulty object is null.  Cleaning up the partially constructed list
visits that null object, fails, and aborts the visit before the list
node gets freed.

Fix the the generated visit_type_STRUCT() to succeed for null objects.

Fixes: cdd2b22
Reported-by: Li Qiang <liq3ea@163.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200716150617.4027356-1-armbru@redhat.com>
Tested-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
diagprov pushed a commit to diagprov/avatar-qemu that referenced this pull request Sep 29, 2020
It should be safe to reenter qio_channel_yield() on io/channel read/write
path, so it's safe to reduce in_flight and allow attaching new aio
context. And no problem to allow drain itself: connection attempt is
not a guest request. Moreover, if remote server is down, we can hang
in negotiation, blocking drain section and provoking a dead lock.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000;
done

After some time, qemu-io will hang trying to drain, for example, like
this:

 avatartwo#3 aio_poll (ctx=0x55f006bdd890, blocking=true) at
    util/aio-posix.c:600
 avatartwo#4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false,
    parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427
 avatartwo#5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433
 avatartwo#6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710
 avatartwo#7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498
 avatartwo#8 bdrv_open_inherit (filename=0x7fffba1563bc
    "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0,
    flags=24578, parent=0x0, child_class=0x0, child_role=0,
    errp=0x7fffba154620) at block.c:3491
 avatartwo#9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block.c:3513
 avatartwo#10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block/block-backend.c:421

And connection_co stack like this:

 #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918,
    action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302
 avatartwo#1 qemu_coroutine_yield () at util/qemu-coroutine.c:193
 avatartwo#2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at
    io/channel.c:472
 avatartwo#3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:110
 avatartwo#4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:143
 avatartwo#5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28
    "\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at
    io/channel.c:247
 avatartwo#6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:365
 avatartwo#7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:391
 avatartwo#8 nbd_start_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, structured_reply=true,
    zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904
 avatartwo#9 nbd_receive_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at
    nbd/client.c:1032
 avatartwo#10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at
    block/nbd.c:1460
 avatartwo#11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287
 avatartwo#12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309
 avatartwo#13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360
 #14 coroutine_trampoline (i0=113190480, i1=22000) at
    util/coroutine-ucontext.c:173

Note, that the hang may be
triggered by another bug, so the whole case is fixed only together with
commit "block/nbd: on shutdown terminate connection attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200727184751.15704-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
mariusmue added a commit that referenced this pull request Feb 10, 2022
* ARM: added a configurable machine

* Small fixes

* Fixed compilation error

* Support for system-wide named semaphores

* Added support for message queues

* IPC: fixes permission on message queues and semaphores creation

* Message queues: non-blocking

Conflicts:
	include/qemu/thread-posix.h

* Avatar: Modifications to the IPC interface

Extracted from 6e31210a3c13f4b6db7d7db5262cc47278e1f693

* Started the memory forwarder device

* Configurable: entry address and endianness

* somewhat working remotememoryreadrequests

* RemoteMemory!

* Experimental removal of ram-resizable flag

* Small fixes on the configurable machine

* error handling in remote memory

* integrated change in qapi

* Hotfix: Don't bail out on failed rmr/rmw

* Fixed bug regarding entrypoint, removed load_kernel and set_endianness from configurable machine

* Cleaned up configurable machine and added MIPS-support

* Adjusted RMemory to hold pc

* bugfix in configurable machine

* Banked registers!

* move header files to include/hw/avatar

* Refactored avatar mqueueing

* starting on interrupt-injection

* Unclean hackish commit to be squashed later

* WIP: interrupt_exec and avatar_log

* added log_item for avatar

* minimal changes

* nvic-related changes (untested!)

* bugfixes

* Banked registers!

* nvic-write-forward update

* let\'s return xpsr and not cpsr

* Refactored avatar mqueueing

* new logging for interrupts

* move header files to include/hw/avatar

* minor fixes

* ignore/unignore irq returns

* not enabling - stuff should be fine

* additional interrupt notifications and acks

* bugfix

* Allow board init for more than armv7m

* format strings are hard

* Add set_nvic_base to qmp

* I meant vector table base, not nvic base

* fix compile on 18.04

* Update to comply with qemu-3.1

* cpu-model -> cpu-type

* make usage of the new ARM_CPU_TYPE_NAME macro

* temporary race condition hotfix: don't emit qapi_resume/stop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants