Commits
multi-tcg
Commits on Jul 9, 2017
-
translate-all: do not hold tb_lock during code generation in softmmu
Each vCPU can now generate code with TCG in parallel. Thus, drop tb_lock around code generation in softmmu.

Note that we still have to take tb_lock after code translation, since there is global state that we have to update.

Nonetheless holding tb_lock for less time provides significant performance improvements to workloads that are translation-heavy. A good example of this is booting Linux; in my measurements, bootup+shutdown time of debian-arm is reduced by 20% before/after this entire patchset, when using -smp 8 and MTTCG on a machine with >= 8 real cores:

Host: Intel(R) Xeon(R) CPU E5-2690 @ 2.90GHz

Performance counter stats for 'qemu/build/arm-softmmu/qemu-system-arm \
    -machine type=virt -nographic -smp 1 -m 4096 \
    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=foobar.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -kernel /foobar.img -append console=ttyAMA0 root=/dev/vda1 \
    -name arm,debug-threads=on -smp 8' (3 runs):

Before:
    28764.018852 task-clock # 1.663 CPUs utilized ( +- 0.30% )
    727,490 context-switches # 0.025 M/sec ( +- 0.68% )
    2,429 CPU-migrations # 0.000 M/sec ( +- 11.36% )
    14,042 page-faults # 0.000 M/sec ( +- 1.00% )
    70,644,349,920 cycles # 2.456 GHz ( +- 0.96% ) [83.42%]
    37,129,806,098 stalled-cycles-frontend # 52.56% frontend cycles idle ( +- 1.27% ) [83.20%]
    26,620,190,524 stalled-cycles-backend # 37.68% backend cycles idle ( +- 1.29% ) [66.50%]
    85,528,287,892 instructions # 1.21 insns per cycle # 0.43 stalled cycles per insn ( +- 0.62% ) [83.40%]
    14,417,482,689 branches # 501.233 M/sec ( +- 0.49% ) [83.36%]
    321,182,192 branch-misses # 2.23% of all branches ( +- 1.17% ) [83.53%]
    17.297750583 seconds time elapsed ( +- 1.08% )

After:
    28690.888633 task-clock # 2.069 CPUs utilized ( +- 1.54% )
    473,947 context-switches # 0.017 M/sec ( +- 1.32% )
    2,793 CPU-migrations # 0.000 M/sec ( +- 18.74% )
    22,634 page-faults # 0.001 M/sec ( +- 1.20% )
    69,314,663,510 cycles # 2.416 GHz ( +- 1.08% ) [83.50%]
    36,114,710,208 stalled-cycles-frontend # 52.10% frontend cycles idle ( +- 1.64% ) [83.26%]
    25,519,842,658 stalled-cycles-backend # 36.82% backend cycles idle ( +- 1.70% ) [66.77%]
    84,588,443,638 instructions # 1.22 insns per cycle # 0.43 stalled cycles per insn ( +- 0.78% ) [83.44%]
    14,258,100,183 branches # 496.956 M/sec ( +- 0.87% ) [83.32%]
    324,984,804 branch-misses # 2.28% of all branches ( +- 0.51% ) [83.17%]
    13.870347754 seconds time elapsed ( +- 1.65% )

That is, a speedup of 17.29/13.87=1.24X.

Similar numbers on a slower machine:

Host: AMD Opteron(tm) Processor 6376:

Before:
    74765.850569 task-clock (msec) # 1.956 CPUs utilized ( +- 1.42% )
    841,430 context-switches # 0.011 M/sec ( +- 2.50% )
    18,228 cpu-migrations # 0.244 K/sec ( +- 2.87% )
    26,565 page-faults # 0.355 K/sec ( +- 9.19% )
    98,775,815,944 cycles # 1.321 GHz ( +- 1.40% ) (83.44%)
    26,325,365,757 stalled-cycles-frontend # 26.65% frontend cycles idle ( +- 1.96% ) (83.26%)
    17,270,620,447 stalled-cycles-backend # 17.48% backend cycles idle ( +- 3.45% ) (33.32%)
    82,998,905,540 instructions # 0.84 insns per cycle # 0.32 stalled cycles per insn ( +- 0.71% ) (50.06%)
    14,209,593,402 branches # 190.055 M/sec ( +- 1.01% ) (66.74%)
    571,258,648 branch-misses # 4.02% of all branches ( +- 0.20% ) (83.40%)
    38.220740889 seconds time elapsed ( +- 0.72% )

After:
    73281.226761 task-clock (msec) # 2.415 CPUs utilized ( +- 0.29% )
    571,984 context-switches # 0.008 M/sec ( +- 1.11% )
    14,301 cpu-migrations # 0.195 K/sec ( +- 2.90% )
    42,635 page-faults # 0.582 K/sec ( +- 7.76% )
    98,478,185,775 cycles # 1.344 GHz ( +- 0.32% ) (83.39%)
    25,555,945,935 stalled-cycles-frontend # 25.95% frontend cycles idle ( +- 0.47% ) (83.37%)
    15,174,223,390 stalled-cycles-backend # 15.41% backend cycles idle ( +- 0.83% ) (33.26%)
    81,939,511,983 instructions # 0.83 insns per cycle # 0.31 stalled cycles per insn ( +- 0.12% ) (49.95%)
    13,992,075,918 branches # 190.937 M/sec ( +- 0.16% ) (66.65%)
    580,790,655 branch-misses # 4.15% of all branches ( +- 0.20% ) (83.26%)
    30.340574988 seconds time elapsed ( +- 0.39% )

That is, a speedup of 1.25X.

Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg: enable per-thread TCG for softmmu
This allows us to generate TCG code in parallel. MTTCG already uses it, although the next commit pushes down a lock to actually perform parallel generation.

User-mode is kept out of this: contention due to concurrent translation is more commonly found in full-system mode.

This patch is fairly small due to the preparation work done in previous patches.

Note that targets do not need any conversion: the TCGContext set up during initialization (i.e. where globals are set) is then cloned by the vCPU threads, which also double as TCG threads.

I searched for globals under tcg/ that might have to be converted to thread-local. I converted the ones that I saw, and I wrote down the ones that I found are non-const globals that are only set at init-time:

Only written by tcg_context_init:
  - indirect_reg_alloc_order
  - tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
  - tcg_target_available_regs
  - tcg_target_call_clobber_regs
  - arm: arm_arch, use_idiv_instructions
  - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt
  - mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
  - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
  - s390: tb_ret_addr, s390_facilities
  - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions
Only written by tcg_prologue_init:
  - 'struct jit_code_entry one_entry'
  - aarch64: tb_ret_addr
  - arm: tb_ret_addr
  - i386: tb_ret_addr, guest_base_flags
  - ia64: tb_ret_addr
  - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

I was not sure about tci_regs. From code inspection it seems that they have to be per-thread, so I converted them, but I do not think anyone has ever tried to get MTTCG working with TCI.

Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg: dynamically allocate from code_gen_buffer using equally-sized regions
In preparation for having multiple TCG threads.

The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU).

The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367
  That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368
  Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364
  That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes.

Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg: introduce tcg_context_clone
Before we make TCGContext thread-local. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
Will come in handy very soon. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg: distribute profiling counters across TCGContext's
TCGContext is about to be made thread-local. To avoid scalability issues when profiling info is enabled, this patch makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which TCGContext also includes. Note that tcg_table_op_count is brought into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

Note that this change also requires updating the accessors to TCGProfile fields to use atomic_read/set whenever there may be concurrent accesses to them.

Signed-off-by: Emilio G. Cota <cota@braap.org>
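The distributed-counter scheme described in this commit can be sketched roughly as follows. This is only an illustration in C11, not the code from the patch: `Profile`, `Context`, `prof_inc_tb_count` and `prof_total_tb_count` are hypothetical names, and C11 atomics stand in for QEMU's atomic_read/set accessors. The sketch assumes each context has exactly one writer thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-context profile struct. */
typedef struct {
    _Atomic int64_t tb_count;
} Profile;

/* Hypothetical stand-in for a per-thread translation context. */
typedef struct {
    Profile prof;
} Context;

/*
 * Called only by the thread that owns ctx. With a single writer, a
 * relaxed load followed by a relaxed store is enough; no atomic
 * read-modify-write is needed.
 */
static void prof_inc_tb_count(Context *ctx)
{
    int64_t v = atomic_load_explicit(&ctx->prof.tb_count,
                                     memory_order_relaxed);
    atomic_store_explicit(&ctx->prof.tb_count, v + 1,
                          memory_order_relaxed);
}

/* Called by any thread: sum the per-context counters to get the total. */
static int64_t prof_total_tb_count(Context *const *ctxs, size_t n)
{
    int64_t total = 0;

    assert(ctxs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += atomic_load_explicit(&ctxs[i]->prof.tb_count,
                                      memory_order_relaxed);
    }
    return total;
}
```

The total read this way is a snapshot that may lag behind concurrent updates, which is usually acceptable for profiling output.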
-
tcg: keep a list of TCGContext's
Before we make TCGContext thread-local. Once that is done, iterating over all TCG contexts will be quite useful; for instance we will need it to gather profiling info from each TCGContext. A possible alternative would be to keep an array of TCGContext pointers. However, this option is not that trivial, because vCPUs are spawned in parallel. So let's just keep it simple and use a list protected by a lock. Note that this lock will soon be used for other purposes, hence the generic "tcg_lock" name. Signed-off-by: Emilio G. Cota <cota@braap.org>
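A list protected by a lock, as chosen in this commit, might look like the sketch below. It is an assumption-laden illustration using POSIX threads rather than QEMU's own mutex and list primitives; `Ctx`, `ctx_list_lock`, `ctx_register` and `ctx_count` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for TCGContext: only the list linkage matters. */
typedef struct Ctx {
    int id;
    struct Ctx *next;
} Ctx;

static pthread_mutex_t ctx_list_lock = PTHREAD_MUTEX_INITIALIZER;
static Ctx *ctx_list_head;

/* Called once by each thread as it starts up; callers may race here. */
static void ctx_register(Ctx *ctx)
{
    assert(ctx != NULL);
    pthread_mutex_lock(&ctx_list_lock);
    ctx->next = ctx_list_head;   /* push at the head of the list */
    ctx_list_head = ctx;
    pthread_mutex_unlock(&ctx_list_lock);
}

/* Walk the list under the lock, e.g. to gather per-context statistics. */
static size_t ctx_count(void)
{
    size_t n = 0;

    pthread_mutex_lock(&ctx_list_lock);
    for (Ctx *c = ctx_list_head; c != NULL; c = c->next) {
        n++;
    }
    pthread_mutex_unlock(&ctx_list_lock);
    return n;
}
```

Unlike an array indexed by thread number, the list needs no agreement on indices between threads that start in parallel; registration order is simply whatever order the lock is acquired in.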
Commits on Jul 8, 2017
-
gen-icount: fold exitreq_label into TCGContext
Before we make TCGContext thread-local. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg: take .helpers out of TCGContext
Before TCGContext is made thread-local. The hash table becomes read-only after it is filled in, so we can save space by keeping just a global pointer to it. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg: take tb_ctx out of TCGContext
Before TCGContext is made thread-local. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org>
-
translate-all: report correct avg host TB size
Since commit 6e3b2bf ("tcg: allocate TB structs before the corresponding translated code") we are not fully utilizing code_gen_buffer for translated code, and therefore are incorrectly reporting the amount of translated code as well as the average host TB size. Address this by:

- Making the conscious choice of misreporting the total translated code; doing otherwise would mislead users into thinking "-tb-size" is not honoured.
- Expanding tb_tree_stats to accurately count the bytes of translated code on the host, and using this for reporting the average tb host size, as well as the expansion ratio.

In the future we might want to consider reporting the accurate numbers for the total translated code, together with a "bookkeeping/overhead" field to account for the TB structs.

Signed-off-by: Emilio G. Cota <cota@braap.org>
-
translate-all: use a binary search tree to track TBs in TBContext
This is a prerequisite for having threads generate code on separate buffers, which will help scalability when booting multiple cores under MTTCG.

For this we need a new field (.tc_size) in TranslationBlock to keep track of the size of the translated code. This field is added into a 4-byte hole that the previous commit created.

In order to use glib's binary search tree we embed a helper struct in TranslationBlock to allow us to compare tb's based on their tc_ptr as well as their tc_size fields. We use an anonymous struct in TranslationBlock to minimize churn; the alternatives I can see are to (a) just add a comment and cross our fingers, (b) use -fms-extensions, and (c) embed the struct and update all calling code. I think using an anonymous struct is superior, but I can be persuaded otherwise.

The comparison function we use is optimized for the common case: insertions. Profiling shows that upon booting debian-arm, 98% of comparisons are between existing tb's (i.e. a->size and b->size are both !0), which happens during insertions (and removals, but those are rare). The remaining cases are lookups. From reading the glib sources we see that the first key is always the lookup key. However, the code does not assume this to always be the case because this behaviour is not guaranteed in the glib docs. However, we embed this knowledge in the code as a branch hint for the compiler.

Note that tb_free does not free space in the code_gen_buffer anymore, since we cannot easily know whether the tb is the last one inserted in code_gen_buffer.

Performance-wise, lookups in tb_find_pc are the same as before: O(log n). However, insertions are O(log n) instead of O(1), which results in a small slowdown when booting debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
    -machine type=virt -nographic -smp 1 -m 4096 \
    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -kernel img/arm/aarch32-current-linux-kernel-only.img \
    -append console=ttyAMA0 root=/dev/vda1 \
    -name arm,debug-threads=on -smp 1' (10 runs):

- Before:
    8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% )
    16,974 context-switches # 0.002 M/sec ( +- 0.12% )
    0 cpu-migrations # 0.000 K/sec
    10,125 page-faults # 0.001 M/sec ( +- 1.23% )
    35,144,901,879 cycles # 4.367 GHz ( +- 0.14% )
    <not supported> stalled-cycles-frontend
    <not supported> stalled-cycles-backend
    65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% )
    10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% )
    192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% )
    8.640869419 seconds time elapsed ( +- 0.57% )

- After:
    8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% )
    17,016 context-switches # 0.002 M/sec ( +- 0.40% )
    0 cpu-migrations # 0.000 K/sec
    18,769 page-faults # 0.002 M/sec ( +- 0.45% )
    35,660,956,120 cycles # 4.378 GHz ( +- 1.22% )
    <not supported> stalled-cycles-frontend
    <not supported> stalled-cycles-backend
    65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% )
    10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% )
    195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% )
    8.828660235 seconds time elapsed ( +- 0.38% )

Signed-off-by: Emilio G. Cota <cota@braap.org>
-
exec-all: move tb->invalid to the end of the struct
This opens up a 4-byte hole to be used by upcoming work. Note that moving this field to the 2nd cache line of the struct does not affect performance: tb->page_addr is in the 2nd cache line as well, and both are accessed during code lookup. Besides, the tb->invalid check is easily predicted. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
exec-all: shrink tb->invalid to uint8_t
To avoid wasting a byte. I don't have any use in mind for this byte, but I think it's good to leave this byte explicitly free for future use. See this discussion for how the u16 came to be: https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04564.html We could use a bool but in some systems that would take > 1 byte. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg/mips: constify tcg_target_callee_save_regs
Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg/i386: constify tcg_target_callee_save_regs
Signed-off-by: Emilio G. Cota <cota@braap.org>
-
translate-all: make have_tb_lock static
It is only used by this object, and it's not exported to any other. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
exec-all: fix typos in TranslationBlock's documentation
Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org>
-
tcg: fix corruption of code_time profiling counter upon tb_flush
Whenever there is an overflow in code_gen_buffer (e.g. we run out of space in it and have to flush it), the code_time profiling counter ends up with an invalid value (that is, code_time -= profile_getclock(), without later on getting += profile_getclock() due to the goto).

Fix it by using the ti variable, so that we only update code_time when there is no overflow. Note that in case there is an overflow we fail to account for the elapsed coding time, but this is quite rare so we can probably live with it.

"info jit" before/after, roughly at the same time during debian-arm bootup:

- before:
    Statistics:
    TB flush count 1
    TB invalidate count 4665
    TLB flush count 998
    JIT cycles -615191529184601 (-256329.804 s at 2.4 GHz)
    translated TBs 302310 (aborted=0 0.0%)
    avg ops/TB 48.4 max=438
    deleted ops/TB 8.54
    avg temps/TB 32.31 max=38
    avg host code/TB 361.5
    avg search data/TB 24.5
    cycles/op -42014693.0
    cycles/in byte -121444900.2
    cycles/out byte -5629031.1
    cycles/search byte -83114481.0
    gen_interm time -0.0%
    gen_code time 100.0%
    optim./code time -0.0%
    liveness/code time -0.0%
    cpu_restore count 6236
    avg cycles 110.4

- after:
    Statistics:
    TB flush count 1
    TB invalidate count 4665
    TLB flush count 1010
    JIT cycles 1996899624 (0.832 s at 2.4 GHz)
    translated TBs 297961 (aborted=0 0.0%)
    avg ops/TB 48.5 max=438
    deleted ops/TB 8.56
    avg temps/TB 32.31 max=38
    avg host code/TB 361.8
    avg search data/TB 24.5
    cycles/op 138.2
    cycles/in byte 398.4
    cycles/out byte 18.5
    cycles/search byte 273.1
    gen_interm time 14.0%
    gen_code time 86.0%
    optim./code time 19.4%
    liveness/code time 10.3%
    cpu_restore count 6372
    avg cycles 111.0

Signed-off-by: Emilio G. Cota <cota@braap.org>
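The shape of this fix can be illustrated with a small sketch. It is not the patch itself: `fake_clock`, `clock_now` and `gen_code` are hypothetical names, and a deterministic counter stands in for profile_getclock(). The idea is to keep the start timestamp in a local variable and to update the shared counter once, only on the success path, instead of subtracting at the start and relying on a matching addition that an early exit can skip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t fake_clock;   /* deterministic stand-in for a cycle clock */
static int64_t code_time;    /* accumulated code generation time */

static int64_t clock_now(void)
{
    fake_clock += 10;        /* each reading advances time by 10 ticks */
    return fake_clock;
}

/*
 * Returns 0 on success, -1 if the (simulated) buffer overflowed.
 * code_time is only modified once generation has succeeded.
 */
static int gen_code(bool overflow)
{
    int64_t ti = clock_now();       /* start stamp kept in a local */

    if (overflow) {
        return -1;                  /* early exit: code_time untouched */
    }
    code_time += clock_now() - ti;  /* single update with a valid delta */
    assert(code_time >= 0);
    return 0;
}
```

As the commit message notes, the time spent in a failed attempt is simply not accounted for; the counter stays valid at the cost of slightly under-reporting.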
-
cputlb: bring back tlb_flush_count under !TLB_DEBUG
Commit f0aff0f ("cputlb: add assert_cpu_is_self checks") buried the increment of tlb_flush_count under TLB_DEBUG. This results in "info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG. Besides, under MTTCG tlb_flush_count is updated by several threads, so in order not to lose counts we'd either have to use atomic ops or distribute the counter, which is more scalable. This patch does the latter by embedding tlb_flush_count in CPUArchState. The global count is then easily obtained by iterating over the CPU list. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
translate-all: remove redundant !tcg_enabled check in dump_exec_info
This check is redundant because it is already performed by the only caller of dump_exec_info -- the caller was updated by b7da97e ("monitor: Check whether TCG is enabled before running the "info jit" code"). Checking twice wouldn't necessarily be too bad, but here the check also returns with tb_lock held. So we can either do the check before tb_lock is acquired, or just get rid of it. Given that it is redundant, I am going for the latter option. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
Commit e7b161d ("vl: add tcg_enabled() for tcg related code") adds a check to exit the program when !tcg_enabled() while parsing the -tb-size flag. It turns out that when the -tb-size flag is evaluated, tcg_enabled() can only return 0, since it is set (or not) much later by configure_accelerator(). Fix it by unconditionally exiting if the flag is passed to a QEMU binary built with !CONFIG_TCG. Signed-off-by: Emilio G. Cota <cota@braap.org>
-
scripts: add "git.orderfile" for ordering diff hunks by pathname patterns
When passed to git-diff (and to every other git command producing diffs and/or diffstats) with "-O" or "diff.orderFile", this list of patterns will place the more declarative / abstract hunks first, while changes to imperative code / details will be near the end of the patches. This saves on scrolling / searching and makes for easier reviewing.

We intend to advise contributors in the Wiki to run

    git config diff.orderFile scripts/git.orderfile

once, as part of their initial setup, before formatting their first (or, for repeat contributors, next) patches.

See the "-O" option and the "diff.orderFile" configuration variable in git-diff(1) and git-config(1).

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Commits on Jul 6, 2017
-
Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170706' into staging
s390x/kvm/migration: fixes, enhancements and cleanups
  - new email address for Cornelia
  - Fixes: 3270, flic, virtio-scsi-ccw, ipl
  - Enhancements, cpumodel, migration

# gpg: Signature made Thu 06 Jul 2017 08:18:19 BST
# gpg: using RSA key 0x117BBC80B5A61C7C
# gpg: Good signature from "Christian Borntraeger (IBM) <borntraeger@de.ibm.com>"
# Primary key fingerprint: F922 9381 A334 08F9 DBAB FBCA 117B BC80 B5A6 1C7C

* remotes/borntraeger/tags/s390x-20170706:
    hw/s390x/ipl: Fix endianness problem with netboot_start_addr
    virtio-scsi-ccw: use ioeventfd even when KVM is disabled
    s390x: return unavailable features via query-cpu-definitions
    s390x/MAINTAINERS: Update my email address
    s390x: fix realize inheritance for kvm-flic
    s390x: fix error propagation in kvm-flic's realize
    s390x/3270: fix instruction interception handler
    s390x: vmstatify config migration for virtio-ccw

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* qemu-thread portability improvement (Fam)
* virtio-scsi IOMMU fix (Jason)
* poisoning and common-obj-y cleanups (Thomas)
* initial Hypervisor.framework refactoring (Sergio)
* x86 TCG interrupt injection fixes (Wu Xiang, me)
* --disable-tcg support for x86 (Yang Zhong, me)
* various other bugfixes and cleanups (Daniel, Peter, Thomas)

# gpg: Signature made Wed 05 Jul 2017 08:12:56 BST
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (42 commits)
    target/i386: add the CONFIG_TCG into Makefiles
    target/i386: add the tcg_enabled() in target/i386/
    target/i386: move TLB refill function out of helper.c
    target/i386: split cpu_set_mxcsr() and make cpu_set_fpuc() inline
    target/i386: make cpu_get_fp80()/cpu_set_fp80() static
    target/i386: move cpu_sync_bndcs_hflags() function
    tcg: add the CONFIG_TCG into Makefiles
    tcg: add CONFIG_TCG guards in headers
    exec: elide calls to tb_lock and tb_unlock
    tcg: move tb_lock out of translate-all.h
    tcg: add the tcg-stub.c file into accel/stubs/
    vapic: use tcg_enabled
    monitor: disable "info jit" and "info opcount" if !TCG
    tcg: make tcg_allowed global
    cpu: move interrupt handling out of translate-common.c
    tcg: move page_size_init() function
    vl: add tcg_enabled() for tcg related code
    vl: convert -tb-size to qemu_strtoul
    configure: add --disable-tcg configure option
    configure: early test for supported targets
    ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commits on Jul 5, 2017
-
hw/s390x/ipl: Fix endianness problem with netboot_start_addr
The start address has to be stored in big endian byte order in the iplb.ccw block for the guest. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1499268345-12552-1-git-send-email-thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
virtio-scsi-ccw: use ioeventfd even when KVM is disabled
This patch is based on a similar patch from Stefan Hajnoczi - commit c324fd0 ("virtio-pci: use ioeventfd even when KVM is disabled") Do not check kvm_eventfds_enabled() when KVM is disabled since it always returns 0. Since commit 8c56c1a ("memory: emulate ioeventfd") it has been possible to use ioeventfds in qtest or TCG mode. This patch makes -device virtio-scsi-ccw,iothread=iothread0 work even when KVM is disabled. Currently we don't have an equivalent to "memory: emulate ioeventfd" for ccw yet, but this doesn't hurt, and qemu-iotests 068 can pass by skipping iothread arguments. I have tested that virtio-scsi-ccw works under tcg both with and without iothread. This patch fixes qemu-iotests 068, which was accidentally merged early despite the dependency on ioeventfd. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20170704132350.11874-2-haoqf@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
s390x: return unavailable features via query-cpu-definitions
The response for query-cpu-definitions didn't include the unavailable-features field, which is used by libvirt to figure out whether a certain cpu model is usable on the host.

The unavailable features are now computed by obtaining the host CPU model and comparing it against the known CPU models. The comparison takes into account the generation, the GA level and the feature bitmaps. In the case of a CPU generation/GA level mismatch a feature called "type" is reported to be missing.

As a result, the output of virsh domcapabilities would change from something like

    ...
    <mode name='custom' supported='yes'>
      <model usable='unknown'>z10EC-base</model>
      <model usable='unknown'>z9EC-base</model>
      <model usable='unknown'>z196.2-base</model>
      <model usable='unknown'>z900-base</model>
      <model usable='unknown'>z990</model>
    ...

to

    ...
    <mode name='custom' supported='yes'>
      <model usable='yes'>z10EC-base</model>
      <model usable='yes'>z9EC-base</model>
      <model usable='no'>z196.2-base</model>
      <model usable='yes'>z900-base</model>
      <model usable='yes'>z990</model>
    ...

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Message-Id: <1499082529-16970-1-git-send-email-mihajlov@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
s390x/MAINTAINERS: Update my email address
Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20170704092215.13742-2-cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
s390x: fix realize inheritance for kvm-flic
Commit f6f4ce4211 ("s390x: add property adapter_routes_max_batch", 2016-12-09) introduces a common realize (intended to be common for all the subclasses) for flic, but fails to make sure the kvm-flic which had its own is actually calling this common realize. This omission fortunately does not result in a grave problem. The common realize was only supposed to catch a possible programming mistake by validating a value of a property set via the compat machine macros. Since there was no programming mistake we don't need this fixed for stable. Let's fix this problem by making sure kvm flic honors the realize of its parent class. Let us also improve on the error message we would hypothetically emit when the validation fails. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Fixes: f6f4ce4211 ("s390x: add property adapter_routes_max_batch") Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Reviewed-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> -
s390x: fix error propagation in kvm-flic's realize
From the moment it was introduced by commit a2875e6f98 ("s390x/kvm: implement floating-interrupt controller device", 2013-07-16) the kvm-flic is not making realize fail properly in case it's impossible to create the KVM device which basically serves as a backend and is absolutely essential for having an operational kvm-flic. Let's fix this by making sure we do proper error propagation in realize. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Fixes: a2875e6f98 "s390x/kvm: implement floating-interrupt controller device" Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Reviewed-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> -
s390x/3270: fix instruction interception handler
Commit bab482d ("s390x/css: ccw translation infrastructure") introduced instruction interception handler for different types of subchannels. For emulated 3270 devices, we should assign the virtual subchannel handler to them during device realization process, or 3270 will not work. Fixes: bab482d ("s390x/css: ccw translation infrastructure") Reviewed-by: Jing Liu <liujbjl@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
s390x: vmstatify config migration for virtio-ccw
Let's vmstatify virtio_ccw_save_config and virtio_ccw_load_config for flexibility (extending using subsections) and for fun. To achieve this we need to hack the config_vector, which is VirtIODevice (that is common virtio) state, in the middle of the VirtioCcwDevice state representation. This is somewhat ugly, but we have no choice because the stream format needs to be preserved. Almost no changes in behavior. Exception is everything that comes with vmstate like extra bookkeeping about what's in the stream, and maybe some extra checks and better error reporting. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Message-Id: <20170703213414.94298-1-pasic@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
target/i386: add the CONFIG_TCG into Makefiles
Add the CONFIG_TCG for frontend and backend's files in the related Makefiles. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
target/i386: add the tcg_enabled() in target/i386/
Add the tcg_enabled() where the x86 target needs to disable TCG-specific code. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>