Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debugging interacts with tmxr devices and system timers #1108

Open
sethm opened this issue Jan 4, 2022 · 5 comments
Open

Debugging interacts with tmxr devices and system timers #1108

sethm opened this issue Jan 4, 2022 · 5 comments

Comments

@sethm
Copy link
Contributor

sethm commented Jan 4, 2022

  • Context

I am not able to characterize this issue very well, but there appears to be an undesirable interaction between debug logging and tmxr devices/system timers. The interaction only occurs when debug logging is enabled. It is characterized by the simulator running its clock more and more slowly (trending toward 0 instructions per second) whenever a telnet session to a tmxr MUX is connected. Dropping the telnet connection to the MUX causes the simulator to run at normal speed immediately. It seems to be 100% reproducible as long as debug logging is enabled, and not reproducible at all when debug logging is disabled.

  • the output of "sim> SHOW VERSION" while running the simulator which is having the issue

sim> show version
MicroVAX 3900 simulator V4.0-0 Current
    Simulator Framework Capabilities:
        64b data
        64b addresses
        Threaded Ethernet Packet transports:PCAP:TAP:VDE:NAT:UDP
        Idle/Throttling support is available
        Virtual Hard Disk (VHD) support
        RAW disk and CD/DVD ROM support
        Asynchronous I/O support (Lock free asynchronous event queue)
        Asynchronous Clock support
        FrontPanel API Version 12
    Host Platform:
        Compiler: GCC 9.3.0
        Simulator Compiled as C arch: x64 (Release Build) on Jan  3 2022 at 06:21:22
        Build Tool: simh-makefile
        Memory Access: Little Endian
        Memory Pointer Size: 64 bits
        Large File (>2GB) support
        SDL Video support: SDL Version 2.0.10, PNG Version 1.6.37, zlib: 1.2.11
        PCRE RegEx (Version 8.39 2016-06-14) support for EXPECT commands
        OS clock resolution: 1ms
        Time taken by msleep(1): 1ms
        Ethernet packet info: libpcap version 1.9.1 (with TPACKET_V3)
        OS: Linux motherbrain 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
        tar tool: tar (GNU tar) 1.30
        curl tool: curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
        git commit id: 0c7bf366
        git commit time: 2021-12-27T15:09:20-0800
  • how you built the simulator or that you're using prebuilt binaries

$ make vax
lib paths are: /lib/ /lib/i386-linux-gnu/ /lib/x86_64-linux-gnu/ /lib32/ /usr/lib/x86_64-linux-gnu/libfakeroot/ /usr/local/lib/ /usr/local/lib/x86_64-linux-gnu/ /usr/lib/
include paths are:  /usr/lib/gcc/x86_64-linux-gnu/9/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include
using libm: /lib/x86_64-linux-gnu/libm.so
using librt: /lib/x86_64-linux-gnu/librt.so
using libpthread: /lib/x86_64-linux-gnu/libpthread.so /usr/include/pthread.h
using libpcre: /lib/x86_64-linux-gnu/libpcre.so /usr/include/pcre.h
using semaphore: /usr/include/semaphore.h
using libdl: /lib/x86_64-linux-gnu/libdl.so /usr/include/dlfcn.h
using libpng: /lib/x86_64-linux-gnu/libpng.so /usr/include/png.h
using zlib: /lib/x86_64-linux-gnu/libz.so /usr/include/zlib.h
using mman: /usr/include/x86_64-linux-gnu/sys/mman.h
using libSDL2: /usr/include/SDL2/SDL.h
using libpcap: /usr/include/pcap.h
using libvdeplug: /lib/libvdeplug.so /usr/include/libvdeplug.h
***
*** vax Simulator being built with:
*** - compiler optimizations and no debugging support. GCC Version: 9.3.0.
*** - dynamic networking support using Linux provided libpcap components.
*** - Local LAN packet transports: PCAP VDE TAP NAT(SLiRP)
*** - video capabilities provided by libSDL2 (Simple Directmedia Layer).
*** - Per simulator tests will be run.
***
*** git commit id is 0c7bf366aed46b14f749ccf3b1b0b7e54bb0bc47.
*** git commit time is 2021-12-27T15:09:20-0800.
***
gcc -std=gnu99 -U__STRICT_ANSI__  -O2 -finline-functions -fgcse-after-reload -fpredictive-commoning -fipa-cp-clone -fno-unsafe-loop-optimizations -fno-strict-overflow -DSIM_GIT_COMMIT_ID=0c7bf366aed46b14f749ccf3b1b0b7e54bb0bc47 -DSIM_GIT_COMMIT_TIME=2021-12-27T15:09:20-0800  -DSIM_COMPILER="GCC Version: 9.3.0" -DSIM_BUILD_TOOL=simh-makefile -I . -D_GNU_SOURCE -DUSE_READER_THREAD -DSIM_ASYNCH_IO  -DHAVE_PCRE_H -DHAVE_SEMAPHORE -DHAVE_SYS_IOCTL -DHAVE_LINUX_CDROM -DSIM_HAVE_DLOPEN=so -DHAVE_UTIME -DHAVE_LIBPNG -DHAVE_ZLIB -DHAVE_GLOB -DHAVE_SHM_OPEN  ./VAX/vax_cpu.c ./VAX/vax_cpu1.c ./VAX/vax_fpa.c ./VAX/vax_io.c ./VAX/vax_cis.c ./VAX/vax_octa.c  ./VAX/vax_cmode.c ./VAX/vax_mmu.c ./VAX/vax_stddev.c ./VAX/vax_sysdev.c ./VAX/vax_sys.c  ./VAX/vax_syscm.c ./VAX/vax_syslist.c ./VAX/vax_vc.c ./VAX/vax_lk.c ./VAX/vax_vs.c ./VAX/vax_2681.c ./PDP11/pdp11_rl.c ./PDP11/pdp11_rq.c ./PDP11/pdp11_ts.c ./PDP11/pdp11_dz.c ./PDP11/pdp11_lp.c ./PDP11/pdp11_tq.c ./PDP11/pdp11_xq.c ./PDP11/pdp11_vh.c ./PDP11/pdp11_cr.c ./PDP11/pdp11_td.c ./PDP11/pdp11_io_lib.c ./scp.c ./sim_console.c ./sim_fio.c ./sim_timer.c ./sim_sock.c ./sim_tmxr.c ./sim_ether.c ./sim_tape.c ./sim_disk.c ./sim_serial.c ./sim_video.c ./sim_imd.c ./sim_card.c -DVM_VAX -DUSE_INT64 -DUSE_ADDR64 -DUSE_SIM_VIDEO -I ./VAX -I ./PDP11 -DHAVE_PCAP_NETWORK -I/usr/include/ -DBPF_CONST_STRING -DUSE_SHARED -DHAVE_VDE_NETWORK -DHAVE_TAP_NETWORK -Islirp -Islirp_glue -Islirp_glue/qemu -DHAVE_SLIRP_NETWORK -DUSE_SIMH_SLIRP_DEBUG slirp/*.c slirp_glue/*.c -DHAVE_LIBSDL -DUSE_SIM_VIDEO `/usr/bin/sdl2-config --cflags` `/usr/bin/sdl2-config --libs` -o BIN/vax -lm -lrt -lpthread -lpcre -ldl -lpng -lz -lvdeplug -Wl,-R,/lib/ -L/lib/ 
cp BIN/vax BIN/microvax3900
BIN/vax RegisterSanityCheck /home/seth/Projects/simh/VAX/tests/vax-diag_test.ini </dev/null 
 Running internal register sanity checks on MicroVAX 3900 simulator.
*** Good Registers in MicroVAX 3900 simulator.

MicroVAX 3900 simulator V4.0-0 Current        git commit id: 0c7bf366
Running Hardware Core Test (EHKAA)

*** PASSED - MicroVAX 3900 Hardware Core Instruction test EHKAA
  • the simulator configuration file (or commands) which were used when the problem occurred.

SET CPU IDLE=ULTRIX
SET CPU 64M

ATTACH NVR ./nvram.bin

ATT DZ 9002

SET RQ0 RA92
SET RQ1 DIS
SET RQ2 DIS
SET RQ3 CDROM

ATTACH RQ0 d0.dsk
ATTACH -r RQ3 cd.iso

SET RL DISABLE
SET TS DISABLE
SET TQ DISABLE
SET XQ DISABLE

SET DZ DEBUG
SET DEBUG debug.log

BOOT CPU
  • the expected behavior and the actual behavior

When debug logging is disabled, the simulator runs at normal speed when users are connected via telnet to tmxr mux devices (for example, the DZ device in the VAX simulator). I expect that this should be true when debug logging is enabled as well, but when debug logging is turned on, I observe the following behavior:

  1. Console input and output slows down progressively until the console is unresponsive.
  2. The telnet-connected mux device is slow or unresponsive.
  3. The instructions per tick reported in "SHOW CLOCKS" decreases rapidly and trends toward 0.
  4. Dropping the telnet connection to the mux device, or disabling debug logging resolves the issue immediately.
  • you may also need to provide specific pointers to data files that may be necessary to demonstrate the problem

I have a pre-configured environment that can reproduce the issue, which is available here:

https://archives.loomcom.com/simh/simh-vax.tar.gz

The included "vax" binary was built on Linux Mint 20.3, but is not needed to reproduce the problem, any current build of the vax binary should work.

@markpizz
Copy link
Member

markpizz commented Jan 5, 2022

I'm glad you've gotten some combination of details which reproduces this problem.

Would you be willing to do a git bisect to see where this behavior first cropped up. I'd start with a commit about 5 years ago and bisect from there to now.

If you're not up to it, let me know and I'll get around to it...

@sethm
Copy link
Contributor Author

sethm commented Jan 5, 2022

I'd be happy to do that, my environment is already set up for it.

@sethm
Copy link
Contributor Author

sethm commented Jan 6, 2022

I did a git bisect and found the last good and first bad commit.

Last good commit: b0ac932
First bad commit: 9cb93a6

@markpizz
Copy link
Member

markpizz commented Jan 6, 2022

Thanks a million for doing that digging.

I'm sure you read the discussion in #957.

I'm very surprised that this is happening when writing debug output to a file since nothing explicitly opens the debug output file non-blocking which should be the only reason that fwrite() would set errno to EAGAIN. fwrite() is used in other places which absolutely assume that fwrite() wouldn't do that.

Since you've got this easily reproducible, in scp.c, will you try changing this:

while (len > 0) {
len_written = fwrite (buf, 1, len, f);
len -= len_written;
buf += len_written;
if (errno == EAGAIN) /* Non blocking file descriptor buffer full? /
sim_os_ms_sleep(10);/
wait a bit to retry */
errno = 0;
}

to:

while (len > 0) {
errno = 0;
len_written = fwrite (buf, 1, len, f);
len -= len_written;
buf += len_written;
if (errno == EAGAIN) /* Non blocking file descriptor buffer full? /
sim_os_ms_sleep(10);/
wait a bit to retry */
}

EAGAIN only appears once in scp.c.

Let us know if this fixes the problem.

Thanks.

  • Mark

@sethm
Copy link
Contributor Author

sethm commented Jan 6, 2022

Yes, that change does resolve the issue! Looks like a good fix.

markpizz added a commit that referenced this issue Jan 6, 2022
The original solution provided in response to #957 needs to be adjusted
to reflect that errno is only guaranteed to be set by fwrite() if an error
occurred. otherwise a non zero value may have been set by some other
call elsewhere in the program.  All other cases where errno is explicitly
check in simh, it is only done after receiving a return status from a
system call.

Fix problem reported in #1108
leobru pushed a commit to leobru/simh that referenced this issue Jan 23, 2022
The original solution provided in response to simh#957 needs to be adjusted
to reflect that errno is only guaranteed to be set by fwrite() if an error
occurred. otherwise a non zero value may have been set by some other
call elsewhere in the program.  All other cases where errno is explicitly
check in simh, it is only done after receiving a return status from a
system call.

Fix problem reported in simh#1108
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants