Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running under cygwin #103

Closed
masinter opened this issue Dec 22, 2020 · 34 comments
Closed

Running under cygwin #103

masinter opened this issue Dec 22, 2020 · 34 comments

Comments

@masinter
Copy link
Member

Currently in maiko/cygwin2 branch:

  • Add to os_version *-*-cygwin*) echo cygwin ;;
  • copy makefile for Linux, add -DCYGWIN
  • ifdef out check for d_reclen because it isn't defined in cygwin (I suppose len== strlen(dp->d_name) would have been better (in dsk.c )
  • ifdef out fcntl in timer.c

now it compiles, but dies when run:

$ ./run-medley -full
sysout is /cygdrive/c/Users/Larry/home/ilisp/medley/loadups/xfull35.sysout
running /cygdrive/c/Users/Larry/home/ilisp/maiko/cygwin.x86_64/lde -g 1440x900 -sc 1440x900
start /cygdrive/c/Users/Larry/home/ilisp/medley/greetfiles/SIMPLE-INIT
Failed to find UNIXCOMM file handles; no processes
./run-medley: line 159:  2091 Segmentation fault      (core dumped) $prog $geometry $screensize -t "Medley Interlisp" $passthrough_args "$LDESRCESYSOUT"

waywardmonkeys added a commit to waywardmonkeys/maiko that referenced this issue Dec 22, 2020
This code was treating the `d_reclen` incorrectly from some of the
original old porting work. It is not the same as `d_namlen` and
`d_namlen` is not widely supported or standard. It should be using
`strlen(d_name)`.

This addresses an issue mentioned in Interlisp/medley#103.
@waywardmonkeys
Copy link

Submitted PR Interlisp/maiko#128 to address the d_reclen issue.

@masinter masinter added the maiko label Dec 23, 2020
nbriggs pushed a commit to Interlisp/maiko that referenced this issue Dec 23, 2020
This code was treating the `d_reclen` incorrectly from some of the
original old porting work. It is not the same as `d_namlen` and
`d_namlen` is not widely supported or standard. It should be using
`strlen(d_name)`.

This addresses an issue mentioned in Interlisp/medley#103.
@masinter
Copy link
Member Author

i spent some time with gdb documentation and stepping through main in gdb but couldn't get by the execvp
Changes to compile in cygwin3 branch

@nbriggs
Copy link
Contributor

nbriggs commented Dec 26, 2020

You appear to have merged your cygwin3 branch into master without telling anyone?

@nbriggs
Copy link
Contributor

nbriggs commented Dec 26, 2020

See https://jeffkreeftmeijer.com/git-rebase/ for an explanation of what you probably wanted to do (git rebase, to re-apply your local branch changes over a copy of the updated master branch, without merging your experiment back into master)

@masinter
Copy link
Member Author

time for me to go back through the git tutorials

@waywardmonkeys
Copy link

The current state of this is that it crashes on master, right?

Do you have a stack trace?

@masinter
Copy link
Member Author

cygwin-gdb.txt

@waywardmonkeys
Copy link

So, that's stack corruption it seems ...

Looking at the code, we see something that seems dangerous:

* sys_size is sysout size in megabytes */
int sysout_loader(char * sysout_file_name, int sys_size)
{
  int sysout; /* SysoutFile descriptor */

  IFPAGE ifpage; /* IFPAGE */

  char *fptovp_scratch; /* scratch area for FPTOVP */
#ifdef BIGVM
  /* 1 "cell" per page for 256MB in 512 byte pages */
  unsigned int fptovp[0x80000]; /* FPTOVP */
#else
  DLword fptovp[0x10000]; /* FPTOVP */
#endif                /* BIGVM */
  long fptovp_offset; /* FPTOVP start offset */

That's quite a decently sized stack allocation there for FPTOVP.

You could try telling Cygwin to give it a bigger stack size via adding this to the gcc command line:

-Wl,--stack,33554432

Alternatively, you could malloc that and make sure it gets freed in the exit path (or just turn it into a malloc and leak it to see if that makes the crash go away...).

@masinter
Copy link
Member Author

Now it compiles ok and doesn't crash and runs the opcode tests in test.vm, but doesn't recognize keypress or mouse clicks (same problem as with running under WSL1)

@waywardmonkeys
Copy link

Which fix did you apply to prevent the crash?

@masinter
Copy link
Member Author

masinter commented Dec 29, 2020

I take back what I said:

I think I just compiled it with gcc instead of clang. I wasn't sure now to apply either.

I also added -Wl,--stack,33554432 to the gcc line

@masinter
Copy link
Member Author

yes, fixes segv on startup. Now it's just hanging because it doesn't get mouse clicks or keystrokes

@nbriggs
Copy link
Contributor

nbriggs commented Feb 12, 2021

Good news. Presumably that's after recompiling without the 32MB stack option.
I'm trying to come up with a principled way of getting the X events processed.

@masinter
Copy link
Member Author

Steve reported having problems with running on WSL2 in his configuration -- screen comes up but no characters recognized. Is it possible to fork a little process to act as a keystroke forwarder that waits for the keyboard and sends events whenever it gets a keystroke or mouse move?

@nbriggs
Copy link
Contributor

nbriggs commented Apr 30, 2021

We could use some documentation on what WSL2 (or cygwin) supports for polled or interrupt driven I/O, and then we can try to fix it. I don't have access to a WSL2 system (or one with cygwin), so there's not much I can do, or suggestions I can make to those who do, without some input...

@devhawala
Copy link

Hello,

cygwin apparently does not support asynchronous i/o or a virtual timer, as the callback routines int_io_service resp. int_timer_service in timer.c are never called, so for example when pressing the right mouse button, the menu appears but then the system is stuck. (networking does not work either, as incoming packets are not noticed)

As work-around in my dodo-nethub-support branch of Maiko, i added simple count-down mechanism to the instruction dispatcher (dispatch in xc.c): the compile variable CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN defines the interval as number of instructions for simulating an external i/o interrupt, so the interrupt checking machinery is fired in more or less regular intervals.

This variable is set in makefile-cygwin.x86_64-x (and only there) to 20000. To prevent compile errors, enabling async i/o in ether.c for the Nethub socket is made dependent of the presence of O_ASYNC (which is missing in Cygwin)

So i have now a working version of Maiko running natively on Windows10 as Cygwin executable which can do networking with Dodo XNS services and uses a local X11 server (VcXsrv 1.20.14.0). The Maiko process behaves normally in my understanding, the overall CPU load of the system is around 1-3% when the Medley environment is idle and goes up to ~15% (100% on about 1 out of 8 logical CPUs) when the right mouse button is pressed in Medley. So even if i/o is checked all 20.000 instructions, only minimal CPU resources are consumed when the Medley system is idle and only one CPU goes to 100% while a context menu is shown.

However networking is a bit "lethargic" compared to async i/o on Linux, but it is usable. Maybe using a lower value for CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN could make it more snappy, but this value must be adjusted to the local hardware anyway...

Greetings, Hans

@nbriggs
Copy link
Contributor

nbriggs commented Oct 1, 2022

The configuration information that is specific to all instances of an OS like cygwin should be incorporated in the inc/maiko/platform.h -- we have carefully removed most of the system specific defines from the makefile fragments and I'd rather not move backwards there, if you don't mind.

Since there are likely to be other systems where simulating the async timer would be useful, I would suggest that the #define not be CYGWIN_... but rather MAIKO_EMULATE_TIMER_INTERRUPTS (or similar) and look at how to integrate it should other systems need it.

Oddly, cygwin defines FASYNC rather than O_ASYNC -- not sure what's going on there.

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022 via email

@devhawala
Copy link

Here some intermediate results:

  • setting the timer type to ITIMER_REAL for setitimer breaks the program startup: the connection to the Nethub is opened, the main X-window opens and the UI is shown, but then the programs dies very quickly without further information (no console output at all); i tried it with in normal mode as well as elevated mode (i.e. with or without admin rights, the Windows equivalent to running as root or with setuid), so no idea what's wrong
    so ITIMER_REAL cannot replace ITIMER_VIRTUAL (may have worked up to Windows-XP resp. before Windows-Vista)
  • FASYNC is present in the cygwin libs, but does not help, incoming TCP/IP packets keep being unnoticed

Meaning the work-around simulating timer and i/o interrupts will be needed for the cygwin version.

So next things will be incorporating the Addr68k/NativeAlignment{24} changes and moving cygwin-specific dependencies to platform.h

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

I suspect that running with ITIMER_REAL may be causing an interrupt before the rest of the system is ready for it. I'll check out the situation with ITIMER_REAL on other systems and see if it's debuggable there.

I also notice that looks as though you're making the ether fd async before the async signal handling is set up -- it can't be made to generate interrupts before int_io_init() is called -- which in the USE_DLPI case of raw ethernet is where the I_SETSIG is applied to the ether fd.

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

OK -- the issue is probably that if you change to ITIMER_REAL you have to have a signal handler for SIGALRM instead of a handler for SIGVTALRM... did you change that too?

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

Yes... with ITIMER_REAL and SIGALRM it works the same on macOS as it does with ITIMER_VIRTUAL/SIGVTALRM.

@devhawala
Copy link

well ... not on cygwin: the active section of int_timer_init() is as follows in my local test version, but defining USE_ITIMER_REAL in platform.h for cygwin lets ldex abort the startup (not defining it works ok):

  struct itimerval timert;
  struct sigaction timer_action;

  timer_action.sa_handler = int_timer_service;
  sigemptyset(&timer_action.sa_mask);
  timer_action.sa_flags = SA_RESTART;
  
#ifdef USE_ITIMER_REAL
  int intAlarm = SIGALRM;
  int intTimer = ITIMER_REAL;
  printf("** using ITIMER_REAL for timer\n");
#else
  int intAlarm = SIGVTALRM;
  int intTimer = ITIMER_VIRTUAL;
#endif

  if (sigaction(intAlarm, &timer_action, NULL) == -1) {
    perror("sigaction: SIGVTALRM");
  }

  /* then attach a timer to it and turn it loose */
  timert.it_interval.tv_sec = timert.it_value.tv_sec = 0;
  timert.it_interval.tv_usec = timert.it_value.tv_usec = TIMER_INTERVAL;
  setitimer(intTimer, &timert, NULL);

@devhawala
Copy link

for "making the ether fd async": i removed that from connectToHub() as it also happens in the SUBRs ether_resume() and ether_ctrlr(), which both should be called after the Lisp system is up (and it still works of course)

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

You changed all the instances?

timer.c:  if (sigaction(SIGALRM, &timer_action, NULL) == -1) {
timer.c:    perror("sigaction: SIGALRM");
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:/*	Error handling routine for SIGALRM.  Called when any		*/
timer.c:/*	Set up the signal handler for SIGALRM, to catch TIMEOUTs:	*/
timer.c:  if (sigaction(SIGALRM, &timer_action, NULL) == -1) {
timer.c:    perror("sigaction: SIGALRM");
unixfork.c:  sigaddset(&signals, SIGALRM);
unixfork.c:  sigaddset(&signals, SIGALRM);

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

Inspecting the code, I believe the timer emulation code is going to fail if you compile with OPDISP.

@devhawala
Copy link

no, not all instances, but i think there was no need for:

  • timer.c / int_timer_init(): this is the code snippet in my previous post (ok, the perror() still says "SIGVTALRM", but it's only a log message)
  • timer.c / int_block(): sigaddset is already done for both SIGVTALRM and SIGALRM, so nothing changed here
  • timer.c / int_unblock(): sigaddset is already done for both SIGVTALRM and SIGALRM, so nothing changed here
  • timer.c / int_file_init(): sigaddset is only done for SIGALRM, so nothing changed here
  • unixfork.c / fork_Unix(): sigaddset is already done for both SIGVTALRM and SIGALRM, so nothing changed here

These were the only instances of the string "ALRM" that i found (except for comments or strings) in those files.

for the OPDISP: is this still used?
Scanning with grep, it seems that is was used for the sunos-variants (Makefiles in bin/legacy) and possibly with DOS, but it is not defined in platform.h or in any Makefiles in bin.

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

OPDISP with the threaded dispatch is new. It improves the performance.

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

OPDISP is not turned on by default yet, when we do turn it on, that could be done for systems other than Cygwin.

BTW -- where do you want bug reports for Dodo services submitted? The ECHO XIP packets that are returned by, for example, the file server implementation, don't match the echo packets it receives (missing the data, 2 bytes shorter than the one original sent)

@nbriggs
Copy link
Contributor

nbriggs commented Oct 2, 2022

See the commit log for

commit b7e9529322269553d6c7bdd4741677270959cf09
Author: Bruce Mitchener <bruce.mitchener@gmail.com>
Date:   Sat Jan 23 03:28:16 2021 +0700

We replaced the assembler fast dispatch with gcc's (and clang's) computed goto extension.

@devhawala
Copy link

bug reports please directly to issues in the Dodo project.

for OPDISP dispatch: if there is no common code segment where all instructions come through (i get lost each time i try to follow the #define and goto maze in xc.c), then timer emulation may be excluded from fast dispathing. No problem, Maiko seems fast enough for me as it is now...

@nbriggs
Copy link
Contributor

nbriggs commented Oct 3, 2022

Missing include for <netinet/in.h> prevents it compiling on FreeBSD.

clang -m32 -std=gnu99 -fno-strict-aliasing -c -O2 -g -DMAIKO_ENABLE_NETHUB -I/usr/local/include -DXWINDOW -DRELEASE=351 -I../inc/ -I../include/ ../src/ether.c -o ../freebsd.386-x/ether.o
../src/ether.c:260:22: error: variable has incomplete type 'struct sockaddr_in'
  struct sockaddr_in hubaddr;
                     ^
../src/ether.c:260:10: note: forward declaration of 'struct sockaddr_in'
  struct sockaddr_in hubaddr;
         ^
1 error generated.

@devhawala
Copy link

added the include and just created the merge request Dodo Nethub support #445

@masinter
Copy link
Member Author

masinter commented Mar 8, 2023

Running with cygwin works fine now.
It would probably be useful for running on older Windows systems.

I am closing this issue -- is it worth a mention in documentation or a special build ?

@masinter masinter closed this as completed Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants