Building KLH10 on MacOS Ventura fails #2270
Comments
When I go and step through the mark.tcl script manually by running
|
@eswenson1, are you running KLH10 on a Mac? If so, anything to add here with regards to your success or lack thereof? |
I'm not currently. Let me try to build and run. Update: I did just try to do a
I tried repeatedly downloading klh10.tgz from hactrn.kostersitz.com, and each time, while the download appears to work perfectly fine, the .tgz produces errors when extracting. Perhaps it was not properly uploaded the last time? I'll try a full build, which, of course, is needed to test out this ticket. I was just curious to try the |
I'm failing in the same way as previously described:
|
I manually tried to start KLH10 (in the right directory with the right command-line parameters), and got this:
It hung at this point. So either there is something wrong with the files loaded into KLH10 (e.g. dskdmp.216bin or @.ddt-u), which I doubt, or KLH10 no longer works when built on a Mac. I'll see if I can find an old klh10 I've built and run to see if it does better. |
When I start my full build manually with this |
Is it normal to get this error on startup of kn10-ks-its? I don't recall seeing it before.
I wonder if this is the cause? |
I'm getting the same issue as you are:
I suspect the inability to allocate memory (first error message on kn10-ks-its startup) is the cause. |
I guess that would be a problem. |
I am wondering if this has to do with the new memory protection features in macOS. For example, I had to codesign gdb to get it to run and start debugging. |
I dunno. I had to codesign gdb way before this problem started happening. |
Found this in the klh10 install.txt
maybe this is a red herring? |
The code for this lives in |
For some reason, I cannot run my old version of kn10-ks-its under gdb. So I'm unable to debug. I get this:
I'm hung after the [New Thread...] message from gdb. I never see any messages from kn10-ks-its. The "mark^[g" was a silly/vain attempt to see if it had started without messages and NSALV was waiting for input (it wasn't). |
I wonder if shared memory settings need to get updated. This is what I have:
|
I increased kern.sysv.shmmax to double that amount and I still get a failure. But the value emitted in the error message is still 4194304. I wonder if I have to reboot the system after changing the value? I thought you could simply do
So my change appears to take effect. Not sure why kn10-ks-its is asking for 4194304. Can you tell from the code what it wants? |
It does look like kn10-ks-its only wants 4M, so the 4194304 value is what it is asking for. My "shmmax" value is double that, so the shmget call should succeed. Not sure why it isn't. Googling... |
I found that once it crashes in the terminal window, I have to start a new terminal session for the emulator to get to the prompt again. Something in that terminal session gets borked after the crash. |
Can you use gdb to determine what the error code from the shmget failure is? That might provide a clue. |
For me, creating a new shell doesn't fix my inability to run kn10-ks-its under gdb. I never get to see any output from kn10-ks-its after gdb reports that a new thread is created. |
I may have to rebuild my kn10-ks-its (good idea anyway). I was able to attach to my kn10-ks-its process from gdb after it was started and see this:
I'll rebuild and retry. |
The memsiz is calculated in klh10.c.
This is what I get in gdb:
|
Need to do a CONTINUE in GDB after attaching so it takes the input on the emulator side. |
Yeah, I did. I got:
and in the gdb session:
So no help. |
At least we are both seeing the same. |
I do not think it has to do with the amount of memory the emulator tries to allocate. I changed it to try just 50% of the allocation. No change in behavior. |
What is the error code you’re getting from shmget? |
When I run under lldb on an M2 Mac, I get this:
So in addition to the shmget failure (which we've noted already), I'm getting errors setting up dsk0 too. Are others seeing this? And yes, I do have an rp0.dsk file at ../../out/klh10/rp0.dsk. |
Doesn't klh10 do disk i/o in subprocesses? If so, and it can't establish shared memory, that seems like the same kaboom. |
Two thoughts:
|
I think I ran ipcs and saw none created. My shmmax value is double the 4M that KLH10 is requesting. Someone (Mike) already tried reducing the amount requested, and it still fails. However, Mike said that if the shared memory request fails, the code tries local memory -- which appears to succeed, and KLH10 continues. For me, though, KLH10 can't set up access to the disk (rp0.dsk), which may well be why it segfaults on the first disk write. |
I see in the output and in the source where it falls back to local memory for general purposes, but I'm having trouble finding a similar thing for the RPXX. |
Not sure what I broke, but I can no longer build klh10 on my M2 Mac. I get linker errors:
Anyone have a clue why this might happen? It seems that it is only linking one object file (klh10.o). How do I find out how it is invoking ld? I tried adding to the command line, but that didn't help. |
Where does |
This thread is getting unwieldy, though ...
|
I suspect it is defined by KLH10 in one of its sources and somehow my make is only linking the one object file. Probably screwed up adding “-g -Oo” to the compile phase. |
I've tracked down the cause of the
In other words, the shmget failure directly results in the disk initialization (and IMP initialization) failure. So we need to get to the bottom of the shmget failure. Note that there are two shmget failures -- the one reported earlier, where klh10 retries with local memory, and THIS shmget failure, where there is no local memory retry and where the failure directly causes a disk initialization failure. Also: when I run under macOS on my M2 mac, I don't get errno = 11 in my two shmget failures, but rather errno = 12 -- Cannot allocate memory. |
Ok, I resolved the shmget error. I did this:
This will change these settings for the current bootload. I'm not sure how to make the change permanent on macOS (there is no /etc/sysctl.conf file). |
In order to make the changes permanent, you have to create a PLIST. See this article for instructions: https://arc.net/l/quote/hghlubid |
Can we do a similar retry to get local memory there? |
Probably not. The main process communicates with the disk and network subprocesses through shared memory. |
I did manage to complete the ITS build and run the resulting system successfully after I increased the shared memory limits. Not really sure why the defaults weren't sufficient -- since we're trying to allocate an amount equal to the default limit -- but I suspect that some shared memory is already allocated, and therefore the amount requested by kn10-ks-its exceeds the total limit for the system. On at least one machine, I tried simply doubling the maximum value -- and that didn't work. Allocating 16 times the default did work, so I guess I should figure out the minimum value. That value, however, might be different for each person, depending on the amount of shared memory that existing running programs are consuming. Also note that updating shmmax should be accompanied by a correspondingly scaled value for shmall. |
So it turns out that you don't need to set the shared memory limit to as much as 64MB. 32MB works fine too. 16MB doesn't, however, and the default of 4MB, of course, doesn't work either. So the two commands to manually update the shared memory settings you need are:
To make these changes permanent, create the file /Library/LaunchDaemons/sysctl.plist with the following contents:
And then make sure it will get run on system boot by invoking:
|
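The exact commands and plist contents were attached in the original comment; the following is a reconstruction from the values discussed in this thread (a 32 MB limit, with shmall scaled in 4 KB pages) -- treat the numbers and the plist label as assumptions to verify, not the original files:

```
sudo sysctl -w kern.sysv.shmmax=33554432   # 32 MB max segment size
sudo sysctl -w kern.sysv.shmall=8192       # total pages: 33554432 / 4096
```

A typical /Library/LaunchDaemons/sysctl.plist that reapplies these at boot, following the standard LaunchDaemon pattern:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>sysctl</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/sbin/sysctl</string>
    <string>kern.sysv.shmmax=33554432</string>
    <string>kern.sysv.shmall=8192</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
```

It would then be loaded with something like `sudo launchctl load /Library/LaunchDaemons/sysctl.plist`.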
We should add these instructions to the KLH10 documentation and to the ITS build documentation. @larsbrinkhoff: do you agree, and if so, where should this go? And we should say that macOS Ventura and later will need this fix. Earlier releases appear not to. |
A bit more info -- since people are wondering (on IRC) why we need this raised limit. The
After starting klh10 (as root, for network reasons), I see this:
Those entries for root are those for klh10. As you can see, we are allocating 4 shared memory regions. The sizes are: 4488, 4194304, 5416, and 4016. All told, that is 4,208,224 bytes, which is slightly over 4MB. Since the sysctl shared memory limits are per-system (all users), clearly 4MB isn't enough. 16MB isn't enough either, due to the other processes using shared memory as well.

However, I think there may be an issue with shared memory freeing in klh10. All those entries for eswenson in the ipcs output above do NOT correspond to processes that still exist. I suspect these are old klh10 allocations that are not getting freed properly -- perhaps only on error exits. I'll have to wait until I can log out and log back in again, or reboot (I have too many active work-related things going on on my machine to reboot now). Then I'll check the shared memory segments to see if any are still allocated, and experiment with klh10 to see whether they persist after various conditions.

In any case, the default 4MB limit is NOT enough to allow a single KLH10 instance to start under Ventura or later. It MAY be that we don't need to go as high as 32MB -- the only reason I needed 32MB may have been that some of the other shared memory segments were not freed when klh10 bombed out. More experimentation is needed. |
NATTCH of 0 seems diagnostic. If there are none after boot, it might be informative to see whether an emulator crash leaves different debris than a clean exit. If the above is typical, then a bit over 20 MB is probably enough. I don't recall the cost of raising the limit; it may or may not be worth getting too fancy with the instructions. |
FWIW: I tried the download and unpack from the link multiple times and it comes across fine, both on my fiber and mobile 5G connections. |
Good catch on that. Seems that all of those shared memory segments are detritus. And yes, we probably should cite (in the documentation/prerequisites) a real minimum. I'll try this out on a cleanly booted session and try to come up with a minimum value. But yes, 20MB is probably sufficient. |
Yes, I agree instructions should be added. I think most of it should go in the KLH10 repository. Readme and doc update, and possibly some script that a user can run. Then the ITS repository could refer to that, and offer to run the script. |
Can you upload it some place? |
I created the plist file and ran the launchctl command as specified. The suggested command for "richer errors" complained with a Usage message. |
Any chance this is about System Integrity Protection? |
I'm now getting the same error as @rmaldersoniii. However, this may be because the sysctl plist is already loaded. I did this:
which shows that it is present. And then I did a:
and didn't get any errors. So I'd recommend doing the same two commands. And then running:
to see if you already have the two settings. Doing this after a reboot, of course, would confirm that the PLIST was executed on boot. It is possible that you need to enable this daemon as well:
And you can get detailed info on the daemon with:
This should provide status information about the daemon and indicate success/failure of running it. |
The issue appears to be that shared memory segments allocated by kn10-ks-its (or its child processes) are not always freed upon exit. You can use the

I'm running a build with EMULATOR=klh10 now (on macOS Sonoma 14.6.1), and when I run

Shared Memory:
m 65536 0x06f2f8a7 --rw------- eswenson staff eswenson staff 6 56 1229 1229 18:49:44 18:49:44 18:49:44
m 458753 0x00000000 --rw------- eswenson staff eswenson staff 1 4194304 15084 15087 9:56:14 9:56:52 9:56:14
m 458754 0x00000000 --rw------- eswenson staff eswenson staff 1 5416 15073 15079 9:55:39 9:55:39 9:55:31
m 786435 0x00000000 --rw------- eswenson staff eswenson staff 3 4016 15073 15077 9:55:38 9:55:39 9:55:31
m 786436 0x00000000 --rw------- eswenson staff eswenson staff 2 4194304 15093 15096 9:56:53 9:56:58 9:56:53
m 720901 0x00000000 --rw------- eswenson staff eswenson staff 1 5416 15084 15089 9:56:19 9:56:19 9:56:14
m 65542 0x00000000 --rw------- root wheel root wheel 0 5416 20712 20719 9:57:43 9:58:46 9:57:30
m 65543 0x00000000 --rw------- root wheel root wheel 0 4016 20712 20717 9:57:42 9:59:02 9:57:30
m 196616 0x00000000 --rw------- eswenson staff eswenson staff 1 4016 15084 15088 9:56:17 9:56:19 9:56:14
m 589833 0x00000000 --rw------- eswenson staff eswenson staff 2 4488 15093 15096 9:56:53 9:56:58 9:56:53
m 65546 0x00000000 --rw------- root wheel root wheel 0 5416 23998 24015 10:08:14 10:19:39 10:07:18
m 65547 0x00000000 --rw------- root wheel root wheel 0 4016 23998 24013 10:08:12 10:20:06 10:07:18
m 524300 0x00000000 --rw------- eswenson staff eswenson staff 1 5416 15093 15099 9:56:58 9:56:58 9:56:53
m 65549 0x00000000 --rw------- root wheel root wheel 0 5416 25893 25900 10:20:45 10:23:21 10:20:32
m 65550 0x00000000 --rw------- root wheel root wheel 0 4016 25893 25898 10:20:43 10:23:21 10:20:32
m 458767 0x00000000 --rw------- eswenson staff eswenson staff 1 4016 15093 15098 9:56:56 9:56:58 9:56:53
m 327711 0x00000000 --rw------- eswenson staff eswenson staff 1 4194304 15073 15076 9:55:31 9:56:11 9:55:31

All those shared memory segments of size 4194304 are from KLH10. Some of the smaller ones are too. It seems that kn10-ks-its has been run three times already (as part of the build), and there are three sets of shared memory allocations -- none of them cleaned up. |
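Stale segments like these can be cleared by hand; the commands below are illustrative (the id shown is just one example from the listing above -- take ids from your own `ipcs -m` output):

```
ipcs -m           # list segments; stale klh10 ones show NATTCH 0
ipcrm -m 458753   # remove one stale segment by its id
```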
After a quick browsing of the code, one theory is that The Best Fix (tm) would be to implement option (2) mentioned in dpsup.h. I believe/hope we now have reliable threads support "everywhere", no? I'm not sure about the amount of work needed, though... |
When I do
and it results in three shared memory segments with NATTCH 0 and CPID/LPID values for processes that no longer exist. After updating shmall and shmmax using the commands from #2270 (comment):
the build process completes, with no extra shared memory segments.
And why doesn't this happen in Linux? (Or does it, if you turn down the shmmax/shmall?) |
Logging an issue to keep track of the work here.
The build fails really quickly at the start