Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs-uae hang on macos sonoma #243

Closed
terriblefire opened this issue Dec 2, 2023 · 46 comments
Closed

fs-uae hang on macos sonoma #243

terriblefire opened this issue Dec 2, 2023 · 46 comments

Comments

@terriblefire
Copy link

Clean MacOS install on Sonoma. Installed the extension and try to run a clean newly "inited" project. Debugger hangs forever before any screen activity is shown.

If i compile a clean fs-uae and ln -s it in place it runs but quits because it cannot connect to the debugger. are the source level changes available somewhere for fs-uae so i can see if its just a binary thats no longer compatible?

@BartmanAbyss
Copy link
Owner

yep, the modified FS-UAE github repo is linked from the README file

@terriblefire
Copy link
Author

terriblefire commented Dec 2, 2023 via email

@terriblefire
Copy link
Author

terriblefire commented Dec 5, 2023

Ok looks like deadlocks are happening randomly on the FS-UAE on macos. I'm sure people must be seeing this randomly and having to kill the fs-uae app and restart. My best guess is that there is a race condition between the debug features and some other part of the code. There are so many differences between base FS-UAE and that repo its tough to trace. I spent a couple of days trying to figure out where the hang occured but it seems random. I can move the hang around by adding delays. So my guess is a timer triggers causing it.

I raised an issue on @grahambates's repo but he's since disabled issue tracker for the repo. So i guess that means its not getting fixed. TBH i'd remove MacOS support if there is no intention to fix this.

@grahambates
Copy link
Contributor

I haven't deliberately disable the issue tracker, I think that just happens by default on forks. I will look into this. Please don't remove support!

@grahambates
Copy link
Contributor

For context, the reason for the large divergence from the base FS-UAE is because I backported changes to bring it in line with current WinUAE. This is necessary because the profiler relies on internal data structures, which have changed significantly since WinUAE 4.2, which is where the official FS-UAE build is at.

FWIW I use this extension frequently on MacOS Ventura without encountering this issue. I'll try to get to the bottom of it.

@terriblefire
Copy link
Author

OK it just seemed odd the issues tab vanished after i raised an issue so i made an assumption.

I will help you figure the issue out and if Bartman is happy we can use this thread to track the issue.

FWIW i didnt see the issue on Monteray and i only see it intermittently on my iMac 2020. I see it every time on my macbook pro. I never used Ventura i jumped from Monteray to Sonoma.

I dont know if its relevant but i have a "normal" FS-UAE installed as an App.

@grahambates
Copy link
Contributor

Looks like you opened the issue on the upstream repo FrodeSolheim/fs-uae#339. I've enabled issue tracking on my fork now, so you can open it there or just continue the discussion here.

@terriblefire
Copy link
Author

Looks like you opened the issue on the upstream repo FrodeSolheim/fs-uae#339. I've enabled issue tracking on my fork now, so you can open it there or just continue the discussion here.

Opps. Happy to continue here as if users see the issue they are more likely to look here.

@grahambates
Copy link
Contributor

I'm going to try to get hold of a machine running Sonoma. Unfortunately my Mac is a work machine with OS updates locked down so I can't go past 13.6.1.

@terriblefire
Copy link
Author

terriblefire commented Dec 6, 2023 via email

@grahambates
Copy link
Contributor

Thanks, I might take you up on that. I'll let you know.

@terriblefire
Copy link
Author

Digging a bit more...

psynch_cvwait(0x60000185AB50, 0x8D6601008D6700, 0x0) = -1 Err#316
psynch_cvwait(0x7FCCEEA2FB78, 0x1D4101001D4200, 0x1D4100) = 0 0

I get a bunch of these when it hangs. I suspect a deadlock of some kind.

@terriblefire
Copy link
Author

terriblefire commented Dec 9, 2023

@grahambates Ok i cannot yet tell you why this is happening but i know what is happening..

Basically after a lot of tracing i found that we're getting stuck in an infinite loop here..

https://github.com/grahambates/fs-uae/blob/remote_debugger_barto/src/vm.cpp#L180

Log except...

[UAE] [000] VM: trying 0x1d62ac000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62ad000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62ae000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62af000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b0000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b1000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b2000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b3000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b4000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b5000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b6000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b7000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b8000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62b9000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62ba000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62bb000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62bc000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62bd000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62be000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62bf000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c0000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c1000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c2000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c3000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c4000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c5000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c6000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c7000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c8000 step is 0x1000 = 0x25034e000
[UAE] [000] VM: trying 0x1d62c9000 step is 0x1000 = 0x25034e000

@terriblefire
Copy link
Author

terriblefire commented Dec 9, 2023

@grahambates i've figured it out. Your code assumes a 32bit address space. If macos allocates an address above this area your code discards the address.

else if (((uintptr_t) address) + size > (uintptr_t) 0xffffffff) {
munmap(address, size);
address = NULL;
}

You need a better strategy here if you need an address below 4G.

@grahambates
Copy link
Contributor

Awesome, thanks so much for getting to the bottom of this. I'll make a start on a fix.

@terriblefire
Copy link
Author

Not gonna lie that was a nasty one to find.

@terriblefire
Copy link
Author

DIgging a bit more I suspect -no-pie is probably no longer working in Sonoma. Do not know that for sure though its a guess.

@grahambates
Copy link
Contributor

Yeah I think it must be something like that, because I hadn't actually touched that part of the code. It's present in FS-UAE since v2.7.

If the official version works fine then it could be my build setup. I did have to use -std=c++17 in order to include Bartman's debugger code, and there's a chance the official build uses GCC rather than Clang.

@terriblefire
Copy link
Author

terriblefire commented Dec 9, 2023

I doubt this its your direct build setup because if i switch between master and your branch i see the issue come and go.

I can see that you have made changes to the vm.cpp file. Is there any functional change there

vm.patch

However if i just patch this into your branch it has the same issue. So perhaps some other part of your build has changed.

@grahambates
Copy link
Contributor

The changes will be upstream stuff from WinUAE. I basically just took a diff of WinUAE 4.2 to current and applied it to FS-UAE, manually fixing problems as I went. Tbh I wish I'd done this incrementally through minor versions, rather than as a single massive patch as it might be easier to track down where problems have been introduced. I did start redoing it that way and I think this would be worthwhile to complete.

@grahambates
Copy link
Contributor

Just trying to get my head around the logic:

My understanding is that it only follows this code path if the UAE_VM_32BIT flag is set, to deliberately prevent the VM from allocating memory outside of a 32 bit address space.

It looks like this flag should be getting unset in mman.cpp when not using the Jit compiler (which is off by default), so I'm not too sure why it's set.

	if (!g_fs_uae_jit_compiler) {
		/* Not using the JIT compiler, so we do not need "32-bit memory". */
		vm_flags &= ~UAE_VM_32BIT;
	}

It keeps trying to allocate until it gets an address within range. One thing I do notice is the code is trying to prevent an infinite loop:

		uae_u8 *p_end = natmem_reserved - size;
// ...
		int step = uae_vm_page_size();
// ...
		while (address == NULL) {
			if (p > p_end) {
				break;
			}
// ...
			p += step;
		}

I wonder why this isn't breaking...

@grahambates
Copy link
Contributor

It's not breaking because natmem_reserved is zero...

@terriblefire
Copy link
Author

when mmap() allocates an address above 4Gb, the code unmaps it. the next mmap() just gets the same address or higher... it hangs forever.

@grahambates
Copy link
Contributor

Yeah I can see that it will never succeed. The code seems to be designed to break after a specified number of steps though, but that behaviour is broken because natmem_reserved is zero and therefore p_end is negative. This is the case on master too, so this isn't what's changed.

@terriblefire
Copy link
Author

looking at your branch only win32 code sets natmem_reserved. Maybe a hangover from WinUAE?

@grahambates
Copy link
Contributor

Yeah I'm pretty sure the natmem_reserved thing is an existing bug in FS-UAE, and is just causing this to fail more spectacularly than it would otherwise when the allocation inevitably fails.

I'm more confused how the 32 bit allocation could ever work at all, than why it doesn't work on my fork!

From what I can see, when JIT is not enabled, the only place that allocates with the UAE_VM_32BIT flag is set is init_fpucw_x87(). I'm sure if/why it's needed, but the emulator still runs if I remove the flag. I could just be that I'm getting lucky with where it get allocated though! I'm wondering if this could be a quick fix to just disable it.

Do you think you could give this a try in src/od-win32/fpp_native_msvc_80bit.cpp?

	x87_fldcw_code = (uae_u8 *)uae_vm_alloc(uae_vm_page_size(), 0, UAE_VM_READ_WRITE_EXECUTE);

Be aware that some files in od-win32 are used on all platforms because... who knows why?!

@terriblefire
Copy link
Author

terriblefire commented Dec 9, 2023

Your patch didnt help but i've since determined that a call to alloc vm memory happens in you branch before preinit_shm() is called (and therefore natmem_reserved is set). I think this is the issue.

Yes indeed. The master branch calls preinit_shm (); in real_main2 and sets up these variables. This is the root cause.

Update: Simply putting preinit_shm() back causes a hard crash. So i guess this was removed for this reason without all the consequences being explored?

@grahambates
Copy link
Contributor

Yeah either that or it got lost resolving a conflict in the WinUAE patch where there's FSUAE specific code. In current WinUAE the call to preinit_shm in main is commented out, and there's a different call in od-win32/win32.cpp.

@terriblefire
Copy link
Author

At this point you should be able to assert that natmem_reserved != NULL in vm.cpp as a means to test if this is working or not.

@grahambates
Copy link
Contributor

Can you let me know if this works for you?
patch.txt

@terriblefire
Copy link
Author

terriblefire commented Dec 10, 2023

This fixes the hang in the sense the application exits in the cases where it would have hung before.

Whereas the standard FS boots every time.

EDIT: Also turning on console logging with this patch causes a hard crash.

Ah the crash comes from turning on "TRACK_ALLOCATIONS".

@grahambates
Copy link
Contributor

Right, so we've fixed a problem with the error handling, but not the prevented the error. It's progress I guess!

@terriblefire
Copy link
Author

i actually think this is dumb luck that standard FSUAE doesnt hit the problem. The first allocation size it makes is 4096, yours is 256. This code is pretty broken IMHO in that it will always get the same address if it hits an issue.

@grahambates
Copy link
Contributor

Yeah it seems to be relying on the fact that setting the addr arg on mmap will result in a low address, but there's really no guarantee of this from what I can see:

If addr is not NULL, then the
kernel takes it as a hint about where to place the mapping; on
Linux, the kernel will pick a nearby page boundary (but always
above or equal to the value specified by
/proc/sys/vm/mmap_min_addr) and attempt to create the mapping
there. If another mapping already exists there, the kernel picks
a new address that may or may not depend on the hint.

@grahambates
Copy link
Contributor

This is interesting https://stackoverflow.com/questions/67085329/allocating-on-a-32-bit-address-in-macos

There is a path in the code for native support for this:

#ifdef HAVE_MAP_32BIT
		address = mmap(0, size, mmap_prot, mmap_flags | MAP_32BIT, -1, 0);
		if (address == MAP_FAILED) {
			address = NULL;
		}
#endif

Perhaps the solution is just to set HAVE_MAP_32BIT on Mac if it's now supported. I'll give this a try.

@terriblefire
Copy link
Author

terriblefire commented Dec 10, 2023

I'm a bit confused why we need an address below 4Gb at all, its not like our pointers are 32bit? i just brain damaged that code out and things worked so far.

@grahambates
Copy link
Contributor

Yeah I'm not sure either and that's certainly one option.

Setting HAVE_MAP_32BIT at least doesn't explode on my machine, so it looks like this is worth a try too.

@terriblefire
Copy link
Author

it just gives me -1 back with that bit set. I started to see errors with out the 64bit check. What confuses me so much is why fs-uae standard manages to get away with this.

@terriblefire
Copy link
Author

terriblefire commented Dec 10, 2023

Ok perhaps this might be useful.

I have been comparing FS-UAE master vs your branch. In your branch JIT_EXCEPTION_HANDLER is defined. Whereas it is not in master. Practically this means the exception handler allocates VM RAM in your branch and it doesnt in master.

Basically for whatever reason your branch (be it due to libraries etc) is running out of 32bit memory space and when that happens FS-UAE cannot continue. It might be worth considering putting any allocations you can after the emulator is running.

@terriblefire
Copy link
Author

terriblefire commented Dec 10, 2023

For me if i add the flag MAP_FIXED to the mmap call it works. This means the (input) address is not taken as a hint but an absolute.

I've no idea about the consequences of that.

EDIT: Been testing this for an hour. Its working perfectly for me.

@grahambates
Copy link
Contributor

That's great news. I can also see why the JIT exception handler is getting installed by default. The following had been lost in compemu_support.cpp in my branch:

void build_comp(void)
{
#ifdef FSUAE
	if (!g_fs_uae_jit_compiler) {
		jit_log("JIT: JIT compiler is not enabled");
		return;
	}
#endif

I'll add it back in.

Re MAP_FIXED, I wonder whether this needs to be for Mac only, or if it's good for Linux too.

@terriblefire
Copy link
Author

terriblefire commented Dec 10, 2023

Linux manpage says MAP_FIXED should behave the same. TBH its probably the safest option here because if you do end up out of the range of native mem you want to fail anyways.

When you dont have MAP_FIXED you just always get the same location back.

grahambates added a commit to grahambates/fs-uae that referenced this issue Dec 10, 2023
Issue raised and investigated by @terriblefire here:
BartmanAbyss/vscode-amiga-debug#243

Fixes several problems relating to allocating VM memory in 32 bit
address space when `UAE_VM_32BIT` flag is set.

Some existing FS-UAE conditional blocks had been lost when patching
WinUAE upstream changes:

- `preinit_shm` was not called. This meant that `natmem_reserved` was
  not set, and failed allocations would result in an infinite loop.
- Shouldn't set exception handler if JIT is not enabled. This was
  resulting in the 32 bit allocation that exposed the problem.

These changes were required for 32 bit allocation to work correctly:

- Ensure initial value for `p` <= `p_max`. The initial value of
  `0x40000000` can be higher than `natmem_reserved`, causing the code to
  break before any allocations happen.
- Use `MAP_FIXED` option with `mmap`. This ensures that the requested
  address is absolute, rather than just a hint.
@terriblefire
Copy link
Author

looks like this is fixed now. thanks.

@grahambates
Copy link
Contributor

Thanks so much for you help on this. I'll test the Linux build and open a PR on this project if everything looks good.

I did notice that JIT in general is not working on my branch, but that's a separate issue. I'd imagine not many people will be enabling that for dev anyway because of its effect on accuracy, but I should fix it anyway.

grahambates added a commit to grahambates/vscode-amiga-debug that referenced this issue Dec 11, 2023
@terriblefire
Copy link
Author

terriblefire commented Dec 11, 2023 via email

grahambates added a commit to grahambates/fs-uae that referenced this issue Jan 26, 2024
Issue raised and investigated by @terriblefire here:
BartmanAbyss/vscode-amiga-debug#243

Fixes several problems relating to allocating VM memory in 32 bit
address space when `UAE_VM_32BIT` flag is set.

Some existing FS-UAE conditional blocks had been lost when patching
WinUAE upstream changes:

- `preinit_shm` was not called. This meant that `natmem_reserved` was
  not set, and failed allocations would result in an infinite loop.
- Shouldn't set exception handler if JIT is not enabled. This was
  resulting in the 32 bit allocation that exposed the problem.

These changes were required for 32 bit allocation to work correctly:

- Ensure initial value for `p` <= `p_max`. The initial value of
  `0x40000000` can be higher than `natmem_reserved`, causing the code to
  break before any allocations happen.
- Use `MAP_FIXED` option with `mmap`. This ensures that the requested
  address is absolute, rather than just a hint.
BartmanAbyss added a commit that referenced this issue Jan 28, 2024
Updated FS-UAE build to fix 32bit VM alloc issue (#243)
@BartmanAbyss
Copy link
Owner

Closing this. If JIT is still a problem, please open a new issue :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants