Reserving memory ranges #769

Closed
PatrickvL opened this issue Oct 19, 2017 · 31 comments · Fixed by #1872
Labels
cpu-emulation (LLE CPU), enhancement (general improvement of the emu), high-priority (this needs fixing asap), kernel (xbox kernel related), LLE (Low Level Emulation), memory (memory manager problem)

Comments

@PatrickvL
Member

PatrickvL commented Oct 19, 2017

As long as Cxbx-Reloaded runs code directly, it needs access to certain memory address ranges. This issue goes into detail on this subject.

Our prerequisite is that we run as a 32 bit process under 64 bit Windows (thus relying on WoW64).

To be able to run Xbox code directly, these memory ranges must be available:

  • 0x00010000 : 64 MB, for loading Xbox xbe's (128 MB for Chihiro)
  • 0x80000000 : 64 MB, for the kernel stub and contiguous memory shared with other devices
  • ? : Tiled memory
  • 0xF8000000 : 64 MB, for various mmio ranges (TODO: split up per device in more detail)
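
The list above can be written down as a small lookup table. This is only a sketch: the struct, the function names and the table layout are illustrative assumptions, not the project's actual declarations, and tiled memory is left out because its base address is not given here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical descriptor; names are illustrative, not taken from Cxbx-Reloaded.
struct AddressRange {
    std::uint32_t start;  // first address of the range
    std::uint32_t size;   // size in bytes
    const char*   name;
};

constexpr std::uint32_t MiB = 1024u * 1024u;

// Ranges as listed in the issue text (retail sizes). Tiled memory is omitted
// because its base address is not specified there.
constexpr std::array<AddressRange, 3> kXboxRanges = {{
    { 0x00010000u, 64 * MiB, "xbe image (128 MB on Chihiro)" },
    { 0x80000000u, 64 * MiB, "kernel stub + contiguous memory" },
    { 0xF8000000u, 64 * MiB, "mmio" },
}};

// End address (exclusive), computed in 64 bits so it cannot wrap around.
constexpr std::uint64_t end_of(const AddressRange& r) {
    return std::uint64_t{r.start} + r.size;
}

// Returns the range containing addr, or nullptr if addr is not in the table.
inline const AddressRange* find_range(std::uint32_t addr) {
    for (const AddressRange& r : kXboxRanges) {
        if (addr >= r.start && addr < end_of(r)) return &r;
    }
    return nullptr;
}
```

A lookup such as `find_range(0x80001000u)` would return the contiguous-memory entry, while any address outside the table yields `nullptr`.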

Additionally, titles might request specific address ranges, and some of them might assume these requests succeed.

The emulation itself, the DLL's used for it, and memory allocations during run time occupy address ranges too.

Previously, Cxbx was composed of 2 parts : the GUI and an emulation DLL.
The GUI was loaded at 0x00010000, the DLL somewhere else.
When running emulation, the GUI would be launched in "emulation-mode", handing over control to the emulation DLL. Xbe's were loaded by overwriting the GUI code.
This led to unstable and hard to debug code.

Nowadays, Cxbx-Reloaded is a single executable, stripped of relocation information and set to a fixed load address of 0x00010000. The first 128 MB of the executable is occupied by a memory placeholder. Following that, both the GUI and the emulation code are linked into this one big executable.
This setup makes it possible to load xbe's at their designated address, while keeping all code together.
The downside of this setup is that linking times are high, the executable contains 128 MB of otherwise useless space, and both the GUI and the emulation environment contain each other's code, even though they don't use each other.

(TODO: Move and expand the above into the appropriate wiki article, just link to that here).

Going forward, we envision a setup that would allow better control over available memory, and removes most of the problems we have right now. It works like this:

We'll split up the GUI from the emulation again, and add a loader into the mix.

The emulation code will be put in a DLL once again.
The GUI will be a separate executable, sharing only the configuration data with the emulation DLL.
The loader will be a tiny executable, loaded at address 0x00010000.

The GUI will stay the main executable.
The loader executable will arrange memory reservation and load the emulation DLL.
The emulation DLL will bootstrap our kernel and run emulation.

Since the tiny loader will reside at 0x00010000, the few pages occupied by it are already reserved.
As a first matter, it will reserve the rest of the lowest 128 mb of virtual memory, the contiguous memory and all other ranges that are Xbox specific (and thus off-limits for other purposes).
If we ever come across xbe's that reserve specific memory ranges, this is the time to claim those too.

This is done so that all DLL's that are loaded afterwards, and all heap allocations that follow, won't end up in ranges we must have control over. The reserved addresses won't be available anymore, so virtual memory requests by the host have no choice but to use other addresses.

Only after all required virtual memory addresses are reserved is the emulation DLL loaded and control handed over to it.
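
The ordering described above (reserve everything first, load the emulation DLL only afterwards) could be modelled roughly as follows. The reservation itself is injected as a callback so the sketch stays portable; on Windows that callback could wrap `VirtualAlloc` with `MEM_RESERVE` and compare the returned pointer with the requested address, but that wiring, like all names below, is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct Range { std::uint32_t start; std::uint32_t size; };

// Tries to reserve [start, start + size) and reports whether it succeeded.
// Hypothetical hook: a Windows build might call
// VirtualAlloc(addr, size, MEM_RESERVE, PAGE_NOACCESS) inside it.
using ReserveFn = std::function<bool(std::uint32_t start, std::uint32_t size)>;

// Attempts every range and returns the indices of those that failed.
inline std::vector<std::size_t> reserve_all(const std::vector<Range>& ranges,
                                            const ReserveFn& reserve) {
    assert(static_cast<bool>(reserve));
    std::vector<std::size_t> failed;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        if (!reserve(ranges[i].start, ranges[i].size)) failed.push_back(i);
    }
    return failed;
}

// Loader flow: reserve first, start the emulation DLL only when that worked.
inline bool run_loader(const std::vector<Range>& ranges,
                       const ReserveFn& reserve,
                       const std::function<void()>& load_and_start_emulation) {
    if (!reserve_all(ranges, reserve).empty()) return false;  // a range is taken
    load_and_start_emulation();
    return true;
}
```

Treating every range as mandatory is a simplification; a real loader might tolerate failures for ranges it never emulates.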

The emulation DLL will take ownership of the reserved ranges, so that the kernel can hand out memory requests from them, keeping track of the amount of pages a normal Xbox would have available. Thus, the out of memory condition will be reached at the exact same number of allocations as on a real machine. Also, memory reporting API's will use the same bookkeeping for getting statistics.
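
The page bookkeeping mentioned above might look something like this minimal sketch. The class name, the 4 KiB page size and the idea of a single global budget are assumptions for illustration; the real memory manager would also have to account for pages the kernel itself occupies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping type; not the project's actual memory manager.
class PageBudget {
public:
    static constexpr std::uint32_t kPageSize = 4096;  // x86 page size

    explicit PageBudget(std::uint32_t total_bytes)
        : total_pages_(total_bytes / kPageSize), used_pages_(0) {}

    // Charges enough whole pages to cover `bytes`. Fails without side effects
    // when the emulated machine would be out of memory.
    bool allocate(std::uint32_t bytes) {
        const std::uint32_t pages = pages_for(bytes);
        if (pages > total_pages_ - used_pages_) return false;
        used_pages_ += pages;
        return true;
    }

    // Returns the pages covering `bytes` to the budget.
    void release(std::uint32_t bytes) {
        const std::uint32_t pages = pages_for(bytes);
        assert(pages <= used_pages_);
        used_pages_ -= pages;
    }

    // The same counters can back memory statistics queries.
    std::uint32_t free_pages() const { return total_pages_ - used_pages_; }

private:
    static std::uint32_t pages_for(std::uint32_t bytes) {
        // Round up in 64-bit math so sizes close to 4 GiB cannot overflow.
        return static_cast<std::uint32_t>(
            (std::uint64_t{bytes} + kPageSize - 1) / kPageSize);
    }

    std::uint32_t total_pages_;
    std::uint32_t used_pages_;
};
```

With a 64 MB budget there are 16384 pages, so the allocation that would need page 16385 is the first one to fail, regardless of how much host memory is free.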

Obviously, the emulation DLL will do much more, like create a window handle for processing events, access shared memory for configuration data, etc.

This setup avoids needlessly big executables, returns linking times to normal, won't load GUI code into the emulation process and vice versa, gains control over all required memory ranges, behaves identically to real hardware and, if needed, allows title-specific customisation.

The downside is that we'll have 3 projects to compile, so debugging becomes a tiny bit more cumbersome again, as you'll need to make sure the correct project is recompiled after making changes. (Once the GUI and the loader are stabilized, this normally won't be a problem: developers will always have the emulation project active, with the GUI set as host, and child process debugging enabled.)

@PatrickvL PatrickvL added cpu-emulation LLE CPU enhancement general improvement of the emu high-priority this needs fixing asap kernel xbox kernel related LLE Low Level Emulation labels Oct 19, 2017
@PatrickvL
Member Author

@ObiKKa : You added a confused reaction, is there anything you would like to see explained in more detail?

@ObiKKa

ObiKKa commented Nov 8, 2017

LOL, even the other leading programmer Luke was confused as well! There are so many things to follow here for the mere mortals in us! The start of the post started out confusing. But I think I understand the very last paragraph which implies that you're planning to have this software create perhaps three separate threads to run the internal Xbox emulation code (Which accesses the memory addresses and is intended to behave like on the real hardware), a separate GUI software/front-end and the loader. What is the loader? Is it some set of kernels that boot up to a dashboard operating system software or a Xbox game?

The loader will be a tiny executable, loaded at address 0x00010000.

Probably it would almost mimic that of a typical BIOS kernel that boots up your computer and then the Windows OS or another OS.


Are you sure that running the emulator in 32-bit process via WOW64 will still work on a 64-bit Windows OS? That probably may prevent it from being ported to other operating systems like Linux? Or is there a way to do that?


The downside of this setup is that linking times are high, the executable contains 128 MB of otherwise useless space, and both the GUI and the emulation environment contain each other's code, even though they don't use each other.

(TODO: Move and expand the above into the appropriate wiki article, just link to that here).

Going forward, we envision a setup that would allow better control over available memory, and removes most of the problems we have right now. It works like this:

We'll split up the GUI from the emulation again, and add a loader into the mix.

I can see why you are planning on making it this way rather than it being a single executable. This is probably too early to ask, but does the current way the emulator is run make it harder to debug emulation issues? Is everyone in the team on board with this plan of yours?

@PatrickvL
Member Author

PatrickvL commented Nov 8, 2017

This idea was already discussed between Luke and me, I haven't heard from anyone else.

The three parts will be three separate binaries: two executables (the GUI and the loader) and one DLL (for the emulation code).

The Loader's sole task is to reserve otherwise hard to claim memory. Only once that's done, the emulation DLL is loaded, which will take ownership of these memory ranges. All the while, no GUI code will be present in this process.

@ghost

ghost commented Nov 8, 2017

The loader:

  • executes the XBE executable segment at 0x10000, and is always constant;
  • can only run in 32-bit (Pentium 3 instruction set);
  • runs in a separate process than the GUI;
  • as a separate process, the host can reserve the 0x00010000-0x08000000 virtual memory range that is dedicated for the XBE.

For doing MMIO in the 0xFD000000-0xFEF00400 memory range, a signal handler for SIGSEGV is fast enough on a Linux machine to do reads and writes. Use sigaction to catch the signal. Also, mmap this range with a protection of PROT_NONE to catch the signal and handle it.
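
A Linux-only sketch of the trapping part of this idea is shown below. It maps a region with `PROT_NONE`, installs a `SIGSEGV` handler through `sigaction`, and records the faulting address. As a stand-in for real device emulation (which would decode and emulate the faulting instruction), the handler simply makes the touched page accessible so the access is retried. The region is placed wherever `mmap` chooses rather than at 0xFD000000, and every name in the block is an assumption.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace mmio_sketch {

volatile std::sig_atomic_t g_fault_count = 0;  // number of trapped accesses
void* volatile g_last_fault = nullptr;         // address of the last trapped access
std::uintptr_t g_base = 0;                     // start of the trapped region
std::size_t g_size = 0;                        // size of the trapped region
long g_page = 0;                               // host page size

// SIGSEGV handler: log the access, then unprotect the page so it can be retried.
void on_segv(int, siginfo_t* info, void*) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    if (addr < g_base || addr >= g_base + g_size) {
        // Not our region: restore default handling so the crash is reported.
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    g_last_fault = info->si_addr;
    g_fault_count = g_fault_count + 1;
    void* page = reinterpret_cast<void*>(
        addr & ~static_cast<std::uintptr_t>(g_page - 1));
    mprotect(page, static_cast<std::size_t>(g_page), PROT_READ | PROT_WRITE);
}

// Maps `size` inaccessible bytes and installs the handler.
// Returns the base address, or nullptr on failure.
void* install(std::size_t size) {
    g_page = sysconf(_SC_PAGESIZE);
    void* base = mmap(nullptr, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    g_base = reinterpret_cast<std::uintptr_t>(base);
    g_size = size;

    struct sigaction sa = {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;  // deliver siginfo_t with the faulting address
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, nullptr) != 0) return nullptr;
    return base;
}

}  // namespace mmio_sketch
```

Because the page stays accessible after the first fault, only the first access per page is observed here; trapping every access would require re-protecting the page or emulating the instruction inside the handler.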

@PatrickvL
Member Author

PatrickvL commented Nov 8, 2017

Our current (and the above described) architecture requires running a 32 bit process under a 64 bit OS:
Direct code execution requires running as a 32 bit process, while running under a 64 bit OS (WoW64) is needed to have as much virtual memory as possible available in the 32 bit address range.

Only the zero page, a high page and the range for kernel32.dll will be unavailable.

If we ran under a 32 bit OS, much more memory would be unavailable.

@PatrickvL
Member Author

PatrickvL commented Nov 8, 2017

As for the loader: no, it won't be a BIOS nor kernel.

The loader is merely intended to gain access to all required memory ranges before anything else happens. Besides the standard kernel32.dll no DLL's are loaded, to avoid anything taking away memory from under our hands.

Directly after the required memory ranges are reserved, the emulation DLL will be loaded, and control will be handed over to it.

The emulation DLL will take over from there, will be responsible for the memory ranges just reserved, will initialize the rest of the environment (much like a real kernel) and will launch the chosen Xbox software.

The GUI executable will still be the main program to start, it will allow you to configure settings, initiate the launcher to start emulating Xbox software, and have a small piece of shared memory for interacting with the emulation code (EmuShared has been part of Cxbx for more than a decade already, so nothing new there).

@RadWolfie
Member

RadWolfie commented Nov 8, 2017

32 bit Windows has a limit of 4 GB of RAM. That's plenty of room. In reality, Cxbx-R would be expected to use up to 256 or 512 MB of RAM. Not more than that.

As for the 3 projects, do you mean something like the following purpose for each?

  • GUI executable (.exe)
  • Xbox emulator (.exe, 0x00010000 to 0x08000000 )
  • LLE module (.dll, 0x0F800000? to 0x0FFFFFFF?)

P.S. Any of Windows' documented 32 bit API functions are still accessible on a 32 bit OS anyway. It depends on which functions were introduced in each OS upgrade.

@PatrickvL
Member Author

PatrickvL commented Nov 8, 2017

No, I'm sorry I haven't explained it clearly enough.

The above described new architecture is not motivated by reducing the amount of host memory used, but by allowing us to have near absolute control over memory ranges in the 32 bit address space. If any code other than our loader were executed before the loader starts running (which would happen if any DLL was statically loaded into this process), there would be a chance that these DLL's (like DirectX, OpenGL and others) had already taken possession of some of the memory ranges that we want to be in control of.

Thus, the setup is as follows:

  • GUI executable (Cxbx-Reloaded.EXE)
  • The tiny loader (Cxbx-Loader.EXE)
  • The emulation (Cxbx-Reloaded.DLL)

But yes, once emulation starts, all subsequent API's that need to be used, can safely be called once their containing DLL's (system or otherwise) are loaded.

Actually, since our emulation DLL won't be loaded until after the required memory ranges are reserved, it's perfectly okay to statically link all other DLL's to the emulation DLL.
Once the emulation DLL is dynamically loaded, all statically linked DLL's will automatically be loaded by the OS.
Because our required memory ranges are already reserved by then, the OS has no other option than to load these DLL's into other memory ranges.

(Sidenote: This does require that all DLL's we call into from the emulation code, need to be relocatable. If any of these DLL's contain no relocation information block, they won't be able to be loaded if their fixed image base address collides with our reserved memory ranges. In practise however, this is unlikely to happen.)

Given this, is there anything else that needs to be explained in more detail about this proposed new architecture?

@RadWolfie
Member

RadWolfie commented Nov 8, 2017

There is no absolute control (edit: some control) over memory ranges in either the 32 or 64 bit address space that I know of once the executable is running. Even 32 bit executables are still limited to 2 GB of accessible RAM anyway, with or without a 64 bit OS version. It can be increased by 1 more GB with IMAGE_FILE_LARGE_ADDRESS_AWARE defined.

Only if the executable itself and a reserved-address DLL are preloaded can no other DLLs be loaded into that address space.

Also, tiny loader won't work anyway. Unless you have a working sample for us to verify it.

Edit: Thumb down if you want @PatrickvL, my fact is still true anyway.

Edit: Okay, so there is some control over memory ranges.

@PatrickvL
Member Author

PatrickvL commented Nov 8, 2017

The tiny loader will be made as small as possible, and will avoid loading any DLL until the memory reservations are made. This means the entire 32 bit virtual address space is nearly unused and available to our process, allowing us to do as we please with it.

(Note, that according to MSDN, Large Address Aware 32 bit applications running under Wow64 may address almost the entire 4 GB virtual memory address range. Barring only a few pages that Windows claims.)

I'm quite certain this will work, given we are already doing this to a certain degree.

As soon as I find the time, I'll be happy to write some code showing how the above could be implemented.

@RadWolfie
Member

Good luck

@LukeUsher
Member

LukeUsher commented Nov 8, 2017

The Large Address Aware flag allows use of the full 4GB address space when running a 32-bit process on 64-bit windows, this is documented on the MSDN and we already know this works: Cxbx-Reloaded already allocates regions in high memory: for the GPU PRAMIN region as well as the "tiled" memory region.

The tiny loader method also works: It was used to reserve the 0x10000 base address by Xenoborg, and we use the same approach in Cxbx-Reloaded currently, except that we have all the code in the same executable: this will just be moved to external libraries so the loader executable takes up less address space.

Not much will change in the way of functionality, this just gives us the ability to map memory a similar way to the real xbox.

Xbox memory mapping is different than windows:

The 64 MB (128 MB on debug kits) of PHYSICAL memory is identity mapped at 0x80000000. This means that any memory access in this virtual region actually addresses physical memory directly (so 0x80000000 is physical address 0).

All other address space (not counting hardware devices) is virtual memory, and can be mapped to physical memory in any way the title sees fit: It is this feature that we need control over the whole address space to properly support.
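
The identity mapping described above boils down to a fixed offset. A minimal sketch, with helper names that are purely illustrative:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kContiguousBase = 0x80000000u;          // from the text above
constexpr std::uint32_t kRetailRamSize  = 64u * 1024u * 1024u;  // 128 MB on debug kits

// True when vaddr lies inside the identity-mapped window for the given RAM size.
constexpr bool is_identity_mapped(std::uint32_t vaddr,
                                  std::uint32_t ram_size = kRetailRamSize) {
    return vaddr >= kContiguousBase && (vaddr - kContiguousBase) < ram_size;
}

// Physical address behind an identity-mapped virtual address.
inline std::uint32_t virt_to_phys(std::uint32_t vaddr,
                                  std::uint32_t ram_size = kRetailRamSize) {
    assert(is_identity_mapped(vaddr, ram_size));
    return vaddr - kContiguousBase;
}

// Virtual address in the identity-mapped window for a physical address.
inline std::uint32_t phys_to_virt(std::uint32_t paddr,
                                  std::uint32_t ram_size = kRetailRamSize) {
    assert(paddr < ram_size);
    return paddr + kContiguousBase;
}
```

So `virt_to_phys(0x80001000u)` yields physical address 0x1000, while addresses outside the window need the title-controlled virtual mappings instead.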

We know this will work already, as most of this is already done in Cxbx-Reloaded, we are just restructuring to allow finer control.

@PatrickvL
Member Author

One can only but admire everyone's dedication to these subjects! Thumbs up to everyone!

@PatrickvL
Member Author

@RadWolfie @LukeUsher Here a mock-up of how the tiny loader could be implemented : PatrickvL@5c5d396

@RadWolfie
Member

Your mock-up for reserving the memory address ranges will not work 100% reliably. The reason: Windows' own modules, antivirus modules, and various other modules are loaded before the entry point is called. This applies to both pre-loaded and loaded DLLs. Only once all required modules are loaded is the executable's entry point triggered.

If one of our testers has an issue because one of those modules (loaded before the entry point was called) already took a reserved address, then we can't provide support for them. ☹️

P.S. I was not aware that VirtualAlloc actually accepts a specific address as the starting point of a reservation. So, props to you @PatrickvL. 👍 I had been looking for a way to reserve a memory range at a specific address for ages. Though, I don't need it anymore.

P.P.S. I seriously hope none of the existing modules to this day load in these address ranges. 🤞

@RadWolfie
Member

I see what you did with the mock-up now. You actually used both the VirtualAllocResults array to reserve the file size itself and VirtualAlloc with the MEM_RESERVE flag to lock it down.

From my point of view, a "tiny" loader is less than a few hundred kilobytes. It's the method that's actually tiny.

@ghost

ghost commented Nov 8, 2017 via email

@PatrickvL
Member Author

PatrickvL commented Nov 8, 2017

Cxbx-Loader.exe is 9 kilobyte right now, and has one dependency : kernel32.dll

@LukeUsher
Member

@haxar Xbe's don't run in real mode; real mode is limited to 16-bit instructions and emulates the 8086, from before the 32-bit extensions were added.

Actually, Xbox code runs in protected mode, but in ring 0 (kernel space) as opposed to Ring 3, which is user-space (and where Cxbx-Reloaded runs code)

All the other functionality mentioned in your comment is the job of the emulator core DLL, not the loader. The only purpose of the loader is to reserve memory, and nothing more.

For persisting memory when changing apps, we currently support that the same way the real Xbox does: titles can call a kernel API which tells the kernel not to throw away certain memory regions on reboot, and we respect that. This is currently implemented by persisting memory to a memory-mapped file.

@PatrickvL
Member Author

PatrickvL commented Nov 11, 2017

A status update:

The loader can be seen here: https://github.com/PatrickvL/Cxbx-Reloaded/blob/Loader/src/Cxbx-Loader/loader.cpp
The address ranges that are reserved can be seen here: https://github.com/PatrickvL/Cxbx-Reloaded/blob/Loader/src/Common/XboxAddressRanges.h

When this is run on 64 bit Windows 7, all address ranges succeed, except for USB1, Flash mirror 4, and MCPX.

(Please report results on other Windows versions.)

The above results are sufficient to continue this endeavour, since the address ranges that can't be reserved can safely be ignored: there's still USB0 (neither of which we currently LLE anyway), the other three Flash memory mirrors are still available (but we won't be emulating Flash memory anyway) and the MCPX won't be emulated either (since we implement our own bootstrap and kernel).

Next step would be to move all emulation code towards the Cxbx-Reloaded.DLL and let that take ownership of the just reserved address ranges. (See issue #788 for more details about emulating physical and virtual memory.)

The DLL will need to do its bootstrap a little differently from what we do now, as it has to use the new address range declarations as the source for its device lookup.

Also, the command line still has to be retrieved (see https://msdn.microsoft.com/en-us/library/windows/desktop/ms683156(v=vs.85).aspx for details); we don't do anything with it yet, since the CRT has been stripped from the tiny loader to keep any uncontrolled side-effects to a minimum.

Another thing we need to investigate is whether the DLL can still create a channel for handling window messages (wndproc).

Once this works, we also need to strip all emulation code from the GUI - only Xbe access and EmuShared (plus its dependencies) will have to be kept in the GUI.

@PatrickvL
Member Author

PatrickvL commented Nov 14, 2017

Another progress report on https://github.com/PatrickvL/Cxbx-Reloaded/commits/Loader :

We're now at a point where Direct Input can't be initialized, but up till that, everything works as designed.
The tiny loader reserves all required address ranges, then loads and transfers execution over to the emulation DLL, which can use the reserved addresses for physical, virtual, contiguous, write-combined and tiled memory, and for mmio handling.
The NV2A handler is once again processing all accesses to its mmio address range, so the other ranges will work just as well once we attach device handler functions to them.
Anyway, the fact that we're now emulating from within the context of a DLL instead of an executable seems to complicate a few things that worked just fine before: we have trouble creating a console window (for logging output), the aforementioned initialization of Direct Input, and whatever else will rear its head.

Luke has an idea that, once implemented, could improve stability:

"
The Emulate() function cannot safely return, as the caller in the loader might be overwritten at runtime, ExitProcess should be used instead.

Also a suggestion, regarding thread contexts and the CRT (It may fix your issue, it may not, but it's good practice for this use case)

Implement DllMain, and in DllProcessAttach, spawn a thread, that thread should wait on a Condition Variable/Critical Section indefinitely. This thread runs in the DLL context, rather than the EXE context

Then the Emulate() function should set some global variables in the DLL, release the lock allowing the Dll thread to continue, and then spin in a loop forever (Sleep(HIGH_NUMBER) in a loop). This is so it doesn't return and cause the emulator to crash/exit, and so the only code running in the .EXE context is a single null function, within the DLL, so it doesn't matter if the EXE gets trashed.

After the lock is released, the Dll thread should then call CxbxKrnlMain and startup emulation.
"
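
The hand-off quoted above can be modelled in portable C++ as a rough sketch. Standard-library threads and a condition variable stand in for DllMain, the Win32 thread and the lock; the class and method names are assumptions, and `emu_main` is a placeholder for CxbxKrnlMain.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Portable model of the hand-off; a real DLL would use DllMain and Win32
// primitives instead of std::thread and std::condition_variable.
class EmulationHandoff {
public:
    // Models process attach: start a worker that blocks until released.
    template <typename Main>
    void attach(Main emu_main) {
        worker_ = std::thread([this, emu_main]() mutable {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return ready_; });
            const std::string args = args_;  // copy the "globals" set by emulate()
            lock.unlock();
            emu_main(args);                  // stands in for CxbxKrnlMain
        });
    }

    // Models Emulate(): publish the parameters, then release the worker.
    void emulate(const std::string& args) {
        {
            std::lock_guard<std::mutex> lock(m_);
            args_ = args;
            ready_ = true;
        }
        cv_.notify_one();
        // The real Emulate() would now sleep in a loop forever instead of
        // returning into loader code that may have been overwritten.
    }

    // Only needed by this model so the example can finish cleanly.
    void join() {
        if (worker_.joinable()) worker_.join();
    }

    // Note: blocks forever if attach() was called but emulate() never was.
    ~EmulationHandoff() { join(); }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool ready_ = false;
    std::string args_;
    std::thread worker_;
};
```

The point of the design is that emulation runs on a thread created by the DLL, so nothing of value remains on the loader's own stack or in its code once the hand-off has happened.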

@PatrickvL
Member Author

PatrickvL commented Nov 20, 2017

Status update: the DirectInput issue has been fixed. As it turns out it had nothing to do with DirectInput, nor the fact that emulation code is now running from a DLL once again. Instead, it was due to a difference in memory layout between the GUI and the emulator.

After fixing this, Release builds started to run again too, so that means this branch can now be reviewed, verified, tested for regressions and submitted as a pull request afterwards.

Any volunteers for that??

@PatrickvL
Member Author

The work can be found here : https://github.com/PatrickvL/Cxbx-Reloaded/commits/Loader

@PatrickvL
Member Author

For fine-grained control over memory ranges, the reservation of pages might need to be done per page, instead of one VirtualAlloc reservation call per range.
When an entire range is reserved up front, individual pages in this range cannot be "unreserved" on their own - VirtualFree is an all or nothing thing, really.
So, I suspect address range reservation must be done per page, so that we can VirtualFree each one separately later on.
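
One way to picture this is to split each range into equally sized blocks that are reserved (and can later be released) one at a time. The sketch below only computes the blocks. Using 64 KiB as the block size is an assumption, chosen because Windows rounds reservation addresses to its allocation granularity (normally 64 KiB), so that is the finest unit that can be reserved independently; the range start is assumed to be aligned to the block size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Block { std::uint32_t start; std::uint32_t size; };

// Assumed block size: the usual Windows allocation granularity.
constexpr std::uint32_t kBlockSize = 64u * 1024u;

// Splits [start, start + size) into consecutive blocks of at most block_size
// bytes. Each block would get its own reservation call, so each can be
// released separately later.
inline std::vector<Block> split_into_blocks(std::uint32_t start,
                                            std::uint32_t size,
                                            std::uint32_t block_size = kBlockSize) {
    assert(block_size != 0);
    std::vector<Block> blocks;
    std::uint64_t pos = start;                             // 64-bit to avoid wrap
    const std::uint64_t end = std::uint64_t{start} + size;
    while (pos < end) {
        const std::uint64_t len =
            (end - pos < block_size) ? (end - pos) : block_size;
        blocks.push_back({ static_cast<std::uint32_t>(pos),
                           static_cast<std::uint32_t>(len) });
        pos += len;
    }
    return blocks;
}
```

For the 64 MB range at 0x00010000 this yields 1024 blocks, which means 1024 reservation calls instead of one, traded for the ability to free any single block.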

@PatrickvL
Member Author

This work should be picked up again soon, given it was in a nearly deliverable state...

@PatrickvL
Member Author

PatrickvL commented Feb 10, 2019

Well, soon turned out to be "within one year" (but only just!);

As it turns out, a rebase of this year-old branch isn't feasible anymore, as too much has changed to resolve easily. So, instead, I've taken it upon myself to redo the work, by applying the same steps done before in the Loader branch to a freshly started Loader reloaded branch (pun intended) - which can be seen here : https://github.com/PatrickvL/Cxbx-Reloaded/commits/Loader_reloaded

This time, I'll avoid breakage by keeping the existing code in working shape. I'll do this by delivering 3 new projects (a GUI, a loader and the emulation code), separate from (but sharing code with) the existing Cxbx project. (Any shared code that needs to be updated will differentiate between compilation of the existing Cxbx versus the new projects, using feature-specific #ifdefs.)

Current status is :

  • the tiny loader is copied over from the old Loader branch
  • the tiny loader is refactored so:
    • the hard-coded list of memory ranges is redesigned and marked with system-specifiers
    • command-line arguments are now available for selecting the required memory-layout (/xbox, /chihiro, or /devkit)
    • additional memory-ranges we nowadays use were added
    • obsolete ranges were deleted
    • related functions are split up over goal-specific files, allowing re-use in the other projects
  • the GUI is just an empty stub now (since we can use the existing Cxbx.exe to test for now)
    • later on, if this endeavor turns out successful, the GUI code will be copied over from Cxbx.exe
  • the emulation DLL contains all emulation code that is also in Cxbx
  • everything compiles successfully
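
The system-specifiers and command-line switches mentioned in the list above could be modelled as bit flags, roughly as follows. The enum, the struct and the exact, case-sensitive matching of the switches are assumptions for illustration; only the switch names /xbox, /chihiro and /devkit come from the status update.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit flags marking which system a range applies to (illustrative names).
enum SystemFlag : std::uint32_t {
    SYS_NONE    = 0,
    SYS_XBOX    = 1u << 0,
    SYS_CHIHIRO = 1u << 1,
    SYS_DEVKIT  = 1u << 2,
};

// Maps one command-line switch to a system flag; SYS_NONE for anything else.
inline SystemFlag parse_system_switch(const char* arg) {
    if (arg == nullptr) return SYS_NONE;
    if (std::strcmp(arg, "/xbox") == 0)    return SYS_XBOX;
    if (std::strcmp(arg, "/chihiro") == 0) return SYS_CHIHIRO;
    if (std::strcmp(arg, "/devkit") == 0)  return SYS_DEVKIT;
    return SYS_NONE;
}

// A memory range tagged with the systems it belongs to.
struct FlaggedRange {
    std::uint32_t start;
    std::uint32_t size;
    std::uint32_t systems;  // bitwise OR of SystemFlag values
};

// True when the range should be reserved for the selected system.
inline bool applies_to(const FlaggedRange& r, SystemFlag selected) {
    return (r.systems & selected) != 0;
}
```

A loader built this way would walk its range table once and reserve only the entries for which `applies_to` returns true for the selected system.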

Currently, I'm refactoring the emulation startup code sequences so that they can be shared between the existing Cxbx and the new emulation DLL project.
Once that's done, actual testing and problem-fixing can start!

These things will need to be solved before this can be delivered at all:

  • the memory-manager classes will have to take the pre-reserved ranges into account
  • our VirtualProject kernel API will have to be changed so it changes the given range in single-block increments (as a result of the way the loader reserves memory ranges)

@RadWolfie
Member

I'm thinking we can drop the /xbox, /chihiro, and /devkit options from the CLI. Instead, we can allocate two 64 MB memory ranges next to each other, then free the second 64 MB range before emulation starts if the xbe is a retail one (or if some other internal check says so).

@RadWolfie
Member

@LukeUsher's initial idea is:

  • cxbxr.exe
  • cxbxr-ldr.exe (because it loads the emulator)
  • cxbxr-emu.dll (because it handles the emulation)

I agree with his idea for the goal in this wip project.

@RadWolfie
Member

Since we have finished the memory range reservation in the loader project as of pull request #1851, can we close this issue now?

@GXTX
Contributor

GXTX commented Apr 7, 2020

Should wait to close until the branch is merged into master.

@PatrickvL
Member Author

There's still the issue of supporting title-specific address ranges

@RadWolfie RadWolfie linked a pull request Apr 15, 2020 that will close this issue