Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

X64 boot fixes #115

Closed
wants to merge 21 commits into from
Closed

Conversation

Barracuda72
Copy link

Purpose

A bunch of x64-related fixes. Now loader is able to boot kernel even on machines with > 1 GB of RAM and make some initialization progress. Still crashes somewhere in MiInsertVad though.

Proposed changes

Trivial fixes here and there. The most confusing part can be checking and mapping kernel PTEs, I think.

@encodedpr
Copy link
Contributor

Yes please! give us more!

@encodedpr
Copy link
Contributor

Why are the changes in ntoskrnl/ex/handle.c necessary?

@Barracuda72
Copy link
Author

Barracuda72 commented Nov 9, 2017

Why are the changes in ntoskrnl/ex/handle.c necessary?

Well, they all not exactly necessary. The problem lies here:
HandleArray = PointerArray[Handle.MidIndex]
at some point in time HandleArray becomes NULL (probably this should be investigated further). Then after fall-through
Entry = &HandleArray[Handle.LowIndex]
Entry becomes garbage (something like 0x0000000000000028) and then returned to caller. This produces page fault in caller.

Check for PointerArray was added just for homogeneity.

Probably these checks should be written as ASSERT's instead, as this case is not mean to happen as far as I see.

@encodedpr
Copy link
Contributor

It could be that some HANDLE somewhere is incorrectly cast (sign-extension vs zero-extension). An assert and some back traces would be helpful.

@Barracuda72
Copy link
Author

It could be that some HANDLE somewhere is incorrectly cast (sign-extension vs zero-extension). An assert and some back traces would be helpful.

There is no assert, just PF around handle.c:740, here:

/* Now get the next value and do the compare */
NewValue = *(volatile ULONG*)&Entry->NextFreeTableEntry;

because Entry is garbage. Backtrace here (can't figure out how to attach it to comment):
https://pastebin.com/raw/TUxMfqKj

Copy link
Member

@ThFabba ThFabba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the hacks here should be applied to other architectures, so they at least need an _M_AMD64 check.
We probably won't want to commit most of the hacky stuff anyway, and instead work on root-causing them one by one, to identify the proper fixes. If you have any ideas about those root causes, it would be great if you could add them as comments near the corresponding hacks though.

If you want to make a second PR that contains just the EFER define fixes and the disabling of the debug code in MmArmInitSystem, we could get that into master already, to reduce the size of the diff here.

Thanks!

@@ -207,7 +208,10 @@ MiInitializePageTable(VOID)
ASSERT(__readcr4() & CR4_PGE);

/* Enable no execute */
__writemsr(X86_MSR_EFER, __readmsr(X86_MSR_EFER) | EFER_NXE);
FeatureBits = KiGetFeatureBits();
if ((FeatureBits & KF_NX_BIT) == KF_NX_BIT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my knowledge, AMD64 by definition must support NX. Do you have documentation that cases exist where it doesn't?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AMD64 Architecture manual treats it as optional on section 5.6.3 https://support.amd.com/TechDocs/24593.pdf

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it should be the other way around: feature is unavailable unless clearly stated otherwise. Intel SDM, v.3A does not state nonconditional availability of this feature:

Detect the presence of the execute disable bit capability using the CPUID instruction. CPUID.80000001H. EDX[bit 20] = 1 indicates the bit is available.
If the bit is available and PAE is enabled, enable the execute disable bit capability by setting the IA32_EFER.NXE[bit 11] = 1. IA32_EFER is available if CPUID.80000001H.EDX[bit 20 or 29] = 1.
If the execute disable bit capability is not available, a write to IA32_EFER.NXE produces a #GP exception.

So I think there is no harm in checking it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NX is required on x64 versions of Windows/ReactOS. It is already checked in KiInitializeCpu and the system will bugcheck if not available. KiInitializeCpu also enables NX in MSR_EFER already, so if can be removed entirely from this place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I missed that because it uses its own set of constants. I would argue EFER_xxx is the better naming scheme, so we should remove MSR_SCE, MSR_NXE et al, and move the EFER_xxx versions to NDK. Agreed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure, whether NDK is the right place to put these low level architecture specific things, a shared header seems right though. Maybe we create a (set of) header(s) for this stuff? In that case I would also vote for using structs/bitfields for all kinds of CPU related registers/structures rather than constants, since that is usually more convenient to use.

#define EFER_LMA 0x0400
#define EFER_NXE 0x0800
#define EFER_SVME 0x1000
#define EFER_FFXSR 0x4000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! This commit LGTM.

@@ -1091,6 +1091,9 @@ ExpInitializeExecutive(IN ULONG Cpu,
/* Initialize the executive at phase 0 */
if (!ExInitSystem()) KeBugCheck(PHASE0_INITIALIZATION_FAILED);

/* HACK: Initialize RtlpUse16ByteSLists as ARM3 uses list functions */
KiInitMachineDependent(); // Does nothing, just init our variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break x86. I would suggest simply moving the RtlpUse16ByteSLists initialization to KiInitializeKernel, e.g. right after this:

/* Set boot-level flags */
KeFeatureBits = Prcb->FeatureBits;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, moved that

// Continue, if it is not at a higher address than previous address
if (MemoryDescriptor->BasePage < PageLookupTableStartPage) continue;
// Continue, if it is not at a lower address than previous address
if ((PageLookupTableStartPage) && (MemoryDescriptor->BasePage > PageLookupTableStartPage)) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know anything about why the higher addresses are not mapped?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... It's possible to enable 32-bit protected mode without paging. Even more, all MM code is written under that assumption. But in case of long mode it's not true: you need prepared page tables to enter it. So Freeldr creates temporary tables, but maps only 1 Gb of RAM on x64 (identically to kernel scape and to user space). And because of aforementioned assumption MM tries to find most high memory area to place PLT. On x86 it's valid; on x64 it's not, because in case there are more than 1 Gb of RAM this will crash. So I think there isn't any harm of placing PLT in lower physical memory (isn't it?). Another solution would be rewritting Freeldr init code so that it correctly determines amount of RAM available and initializes structures accordingly.

*/
PVOID
NTAPI
MiCheckAddressRoute(PVOID address)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to simplify this hack by using MiMakePdeExistAndMakeValid?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems much complicated as for me. MiMakePdeExistAndMakeValid requires Process descriptor and interrupt level info, it uses cycle while my hack is linear, it calls some functions... Questionable, though.

@@ -68,12 +68,20 @@ ExpLookupHandleTableEntry(IN PHANDLE_TABLE HandleTable,
/* Get the mid level pointer array */
PointerArray = PointerArray[Handle.HighIndex];

/* TODO: maybe ASSERT? */
if (PointerArray == NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem to be symptoms of Mm bugs, so we should figure out why some pages seem to be mapped improperly.

@Barracuda72
Copy link
Author

If you want to make a second PR that contains just the EFER define fixes and the disabling of the debug code in MmArmInitSystem, we could get that into master already, to reduce the size of the diff here.

I removed old changes and put only ones that aren't so questionable. Is it OK now?

@@ -207,7 +208,10 @@ MiInitializePageTable(VOID)
ASSERT(__readcr4() & CR4_PGE);

/* Enable no execute */
__writemsr(X86_MSR_EFER, __readmsr(X86_MSR_EFER) | EFER_NXE);
FeatureBits = KiGetFeatureBits();
if ((FeatureBits & KF_NX_BIT) == KF_NX_BIT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NX is required on x64 versions of Windows/ReactOS. It is already checked in KiInitializeCpu and the system will bugcheck if not available. KiInitializeCpu also enables NX in MSR_EFER already, so if can be removed entirely from this place.

@@ -2118,6 +2118,7 @@ MmArmInitSystem(IN ULONG Phase,
TestPte = MiProtoPteToPte(&TempPte);
ASSERT(PointerPte == TestPte);

#ifndef _M_AMD64 // Not working on x64 for obvoius reason
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work on anything else than x86?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it might work on PPC/ARM. We seem to be copying the x86 memory layout for those.

@@ -28,6 +28,7 @@
#define X86_FEATURE_CX8 0x00000100 /* CMPXCHG8B instruction present */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these are not even used and all the values are hardcoded in KiGetFeatureBits. Feel free to improve that. Or not :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple enough, I think I will do it then. Readablility of code is important.

@Barracuda72
Copy link
Author

NX is required on x64 versions of Windows/ReactOS. It is already checked in KiInitializeCpu and the system will bugcheck if not available. KiInitializeCpu also enables NX in MSR_EFER already, so if can be removed entirely from this place.

OK. Will remove it then.

@Barracuda72
Copy link
Author

OK, removed duplicate NX setup and beatified KiGetFeatureBits a bit, fixing tiny bug on the way (Hyper-Threading support is reported in Edx, not Ecx).

@@ -190,6 +190,7 @@ MiInitializePageTable(VOID)
ULONG64 PxePhysicalAddress;
MMPTE TmplPte, *PointerPxe;
PFN_NUMBER PxePfn;
ULONG FeatureBits;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be removed too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, sorry. Fixed.

@@ -152,31 +152,31 @@ KiGetFeatureBits(VOID)
Prcb->InitialApicId = (UCHAR)(CpuInfo.Ebx >> 24);

/* Convert all CPUID Feature bits into our format */
if (CpuInfo.Edx & 0x00000002) FeatureBits |= KF_V86_VIS | KF_CR4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: while touching such lines, do we want to enforce the multi-line it () {} pattern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would accept, but not enforce.
In this particular case, I think the function would become unnecessarily verbose and thus less readable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Readability nit: shall we maintain vertical alignment of FeatureBits |= ..., as they were and as defines are?

@@ -45,7 +45,7 @@ extern NTSTATUS MiRosTrimCache(ULONG Target, ULONG Priority, PULONG NrFreed);
VOID
INIT_FUNCTION
NTAPI
MiCreateArm3StaticMemoryArea(PVOID BaseAddress, ULONG Size, BOOLEAN Executable)
MiCreateArm3StaticMemoryArea(PVOID BaseAddress, SIZE_T Size, BOOLEAN Executable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.
Nit:
MmCreateMemoryArea grep
mm.h has SIZE_T, but marea.c has ULONG_PTR: should the latter be syn'ed too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all it doesn't matter at all, since ULONG_PTR and SIZE_T are technically the same. I also suggest not putting too much "nits" into PR reviews (that's 90% of my PRs at work and I hate it), since we don't want to turn providing an improvement into an obligation to also "fix" the rest of our code. Think about why there is Wine-Staging and why we use it. We don't have Ros-Staging so it could become dev/null at some point. Let's not go there.

@Barracuda72
Copy link
Author

Made tiny bit of a progress. With help of another hack (MiCheckAddressRoute in two places, not included in commit) system now boots and loads some drivers.
untitled
There are still some major errors, with registry and such though. One of the most frustrating is about lists: RtlInitializeSListHead on x64 requires alignment on 16-byte boundary, but in almost all calls to it it's not respected (have to do something with memory allocation, I think).


/* Print EBP */
DbgPrint("Ebp %p\n", Ebp);

/* Walk it */
while(Ebp != 0 && Ebp >= (PULONG)Thread->Tcb.StackLimit)
while(Ebp != 0 && Ebp >= (PULONG_PTR)Thread->Tcb.StackLimit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is x86 specific code. A proper fix would be using KeRosDumpStackFrames() or something similar. Alternatively use #ifdef _M_IX86 ... #else DbgPrint("FIXME: Backtrace skipped on non-x86\n") #endif

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, made it conditional. Didn't know that these is x86-specific.

@@ -71,7 +71,7 @@ FsRtlpIsDfsEnabled(VOID)
return TRUE;
}

return ((ULONG)KeyQueryOutput.KeyInfo.Data != 1);
return ((ULONG)KeyQueryOutput.KeyInfo.Data[0] != 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data is UCHAR Data[1];, so it would need to be either return (*(PULONG)KeyQueryOutput.KeyInfo.Data != 1); or fix the structure definition above to be something like union { KEY_VALUE_PARTIAL_INFORMATION KeyInfo; struct { ULONG Header[3]; ULONG KeyValue; } } KeyQueryOutput; and return (KeyQueryOutput.KeyValue != 0); or struct KEY_VALUE_PARTIAL_INFORMATION KeyInfo; UCHAR Fill[3];} and PULONG KeyValue = (PULONG)KeyQueryOutput.KeyInfo.Data; return (*KeyValue != 1);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I see. There are some nasty little things going on here. Well, okay, will do first option. But on little-endian machine my option will work too (BTW, is PPC port still in game?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a portable construct would be nice in any case. Because digging each time in the code to check whether the thing will work when you switch the endianness is a pain.

@@ -67,12 +69,14 @@ ExpLookupHandleTableEntry(IN PHANDLE_TABLE HandleTable,

/* Get the mid level pointer array */
PointerArray = PointerArray[Handle.HighIndex];
ASSERT(PointerArray == NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean !=?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, probably. Will look into that later


typedef union _EXHANDLE
{
struct
{
ULONG_PTR TagBits:2;
ULONG_PTR Index: HANDLE_MEANINGFUL_BITS;
ULONG_PTR TagBits: HANDLE_TAG_BITS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: source alignment broken? (or usage of tabs instead of spaces?)


#define ROUND_UP(n, align) \
ROUND_DOWN(((ULONG)(n)) + (align) - 1, (align))
ROUND_DOWN(((ULONG_PTR)(n)) + (align) - 1, (align))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be carefully synced with the other ALIGN_MEM_xxx macros etc everywhere else (FreeLdr and elsewhere).
I don't know whether @tkreuzer or @ThFabba have some particular comments about that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any ALIGN_MEM_...? Can't find them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, these are called ALIGN_UP_... and ALIGN_DOWN_... :
https://git.reactos.org/?p=reactos.git&a=search&h=HEAD&st=grep&s=ALIGN_UP
https://git.reactos.org/?p=reactos.git&a=search&h=HEAD&st=grep&s=ALIGN_DOWN

Also I think some other ROUND_UP (not necessary in the kernel, but in other places in the code, but related to manipulating (what would be 64 bits) points) may need fixup:
https://git.reactos.org/?p=reactos.git&a=search&h=HEAD&st=grep&s=ROUND_UP
and check also for ROUND_DOWN ...

@@ -18,6 +18,8 @@
LIST_ENTRY HandleTableListHead;
EX_PUSH_LOCK HandleTableListLock;
#define SizeOfHandle(x) (sizeof(HANDLE) * (x))
#define MAKE_HANDLE(x) ((x) << 2)
#define CLEAR_HANDLE(x) MAKE_HANDLE(x.Index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this defined like that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you a right, it's a bit unclear. Probably the best way to write this will be

#define MAKE_HANDLE(x) ((x) << HANDLE_TAG_BITS)

so it will be more obvious.

BTW, the error with handles was quite interesting: it's one of those "32-bit chauvinism" cases, there the code was working just by chance of (sizeof(HANDLE) * (x)) == (x << HANDLE_TAG_BITS). As I can see, there is a whole bunch of similar cases in ReactOS code...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually my point was more about the "CLEAR_HANDLE" definition :P

@ThFabba ThFabba self-assigned this Dec 15, 2017
@tkreuzer
Copy link
Contributor

@Barracuda72 Please check #200

@ThFabba ThFabba assigned tkreuzer and unassigned ThFabba Dec 18, 2017
tkreuzer added a commit to tkreuzer/reactos that referenced this pull request Dec 28, 2017
Based on patch by Ivan Labutin. See PR reactos#115
tkreuzer added a commit to tkreuzer/reactos that referenced this pull request Dec 28, 2017
Based on patch by Ivan Labutin. See PR reactos#115
tkreuzer added a commit that referenced this pull request Dec 29, 2017
Based on patch by Ivan Labutin. See PR #115
@learn-more
Copy link
Member

You might wanna rebase this on master (and resolve the conflicts) to avoid duplicating work.

@tkreuzer
Copy link
Contributor

The relevant parts of this PR have been committed. The rest is basically hacks that won't be required after my related PRs have been merged. Closing.

@tkreuzer tkreuzer closed this Dec 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants