Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instruction decoding and emulating support for IOIO instructions #391

Merged
merged 12 commits into from
Sep 11, 2024

Conversation

cxdong
Copy link
Contributor

@cxdong cxdong commented Jun 20, 2024

Add the instruction decoding and emulating support for ins/outs instructions. To support emulating, the InsnMachineCtx has been extended with more methods for the instruction decoder to get CPU state. And implemented these new methods for the X86ExceptionContext.

Beside provides more CPU states, it also provides the memory accessing ability to the instruction decoder via the new trait InsnMachineMem. The PageTable has been extended to support translating virtual addresses to the corresponding physical addresses, with the accessing rights been checked. And the physical addresses are temporarily mapped in the page table with contiguous virtual addresses via a PerCPUPageMappingGuard, which is wrapped in the new structure MemMappingGuard. This new structure implements the trait InsnMachineMem and is used by the instruction decoder to access memory.

Besides emulating ins/outs instruction, in/out instruction can also be emulated. The test cases have been enhanced to cover the emulation for ins/outs/in/out instructions with byte, word and double word size.

In the end, the SEV #VC handler has been simplified by directly using the emulation interface to handle IOIO instructions. The test_in_svsm test_port_io_string_16_get_last() now is enabled. Has set up a TDX machine which can boot the SVSM in TD, so performed similar test as what test_port_io_string_16_get_last() did with TD, but didn't have a chance to test on with SEV due to lack of AMD machine. Appreciate it if someone can help to try this PR on an AMD machine.

@cxdong cxdong force-pushed the ioio branch 3 times, most recently from 1bc8213 to a16cdf3 Compare June 20, 2024 07:55
@00xc 00xc requested a review from p4zuu July 10, 2024 11:43
@00xc
Copy link
Member

00xc commented Jul 10, 2024

Please rebase on main, thanks!

@p4zuu
Copy link
Collaborator

p4zuu commented Jul 10, 2024

SVSM fails to boot in debug mode (RELEASE=0) starting from commit 8188603 "kernel/cpu/vc: Simplify IOIO instruction handling in #VC handler":

[Stage2] COCONUT Secure Virtual Machine Service Module (SVSM) Stage 2 Loader
[Stage2] ERROR: Panic: panicked at kernel/src/stage2.rs:353:14:
Failed to load kernel ELF: Mem

However, this boots fine (and test in SVSMs pass as well) on AMD for release builds.

@cxdong
Copy link
Contributor Author

cxdong commented Jul 12, 2024

Please rebase on main, thanks!

Rebased and force updated.

@cxdong
Copy link
Contributor Author

cxdong commented Jul 12, 2024

SVSM fails to boot in debug mode (RELEASE=0) starting from commit 8188603 "kernel/cpu/vc: Simplify IOIO instruction handling in #VC handler":

[Stage2] COCONUT Secure Virtual Machine Service Module (SVSM) Stage 2 Loader
[Stage2] ERROR: Panic: panicked at kernel/src/stage2.rs:353:14:
Failed to load kernel ELF: Mem

However, this boots fine (and test in SVSMs pass as well) on AMD for release builds.

Thanks for testing this on AMD machine!

I think this issue is related with the stage2.bin size which is increased quite a bit in DEBUG version by using the instruction decoder crate to emulate the in/out instructions, and results in the heap area becomes smaller. So in the latest update, that commit is refactored only for the SVSM #VC handler and leaving stage#2 #VC handler as before. The DEBUG version worked fine from my side on TDX machine because I made some hack to increase the stage2 boot memory larger than 640K, and the reason I increased this because stage2.bin size did become larger, which I thought was caused by introducing TDX specific code. But seems using the instruction decoder to emulate in/out instructions in stage2 also increased the size.

Thanks again for the testing! Please let me know if this issue can still be observed.

kernel/src/insn_decode/decode.rs Outdated Show resolved Hide resolved
kernel/src/mm/pagetable.rs Outdated Show resolved Hide resolved
@p4zuu
Copy link
Collaborator

p4zuu commented Jul 12, 2024

The #VC handler is panicking during one test-in-svsm:

[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: #VC handling error: Insn(ExceptionPF(18446743523954847744, 9))
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/idt/svsm.rs:216:13:
Failed to handle #VC from kernel-mode at RIP 0xffffff800005a482 code: 0x000000000000007b
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff80000ba051]
[SVSM]   [0xffffff8000000564]
[SVSM]   [0xffffff800009d28e]
[SVSM]   [0xffffff800009cfa2]
[SVSM]   [0xffffff800005a8d0]
[SVSM]   [0xffffff800009d261]
[SVSM]   [0xffffff8000080371]
[SVSM]   [0xffffff80000bdfb9]
[SVSM]   [0xffffff80000bdcb5]
[SVSM]   [0xffffff800007cbbf]
[SVSM]   Invalid frame
[SVSM] ---END---

I'm investigating this :)

@cxdong
Copy link
Contributor Author

cxdong commented Jul 12, 2024

The #VC handler is panicking during one test-in-svsm:

[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: #VC handling error: Insn(ExceptionPF(18446743523954847744, 9))
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/idt/svsm.rs:216:13:
Failed to handle #VC from kernel-mode at RIP 0xffffff800005a482 code: 0x000000000000007b
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff80000ba051]
[SVSM]   [0xffffff8000000564]
[SVSM]   [0xffffff800009d28e]
[SVSM]   [0xffffff800009cfa2]
[SVSM]   [0xffffff800005a8d0]
[SVSM]   [0xffffff800009d261]
[SVSM]   [0xffffff8000080371]
[SVSM]   [0xffffff80000bdfb9]
[SVSM]   [0xffffff80000bdcb5]
[SVSM]   [0xffffff800007cbbf]
[SVSM]   Invalid frame
[SVSM] ---END---

I'm investigating this :)

Thank you for investigating this. This error represents svsm kernel is failed to translate a virtual address with checking permissions in the page table. So what I can image is that something might be wrong in the translate_addr_region() or the check_access_rights() functions. Please let me know if I can provide any help on this.

kernel/src/cpu/vc.rs Show resolved Hide resolved
kernel/src/insn_decode/decode.rs Outdated Show resolved Hide resolved
kernel/src/insn_decode/decode.rs Outdated Show resolved Hide resolved
kernel/src/insn_decode/decode.rs Show resolved Hide resolved
@p4zuu
Copy link
Collaborator

p4zuu commented Jul 12, 2024

I think this issue is related with the stage2.bin size which is increased quite a bit in DEBUG version by using the instruction decoder crate to emulate the in/out instructions, and results in the heap area becomes smaller. So in the latest update, that commit is refactored only for the SVSM #VC handler and leaving stage#2 #VC handler as before. The DEBUG version worked fine from my side on TDX machine because I made some hack to increase the stage2 boot memory larger than 640K, and the reason I increased this because stage2.bin size did become larger, which I thought was caused by introducing TDX specific code. But seems using the instruction decoder to emulate in/out instructions in stage2 also increased the size.

Good catch, I also doubted it was a binary size issue, it already happened in the past. I'm not sure if we'll need INS/OUTS in stage 2, so enabling this only in the SVSM #VC handler makes sense to me. However, I think we'll need to really discuss this stage2 binary size issue with Jörg once he's back since this is a recurrent problem now

@cxdong
Copy link
Contributor Author

cxdong commented Jul 24, 2024

The #VC handler is panicking during one test-in-svsm:

[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: #VC handling error: Insn(ExceptionPF(18446743523954847744, 9))
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/idt/svsm.rs:216:13:
Failed to handle #VC from kernel-mode at RIP 0xffffff800005a482 code: 0x000000000000007b
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff80000ba051]
[SVSM]   [0xffffff8000000564]
[SVSM]   [0xffffff800009d28e]
[SVSM]   [0xffffff800009cfa2]
[SVSM]   [0xffffff800005a8d0]
[SVSM]   [0xffffff800009d261]
[SVSM]   [0xffffff8000080371]
[SVSM]   [0xffffff80000bdfb9]
[SVSM]   [0xffffff80000bdcb5]
[SVSM]   [0xffffff800007cbbf]
[SVSM]   Invalid frame
[SVSM] ---END---

I'm investigating this :)

Thank you for investigating this. This error represents svsm kernel is failed to translate a virtual address with checking permissions in the page table. So what I can image is that something might be wrong in the translate_addr_region() or the check_access_rights() functions. Please let me know if I can provide any help on this.

Some hint from my side, according to the reported error code 9, PageFaultError::R is marked, which represents the code has detected some reserved bit has been set in the page table entry. There are 3 places in the code will set the error, but I guess the most likely place is the function "PTEntry::has_reserved_bits()". It takes PagingMode as an input, which should be PagingMode::PML4 for AMD SEV, which represents 4 level paging in long mode. Probably some logic inside this function is not correct.

@p4zuu
Copy link
Collaborator

p4zuu commented Aug 5, 2024

In in-svsm testing, it hangs now before jumping to the kernel main:

[Stage2] COCONUT Secure Virtual Machine Service Module
[Stage2] Order-00: total pages:    15 free pages:     1
[Stage2] Order-01: total pages:     1 free pages:     1
[Stage2] Order-02: total pages:     0 free pages:     0
[Stage2] Order-03: total pages:     1 free pages:     1
[Stage2] Order-04: total pages:     0 free pages:     0
[Stage2] Order-05: total pages:     0 free pages:     0
[Stage2] Total memory: 100KiB free memory: 44KiB
[Stage2]   kernel_region_phys_start = 0x0000008000000000
[Stage2]   kernel_region_phys_end   = 0x0000008001000000
[Stage2]   kernel_virtual_base   = 0xffffff8000000000
*hang here*

I suspect a kvm panic that doesn't show on my screen. This doesn't happen in a regular (non in-svsm testing) SVSM boot

@cxdong
Copy link
Contributor Author

cxdong commented Aug 6, 2024

In in-svsm testing, it hangs now before jumping to the kernel main:

[Stage2] COCONUT Secure Virtual Machine Service Module
[Stage2] Order-00: total pages:    15 free pages:     1
[Stage2] Order-01: total pages:     1 free pages:     1
[Stage2] Order-02: total pages:     0 free pages:     0
[Stage2] Order-03: total pages:     1 free pages:     1
[Stage2] Order-04: total pages:     0 free pages:     0
[Stage2] Order-05: total pages:     0 free pages:     0
[Stage2] Total memory: 100KiB free memory: 44KiB
[Stage2]   kernel_region_phys_start = 0x0000008000000000
[Stage2]   kernel_region_phys_end   = 0x0000008001000000
[Stage2]   kernel_virtual_base   = 0xffffff8000000000
*hang here*

I suspect a kvm panic that doesn't show on my screen. This doesn't happen in a regular (non in-svsm testing) SVSM boot

A possible reason may still be the stage2 binary size. If the size of stage2 is too large, the remaining memory for heap in stage2 might not be large enough to load svsm kernel. So, is it possible to try with "make test-in-svsm RELEASE=1"? Suppose the release version should give a smaller stage2 binary.

I also want to do some investigation about the in-svsm test failure, but seems I didn't encounter the hang issue. From my side, there is a panic error which is reasonable as the in-svsm test expects to run on SEV, not TDX.

[SVSM] 1 CPU(s) present
[SVSM] Brought 0 AP(s) online
[SVSM] [CPU 0] Virtual memory pages used: 0 * 4K, 0 * 2M
[SVSM] Launching request-processing task on CPU 0
[SVSM] running 128 tests
[SVSM] test cpu::tss::test::test_tss_offsets ...
[SVSM] test cpu::vc::tests::test_has_amd_cpuid ...
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/vc.rs:321:9:
assertion `left == right` failed
  left: Ok("GenuineIntel")
 right: Ok("AuthenticAMD")
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff800010a33e]
[SVSM]   [0xffffff8000019a9d]
[SVSM]   [0xffffff8000098d51]
[SVSM]   [0xffffff80000835c1]
[SVSM]   [0xffffff8000066891]
[SVSM]   [0xffffff8000056498]
[SVSM]   [0xffffff800002dd95]
[SVSM]   [0xffffff80000a8794]
[SVSM]   Invalid frame
[SVSM] ---END---

Maybe it is because TDX doesn't have #VC handler and didn't use insn_decode crate for stage2 (although SEV stage2 #VC handle doesn't use emulation API provided by insn_decode crate, but it is still using the decoding API), thus no significant impact to stage2 binary size. So probably it is worth to have a try with release version on SEV to see if there is any difference?

@p4zuu
Copy link
Collaborator

p4zuu commented Aug 6, 2024

Maybe it is because TDX doesn't have #VC handler and didn't use insn_decode crate for stage2 (although SEV stage2 #VC handle doesn't use emulation API provided by insn_decode crate, but it is still using the decoding API), thus no significant impact to stage2 binary size. So probably it is worth to have a try with release version on SEV to see if there is any difference?

So this bug indeed doesn't show up with make test-in-svsm RELEASE=1 (weirdly, make test-in-svsm RELEASE=0 is also working) but it shows with make test-in-svsm. It appears that make test-in-svsm RELEASE=1 and make test-in-svsm RELEASE=0 both build in release mode. For this, maybe #432 can help. I'll work on it as well
I'll continue investigating the bug in this comment in release mode

@cxdong
Copy link
Contributor Author

cxdong commented Aug 7, 2024

Maybe it is because TDX doesn't have #VC handler and didn't use insn_decode crate for stage2 (although SEV stage2 #VC handle doesn't use emulation API provided by insn_decode crate, but it is still using the decoding API), thus no significant impact to stage2 binary size. So probably it is worth to have a try with release version on SEV to see if there is any difference?

So this bug indeed doesn't show up with make test-in-svsm RELEASE=1 (weirdly, make test-in-svsm RELEASE=0 is also working) but it shows with make test-in-svsm. It appears that make test-in-svsm RELEASE=1 and make test-in-svsm RELEASE=0 both build in release mode.

By looking into the compiling log of running "make test-in-svsm RELEASE=0", seems cargo still applies the parameter --release, which may be because the RELEASE is defined even in this case for the Makefile.

For this, maybe #432 can help. I'll work on it as well I'll continue investigating the bug in this comment in release mode

Thanks! Please let me know if I can provide any help, e.g., making any debugging patches

@p4zuu
Copy link
Collaborator

p4zuu commented Aug 7, 2024

Thanks! Please let me know if I can provide any help, e.g., making any debugging patches

A few more hints here: PageFaultError::R is returned after entry.has_reserved_bits() returns a non-null value. This happens in check_access_write() in translate_addr_lvl0(). The faulty virtual address is 0xFFFFFF8000062000, the level 0 page table entry is 0x8008008000062161.
In has_reserved_bits() we're correctly hitting the PagingMode::PML4, that returns common = 0xFFC0000000000 , resulting in has_reserved_bits() returning ((0x8008008000062161 & 0xFFC0000000000) != 1) = true

@cxdong
Copy link
Contributor Author

cxdong commented Aug 8, 2024

Thanks! Please let me know if I can provide any help, e.g., making any debugging patches

A few more hints here: PageFaultError::R is returned after entry.has_reserved_bits() returns a non-null value. This happens in check_access_write() in translate_addr_lvl0(). The faulty virtual address is 0xFFFFFF8000062000, the level 0 page table entry is 0x8008008000062161. In has_reserved_bits() we're correctly hitting the PagingMode::PML4, that returns common = 0xFFC0000000000 , resulting in has_reserved_bits() returning ((0x8008008000062161 & 0xFFC0000000000) != 1) = true

Thanks so much for the investigation!

On TDX machine, the PHYS_ADDR_SIZE is 52 and the PML4 has reserved mask 0x0 for the level 0. As for SEV, if the reserved mask is 0xFFC0000000000 then seems the PHYS_ADDR_SIZE is not 52 but 42 which is not the actual maximal physical address size (containing the shared/private mask bit). If PHYS_ADDR_SIZE is not 52 for SEV, then probably the guest_phys_addr_size == 42, which leads the PHYS_ADDR_SIZE to be 42, the effective_phys_addr_size to be 42 and MAX_PHYS_ADDR to be 0x40000000000.

Based on this, if I hard-code PHYS_ADDR_SIZE as 42 for TDX, I can observe the same failure when translates for a shared memory (TDX has encryption bit mask set for shared memory, which is opposite with SEV). So did an update which caves out the encryption bit from the reserved bits mask for this case in the function has_reserved_bits. With this change, I didn't observe the issue anymore on TDX when hard-codes the PHYS_ADDR_SIZE as 42.

@cxdong
Copy link
Contributor Author

cxdong commented Aug 21, 2024

Hi @p4zuu,

I noticed that PR #432 has been merged. When you have a moment, could you please retest this PR to check if the boot hang issue, as mentioned in the comment, has been resolved?

Additionally, this PR has been updated to address the test-in-svsm failure on SEV. While I was able to replicate the failure manually on a TDX machine and test the fix, I haven't had the opportunity to verify it on an actual SEV machine. Could you assist by running the test-in-svsm on SEV to confirm whether the issue is indeed fixed?

Thank you for your help!

@cxdong
Copy link
Contributor Author

cxdong commented Aug 22, 2024

As you can see, the data read in MemMappingGuard_read are wrong (for the 4 u32 got in testing port): [0xe8be, 0x9e7f, 0x2b2, 0xaff6].

Yes, correct.

I guess there is something wrong maybe in address translation, or maybe the MemMappingGuard (through PerCPUPageMappingGuard::create_4k_pages() is wrong?

The data should be in the .rodata section of the svsm kernel ELF. The virtual address of the data is 0xffffff8000066aa8, which is printed in the rep_outsw function. This is the original address value passed via rsi register. The instruction decoder has decoded the same address value, so the decoding part is fine.

[SVSM] translate_addr_region: va 0xffffff8000066aa8 len 2

This decoded virtual address is translated to physical address 0x8000066000 with the start offset 0xaa8.

As my knowledge on TDX, the svsm kernel ELF is mapped starting from physical address 0x8000000000, and the virtual address starts from 0xffffff8000000000. And each ELF section is mapped contiguously. On my TDX machine, the .rodata section physical address is the virtual address minus the offset 0xffffff0000000000. So based on this, the virtual address 0xffffff8000066aa8 should be mapped to physical address 0x8000066aa8, which is the same with translation result. From this point of view, the address translation is also correct. But we can double-check this by dumping the section headers.

For the MemMappingGuard, the new virtual address is 0xffffff0040000aa8. And this virtual address can be translated to the expected physical address as below, which looks also correct.

[SVSM] translate_single_addr: va 0xffffff0040000aa8 -> Ok(
    (
        PhysAddr(
            0x8000066aa8,
        ),
        true,
        0x1000,
    ),
)

The test_port_io_string_16_get_last with striping off SEV functions can run successfully on TDX. Not sure if the failure on SEV is still related with the private/shared bit, as this part is different between SEV and TDX.

Another suspect is about the data value in the address 0xffffff8000066aa8. This can be confirmed by dump the .rodata section in the test-kernel.elf. I did this from my side by #objdump -s -j .rodata bin/test-kernel.elf. The data virtual address from my side is 0xffffff80001c794a, and the .rodata section shows the data at 0xffffff80001c794a is 0x1234, which is expected.

ffffff80001c7940 5f626974 6d61702e 72733412 7856bc9a
ffffff80001c7950 f0de6e6f 7420696d 706c656d 656e7465

So could you please help to dump the .rodata section to see if the value in address 0xffffff8000066aa8 is expected?
As well as dumping the section headers by #readelf -a bin/test-kernel.elf since it may be helpful to double-check if the address translation is correct or not?

Copy link
Member

@joergroedel joergroedel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pointed out the bug in my review. As a general note I question the need for re-mapping already mapped memory for the sake of instruction emulation in a VC/VE handler. For these use-cases the code can just access the virtual addresses directly using a pointer guard like GuestPtr.

@cxdong
Copy link
Contributor Author

cxdong commented Aug 23, 2024

I pointed out the bug in my review. As a general note I question the need for re-mapping already mapped memory for the sake of instruction emulation in a VC/VE handler. For these use-cases the code can just access the virtual addresses directly using a pointer guard like GuestPtr.

The consideration is to make this instruction decoder general for both the SVSM kernel and the guest (SEV-SNP or TDP). If the instruction is from guest, the decoded linear address is mapped in guest's page table and cannot be directly used by the svsm kernel. From another side, the guest memory region represented by the linear address may cross pages, which may not be contiguous. Based on these, translate the guest memory region to a set of physical addresses, and then re-map.

Just realized another thing, the guest memory is not statically mapped in svsm kernel's page table, right? Then seems we need to do the re-map anyway because guest memory doesn't have an accessible virtual address in the svsm kernel.

@joergroedel
Copy link
Member

The consideration is to make this instruction decoder general for both the SVSM kernel and the guest (SEV-SNP or TDP). If the instruction is from guest, the decoded linear address is mapped in guest's page table and cannot be directly used by the svsm kernel. From another side, the guest memory region represented by the linear address may cross pages, which may not be contiguous. Based on these, translate the guest memory region to a set of physical addresses, and then re-map.

I understand and agree with the general need for these abstractions. You are right that for decoding and emulating instructions from the guest they are absolutely necessary. But for emulation in VE/VC handlers there can be a much simpler abstraction which accesses the virtual addresses directly (protected by an exception guard).

Just realized another thing, the guest memory is not statically mapped in svsm kernel's page table, right? Then seems we need to do the re-map anyway because guest memory doesn't have an accessible virtual address in the svsm kernel.

Yes, guest-memory accesses always need the re-map. Note that this also means guest page-table traversals.

@cxdong
Copy link
Contributor Author

cxdong commented Aug 23, 2024

I understand and agree with the general need for these abstractions. You are right that for decoding and emulating instructions from the guest they are absolutely necessary. But for emulation in VE/VC handlers there can be a much simpler abstraction which accesses the virtual addresses directly (protected by an exception guard).

Just a quick question, will there be VE/VC caused by SVSM user mode in future? The reason I ask because for the VE/VC generated by the user mode, it looks like we still need to do the user page-table traversal. And as the user mode memory is not guaranteed to be contiguous, the user mode virtual address may cross pages, so also need a re-mapping.

As currently there is no VE/VC caused by SVSM user mode, as well as no need to decode/emulate for guest, I can remove the page-table traversal and remapping code from this PR, and leverage the GuestPtr to emulate VE/VC generated by the SVSM kernel only. Please let me know if this is a correct understanding or not.

@joergroedel
Copy link
Member

Just a quick question, will there be VE/VC caused by SVSM user mode in future? The reason I ask because for the VE/VC generated by the user mode, it looks like we still need to do the user page-table traversal. And as the user mode memory is not guaranteed to be contiguous, the user mode virtual address may cross pages, so also need a re-mapping.

User-mode will cause VE/VC exceptions, but there is no need to support IOIO/MMIO from user-space for now. If that ever changes the user-mode virtual addresses can still be accessed directly with a guard, there just need to be some extra checks (e.g. whether the decoded instruction only targets user memory and whether the exception reason matches the instruction).

As currently there is no VE/VC caused by SVSM user mode, as well as no need to decode/emulate for guest, I can remove the page-table traversal and remapping code from this PR, and leverage the GuestPtr to emulate VE/VC generated by the SVSM kernel only. Please let me know if this is a correct understanding or not.

You can leave it there for now, it will be needed for decoding guest instructions in the not too distant future. Just use another abstraction for VE/VC handlers.

@cxdong
Copy link
Contributor Author

cxdong commented Aug 23, 2024

You can leave it there for now, it will be needed for decoding guest instructions in the not too distant future. Just use another abstraction for VE/VC handlers.

Sure. Thanks for the suggestion! I will refactor for the VE/VC handlers.

@cxdong
Copy link
Contributor Author

cxdong commented Aug 26, 2024

Hi @joergroedel, this PR has been rebased and updated with adding a simpler abstraction for VC/VE handler. The PTE attribute checking for the page table traversal and the guard for memory re-mapping are left there, and the virtual address translation in PageTable are removed from this PR. The considerations are:

  1. For guest, we need guest page table traversal, but seems cannot simply reuse the PageTable to represent a guest page table.
  2. To emulate MMIO VC/VE generated by COCONUT-SVSM kernel, decoder will require translating a MMIO virtual address to the MMIO physical address, to do the MMIO emulation accordingly. But if the MMIO will be mapped in the page table like memory, probably the translating is just a virt_to_phys() for the kernel MMIO VC/VE.

So the virtual address translation code in PageTable is removed in this update. Please let me know if you have any concern. Thanks!

@joergroedel joergroedel added the wait-for-review PR needs for approval by reviewers label Aug 27, 2024
Copy link
Collaborator

@p4zuu p4zuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few new comments, otherwise this looks good to me :)

kernel/src/insn_decode/decode.rs Show resolved Hide resolved
kernel/src/insn_decode/decode.rs Outdated Show resolved Hide resolved
kernel/src/insn_decode/decode.rs Outdated Show resolved Hide resolved
kernel/src/mm/ptguards.rs Outdated Show resolved Hide resolved
kernel/src/mm/ptguards.rs Outdated Show resolved Hide resolved
Add instruction decoding logic to support for the outs instruction. The
emulation of outs is achieved by reading data from the decoded memory
location and writing it to an IO port.

The InsnMachineCtx has been expanded with new methods to facilitate
reading and writing CPU registers, accessing CPU flags, determining the
current privilege level, handling IO out operations, and mapping memory
via linear addresses. Additionally, the TestCtx has been augmented to
provide read_reg and cpl methods, ensuring compatibility with fuzz
testing.

Introduce new trait InsnMachineMem to support memory access. This trait
includes the read_integer() method, which is essential for reading data
from memory when emulating the outs instruction.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Add instruction decoding logic to support for the ins instruction. The
emulation of ins is achieved by reading data from an IO port and then
writing it to the decoded memory location.

Extend InsnMachineCtx to include a new method to manage IO in operations.
Additionally, the InsnMachineMem trait has been extended with a method
write_integer for writing integer data to memory, which is crucial for
the accurate emulation of the ins instruction.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
DecodeInsnCtx has been enhanced to support the emulation of in and out
instructions. The in instruction is emulated by fetching data from the
IO port and writing it to the EAX/AX/AL register, while the out
instruction is emulated by reading data from the EAX/AX/AL register and
sending it to the IO port. Prior to emulation, the IO permission bitmap
is consulted to ensure proper access rights.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Enhance the TestCtx to implement the required methods of InsnMachineCtx.
Also introduce TestMem structure which implements InsnMachineMem trait.
With these, extend the test to support testing decoding of ins and outs
instructions, as well as emulation for ins/outs/in/out instructions.
These new tests ensure support for byte, word and doubleword IO port
size.

The instruction fuzz is also enhanced to test the emulate interface.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
X86ExceptionContext captures the general purpose registers, flags and CS
values when exception occured, which can be used to implement the
InsnMachineCtx trait method read/write_reg, read_flags and read_cpl.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Whether an access is permitted by a translation is determined by the
access rights specified by the paging-structure entries controlling the
translation: paging-mode modifiers in CR0, CR4, and the IA32_EFER MSR;
EFLAGS.AC; the supervisor/user mode access.

Add PTWalkAttr to include all these factors and implement function
check_access_rights() to validate the access rights of a pte. A Ok
containing the translated physical address and a bool flag representing
if the translated PTEntry is leaf or not will be returned if it passes
the validation, otherwise returns Err with the corresponding page fault
error code flags.

The access rights validation will be used when traversals a page-table
to translate a virtual address.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Introduce MemMappingGuard to represent a temporary memory mapping. It
contains a PerCPUPageMappingGuard, which is extended to remap multiple
non-contigous 4k pages to a contigous virtual address space. The
MemMappingGuard provides read and write that facilitate a specific data
type access within the remapped memory using a specified offset. When a
MemMappingGuard instance is dropped, the PerCPUPageMappingGuard is also
dropped which will unmap the associated memory range.

Additionally, the InsnMachineMem trait has been implemented for
MemMappingGuard, enabling it to support memory access operations
required by the instruction decoder. This is essential for the emulation
of specific instructions.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
To handle the instruction emulation for the #VC/#VE happens in the
COCONUT-SVSM kernel, implementing the map_linear_addr method for
X86ExceptionContext to convert the linear address to the COCONUT-SVSM
kernel virtual address by using GuestPtr, which implements the
InsnMachineMem.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
NativeIOPort leverages the default implementations of outl and inl
provided by the IOPort trait, which utilize the outl/inl assembly
instructions. Conversely, SVSMIOPort provides custom implementations of
outl and inl that interact with the GHCB interfaces.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
The ioio_perm() is implemented by returning false to indicate that the
user mode is not allowed to access any IO port. The ioio_in() and
ioio_out() are implemented by calling the relevant in/out interfaces
provided by the console IO of the SVSM_PLATFORM.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
The DecodedInsnCtx support emulating IOIO instructions, including
in/out/ins/outs, which can be utilized by the SVSM #VC handler to support
handling all the IOIO instructions.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
With the #VC handler now supporting ins/outs instructions, the test case
test_port_io_string_16_get_last() can be enabled. The test case is also
extended to test rep insw by reading data from the test port and writing
to the memory.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Copy link
Member

@joergroedel joergroedel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now.

@joergroedel joergroedel merged commit 13d4a0b into coconut-svsm:main Sep 11, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wait-for-review PR needs for approval by reviewers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants