BadImageFormatException when loading valid assembly #63639

Closed
nickbclifford opened this issue Jan 11, 2022 · 14 comments
Labels
area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner

Comments

@nickbclifford

Description

I am currently writing an experimental .NET compiler framework, and I've reached the point where I can write full ECMA-335-compliant DLLs to disk. However, when attempting to run my output with dotnet, I receive the following error:

$ dotnet test.dll

Unhandled exception. System.BadImageFormatException: Could not load file or assembly '/home/nick/Desktop/dotnetdll/test.dll'. An attempt was made to load a program with an incorrect format.

File name: '/home/nick/Desktop/dotnetdll/test.dll'
fish: Job 1, 'dotnet test.dll' terminated by signal SIGABRT (Abort)

Frustratingly, this provides no meaningful error message as to what the incorrect format actually is. Enabling COREHOST_TRACE adds no other useful information.

I've run this DLL through ildasm, dotnet-ilverify, and even mono, all of which have worked without issues. Any advice would be greatly appreciated!

Reproduction Steps

  1. Ensure Rust and Cargo are both installed.
  2. Clone dotnetdll and enter the directory.
  3. Run cargo test write_all to generate the test.dll file.
  4. Add a test.runtimeconfig.json. I used the following, based on the official template:
{
  "runtimeOptions": {
    "tfm": "net6.0",
    "framework": {
      "name": "Microsoft.NETCore.App",
      "version": "6.0.0"
    },
    "configProperties": {
      "System.GC.Concurrent": false,
      "System.Threading.ThreadPool.MinThreads": 4,
      "System.Threading.ThreadPool.MaxThreads": 25
    }
  }
}
  5. Run dotnet test.dll.

Expected behavior

The assembly should run without issues and print Hello, world! to standard output.

Actual behavior

The assembly fails to load with the exception shown above, and dotnet exits with error code 134.
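
For readers unfamiliar with the exit code: shells commonly report a process killed by signal N as status 128 + N, and SIGABRT is signal 6 on Linux, which gives 134. The helper below is a minimal sketch of that convention; the function name is made up for illustration and is not part of any .NET tooling.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell exit status using the common 128 + N signal convention."""
    if status > 128:
        try:
            # Map the signal number back to its symbolic name.
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exited with status {status}"


print(describe_exit_status(134))
```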

Regression?

No response

Known Workarounds

No response

Configuration

Output from dotnet --info:

.NET SDK (reflecting any global.json):
 Version:   6.0.100
 Commit:    9e8b04bbff

Runtime Environment:
 OS Name:     arch
 OS Version:  
 OS Platform: Linux
 RID:         arch-x64
 Base Path:   /usr/share/dotnet/sdk/6.0.100/

Host (useful for support):
  Version: 6.0.0
  Commit:  4822e3c3aa

.NET SDKs installed:
  6.0.100 [/usr/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.NETCore.App 6.0.0 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET runtimes or SDKs:
  https://aka.ms/dotnet-download

Other information

The Rust function generating the file is available here.

The output from ildasm:

//  Microsoft (R) .NET IL Disassembler.  Version 7.0.0-dev



// Metadata version: Standard CLI 2005
.assembly extern System.Console
{
  .publickeytoken = (B0 3F 5F 7F 11 D5 0A 3A )                         // .?_....:
  .ver 6:0:0:0
}
.assembly extern System.Runtime
{
  .publickeytoken = (B0 3F 5F 7F 11 D5 0A 3A )                         // .?_....:
  .ver 6:0:0:0
}
.assembly test
{
  .hash algorithm 0x00008004
  .ver 1:0:0:0
}
.module test.dll
// MVID: {cd02ca7d-d1ba-454e-bf5f-1b7df193ce36}
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003       // WINDOWS_CUI
.corflags 0x00000001    //  ILONLY
// Image base: 0x00007F4A7CAD9000


// =============== CLASS MEMBERS DECLARATION ===================

.class public auto ansi beforefieldinit Foo
       extends [System.Runtime]System.Object
{
  .method public hidebysig specialname rtspecialname 
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [System.Runtime]System.Object::.ctor()
    IL_0006:  ret
  } // end of method Foo::.ctor

  .method public hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       11 (0xb)
    .maxstack  8
    IL_0000:  ldstr      "Hello, world!"
    IL_0005:  call       void [System.Console]System.Console::WriteLine(string)
    IL_000a:  ret
  } // end of method Foo::Main

} // end of class Foo


// =============================================================

// *********** DISASSEMBLY COMPLETE ***********************
@ghost

ghost commented Jan 11, 2022

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

@agocke
Member

agocke commented Jan 11, 2022

Have you done a byte-for-byte comparison with what csc produces here? There may be an encoding error that wouldn't show up in the verification tools or ildasm
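
A byte-for-byte comparison like the one suggested here can be sketched in a few lines. This is only an illustration: the function names and the two file names in the usage note are hypothetical, and two independently written PE files will differ in many harmless places (timestamps, MVID, layout), so the output is a starting point rather than a verdict.

```python
from itertools import zip_longest
from pathlib import Path


def first_differences(a: bytes, b: bytes, limit: int = 16):
    """Return up to `limit` (offset, byte_a, byte_b) tuples where the inputs differ.

    A missing byte (one input is shorter than the other) is reported as None.
    """
    diffs = []
    for offset, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            diffs.append((offset, x, y))
            if len(diffs) >= limit:
                break
    return diffs


def compare_files(path_a: str, path_b: str, limit: int = 16):
    """Convenience wrapper that reads both files fully into memory."""
    return first_differences(Path(path_a).read_bytes(), Path(path_b).read_bytes(), limit)
```

Usage would look like `compare_files("test.dll", "csc_output.dll")`, where the second file is a hypothetical csc build of the same program.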

@nickbclifford
Author

Well, looking at the two side-by-side, there are definitely differences. However, the PE headers appear sane, and while I can't immediately spot any encoding mistakes by eye, the intent is not to mirror exactly what csc/dotnet build produces, but to create a standards-compliant object that can still be run.

Ideally, I would like to step into dotnet with a debugger to find where the BadImageFormatException is thrown and figure it out myself, but I haven't exactly figured that part out yet.

@agocke
Member

agocke commented Jan 12, 2022

Yeah, debugging with coreclr is the next best option, I think. You should build with the debug configuration, and then you should be able to open the corerun exe in Visual Studio (choose open as file and select it in the explorer; it will load it as a project). From there you can "step into" it and single-step through the load process.

@MichalStrehovsky
Member

I can quickly run it under corerun if you can attach the DLL to the issue. The runtime will throw a C++ exception so the key is to have the runtime under the debugger in a way that you can break in when the exception is thrown. I know where the checkbox is in VS, but I also see you're running a Unix-y system.

If you want to try yourself, corerun instructions are here: https://github.com/dotnet/runtime/blob/f3c705ef291ff89b53220a31d8321355471d1937/docs/workflow/testing/using-corerun.md. Corerun doesn't need runtimeconfig.json; it picks up assemblies from paths described in the linked doc.

@nickbclifford
Author

Sure, I've uploaded the DLL here. In the meantime, I'll try my best to get corerun debugging properly.

@MichalStrehovsky
Member

There are two problems with this:

  • The section that contains imports (.idata in your DLL) cannot be writable.
  • The relocation for the bootstrapping thunk is expected to be IMAGE_REL_BASED_DIR64 on x64 (it's IMAGE_REL_BASED_HIGHLOW in your case, which is expected on 32-bit platforms).

You can fix this by just not generating the import table, relocs, or the jump thunk. The thunk to jump to mscoree.dll!_CorExeMain is not mandatory, hasn't been used to bootstrap the runtime since ~Windows XP, and never bootstrapped .NET Core.
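
The two checks described above can be approximated outside the runtime. The sketch below uses the standard PE constants (IMAGE_SCN_MEM_WRITE, IMAGE_REL_BASED_HIGHLOW, IMAGE_REL_BASED_DIR64) and the standard base relocation entry layout (high 4 bits are the type, low 12 bits the offset). The function names and the `is_64bit` parameter are hypothetical: the thread does not say exactly how the runtime decides which relocation type to expect, so that decision is left to the caller.

```python
IMAGE_SCN_MEM_WRITE = 0x80000000   # section characteristics flag: writable
IMAGE_REL_BASED_HIGHLOW = 3        # 32-bit base relocation
IMAGE_REL_BASED_DIR64 = 10         # 64-bit base relocation


def section_is_writable(characteristics: int) -> bool:
    """True if a section header's Characteristics field has the write flag set."""
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def decode_reloc_entry(entry: int):
    """Split a 16-bit base relocation entry into (type, offset)."""
    return entry >> 12, entry & 0x0FFF


def thunk_reloc_ok(entry: int, is_64bit: bool) -> bool:
    """Check the relocation type against what the comment above says is expected.

    `is_64bit` is an assumption supplied by the caller, not derived from the image.
    """
    expected = IMAGE_REL_BASED_DIR64 if is_64bit else IMAGE_REL_BASED_HIGHLOW
    reloc_type, _offset = decode_reloc_entry(entry)
    return reloc_type == expected
```

For example, an import section with characteristics 0xC0000040 (initialized data, readable, writable) would fail the first check, and a relocation entry of 0x3008 (HIGHLOW at offset 8) would fail the second when `is_64bit` is true.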

@nickbclifford
Author

Interesting - I was just going off of standards text, so good to know those are not strictly required. However, even after removing those, I get the same BadImageFormatException from dotnet.

@MichalStrehovsky
Member

Yeah, we should remove it from the standard, it's just unused junk at this point. ILAsm has a /NOCORSTUB switch for this.

Can you send the DLL with the fix? I skipped over those two checks with a debugger and the library loaded, but I didn't test anything else.

@nickbclifford
Author

Sure, here's the fixed version. Confirmed in objdump to have no .idata, .reloc, or thunks.

@MichalStrehovsky
Member

It works on Windows but not on Linux. On Windows, PE files are loaded by the OS. On Linux, CoreCLR comes with its own PE loader. Hard to say what it's unhappy about. These loaders are very hardened against untrusted inputs (they were used in Silverlight), so they check things very thoroughly to avoid something surprising getting through.

I'm not set up to debug on Linux right now, so I'll have to leave it to you. Generally, you would build the repo, run this under corerun, under a debugger, and in the debugger (I assume gdb), you would do catch throw. It should get you fairly close to where the problem is, unless we were propagating HRESULTs and are converting them to an exception. If it's the HRESULT case, PEDecoder::FindCorHeader() might be a good spot to set up a breakpoint and trace through it. The runtime is going to load CoreLib first, so you can ignore the first hit.

@nickbclifford
Author

Good news - I figured it out!

At first I couldn't get corerun to work properly, but it turns out I was running into #62398 and my system version of LTTng wasn't up-to-date, so I upgraded that manually and finally managed to get it debugging.

Once I started investigating, it turned out that even though I did remove the import table/thunks, at first I still had the optional PE header entry point RVA set to the thunk address...which was no longer there, of course. In the process of debugging after figuring that out, I had re-enabled the import table, reloc, and thunk generation, and as soon as I added your fixes from #63639 (comment), it worked!
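
The stale entry point described above is the kind of mistake a small sanity check can catch. The sketch below reads AddressOfEntryPoint from the PE optional header and reports whether it points outside every section. It relies only on the documented PE layout (e_lfanew at 0x3C, a 20-byte COFF header, 40-byte section headers). All function names are hypothetical, `build_stub_image` produces a synthetic buffer purely for demonstration, and treating an RVA of zero as "no native entry point" is an assumption, not something the thread confirms about the runtime's own checks.

```python
import struct


def _pe_offset(pe: bytes) -> int:
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return e_lfanew


def entry_point_rva(pe: bytes) -> int:
    """AddressOfEntryPoint sits 16 bytes into the optional header."""
    optional_header = _pe_offset(pe) + 4 + 20  # signature + COFF header
    (rva,) = struct.unpack_from("<I", pe, optional_header + 16)
    return rva


def sections(pe: bytes):
    """Yield (name, virtual_address, virtual_size) for each section header."""
    base = _pe_offset(pe)
    (count,) = struct.unpack_from("<H", pe, base + 6)      # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", pe, base + 20)  # SizeOfOptionalHeader
    table = base + 24 + opt_size
    for i in range(count):
        off = table + 40 * i
        name = pe[off:off + 8].rstrip(b"\0").decode("ascii", "replace")
        vsize, vaddr = struct.unpack_from("<II", pe, off + 8)
        yield name, vaddr, vsize


def entry_point_is_dangling(pe: bytes) -> bool:
    """True if the entry point RVA is nonzero but lies in no section."""
    rva = entry_point_rva(pe)
    if rva == 0:  # assumption: zero means "no native entry point"
        return False
    return not any(va <= rva < va + size for _, va, size in sections(pe))


def build_stub_image(entry_rva: int) -> bytes:
    """Synthetic header-only image with one .text section at RVA 0x2000 (demo only)."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", buf, 0x80 + 6, 1)        # NumberOfSections
    struct.pack_into("<H", buf, 0x80 + 20, 0xE0)    # SizeOfOptionalHeader
    struct.pack_into("<I", buf, 0x80 + 24 + 16, entry_rva)
    sec = 0x80 + 24 + 0xE0
    buf[sec:sec + 8] = b".text\0\0\0"
    struct.pack_into("<II", buf, sec + 8, 0x100, 0x2000)  # VirtualSize, VirtualAddress
    return bytes(buf)
```

Running `entry_point_is_dangling` on the bytes of a real output file (for example, read with `open(path, "rb").read()`) would flag an entry point left over from a removed thunk, as long as the section that held the thunk is gone too.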

Thank you so much! I imagine you don't get people building their own assemblies from scratch very often, so I really appreciate your patience and willingness to help.

@MichalStrehovsky
Member

I'm glad you figured it out! Always happy to see people building new things on top of .NET!

@ghost ghost locked as resolved and limited conversation to collaborators Feb 12, 2022