Skip to content

[lldb] Fix ObjectFileMachO object format when missing version load commands #144177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

royitaqi
Copy link
Contributor

@royitaqi royitaqi commented Jun 14, 2025

Context: See previous attempts: #142704, #143633

Problem

When ObjectFileMachO parses a Mach-O file (including yaml data in lldb unit tests) which doesn't have load commands like LC_BUILD_VERSION or LC_VERSION_MIN_*, its object format is mistakenly reported as ELF. Had the load commands existed, they would have caused ObjectFileMachO to set the correct OS name into the Triple (here and here), which would have led to a correct default MachO object format in getDefaultFormat() (here).

Alternative approaches considered

  1. Previously, [lldb] Set default object format to MachO in ObjectFileMachO #142704 tries to fix this by setting the correct object format in ObjectFileMachO::GetAllArchSpecs() for all input files. However, such wide range of explicit assignment (even when the files have the load commands) led the Triple to report "-macho" in its string form, causing change in UI and a test failure which depends on the UI.
  2. [lldb] Fix object format of some mach-o files by using vendor info in getDefaultFormat() #143633 updates getDefaultFormat(), so that it will return MachO when the OS is not set and the vendor is Apple. However, it's a change to one of the foundational llvm classes - it's not clear if such impact is worth it.
  3. Another approach is to reject such Mach-O files outright. However, there is no a Mach-O spec which states that the load commands are required, plus it is convenient to be able to leave them out when writing tests.

This patch

This patch tries to strike a good balance of the above:

  1. Set the correct object format in ObjectFileMachO::GetAllArchSpecs(), but only when the load commands do not exist. Compared to [lldb] Set default object format to MachO in ObjectFileMachO #142704, this patch minimizes the impacted cases (should be minimal).
  2. Fix any existing tests that are broken by this change (should be minimal), by adding the load commands.
  3. This patch shouldn't change anything for files/tests which already have the load commands.

Test

Ran ninja check-lldb on Linux and aarch64 macOS. Seems no tests were broken. See paste.

@royitaqi royitaqi requested a review from JDevlieghere as a code owner June 14, 2025 01:30
@llvmbot llvmbot added the lldb label Jun 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 14, 2025

@llvm/pr-subscribers-lldb

Author: None (royitaqi)

Changes

Context: See previous attempts: #142704, #143633

Problem: When ObjectFileMachO parses a Mach-O file (including yaml data in lldb unit tests) which doesn't have load commands like LC_BUILD_VERSION or LC_VERSION_MIN_*, its object format is mistakenly reported as ELF. Had the load commands existed, they would have caused ObjectFileMachO to set the correct OS name into the Triple (here and here), which would have led to a correct default MachO object format in getDefaultFormat() (here).

Alternative approaches considered:

  1. Previously, #142704 tries to fix this by setting the correct object format in ObjectFileMachO::GetAllArchSpecs() for all input files. However, such wide range of explicit assignment (even when the files have the load commands) led the Triple to report "-macho" in its string form, causing change in UI and a test failure which depends on the UI.
  2. #143633 updates getDefaultFormat(), so that it will return MachO when the OS is not set and the vendor is Apple. However, it's a change to one of the foundational llvm classes - it's not clear if such impact is worth it.
  3. Another approach is to reject such Mach-O files outright. However, there is no a Mach-O spec which states that the load commands are required, plus it is convenient to be able to leave them out when writing tests.

This patch tries to strike a good balance of the above:

  1. Set the correct object format in ObjectFileMachO::GetAllArchSpecs(), but only when the load commands do not exist. Compared to #142704, this patch minimizes the cases where the Triple string will change.
  2. Fix any existing tests that are broken by this change (should be minimal), by adding the load commands.
  3. This patch shouldn't change anything for files/tests which already have the load commands.

Full diff: https://github.com/llvm/llvm-project/pull/144177.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (+4)
  • (modified) lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp (+124)
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 3950454b7c90e..98bd3ea9dcfe9 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -5255,6 +5255,10 @@ void ObjectFileMachO::GetAllArchSpecs(const llvm::MachO::mach_header &header,
   }
 
   if (!found_any) {
+    // Explicitly set the object format to Mach-O if no LC_BUILD_VERSION or
+    // LC_VERSION_MIN_* are found. Without this, the object format will default
+    // to ELF (see `getDefaultFormat()` in `Triple.cpp`), which is wrong.
+    base_triple.setObjectFormat(llvm::Triple::MachO);
     add_triple(base_triple);
   }
 }
diff --git a/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp b/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
index 0ef2d0b85fd36..dc1bfa86b0abf 100644
--- a/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
+++ b/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
@@ -94,4 +94,128 @@ TEST_F(ObjectFileMachOTest, IndirectSymbolsInTheSharedCache) {
   for (size_t i = 0; i < 10; i++)
     OF->ParseSymtab(symtab);
 }
+
+TEST_F(ObjectFileMachOTest, ObjectFormatWithVersionLoadCommand) {
+  // A Mach-O file of arm64 CPU type and macOS 10.14.0
+  const char *yamldata = R"(
+--- !mach-o
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x0100000C
+  cpusubtype:      0x00000000
+  filetype:        0x00000001
+  ncmds:           1
+  sizeofcmds:      176
+  flags:           0x00002000
+  reserved:        0x00000000
+LoadCommands:
+  - cmd:             LC_BUILD_VERSION
+    cmdsize:         24
+    platform:        1
+    minos:           658944
+    sdk:             658944
+    ntools:          0
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         152
+    segname:         __TEXT
+    vmaddr:          0
+    vmsize:          4
+    fileoff:         208
+    filesize:        4
+    maxprot:         7
+    initprot:        7
+    nsects:          1
+    flags:           0
+    Sections:
+      - sectname:        __text
+        segname:         __TEXT
+        addr:            0x0000000000000000
+        content:         'AABBCCDD'
+        size:            4
+        offset:          208
+        align:           0
+        reloff:          0x00000000
+        nreloc:          0
+        flags:           0x80000400
+        reserved1:       0x00000000
+        reserved2:       0x00000000
+        reserved3:       0x00000000
+...
+)";
+
+  // Perform setup.
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+
+  // Verify that the object file is recognized as Mach-O.
+  ASSERT_EQ(object_file->GetArchitecture().GetTriple().getObjectFormat(),
+            llvm::Triple::MachO);
+
+  // Verify that the triple string does NOT contain "-macho".
+  ASSERT_EQ(object_file->GetArchitecture().GetTriple().str(),
+            "arm64-apple-macosx10.14.0");
+}
+
+TEST_F(ObjectFileMachOTest, ObjectFormatWithoutVersionLoadCommand) {
+  // A Mach-O file of arm64 CPU type, without load command LC_BUILD_VERSION.
+  const char *yamldata = R"(
+--- !mach-o
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x0100000C
+  cpusubtype:      0x00000000
+  filetype:        0x00000001
+  ncmds:           1
+  sizeofcmds:      152
+  flags:           0x00002000
+  reserved:        0x00000000
+LoadCommands:
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         152
+    segname:         __TEXT
+    vmaddr:          0
+    vmsize:          4
+    fileoff:         184
+    filesize:        4
+    maxprot:         7
+    initprot:        7
+    nsects:          1
+    flags:           0
+    Sections:
+      - sectname:        __text
+        segname:         __TEXT
+        addr:            0x0000000000000000
+        content:         'AABBCCDD'
+        size:            4
+        offset:          184
+        align:           0
+        reloff:          0x00000000
+        nreloc:          0
+        flags:           0x80000400
+        reserved1:       0x00000000
+        reserved2:       0x00000000
+        reserved3:       0x00000000
+...
+)";
+
+  // Perform setup.
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+
+  // Verify that the object file is recognized as Mach-O.
+  ASSERT_EQ(object_file->GetArchitecture().GetTriple().getObjectFormat(),
+            llvm::Triple::MachO);
+
+  // Verify that the triple string contains "-macho".
+  ASSERT_EQ(object_file->GetArchitecture().GetTriple().str(),
+            "arm64-apple--macho");
+}
 #endif

@royitaqi royitaqi changed the title [lldb] Set ObjectFileMachO to MachO when no version load commands are found [lldb] Set object format to MachO when no version load commands are found (3rd) Jun 14, 2025
@royitaqi royitaqi changed the title [lldb] Set object format to MachO when no version load commands are found (3rd) [lldb] Set object format to MachO when no version load commands are found Jun 14, 2025
@royitaqi royitaqi changed the title [lldb] Set object format to MachO when no version load commands are found [lldb] Set object format to MachO when no version load commands Jun 14, 2025
@royitaqi royitaqi changed the title [lldb] Set object format to MachO when no version load commands [lldb] Fix ObjectFileMachO object format when no version load commands Jun 14, 2025
@royitaqi royitaqi changed the title [lldb] Fix ObjectFileMachO object format when no version load commands [lldb] Fix ObjectFileMachO object format when missing version load commands Jun 14, 2025
@royitaqi royitaqi requested review from dmpots, clayborg, jeffreytan81 and jasonmolenda and removed request for JDevlieghere June 14, 2025 03:28
@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 14, 2025

Hi @jasonmolenda ,

What do you think about this patch?

Also, if/when you have the time, could you kindly help me by running ninja check-lldb on a x86_64 macOS machine and see if this breaks anything? I have checked that nothing is broken on Linux and aarch64 macOS.

Thanks,
Roy

Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Had one question about the strictness of a test.


// Verify that the triple string contains "-macho".
ASSERT_EQ(object_file->GetArchitecture().GetTriple().str(),
"arm64-apple--macho");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should check the triple string here. The check in the other test is good because it guards against a behavior that we do not want (having -macho appended to the triple string).

But in this case I don't think we really care how the triple is printed as long as the getObjectFormat returns the correct value. We are not depending on a specific format here so if it changes in the future this test would fail unnecessarily.

Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one early return in this function:

https://github.com/llvm/llvm-project/pull/144177/files#diff-1c2c090c5ff772b9b304dbdd4cf45803478f4f33d11d042817fea7358e0cc96bL5167

I wonder if we need to handle this case as well.

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should get the object file format correct from a triple extracted from a LLDB ObjectFile at all times. In the mach-o parser we know it is a mach-o file, so we really should always set this IMHO to be as correct as possible and possibly fix the LLDB tests that fail due to the extra mach-o or make these tests not look for a complete triple that includes the mach-o extension. How many tests are we talking about?

@JDevlieghere @jimingham thoughts?

@jasonmolenda
Copy link
Collaborator

I don't want to go with this approach, I don't want to see -macho in a triple that we're creating for any of these modules; this is going to break things down the line with firmware environments, I'm sure of it. It's also ugly.

This is an undesired behavior from llvm's Triple, because we create a Triple in ArchSpec::SetArchitecture with only a cpu specified, and llvm's Triple getDefaultFormat() is called when constructing the Triple, and assumes that an OS is defined. If no OS is specified, it sets the object file format to Triple::ELF by default. This is clearly wrong, it should be UnknownObjectFormat. If we later call Triple::setObjectFormat to specify the correct object file format, Triple adds "-macho" to the triple unconditionally.

My guess is that other parts of llvm depend on the object file format being set to Triple::ELF when there is an unspecified OS. I wouldn't be surprised to see regressions if getDefaultFormat() changed to return UnknownObjectFormat when no OS is specified, as it really should be. And then if they got "-elf" added to triple strings when they called Triple::setObjectFormat they would also not want that behavior.

I don't think any of these approaches to fixing this in ObjectFileMachO / ArchSpec are the right approach. llvm's Triple needs to have a way to set the object file format without changing the triple string, it needs to not set the object file format when no OS is specified, or we need to live with a Triple having an incorrect ObjectFormat in lldb.

@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 16, 2025

Hi @jasonmolenda,

I don't want to see -macho in a triple that we're creating for any of these modules; this is going to break things down the line with firmware environments, I'm sure of it. It's also ugly.

For context, could you elaborate where/why are we relying on the triple string in production? E.g. I don't understand the "firmware environments" part.

FWIW, I think this is me not fully understanding one of your previous comment ("you could definitely see something like this with a standalone firmware/kernel Mach-O binary").

--

llvm's Triple needs to have a way to set the object file format without changing the triple string

Is there any case at all where we want the "object format" component to be displayed in the triple string? If not, can we just remove the object format component from the triple string once and for all? (no matter whether the object format is set explicitly or get from getDefaultFormat())

Thanks,
Roy

@clayborg
Copy link
Collaborator

One idea is to add a new string method on the llvm::Triple class:

  const std::string strNoObjectFormat() const;

And this would check if the object format is in the string in llvm::Triple::Data and if so strip it and return it, else return the llvm::Triple::Data as is. Then we switch LLDB to use the llvm::Triple::strNoObjectFormat() function everywhere.

@jasonmolenda
Copy link
Collaborator

Hi @jasonmolenda,

I don't want to see -macho in a triple that we're creating for any of these modules; this is going to break things down the line with firmware environments, I'm sure of it. It's also ugly.

For context, could you elaborate where/why are we relying on the triple string in production? E.g. I don't understand the "firmware environments" part.

We've found that changes to triples have far reaching impact and fallout, over the years, and most of us who have made these changes have learned to be very wary of making them unless necessary because of the fragility in all the different environments we support. Our tests are not very comprehensive for non-user land process debugging -- kernel debugging, firmware debugging, bare board debugging.

From the SB API layer, the only thing we vend are triple strings, there is no SBArchSpec object that encompasses the separate features in the Triple.

Is there any case at all where we want the "object format" component to be displayed in the triple string? If not, can we just remove the object format component from the triple string once and for all? (no matter whether the object format is set explicitly or get from getDefaultFormat())

Personally, no, I don't see why you'd want to include the object file format in the triple string. But I don't know why that behavior was added to llvm's Triple class in the beginning so there may be a use case where it is important.

@jasonmolenda
Copy link
Collaborator

One idea is to add a new string method on the llvm::Triple class:

  const std::string strNoObjectFormat() const;

And this would check if the object format is in the string in llvm::Triple::Data and if so strip it and return it, else return the llvm::Triple::Data as is. Then we switch LLDB to use the llvm::Triple::strNoObjectFormat() function everywhere.

A Triple::setObjectFormatNoStringUpdate or whatever would be another way to approach this. Instead of stripping the object file name out of the triple string when returning it, have a way to specify/change the object format without any update to the triple string. We've been using Triples with ObjectFile set to ELF for ever and never had that show up in the triple string, I don't worry about it leaking in there unintentionally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants