Increase MAX_SYMBOL_NAME_LENGTH #6534

Closed

leechristensen

Doubled the size of MAX_SYMBOL_NAME_LENGTH as I commonly encounter Windows C++ binaries with mangled names exceeding the limit, resulting in the "Symbol name exceeds maximum length" exception. I doubled the value for now, but would be open to doubling it again if we don't think it would have an adverse effect.

@ryanmkurtz ryanmkurtz added the Status: Triage Information is being gathered label May 17, 2024
@ghidra1
Collaborator

ghidra1 commented May 17, 2024

The adverse effect comes from the way records are stored within the Ghidra database. If a record exceeds ~1/4 of our database buffer size (4068 bytes), it triggers the use of ChainedBuffer storage for each such large record, which consumes considerably more space. The minimum ChainedBuffer size is 16 KBytes, which is a lot of storage for one symbol. The current limit is 2000 bytes, which could probably be increased.

Could you please indicate the consequences you experience when such large names are truncated?
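
The storage trade-off described above can be made concrete with a rough sketch. It only reuses the two figures quoted in the comment (the 4068-byte threshold and the 16-KByte minimum ChainedBuffer size); the class, the method names, and the `otherFieldBytes` parameter are hypothetical and are not taken from Ghidra's source, and the real record layout is not modeled.

```java
import java.nio.charset.StandardCharsets;

// Rough estimate only: hypothetical helper, not part of Ghidra.
final class SymbolRecordSizeEstimate {
    // Figures quoted in the comment above, not read from Ghidra's code.
    static final int CHAINED_THRESHOLD_BYTES = 4068;
    static final int MIN_CHAINED_BUFFER_BYTES = 16 * 1024;

    // Size of the name when encoded as UTF-8 (assumed encoding).
    static int nameBytes(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the name plus an assumed amount of other record data
    // would cross the quoted threshold.
    static boolean wouldUseChainedBuffer(String name, int otherFieldBytes) {
        return nameBytes(name) + otherFieldBytes > CHAINED_THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        // Build a name as long as the one in the reported exception.
        String name = new String(new char[2480]).replace('\0', 'x');
        System.out.println("name bytes: " + nameBytes(name));
        System.out.println("chained: " + wouldUseChainedBuffer(name, 0));
    }
}
```

Under these assumptions, doubling the limit to 4000 bytes would stay below the quoted threshold only if the remaining record fields are small, while doubling it again would cross it for the longest names.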

@leechristensen
Author

> Could you please indicate the consequences you experience when such large names are truncated.

Yes, the symbols do not get created automatically in Ghidra, so functions are not auto-labeled and types are incomplete. As an example, loading edgehtml.dll into Ghidra along with its PDB results in these exceptions.txt. The first exception there states:

Unable to create symbol at 180766288 due to exception: ghidra.util.exception.InvalidInputException: Symbol name exceeds maximum length of 2000, length=2480; symbolPathName: ??$_Insert_at@AEAU?$pair@QEAVCElement

Navigating to said location in Ghidra shows an undefined function:
image

Ida for comparison:
image

@ghidra1
Collaborator

ghidra1 commented May 20, 2024

I found this statement online: Only the first 2048 characters of Microsoft C++ identifiers are significant.

At a minimum we should extend our limit to exceed this value and possibly truncate the name to preserve the first N chars and append "..." to indicate that a truncation has occurred. Such a truncated name will not demangle but should provide some information that is an improvement over a default label name.
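
The truncation idea above could look something like the following minimal sketch. The class and method names are hypothetical and do not come from Ghidra; the only behavior taken from the comment is "keep the first N chars and append `...`". The surrogate check is an added assumption so that a name containing non-BMP characters is not cut mid-character.

```java
// Hypothetical helper illustrating the proposed truncation; not Ghidra code.
final class SymbolNameTruncator {
    static final String ELLIPSIS = "...";

    // Returns name unchanged if it fits, otherwise the leading characters
    // followed by "..." so that the result is at most maxLength chars long.
    static String truncate(String name, int maxLength) {
        if (maxLength <= ELLIPSIS.length()) {
            throw new IllegalArgumentException("maxLength too small: " + maxLength);
        }
        if (name.length() <= maxLength) {
            return name;
        }
        int keep = maxLength - ELLIPSIS.length();
        // Do not split a surrogate pair at the cut point.
        if (Character.isHighSurrogate(name.charAt(keep - 1))) {
            keep--;
        }
        return name.substring(0, keep) + ELLIPSIS;
    }

    public static void main(String[] args) {
        // A name as long as the one in the reported exception (2480 chars).
        String mangled = new String(new char[2480]).replace('\0', 'x');
        String stored = truncate(mangled, 2000);
        System.out.println(stored.length());
    }
}
```

As the comment notes, a name stored this way would no longer demangle, so any demangling would have to run on the original string before it is shortened.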

It may be possible for PDB and DWARF to trigger demangling based on the original string rather than relying on the truncated symbol name. It is less certain how the importers could handle this, since they are not wired into analysis. Additional investigation is required.

@ghidra1 ghidra1 added Reason: Internal effort This will be solved internally Feature: Symbol Table and removed Status: Triage Information is being gathered labels May 20, 2024
@ghidra1
Collaborator

ghidra1 commented May 20, 2024

The PR will not be accepted, although we have an internal ticket to address this.

@ryanmkurtz ryanmkurtz closed this May 21, 2024
@leechristensen
Author

Completely understandable given the current limitations. As an additional test case, I recently opened Mso98win32client.dll from the Microsoft Office suite and hit 107 of the name length exceptions.

Thanks for the replies and all the context!
