Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compare class name signatures #16032

Merged
merged 1 commit into from
Jan 16, 2024
Merged

Conversation

WalterBright
Copy link
Member

This speeds up class name matches by comparing name md5 signatures rather than doing a string compare, as names can get really long. It matches names with only two compares. It also matches names across shared libraries and DLLs.

Since it also changes the layout of ClassInfo, you'll need to update library builds.

(md5 is also used to merge identical strings across compilation units.)

@WalterBright WalterBright added Review:Easy Review Review:Industry Applies to PRs pertaining to industry applications of D labels Jan 14, 2024
@dlang-bot
Copy link
Contributor

Thanks for your pull request, @WalterBright!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#16032"

Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a changlog entry.

also cc @ibuclaw @kinke

compiler/src/dmd/target.d Show resolved Hide resolved
compiler/src/dmd/target.d Show resolved Hide resolved
@WalterBright
Copy link
Member Author

added changelog entry

Comment on lines +1668 to 1671
ulong[2] nameSig; /// unique signature for `name`

immutable(void)* m_RTInfo; // data for precise GC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New fields should only ever be added at the end for strict abi compatibility. Might be a good idea to add a new class flag too hasNameSig = 0x200.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole reason for nameSig is performance. Having to test a flag for it won't kill the performance, but it will degrade it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole reason for nameSig is performance. Having to test a flag for it won't kill the performance, but it will degrade it.

It'll still be better than string comparison, but it at least gives a migration/fallback in case someone forgot to recompile all their code. Shouldn't need to keep it in for too many releases.

Copy link
Member

@ibuclaw ibuclaw Jan 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I won't be the one explaining to someone why their project crashed on the NG. You've already explained the risk in the changelog, I'll leave it to you whether we also try avoid any known breakage in the runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the last field moving, it's only supposed to be accessed through the member function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that function do something sneaky such that it won't link if linked with the wrong ABI? Could one inject a symbol reference after the return statement or something devious like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way I think the risk of this is very low, (say) dub will rebuild per compiler version

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no idea that last field was variadic - I was under the impression that it was just 0x0 or 0x1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked this, and it's a pointer to a symbol, not variadic. So there's no blocking reason to have it at the end.

@kinke
Copy link
Contributor

kinke commented Jan 14, 2024

Are we confident we don't need to worry about hash collisions for different names?

// take care of potential duplicates across binaries
return a.name == b.name;
// same class if signatures match, works with potential duplicates across binaries
return a.m_flags & TypeInfo_Class.ClassFlags.hasNameSig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Would be cleaner to do the old way in an if statement that returns early.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new way requires two comparisons max. The old way requires one and then a string comparison. I want to get rid of the old way, but @ibuclaw suggested keeping it for a release or two.

The reason for the change is deadalnix has identified it as serious performance bottleneck.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gathered (I've discussed this amaury in person), it just reads nicer to do not do the whole shebang in the same expression (this is my main gripe with a lot of DMD, too many subexpressions your majesty!)

@maxhaton
Copy link
Member

FWIW I haven't seen many if any people knowingly relying on ABI compatibility in the wild. I certainly haven't ever deliberately at least.

pointers with a fallback to doing a string compare on the name, which can
be rather slow.

The result is both druntime and phobos will need to be recompiled to be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably misleading if a user sees this in the changelog of a release

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how it's misleading. What do you suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that someone may see this as a recommendation that they have to do something to druntime or similar, which won't be true because they're using said release.

So something more explicit about where any breakage would actually come about would better, that and that the size has changed.

@maxhaton
Copy link
Member

Are we confident we don't need to worry about hash collisions for different names?

Probabilistically I think we're fine — there are better hashes if one cares, but the shortest md5 collisions I'm aware of are pretty big

@WalterBright
Copy link
Member Author

Are we confident we don't need to worry about hash collisions for different names?

https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions

@WalterBright
Copy link
Member Author

BTW, dmd uses md5 to merge identical string literals at link time, and has for many years.

@maxhaton
Copy link
Member

BTW, dmd uses md5 to merge identical string literals at link time, and has for many years.

Yes, I find it very unlikely we'd see one unless intentional

@maxhaton
Copy link
Member

LGTM other than minor style point. Confirmation from LDC/GDC then merge, one presumes.

@WalterBright WalterBright merged commit d91e035 into dlang:master Jan 16, 2024
45 of 46 checks passed
@WalterBright WalterBright deleted the nameSig branch January 16, 2024 06:27
(a.m_flags & TypeInfo_Class.ClassFlags.hasNameSig
? (a.nameSig[0] == b.nameSig[0] &&
a.nameSig[1] == b.nameSig[1]) // new fast way
: (a is b || a.name == b.name)); // old slow way for temporary binary compatibility
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a is b will always be false.

@benjones
Copy link
Contributor

Is it worth switching to BLAKE3 (no known vulnerabilities, faster)? https://github.com/BLAKE3-team/BLAKE3

@ibuclaw
Copy link
Member

ibuclaw commented Jan 20, 2024

@WalterBright did you forget to set hasNameSig from the compiler?

@ibuclaw
Copy link
Member

ibuclaw commented Jan 20, 2024

#16061.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Review:Easy Review Review:Industry Applies to PRs pertaining to industry applications of D
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants