Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

core/exception: use less TLS storage for staticError #3802

Closed
wants to merge 1 commit into from

Conversation

ljmf00
Copy link
Member

@ljmf00 ljmf00 commented Apr 17, 2022

Since classInstanceSize trait is available, we can now know, at compile-time,
the size needed for the TLS storage used in staticError template. This should
reduce the allocated size in TLS significantly. On 64-bit it results on about
221 bytes and in 32-bit about 135 bytes.

Signed-off-by: Luís Ferreira contact@lsferreira.net

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @ljmf00!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + druntime#3802"

@ljmf00 ljmf00 added the Refactoring No semantic changes label Apr 17, 2022
@ljmf00 ljmf00 marked this pull request as draft April 17, 2022 22:24
@ljmf00 ljmf00 force-pushed the use-less-tls-storage branch 6 times, most recently from f1ff6c2 to 874dd53 Compare April 18, 2022 00:07
@ljmf00 ljmf00 marked this pull request as ready for review April 18, 2022 00:07
@ljmf00
Copy link
Member Author

ljmf00 commented Apr 18, 2022

Weird, this is side-effecting on src/core/lifetime.d(1187) unittest only on macOS. Perhaps the alignment of the TLS there is not correct? I also don't understand the current alignment applied to _store. Shouldn't it be the maximum alignment of void* and .tupleof of the Error derivatives?

I'm pinging since you made this (681e3e7) change @kinke .

@kinke
Copy link
Contributor

kinke commented Apr 21, 2022

Shouldn't it be the maximum alignment of void* and .tupleof of the Error derivatives?

It should be the maximum of all class instance alignments, for which there's no trait yet (classInstanceAlignment; the TypeInfo would provide it IIRC). The double-machine-word-alignment is an approximation.

src/core/exception.d Outdated Show resolved Hide resolved
@ljmf00 ljmf00 force-pushed the use-less-tls-storage branch from 874dd53 to d9bc37f Compare April 28, 2022 16:53
@ljmf00
Copy link
Member Author

ljmf00 commented Apr 28, 2022

Shouldn't it be the maximum alignment of void* and .tupleof of the Error derivatives?

It should be the maximum of all class instance alignments, for which there's no trait yet (classInstanceAlignment; the TypeInfo would provide it IIRC). The double-machine-word-alignment is an approximation.

That trait is available on phobos. I'm going try to add the correct alignment.

@kinke
Copy link
Contributor

kinke commented Apr 28, 2022

That trait is available on phobos.

I very much doubt it's correct. align(64) class C {}, that's not even respected by the GC…

I'm going try to add the correct alignment.

A unittest would IMO perfectly suffice and be simpler (just using the TypeInfos).

@ljmf00 ljmf00 force-pushed the use-less-tls-storage branch from d9bc37f to 1ed5018 Compare April 28, 2022 17:38
@ljmf00 ljmf00 marked this pull request as draft April 28, 2022 17:39
@ljmf00
Copy link
Member Author

ljmf00 commented Apr 28, 2022

That trait is available on phobos.

I very much doubt it's correct. align(64) class C {}, that's not even respected by the GC…

Right, that is considering that the align is the default. In this case, it is. For the rest of the cases, I think we can't do it right, with a library implementation.

A unittest would IMO perfectly suffice and be simpler (just using the TypeInfos).

I don't get it. The problem here is that I can't use TypeInfo in compile-time. Or do you mean at the emplace? I just discovered that TypeInfo_Class doesn't give the correct alignment anyway.

@ljmf00 ljmf00 marked this pull request as ready for review April 28, 2022 20:16
@kinke
Copy link
Contributor

kinke commented Apr 28, 2022

I just discovered that TypeInfo_Class doesn't give the correct alignment anyway.

Ouch, okay. [A runtime unittest would otherwise have been perfectly fine IMO, just verifying that no class has a greater alignment.]

@ljmf00
Copy link
Member Author

ljmf00 commented Apr 28, 2022

I just discovered that TypeInfo_Class doesn't give the correct alignment anyway.

Ouch, okay. [A runtime unittest would otherwise have been perfectly fine IMO, just verifying that no class has a greater alignment.]

Ah for sanity checking, I understand. Well, I'll put https://issues.dlang.org/show_bug.cgi?id=16508 on my TODO list. I will also fill a bug report about the TypeInfo_Class.talign.

Apart from it, since align(value) is not being used by any class extending Error, it should be fine. I can leave a note about this on the code to fix later. I made something similar to std.typecons.scoped payload.

@kinke
Copy link
Contributor

kinke commented Apr 29, 2022

I will also fill a bug report about the TypeInfo_Class.talign.

That's surely intended this way, just like C.alignof is machine-word-size for the class reference. I just thought the instance alignment would be exposed by another TypeInfo_Class member, but was mistaken. Anyway, we'll probably need a __traits(classInstanceAlignment) at some point.

// TLS storage shared for all errors, chaining might create circular reference
private align(2 * size_t.sizeof) void[256] _store;
private void[alignedStoreSize()] _store;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing proper alignment by increasing the allocation size? This would be a regression for LDC at least.

The problem surely isn't the approximation - for x86_64, it's 16, what the GC guarantees. So if you hit a unittest regression somewhere else after optimizing the TLS size here with DMD, that's probably a sign that another allocation hasn't been properly aligned and just happened to work because of the old round 256 size (or a DMD backend bug).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[And AFAICT, you're still using _store.ptr directly, no manual alignment.]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing proper alignment by increasing the allocation size? This would be a regression for LDC at least.

The problem surely isn't the approximation - for x86_64, it's 16, what the GC guarantees. So if you hit a unittest regression somewhere else after optimizing the TLS size here with DMD, that's probably a sign that another allocation hasn't been properly aligned and just happened to work because of the old round 256 size (or a DMD backend bug).

This is pretty much the same as this: https://github.com/ldc-developers/phobos/blob/ldc/std/typecons.d#L8165 . This is doing the alignment by hand.

I don't see the problem with this, as is now. And this is trusted to do, unless a custom alignment is done on one of those classes. And to solve it, we need the trait you mentioned above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, that line is 10 years old. Back then, align(N) might have been buggy. There shouldn't be any reason for manual alignments anymore if the alignment is known at compile-time, as in these cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought void[] had some special case. I'll change it then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I forgot to change on the emplace side 🤦 . I think I found the side effect. The weirdest part is that it was only failing on macOS, so we probably got lucky all this time with some other introduced allocation on TLS in the other platforms.

I think now things align in my brain too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. It uses the class ref alignment, but 8 should still be fine for that little class; so it might be a DMD backend bug after all.

Yup, probably, given that A only have 8-byte aligned memory. I need to dig a bit more to spot the issue, but probably someone who has macOS would be more helpful to reproduce this in a sandbox with a minimal example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to dig a bit more to spot the issue, but probably someone who has macOS would be more helpful to reproduce this in a sandbox with a minimal example.

I've just tried to cross-compile with DMD, and then using llvm-objdump to inspect the Mach-O object file, for a trivial:

module mod;

int a;
align(64) int b;
$ dmd -betterC -c testalign.d -target=x86_64-apple-darwin
$ llvm-objdump --section-headers testalign.o

testalign.o:	file format Mach-O 64-bit x86-64

Sections:
Idx Name          Size     VMA              Type
  0 __text        00000000 0000000000000000 TEXT
  1 __data        00000000 0000000000000000 DATA
  2 __const       00000000 0000000000000000 DATA
  3 __bss         00000000 0000000000000030 BSS
  4 __const       00000000 0000000000000000 DATA
  5 __thread_bss  00000008 0000000000000030 DATA
  6 __thread_vars 00000030 0000000000000000 DATA

With LDC however:

  1 __thread_bss  00000044 0000000000000040 DATA

So with DMD, the __thread_bss section is only 8 bytes large, and the VMA only 16-bytes aligned. With LDC, the size is 68 bytes, and the VMA is 64-bytes aligned.

With DMD for Linux x64, same 68 bytes size:

 14 .tbss           00000044 0000000000000000 BSS

Copy link
Member Author

@ljmf00 ljmf00 May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So aligned TLS on macOS is clearly not working 🙃 . Maybe alignment in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ljmf00
Copy link
Member Author

ljmf00 commented Apr 29, 2022

I will also fill a bug report about the TypeInfo_Class.talign.

That's surely intended this way, just like C.alignof is machine-word-size for the class reference. I just thought the instance alignment would be exposed by another TypeInfo_Class member, but was mistaken. Anyway, we'll probably need a __traits(classInstanceAlignment) at some point.

The same way TypeInfo_Class.initializer() is the class instance initializer, I though, but by that logic, .talign would be the alignment of the instance, not the reference.

@ljmf00 ljmf00 force-pushed the use-less-tls-storage branch from 1ed5018 to 9e5cd56 Compare April 29, 2022 00:33
Since classInstanceSize trait is available, we can now know, at compile-time,
the size needed for the TLS storage used in staticError template. This should
reduce the allocated size in TLS significantly. On 64-bit it results on about
221 bytes and in 32-bit about 135 bytes.

This patch also fixes interdependent missalignment on core.lifetime
unittests.

Signed-off-by: Luís Ferreira <contact@lsferreira.net>
@ljmf00 ljmf00 force-pushed the use-less-tls-storage branch from 9e5cd56 to da29f9e Compare April 29, 2022 01:34
@dlang-bot dlang-bot added Needs Rebase needs a `git rebase` performed Needs Work labels May 14, 2022
@Geod24
Copy link
Member

Geod24 commented Jul 9, 2022

Druntime have been merged into DMD. Please re-submit your PR to dlang/dmd repository.

@Geod24 Geod24 closed this Jul 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Needs Rebase needs a `git rebase` performed Needs Work Refactoring No semantic changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants