Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] write references to mutable pointers in data and tls segment #6534

Merged
merged 7 commits into from
Feb 18, 2017

Conversation

rainers
Copy link
Member

@rainers rainers commented Feb 12, 2017

While preparing a new release of Visual D, I noticed that the (rebuilt) dmd version I used so far (2.066 with the precise GC from DConf 2013) generates corrupt DLLs. Instead of trying to figure the reason for this, I tried the precise GC from dlang/druntime#1603, but Visual D still suffers a lot when editing std.datetime with it: there are too many false pointers in the data segment.

With this PR I reimplemented a simple version of the additional info required to scan the data and tls segments precisely: instead of storing type information, it just lists addresses of all mutable pointers. This even makes usage of the heap-precise GC optional.

For example, the phobos unittest executable has a data segment of about 3 MB (and 11 kB TLS), but only 262 (and 282) mutable pointers. So this PR currently adds about 2kB to the data segment.

This PR only implements emission for Windows OMF and COFF, but should be portable to other platforms pretty easily.

@rainers
Copy link
Member Author

rainers commented Feb 12, 2017

Corresponding druntime PR: dlang/druntime#1763

src/backend/dt.c Outdated
@@ -411,8 +411,9 @@ dt_t *DtBuilder::xoffpatch(Symbol *s, unsigned offset, tym_t ty)

/*************************************
* Create a reference to another dt.
* Return the internal symbol used for the other dt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use proper Ddoc formatting, i.e. Returns:

src/toobj.d Outdated
@@ -220,6 +221,41 @@ void genModuleInfo(Module m)
objmod.moduleinfo(msym);
}

void write_pointers(Type type, Symbol *s, uint offset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No documentation for function.

src/traits.d Outdated
* architecture). If set the corresponding memory might contain a pointer/reference.
*
* [T.sizeof, pointerbit0-31/63, pointerbit32/64-63/128, ...]
* returns the size of the type in bytes, d_uns64.max on error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use proper Ddoc formatting, Returns:

src/traits.d Outdated
*/
extern (C++) Expression pointerBitmap(TraitsExp e)
extern (C++) d_uns64 getTypePointerBitmap(Loc loc, Type t, Array!(d_uns64)* data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document arguments

if (sz == SIZE_INVALID)
return new ErrorExp();
return d_uns64.max;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems outside the scope of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no functional change here. getTypePointerBitmap() has just been extracted from pointerBitmap() to be callable from elsewhere. Having to deal with the Expression return value in write_instance_pointers() would be wasteful.

src/traits.d Outdated
* architecture). If set the corresponding memory might contain a pointer/reference.
*
* [T.sizeof, pointerbit0-31/63, pointerbit32/64-63/128, ...]
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation on the return value.

@WalterBright
Copy link
Member

General comment: many added functions have zero documentation. Please follow Ddoc conventions for this, even in the .c files, since they'll be converted to D sooner or later.

@WalterBright
Copy link
Member

What is the effect of this on compilation speed?

@rainers
Copy link
Member Author

rainers commented Feb 13, 2017

Updated comments. I've been following the (missing) documentation style of functions nearby ;-)

@rainers
Copy link
Member Author

rainers commented Feb 13, 2017

What is the effect of this on compilation speed?

I haven't measured it yet, but I don't expect much. The phobos unittest build emits less than 600 pointer references with a data segment of about 3 MB.

I was conidering caching the result of getTypePointerBitmap() and hasPointers() in the type, though.

@rainers rainers changed the title Win32: write references to mutable pointers in data and tls segment [WIP] write references to mutable pointers in data and tls segment Feb 13, 2017

// defer writing pointer references until the symbols are written out
obj.ptrref_buf->write(&s, sizeof(s));
obj.ptrref_buf->write(&soff, sizeof(soff));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use write32() here.

@@ -3702,5 +3708,94 @@ void Obj::gotref(Symbol *s)
{
}

/*****************************************
* write a reference to a mutable pointer into the object file
* Input:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ddoc way is:

Params:
    s = symbol that contains the pointer
    soff = offset of the pointer inside the symbol's memory

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies throughout the PR. Also, symbols' should be Symbol's globally to this PR.

@WalterBright
Copy link
Member

Thanks, your added comments have improved my comprehension of this greatly. I noticed that you put the tables in readonly sections in COFF. This can be done for OMF too if they are put in code sections. I'm currently doing that with the string literal output I've been re-implementing, it seems to work well. The advantage of RO segments is that multiple users of the DLL won't need extra copies of it, and the GC should not be scanning them.

Also, for data emitted to RO sections, there should not be pointer tables generated.

@WalterBright
Copy link
Member

I've been following the (missing) documentation style of functions nearby ;-)

Unfortunately, those functions predate Ddoc and the style is not recognized by Ddoc. They will eventually get fixed, but new stuff should follow Ddoc style.

@rainers
Copy link
Member Author

rainers commented Feb 18, 2017

Thanks for review, I've updated the comments.

The advantage of RO segments is that multiple users of the DLL won't need extra copies of it

You mean multiple processes using the same physical pages? Due to relocations inside the text segment and ALSR, it's unlikely to never be modified.

and the GC should not be scanning them.

With the generated information used, they won't be scanned anyway. But indeed, it wouldn't hurt to exclude them for the generic case, too.

Also, for data emitted to RO sections, there should not be pointer tables generated.

The hope is that this data should be typed as immutable/const, so it fails the isMutable tests. Is this assumption wrong?

@rainers
Copy link
Member Author

rainers commented Feb 18, 2017

But indeed, it wouldn't hurt to exclude them for the generic case, too.

Now moved the data into the CODE segment.

@WalterBright
Copy link
Member

For readonly sections, yes, multiple process can use the same physical pages.

The hope is that this data should be typed as immutable/const, so it fails the isMutable tests. Is this assumption wrong?

Some of the RO data is not emitted via typed data. I plan to look into emitting all immutable statically inited data into RO sections. The GC should just skip those sections, as they can never contain pointers to mutable data.

Thanks for making the other fixes.

@WalterBright WalterBright merged commit 2d7b074 into dlang:master Feb 18, 2017
@rainers
Copy link
Member Author

rainers commented Feb 18, 2017

Hmm, thanks for merging, but I didn't expect that ATM. win32.mak was changed to pull in a druntime PR for better testing...

@rainers
Copy link
Member Author

rainers commented Feb 18, 2017

See #6549. The appropriate druntime PR is dlang/druntime#1763

@MartinNowak
Copy link
Member

This is a different approach than Issue 14472 – add separate ptr data section, can we discuss the pros/cons of those? I'd assume that the memory access pattern of those 262 pointers in 3MiB is a mess and not prefetchable, while simply moving all data with mutable pointers (e.g. anything that needs GC scanning) shouldn't be much bigger.
Also the double dereference has an additional overhead.

@rainers
Copy link
Member Author

rainers commented Feb 18, 2017

This is a different approach than Issue 14472 – add separate ptr data section, can we discuss the pros/cons of those?

Adding immutable data or data without pointers into a separate sections works too, but doesn't work for struct with a mixture of pointers and non-pointers. The version from DConf2013 linked in the bugzilla report saves full type information for variables with pointers (but ignored indirections inside mutable initializers, e.g. multi-dimensional arrays or class instances in the data segment). The approach in this PR is simpler, as it doesn't create more dependencies to type information. My impression is that it even needs less additional memory. I don't have a working version of the other implementation to compare, though.

I'd assume that the memory access pattern of those 262 pointers in 3MiB is a mess and not prefetchable, while simply moving all data with mutable pointers (e.g. anything that needs GC scanning) shouldn't be much bigger.

The current GC sorts ranges, so I expect the accesses not to be worse than scanning the full 3MB. One possible example of a bad data structure is a static array of strings, where you have a series of length and pointer pairs.

Also the double dereference has an additional overhead.

You mean the GC range pointing to the pointer? I think that's pretty unavoidable.

@rainers rainers mentioned this pull request Feb 18, 2017
@MartinNowak
Copy link
Member

It's a good start in any case.

but doesn't work for struct with a mixture of pointers and non-pointers

Sure, but that could be enhanced by precise scan information.

Can we at least get the number, what size would it be if we moved all data containing a mutable pointer into a separate section.

You mean the GC range pointing to the pointer? I think that's pretty unavoidable.

I mean an array of pointers each pointing to the .ptr of an array. In particular the "static array of strings" is an example where conservatively (or precisely) scanning that array should be faster than iterating a separate array pointing at every 2nd element.

@rainers
Copy link
Member Author

rainers commented Feb 18, 2017

Can we at least get the number, what size would it be if we moved all data containing a mutable pointer into a separate section.

I've added some output to dmd to print some statistics, this yields (modulo duplicates due to COMDATs in separate object files) for the phobos unittest:

  • 70 data items with pointers only
  • 357 data items with a mixture of pointers and non-pointers
  • these contain 456 pointers in 995 words.

Looks very much like a lot of slices in global/tls memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants