Use memory-mapped files for updating and for library writing #12560

Merged
merged 2 commits into from
Jul 4, 2021

andralex
Member

This is larger than usual. Functionality added:

  • Add support for memory-mapped files to OutBuffer. That way, its clients may transparently write straight to files.
  • Use that functionality for file updating purposes. It turns out large-scale applications have very large binaries (hundreds of megs to gigs) so loading the entire file in RAM may cause the build to run out of memory.
  • Also use that functionality to write libraries straight to file, instead of assembling them in memory and then writing that buffer out to file.
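As a rough illustration of the file-updating case, the sketch below compares new contents against an existing file through a read-only mapping and rewrites the file only when they differ. It is not the PR's code: it assumes Phobos' `std.mmfile.MmFile` (which dmd itself does not use), and `updateIfChanged` is a hypothetical name.

```d
import std.file : exists, getSize, write;
import std.mmfile : MmFile;

/// Hypothetical helper, not the PR's API.
/// Returns true if the file was (re)written, false if it already held `data`.
bool updateIfChanged(string path, const(ubyte)[] data)
{
    if (exists(path) && getSize(path) == data.length)
    {
        if (data.length == 0)
            return false; // both empty; an empty file cannot be mapped anyway

        // Map read-only: the OS pages the file in on demand, so the old
        // contents are never copied into one large heap buffer.
        auto mmf = new MmFile(path); // this constructor maps for reading
        scope (exit) destroy(mmf);   // unmap before any rewrite below
        if (cast(const(ubyte)[]) mmf[] == data)
            return false;
    }
    write(path, data);
    return true;
}
```

The size check comes first so that files of different lengths are never mapped at all.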

Even modest libs can get to hundreds of megs. One particular library build at Symmetry produced a library that's 1.7 GB, meaning that at one point 3.4 GB of RAM had to be allocated just to write the library out. This saves that much memory.
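The idea of writing straight to the file can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: it uses Phobos' `std.mmfile.MmFile` rather than dmd's own `OutBuffer`/`FileMapping`, and `writeMapped` is a hypothetical name.

```d
import std.exception : enforce;
import std.mmfile : MmFile;

/// Hypothetical helper: writes `chunks` back to back into `path` through a
/// memory mapping, so the complete output never exists as one heap buffer.
void writeMapped(string path, const(ubyte)[][] chunks)
{
    size_t total = 0;
    foreach (c; chunks)
        total += c.length;
    enforce(total > 0, "nothing to write; a zero-length file cannot be mapped");

    // readWriteNew creates (or truncates) the file and sizes it to `total`.
    auto mmf = new MmFile(path, MmFile.Mode.readWriteNew, total, null);
    scope (exit) destroy(mmf); // unmaps and closes the file

    auto dst = cast(ubyte[]) mmf[];
    size_t pos = 0;
    foreach (c; chunks)
    {
        dst[pos .. pos + c.length] = c[]; // copy directly into the mapped pages
        pos += c.length;
    }
}
```

Here the only large allocation is the mapping itself, which the OS backs with the file rather than with anonymous memory.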

Hopefully more uses of memory mapping are possible.

@dlang-bot
Contributor

dlang-bot commented May 22, 2021

Thanks for your pull request, @andralex!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR:

dub run digger -- build "master + dmd#12560"

@andralex andralex force-pushed the memory-mapped branch 3 times, most recently from 734484f to c778271 Compare May 22, 2021 01:38
@UplinkCoder
Member

UplinkCoder commented May 22, 2021

Needs more documentation, clearer structure, and a rationale in the commit message.
I'll have a closer look tomorrow.

@andralex
Member Author

Of course OSX has yet another different but not distinct mmfile API...

The `Datum` type encodes the mapping mode: Use `ubyte` for read/write mapping
and `const ubyte` for read-only mapping.
*/
struct FileMapping(Datum)
Contributor

The ubyte -> RW / const ubyte -> R mapping is nice; however, someone who sees FileMapping!(ubyte)("bla.txt") might not understand it right away. FileMapping!(Read)/FileMapping!(ReadWrite) reads better in my opinion. Also, what happened to plain Write?

Member Author

The idea is to allow support for other units such as uint or ulong. Also, most of the code is common to the read and read-write modes, so a template scores on both of these points.

There's no write-only mapping as far as I understand.

Member

> The idea is to allow support for other units such as uint or ulong.

Martin says don't build in features 'till ya need them. You're going to be sorry you gave me that book :-)

It makes the interface "heavier", and the unused features likely don't even work because they are never tested. It will be trivial to add this if/when it is needed.

Member Author

> Martin says don't build in features 'till ya need them.

Actually that's exactly what the code is doing: it rejects data types that are not currently used. The template is necessary because there are two modes that have 90% identical code and 99% identical interface.

> It makes the interface "heavier"

In fact the opposite is true. The interface is virtually identical across the read-only and read-write versions (only the resize function is not offered in read-only mode). Semantics are identical, too. So the reader only needs to read and absorb ONE interface to understand TWO ways of using memory-mapped files, as opposed to looking at TWO distinct interfaces (subtle duplication for the lose!) that it just so happens are 99% identical with no indication of that in the source code.

Template is the way to go here, no two ways about it. I could use a bool or enum to say "read-only vs read-write" but that is the same aggravation for less upside.
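The shape being argued for in this thread can be sketched roughly as below. This is a hypothetical stand-in, not the PR's `FileMapping`: `Mapping` wraps an ordinary in-memory slice instead of an OS mapping, and `countZeros` is an invented client, so that only the template mechanics are shown (element constness selects the mode, unused element types are rejected, and `resize` exists only in read-write mode).

```d
/// Hypothetical stand-in for the PR's FileMapping: it wraps an in-memory
/// slice rather than a real mapped file, to show only the template shape.
struct Mapping(Datum)
    if (is(Datum == ubyte) || is(Datum == const ubyte)) // reject unused units
{
    /// The element type alone decides the mode.
    enum bool readOnly = is(Datum == const ubyte);

    private Datum[] data;

    this(Datum[] backing) { data = backing; }

    /// Shared by both modes.
    size_t length() const { return data.length; }
    Datum[] contents() { return data; }

    /// Offered only by the read-write instantiation.
    static if (!readOnly)
        void resize(size_t n) { data.length = n; }
}

/// Client code written once against the common interface.
size_t countZeros(M)(ref M m)
{
    size_t n = 0;
    foreach (b; m.contents)
        if (b == 0)
            ++n;
    return n;
}
```

With this layout, `Mapping!uint` fails to compile and `Mapping!(const ubyte)` has no `resize` member, while generic code such as `countZeros` accepts either mode.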

@thewilsonator
Contributor

(This is going to be much easier to review if you do a clean rebase. Multiple commits are fine (and encouraged!), but not with a bunch of rebase commits in the middle.)

@andralex
Member Author

@thewilsonator yah, no worries. I'm coding on a laptop in an airport and snuck in a bunch of unrelated commits. Will do a grand git rebase -i soonish.

@andralex
Member Author

I was wondering about the test failures. It turns out there is a shadow C++ struct OutBuffer in outbuffer.h that must be (at least) layout- and destructor-compatible with the D struct.

Need a robust solution here, any ideas? First thought that comes to mind is to use a pointer instead of a direct member for the file mapping, which would always be null on the C++ side.
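To make the layout concern concrete, here is a minimal sketch with hypothetical types (`MappingState`, `BufferByValue`, and `BufferByPointer` are invented for illustration and do not match the real `OutBuffer` fields). It shows why a pointer member is easier to mirror on the C++ side than an embedded struct.

```d
/// Hypothetical stand-in for the mapping state; the real type differs.
struct MappingState
{
    void* base;
    size_t length;
    int fd;
}

/// Embedding the state by value grows the struct, so a C++ twin would
/// have to mirror MappingState field for field.
struct BufferByValue
{
    ubyte[] data;
    size_t offset;
    MappingState mapping;
}

/// Holding a pointer costs exactly one word; a C++ twin could declare a
/// plain `void*` that stays null and never needs the pointee's layout.
struct BufferByPointer
{
    ubyte[] data;
    size_t offset;
    MappingState* mapping;
}

// A D slice is two words (length, pointer), followed by offset and pointer.
static assert(BufferByPointer.sizeof == 4 * size_t.sizeof);
static assert(BufferByPointer.mapping.offsetof == 3 * size_t.sizeof);
static assert(BufferByValue.sizeof > BufferByPointer.sizeof);
```

Static assertions like these could also serve as a guard, failing the build whenever the D layout drifts from what the header assumes.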

Member

@WalterBright WalterBright left a comment

All very small changes.

@andralex
Member Author

andralex commented Jul 1, 2021

@WalterBright thanks very much for the thorough review! Commit follows.

@andralex andralex force-pushed the memory-mapped branch 5 times, most recently from a992ef8 to 284bdf0 Compare July 1, 2021 22:39
@andralex
Member Author

andralex commented Jul 1, 2021

@thewilsonator done squashing

@andralex andralex force-pushed the memory-mapped branch 7 times, most recently from 6bc1ae5 to 1ac70ac Compare July 2, 2021 22:18
@andralex
Member Author

andralex commented Jul 2, 2021

What's the matter with buildkite all of a sudden?

/usr/bin/ld: error: cannot find -lsqlite3
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:135: error: undefined reference to 'sqlite3_bind_text'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:140: error: undefined reference to 'sqlite3_bind_text16'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:146: error: undefined reference to 'sqlite3_bind_null'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:151: error: undefined reference to 'sqlite3_bind_blob'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:334: error: undefined reference to 'sqlite3_column_name'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:347: error: undefined reference to 'sqlite3_prepare_v2'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:366: error: undefined reference to 'sqlite3_errmsg'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:251: error: undefined reference to 'sqlite3_column_blob'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:251: error: undefined reference to 'sqlite3_column_bytes'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:255: error: undefined reference to 'sqlite3_column_blob'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:255: error: undefined reference to 'sqlite3_column_bytes'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:255: error: undefined reference to 'sqlite3_column_blob'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:255: error: undefined reference to 'sqlite3_column_bytes'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:255: error: undefined reference to 'sqlite3_column_blob'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:255: error: undefined reference to 'sqlite3_column_bytes'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:32: error: undefined reference to 'sqlite3_open_v2'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:37: error: undefined reference to 'sqlite3_close'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:56: error: undefined reference to 'sqlite3_exec'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:102: error: undefined reference to 'sqlite3_last_insert_rowid'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:109: error: undefined reference to 'sqlite3_changes'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:120: error: undefined reference to 'sqlite3_bind_int'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:125: error: undefined reference to 'sqlite3_bind_int64'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:130: error: undefined reference to 'sqlite3_bind_double'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:164: error: undefined reference to 'sqlite3_step'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:175: error: undefined reference to 'sqlite3_reset'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:184: error: undefined reference to 'sqlite3_reset'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:322: error: undefined reference to 'sqlite3_column_count'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:328: error: undefined reference to 'sqlite3_data_count'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:339: error: undefined reference to 'sqlite3_finalize'
../.dub/packages/ae-0.0.3053/ae/sys/sqlite3.d:266: error: undefined reference to 'sqlite3_column_int'
collect2: error: ld returned 1 exit status
Error: linker exited with status 1

@thewilsonator thewilsonator dismissed WalterBright’s stale review July 4, 2021 04:14

All requested changes addressed

@thewilsonator
Contributor

(rebased to restart CI with fixed dlang-bot)

@thewilsonator
Contributor

Merging. @andralex please consider adding a changelog entry for this.

@thewilsonator thewilsonator merged commit 462684e into dlang:master Jul 4, 2021
10 participants