Epoch Knowledge Dump Series
Debugging Epoch Programs Using Native Windows Tools
Living Document Alert!
This page is continually undergoing revisions and updates as we progress with the debug support project. Be sure to check back frequently for the latest developments. Also, be sure to read to the end of the page for the most complete and up-to-date status information.
One of the key usability traits of a new programming language is the debugging experience. Without a solid set of tools for debugging, any novel language faces a serious uphill battle for adoption. Since its inception, Epoch has been a pragmatic language first and foremost; if it doesn't help get work done, it isn't doing its job. Debugging is no exception (pardon the pun), so we want to have a first-class debugging experience ready for future Epoch programmers.
One option for great debuggability of new languages is to build the debug tools by hand. In fact, this was largely the plan for Epoch for a long time. We need to emit comprehensive metadata about the code anyway for garbage collection purposes, so why not hitchhike on that data and deliver a custom debugger?
Of course, the problem with this is that building a world-class debugger is a monumental undertaking, and even then it is not guaranteed to hit the bar for quality. More specifically, a home-grown debugger is likely to offer a very different UX than the tools developers already know. So the gold standard is to integrate cleanly with existing tools.
Since Epoch is primarily targeting Windows (for now!), this means that the ideal debugging experience is to work seamlessly with tools like Visual Studio and WinDbg. Moreover, it means adopting the PDB debug file format so that tools like DbgHelp.dll can generate stack traces, minidumps, and so forth.
It doesn't take much research into the PDB format to discover that very little is publicly known about how these files work. Only a tiny number of projects have interfaced successfully with PDB files, most notably cv2pdb. The strategy used by this tool is to talk directly to MSPDB140.dll (or a similarly named file, depending on the local Visual Studio version) and use its APIs to build up and emit a PDB file.
Based on analysis of this project, as well as some minimal code open-sourced by Microsoft, we've discovered enough of the format's peculiarities to at least make a convincing sketch of a valid PDB file for a test program written in Epoch.
As of July 2016, the Epoch 64-bit compiler emits debug symbols that can be used with Visual Studio and WinDbg. A number of additional tools have been used to reconstruct the details of how a PDB comes to be:
- DBH.exe can correctly dump the function names and source mapping from our generated PDB. This tool comes with the Windows SDK and can be found alongside WinDbg.
- The cvdump.exe tool, which can be found in Microsoft's microsoft-pdb GitHub repo, emits a chunk of data that is useful for validation and sanity checking.
- The DIA SDK included with Visual Studio has a tool called DIA2Dump, which also provides useful details about PDB files.
Interestingly, none of those tools appear to offer a comprehensive dump of symbol data, so using all three was necessary to engineer a working symbol generation pipeline. The current status of debugger support for Epoch follows:
- Visual Studio 2015 correctly generates callstacks with function names
- Visual Studio 2015 correctly shows source code for Epoch programs during debugging
- WinDbg correctly generates callstacks with function names
- WinDbg correctly shows source code from a given instruction in the disassembly
- DbgHelp generates correct callstacks
There are several notable holes in the current PDB generation code:
- Type metadata is not emitted yet; this limits a number of debugger features.
- PDB data is generated using MSPDB140.dll, fed with CodeView data generated by LLVM. We currently crack this blob manually and tweak it a bit to make the debug files work before handing the stream over to the DLL.
- The raw debug data being generated by LLVM is in some cases bogus, because we feed it hack data for laziness reasons. For example we don't track actual line numbers or source files because the compiler front-end is not set up to track that information yet.
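To illustrate what "cracking" the LLVM CodeView blob involves, here is a minimal Python sketch (not the Epoch compiler's actual code; all helper names are hypothetical) that walks the symbol records in a .debug$S section. It assumes the C13 layout: a 4-byte signature with value 4, then subsections that each start with a 32-bit kind and a 32-bit length and are padded to 4 bytes, where kind 0xF1 holds symbol records whose 16-bit length field does not count itself. The relocation fixups and tweaks mentioned above are not shown.

```python
import struct

CV_SIGNATURE_C13 = 4    # first dword of a .debug$S section
DEBUG_S_SYMBOLS = 0xF1  # subsection kind that holds symbol records


def iter_subsections(blob):
    """Yield (kind, payload) for each CodeView subsection in a .debug$S blob."""
    (signature,) = struct.unpack_from("<I", blob, 0)
    if signature != CV_SIGNATURE_C13:
        raise ValueError(f"unexpected CodeView signature {signature}")
    pos = 4
    while pos + 8 <= len(blob):
        kind, length = struct.unpack_from("<II", blob, pos)
        pos += 8
        yield kind, blob[pos:pos + length]
        # Assumption: each subsection is padded to a 4-byte boundary.
        pos += (length + 3) & ~3


def iter_symbols(payload):
    """Yield (kind, body) for each symbol record in a DEBUG_S_SYMBOLS payload."""
    pos = 0
    while pos + 4 <= len(payload):
        reclen, kind = struct.unpack_from("<HH", payload, pos)
        # reclen covers the kind field and the body, but not reclen itself.
        yield kind, payload[pos + 4:pos + 2 + reclen]
        pos += 2 + reclen


def symbol_kinds(blob):
    """Collect the record kind of every symbol in a .debug$S blob."""
    return [kind
            for sub_kind, payload in iter_subsections(blob)
            if sub_kind == DEBUG_S_SYMBOLS
            for kind, _ in iter_symbols(payload)]
```

Each (kind, body) pair yielded by iter_symbols is the natural place to apply per-record tweaks before the data is handed on.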
Ultimately the project is moving forward and we are very close to supporting a moderately good debug experience on Windows. As time goes on we can fill in the remaining gaps and generate enough debug data to be competitive.
- For about a week we had a problem where Visual Studio would show function names, but WinDbg wouldn't. More interestingly, WinDbg would show source code, and Visual Studio wouldn't! It turned out this came down to getting two things wrong: section contributions in the PDB, and the lack of a section symbol describing the .text PE section of the compiled binary. Fixing the contributions logic and adding a symbol to map addresses into the correct space resolved this weird behavior.
- A huge amount of insight was gained by reading the code for DIA2Dump and, most recently, cvdump. We plan to coalesce this information into the Epoch compiler and document it for posterity. Ideally, any novel language built on LLVM should be able to benefit from this PDB emission pipeline, even though it technically requires a Visual Studio installation to work.
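The two fixes described above can be sketched in Python as follows. This is a hedged illustration, not the Epoch implementation, and the function names are hypothetical: it serializes an S_SECTION symbol record (kind 0x1136, following the SECTIONSYM layout in Microsoft's cvinfo.h) and a 28-byte section contribution entry in the "Ver60" layout that LLVM documents for the DBI stream. The section index of 1 for .text, the log2 encoding of the alignment byte, and the zero padding of the symbol record to 4 bytes are assumptions worth checking against cvdump output.

```python
import struct

S_SECTION = 0x1136
SC_VERSION_60 = 0xEFFE0000 + 19970605  # section contribution substream version

# Standard PE flags for an executable, readable code section such as .text.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
TEXT_CHARACTERISTICS = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ


def make_section_symbol(section_index, rva, length, characteristics,
                        name=b".text", align_log2=12):
    """Serialize an S_SECTION symbol record (SECTIONSYM layout from cvinfo.h)."""
    # Assumption: the alignment byte holds log2 of the alignment (12 -> 4 KB).
    body = struct.pack("<HBBIII", section_index, align_log2, 0,  # 0 = reserved byte
                       rva, length, characteristics)
    body += name + b"\x00"  # zero-terminated section name
    record = struct.pack("<H", S_SECTION) + body
    # Assumption: pad with zeros so the whole record, length prefix included,
    # is a multiple of 4 bytes long.
    record += b"\x00" * (-(len(record) + 2) % 4)
    return struct.pack("<H", len(record)) + record


def make_section_contribution(section_index, offset, size, characteristics,
                              module_index, data_crc=0, reloc_crc=0):
    """Serialize one 28-byte Ver60 section contribution entry."""
    # 'x' emits a zero padding byte: the section and module fields are
    # each followed by 2 bytes of padding.
    return struct.pack("<HxxiiIHxxII", section_index, offset, size,
                       characteristics, module_index, data_crc, reloc_crc)


def make_section_contribution_substream(entries):
    """Prefix already-packed entries with the Ver60 version dword."""
    return struct.pack("<I", SC_VERSION_60) + b"".join(entries)
```

For example, make_section_symbol(1, 0x1000, 0x200, TEXT_CHARACTERISTICS) describes a 0x200-byte .text section at RVA 0x1000; the RVA and length would come from the final image's section table.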
Entering the month of August 2016
Another tool came to light from poking around the LLVM mailing lists: llvm-pdbdump. This is by far the most comprehensive resource we've found yet for PDB emission. It includes the capability to emit a complete MSF (Multi-Stream Format) file, which is the parent/container format for PDB data. Based on this tool, we are now writing a raw PDB emission pipeline that integrates with the compiler to generate PDB debug data for Epoch programs as they are compiled, rather than as a post-hoc second step.
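For orientation, here is a minimal Python sketch of the fixed pieces of the MSF 7.00 container, based on LLVM's published description of the format rather than on Epoch code; the helper names are hypothetical. It covers the 56-byte superblock at the start of block 0 (a 32-byte magic string followed by six little-endian 32-bit fields) and the stream directory (stream count, then every stream's size, then every stream's block indices). The purpose of one superblock field is not known, so it is written as zero, and placement of the free block map and directory blocks is not shown.

```python
import struct

# 32-byte magic that opens every MSF 7.00 file.
MSF_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"
# Superblock: magic followed by six little-endian uint32 fields (56 bytes).
SUPERBLOCK_FORMAT = "<32s6I"
SUPERBLOCK_FIELDS = ("block_size", "free_block_map_block", "num_blocks",
                     "num_directory_bytes", "unknown", "block_map_addr")


def pack_superblock(block_size, free_block_map_block, num_blocks,
                    num_directory_bytes, block_map_addr):
    """Serialize the superblock stored at the start of block 0."""
    return struct.pack(SUPERBLOCK_FORMAT, MSF_MAGIC, block_size,
                       free_block_map_block, num_blocks, num_directory_bytes,
                       0,  # field of unknown purpose, written as zero here
                       block_map_addr)


def unpack_superblock(data):
    """Parse a superblock into a dict, checking the magic first."""
    magic, *fields = struct.unpack_from(SUPERBLOCK_FORMAT, data, 0)
    if magic != MSF_MAGIC:
        raise ValueError("not an MSF 7.00 file")
    return dict(zip(SUPERBLOCK_FIELDS, fields))


def blocks_needed(num_bytes, block_size):
    """Number of fixed-size blocks needed to hold num_bytes (ceiling division)."""
    return -(-num_bytes // block_size)


def pack_stream_directory(streams, block_size):
    """Serialize the stream directory from (size_in_bytes, block_indices) pairs."""
    out = struct.pack("<I", len(streams))
    out += b"".join(struct.pack("<I", size) for size, _ in streams)
    for size, blocks in streams:
        # Each stream must list exactly as many blocks as its size requires.
        if len(blocks) != blocks_needed(size, block_size):
            raise ValueError("block list does not match stream size")
        out += struct.pack(f"<{len(blocks)}I", *blocks)
    return out
```

Keeping the superblock and directory writers separate mirrors the format itself: the directory is just another run of blocks, located through block_map_addr.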
Building a minimal PDB
It seems that the following components are necessary and sufficient to get a usable debug experience from Visual Studio 2015 and WinDbg:
- A usable MSF file to host the data. Currently we do this with MSPDB140.dll integration, but as noted above we also have a raw MSF generator in the works.
- A PDB Information stream containing GUID and "age" values that match those in the .debug COFF section of the image (.EXE) to be debugged (see the WriteDebugStub function implementation for details).
- A DBI stream with associated contents:
  - A Section for the .text section in the final .EXE image. It is not clear whether additional sections are necessary, but setting up valid COFF section data is highly recommended, as it makes it easier to align the addresses of code in the final image with addresses as computed by the
  - A Module for code. Only one is needed, although multiple modules may be useful for separate compilation; this seems to be fairly flexible so far.
  - The Module contains symbol data. From the perspective of a consumer of MSPDB140.dll, this is just a matter of feeding CodeView data from LLVM (or a similar source) directly into the AddSymbols function. Some fixup is needed to handle relocations, and one additional S_SECTION symbol should be added to help the debugger map the code addresses to a
  - Each publicly visible CodeView symbol from the module should also be fed through a call to AddPublic2. This ensures that the debuggers will see the symbol, although non-publicized symbols still seem to work occasionally.
  - The Machine Type of the DBI stream should probably be set, but it doesn't seem to be a problem if it is bogus or zero.
- Notably, TPI (type) information is not necessary, although omitting it severely limits the debugging experience.
- The IPI stream is still a mystery, although it apparently mimics the TPI stream in many ways.
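The GUID and age requirement in the list above can be sketched as follows. This Python is illustrative only, and its helper names are hypothetical: it packs the fixed-size header of the PDB Information stream (version 20000404, a 32-bit signature, the age, then the GUID) and the "RSDS" CodeView record that the image's debug data points at, then checks that both carry the same GUID and age. Python's uuid.UUID.bytes_le gives the mixed-endian byte order Windows uses for GUID structs. The named stream map that follows the header is omitted, and encoding the PDB path as UTF-8 is an assumption.

```python
import struct
import uuid

PDB_VERSION_VC70 = 20000404  # version value at the start of the PDB stream


def pack_pdb_info_header(guid, age, signature=0):
    """Fixed-size header of the PDB Information stream: version, signature,
    age, then the GUID. The named stream map that follows is omitted."""
    return struct.pack("<III", PDB_VERSION_VC70, signature, age) + guid.bytes_le


def pack_rsds_record(guid, age, pdb_path):
    """CodeView 'RSDS' record referenced from the image's debug directory."""
    # Assumption: the path is stored as zero-terminated UTF-8.
    return (b"RSDS" + guid.bytes_le + struct.pack("<I", age)
            + pdb_path.encode("utf-8") + b"\x00")


def identities_match(pdb_info_header, rsds_record):
    """True if the PDB header and the RSDS record carry the same GUID and age."""
    if rsds_record[:4] != b"RSDS":
        return False
    _, _, pdb_age = struct.unpack_from("<III", pdb_info_header, 0)
    pdb_guid = pdb_info_header[12:28]
    rsds_guid = rsds_record[4:20]
    (rsds_age,) = struct.unpack_from("<I", rsds_record, 20)
    return pdb_guid == rsds_guid and pdb_age == rsds_age
```

A compiler would presumably generate the GUID once per build (for example with uuid.uuid4()) and write the same GUID and age into both places.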
August 7, 2016
The first version of completely fabricated PDBs is now checked in to version control (see "raw" emission link above). Most of the work involved stepping through the disassembly of MSPDB140.dll and hand-correlating the code with the microsoft-pdb repo. This proved necessary because the code for the PDB handlers in the Microsoft repo does not compile as it stands. That makes understanding the runtime flow of the code mildly awkward, but nothing that can't be solved with a healthy familiarity with x64 assembly and a willingness to spend a lot of time in a debugger.
In any event, we now have a PDB that loads cleanly in multiple tools (as cited previously in this article) and also serves up working debug information for both Visual Studio 2015 and WinDbg. It relies heavily on hard-coded values, and many hacky workarounds remain. Getting this PDB generation up to production quality will still take some time, but it is a very encouraging milestone that we can completely bypass black-box tools to generate this data.