dtoh, first draft #39

Merged
merged 6 commits into from Mar 16, 2014

Projects

None yet

10 participants

@adamdruppe
Contributor

I have this basically working and want to get it out so other people can take a look. Should be usable for a decent amount of stuff already.

Associated with: http://d.puremagic.com/issues/show_bug.cgi?id=9285
see the zip I posted in the comments there for the example I was using to test

@klickverbot klickverbot and 1 other commented on an outdated diff Jan 15, 2013
+ if(arg == "-c")
+ useC = true;
+ else if(arg == "-h") {
+ writef("%s", usage);
+ return;
+ } else
+ jsonFilename = arg;
+ }
+
+ if(jsonFilename.length == 0) {
+ writeln("No filename given, for help use dtoh -h");
+ return;
+ }
+
+ string[string] typeMapping = [
+ "int" : "long", // D int is fixed at 32 bit so I think this is more correct than using C int...
klickverbot
klickverbot Jan 15, 2013 Member

Huh? C int is 32 bit pretty much everywhere (on the platforms immediately relevant to us), whereas sizeof(long) == 4 holds only on Windows (LLP64).

adamdruppe
adamdruppe Jan 15, 2013 Contributor

On Tue, Jan 15, 2013 at 01:11:57PM -0800, David Nadlinger wrote:

Huh? C int is 32 bit pretty much everywhere, whereas sizeof(long) == 4 holds only on Windows (LLP64).

I thought it was the opposite!

So is the correct mapping always going to be:

D C
int -> int
long -> long long

for both 32 and 64 bit?

adamdruppe
adamdruppe Jan 15, 2013 Contributor

nvm, I looked it up. Old memories of 16 bit ints confused me!

it is fixed now

Member
alexrp commented Jan 15, 2013

Code style should be consistent with the rest of the code bases of the DPL org, i.e. 4 space indentation, Allman brace style, spaces between operators and operands, space before statement keyword and parenthesis, etc.

Owner

@alexrp what's the status of this? care to make the style pass?

Contributor

The updated code to work with newer versions of dmd is currently
unavailable - I hadn't updated the github thing yet and my home
internet has been down all week (the ice storm up here on Sunday
severed my cable connection!).

Cable guy is supposed to fix it at some point today, so I can most
likely update the pull request over the weekend.

Member

This should be in the compiler. Thanks to #2754 it's now easier to add than it has ever been before!

Owner

@yebblies so how do we go about this? we have a bounty out for dtoh and it's somewhat urgent

Member

I'd say it is OK to have it as a separate tool for now, but it would be nice to have it in the compiler.

Member

So urgent it was left open for a year?

I'm going to need something like this for the D conversion very soon.

If this serves your needs, then I have no problem with you merging it.

Member

I don't know what the standards are for the /tools repository, but I think this has a number of quality issues. If that's a problem, I recommend against fast-tracking this until it has been thoroughly reviewed (I'll do my part later today).

Contributor

On 12/28/13, JakobOvrum notifications@github.com wrote:

I don't know what the standards are for the /tools repository, but I think
this has a number of quality issues.

Some of it has probably already been fixed on the copy I have on my
other computer (still offline though, the cable company didn't keep
their appointment yesterday). What big picture issues do you see?

The type stuff btw has been completely rewritten, since in January dmd
outputted it as plain strings, and now it outputs mangles. A few other
things that weren't possible then can be done now, such as forward
references, skipping structs with dtors, and outputting _gshared and
manifest constants.

The overall approach is unchanged, however.

Member

@adamdruppe Would you be interested/have the time soon to rewrite this as a compiler ast visitor?

Member

What big picture issues do you see?

It's mostly a myriad of small to medium importance issues, such as an extreme amount of garbage memory production (as soon as the output is large enough to invoke that first GC cycle, performance takes a big hit from there on out), poor command line argument handling, duplicated code and some unnecessary code, poor variable naming, control flow that could be simplified, and a classic case of pokemon exception handling. There's also suboptimal readability (to put it mildly) due to reliance on indexes instead of standard algorithms (and it also results in fragile reliance on whitespace details). I was going to do line comments, but I'd rather wait to see the new version. If this was Phobos, these kind of issues would matter, and I'm wondering if the same quality standard is warranted for /tools.

As to the actual source code generation I noticed two things, a) as function declarations without definitions are implicitly extern in C, C++ and D, it would be better to just not emit extern for extern(C) functions when useC is enabled, and b) is it really safe to default to void return type for all calling conventions involved?

Owner

@yebblies love the visitor but there are disadvantages to integrating everything there is within the compiler. A more scalable approach is to have the compiler generate nice annotated json (somehow with lookup done in tow) that allows a lot of tools to visit the json ast.

Member

@andralex This tool is very similar to di and ddoc generation, which are both integrated into the compiler. Json generation is a half-way hack to get around the fact we don't provide compiler-as-a-library. The scalable approach is not to serialize the ast to json, then reconstruct it in a bunch of external tools - it is to re-use the well tested code inside the compiler.

Owner

@andralex This tool is very similar to di and ddoc generation, which are both integrated into the compiler.

Good point.

Json generation is a half-way hack to get around the fact we don't provide compiler-as-a-library.

Doing so still poses the issue of intermediate formats (of which json could be one). I don't see these as either-or.

The scalable approach is not to serialize the ast to json, then reconstruct it in a bunch of external tools - it is to re-use the well tested code inside the compiler.

But generating the ast would use the well tested code inside the compiler! It's just a walkthrough that spits out an intermediate format. I don't understand this.

Contributor

On 12/28/13, Daniel Murphy notifications@github.com wrote:

@adamdruppe Would you be interested/have the time soon to rewrite this as a
compiler ast visitor?

Probably not, at least not any time soon. When my cable gets back I'm
going to have at least a week of other work to catch up on and then
who knows what else.

Contributor

On Sat, Dec 28, 2013 at 08:04:38AM -0800, JakobOvrum wrote:

It's mostly a myriad of small to medium importance issues, such as an extreme amount of garbage memory production (as soon as the output is large enough to invoke that first GC cycle, performance takes a big hit from there on out),

Yeah, I wrote it pretty sloppily, but I don't think speed is hugely important
here; it will probably be dwarfed by the runtime of dmd itself anyway.

Some of the other things you mentioned, but not all, should be better in the
new version - a lot of it came from having to parse type strings whereas now
dmd outputs the mangle.

b) is it really safe to default to void return type for all calling conventions involved?

I'm not sure if it does this anymore or not...

But anyway, my cable finally got fixed so I updated the code. If this mostly
works, I'll finish it, and if it sucks, I'll abandon it. Not really sure how
much of the D should be out in C++: here, I did the extern C and C++ functions,
struct definitions (unless they have a dtor), gshared variables, and interfaces.
I think that should be enough to get some good use but perhaps more is needed too,
idk.

Member

Doing so still poses the issue of intermediate formats (of which json could be one). I don't see these as either-or.

The only intermediate format necessary is the in-memory ast. There are benefits to serializing the ast, but running passes on it is not one of them.

I guess we'll see if this satisfies the needs of the D port.

@ghost
ghost commented Feb 2, 2014

I'll leave the philosophical stuff aside and review the tool itself iteratively. First find:

ref isn't handled properly, according to the test-case in your zip:

testcpp.d:

extern(C++) void testref(ref int lol);

testcpp.h

extern "C++"    void testref(long lol);

Also long itself might not be right, search for c_long/c_ulong in druntime. On the C++ side you might have to use some kind of typedef to ensure the type is a 64bit signed type (and ditto for ulong, make it a 64bit unsigned type in C++).

@ghost
ghost commented Feb 2, 2014

Type declarations with the same name but in different modules (unique in D), are not unique in C++ when headers are used, e.g.:

module test1;
import test2;
struct A { }
module test2;
struct A { }

These two are unique, but in C++ header files they will conflict with each other. Probably a good option here is to wrap them in a namespace, for example:

test1.h:

namespace test1
{
    struct Test { };
}
#include "test2.h"

test1.h:

namespace test2
{
    struct Test { };
}

Alternatively, and if only generating C++ (rather than the optional C language feature your tool has via -c), you could just generate a single header file for all D modules and always use namespaces. It would simplify having to deal with #include generation and having to deal with multiple inclusions, inclusion guards, plus it would save on preprocessing time.

E.g. the generated output could be a single C++ header file:

test.h

namespace test1
{
    struct Test { };
}

namespace test2
{
    struct Test { };
}
@ghost
ghost commented Feb 2, 2014

W.r.t. Andrei's bugzilla comment about the requirement for indentation. For really nice looking output I suggest to use an existing tool like uncrustify. DTOH would of course need to compare code before indentation in order to determine whether it has to overwrite a file in a subsequent generation (see Andrei's bugzilla comment about makefile friendliness).

@ghost
ghost commented Feb 2, 2014

I'm not sure why you named this variable in your test-case (see zip file in bugzilla):

__gshared int tlsInt;

You could probably name it localInt instead, to avoid confusion with TLS.

@ghost
ghost commented Feb 2, 2014

Btw, be careful (or just aware) with passing classes around. I ran into an ABI issue once that I wasn't aware of before. You can read about it here:

http://forum.dlang.org/thread/mailman.1547.1346632732.31962.d.gnu@puremagic.com

Member
yebblies commented Feb 2, 2014

Btw, be careful (or just aware) with passing classes around. I ran into an ABI issue once that I wasn't aware of before.

Some of those have been fixed recently.

@ghost
ghost commented Feb 2, 2014

Enums seem to lack support: The Colors enum wasn't generated.
Edit: Numbers either, so enums need work.

@ghost
ghost commented Feb 2, 2014

To match semantics with D, I think in C++ you should define default constructors for types to match D's .init behavior, for example:

struct D
{
    int i;  // should default to 0, not uninitialized memory
    float f;  // should default to NaN
}

So on the C++ side I'd assume you should generate:

#include <math.h>
#include <cassert>

struct D
{
public:
    D() : i(0), f(NAN) { }

    int i;
    float f;
};

int main()
{
    D d;
    assert(d.i == 0);
    assert(isnan(d.f));
    return 0;
}

I've added a main there to showcase this. And in fact, this stuff should be in a test-suite.

Contributor

On Sun, Feb 02, 2014 at 06:50:25AM -0800, Andrej Mitrovic wrote:

I'm not sure why you named this variable in your test-case (see zip file in bugzilla):

It's because I wrote it as tls at first and it didn't work so I changed it to
gshared to compare the json and never changed it back...

so just sloppy changes on my part.

Contributor

On Sun, Feb 02, 2014 at 08:23:33AM -0800, Andrej Mitrovic wrote:

Enums seem to lack support: The Colors enum wasn't generated.

Right, since the json output gives the names but not values of enums,
I couldn't accurately represent them. Listing the names would be wrong
since what D thought was == 100, c++ would think is == 0.

A few people have mentioned not using the json and I kinda agree, but
modifying the compiler to add something like this strikes me as a bit
iffy too, given that it is kinda esoteric.

@ghost
ghost commented Feb 2, 2014

If @andralex still thinks json is the right approach for this then we can easily change the json output.

Member
yebblies commented Feb 3, 2014

@adamdruppe Doing it this way and making json output better doesn't hurt anybody, even if I don't think it's the ideal way.

Contributor

On Sun, Feb 02, 2014 at 06:46:06AM -0800, Andrej Mitrovic wrote:

These two are unique, but in C++ header files they will conflict with each other. Probably a good option here is to wrap them in a namespace, for example:

I don't think this is going to work either because dmd doesn't do C++
namespace mangling.

I changed the class Animal to be in a namespace.. then got undefined
reference to testcpp::speak(testcpp::Animal) from the linker.

Three options come to mind:

  1. C++ namespace support in dmd. But will it automatically assume
    namespace == module name unless told otherwise? If not, odds are
    it still wodul need manual fixing by the D coder anyway.

  2. Write out something more like the fully qualified/mangled name to c++
    (I think this makes it inconvenient to use, nobody is going to want to
    type _D7testcpp4Test in their C++ code)

or, the one my lazy self leans toward: 3) just use the plain names
and let them clash in C++. The D author will have to rename the types
that conflict.

@ghost
ghost commented Feb 3, 2014

What about pragma(mangle), could that help in any way?

@ghost
ghost commented Feb 3, 2014

Oops, that's a D feature.. but perhaps GCC or other compilers have something equivalent..

Contributor

On Mon, Feb 03, 2014 at 09:18:58AM -0800, Andrej Mitrovic wrote:

Oops, that's a D feature.. but perhaps GCC or other compilers have something equivalent..

If they are extern C++ on the D side though, functions at least will still
have conflicting names to the linker. And if they aren't extern C++, the
calling convention may be incompatible.

@ghost
ghost commented Feb 3, 2014

Well if multiple type declarations in module scope with the same name are found in the D code (via json output) just add it as a "not supported yet" check in the tool if you can.

Contributor

updated this again, the test i've been playing with is here: http://arsdnet.net/dcode/dtoh.zip that's updated a bit too.
To get enums to work we have to add that to the compiler. the namespace thing didn't work out as said before but i didn't do "not supported yet" (the error is not supported yet :P) however since the C++ compiler will complain anyway i'm in no rush. i did get imports fairly sane by turning some into #include while ignoring std.* and core.* - worked for a simple test at least.

@ghost
ghost commented Feb 11, 2014

Windows build script:

dmd -X -c testcpp.d testcpp2.d
dmd dtoh
dtoh.exe testcpp.json
dmc test.cpp testcpp.obj testcpp2.obj phobos.lib -L-L+C:\PATH\TO\dmd2\windows\lib\
@ghost
ghost commented Feb 11, 2014

My other comments still apply for the new version of dtoh:

  • ref isn't handled properly. And integral types need to be explicitly specified on the C++ side (D int is not C++ long, see core.stdc.config) (#39 (comment)).
  • .init in D for floats is NaN, for C++ it's 0.0. (#39 (comment)) This may or may not be an issue.
Contributor

On Tue, Feb 11, 2014 at 06:42:37AM -0800, Andrej Mitrovic wrote:

And integral types need to be explicitly specified on the C++ side (D int is not C++ long, see core.stdc.config)

Oh yeah, int definitely isn't long... but is D's int always C's int? I keep thinking
it isn't because it wasn't on 16 bit, but I think it is on 32/64 bit.

Alternatively, I could #include<stdint.h> and do them all that way.

Which do you think is better? Looking at core.stdc.config, if I just avoid the type
long it looks like everything will work out. I went with int for now since I'm
pretty sure that's right.

  • .init in D for floats is NaN, for C++ it's 0.0. (#39 (comment)) This may or may not be an issue.

I went with your default ctor idea which also covers other user-defined initializers
so probably generally a win.

Member

Oh yeah, int definitely isn't long... but is D's int always C's int? I keep thinking
it isn't because it wasn't on 16 bit, but I think it is on 32/64 bit.

On all x86/x86-64, yes.

Alternatively, I could #include<stdint.h> and do them all that way.

That won't work for C++ long, unfortunately.

On LP64 platforms, D's long maps to C++'s long. On everything else, it maps to long long. C of course can't tell the difference and you only need to match the size.

@ghost
ghost commented Feb 11, 2014

but is D's int always C's int?

In the generated output dtoh seemed to have translated D's int into C++ long, but it no longer does that now so I guess this is ok.

@ghost
ghost commented Feb 24, 2014

@andralex: Are we ok with merging this? I suppose we can expect further pulls for adding new features, unless there's something missing right now.

@andralex andralex commented on the diff Mar 16, 2014
@@ -0,0 +1,473 @@
+// FIXME: output unions too just like structs
+// FIXME: check for compatible types
+
+// nice idea: convert modules to namespaces, but it won't work since then
+// the mangles won't be right
+
+import std.stdio;
+import std.string : replace, toUpper, indexOf, strip;
+import std.algorithm : map, startsWith, canFind;
+
+string getIfThere(Variant[string] obj, string key) {
andralex
andralex Mar 16, 2014 Owner

I enjoy Egyptian braces (use them at work, too), but do we want to add another style to the existing one?

adamdruppe
adamdruppe Mar 16, 2014 Contributor

On Sat, Mar 15, 2014 at 09:55:53PM -0700, Andrei Alexandrescu wrote:

I enjoy Egyptian braces (use them at work, too), but do we want to add another style to the existing one?

Yeah, I'm keeping it in my native style (so to speak) for now since I
expect several changes will need to come in in reply to early bug reports.

Once it passes the beta phase and is less likely to need to be completely
rewritten anyway, changing style becomes easy. (I can edit smaller pieces
in a different style but I'll fall back on my old habits for large modifications.)

Owner

I'll move forward and merge this now, in the understanding that it'll be easier to add to an existing codebase than to work on breaking the ice. @adamdruppe please fix the style in a future diff. Thanks!

@andralex andralex merged commit 70a242c into dlang:master Mar 16, 2014

Yet another tool added without a formal review :(

@ghost
ghost commented Mar 16, 2014

Yet another tool added without a formal review :(

Why stay silent for 3 months and then suddenly complain?

Why stay silent for 3 months and then suddenly complain?

Because I had no idea it existed until Andrei posted about it in the newsgroup. Sure, I could subscribe to receive notifications about pull requests but I don't understand why these tools don't go through same review process the new modules in Phobos do. Yes, I've already heard the lack of manpower argument.

Member

Why stay silent for 3 months and then suddenly complain?

Not all of us were silent, either.

I'll move forward and merge this now, in the understanding that it'll be easier to add to an existing codebase than to work on breaking the ice.

I don't think this hurried script is going to be useful to a proper tool that doesn't do the job in an ad-hoc way. I guess it serves as a "better than nothing" tool, but I don't think there's much point in putting such in the official tools repository. More than anything though, I agree that it should've been taken up on the NG.

@andralex andralex added a commit that referenced this pull request Mar 16, 2014
@andralex andralex Revert "Merge pull request #39 from adamdruppe/dtoh"
This reverts commit 70a242c, reversing
changes made to 44048e6.
f8631e5
Owner

No problem: #125 and let's review this.

Owner

@jacob-carlborg you just volunteered as review manager.

Contributor

On Sun, Mar 16, 2014 at 02:54:23AM -0700, JakobOvrum wrote:

More than anything though, I agree that it should've been taken up on the NG.

I think it was.... in January 2013 when the bugzilla issue was first
opened.

But regardless, I haven't even used it myself on anything but a few
basic tests/toys, so any real-world perspective would be good.

Owner

@adamdruppe please add it to the proposals in review when you're ready.

you just volunteered as review manager.

Fair enough. I guess I have to do that then. I'll sync up with Dicebot.

@adamdruppe please let me know when you're and I'll set up the review.

I added it to the review queue as "Work in progress".

Contributor

On Mon, Mar 17, 2014 at 07:06:22AM -0700, jacob-carlborg wrote:

@adamdruppe please let me know when you're and I'll set up the review.

just let me know the link where the discussion is happening.

@adamdruppe do you have any documentation for dtoh or could you create some? Since this is a tool rather than an API perhaps documentation like this best suited: http://dlang.org/dmd-osx.html.

Member

what is the status of this?
AFAIK there was never an official voting - any reason?

Member

This is more recent: dlang/dmd#5082 - it covers most of the C++ interfacing support in D.

Contributor

On Tue, Mar 29, 2016 at 11:45:34PM -0700, Seb wrote:

what is the status of this?

It is quite dead, apparently nobody cared enough to do real
world use - including me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment