Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reapply fix Issue 23479 - ImportC recognizes .i and .c files, but not .h files #14864

Merged
merged 1 commit into from
Feb 9, 2023

Conversation

ibuclaw
Copy link
Member

@ibuclaw ibuclaw commented Feb 5, 2023

This can be reapplied as it's currently safe to do so without breaking building downstream D compilers.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @ibuclaw!

Bugzilla references

Auto-close Bugzilla Severity Description
23479 enhancement ImportC recognizes .i and .c files, but not .h files

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#14864"

@dlang-bot dlang-bot merged commit 650b75b into dlang:master Feb 9, 2023
@ibuclaw ibuclaw deleted the reapply23479 branch February 9, 2023 12:29
@WalterBright
Copy link
Member

@ibuclaw @dkorpel why was this done? It's the same thing I objected to - the two pass search. The correct fix was #14636, not this.

@dkorpel
Copy link
Contributor

dkorpel commented Feb 10, 2023

I forgot there was outstanding controversy on this issue. This PR was open for 4 days without any objection, and from the description it seems like this would make everyone happy: No breakage, but still able to do dmd foo.h now.

@WalterBright
Copy link
Member

The reasons still there: 1. it's slow 2. it's complicated and unexpected behavior

@maxhaton
Copy link
Member

slow

Has it been measured?

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 13, 2023

@ibuclaw @dkorpel why was this done? It's the same thing I objected to - the two pass search. The correct fix was #14636, not this.

That is not a correct fix, it broke gcc builds. We're a massive mixed D/C++ project that generate a lot of headers in the build directory. Any change that favours import C (or C++) sources over D modules is just plain wrong from the get go.

Importing modules is a critical core feature of D, and must never break in any way. ImportC is an add-on, not a core feature.

Give it a different syntax, give it a lower priority (current compromise), or drop ImportC altogether. I don't care which, just don't break importing D modules.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 13, 2023

slow

Has it been measured?

  1. A stat() takes 0-1ms on modern hardware.
  2. I've not studied big-O, but I'm pretty sure that there's no difference in time complexity (assuming worst case in both versions of the loop).

@WalterBright
Copy link
Member

When I worked on Warp (the C preprocessor clone), it went through extensive runtime profiling. One large effect was the file lookups, i.e. searching the #include paths. I was able to get a very good speedup by caching the searches.

People expect files to be found in the order of the search path. Your proposal breaks that.

We've changed the name of files in the D compiler many times. Why is this suddenly some massive problem?

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 14, 2023

People expect files to be found in the order of the search path. Your proposal breaks that.

Yes, people expect modules to be found in order of the search path, I don't break that.

We've changed the name of files in the D compiler many times. Why is this suddenly some massive problem?

That's plain wrong, this exact issue was pointed out right from the get-go:

#12507 (comment)
#12507 (comment)

Adding headers into the mix has just amplified the problem tenfold. In a massive C/C++ project with some 70k sources, headers are auto-generated everywhere, no amount of changing directories within the build will avoid them. The pressure point is that now one can no longer trust the D compiler to import D modules anymore. We used to be really good at importing D modules!

@WalterBright
Copy link
Member

Obviously, we are not going to agree on this. So, I propose the following. There's nothing in the D language that says anything about import searches. The switches between gdc and dmd are already different. They're implementation defined. There are a number of implementation defined differences, for various reasons. And that's a good thing, it gives users a choice. I don't mind in the least if gdc uses a different lookup algorithm. You're in charge of it, you decide. I'm sure you'll agree that I have not attempted to interfere with how gdc and ldc chose to do implementation defined behaviors.

Let's put this behind us. You implement what you need for gdc, and I'll do what's best for dmd.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 14, 2023

If dmd wants to turn its back on D users and the D language in order to be a C compiler, fine. This is not the first time you've been dismissive of real world cases.

https://issues.dlang.org/show_bug.cgi?id=22674

Which is why I put heavy emphasis on compromise.

@dkorpel
Copy link
Contributor

dkorpel commented Feb 14, 2023

@WalterBright Remember that ImportC was never accepted as a DIP, it was only added as an extension under the following promise:

It doesn't alter D in any way.

Note this response with 2 thumbs up reactions:

even if as you said D is not affected, I think that a -preview=ImportC is necessary.

You replied:

Since it does not change D behavior at all, it is not necessary.

Now we've learned that is actually does change D behavior, since it broke a real world D project. We have a policy to deprecate things first if we can, instead of introducing a breaking change between releases. Therefore, there either needs to be a deprecation period / preview switch, or the lookup mechanism must be designed in a way that ImportC doesn't alter D like originally stated.

So what's it going to be?

@schveiguy
Copy link
Member

I have a suggestion:

Almost never, will you have C and D files in the same import paths. What if you had a separate import path flag for C imports? Basically dmd -CI/usr/include would add /usr/include but only for C files.

This allows you to specify any ordering or priorities you want.

I too have been burned by dmd's searching for C files over D imports. I've had to rename C directories just so the D compiler doesn't "find" them.

@WalterBright
Copy link
Member

This is not the first time you've been dismissive of real world cases. https://issues.dlang.org/show_bug.cgi?id=22674

I authored the PR to fix it.

Which is why I put heavy emphasis on compromise.

My proposal is compromise.

Now we've learned that is actually does change D behavior

It does not change D behavior. It changes compiler behavior. Iain's PR also changed compiler behavior, in that it moved the lookup of .c and .i files, too.

This two-pass lookup is not expected behavior. Expected behavior is files further down the path are overridden by upstream files. (The same way inheritance works.) It someone adds some .d files downstream, they're going to be surprised by them overriding upstream files.

If you don't like my compromise proposal, ok. We'll just tell people that if they want to import .h files, they'll have to wrap it in a .c file:

stdio.c:

#include <stdio.h>

@WalterBright
Copy link
Member

in order to be a C compiler

The idea is not to be a C compiler. The idea is to make it easy for people to leverage existing C code for their D projects, and to add D code incrementally to C projects. The C to D translator programs could not do this in a "just works" fashion (there have been 4 of them so far, each one not quite reaching the mark). ImportC can. ImportC was never meant to be a competitor to gcc or clang.

@WalterBright
Copy link
Member

Also be able to get rid of the Deimos project.

@Geod24
Copy link
Member

Geod24 commented Feb 15, 2023

This two-pass lookup is not expected behavior. Expected behavior is files further down the path are overridden by upstream files. (The same way inheritance works.) It someone adds some .d files downstream, they're going to be surprised by them overriding upstream files.

I think we can apply the same reasoning as we did with the symbol lookup passes here. The original implementation, in a single pass, was correct & simple, but actually not what people wanted. Now we have two passes for symbols, which is more complex to implement & explain, but I haven't heard a single complain about it in the 5 years it's been implemented. I don't think people are going to be surprised if you tell them that D files always take precedence over C files.

@dkorpel
Copy link
Contributor

dkorpel commented Feb 15, 2023

It does not change D behavior. It changes compiler behavior.

That's a technicality that doesn't change the core of the problem: a D user upgrading the compiler may suddenly get weird compilation errors when compiling their D project.

Iain's PR also changed compiler behavior, in that it moved the lookup of .c and .i files, too.

That's because it's a fix of the regression introduced by ImportC. ImportC is new, there's no expectation of stability for it yet. It would still be behind a preview switch if you didn't dismiss that suggestion.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 15, 2023

Now we've learned that is actually does change D behavior

It does not change D behavior. It changes compiler behavior. Iain's PR also changed compiler behavior, in that it moved the lookup of .c and .i files, too.

No, I restored behaviour. D modules once again are searched and selected, in order as on command line in the predictable fashion they have always had since D1.0 as per pre-2.098 semantics.

This two-pass lookup is not expected behavior. Expected behavior is files further down the path are overridden by upstream files. (The same way inheritance works.) It someone adds some .d files downstream, they're going to be surprised by them overriding upstream files.

We're a D language compiler, import is a D language construct. It is surprising that a foreign language file is overriding the import of a D language file (and breaking builds) when all that changed in the toolchain/project was updating the D compiler from one version to another. The blame can't be put on the project structure, can't be put on the naming conventions used for source files and headers.

If you don't like my compromise proposal, ok. We'll just tell people that if they want to import .h files, they'll have to wrap it in a .c file:

Again, it's brought into focus now because C headers, unlike C sources, are distributed everywhere on the host system paths, and typically generated everywhere in build directories of C projects. import allowing .c and .i to override .d sources as much as a problem as .h. Giving up on importing .h files, might as well drop the ability to import .c and .i sources too.

@schveiguy
Copy link
Member

When I worked on Warp (the C preprocessor clone), it went through extensive runtime profiling. One large effect was the file lookups, i.e. searching the #include paths. I was able to get a very good speedup by caching the searches.

FYI, I mentioned this in the other PR, the number of stats performed has been drastically reduced since #14582

Given the amount of packages based on import paths that match the imported file is going to be small (probably only one), then the effect of this is going to be very small in terms of performance.

@schveiguy
Copy link
Member

schveiguy commented Feb 15, 2023

[If] someone adds some .d files downstream, they're going to be surprised by them overriding upstream files.

I have to agree with Walter here, if your intent is to include C files in your local project that should be imported using ImportC, allowing dependency projects (which should have -I flags later in the command line) to override that inclusion, this is going to be very surprising.

I still also understand the point being made here -- previously to this change, you could have co-habitating C files and they did not interfere with the build process.

Can we at least consider the proposal above?

What if you had a separate import path flag for C imports?

I think this solves both problems.

@WalterBright
Copy link
Member

that a foreign language file

The point is to make it easy for people to mix D into existing C projects and make use of existing C libraries, with the agonies of hand translation. Hand translation has severely limited use of D. Think of how C++ got its start - it fit right in with existing C code. With C++ came .hpp, .cpp, .hxx, .cxx, and .cc files, but I never heard of a two phase lookup for them, and the thought never occurred to me.

We all know how vulnerable D code is to changing .h files. Iain, you also have lost hundreds of hours (I can hardly imagine how much) from dmd .h files getting out of sync with .d files. This is a big reason why I introduced __import to ImportC. I also recall proposals to add special extensions to D to interface dynamic arrays to C code. But my proposal to add dynamic arrays to ImportC was met with vehement rejection. Oh well.

Working well with C has always been a core competency of D, and a key competitive advantage. ImportC is a major improvement in service of that.

@WalterBright
Copy link
Member

The original implementation, in a single pass, was correct & simple

Thank you for saying this. You are the very first person to say that.

But it's not the same situation. People have an intuitive sense that imports occur after scoped lookup. Imports themselves have a level playing field.

@WalterBright
Copy link
Member

@schveiguy> the number of stats performed has been drastically reduced since #14582

Thank you for improving this. I was gobsmacked by how many stats were done in Warp to compile a Boost file. Something like 10,000. What you did is more or less the same thing I did to Warp. It drastically reduced the problem, but didn't eliminate it. I've been wary of stat calls ever since! (The problem is much worse if the import path accesses networked shares.)

What if you had a separate import path flag for C imports?

It's a good idea, but the compiler doesn't know in advance if it's looking for a C or a D import.

@schveiguy
Copy link
Member

schveiguy commented Feb 15, 2023

It's a good idea, but the compiler doesn't know in advance if it's looking for a C or a D import.

Why does it need to know in advance? All it needs to know is "I can only import C files from this directory" or "I can only import D files from this directory". Then it's up to the user to construct a correct set of -I or -CI parameters to the compiler such that the expected outcome results.

For instance, if you want both C and D files imported from directory a/b/c, you would do -Ia/b/c -CIa/b/c. If you only wanted D files, you do -Ia/b/c. If you only wanted C files you do -CIa/b/c. If you wanted C files first, and then D files, you do -CIa/b/c -Ia/b/c.

To use your analogy, you get to construct the tree of inheritance whichever way you see fit.

Note: I am not a compiler developer, but this feature I could do. I just don't want to waste time if it won't be accepted.

@WalterBright
Copy link
Member

Why does it need to know in advance?

import foo;

Is foo a C or a D file?

@Geod24
Copy link
Member

Geod24 commented Feb 15, 2023

It's a good idea, but the compiler doesn't know in advance if it's looking for a C or a D import.

It sounds like the whole issue would be resolved if that was the case. Using import for D and C module seems to be the root of the problem. Perhaps import "stdio.h"; would make more sense ?

@WalterBright
Copy link
Member

We don't know if f() is a template or a function. D has a lot of constructs where the grammar does not distinguish from different underlying structure. It's the whole key to ranges, arrays, UFCS, etc. Why D doesn't distinguish between pointers to class functions, pointers to struct functions, or pointers to nested functions. It's a strength of D that a.b can mean a.b or a->b.

It should not matter what import foo; is importing. There's no reason we should not use it to "import" all kinds of things. It doesn't have to actually be a file, for example. The language should not have to care. It's a strength of D that it does not distinguish.

@WalterBright
Copy link
Member

BTW, one reason I like the named function arguments so much is it unifies the syntax with initializer syntax.

Like all kludge syntax, a special syntax for C imports will inevitably come back to haunt us.

@schveiguy
Copy link
Member

schveiguy commented Feb 16, 2023

import foo;

Is foo a C or a D file?

It doesn't matter. As far as the compiler is concerned, it could be either.

Changing what the import statement means is not what my proposal does. It just allows the user to specify the priority of C and D imports as desired.

@thewilsonator
Copy link
Contributor

Like all kludge syntax, a special syntax for C imports will inevitably come back to haunt us.

Like all kludge designs, prioritising file lookup for C over D will inevitably come back to haunt us.

@WalterBright
Copy link
Member

It just allows the user to specify the priority of C and D imports as desired.

You're right, my mistake.

if you want both C and D files imported from directory a/b/c, you would do -Ia/b/c -CIa/b/c. If you only wanted D files, you do -Ia/b/c. If you only wanted C files you do -CIa/b/c. If you wanted C files first, and then D files, you do -CIa/b/c -Ia/b/c.

That could work. But it is probably easier for the user to simply explicitly list the files that don't fit the usual search pattern. For example, if config.h is in the current directory, and we want to override it with a/b/c/config.d, simply add a/b/c/config.d to the command line. The compiler isn't going to search the path for files on the command line.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 16, 2023

It's a good idea, but the compiler doesn't know in advance if it's looking for a C or a D import.

Why does it need to know in advance? All it needs to know is "I can only import C files from this directory" or "I can only import D files from this directory". Then it's up to the user to construct a correct set of -I or -CI parameters to the compiler such that the expected outcome results.

It's going to be easier to accept that there are three distinct modes of compilation wrt import

  1. I'm a D project, and only intend to import D sources.
  2. I'm a mixed C/C++/D project, and intend to import both C and D files.
  3. I'm a C project with a D app, and only intend to import C files.

That'd be a more trivial flag to implement.

@schveiguy
Copy link
Member

there are three distinct modes of compilation wrt import

These don't take into account libraries, which might themselves only want to import D files, but you can't specify different build options for them and your project, or you get inconsistent builds.

This is already somewhat of a problem, but less so because D projects don't usually have overlapping module trees.

But with C bindings and ImportC, there is huge overlap. This is why this problem is cropping up all of a sudden.

@WalterBright
Copy link
Member

To override whatever happens in the import search path, config.d can always be placed explicitly on the command line. This will override any search path behavior.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 17, 2023

To override whatever happens in the import search path, config.d can always be placed explicitly on the command line. This will override any search path behavior.

Which will again cause linker errors due to duplicate symbols. Are you not considering separate compilation as a valid build strategy?

@jmh530
Copy link

jmh530 commented Sep 21, 2023

It's a good idea, but the compiler doesn't know in advance if it's looking for a C or a D import.

Why does it need to know in advance? All it needs to know is "I can only import C files from this directory" or "I can only import D files from this directory". Then it's up to the user to construct a correct set of -I or -CI parameters to the compiler such that the expected outcome results.

For instance, if you want both C and D files imported from directory a/b/c, you would do -Ia/b/c -CIa/b/c. If you only wanted D files, you do -Ia/b/c. If you only wanted C files you do -CIa/b/c. If you wanted C files first, and then D files, you do -CIa/b/c -Ia/b/c.

This strikes me as the best option. Are there any objections to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants