Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Imprecise translation for ternary operator #324

Closed
lSoleyl opened this issue Jul 2, 2020 · 3 comments
Closed

Imprecise translation for ternary operator #324

lSoleyl opened this issue Jul 2, 2020 · 3 comments
Labels
bug Something isn't working

Comments

@lSoleyl
Copy link

lSoleyl commented Jul 2, 2020

Hi,

this is a great tool for crosschecking the compiler's interpretation of C++ code. However the expansion of the ternary operator looks a bit misleading to me. Given the following code:

struct Str {
  Str(const char* string) : string(string) {}
  
  operator const char*() const { 
    return string; 
  }
  
  const char* string;
};

Str globalString = "test";


const char* getString(bool empty) {
  return empty ? "" : globalString;
}

With C++17 the getString() function gets translated into:

const char * getString(bool empty)
{
  return static_cast<const char *>(empty ? Str("") : Str(globalString).operator const char *());
}

I would argue that a parenthesis is missing and it should look like this for both result types of the ternary expression to match:

const char * getString(bool empty)
{
  return static_cast<const char *>((empty ? Str("") : Str(globalString)).operator const char *());
}

Also when setting the compilation option to anything below C++17, then the result looks even weirder:

const char * getString(bool empty)
{
  return static_cast<const char *>(empty ? Str(Str(Str(""))) : Str(globalString).operator const char *());
}
@andreasfertig andreasfertig added the bug Something isn't working label Jul 2, 2020
@andreasfertig
Copy link
Owner

Hello @lSoleyl,

that is a nice one. Thanks for spotting and reporting this issue.

I agree that the expression needs some additional parenthesis. A fix for that is on its way.

To you second part, and thanks for reporting the C++17 version first, this is correct. However, I agree that it looks wired. I assume this was either a bug in Clang or the language rules changed with C++17. The additional Str we see here are in fact created during AST parsing. But they are annotated as elidable. You can see it in the AST dump here. The final result of the C++17 and C++14 version are the same as you can see in the link as well. It is one of the things C++ Insights reveals :-)

Andreas


The following is for reference for others and myself. This entire code surprised me, because my initial thought was that even the Str("") is a bug. At this point there is already a const char* so no need to create a Str object. It seems to me, that the tenary-operator (at least in Clang) seems to favor LHS and RHS of the same type. In this case due to the constructor of Str being also eligible as a conversion constructor the compiler finds a way to construct a Str from a const char*. This way is also needed when globalString is constructed by assigning the value. If I change the code slightly, adding explicit to the single argument constructor and using direct-initialization either with curly braces or parenthesis the resulting code comes without the unnecessary temporary construction of Str(""). I wasn't aware of that myself and for a more expensive object I consider it a performance bug. So note to myself, always try to make single argument constructors explicit.

struct Str {
  explicit Str(const char* string) : string(string) {}
  
  operator const char*() const { 
    return string; 
  }
  
  const char* string;
};

Str globalString{"test"};


const char* getString(bool empty) {
  return empty ? "" : globalString;
}

You can test it here: cppinsights.io/s/2f2b90f3

@lSoleyl
Copy link
Author

lSoleyl commented Jul 2, 2020

Thanks for the quick response and the link to the compiler explorer... I didn't knew about this one.
As to your quick note:

This entire code surprised me, because my initial thought was that even the Str("") is a bug. At this point there is already a const char* so no need to create a Str object. It seems to me, that the tenary-operator (at least in Clang) seems to favor LHS and RHS of the same type.

This example was originally crafted to be a C++ quiz to explain why getString() results in UB no matter whether empty is set or not, but in the original example Str also owns the string and deallocates it in it's destructor (I left out this part here to not make the example more complicated than necessary). This behavior should be the same across all standard conforming compilers.

The actual explanation for the construction of Str("") is that both expressions must be converted into the same type. And we have:

  return empty ? [const char[1]]"" : [Str&]globalString;
//--> Both types are not equal and there is no conversion from Str& to const char[1] (const char* is not considered equal here)
//    So following conversion path is chosen:
  return empty ? [Str] Str("") : [Str&] globalString;
// --> Now there is still a mismatch because we have a temporary and an L-Value, which is fixed by copying the second expression
  return empty ? [Str] Str("") : [Str] Str(globalString); 
// --> And now the implicit conversion to const char*  can be called on the result to return the function's expected type
   return [const char*] (emtpy ? Str("") : Str(globalString)).operator const char*();

So now one can also see that if Str is a resource managing type like std::string then this will lead to UB, because we return a pointer to deleted memory.

In your second example with the explicit constructor, the compiler cannot apply any user defined conversions to the two expressions to match their types exactly and then just converts the globalString into const char* and since the compiler has no other choice, it then decays the const char[1] into a const char* to match up the types.

This way is also needed when globalString is constructed by assigning the value. If I change the code slightly, adding explicit to the single argument constructor and using direct-initialization either with curly braces or parenthesis the resulting code comes without the unnecessary temporary construction of Str(""). I wasn't aware of that myself and for a more expensive object I consider it a performance bug.

I just want to add that I don't think, this really is a performance issue in the globalString case. By examining the compiler's output assembly (here) we can see that the compiler doesn't generate a call to a copy constructor upon initialization. The generated assembly is the same for both initialization methods. I guess it is hard to argue about performance from just analyzing the AST as long as the compiler won't translate it 1:1 into assembly.

andreasfertig added a commit that referenced this issue Jul 2, 2020
@andreasfertig
Copy link
Owner

Hello @lSoleyl,

my pleasure.

This example was originally crafted to be a C++ quiz to explain why getString() results in UB no matter whether empty is set or not, but in the original example Str also owns the string and deallocates it in it's destructor (I left out this part here to not make the example more complicated than necessary). This behavior should be the same across all standard conforming compilers.

thanks for clarifying the origin of the code. I figured something like this.

I just want to add that I don't think, this really is a performance issue in the globalString case. By examining the compiler's output assembly (here) we can see that the compiler doesn't generate a call to a copy constructor upon initialization. The generated assembly is the same for both initialization methods. I guess it is hard to argue about performance from just analyzing the AST as long as the compiler won't translate it 1:1 into assembly.

Agreed. For that piece of code it doesn't matter. And yes, AST is way to early to see the final result. More or less no optimization takes place at this stage. My note was about a more expensive to setup object when the compile cannot optimize it away so easily.

Andreas

andreasfertig added a commit that referenced this issue Jul 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants