Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid C++ identifiers #387

Closed
klmr opened this issue Oct 30, 2015 · 24 comments
Closed

Fix invalid C++ identifiers #387

klmr opened this issue Oct 30, 2015 · 24 comments

Comments

@klmr
Copy link

klmr commented Oct 30, 2015

Rcpp uses __ as a prefix for identifiers in auto-generated code (e.g. here and here). This is problematic, since __ is not not valid in C++ identifiers (it’s reserved for use by the compiler).

Such identifiers are for instance used by standard library implementations. And although most modern implementations try to avoid aggravating users, and so use more explicit conventions in their macro names, a simple grep through the libc++ implementation shows a few macros with names such as __need_size_t (and more in libstdc++), so it’s not unthinkable that there might be conflicts with future versions of some compilers.

It would be better to replace these instances with a legal identifier token, e.g. Rcpp_.

@eddelbuettel
Copy link
Member

Thanks for the note, Konrad.

That is attributes.cpp only and hence well localised. I would encourage you (as the one with an itch to scratch) to work on a pull request if you want to see change "soon" as we all are a tad too busy to write code on demand for what I'd call non-bugs.

We effectively live in a g++/clang++ world only even though the occasional user may edploy the Intel compiler or work on some non-standard OS besides Windoof, OS X and Linux. And in that world this just doesn't seem to have bitten so shrugs ...

@jjallaire
Copy link
Member

We used those prefixes specifically because they are reserved for the
compiler (and thus extremely unlikely to conflict with user variable
names). Point well taken though that we might end up conflicting with the
compiler as a result! You can in some ways view compileAttributes
pre-processing as effectively part of the (extended) compilation and thus
not at all an inappropriate place to use __ prefixes.

On Fri, Oct 30, 2015 at 12:09 PM, Dirk Eddelbuettel <
notifications@github.com> wrote:

Thanks for the note, Konrad.

That is attributes.cpp only and hence well localised. I would encourage
you (as the one with an itch to scratch) to work on a pull request if you
want to see change "soon" as we all aer atad to budy to write code on
demand for what I'd call non-bugs.

We effectively live in a g++/clang++ world only even though the occasional
user may edploy the Intel compiler or work on some non-standard OS besides
Windoof, OS X and Linux. And in that world this just doesn't seem to have
bitten so shrugs ...


Reply to this email directly or view it on GitHub
#387 (comment).

@klmr
Copy link
Author

klmr commented Oct 30, 2015

@jjallaire I agree that a prefix is entirely appropriate (and arguably even necessary) here.

I can try my hand at a PR.

@eddelbuettel
Copy link
Member

I read the comment differently from you, @klmr :

You can [...] view compileAttributes pre-processing as [...] part of the [...] compilation and thus
not at all an inappropriate place to use __ prefixes.

Looks like we're not yet convinced we need it but heck, if you submit something we will look at it.

@klmr
Copy link
Author

klmr commented Oct 30, 2015

@eddelbuettel I got that part, but the “compilation” done by Rcpp isn’t part of the C++ standard compilation workflow so its use of __ isn’t covered by the respective section in the standard (the section explicitly covers only the “C++ implementations and standard libraries”). As far as the C++ standard is concerned, Rcpp is just any old user.

@jjallaire
Copy link
Member

Understood, but with C++11 attributes are often used as code generators
that are called during the compilation phase and are thus seen as "compiler
add-ons". So I don't think it's a huge stretch for these compiler add-ons
to use symbols reserved for the compiler (but I agree you are correct in
saying it's not technically standards compliant).

On Fri, Oct 30, 2015 at 1:55 PM, Konrad Rudolph notifications@github.com
wrote:

@eddelbuettel https://github.com/eddelbuettel I got that part, but the
“compilation” done by Rcpp isn’t part of the C++ standard compilation
workflow so its use of __ isn’t covered by the respective section in the
standard. As far as the C++ standard is concerned, Rcpp is just any old
user.


Reply to this email directly or view it on GitHub
#387 (comment).

@kevinushey
Copy link
Contributor

We could also hide the implementation in a separate namespace, e.g.

CharacterVector fun(RObject data);
namespace Rcpp {
namespace generated {
inline SEXP pkg_fun(SEXP dataSEXP) {
    // impl
}
}
}

RcppExport SEXP pkg_fun(SEXP dataSEXP) {
    return Rcpp::generated::pkg_fun(dataSEXP);
}

Or, just consider using e.g. an rcpp_ prefix (and maybe even a _gen_ postfix)

@klmr
Copy link
Author

klmr commented Oct 30, 2015

In this particular case I think none of these options are technically necessary since it’s an isolated compilation unit and scope (except that it needs to avoid clashing with function parameters, of course). In general, though, the idea of using a prefix is to avoid name clashes with global macros — even if that shouldn’t happen here it’s probably a good policy.

@coatless
Copy link
Contributor

coatless commented Aug 1, 2016

@eddelbuettel

While I'm in the Attributes.cpp file for #526, would it be okay to change the __ prefix to rcpp_ and attach a suffix _gen_ per @kevinushey 's suggestion?

(Tagging #506)

@dcdillon
Copy link
Contributor

dcdillon commented Aug 2, 2016

Yes. And change every include guard to single underscores while you're at it.

@dcdillon
Copy link
Contributor

dcdillon commented Aug 2, 2016

Or better yet #pragma once

@jjallaire
Copy link
Member

I don't know #pragma once is going to work on every compiler we end up
getting built with. It's also not actually part of the standard (not sure
if CRAN will have a problem with that or not). There are also some
subtleties you need to be aware of related to how files are uniquely
identified (https://en.wikipedia.org/wiki/Pragma_once). I'd suggest erring
on the side of conservatism and leaving the include guards alone.

On Mon, Aug 1, 2016 at 8:15 PM, Daniel C. Dillon notifications@github.com
wrote:

Or better yet #pragma once


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#387 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAGXx40Id6Idgl4mwc6RZxb5dUpLOCNbks5qbowjgaJpZM4GZGlo
.

@coatless
Copy link
Contributor

coatless commented Aug 2, 2016

@jjallaire took the words from my mouth as I was transcribing them.

I'm planning on also modifying:

std::string CppPackageIncludeGenerator::getHeaderGuard() const {
     return "__" + packageCpp() + "_h__";
}

@dcdillon
Copy link
Contributor

dcdillon commented Aug 2, 2016

Heh. Well single underscores and an RCPP prefix would be best

@dcdillon
Copy link
Contributor

dcdillon commented Aug 2, 2016

At any rate a good rule of thumb I have found for include guards is PACKAGE_NAMESPACE_NAMESPACE_NAMESPACE_FILE_H

@kevinushey
Copy link
Contributor

kevinushey commented Aug 2, 2016

Technically, all identifiers even containing two underscores are considered reserved by the C++ standard, so we shouldn't use those either. From http://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier:

17.4.3.2.1 Global names [lib.global.names]

Certain sets of names and function signatures are always reserved to the implementation:

Each name that contains a double underscore (_ _) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.165
165) Such names are also reserved in namespace ::std (17.4.3.1).

Isn't C++ fun? :) But let's not bikeshed ourselves into oblivion...

@eddelbuettel
Copy link
Member

In favour of

I'd suggest erring on the side of conservatism and leaving the include guards alone.

as there is no identified problem with them.

Ditto against

Or better yet #pragma once

as I don't buy into change for change's sake.

@dcdillon
Copy link
Contributor

dcdillon commented Aug 2, 2016

I can tell you that #pragma once, while not technically portable, is pretty much industry standard. That being said, industry doesn't use joeschickenandcppcompiler because it has the a license that permits use while ice fishing with natives of the tropics.

@coatless
Copy link
Contributor

coatless commented Aug 2, 2016

So, to conclude, go with the prefix of:

RCPP_            # option 1
RCPP_CORE_ # option 2

Skip the suffix/postfix of _gen_?

Make sure there are no __ or _[A-Z].

@eddelbuettel
Copy link
Member

I like option 1 and it is machine generated why not add _gen_ as well. No strong feelings either way.

@coatless
Copy link
Contributor

coatless commented Aug 2, 2016

For posterity the invalid identifier are.... (drum roll...)

Variables

  • __result => RCPP_result_gen_
  • __rngScope =>RCPP_rngScope_gen_
  • __isInterrupt => RCPP_isInterrupt_gen_
  • __isError => RCPP_isError_gen_
  • __msgSEXP => RCPP_msgSEXP_gen_

Functions

From:

std::string CppPackageIncludeGenerator::getHeaderGuard() const {
    return "__" + packageCpp() + "_h__";
}

To:

std::string CppPackageIncludeGenerator::getHeaderGuard() const {
    return "RCPP_" + packageCpp() + "_h_gen_";
}

From:

std::string CppExportsIncludeGenerator::getHeaderGuard() const {
    return "__" + packageCpp() + "_RcppExports_h__";
}

To:

std::string CppExportsIncludeGenerator::getHeaderGuard() const {
    return "RCPP_" + packageCpp() + "_RcppExports_h_gen_";
}

@eddelbuettel
Copy link
Member

Is it just me or does anybody else think that

Variables

  • __result => rcpp_result_gen_
  • __rngScope =>rcpp_rngScope_gen_
  • __isInterrupt => rcpp_isInterrupt_gen_
  • __isError => rcpp_isError_gen_
  • __msgSEXP => rcpp_msgSEXP_gen_

might be nicer?

@jjallaire
Copy link
Member

Yes, I'd agree that lower case is nicer here (and without the trailing
underscore).

On Tue, Aug 2, 2016 at 6:19 AM, Dirk Eddelbuettel notifications@github.com
wrote:

Is it just me or does anybody else think that
Variables

  • _result => rcpp_result_gen
  • _rngScope =>rcpp_rngScope_gen
  • _isInterrupt => rcpp_isInterrupt_gen
  • _isError => rcpp_isError_gen
  • _msgSEXP => rcpp_msgSEXP_gen

might be nicer?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#387 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAGXxzQHUfuyl9akf92ZQUwSAg7JrSxMks5qbxmwgaJpZM4GZGlo
.

eddelbuettel added a commit that referenced this issue Aug 3, 2016
Auto generation Warnings & Invalid C++ Identifiers (Closes #526 and #387)
@coatless
Copy link
Contributor

coatless commented Aug 3, 2016

@eddelbuettel : Please close this as it was apart of PR #528 .

It looks like automatic only works on 1 issue id at a time?

(Tagging #506).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants