handle C++ raw string in a single token #7

Open
asmwarrior opened this issue Feb 19, 2024 · 3 comments

@asmwarrior

Hi, from the page: https://en.cppreference.com/w/cpp/language/string_literal

There are many kinds of C++ raw strings, and I think the lexer/preprocessor should handle each of them as a single token. Currently they are split into separate tokens, for example:

int U = 3;
const char32_t* s7 = U"GHIJKL";

In both lines above, the "U" is lexed as a token of its own, with the same token id in each case.
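To make the request concrete, here is a minimal sketch of how a lexer could decide that a prefix such as U belongs to a string literal rather than being an identifier. The prefix list follows the cppreference page linked above. The function name string_prefix_len is hypothetical and is not part of TPP.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of TPP): if `p` starts with a C++11
 * string-literal prefix that is immediately followed by a double quote,
 * return the length of that prefix; otherwise return 0. */
static size_t string_prefix_len(const char *p) {
    /* Longer prefixes come first, so "u8R" is tried before "u8" and "u". */
    static const char *const prefixes[] = {
        "u8R", "u8", "uR", "UR", "LR", "u", "U", "L", "R"
    };
    size_t i;
    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); ++i) {
        size_t n = strlen(prefixes[i]);
        /* The quote must follow the prefix directly, with no whitespace. */
        if (strncmp(p, prefixes[i], n) == 0 && p[n] == '"')
            return n;
    }
    return 0;
}
```

With this check, the U in `int U = 3;` stays an ordinary identifier, while the U in `U"GHIJKL"` can be merged with the string that follows it. A real lexer would also have to make sure that `p` sits at the start of an identifier, so that something like fooU"x" is not misread.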

Thanks.

@asmwarrior
Author

It looks like GCC handles raw strings in the preprocessor. For reference, see GCC bug 55971 – Preprocessor macros with C++11 raw string literals fail to compile

@GrieferAtWork
Owner

GrieferAtWork commented Feb 20, 2024

Yup. You're right. C++ raw strings (like R"(foo)") are a feature TPP doesn't support as of right now.
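For illustration, this is roughly what scanning such a literal involves. It is a sketch under assumptions, not TPP code: skip_raw_string is a hypothetical name, and the delimiter rules (at most 16 characters, no spaces, parentheses, backslashes, tabs or newlines) are taken from the C++ rules for raw string literals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of TPP): `p` points at the opening double
 * quote of a raw string, i.e. just past the R. Returns a pointer one past
 * the closing quote, or NULL if the literal is malformed or unterminated. */
static const char *skip_raw_string(const char *p) {
    const char *delim, *q;
    size_t dlen;
    if (*p != '"')
        return NULL;
    delim = ++p;
    /* Collect the optional delimiter between the quote and the '('. */
    while (*p && *p != '(') {
        if (*p == ')' || *p == '\\' || *p == ' ' ||
            *p == '\t' || *p == '\v' || *p == '\f' || *p == '\n')
            return NULL;
        ++p;
    }
    if (*p != '(')
        return NULL;
    dlen = (size_t)(p - delim);
    if (dlen > 16)
        return NULL;
    /* The literal ends at the first `)` + delimiter + `"`; nothing in
     * between is treated as an escape sequence. */
    for (q = p + 1; *q; ++q) {
        if (*q == ')' &&
            strncmp(q + 1, delim, dlen) == 0 &&
            q[1 + dlen] == '"')
            return q + dlen + 2;
    }
    return NULL;
}
```

The point of the sketch is that the usual string scanner cannot be reused: backslashes and double quotes inside the parentheses are plain characters, and the terminator depends on the delimiter chosen at the start of the literal.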

The only thing that comes close is TPP_CONFIG_RAW_STRING_LITERALS, but those aren't C++ raw strings (and I don't recommend using them instead: they're deemon raw strings, which work a bit differently; don't forget that tpp is for C and "C-like" languages).

If it's any consolation, I've also been planning for some time to add """ block string """ support (like you have in Java or Python), so I guess I'll just put all those C++11-style string literals onto my mental TODO list as well.

So: will be implemented eventually (but no promises as to when).

But: u"foo" isn't a "raw" string; that's a unicode string. If that's all you want, you should define a keyword DEF_K(u) and then handle it in your programming language's token processor as case KWD_u: if (*TPPLexer_Current->l_token.t_end == '"') { /* unicode string */ }, which essentially checks for a u keyword that is immediately followed by a double quote. I do something similar to implement template strings (local x = 10; local y = f"value of x is {x}";) in deemon.
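The check described above can be sketched in isolation like this. The struct below is only a stand-in so the example compiles on its own: the t_end field mirrors the name used in the snippet above, while t_begin and the function name is_prefixed_string are assumptions made for this illustration and are not claimed to match TPP's actual definitions.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in token type for this sketch only (not TPP's real structure). */
struct token {
    const char *t_begin; /* first character of the token text (assumed) */
    const char *t_end;   /* one past the last character of the token */
};

/* Returns 1 if `tok` spells exactly `kwd` and the very next source
 * character is a double quote, 0 otherwise. */
static int is_prefixed_string(const struct token *tok, const char *kwd) {
    size_t len = (size_t)(tok->t_end - tok->t_begin);
    return len == strlen(kwd) &&
           memcmp(tok->t_begin, kwd, len) == 0 &&
           *tok->t_end == '"';
}
```

Because the test looks at the character directly after the keyword, `u "foo"` (with a space) is not treated as a unicode string, which matches how C and C++ handle the prefix.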

@asmwarrior
Author

Hi, thanks for the detailed explanation.

My interest in learning some C preprocessor code is to improve the embedded parser (which fetches symbols from the source files) inside Code::Blocks.
There are not many free/open-source C preprocessors around. Clang can do the job, but its code base is too big and it is also slow. Some similar tools:

universal-ctags/ctags: A maintained ctags implementation

danmar/cppcheck: static analysis of C/C++ code, with its preprocessor danmar/simplecpp: C++ preprocessor

robertoraggi/cplusplus

A preprocessor is a very low-level tool that supplies a token stream to the higher-level parsers.
