Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary prefix in c++ strings #1674

Closed
dolfim opened this issue Jan 18, 2017 · 17 comments
Closed

Binary prefix in c++ strings #1674

dolfim opened this issue Jan 18, 2017 · 17 comments
Labels
locked [bot] locked due to inactivity stale::closed [bot] closed after being marked as stale stale [bot] marked as stale due to inactivity

Comments

@dolfim
Copy link

dolfim commented Jan 18, 2017

I spent some time struggling with the binary prefix contained in c++ string objects. Here NULL characters are not used to terminate strings and can be actual content of a string.

The solution I found is:

#define MYPREFIX "/some/path/with/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla"
std::string p_wrong = MYPREFIX;
std::string p_correct = std::string(MYPREFIX).c_str();

In p_wrong the text is already placed in memory by the compiler and interpreted as a string of >255 characters. The correct version is casting the full text to a c string and calling the std::string constructor at runtime which is truncating correctly padding.

It is a pity that I had to patch the code for it (and the time invested in finding out the issue).

Is there another way for dealing with this issue?

@jakirkham
Copy link
Member

Here NULL characters are not used to terminate strings and can be actual content of a string.

I don't think that is true. 😕

Did you try enabling detect_binary_files_with_prefix in the recipe?

@dolfim
Copy link
Author

dolfim commented Feb 16, 2017

It is like that. It is actually possible to push NULL characters in c++ string, but they have to be added as chars.

#include <iostream>
#include <string>

int main() {
  std::string s = "abc\0def"; // this will contain only abc
  std::cout << s << std::endl;
  s.push_back('\0'); // now it contains also a NULL.
  std::cout << s << std::endl;
}

And yes, I already tried out the option. It is actually the binary replacement that created hard problems to debug, because NULLs are not trivial to see in the output ;-)

@msarahan
Copy link
Contributor

Sorry, I really don't understand how conda-build triggers this problem (not that it doesn't - just having a hard time understanding what a fix would be). Would it help if the prefix replacement didn't include the NULL character?

@mingwandroid
Copy link
Contributor

mingwandroid commented Feb 16, 2017

@dolfim: Perhaps you could write your code so that it does dynamic relocation? GetModuleFileNameA/W is the function(s) you'd most probably use to figure out the actual location (on Windows, for Linux, readlink("/proc/self/exe") and for macOS _NSGetExecutablePath())

@dolfim
Copy link
Author

dolfim commented Feb 16, 2017

Let me explain better what is going on. (I don't think that not having NULLs helps)

Some introduction:

  1. The std::string object contains the data and the length.
  2. Given the length, it will print everything it contains, also the NULLs.

Now, what happens with conda-build:

  1. The program is compiled with a long placeholder path.
  2. The compiler recognizes at compile time the length of the chars-array and set it as length inside the string object. The compiler sees the sequence "hello_placeholder" as type a static array chars[17] which has a fixed size.
  3. During the conda install process, the content of the binary gets replaced, but the length remains the same, i.e. something >255.
  4. At runtime the program will look at the length and print all the characters, even the NULLs.

In my example workaround I do the following:

  1. Return at runtime the content of the variable as chars-array. Since this is dynamic runtime it will be of type chars * (generic chars array, no fixed length).
  2. The assignment operator trigger the constructor of a new std::string. When building a string from a char array, the code loops through all the (c-style) characters until a NULL is found. --> This is the expected behavior of NULL padding.

I know it is very subtle as a problem, it took me some time to come up with this understanding...
@msarahan : is it a bit clearer now?

@dolfim
Copy link
Author

dolfim commented Feb 16, 2017

@mingwandroid : the function you propose isn't in the standard library, right?
The problem actually occurs on all platforms.

@mingwandroid
Copy link
Contributor

mingwandroid commented Feb 16, 2017

I'm proposing you set detect_binary_files_with_prefix: False in your recipe, then all the stuff that conda-build does and that conda install does doesn't happen.

Whatever folder you want relative to the prefix, you dynamically calculate based on those three functions. I'd be surprised if someone hasn't written a BSD or a Public Domain wrapper for getting the executable's (or shared library's) PATH using the methods I outlined above.

The functions I detailed are in %SYSTEMROOT%\System32\kernel32.dll, libc.so and /usr/lib/system/libdyld.dylib respectively, so in the lowest level system libraries of each platform, pretty much.

@dolfim
Copy link
Author

dolfim commented Feb 16, 2017

This is for sure a possibility, now I solved it already with the workaround I posted above.

Additionally, we also have a mechanism in place to provide the installation prefix via env variables which I set in the loading of the conda venv $PREFIX/etc/conda/activate.d/package

This is not anymore an issue for me, but it could for other users, therefore I didn't close it yet.

I leave it up to you to decide if to consider a fix, documenting the workaround, or do nothing.

@tovrstra
Copy link
Contributor

I'm running into this problem, when trying to make a conda recipe for cppcheck. See conda-forge/cppcheck-feedstock#8. Strings in the C++ standard library are indeed not null-terminated, which seems to be related to this issue. I'm not a developer of cppcheck, so I can't easily change the source code to avoid using std::string, e.g. in the following line:

https://github.com/danmar/cppcheck/blob/19e9e42dd7fc8ba327a2736cc091285d37faff80/lib/library.cpp#L99

I've only seen this problem with clangxx (on osx) and not with gxx. It seems that clangxx is compiling string literals directly into std::string objects, such that the constructor of std::string is not executed with an argument of type char[N]. That constructor would take care of the null characters. If that analysis is correct, clangxx has a bug, because according to the c++ standard:

" (unescaped_character|escaped_character)* " Narrow multibyte string literal. The type of an unprefixed string literal is const char[N], where N is the size of the string in code units of the execution narrow encoding, including the null terminator.

(See https://en.cppreference.com/w/cpp/language/string_literal) That effectively means that any string literal in C++ source code has to be a C-style null-terminated string, no matter how it is compiled.

@tovrstra
Copy link
Contributor

P.S. Is there a way to build everything on OSX with gcc instead of clang? That would be a simple solution.

@mingwandroid
Copy link
Contributor

It really would not be simple at all, trust me on that.

@tovrstra
Copy link
Contributor

ok, thanks. I'll open an issue on the cppcheck project then to suggest to support some form of dynamic relocation as you suggested above, which is in the end still the nicest solution.

@jakirkham
Copy link
Member

cc @SylvainCorlay @wolfv (in case this is relevant)

glormph added a commit to glormph/bioconda-recipes that referenced this issue Mar 11, 2020
BiocondaBot pushed a commit to bioconda/bioconda-recipes that referenced this issue Mar 11, 2020
Merge PR #20099, commits were: 
 * Patch percolator Globals.cpp to avoid null termination problem with the conda build/run process, see conda/conda-build#1674
 * Merge branch 'master' into perco_testmsgf
 * Add tandem2pin, msgf2pin tests to recipe, both failing
@mjuric
Copy link
Contributor

mjuric commented May 21, 2021

Just battled the same problem for a couple of hours, until I understood it sufficiently to know what to Google for. Just wanted to thank @dolfim for sharing the .c_str() hack. Incredibly helpful, thanks!

As the amount of C++ code (compiled w. modern compilers) grows, this issue may become more frequent. Is there a good place in the conda-build documentation to mention it? Or a way for the patcher to detect a potentially problematic binary and issue a warning?

@Zaharid
Copy link

Zaharid commented Jun 11, 2021

FWIW marking strings as volatile can help here. See #2524.

@github-actions
Copy link

Hi there, thank you for your contribution!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs.

If you would like this issue to remain open please:

  1. Verify that you can still reproduce the issue at hand
  2. Comment that the issue is still reproducible and include:
    - What OS and version you reproduced the issue on
    - What steps you followed to reproduce the issue

NOTE: If this issue was closed prematurely, please leave a comment.

Thanks!

@github-actions github-actions bot added the stale [bot] marked as stale due to inactivity label Jul 21, 2022
@github-actions github-actions bot added the stale::closed [bot] closed after being marked as stale label Aug 21, 2022
@jakirkham
Copy link
Member

Can someone please reopen this issue? AFAIK it remains unsolved

timsnyder added a commit to regro-cf-autotick-bot/verilator-feedstock that referenced this issue Mar 12, 2023
DEFENV_VERILATOR_ROOT needs to be used as a C-string for conda's install-time
relocation patching to work. see also: conda/conda-build#1674
@github-actions github-actions bot added the locked [bot] locked due to inactivity label Aug 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked [bot] locked due to inactivity stale::closed [bot] closed after being marked as stale stale [bot] marked as stale due to inactivity
Projects
None yet
Development

No branches or pull requests

7 participants