[ENH] normalize build path #5949

Open
bmwiedemann opened this issue Jan 16, 2024 · 7 comments

@bmwiedemann

bmwiedemann commented Jan 16, 2024

Is your feature request related to a problem? Please describe.

While working on reproducible builds for openSUSE, I found that
our python-frozenlist and python-yarl packages vary in every build, because
https://github.com/aio-libs/yarl/blob/8ba2714/packaging/pep517_backend/_backend.py#L199
passes a random tmp path to Cython-3.0.8
and Cython embeds that in the produced .so files.

Describe the solution you'd like.

Cython could use gcc's -ffile-prefix-map= option (where available) to normalize these path-names.
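
As a rough illustration of what such a mapping could look like from a build script, here is a minimal sketch that appends the flag to CFLAGS. The helper names (file_prefix_map_flags, extend_cflags) are made up for this example and are not part of Cython or setuptools. Note that this option rewrites paths in debug info and in __FILE__ expansions; it would not, by itself, change string literals that a code generator has already written into the C source.

```python
import os


def file_prefix_map_flags(build_dir, replacement="."):
    """Build the compiler flag that maps build_dir to a fixed replacement.

    Hypothetical helper: only the -ffile-prefix-map=OLD=NEW syntax itself
    comes from the compiler; everything else is illustrative.
    """
    build_dir = os.path.abspath(build_dir)
    return [f"-ffile-prefix-map={build_dir}={replacement}"]


def extend_cflags(environ, build_dir):
    """Return a copy of environ with the prefix-map flag appended to CFLAGS."""
    env = dict(environ)
    flags = env.get("CFLAGS", "").split()
    flags += file_prefix_map_flags(build_dir)
    env["CFLAGS"] = " ".join(flags)
    return env


if __name__ == "__main__":
    # Example: normalize a randomly named temporary build directory.
    print(extend_cflags({"CFLAGS": "-O2"}, "/tmp/build-abc123")["CFLAGS"])
```

The resulting environment could then be passed to whatever process invokes the compiler, assuming the compiler in use supports the option.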

Describe alternatives you've considered.

Cython could omit path names from output objects altogether, but debuginfo might still be affected by random path names.

Additional context

No response

@webknjaz
Contributor

webknjaz commented Jan 16, 2024

> passes a random tmp path to Cython-3.0.8

Clarification: it doesn't pass the path but creates a temporary directory, copies the project there and does chdir before invoking Cython.
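
The pattern described here can be sketched roughly as follows. This is not the yarl backend's actual code; the function name build_in_temp_copy and the "build-" prefix are assumptions for illustration. It shows why the absolute working directory, and therefore any absolute source path the compiler sees, differs on every run.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def build_in_temp_copy(project_dir):
    """Copy project_dir into a fresh temporary directory and chdir into it.

    Hypothetical sketch of the "copy, then chdir" pattern; the temporary
    directory name is random, so the yielded path changes on each call.
    """
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory(prefix="build-") as tmp:
        work_dir = Path(tmp) / "src"
        shutil.copytree(project_dir, work_dir)
        os.chdir(work_dir)
        try:
            yield work_dir.resolve()
        finally:
            # Leave the directory before it is deleted.
            os.chdir(original_cwd)
```

Any tool run inside the with block that records the absolute path of its input files will pick up the random directory name.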

@webknjaz
Contributor

@bmwiedemann could you replace master with a commit hash in the link so it doesn't break if the file is refactored?

@juliangilbey

I think it's a little subtler than that; somewhere in the Cython/Compiler directory, there is code which creates the names of variables in the generated C++ code. Somehow, the absolute filename path is embedded in some of the created variables. A solution to this problem would be to have something like an environment variable CYTHON_PROJECT_ROOT, and if this is set to a path, this part of the generated name would be changed to projectroot, or perhaps there would also have to be a second environment variable CYTHON_PROJECT_NAME, and then the root path would instead be changed to, say, <project>_root or root_<project>. But I don't know exactly where the pathname is picked up for naming the variables, so I can't propose a PR at this point.
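
To make the proposal concrete, a minimal sketch of the substitution might look like the following. Both environment variables are hypothetical (Cython does not read them as far as this thread establishes), and the function name normalize_source_path is invented for the example. The sketch uses the root_<project> form when a project name is given and projectroot otherwise.

```python
import os
from pathlib import PurePosixPath


def normalize_source_path(path, environ=os.environ):
    """Replace the project root prefix of path with a fixed placeholder.

    CYTHON_PROJECT_ROOT and CYTHON_PROJECT_NAME are hypothetical variables
    taken from the proposal above, not existing Cython settings.
    """
    root = environ.get("CYTHON_PROJECT_ROOT")
    if not root:
        return path  # nothing configured: keep the path unchanged
    name = environ.get("CYTHON_PROJECT_NAME")
    placeholder = f"root_{name}" if name else "projectroot"
    try:
        relative = PurePosixPath(path).relative_to(PurePosixPath(root))
    except ValueError:
        return path  # path lies outside the project root
    return str(PurePosixPath(placeholder) / relative)
```

With CYTHON_PROJECT_ROOT set to the temporary build directory, every build would then see the same normalized path regardless of where the copy was placed.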

@juliangilbey

A little more on this. Looking at the results of compiling RapidFuzz, I see the following:

  • The line include "fuzz_cpp.pyx" in fuzz_cpp_avx2.cxx gives rise to these lines of code in the resulting .cxx file:
static const char __pyx_k_home_jdg_debian_spyder_packages[] = "/home/jdg/debian/spyder-packages/rapidfuzz/build-area/rapidfuzz-3.6.2+ds/src/rapidfuzz/fuzz_cpp.pyx";
  PyObject *__pyx_kp_s_home_jdg_debian_spyder_packages;
  Py_CLEAR(clear_module_state->__pyx_kp_s_home_jdg_debian_spyder_packages);
  Py_VISIT(traverse_module_state->__pyx_kp_s_home_jdg_debian_spyder_packages);
#define __pyx_kp_s_home_jdg_debian_spyder_packages __pyx_mstate_global->__pyx_kp_s_home_jdg_debian_spyder_packages
...

and several more.

fuzz_cpp.pyx itself, without an include line, gives rise to very similar lines.

So there are two places where the path is embedded in the resulting C++ files: it is in the variable names, and it is in the value of the constant string __pyx_k_<truncated_filepath>[].
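
The variable names above look as if they were derived from the first characters of the string value. The following sketch reproduces that pattern; it is inferred from the output shown here, not copied from Cython's source, and the function name and the 32-character limit are assumptions based on the observed names.

```python
import re


def sketch_const_name(value, max_len=32):
    """Approximate how an identifier suffix could be derived from a string.

    Assumption: runs of non-identifier characters become "_", the result
    is truncated to max_len characters, and surrounding "_" are stripped.
    """
    cleaned = re.sub(r"[^a-zA-Z0-9_]+", "_", value)
    return cleaned[:max_len].strip("_")


absolute = ("/home/jdg/debian/spyder-packages/rapidfuzz/build-area/"
            "rapidfuzz-3.6.2+ds/src/rapidfuzz/fuzz_cpp.pyx")
relative = "src/rapidfuzz/fuzz_cpp.pyx"

print("__pyx_k_" + sketch_const_name(absolute))
print("__pyx_k_" + sketch_const_name(relative))
```

Under these assumptions the absolute path yields a name that depends on the build location, while a path relative to the project root would yield the same name in every build.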

@webknjaz
Contributor

webknjaz commented Mar 6, 2024

@juliangilbey is your suggestion that Cython users should have a way of passing such a path, or that Cython itself should handle it?

@juliangilbey

Hi @webknjaz,
My thought is that it could be Cython users, but if Cython itself can do it, even better. I don't understand the inner workings of Cython well enough to understand why there is a need to embed the full path in the generated .cxx files, and for variables to be named according to the path prefix (first 32 characters). If that can be avoided, it is presumably far better; it also prevents personal data leakage if someone builds a Cython-based package and then releases it without realising that the build paths are hard-coded in the resulting object files.
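
One way to check whether a built extension leaks a build path is to scan the binary for strings starting with a known prefix. The sketch below is a simplified stand-in for tools such as strings; the function name find_embedded_paths is invented for this example, and it assumes the paths are stored as NUL-terminated byte strings.

```python
def find_embedded_paths(data: bytes, prefix: bytes):
    """Return the NUL-terminated strings in data that start with prefix."""
    found = []
    start = data.find(prefix)
    while start != -1:
        end = data.find(b"\x00", start)
        if end == -1:
            end = len(data)  # unterminated string at end of data
        found.append(data[start:end].decode("utf-8", "replace"))
        start = data.find(prefix, end)
    return found
```

For a real check, the bytes could come from something like Path("module.so").read_bytes(), with the prefix set to the build or home directory of interest.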

@webknjaz
Contributor

webknjaz commented Mar 6, 2024

I think parts of this are used for tracing (which is how coverage can be measured, for example). But I don't know what prevents Cython from using relative paths for that.
