PyTorch MacOS x86 fail: section __TEXT/__const address out of range for architecture x86_64 when building NNPACK #119

Closed
kulinseth opened this issue May 24, 2022 · 13 comments

Comments

@kulinseth

The PyTorch macOS build with NNPACK fails with: section __TEXT/__const address out of range for architecture x86_64

We see this behavior after upgrading Xcode to the latest version, 13.3.1.
The difference between Xcode 13.2.1 and 13.3 is that the newer release adds more boundary checks to prevent out-of-bounds (OOB) reads.

The conv1x1.py.o object file has malformed load commands:
$ size -mlx conv1x1.py.o
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
	Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
	Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
	total 0x36f
total 0x36f
The __const section starts at 0x300 and ends at 0x380 (0x300 + 0x80), which exceeds the __TEXT segment size (0x36f).
This object file is generated manually using the third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py script. Can we regenerate the object file with the latest Xcode to make sure this bug is fixed and there is no OOB access?
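
The range check described above can be reproduced mechanically. The following is a minimal Python sketch, not part of NNPACK or PeachPy, that parses the text printed by `size -mlx` and reports any section whose end address lies past the end of its segment. The function name `find_out_of_range_sections` and the regular expressions are assumptions made for illustration; they are written against the output format quoted in this report only.

```python
import re

# Output of `size -mlx conv1x1.py.o` as quoted in the report above.
SIZE_OUTPUT = """\
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
total 0x36f
total 0x36f
"""

# Hypothetical patterns matching the two line shapes seen in the output above.
SEGMENT_RE = re.compile(r"Segment (\w+): (0x[0-9a-fA-F]+) \(vmaddr (0x[0-9a-fA-F]+)")
SECTION_RE = re.compile(r"Section \((\w+), (\w+)\): (0x[0-9a-fA-F]+) \(addr (0x[0-9a-fA-F]+)")


def find_out_of_range_sections(size_output):
    """Return (segment, section, section_end, segment_end) for each section
    whose end address is greater than the end address of its segment."""
    segment_ends = {}
    problems = []
    for line in size_output.splitlines():
        line = line.strip()
        m = SEGMENT_RE.match(line)
        if m:
            # Segment end = vmaddr + segment size.
            segment_ends[m.group(1)] = int(m.group(3), 16) + int(m.group(2), 16)
            continue
        m = SECTION_RE.match(line)
        if m:
            seg, sect = m.group(1), m.group(2)
            # Section end = addr + section size.
            end = int(m.group(4), 16) + int(m.group(3), 16)
            if seg in segment_ends and end > segment_ends[seg]:
                problems.append((seg, sect, end, segment_ends[seg]))
    return problems


for seg, sect, end, limit in find_out_of_range_sections(SIZE_OUTPUT):
    print(f"{seg}/{sect} ends at {end:#x}, past segment end {limit:#x}")
```

To check another object file, paste its `size -mlx` output into `SIZE_OUTPUT`. An empty result only means the sections fit inside the reported segment size; it does not by itself prove the object file is otherwise well formed.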

@dbl001

dbl001 commented May 25, 2022

I'm trying to build PeachPy but I am getting:

src_dir = os.path.abspath(self.distribution.package_dir[""])
KeyError: ''

#118

@dbl001

dbl001 commented May 25, 2022

It's working now ... ;-)
I ran

$ python -m peachpy.x86_64 -mabi=sysv -mimage-format=mach-o -o ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py

clang

% clang --version
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: x86_64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Is this correct?

% size -mlx ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o
Segment __TEXT: 0x380 (vmaddr 0x0 fileoff 288)
	Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
	Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
	total 0x36f
total 0x380
(base) davidlaxer@x86_64-apple-darwin13 pytorch % ls -l ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o
-rw-r--r--  1 davidlaxer  staff  1427 May 25 07:09 ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o

Trying to build PyTorch next.

@kulinseth
Author

Thanks @dbl001 for taking a look. Make sure the command line is the same as what is passed to clang when building PyTorch. You can get that from the build output in verbose mode.

@malfet

malfet commented May 25, 2022

I wonder if this was already fixed by f8ef1a3

@malfet

malfet commented May 25, 2022

@Maratyszcza do you mind merging the latest master into the pre-generated branch? Or should I just fork it and maintain it myself for PyTorch?

@Maratyszcza
Owner

@malfet Create a pull request, and I'll merge

@dbl001

dbl001 commented May 26, 2022

My PyTorch build failed again. What's -g4?

cd /Users/davidlaxer/pytorch/build/confu-deps/NNPACK && PYTHONPATH=/Users/davidlaxer/pytorch/third_party/python-six:/Users/davidlaxer/pytorch/third_party/python-peachpy /Users/davidlaxer/anaconda3/bin/python -m peachpy.x86_64 -mabi=sysv -g4 -mimage-format=mach-o -I/Users/davidlaxer/pytorch/third_party/NNPACK/src -I/Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma -I/Users/davidlaxer/pytorch/third_party/FP16/include -o /Users/davidlaxer/pytorch/build/confu-deps/NNPACK/src/x86_64-fma/blas/conv1x1.py.o /Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py
...
ld: in lib/libnnpack.a(conv1x1.py.o), section __TEXT/__const address out of range for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
...
 % size -mlx  /Users/davidlaxer/pytorch/build/confu-deps/NNPACK/src/x86_64-fma/blas/conv1x1.py.o
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
	Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
	Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
	total 0x36f
total 0x36f

find . -name 'peachpy*' -ls
230205417        0 drwxr-xr-x   21 davidlaxer       staff                 672 May 20 07:37 ./third_party/python-peachpy/peachpy
230205416       40 -rw-r--r--    1 davidlaxer       staff               19560 May 19 10:13 ./third_party/python-peachpy/logo/peachpy.png
230205498        8 -rw-r--r--    1 davidlaxer       staff                 864 May 19 10:13 ./third_party/python-peachpy/sphinx/peachpy.rst
230166126        0 drwxr-xr-x    5 davidlaxer       staff                 160 May 19 10:12 ./third_party/FP16/test/peachpy
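
Since the manual run and the build produced different segment sizes, it may help to see exactly how the two PeachPy invocations differ. The following is a small Python sketch that tokenizes both command lines from this thread and lists the option flags present in only one of them. The helper name `option_flags` is hypothetical, the interpreter path is shortened to `python`, and the sketch makes no claim about what `-g4` does or whether it causes the failure.

```python
import shlex

# Manual invocation quoted earlier in the thread.
MANUAL = (
    "python -m peachpy.x86_64 -mabi=sysv -mimage-format=mach-o "
    "-o ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o "
    "./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py"
)

# Invocation from the build log above (interpreter path shortened to "python").
BUILD = (
    "python -m peachpy.x86_64 -mabi=sysv -g4 -mimage-format=mach-o "
    "-I/Users/davidlaxer/pytorch/third_party/NNPACK/src "
    "-I/Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma "
    "-I/Users/davidlaxer/pytorch/third_party/FP16/include "
    "-o /Users/davidlaxer/pytorch/build/confu-deps/NNPACK/src/x86_64-fma/blas/conv1x1.py.o "
    "/Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py"
)


def option_flags(command):
    """Collect dash-prefixed tokens, ignoring -m (module) and -o (output path),
    whose values are expected to differ between the two runs."""
    return [tok for tok in shlex.split(command)
            if tok.startswith("-") and tok not in ("-m", "-o")]


manual_flags = option_flags(MANUAL)
build_flags = option_flags(BUILD)

# Keep the original order so the result is easy to compare with the log.
only_in_build = [f for f in build_flags if f not in manual_flags]
only_in_manual = [f for f in manual_flags if f not in build_flags]

print("only in build: ", only_in_build)
print("only in manual:", only_in_manual)
```

For these two commands the build adds `-g4` and three `-I` include paths, so a manual reproduction would need the same flags before its output can be compared with the failing conv1x1.py.o.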

@malfet

malfet commented May 26, 2022

@dbl001 have you updated the submodules? I landed the change about an hour ago, and that should have fixed it.

@malfet

malfet commented May 26, 2022

Here it is #120
I'm not really sure whether the PyTorch build system still relies on the pre-generated branch (or why this script cannot be run during the build process, but whatever).

@dbl001

dbl001 commented May 26, 2022

Is it still pinned to the old version in PyTorch?
E.g. #76094

@malfet

malfet commented May 26, 2022

I'm not sure I understand the question

@dbl001

dbl001 commented May 26, 2022 via email

@kulinseth
Author

I think this has been addressed.
