[LLVM] Support CodeGenBlob for large >2GB models #10882
FrozenGene merged 1 commit into apache:main
Conversation
junrushao left a comment
Thanks @apivovarov for this PR. LGTM!
GCC has x86 options related to the 2GB limit on the x86_64 Linux platform: `-mcmodel=medium` together with `-mlarge-data-threshold`. I checked the difference in the object (`.o`) and asm (`.s`) files generated by gcc; the diff is in the section name used for read-only data. `-mcmodel=small` uses `.rodata`, while `-mcmodel=medium` places large read-only data in `.lrodata`. LLVM/clang also has a `-mcmodel` option, but LLVM/clang mostly uses `.rodata` regardless. The workaround for clang and LLVM on x86 is to explicitly set the section name for the large const pre-defined arrays, for example with `__attribute__((section(".lrodata")))` in C/C++, or with `GlobalVariable::setSection(".lrodata")` via the LLVM API.
Summary:
I tried to compile a test model with more than 2GB of parameters.
A large (>2GB) `constants.obj` was generated successfully, but `model.so` creation failed (Linux x86_64 platform).
Currently we convert `constants.bin` to `constants.obj` using
```
ld -r -b binary -o constants.obj constants.bin
```
`objdump` shows that the `_binary_constants_bin` array was placed in the `.data` section.
The `.data` and `.rodata` sections cannot hold more than 2GB.
To work around the 2GB limit, we can put the `_binary_constants_bin` array into the `.lrodata` read-only section, which has no 2GB limit.
We can do it by using
```
objcopy --rename-section .data=.lrodata,alloc,load,readonly,data,contents constants.obj constants.obj
```
to rename `.data` section in `constants.obj` file to `.lrodata`.
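The rename can be sanity-checked end to end on a tiny dummy blob (file names are illustrative; GNU binutils assumed):

```shell
# Build a dummy blob, wrap it into an object file, rename its section,
# and confirm the section is now .lrodata.
printf 'demo-weights' > constants.bin
ld -r -b binary -o constants.obj constants.bin
objcopy --rename-section .data=.lrodata,alloc,load,readonly,data,contents \
    constants.obj constants.obj
objdump -h constants.obj | grep '\.lrodata'
```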
Before:
```
$ objdump -x constants_old.obj
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 8b8577e0 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
000000008b8577e0 g *ABS* 0000000000000000 _binary_constants_bin_size
000000008b8577e0 g .data 0000000000000000 _binary_constants_bin_end
0000000000000000 g .data 0000000000000000 _binary_constants_bin_start
```
After:
```
$ objdump -x constants.obj
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .lrodata 8b8577e0 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
SYMBOL TABLE:
0000000000000000 l d .lrodata 0000000000000000 .lrodata
000000008b8577e0 g *ABS* 0000000000000000 _binary_constants_bin_size
000000008b8577e0 g .lrodata 0000000000000000 _binary_constants_bin_end
0000000000000000 g .lrodata 0000000000000000 _binary_constants_bin_start
```
The large (>2GB) `model.so` now links successfully:
```
-rwxrwxr-x 1 ubuntu ubuntu 2.2G Mar 30 06:24 my_model.so
```
Related Links:
- [Embedding Binary Blobs With GCC](https://www.burtonini.com/blog/2007/07/13/embedding-binary-blobs-with-gcc/)
- [TVM PR - [LLVM] Support CodeGenBlob for large >2GB models](apache/tvm#10882)
Pull Request resolved: facebookincubator/AITemplate#520
Reviewed By: alexanderguzhva
Differential Revision: D44533582
Pulled By: chenyang78
fbshipit-source-id: 75cc9d07bacd1a74124dafd21a9d64101f8cb96d
@apivovarov Thanks, I used this patch to test the SD UNet, and it indeed supports large >2GB models (the mold linker instead of ld could also support this). But both this patch and the mold linker still truncate at more than 3.1GB (in my case, this patch produces a 3.1GB unet.so and saves more tensors than mold). I don't know if anybody else has also met this issue... print when save: const tensor 606 print when load: const tensor 606
@wanglei91 we recommend using lift transform params for large parameters, as in https://github.com/mlc-ai/web-stable-diffusion.
@tqchen Got it, much appreciated for the reply! |
If large const data (>2GB) is saved to the default `.rodata` section, then linking it into a shared library will fail with: relocation truncated to fit: R_X86_64_PC32. The issue exists on the Linux x86_64 platform.
GCC puts large const data into the `.lrodata` section if the `-mcmodel=medium` parameter is used, but LLVM ignores it.
The workaround is to explicitly put large const data into the `.lrodata` section. Let's put const data which is larger than 1GB into the `.lrodata` section.
Model compilation and execution was tested with a large MXNet model, which was compiled using TVM-TensorRT:
gen-large-mxnet-model-py
compile-mxnet-tvm-trt-py
The serialized TensorRT TVM Module was saved to a `model.so` file (size is greater than 2GB).