
[LLVM] Support CodeGenBlob for large >2GB models#10882

Merged
FrozenGene merged 1 commit into apache:main from apivovarov:large_data_apache
Apr 5, 2022
Conversation

@apivovarov
Contributor

@apivovarov apivovarov commented Apr 2, 2022

If large constant data (>2GB) is placed in the default `.rodata` section, linking it into a shared library fails with `relocation truncated to fit: R_X86_64_PC32`.
The issue exists on the Linux x86_64 platform.
GCC puts large constant data into the `.lrodata` section when the `-mcmodel=medium` parameter is used, but LLVM ignores that parameter.
The workaround is to explicitly place large constant data in the `.lrodata` section.
Let's put constant data larger than 1GB into the `.lrodata` section.

Model compilation and execution were tested with a large MXNet model compiled using TVM-TensorRT:
gen-large-mxnet-model-py

compile-mxnet-tvm-trt-py

The serialized TensorRT TVM Module was saved to a `model.so` file (its size is greater than 2GB).

2,361,256,184  model.so

Copy link
Member

@junrushao junrushao left a comment


Thanks @apivovarov for this PR. LGTM!

@FrozenGene
Copy link
Member

Could we use `setCodeModel` to solve this issue? @apivovarov

@apivovarov
Copy link
Contributor Author

apivovarov commented Apr 3, 2022

GCC has the following x86 options related to the 2GB limit issue on the x86_64 Linux platform:

-mlarge-data-threshold=threshold
When -mcmodel=medium is specified, data objects larger than threshold are placed in the large data section. This value must be the same across all objects linked into the binary, and defaults to 65535.

-mcmodel=medium
Generate code for the medium model: the program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or BSS sections and can be located above 2GB. Programs can be statically or dynamically linked.

I checked the difference in the object (`.o`) and asm (`.s`) files generated by GCC. The diff is in the section name used for read-only data: `-mcmodel=small` uses `.rodata`, while `-mcmodel=medium` uses `.lrodata` for predefined const data larger than 64K.

LLVM/clang also has the `-mcmodel` param, but it does not have a `-mlarge-data-threshold` param.
I tried to use the `-mcmodel=medium` parameter with llvm/clang on the x86_64 platform; the generated object, asm, and LLVM IR files are the same for `-mcmodel=small` and `-mcmodel=medium`. This is why linking the object file (`.o`) into a `.so` fails with the error `relocation truncated to fit: R_X86_64_PC32`.

LLVM/clang mostly uses the `mcmodel` parameter on the RISC-V and XCore architectures.
The `mcmodel` parameter is ignored on the x86 platform.

The workaround for clang and LLVM on x86 is to explicitly set the section name for large predefined const arrays. For example:

C/C++:

__attribute__((section(".lrodata")))
extern const long long int B100[131072];
const long long int B100[131072] = {...};

LLVM API:

auto* B100 = new llvm::GlobalVariable(...);
B100->setSection(".lrodata");

@apivovarov apivovarov force-pushed the large_data_apache branch 2 times, most recently from 1511536 to 25859ec on April 4, 2022 18:39
@FrozenGene FrozenGene merged commit ceed331 into apache:main Apr 5, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
facebook-github-bot pushed a commit to facebookincubator/AITemplate that referenced this pull request Mar 31, 2023
Summary:
I tried to compile a test model having 2GB+ params.
A large (>2GB) `constants.obj` was generated successfully, but `model.so` creation failed (Linux x86_64 platform).

Currently we convert `constants.bin` to `constants.obj` using
```
ld -r -b binary -o constants.obj constants.bin
```

`objdump` shows that the `_binary_constants_bin` array was placed in the `.data` section.
The `.data` and `.rodata` sections cannot allocate more than 2GB.

To solve the issue with the 2GB limit we can put the `_binary_constants_bin` array into the `.lrodata` read-only section, which does not have the 2GB limit.
We can do it by using
```
objcopy --rename-section .data=.lrodata,alloc,load,readonly,data,contents constants.obj constants.obj
```
to rename the `.data` section in the `constants.obj` file to `.lrodata`.
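The two steps above can be exercised end to end on a tiny stand-in blob (a sketch; the `hello` payload is a placeholder for the real multi-gigabyte `constants.bin`):

```shell
# Sketch: embed a blob as an object file, then rename its section from
# writable .data to read-only .lrodata.
printf 'hello' > constants.bin
ld -r -b binary -o constants.obj constants.bin
objcopy --rename-section .data=.lrodata,alloc,load,readonly,data,contents \
        constants.obj constants.obj
objdump -h constants.obj | grep lrodata   # now CONTENTS, ALLOC, LOAD, READONLY
nm constants.obj                          # _binary_constants_bin_{start,end,size}
```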

Before:
```
$ objdump -x constants_old.obj

architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         8b8577e0  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l    d  .data	0000000000000000 .data
000000008b8577e0 g       *ABS*	0000000000000000 _binary_constants_bin_size
000000008b8577e0 g       .data	0000000000000000 _binary_constants_bin_end
0000000000000000 g       .data	0000000000000000 _binary_constants_bin_start
```

After:
```
$ objdump -x constants.obj

architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .lrodata      8b8577e0  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
SYMBOL TABLE:
0000000000000000 l    d  .lrodata	0000000000000000 .lrodata
000000008b8577e0 g       *ABS*	0000000000000000 _binary_constants_bin_size
000000008b8577e0 g       .lrodata	0000000000000000 _binary_constants_bin_end
0000000000000000 g       .lrodata	0000000000000000 _binary_constants_bin_start
```

A large (>2GB) `model.so`:
```
-rwxrwxr-x 1 ubuntu ubuntu 2.2G Mar 30 06:24 my_model.so

```

Related Links:
- [Embedding Binary Blobs With GCC](https://www.burtonini.com/blog/2007/07/13/embedding-binary-blobs-with-gcc/)
- [TVM PR - [LLVM] Support CodeGenBlob for large >2GB models](apache/tvm#10882)

Pull Request resolved: #520

Reviewed By: alexanderguzhva

Differential Revision: D44533582

Pulled By: chenyang78

fbshipit-source-id: 75cc9d07bacd1a74124dafd21a9d64101f8cb96d
@wanglei91
Copy link

wanglei91 commented May 6, 2023

@apivovarov Thanks! I used this patch to test the SD UNet, and it indeed supports large (>2GB) models (the mold linker, instead of ld, can also support them). But both this patch and the mold linker still hit relocation truncation above roughly 3.1GB (in my case this patch produced a 3.1GB unet.so and saved more tensors than mold did). I don't know if somebody else has also met this issue...

Print when saving:

```
const tensor 605
header: 15951258332257624383
cpu_dev.device_type: 1
tensor->ndim: 3
shape0: 16
shape1: 4096
shape2: 77

const tensor 606
header: 15951258332257624383
cpu_dev.device_type: 1
tensor->ndim: 3
shape0: 640
shape1: 1
shape2: 1
```

Print when loading:

```
const tensor 605
header: 15951258332257624383
shape0: 16
shape1: 4096
shape2: 77

const tensor 606
header: 160445370461692650
include/tvm/runtime/ndarray.h", line 512
TVMError: Check failed: (header == kTVMNDArrayMagic) is false: Invalid DLTensor file format
```

@tqchen
Copy link
Member

tqchen commented May 6, 2023

@wanglei91 we recommend using lift transform params for large parameters, as in https://github.com/mlc-ai/web-stable-diffusion.

cc @vinx13 @psrivas2 , this is also useful information

@wanglei91
Copy link

@tqchen Got it, much appreciated for the reply!

Ivorchu pushed a commit to Ivorchu/AITemplate-NM-Pruning-Archive that referenced this pull request Jul 30, 2025