
[LLVM] Support CodeGenBlob for large >2GB models#10882

Merged
FrozenGene merged 1 commit into apache:main from apivovarov:large_data_apache
Apr 5, 2022
Conversation

@apivovarov
Contributor

@apivovarov apivovarov commented Apr 2, 2022

If large constant data (>2GB) is placed in the default `.rodata` section, linking it into a shared library fails with `relocation truncated to fit: R_X86_64_PC32`.
The issue exists on the Linux x86_64 platform.
GCC puts large constant data into the `.lrodata` section when the `-mcmodel=medium` parameter is used, but LLVM ignores that parameter.
The workaround is to explicitly place large constant data in the `.lrodata` section.
Let's put constant data larger than 1GB into the `.lrodata` section.

Model compilation and execution were tested with a large MXNet model compiled using TVM-TensorRT:
gen-large-mxnet-model-py

compile-mxnet-tvm-trt-py

The serialized TensorRT TVM Module was saved to a `model.so` file (its size is greater than 2GB).

2,361,256,184  model.so

Copy link
Member

@junrushao junrushao left a comment


Thanks @apivovarov for this PR. LGTM!

@FrozenGene
Copy link
Member

Could we use `setCodeModel` to solve this issue? @apivovarov

@apivovarov
Copy link
Contributor Author

apivovarov commented Apr 3, 2022

GCC has the following x86 options related to the 2GB limit issue on the x86_64 Linux platform:

-mlarge-data-threshold=threshold
When -mcmodel=medium is specified, data objects larger than threshold are placed in the large data section. This value must be the same across all objects linked into the binary, and defaults to 65535.

-mcmodel=medium
Generate code for the medium model: the program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or BSS sections and can be located above 2GB. Programs can be statically or dynamically linked.

I checked the difference in the object (`.o`) and asm (`.s`) files generated by GCC. The diff is in the section name used for read-only data: `-mcmodel=small` uses `.rodata`, while `-mcmodel=medium` uses `.lrodata` for predefined const data larger than 64K.

LLVM/clang also has the `-mcmodel` param, but it does not have a `-mlarge-data-threshold` param.
I tried to use the `-mcmodel=medium` parameter with llvm/clang on the x86_64 platform; the generated object, asm, and LLVM IR files are the same for `-mcmodel=small` and `-mcmodel=medium`. This is why linking the object file (`.o`) into a `.so` fails with the error `relocation truncated to fit: R_X86_64_PC32`.

LLVM/clang mostly uses the `mcmodel` parameter on the RISC-V and XCore architectures.
The `mcmodel` parameter is ignored on the x86 platform.

The workaround for clang and LLVM on x86 is to explicitly set the section name for large predefined const arrays. For example:

C/C++:

__attribute__((section(".lrodata")))
extern const long long int B100[131072];
const long long int B100[131072] = {...};

LLVM API:

auto* B100 = new llvm::GlobalVariable(...);
B100->setSection(".lrodata");

@apivovarov apivovarov force-pushed the large_data_apache branch 2 times, most recently from 1511536 to 25859ec on April 4, 2022 18:39
@FrozenGene FrozenGene merged commit ceed331 into apache:main Apr 5, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
facebook-github-bot pushed a commit to facebookincubator/AITemplate that referenced this pull request Mar 31, 2023
Summary:
I tried to compile a test model having 2GB+ params.
A large (>2GB) `constants.obj` was generated successfully, but `model.so` creation failed (Linux x86_64 platform).

Currently we convert `constants.bin` to `constants.obj` using
```
ld -r -b binary -o constants.obj constants.bin
```

`objdump` shows that the `_binary_constants_bin` array was placed in the `.data` section.
The `.data` and `.rodata` sections cannot allocate more than 2GB.

To solve the issue with the 2GB limit we can put the `_binary_constants_bin` array into the `.lrodata` read-only section, which does not have the 2GB limit.
We can do it by using
```
objcopy --rename-section .data=.lrodata,alloc,load,readonly,data,contents constants.obj constants.obj
```
to rename the `.data` section in the `constants.obj` file to `.lrodata`.
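The two steps above can be exercised end to end on a tiny stand-in blob (a sketch; the `hello` payload is a placeholder for the real multi-gigabyte `constants.bin`):

```shell
# Sketch: embed a blob as an object file, then rename its section from
# writable .data to read-only .lrodata.
printf 'hello' > constants.bin
ld -r -b binary -o constants.obj constants.bin
objcopy --rename-section .data=.lrodata,alloc,load,readonly,data,contents \
        constants.obj constants.obj
objdump -h constants.obj | grep lrodata   # now CONTENTS, ALLOC, LOAD, READONLY
nm constants.obj                          # _binary_constants_bin_{start,end,size}
```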

Before:
```
$ objdump -x constants_old.obj

architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         8b8577e0  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l    d  .data	0000000000000000 .data
000000008b8577e0 g       *ABS*	0000000000000000 _binary_constants_bin_size
000000008b8577e0 g       .data	0000000000000000 _binary_constants_bin_end
0000000000000000 g       .data	0000000000000000 _binary_constants_bin_start
```

After:
```
$ objdump -x constants.obj

architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .lrodata      8b8577e0  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
SYMBOL TABLE:
0000000000000000 l    d  .lrodata	0000000000000000 .lrodata
000000008b8577e0 g       *ABS*	0000000000000000 _binary_constants_bin_size
000000008b8577e0 g       .lrodata	0000000000000000 _binary_constants_bin_end
0000000000000000 g       .lrodata	0000000000000000 _binary_constants_bin_start
```

A large (>2GB) `model.so`:
```
-rwxrwxr-x 1 ubuntu ubuntu 2.2G Mar 30 06:24 my_model.so

```

Related Links:
- [Embedding Binary Blobs With GCC](https://www.burtonini.com/blog/2007/07/13/embedding-binary-blobs-with-gcc/)
- [TVM PR - [LLVM] Support CodeGenBlob for large >2GB models](apache/tvm#10882)

Pull Request resolved: #520

Reviewed By: alexanderguzhva

Differential Revision: D44533582

Pulled By: chenyang78

fbshipit-source-id: 75cc9d07bacd1a74124dafd21a9d64101f8cb96d
@wanglei91
Copy link

wanglei91 commented May 6, 2023

@apivovarov Thanks! I used this patch to test the SD UNet, and it indeed supports large (>2GB) models (the mold linker, instead of ld, can also support them). But both this patch and the mold linker still hit relocation truncation above roughly 3.1GB (in my case this patch produced a 3.1GB unet.so and saved more tensors than mold did). I don't know if somebody else has also met this issue...

Print when saving:

```
const tensor 605
header: 15951258332257624383
cpu_dev.device_type: 1
tensor->ndim: 3
shape0: 16
shape1: 4096
shape2: 77

const tensor 606
header: 15951258332257624383
cpu_dev.device_type: 1
tensor->ndim: 3
shape0: 640
shape1: 1
shape2: 1
```

Print when loading:

```
const tensor 605
header: 15951258332257624383
shape0: 16
shape1: 4096
shape2: 77

const tensor 606
header: 160445370461692650
include/tvm/runtime/ndarray.h", line 512
TVMError: Check failed: (header == kTVMNDArrayMagic) is false: Invalid DLTensor file format
```

@tqchen
Copy link
Member

tqchen commented May 6, 2023

@wanglei91 we recommend using lift transform params for large parameters, as in https://github.com/mlc-ai/web-stable-diffusion.

cc @vinx13 @psrivas2 , this is also useful information

@wanglei91
Copy link

@tqchen Got it, much appreciated for the reply!

Ivorchu pushed a commit to Ivorchu/AITemplate-NM-Pruning-Archive that referenced this pull request Jul 30, 2025