
Use lazy tensor #1427


Draft: wants to merge 51 commits into main

Conversation

justinchuby commented Apr 27, 2025

  • Use LazyTensor to avoid having to store intermediate weights to disk.
  • Load the model in its native dtype with torch_dtype="auto" to avoid cast overhead when loading the model.
  • End result: 6.7 GB of peak memory usage (a 75% reduction) without needing temporary disk space, and a 28% reduction in export time.
  • Memory needed for a model is (size of the weights + max(size of an individual weight) * 3).
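The idea above can be illustrated with a minimal sketch (the class and function names here are illustrative, not the PR's actual API): a lazy tensor holds a loader callback instead of the weight bytes, so each weight is materialized only at serialization time and released before the next one, bounding the extra peak memory by the largest single weight rather than the whole model.

```python
from dataclasses import dataclass
from typing import Callable, Dict
import numpy as np

@dataclass
class LazyTensor:
    """Defers materialization: stores a loader, not the weight data."""
    shape: tuple
    dtype: np.dtype
    load: Callable[[], np.ndarray]  # invoked only when the bytes are needed

    def numpy(self) -> np.ndarray:
        return self.load()

def serialize(tensors: Dict[str, LazyTensor], write: Callable[[str, bytes], None]) -> None:
    # Materialize one tensor at a time; peak extra memory stays near
    # max(size of an individual weight), not the sum of all weights.
    for name, t in tensors.items():
        arr = t.numpy()             # materialize this weight only
        write(name, arr.tobytes())  # hand the bytes to the writer
        del arr                     # release before loading the next one
```

A writer that appends to a file (or an in-memory buffer) can be passed as `write`; nothing is ever duplicated for the full model at once.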


With a bigger model (google/gemma-3-12b-it) (model size 22.7 GB):

Time: 0:14:38 -> 0:01:57 (87% reduction). Memory: 190.5 GB -> 35.3 GB (81% reduction).

(left: current main, right: with IR and lazy tensor)

[screenshot: current main]

[screenshot: with IR and lazy tensor]

Comparison with the gguf conversion tool:

