t5-11b out of memory/FileNotFoundError #11

Closed
ViktorThink opened this issue May 1, 2021 · 5 comments

ViktorThink commented May 1, 2021

First of all, this seems like a great repo that I was super excited to find!

When testing with t5-small everything works correctly, but when trying with my custom t5-11b I get out-of-memory issues.

I was running this with a t5-11b as model:
onnx_model_paths = generate_onnx_representation("t5-11b",model=model)

And at first I got this error:

RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

So I simply added use_external_data_format=True to all three of the torch.onnx.export calls in onnx_exporter.py in fastT5.
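
For illustration, the change amounts to adding that flag to each export call, roughly like this (a sketch only: the wrapper module, example inputs, opset, and file name below are placeholders, not the exact fastT5 code):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholders for illustration -- substitute the real t5-11b checkpoint.
model = T5ForConditionalGeneration.from_pretrained("t5-11b")
tokenizer = T5Tokenizer.from_pretrained("t5-11b")
sample = tokenizer("translate English to German: hello", return_tensors="pt")

class EncoderWrapper(torch.nn.Module):
    """Thin wrapper so tracing sees a single tensor output
    (fastT5 does something similar; this is not its exact code)."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        return self.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]

model.eval()
torch.onnx.export(
    EncoderWrapper(model.encoder),
    (sample["input_ids"], sample["attention_mask"]),   # example inputs for tracing
    "t5-11b-encoder.onnx",
    export_params=True,
    opset_version=12,
    input_names=["input_ids", "attention_mask"],
    output_names=["hidden_states"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    # the added flag: weights are written as external data files next to the
    # .onnx file instead of being embedded in the protobuf, avoiding the 2 GB limit
    use_external_data_format=True,
)
```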

Then I can run onnx_model_paths = generate_onnx_representation(model_name,model=model) and get no error (the first time I posted this I got an error, but it turned out I only had 100 GB of disk space; with 200 GB it worked).

Then when running quant_model_paths = quantize(onnx_model_paths) I get the error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-7-3a782b6d5a25> in <module>
      8 
      9 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
---> 10 quant_model_paths = quantize(onnx_model_paths)
     11 
     12 # step 3. setup onnx runtime

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    273             activation_type=QuantType.QUInt8,
    274             weight_type=QuantType.QUInt8,
--> 275             optimize_model=False,
    276         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    277         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    266         op_types_to_quantize = list(IntegerOpsRegistry.keys())
    267 
--> 268     model = load_model(Path(model_input), optimize_model)
    269     quantizer = ONNXQuantizer(
    270         model,

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in load_model(model_path, optimize)
     51         return onnx_model.model
     52 
---> 53     return onnx.load(Path(model_path))
     54 
     55 

/opt/conda/lib/python3.7/site-packages/onnx/__init__.py in load_model(f, format, load_external_data)
    125         if model_filepath:
    126             base_dir = os.path.dirname(model_filepath)
--> 127             load_external_data_for_model(model, base_dir)
    128 
    129     return model

/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_model(model, base_dir)
     69     for tensor in _get_all_tensors(model):
     70         if uses_external_data(tensor):
---> 71             load_external_data_for_tensor(tensor, base_dir)
     72             # After loading raw_data from external_data, change the state of tensors
     73             tensor.data_location = TensorProto.DEFAULT

/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_tensor(tensor, base_dir)
     48     external_data_file_path = os.path.join(base_dir, file_location)
     49 
---> 50     with open(external_data_file_path, 'rb') as data_file:
     51 
     52         if info.offset:

FileNotFoundError: [Errno 2] No such file or directory: '/home/jupyter/encoder.embed_tokens.weight'

Has anyone successfully exported the t5-11b version, and does anyone know how to solve this?

Update:

I tried changing the working directory to /home/jupyter/models instead of /home/jupyter/, which seems to solve the FileNotFoundError. But then I run into the size problem again:

ValueError                                Traceback (most recent call last)
<ipython-input-10-032d95bca1c8> in <module>
      1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    273             activation_type=QuantType.QUInt8,
    274             weight_type=QuantType.QUInt8,
--> 275             optimize_model=False,
    276         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    277         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    278         nodes_to_quantize,
    279         nodes_to_exclude,
--> 280         op_types_to_quantize)
    281 
    282     quantizer.quantize_model()

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
     30 
     31         # run shape inference on the model
---> 32         model = onnx.shape_inference.infer_shapes(model)
     33         self.value_infos = {vi.name: vi for vi in model.graph.value_info}
     34         self.value_infos.update({ot.name: ot for ot in model.graph.output})

/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
     34 def infer_shapes(model, check_type=False, strict_mode=False):  # type: (ModelProto, bool, bool) -> ModelProto
     35     if isinstance(model, ModelProto):
---> 36         model_str = model.SerializeToString()
     37         inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
     38         return onnx.load_from_string(inferred_model_str)

ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612

ViktorThink changed the title from "t5-11b out of memory" to "t5-11b out of memory/FileNotFoundError" on May 3, 2021

Ki6an (Owner) commented May 3, 2021

I've not tested the library with t5-11b. I'm glad that you were able to export the model by adding use_external_data_format=True.
I suggest you do the same for quantizing as well.

https://github.com/microsoft/onnxruntime/blob/add4e4225ba69ba48a28889ff91e65bbc5f6f2ca/onnxruntime/python/tools/quantization/quantize.py#L260

Also make sure that you have enough memory.

ViktorThink (Author) commented

Thank you for getting back, it's highly appreciated.

I tried adding use_external_data_format=True to quantize_dynamic:

quantize_dynamic(
            model_input=model_name,
            model_output=output_model_name,
            per_channel=True,
            activation_type=QuantType.QUInt8,
            weight_type=QuantType.QUInt8,
            optimize_model=False,
            use_external_data_format=True
        )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],

Still get the exact same error:

ValueError                                Traceback (most recent call last)
<ipython-input-4-032d95bca1c8> in <module>
      1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    274             weight_type=QuantType.QUInt8,
    275             optimize_model=False,
--> 276             use_external_data_format=True
    277         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    278         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    278         nodes_to_quantize,
    279         nodes_to_exclude,
--> 280         op_types_to_quantize)
    281 
    282     quantizer.quantize_model()

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
     30 
     31         # run shape inference on the model
---> 32         model = onnx.shape_inference.infer_shapes(model)
     33         self.value_infos = {vi.name: vi for vi in model.graph.value_info}
     34         self.value_infos.update({ot.name: ot for ot in model.graph.output})

/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
     34 def infer_shapes(model, check_type=False, strict_mode=False):  # type: (ModelProto, bool, bool) -> ModelProto
     35     if isinstance(model, ModelProto):
---> 36         model_str = model.SerializeToString()
     37         inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
     38         return onnx.load_from_string(inferred_model_str)

ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612

A bit strange though, since the documentation you sent says that setting use_external_data_format=True should solve this error...

Ki6an (Owner) commented May 3, 2021

It is strange indeed! The problem seems to be in the onnxruntime library. You could follow this issue and try to solve the problem. If that does not help, I suggest you create a new issue in onnxruntime about this.

samanz commented May 3, 2021

I'm getting this same error when trying to export t5-3b. It seems like this may be the more relevant onnx issue: the infer_shapes method doesn't work with large models and is supposed to be replaced with infer_shapes_path, so that would need to be fixed in the onnxruntime project. I modified the code in onnx_quantizer to look like:

        onnx.shape_inference.infer_shapes_path(model_name, model_name + ".inferred")
        model = onnx.load(model_name + ".inferred")

while passing in a model_name to the method as well. The code was able to get past the shape inference step, but failed with this information now:

Quantizing... |##########                      | 1/3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-e72945460842> in <module>
      1 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
----> 2 quant_model_paths = quantize(onnx_model_paths)
      3 
      4 # step 3. setup onnx runtime
      5 model_sessions = get_onnx_runtime_sessions(quant_model_paths)

~/.local/lib/python3.6/site-packages/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    274             weight_type=QuantType.QUInt8,
    275             optimize_model=False,
--> 276             use_external_data_format=True,
    277         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    278         quant_model_paths.append(output_model_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    281         op_types_to_quantize)
    282 
--> 283     quantizer.quantize_model()
    284     quantizer.model.save_model_to_file(model_output, use_external_data_format)
    285 

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_model(self)
    195                 op_quantizer = CreateDefaultOpQuantizer(self, node)
    196 
--> 197             op_quantizer.quantize()
    198 
    199         self._dequantize_outputs()

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/operators/matmul.py in quantize(self)
     17 
     18         (quantized_input_names, zero_point_names, scale_names, nodes) = \
---> 19             self.quantizer.quantize_inputs(node, [0, 1])
     20 
     21         matmul_integer_output = node.output[0] + "_output_quantized"

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_inputs(self, node, indices, initializer_use_weight_qType)
    613             if initializer is not None:
    614                 q_weight_name, zp_name, scale_name = self.quantize_weight(
--> 615                     initializer, self.weight_qType if initializer_use_weight_qType else self.input_qType)
    616 
    617                 quantized_input_names.append(q_weight_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_weight(self, weight, qType)
    654 
    655         # Update packed weight, zero point, and scale initializers
--> 656         weight_data = self.tensor_proto_to_array(weight)
    657         _, _, zero_point, scale, q_weight_data = quantize_data(weight_data.flatten().tolist(),
    658                                                                get_qrange_for_qType(qType, self.reduce_range), qType)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in tensor_proto_to_array(initializer)
    215     def tensor_proto_to_array(initializer):
    216         if initializer.data_type == onnx_proto.TensorProto.FLOAT:
--> 217             weights = onnx.numpy_helper.to_array(initializer)
    218         else:
    219             raise ValueError('Only float type quantization is supported. Weights {} is {}. '.format(

~/.local/lib/python3.6/site-packages/onnx/numpy_helper.py in to_array(tensor)
     52         return np.frombuffer(
     53             tensor.raw_data,
---> 54             dtype=np_dtype).reshape(dims)
     55     else:
     56         data = getattr(tensor, storage_field),  # type: Sequence[np.complex64]

ValueError: cannot reshape array of size 16777216 into shape (1024,4096)
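
In standalone form, the shape-inference part of that workaround looks roughly like this (a sketch only; file names are placeholders and the real change sits inside onnxruntime's onnx_quantizer.py):

```python
# Rough standalone sketch: infer_shapes_path runs shape inference file-to-file
# on disk, so the >2 GB model is never serialized in memory the way
# onnx.shape_inference.infer_shapes(model) requires. File names are placeholders.
import onnx
from onnx.shape_inference import infer_shapes_path

model_path = "t5-11b-encoder.onnx"        # model exported with external data
inferred_path = model_path + ".inferred"

infer_shapes_path(model_path, inferred_path)

# Load the inferred model; external weight files are resolved relative to the
# model's directory, so keep them next to the .onnx files.
inferred_model = onnx.load(inferred_path)
```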

ViktorThink (Author) commented May 9, 2021

Normal torch quantization works on the larger models, so anyone reading this could check that out as an alternative: https://snappishproductions.com/blog/2020/05/03/big-models-hate-this-one-weird-trick-quantization-t5--pytorch-1.4.html.html

My model ended up 4x smaller (with qint8) and 3x faster, so it's better than nothing, although I lost a little bit of accuracy.
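
Roughly, the plain PyTorch dynamic quantization from that post boils down to something like this (a sketch assuming the standard transformers and torch.quantization APIs; not my exact code):

```python
# Minimal sketch of PyTorch dynamic quantization for T5, as an alternative to
# the ONNX/onnxruntime path. Dynamic quantization stores Linear weights as
# qint8 and quantizes activations on the fly at inference time (CPU only).
# The model name is a placeholder.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-11b")
model.eval()

quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},   # quantize only the Linear layers
    dtype=torch.qint8,
)

# Generation still works as usual, just with the quantized weights.
tokenizer = T5Tokenizer.from_pretrained("t5-11b")
inputs = tokenizer("translate English to German: hello", return_tensors="pt")
outputs = quantized_model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```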
