t5-11b out of memory/FileNotFoundError #11

Closed
ViktorThink opened this issue May 1, 2021 · 5 comments

ViktorThink commented May 1, 2021

First of all, this seems like a great repo that I was super excited to find!

When testing with t5-small everything works correctly, but when trying with my custom t5-11b I get out-of-memory issues.

I was running this with a t5-11b as model:
onnx_model_paths = generate_onnx_representation("t5-11b",model=model)

And at first I got this error:

RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

So I simply added use_external_data_format=True to all three of the torch.onnx.export calls in onnx_exporter.py in fastT5.
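
For illustration, the change amounts to adding that flag to each export call, roughly like this (a sketch only: the wrapper module, example inputs, opset, and file name below are placeholders, not the exact fastT5 code):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholders for illustration -- substitute the real t5-11b checkpoint.
model = T5ForConditionalGeneration.from_pretrained("t5-11b")
tokenizer = T5Tokenizer.from_pretrained("t5-11b")
sample = tokenizer("translate English to German: hello", return_tensors="pt")

class EncoderWrapper(torch.nn.Module):
    """Thin wrapper so tracing sees a single tensor output
    (fastT5 does something similar; this is not its exact code)."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        return self.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]

model.eval()
torch.onnx.export(
    EncoderWrapper(model.encoder),
    (sample["input_ids"], sample["attention_mask"]),   # example inputs for tracing
    "t5-11b-encoder.onnx",
    export_params=True,
    opset_version=12,
    input_names=["input_ids", "attention_mask"],
    output_names=["hidden_states"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    # the added flag: weights are written as external data files next to the
    # .onnx file instead of being embedded in the protobuf, avoiding the 2 GB limit
    use_external_data_format=True,
)
```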

Then I can run onnx_model_paths = generate_onnx_representation(model_name,model=model) and get no error (the first time I posted this I got an error, but it turned out I only had 100 GB of disk space; with 200 GB it worked).

Then when running quant_model_paths = quantize(onnx_model_paths) I get the error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-7-3a782b6d5a25> in <module>
      8 
      9 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
---> 10 quant_model_paths = quantize(onnx_model_paths)
     11 
     12 # step 3. setup onnx runtime

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    273             activation_type=QuantType.QUInt8,
    274             weight_type=QuantType.QUInt8,
--> 275             optimize_model=False,
    276         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    277         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    266         op_types_to_quantize = list(IntegerOpsRegistry.keys())
    267 
--> 268     model = load_model(Path(model_input), optimize_model)
    269     quantizer = ONNXQuantizer(
    270         model,

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in load_model(model_path, optimize)
     51         return onnx_model.model
     52 
---> 53     return onnx.load(Path(model_path))
     54 
     55 

/opt/conda/lib/python3.7/site-packages/onnx/__init__.py in load_model(f, format, load_external_data)
    125         if model_filepath:
    126             base_dir = os.path.dirname(model_filepath)
--> 127             load_external_data_for_model(model, base_dir)
    128 
    129     return model

/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_model(model, base_dir)
     69     for tensor in _get_all_tensors(model):
     70         if uses_external_data(tensor):
---> 71             load_external_data_for_tensor(tensor, base_dir)
     72             # After loading raw_data from external_data, change the state of tensors
     73             tensor.data_location = TensorProto.DEFAULT

/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_tensor(tensor, base_dir)
     48     external_data_file_path = os.path.join(base_dir, file_location)
     49 
---> 50     with open(external_data_file_path, 'rb') as data_file:
     51 
     52         if info.offset:

FileNotFoundError: [Errno 2] No such file or directory: '/home/jupyter/encoder.embed_tokens.weight'

Has anyone successfully exported the t5-11b version, and does anyone know how to solve this?

Update:

I tried changing the working directory to /home/jupyter/models instead of /home/jupyter/, which seems to solve the FileNotFoundError. But then I run into the size problem again:

ValueError                                Traceback (most recent call last)
<ipython-input-10-032d95bca1c8> in <module>
      1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    273             activation_type=QuantType.QUInt8,
    274             weight_type=QuantType.QUInt8,
--> 275             optimize_model=False,
    276         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    277         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    278         nodes_to_quantize,
    279         nodes_to_exclude,
--> 280         op_types_to_quantize)
    281 
    282     quantizer.quantize_model()

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
     30 
     31         # run shape inference on the model
---> 32         model = onnx.shape_inference.infer_shapes(model)
     33         self.value_infos = {vi.name: vi for vi in model.graph.value_info}
     34         self.value_infos.update({ot.name: ot for ot in model.graph.output})

/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
     34 def infer_shapes(model, check_type=False, strict_mode=False):  # type: (ModelProto, bool, bool) -> ModelProto
     35     if isinstance(model, ModelProto):
---> 36         model_str = model.SerializeToString()
     37         inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
     38         return onnx.load_from_string(inferred_model_str)

ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612

ViktorThink changed the title from "t5-11b out of memory" to "t5-11b out of memory/FileNotFoundError" on May 3, 2021

Ki6an (Owner) commented May 3, 2021

I've not tested the library with t5-11b. I'm glad that you were able to export the model by adding use_external_data_format=True.
I suggest you do the same for quantizing as well.

https://github.com/microsoft/onnxruntime/blob/add4e4225ba69ba48a28889ff91e65bbc5f6f2ca/onnxruntime/python/tools/quantization/quantize.py#L260

Also make sure that you have enough memory.

ViktorThink (Author) commented

Thank you for getting back, it's highly appreciated.

I tried adding use_external_data_format=True to quantize_dynamic:

quantize_dynamic(
            model_input=model_name,
            model_output=output_model_name,
            per_channel=True,
            activation_type=QuantType.QUInt8,
            weight_type=QuantType.QUInt8,
            optimize_model=False,
            use_external_data_format=True
        )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],

Still get the exact same error:

ValueError                                Traceback (most recent call last)
<ipython-input-4-032d95bca1c8> in <module>
      1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    274             weight_type=QuantType.QUInt8,
    275             optimize_model=False,
--> 276             use_external_data_format=True
    277         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    278         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    278         nodes_to_quantize,
    279         nodes_to_exclude,
--> 280         op_types_to_quantize)
    281 
    282     quantizer.quantize_model()

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
     30 
     31         # run shape inference on the model
---> 32         model = onnx.shape_inference.infer_shapes(model)
     33         self.value_infos = {vi.name: vi for vi in model.graph.value_info}
     34         self.value_infos.update({ot.name: ot for ot in model.graph.output})

/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
     34 def infer_shapes(model, check_type=False, strict_mode=False):  # type: (ModelProto, bool, bool) -> ModelProto
     35     if isinstance(model, ModelProto):
---> 36         model_str = model.SerializeToString()
     37         inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
     38         return onnx.load_from_string(inferred_model_str)

ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612

A bit strange though, since the documentation you sent says that setting use_external_data_format=True should solve this error...

Ki6an (Owner) commented May 3, 2021

It is strange indeed! The problem seems to be in the onnxruntime library. You could follow this issue and try to solve the problem. If that does not help, I suggest you create a new issue in onnxruntime about this.

samanz commented May 3, 2021

I'm getting this same error when trying to export t5-3b. It seems like this may be the more relevant onnx issue: the infer_shapes method doesn't work with large models and is supposed to be replaced with infer_shapes_path, so that would need to be fixed in the onnxruntime project. I modified the code in onnx_quantizer to look like:

        onnx.shape_inference.infer_shapes_path(model_name, model_name + ".inferred")
        model = onnx.load(model_name + ".inferred")

while passing in a model_name to the method as well. The code was able to get past the shape inference step, but failed with this information now:

Quantizing... |##########                      | 1/3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-e72945460842> in <module>
      1 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
----> 2 quant_model_paths = quantize(onnx_model_paths)
      3 
      4 # step 3. setup onnx runtime
      5 model_sessions = get_onnx_runtime_sessions(quant_model_paths)

~/.local/lib/python3.6/site-packages/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    274             weight_type=QuantType.QUInt8,
    275             optimize_model=False,
--> 276             use_external_data_format=True,
    277         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    278         quant_model_paths.append(output_model_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    281         op_types_to_quantize)
    282 
--> 283     quantizer.quantize_model()
    284     quantizer.model.save_model_to_file(model_output, use_external_data_format)
    285 

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_model(self)
    195                 op_quantizer = CreateDefaultOpQuantizer(self, node)
    196 
--> 197             op_quantizer.quantize()
    198 
    199         self._dequantize_outputs()

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/operators/matmul.py in quantize(self)
     17 
     18         (quantized_input_names, zero_point_names, scale_names, nodes) = \
---> 19             self.quantizer.quantize_inputs(node, [0, 1])
     20 
     21         matmul_integer_output = node.output[0] + "_output_quantized"

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_inputs(self, node, indices, initializer_use_weight_qType)
    613             if initializer is not None:
    614                 q_weight_name, zp_name, scale_name = self.quantize_weight(
--> 615                     initializer, self.weight_qType if initializer_use_weight_qType else self.input_qType)
    616 
    617                 quantized_input_names.append(q_weight_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_weight(self, weight, qType)
    654 
    655         # Update packed weight, zero point, and scale initializers
--> 656         weight_data = self.tensor_proto_to_array(weight)
    657         _, _, zero_point, scale, q_weight_data = quantize_data(weight_data.flatten().tolist(),
    658                                                                get_qrange_for_qType(qType, self.reduce_range), qType)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in tensor_proto_to_array(initializer)
    215     def tensor_proto_to_array(initializer):
    216         if initializer.data_type == onnx_proto.TensorProto.FLOAT:
--> 217             weights = onnx.numpy_helper.to_array(initializer)
    218         else:
    219             raise ValueError('Only float type quantization is supported. Weights {} is {}. '.format(

~/.local/lib/python3.6/site-packages/onnx/numpy_helper.py in to_array(tensor)
     52         return np.frombuffer(
     53             tensor.raw_data,
---> 54             dtype=np_dtype).reshape(dims)
     55     else:
     56         data = getattr(tensor, storage_field),  # type: Sequence[np.complex64]

ValueError: cannot reshape array of size 16777216 into shape (1024,4096)
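
In standalone form, the shape-inference part of that workaround looks roughly like this (a sketch only; file names are placeholders and the real change sits inside onnxruntime's onnx_quantizer.py):

```python
# Rough standalone sketch: infer_shapes_path runs shape inference file-to-file
# on disk, so the >2 GB model is never serialized in memory the way
# onnx.shape_inference.infer_shapes(model) requires. File names are placeholders.
import onnx
from onnx.shape_inference import infer_shapes_path

model_path = "t5-11b-encoder.onnx"        # model exported with external data
inferred_path = model_path + ".inferred"

infer_shapes_path(model_path, inferred_path)

# Load the inferred model; external weight files are resolved relative to the
# model's directory, so keep them next to the .onnx files.
inferred_model = onnx.load(inferred_path)
```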

ViktorThink (Author) commented May 9, 2021

Normal torch quantization works on the larger models, so anyone reading this could check that out as an alternative: https://snappishproductions.com/blog/2020/05/03/big-models-hate-this-one-weird-trick-quantization-t5--pytorch-1.4.html.html

My model ended up 4x smaller (with qint8) and 3x faster, so it's better than nothing, although I lost a little bit of accuracy.
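
Roughly, the plain PyTorch dynamic quantization from that post boils down to something like this (a sketch assuming the standard transformers and torch.quantization APIs; not my exact code):

```python
# Minimal sketch of PyTorch dynamic quantization for T5, as an alternative to
# the ONNX/onnxruntime path. Dynamic quantization stores Linear weights as
# qint8 and quantizes activations on the fly at inference time (CPU only).
# The model name is a placeholder.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-11b")
model.eval()

quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},   # quantize only the Linear layers
    dtype=torch.qint8,
)

# Generation still works as usual, just with the quantized weights.
tokenizer = T5Tokenizer.from_pretrained("t5-11b")
inputs = tokenizer("translate English to German: hello", return_tensors="pt")
outputs = quantized_model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```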
