[BUG Report]: Crash after a few calls to model.predict() #1213

Open
Utanapishtim31 opened this issue Nov 8, 2023 · 1 comment
Description

My application uses a model and calls predict() on it regularly. After roughly a dozen calls to predict() (the exact number varies), the application crashes.

After a long investigation, I determined that data is being written past the end of a buffer (i.e. a buffer overflow).

The Windows debugger displays the following message:

Critical error detected c0000374 <-- Heap corruption
A breakpoint instruction (__debugbreak() statement or a similar call) was executed in XXX.exe.

Another debug message states that data has been written past the end of a memory buffer.

The call stack at the time of the exception shows that this memory buffer is managed by a SafeTensorHandle:

 	ntdll.dll!RtlIsZeroMemory() + 162 bytes	Unknown
 	ntdll.dll!__misaligned_access() + 1066 bytes	Unknown
 	ntdll.dll!__misaligned_access() + 1802 bytes	Unknown
 	ntdll.dll!00007ff99abf1b85()	Unknown
 	ntdll.dll!00007ff99ab7c4d8()	Unknown
 	ntdll.dll!RtlFreeHeap() + 81 bytes	Unknown
 	ucrtbase.dll!_free_base() + 27 bytes	Unknown
 	ucrtbase.dll!_aligned_free() + 22 bytes	Unknown
 	tensorflow.dll!tensorflow::ApiDef::set_summary() + 6193 bytes	Unknown
 	tensorflow.dll!google::protobuf::Map<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,tensorflow::AttrValue>::const_iterator::operator++() + 891 bytes	Unknown
 	tensorflow.dll!std::_Destroy_in_place<tensorflow::Tensor>() + 43 bytes	Unknown
 	tensorflow.dll!google::protobuf::Map<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,tensorflow::AttrValue>::const_iterator::operator++() + 1026 bytes	Unknown
 	tensorflow.dll!TF_DeleteTensor() + 26 bytes	Unknown
 	[Managed to Native Transition]	
>	Tensorflow.Binding.dll!Tensorflow.SafeTensorHandle.ReleaseHandle() Line 21	C#
 	[Native to Managed Transition]	
 	[Managed to Native Transition]	
 	mscorlib.dll!System.Runtime.InteropServices.SafeHandle.~SafeHandle() Line 57	C#
 	[Native to Managed Transition]	
 	kernel32.dll!BaseThreadInitThunk() + 29 bytes	Unknown
 	ntdll.dll!RtlUserThreadStart() + 40 bytes	Unknown

By tagging the SafeTensorHandles created during the lifetime of my application, I have been able to detect that the SafeTensorHandle causing the crash is the handle contained in a SafeStringTensorHandle. Actually, it is the handle 'safeTensorHandle' created in Tensor.StringTensor():
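The report does not say how the handles were tagged. As one possible approach, a small registry can associate each native pointer with a label and the stack trace at creation, so that a failing release can be traced back to its origin. This is a minimal sketch: `HandleTags` and its methods are hypothetical names, not part of Tensorflow.NET.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: maps a native pointer to a label plus the creation stack trace.
public static class HandleTags
{
    private static readonly ConcurrentDictionary<IntPtr, string> Tags =
        new ConcurrentDictionary<IntPtr, string>();

    // Record where a handle was created. Call this right after the native allocation.
    public static void Tag(IntPtr handle, string label) =>
        Tags[handle] = label + Environment.NewLine + Environment.StackTrace;

    // Look up the tag for a handle, e.g. from a breakpoint inside ReleaseHandle().
    public static string Describe(IntPtr handle) =>
        Tags.TryGetValue(handle, out var tag) ? tag : "<untagged>";

    // Forget a handle once it has been released, since pointer values can be reused.
    public static void Untag(IntPtr handle) => Tags.TryRemove(handle, out _);
}
```

Calling `HandleTags.Describe(ptr)` from the debugger when the heap corruption is reported then shows which code path created the offending handle.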

public class Tensor : DisposableObject, ITensorOrOperation, ITensorOrTensorArray, IPackable<Tensor>, ICanBeFlattened
{
	public SafeStringTensorHandle StringTensor(byte[][] buffer, Shape shape)
	{
		// The buffer of this handle is the one that overflows.
		SafeTensorHandle safeTensorHandle = c_api.TF_AllocateTensor(TF_DataType.TF_STRING, shape.dims, shape.ndim, (ulong)(shape.size * 24));
		IntPtr intPtr = c_api.TF_TensorData(safeTensorHandle);
		for (int i = 0; i < buffer.Length; i++)
		{
			c_api.TF_StringInit(intPtr);
			c_api.TF_StringCopy(intPtr, buffer[i], buffer[i].Length);
			intPtr += 24;
		}
		return new SafeStringTensorHandle(safeTensorHandle, shape);
	}
}
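The allocation above reserves shape.size * 24 bytes, while the loop advances 24 bytes for every element of buffer. The following sketch shows a bounds check under the assumption (not confirmed by this report) that the overflow could arise when buffer.Length exceeds shape.size. `StringTensorBounds` and its members are hypothetical names, and the code makes no Tensorflow.NET calls.

```csharp
using System;

// Hypothetical helper: compares the bytes reserved with the bytes the loop would touch.
public static class StringTensorBounds
{
    // Size in bytes of one string slot, taken from the constant 24 used in StringTensor().
    public const int SlotSize = 24;

    // Bytes reserved by the allocation: shape.size * 24.
    public static long AllocatedBytes(long shapeSize) => shapeSize * SlotSize;

    // Bytes touched by the loop: one slot per element of buffer.
    public static long WrittenBytes(int bufferLength) => (long)bufferLength * SlotSize;

    // Throws if the loop would advance past the end of the allocation.
    public static void EnsureFits(int bufferLength, long shapeSize)
    {
        if (WrittenBytes(bufferLength) > AllocatedBytes(shapeSize))
            throw new ArgumentException(
                $"buffer has {bufferLength} elements but shape holds only {shapeSize}");
    }
}
```

A call such as `StringTensorBounds.EnsureFits(buffer.Length, shape.size)` before the loop would turn a silent heap overwrite into an immediate managed exception, which would confirm or rule out this particular hypothesis.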

This SafeStringTensorHandle is created during model.predict() and contains the optimization options. The call stack at its creation is:

	Tensorflow.Binding.dll!Tensorflow.Tensor.StringTensor(byte[][] buffer, Tensorflow.Shape shape) Line 2322	C#
 	Tensorflow.Binding.dll!Tensorflow.Tensor.StringTensor(string[] strings, Tensorflow.Shape shape) Line 2316	C#
 	Tensorflow.Binding.dll!Tensorflow.Tensor.InitTensor(System.Array array, Tensorflow.Shape shape) Line 570	C#
 	Tensorflow.Binding.dll!Tensorflow.Tensor.Tensor(System.Array array, Tensorflow.Shape shape) Line 449	C#
 	Tensorflow.Binding.dll!Tensorflow.Eager.EagerTensor.EagerTensor(System.Array array, Tensorflow.Shape shape) Line 135	C#
 	Tensorflow.Binding.dll!Tensorflow.constant_op.convert_to_eager_tensor(object value, Tensorflow.Contexts.Context ctx, Tensorflow.TF_DataType dtype) Line 148	C#
 	Tensorflow.Binding.dll!Tensorflow.constant_op.convert_to_eager_tensor(object value, Tensorflow.TF_DataType dtype, Tensorflow.Shape shape, string name, bool verify_shape, bool allow_broadcast) Line 163	C#
 	Tensorflow.Binding.dll!Tensorflow.constant_op.constant(object value, Tensorflow.TF_DataType dtype, Tensorflow.Shape shape, bool verify_shape, bool allow_broadcast, string name) Line 34	C#
 	Tensorflow.Binding.dll!Tensorflow.ops.convert_to_tensor(object value, Tensorflow.TF_DataType dtype, string name, bool as_ref, Tensorflow.TF_DataType preferred_dtype, Tensorflow.Contexts.Context ctx) Line 482	C#
 	Tensorflow.Binding.dll!Tensorflow.tensorflow.convert_to_tensor(object value, Tensorflow.TF_DataType dtype, string name, Tensorflow.TF_DataType preferred_dtype) Line 2910	C#
 	Tensorflow.Binding.dll!Tensorflow.OptimizeDataset.OptimizeDataset(Tensorflow.IDatasetV2 dataset, string[] optimizations_enabled, string[] optimizations_disabled, string[] optimizations_default, string[] optimization_configs) Line 31	C#
 	Tensorflow.Binding.dll!Tensorflow.DatasetV2.apply_options() Line 143	C#
 	Tensorflow.Binding.dll!Tensorflow.OwnedIterator._create_iterator(Tensorflow.IDatasetV2 dataset) Line 30	C#
 	Tensorflow.Binding.dll!Tensorflow.OwnedIterator.OwnedIterator(Tensorflow.IDatasetV2 dataset) Line 26	C#
 	Tensorflow.Keras.dll!Tensorflow.Keras.Engine.DataAdapters.DataHandler.enumerate_epochs() Line 118	C#
 	Tensorflow.Keras.dll!Tensorflow.Keras.Engine.Model.PredictInternal(Tensorflow.Keras.Engine.DataAdapters.DataHandler data_handler, int verbose) Line 808	C#
 	Tensorflow.Keras.dll!Tensorflow.Keras.Engine.Model.predict(Tensorflow.Tensors x, int batch_size, int verbose, int steps, int max_queue_size, int workers, bool use_multiprocessing) Line 793	C#

This bug is very serious because it prevents me from deploying my application to my customers.

Reproduction Steps

I have not been able to create a minimal application to reproduce the bug, primarily because it occurs randomly when the GC decides to delete the handles.
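Because the crash depends on when the GC finalizes the handles, one way to make it less random is to force a full collection and drain the finalizer queue after every call. This is a minimal sketch of that idea; `GcStress` is a hypothetical helper, and only the standard `GC.Collect()` and `GC.WaitForPendingFinalizers()` calls are used.

```csharp
using System;

// Hypothetical helper: runs an action repeatedly and forces finalization after each run.
public static class GcStress
{
    // Runs 'action' 'iterations' times and returns the number of completed runs.
    public static int Run(Action action, int iterations)
    {
        int completed = 0;
        for (int i = 0; i < iterations; i++)
        {
            action();
            GC.Collect();                  // collect unreachable handles
            GC.WaitForPendingFinalizers(); // run their finalizers (ReleaseHandle) now
            GC.Collect();                  // reclaim objects freed by those finalizers
            completed++;
        }
        return completed;
    }
}
```

In the application this could be used as `GcStress.Run(() => model.predict(x), 50);`, where `model` and `x` stand in for the application's own model and input. If the corruption is tied to handle release, the failure should then appear close to the call that produced the bad handle.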

Known Workarounds

No workaround found.

Configuration and Other Information

Tensorflow.NET 0.110.4
Tensorflow.Keras 0.11.4
Windows 11

Utanapishtim31 commented Nov 9, 2023

TensorflowBufferOverflow.zip

I've been able to create a small C# app which reproduces the problem. You will have to run it under the debugger from Visual Studio, with "Enable native code debugging" checked in the Properties/Debug page of the project.

The crash occurs when the app closes automatically after 50 predictions. The message is "Unhandled exception at 0x00007FF99875F61E (ucrtbase.dll) in TensorflowBufferOverflow.exe: Fatal program exit requested." and it is raised in SafeEagerTensorHandle.ReleaseHandle().

You will probably have to run the app several times before this exception is raised.

Please note that in my real application this exception is raised during the lifetime of the application, not only when closing, so it is a lot more critical there. Furthermore, in my app the exception is raised in SafeTensorHandle.ReleaseHandle(), called from SafeStringTensorHandle.ReleaseHandle(), so it is not exactly the same error as here. I hope the two are similar enough that a fix can be applied to both classes.
