SampleApp goes to break mode in the last version (CNTK backend) #13

Open
sharpwood opened this issue Nov 21, 2017 · 3 comments
Comments

@sharpwood

sharpwood commented Nov 21, 2017

image

@sharpwood sharpwood changed the title SampleApp goes to break mode in the last version SampleApp goes to break mode in the last version (CNTK backend) Nov 21, 2017
@cesarsouza
Owner

cesarsouza commented Nov 21, 2017

Hi @sharpwood,

Thanks for opening the issue!

As you have noticed, this exception only happens when using the CNTK backend. Sadly, the fact that this impossible-to-catch exception has been raised means that something has corrupted the memory of the process, including internal data structures used by the CLR. The CLR has detected this corruption and stopped executing code because it is no longer safe to continue. As such, it is very likely that whatever is causing this memory corruption is unrelated to the line of code currently highlighted in your screenshot, and that the corruption was merely detected while executing that line.

Since Keras# does not use unsafe operations and therefore never touches memory directly, the only possibility is that the memory corruption is being caused by CNTK itself. The most likely explanation is that Keras# is calling some of the CNTK APIs in an unexpected manner, CNTK is not performing enough argument checks to detect those incorrect calls, and proceeds to execute those incorrectly defined operations, resulting in memory corruption.

The CNTK project should be releasing a new version tomorrow. We can try again with the new version to see if at least the error message improves, but for the time being, I would suggest experimenting with the TensorFlow backend instead.

Regards,
Cesar

@sharpwood
Author

image

use cntk 2.3

@cesarsouza
Owner

cesarsouza commented Nov 28, 2017

Hi @sharpwood,

Thanks for the update. I've just updated to CNTK 2.3, but the issue is still the same. In fact, if you comment out the call to TrainMinibatch in these lines of the file CNTKFunction.cs:

this.trainer.TrainMinibatch(input_dict, isSweepEndInarguments: false, computeDevice: DeviceDescriptor.CPUDevice);
updated.Add(c.constant(this.trainer.PreviousMinibatchLossAverage()));
updated.Add(c.constant(this.trainer.PreviousMinibatchEvaluationAverage()));

so that they become

// this.trainer.TrainMinibatch(input_dict, isSweepEndInarguments: false, computeDevice: DeviceDescriptor.CPUDevice);

updated.Add(c.constant(this.trainer.PreviousMinibatchLossAverage()));
updated.Add(c.constant(this.trainer.PreviousMinibatchEvaluationAverage()));

thereby disabling mini-batch training while still letting Keras# do everything else, you will see that the issue disappears.

Keras# will still be preparing the mini-batches, copying memory from C# to CNTK's NDArrayView/Value, and building the execution graph, just as before. The only difference is that this time we do not let CNTK update the model, and as you will see, we no longer run into memory issues during the training part (the model will never learn anything, though, since no weight updates are being made).

Regards,
Cesar
