Segmentation fault in multithread app (v0.11.2) #380
Comments
Also Let me know if that made any difference. |
I've tested it locally and it works smoothly. I'll add information about it to the wiki. |
If you need to run multi-threaded unit tests in the future, you are welcome to use MultiThreadedUnitTestExecuter. Usage: MultiThreadedUnitTestExecuter.Run(threadCount: 8, workload: tid => ...); |
@Nucs
|
I've fixed it in the commit above. Until it is available via NuGet, take the process-wide lock when loading the model:
lock (Locks.ProcessWide)
    _session = Session.LoadFromSavedModel(modelLocation).as_default();
As mentioned in the wiki, due to the lack of documentation from TF's side, we don't know which APIs are not thread-safe, and it appears that |
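The workaround above boils down to serializing one call that may not be thread-safe behind a single process-wide lock, while the rest of each thread's work still runs concurrently. Here is a minimal sketch of that pattern in plain C#, with no TensorFlow.NET calls. `ProcessWideLock` and `LoadModel` are hypothetical stand-ins for `Locks.ProcessWide` and the native load call, and the counter exists only to show that the guarded section is never entered by two threads at once.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LockedLoadSketch
{
    // Hypothetical stand-in for Locks.ProcessWide: one lock object shared by all threads.
    private static readonly object ProcessWideLock = new object();

    // Counts how many threads are inside the guarded call at the same time.
    private static int _inside;
    public static int MaxConcurrentLoads;

    // Hypothetical stand-in for a native call that is assumed not to be thread-safe.
    private static int LoadModel(int threadId)
    {
        int now = Interlocked.Increment(ref _inside);
        if (now > MaxConcurrentLoads) MaxConcurrentLoads = now; // only written under the lock
        Thread.Sleep(5); // simulate work inside the native call
        Interlocked.Decrement(ref _inside);
        return threadId;
    }

    public static int[] Run(int threadCount)
    {
        var results = new int[threadCount];
        var threads = new List<Thread>();
        for (int t = 0; t < threadCount; t++)
        {
            int tid = t; // copy the loop variable so each lambda sees its own value
            var thread = new Thread(() =>
            {
                int handle;
                lock (ProcessWideLock)
                {
                    // Only the load is serialized.
                    handle = LoadModel(tid);
                }
                // Per-thread work runs outside the lock.
                results[tid] = handle * 2;
            });
            threads.Add(thread);
            thread.Start();
        }
        foreach (var thread in threads) thread.Join();
        return results;
    }
}
```

Calling `LockedLoadSketch.Run(8)` starts eight threads. Because every load goes through the same lock object, `MaxConcurrentLoads` stays at 1 no matter how the threads are scheduled.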
With I found out that the problem is related to garbage collection. Try this example:
for (int t = 0; t < THREADS_COUNT; t++)
{
new Thread(() =>
{
Session sess;
lock (Locks.ProcessWide)
sess = Session.LoadFromSavedModel(modelLocation).as_default();
{
var inputs = new[] { "sp", "fuel" };
var inp = inputs.Select(name => sess.graph.OperationByName(name).output).ToArray();
var outp = sess.graph.OperationByName("softmax_tensor").output;
for (var i = 0; i < 1000; i++)
{
{
var data = new float[96];
FeedItem[] feeds = new FeedItem[2];
for (int f = 0; f < 2; f++)
feeds[f] = new FeedItem(inp[f], new NDArray(data));
try
{
sess.run(outp, feeds);
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}
GC.Collect();
}
}
}).Start();
}
Or use the test project: https://github.com/deadman2000/TensorFlowNetMultithreading |
Please try the following code: |
Yes, this sample works fine. But after I replaced the for loop with an infinite loop, it crashed after 1 minute in Debug and after 30 seconds in Release. |
I ran it for over 35 minutes (and it is still running) without a crash. This has led me to believe it is a game of chance, and indeed the program eventually crashed after 10-60 seconds with the following message in the Output window:
When the error is just an exit code, it indicates that the C++ side (TensorFlow) caused the crash,
even though we do not call the same Session from different threads. I'll try to create a memory dump and see if we can pinpoint more accurately where the problem lies. |
@Nucs Should we raise this issue with the TensorFlow team? |
Once I get my hands on a dump and research it, I'll let you know. |
@Oceania2018, you can contact the TF team. Dump file: https://mega.nz/#!VVF3HKrS!r5sxOBsbccdVn93KpwS7EtxWSvtLlim2GwExDF8L9h4 |
Please try with this code in
var output_values = fetch_list.Select(x => IntPtr.Zero).ToArray();
var inputs = feed_dict.Select(f => f.Key).ToArray();
var input_values = feed_dict.Select(f => (IntPtr)f.Value).ToArray();
var target_opers = target_list.Select(f => (IntPtr)f).ToArray();
c_api.TF_SessionRun(_handle,
run_options: null,
inputs: inputs,
input_values: input_values,
ninputs: feed_dict.Length,
outputs: fetch_list,
output_values: output_values,
noutputs: fetch_list.Length,
target_opers: target_opers,
ntargets: target_list.Count,
run_metadata: IntPtr.Zero,
status: status); |
Any updates or thoughts on this one? I see the latest suggestion by @deadman2000 is already in master, but I can still reproduce this issue. It is odd that adding a lock around sess.run() doesn't help either. I've also tried setting UsePerSessionThreads and IsolateSessionState to true, but no luck. |
Back then my assessment was that it is a multithreading problem within TensorFlow, so we are kind of helpless here. They still have not responded to the issue. Drop a comment there if you like. |
I guess the problem isn't with TensorFlow. I implemented a multithreaded application on the pure C API, and it works successfully, without crashes. |
I've just built a debug version of TF 2.1 locally and was able to reproduce the issue. A more detailed call stack:
I haven't had much time to investigate yet, but it also seems like we pass invalid input tensor values, probably due to GC, although I couldn't yet spot where we could be disposing them. I will investigate how TensorConverter.ToTensor() works tomorrow. Any ideas appreciated :) |
So I think the issue is caused by the following usage of nd.GetData() in Tensor.Creation.cs. I guess the pointer ends up referring to GC-controlled memory, with no guarantee that it stays at the same address after a GC run.
Changing to the following helped (probably with some performance degradation, which I didn't notice due to the small input dataset):
After this change I can no longer reproduce the crash, but I will keep testing this. |
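The hypothesis above is that a raw pointer into a managed array is only stable while the array is pinned. The sketch below is not the change referred to in this comment (that code is not shown in the thread). It only illustrates, in plain C#, two standard ways to keep an address valid across a garbage collection: pinning the array with `GCHandle`, or copying the data into unmanaged memory with `Marshal`. `NativeBufferSketch`, `SumPinned`, `SumCopied` and `SumFromPointer` are hypothetical names; `SumFromPointer` stands in for native code reading the buffer.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeBufferSketch
{
    // Option 1: pin the array so the GC cannot move it while the pointer is in use.
    public static float SumPinned(float[] data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr ptr = handle.AddrOfPinnedObject();
            GC.Collect(); // a collection here cannot relocate the pinned array
            return SumFromPointer(ptr, data.Length);
        }
        finally
        {
            handle.Free(); // unpin as soon as the native side is done
        }
    }

    // Option 2: copy into unmanaged memory, which the GC never moves (costs one extra copy).
    public static float SumCopied(float[] data)
    {
        IntPtr ptr = Marshal.AllocHGlobal(data.Length * sizeof(float));
        try
        {
            Marshal.Copy(data, 0, ptr, data.Length);
            GC.Collect(); // the unmanaged buffer is unaffected by collections
            return SumFromPointer(ptr, data.Length);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }

    // Stand-in for native code that reads `count` floats starting at `ptr`.
    private static float SumFromPointer(IntPtr ptr, int count)
    {
        var copy = new float[count];
        Marshal.Copy(ptr, copy, 0, count);
        float sum = 0f;
        foreach (float v in copy) sum += v;
        return sum;
    }
}
```

Pinning avoids the copy but keeps the array fixed in the managed heap for as long as the handle lives. Copying is simpler to reason about and costs one extra pass over the data, which matches the performance trade-off mentioned in the comment.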
I have this error too in the GPU version; in the CPU version everything is fine. How can I solve this? |
@Mghobadid the fix in #533 should do the trick. It's not the most efficient way, but it seems to work at least on CPU (and I reproduced exactly the same issue on GPU, so it should behave the same)... |
Thanks @tompetk for the fix. I hope the PR gets merged asap and we get a new NuGet version. |
The app crashes during sess.run with the message
Segmentation fault (core dumped)
on Docker, or
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
on Windows.
Stack trace on Windows
I created a small example project for tests: https://github.com/deadman2000/TensorFlowNetMultithreading