Not all binding implementations follow the requirements of fdb_stop_network #3015

ajbeamon · 2020-04-23T20:28:07Z

In the current C API, it is required that fdb_stop_network be called and the network thread allowed to complete before terminating the program. It seems we aren't doing this in all of our bindings, and absent a change to this requirement as proposed in #2978, this can lead to undefined behavior.

It looks like the current state of our in-tree bindings is:

Ruby and Python follow the requirements of the API (stop then join)
Java stops the network thread but doesn't join it
Go doesn't stop the network thread automatically or provide any API to do it manually as far as I can tell

If #2978 is done, then this won't be an issue. However, I suspect that solving this problem is easier than the other one, so it may be worth updating the two bindings in the meantime.
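For illustration, the "stop then join" requirement can be sketched in Go with a stand-in for the network thread. Everything here (the `network` type, `startNetwork`, `stopAndJoin`) is hypothetical and not part of any binding. A goroutine takes the place of the thread that would call fdb_run_network, and closing a channel takes the place of fdb_stop_network.

```go
package main

import (
	"fmt"
	"sync"
)

// network is a hypothetical stand-in for the client network; it is not part
// of any FoundationDB binding.
type network struct {
	stopOnce sync.Once
	stop     chan struct{} // closed to request shutdown (the "stop" step)
	done     chan struct{} // closed once the run loop has returned (the "join" step)
}

// startNetwork launches the run loop in the background.
func startNetwork() *network {
	n := &network{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(n.done)
		<-n.stop // stands in for a blocking fdb_run_network call
	}()
	return n
}

// stopAndJoin requests shutdown and then blocks until the run loop has exited.
// It is safe to call more than once.
func (n *network) stopAndJoin() {
	n.stopOnce.Do(func() { close(n.stop) })
	<-n.done
}

// running reports whether the run loop is still active.
func (n *network) running() bool {
	select {
	case <-n.done:
		return false
	default:
		return true
	}
}

func main() {
	n := startNetwork()
	defer n.stopAndJoin() // stop, then join, before main returns
	fmt.Println("running:", n.running())
}
```

The point of the sketch is only the ordering: the stop request alone is not enough, the caller also waits for the run loop to finish before the program ends.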

ajbeamon · 2020-05-20T18:04:53Z

It seems I may have been mistaken about the Java case: it looks like our implementation of stop network does block until the run network call terminates.

vishesh · 2020-05-20T19:14:17Z

In Go land, it seems a bit harder to do.

Ruby, Python and Java use the hooks their languages provide (atexit-style hooks in Ruby and Python, a shutdown hook in Java) to register functions that are called when the program ends. Go doesn't seem to have any equivalent. There is SetFinalizer, but it doesn't seem helpful in this case, as the documentation says:

The finalizer is scheduled to run at some arbitrary time after the program can no longer reach the object to which obj points. There is no guarantee that finalizers will run before a program exits

So it seems Go really emphasizes handling cleanup explicitly, and the solution has to be a change in the API itself.
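As a rough sketch of what explicit cleanup tends to look like in Go: the work goes into a `run` function that defers the cleanup, and `main` only passes the result to `os.Exit`. The split matters because `os.Exit` does not run deferred functions. `stopNetwork` and `cleanupLog` below are hypothetical placeholders for illustration, not binding APIs.

```go
package main

import (
	"fmt"
	"os"
)

// cleanupLog records which cleanup steps ran; it exists only for this demo.
var cleanupLog []string

// stopNetwork is a hypothetical placeholder for whatever shutdown call the
// binding would expose.
func stopNetwork() {
	cleanupLog = append(cleanupLog, "network stopped")
}

// run holds the application logic. The deferred cleanup executes when run
// returns, which happens before main calls os.Exit.
func run() int {
	defer stopNetwork()
	// ... application work would go here ...
	return 0
}

func main() {
	code := run()
	fmt.Println(cleanupLog)
	os.Exit(code) // os.Exit skips defers, so cleanup must already be done
}
```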

sfc-gh-anoyes · 2020-05-20T19:55:36Z

It could be that the real requirement is that you join the network thread before returning from main, and that atexit is too late to avoid the undefined behavior (atexit is also when global destructors are run).

ajbeamon · 2020-05-20T20:13:33Z

Or maybe the fact that we are waiting for fdb_run_network to stop but not actually joining the thread that it's running in is a problem.
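To make the distinction concrete, here is a small Go simulation of "the run call has returned" versus "the thread has finished". It is only an analogy under stated assumptions: the `worker` type and its channels are hypothetical, goroutines stand in for threads, and nothing here shows that this is the actual cause of the problem.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for the thread that calls
// fdb_run_network; the names are illustrative only.
type worker struct {
	stop        chan struct{} // closed to ask the run loop to return
	runReturned chan struct{} // closed when the run loop has returned
	teardown    chan struct{} // closed to let post-run cleanup proceed
	exited      chan struct{} // closed when the worker has fully finished
}

func startWorker() *worker {
	w := &worker{
		stop:        make(chan struct{}),
		runReturned: make(chan struct{}),
		teardown:    make(chan struct{}),
		exited:      make(chan struct{}),
	}
	go func() {
		defer close(w.exited)
		<-w.stop             // stands in for the run loop
		close(w.runReturned) // the run call has returned
		<-w.teardown         // stands in for cleanup that happens after the call returns
	}()
	return w
}

// isClosed reports whether ch has been closed, without blocking.
func isClosed(ch chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	w := startWorker()
	close(w.stop)
	<-w.runReturned // waiting only for the run call to return
	fmt.Println("run returned, worker exited:", isClosed(w.exited))
	close(w.teardown)
	<-w.exited // the equivalent of joining the thread
	fmt.Println("after join, worker exited:", isClosed(w.exited))
}
```

In the simulation, waiting on `runReturned` leaves a window in which the worker is still alive; only waiting on `exited` closes that window.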

ajbeamon · 2020-05-22T18:07:16Z

@vishesh So are you saying that we should expose the stopNetwork function in Go and that's it for now?

gm42 · 2024-05-14T13:03:28Z

vishesh wrote: "In Go land, it seems a bit harder to do."

Indeed it is. This is the best I could come up with:

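import (
	_ "unsafe" // required for the go:linkname directive below

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

// runtime_addExitHook binds to the unexported runtime.addExitHook, which may change between Go releases.
//go:linkname runtime_addExitHook runtime.addExitHook
func runtime_addExitHook(f func(), runOnNonZeroExit bool)

func init() {
	// this is a mitigation for https://github.com/apple/foundationdb/issues/3015
	// and it has the purpose of having our tests with -race enabled not crash with SIGSEGV
	// due to the destructors being invoked while the network thread is still running
	runtime_addExitHook(fdb.StopNetwork, true)
}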

I am exposing StopNetwork() in a PR (#11393) and will test whether this approach works for tests with -race.
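A different technique that avoids linking against an unexported runtime symbol, at least for the test case, is to stop the network from a `TestMain` after the suite has run. This is a swapped-in alternative, not the approach above, and it is only a sketch: `runWithShutdown` is a hypothetical helper, and `fdb.StopNetwork` is assumed to exist as proposed in the PR.

```go
package main

import (
	"fmt"
	"os"
)

// runWithShutdown runs a workload (for example a test suite) and calls stop
// before the exit code is returned to the caller. The defer means stop also
// runs if run panics.
func runWithShutdown(run func() int, stop func()) int {
	defer stop()
	return run()
}

// In a test package this might be wired up as follows (fdb.StopNetwork is
// the function proposed in the PR and is an assumption here):
//
//	func TestMain(m *testing.M) {
//		os.Exit(runWithShutdown(m.Run, fdb.StopNetwork))
//	}

func main() {
	var events []string
	code := runWithShutdown(
		func() int { events = append(events, "tests ran"); return 0 },
		func() { events = append(events, "network stopped") },
	)
	fmt.Println(events, code)
	os.Exit(code)
}
```

Unlike an exit hook, this only covers exits that go through `TestMain`; a direct `os.Exit` call elsewhere would still skip the stop.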

etschannen assigned vishesh and ajbeamon Apr 29, 2020

etschannen added this to the 6.3 milestone Apr 29, 2020

sfc-gh-anoyes mentioned this issue Apr 30, 2021: Try detecting active network thread at shutdown #4737 (Draft, 5 tasks)

gm42 mentioned this issue Feb 29, 2024: Using DNS entries in cluster file can cause SIGSEV #11222 (Closed)

gm42 mentioned this issue May 14, 2024: Go binding: add StopNetwork() #11393 (Open, 5 tasks)
