server fails with glib2 2.76.0 #146
Comments
Starting |
Bisected and found this to be the commit causing the trouble: Looks like threading is broken here... |
Sigh. That linked page says:
That's not very promising... |
I think the complicated code was referencing what was removed in glib, not what we have in nbd. 😜 I am not very familiar with that code, so I have no idea how to proceed from here. Let me know if I can help to narrow this down further. Sadly, the glib2 package was moved to Arch's official repositories before this issue was noticed, so currently the nbd server is non-functional without extra action. |
Yes, that was clear :-)
One thing you could do is run the "drd" tool of valgrind:
and post the output. Perhaps there's a bug in my code, but I'm not sure; it feels like a bug in glib to me, not in nbd. Would you be willing to file a bug report against glib? Feel free to refer to this one, and please also post the link to it in this bug report. I'll follow up there if necessary.
Yeah, not great :-( |
Ran the command as given. As
The forked process was still running in background, so I connected to
So looks like
Will do next week with some spare time if we do not solve it here. |
Building with |
Just opened an issue for glib... |
My issue is similar, but not quite the same. My server starts (I think) and I can connect from a client. The drives show up with their sizes but when reading ( I'm not sure what could be causing this and I'm not sure how I can troubleshoot, but I'll be happy to help if I can. I do know that glib version is 2.76.0-1 on Arch Linux. |
Sounds exactly the same. |
Downgrading to |
We have a reaction from glib upstream asking for more information... But I am not that familiar with the nbd code. |
Also seeing this with glib2 2.76.1 on Arch Linux. As an imperfect alternative to downgrading glib2, you can also override the
# /etc/systemd/system/nbd.service.d/override.conf
[Service]
Type=simple
ExecStart=
ExecStart=/usr/bin/nbd-server -d
Restart=always
|
The same problem occurs here on the new Fedora 38. The test program printing some output via g_thread_pool_push runs on Debian 11, but freezes on Fedora 38 if the pool is created before the fork. Digging a bit further with the example program: Id Target Id Frame
If I understand correctly, threads are not duplicated during fork, so probably having a g_thread_pool_new before and g_thread_pool_push after a fork is a recipe for waiting forever. I assume some rearrangements in nbd-server are required.... |
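To illustrate the ordering discussed above, here is a minimal sketch of the "fork first, create threads afterwards" pattern. It deliberately uses plain POSIX threads instead of GThreadPool so that it has no GLib dependency, and it is not the nbd-server code. The names `struct job`, `worker` and `run_job_in_forked_child` are hypothetical and exist only for this example.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job: the worker doubles the input and stores the result. */
struct job {
    int input;
    int output;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    j->output = j->input * 2;
    return NULL;
}

/*
 * Fork first, then create the worker thread inside the child.
 * Only the thread that calls fork() exists in the child, so any
 * worker the child needs has to be started after the fork.
 * Returns the value computed by the child's worker, or -1 on error.
 */
int run_job_in_forked_child(int input)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: the worker is created here, after fork(). */
        struct job j = { input, 0 };
        pthread_t tid;

        close(fds[0]);
        if (pthread_create(&tid, NULL, worker, &j) != 0)
            _exit(1);
        pthread_join(tid, NULL);
        if (write(fds[1], &j.output, sizeof j.output) != (ssize_t)sizeof j.output)
            _exit(1);
        _exit(0);
    }

    /* Parent: read the result back over the pipe and reap the child. */
    close(fds[1]);
    int result = -1;
    ssize_t n = read(fds[0], &result, sizeof result);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t)sizeof result || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return result;
}
```

Calling `run_job_in_forked_child(21)` from a `main` function should hand back the doubled value computed in the child. On older toolchains the file may need to be built with `-pthread`. If the worker were instead started in the parent before `fork()`, the child would have no such thread, and work queued for it there would presumably wait forever, which matches the hang described in this thread.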
Okay, thanks for that. I don't think we do anything important with the thread pool before the fork, so moving it to be called after the fork should be doable. I'll look at that and publish a new version ASAP. |
I made a little patch for Fedora, leaving the pool_new in the same place if nofork and doing the pool_new after the fork if !nofork, plus an extra parameter to get the number of threads in that function. It compiled and passed all tests. |
Can you share your patch and/or open a pull request? |
We created the thread pool at the main initialization, before forking off a child. This used to work just fine, but as of GLib 2.76, this no longer works due to changes internal to the implementation of GThreadPool. Since we don't need to use the thread pool before the fork() call anyway, stop trying to do so and avoid the problem altogether.

Closes: gh-146
Signed-off-by: Wouter Verhelst <w@uter.be>
Thanks a lot! |
NetworkBlockDevice/nbd#146 git-svn-id: file:///srv/repos/svn-community/svn@1448938 9fca08f4-af9d-4005-b8df-a31f2cc04f65
Sorry for the delay. I see the problem has been solved in the meantime. This latest version has some more differences from the Fedora-provided version 3.24. Based on the Fedora version 3.24, merely moving the g_thread_pool_new call to its location in the latest version breaks the inetd test. The patch below passes all tests during the build of the Fedora 38 rpm.
|
Any chance we have a new release? |
The update to GLib 2.75.3 broke nbd ("Log limit exceeded"). NixOS@8e5ee71 Upstream issue: NetworkBlockDevice/nbd#146 This commit also applies a second patch from the upstream development repository so that the patch that fixes the issue applies cleanly.
Yes, it's high on my todo list now. However, before that can happen, I have a few loose ends to finish up; the structured replies patch broke the data logging in the transaction log, and on top of that nbd-trdump also needs to be taught about structured replies. I also don't feel confident enough that the patch which I applied will work sufficiently well; I'd like to do some more in-depth testing. I'm hoping to be able to push that out sometime next week. Of course, if you feel up to it, help with patches or extra testing is always welcome 😉 |
Scratch that, I found time today. Release should be upcoming soon. |
Have been wondering why my clients fail to boot via PXE, where NBD hangs... After some debugging it turns out the client is fine, but the server behaves badly due to an update of glib2 to version 2.76.0. No idea what change caused this... I will try to dig deeper.