telnet console randomly stops responding #1335
Comments
What version of QEMU are you running? I had this problem when we upgraded to a recent version but the problem went away when we went back to 2.5.0. |
Hmmm, it looks like I'm using 2.11.1. `dpkg --list | grep qemu` shows `ii ipxe-qemu 1.0.0+git-20180124.fbe8c52d-0ubuntu2 all PXE boot firmware - ROM images for qemu`, and `qemu-x86_64 --version` reports `qemu-x86_64 version 2.11.1 (Debian 1:2.11+dfsg-1ubuntu7.1)`. I'll look into downgrading to 2.5. Thanks! |
Thank you for that comment. I have not had the time to downgrade, as there
is no cut-and-paste solution for that. If a fix comes out before I
downgrade, please let me know and I'll test it.
…On Mon, Jun 11, 2018, 9:09 AM Bernhard Ehlers ***@***.***> wrote:
As GNS3 wants to use Ubuntu 18.04 for the GNS3VM (GNS3/gns3-vm#93
<GNS3/gns3-vm#93>) and @PatrickBostwick
<https://github.com/PatrickBostwick> uses exactly this OS, I find that
this bug is extremely important. If it's really an issue in QEMU and can't
be fixed by a workaround, then the GNS3VM needs to ship its own version of
QEMU, and GNS3 should issue a warning against using QEMU 2.11.
I think that just closing it is not appropriate.
|
Hi all, I'm getting this same error with Ubuntu 16.04 and KVM 2.5. I'm trying to gather more info: `ii ipxe-qemu 1.0.0+git-20150424.a25a16d-1ubuntu1.2 all PXE boot firmware - ROM images for qemu`. What is strange is that I can see (via tcpdump) that PuTTY has connected to python, but no data is showing up. It seems VNC consoles don't have this problem. I think I also noticed too many IRQs on a serial port on a different VM. I can't log in to the VM remotely yet, but I want to start adding to this. |
Playing around with tcpdump and strace, I verified I can see the telnet session going into python. I ran strace against python and noticed I can see data coming back from the remote console. `egrep '6758 recvfrom(18' output.txt` shows:
6758 recvfrom(18, "\n\33[H\33[JThis system is for authorized use only.\r\nlogin: ", 262144, 0, NULL, NULL) = 55
So this is the login prompt for the remote box. I don't really see much going on between recvfrom calls:
6758 recvfrom(18, "Logi", 262144, 0, NULL, NULL) = 4
I'm not sure where to proceed; my Unix troubleshooting skills are about at an end. |
Oh, and I SSHed directly to the host. Nothing odd is showing up in dmesg or /var/log/*. |
BTW, I'm running GNS3 2.1.11 on bare-metal Ubuntu 16.04 with kernel 4.15.0-42-generic. |
Well... I got brave: I looked up gdb debugging with Python, attached gdb, ran a `bt`, exited out, and the console started working. |
I didn't get any soft-lockup messages in dmesg. Also, if KVM were the issue, I would expect that strace-ing (is that a verb?) gns3server wouldn't have shown the login console and all that jazz. I feel like the issue, or at least the one I'm reporting, is on the gns3server side. Could be wrong. I also don't understand why stopping gns3server via gdb would have fixed it somehow; I mean, if KVM was going bonkers, how could python be the Advil? I'm going to upgrade stuff to 18.04 and see what happens. BTW, my gns3server is a Dell R620 - E5-2660. I'll report back on how that goes. |
I don't think kvm is the problem either. Have you tried different Qemu VMs or a different terminal? |
It seemed to follow the VM. Like, I could power off said VM, turn it back on,
and it would still be happening.
I just upgraded said system to Ubuntu 18.04. I'll play with it over the
weekend and report back if I see it again. Should I see it come up again, any
pointers on how to fire up gdb or anything else?
…On Thu, Dec 13, 2018 at 9:03 PM Jeremy Grossmann ***@***.***> wrote:
I don't think kvm is the problem either. Have you tried different Qemu VMs
or a different terminal?
|
Oh, and to be clear: I had 2 VMs built from the same image attached to the same network. One worked, the other didn't. Now they both seem to be fine. Odd. |
Seems like it hasn't come back since upgrading. I'll update the bug report if it pops up again. |
Ok, please post here if it happens again. Thanks. |
We seem to be having this identical issue on our install: Ubuntu 16.04.3, QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.33), Copyright (c) 2003-2008 Fabrice Bellard, gns3server version 2.1.0.
Reproduction: it's not immediate, and a restart of the emulated router does clear the problem for a few hours, but it always comes back, and once it starts happening it keeps happening (and thus the console is unusable) until the router VM is restarted.
When the problem occurs, I telnet to the host running the VM (my session is socket 18 in the trace below). gns3server relays to the router VM successfully via socket 37 and gets a response back from socket 37, but never sends that response back out socket 18 to my client. An `strace -p` shows that when I hit Enter on the telnet session, the Enter is relayed and a "Login:" prompt comes back from the router VM, but it is never relayed by the gns3server process:
[pid 2702] epoll_wait(3, [{EPOLLIN, {u32=18, u64=6583065115254325266}}], 29, 100) = 1
lsof of sockets 18 and 37: $ sudo lsof -i -a -p 2702 | grep 37u
The matching local port 5001 binding: |
For what it's worth, we have a sneaking suspicion (nothing yet to prove this is the case) that the issue may be related to a firewall between my laptop and the server that may be removing TCP sessions from its mapping table after a couple of hours. The server is showing additional connections still open to port 5000, but those telnet sessions are long since gone:
gns3serve 2702 gns3 11u IPv4 1009532 0t0 TCP (server):5000->(other laptop):53368 (ESTABLISHED)
I left socket 18 (my telnet session) open overnight, and this morning the telnet client still acted connected until I hit Enter, when I almost immediately got "Connection closed by foreign host." However, AFTER that, socket 18 is STILL showing ESTABLISHED on the server, with identical output as above.
Are TCP keepalives enabled on the TCP socket between the Telnet connection-sharing proxy on GNS3 and the various clients (the laptop->GNS3 session)? It doesn't look like it. I'm not familiar enough yet with asyncio to figure out how to set it in that library, although I know how I'd do it with Python's low-level socket call. That would help at least clean up the hanging sockets; I'm not sure what chaos this may be causing inside gns3. I'm rooting through telnet_server.py and base_node.py trying to sort through it, but work keeps pulling me away. |
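For reference, the low-level socket call alluded to above would look roughly like this. This is a minimal sketch, not gns3server code; the `enable_keepalive` helper name is made up for illustration:

```python
import socket

def enable_keepalive(sock: socket.socket) -> None:
    """Ask the kernel to send TCP keepalive probes on this connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Example on a plain TCP socket:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # → True
s.close()
```

With keepalive enabled, the kernel will eventually probe an idle connection and tear down sockets whose peer has silently vanished, which is exactly the "hanging ESTABLISHED sockets" symptom above.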
Thanks for troubleshooting this. I have added the socket options SO_KEEPALIVE and TCP_NODELAY which hopefully should help/fix this issue. |
Thanks - I snagged the patch and applied it to one of our compute nodes. I should hopefully know in a day or so if this was the fix or not. |
Excellent, that would be awesome to confirm if the patch fixes this issue. |
@blargh2015 have you had a chance to test? :) |
So far it looks good - our test VMs on one of our compute nodes continues to work through today. Various CLI commands also indicate the hanging sockets aren't sticking around anymore. I'm going to apply it to all of our compute nodes later today and make sure our other engineers are hammering it. tl;dr: Looking good so far - let me spread it wider and let it soak another couple days before the 'all clear'. |
So a couple days in, and things look good - seems like that patch nailed the problem. I think we're OK to close the issue. |
Thanks for your help 👍 |
Hi, I am facing the same issue (same socket transport error in the server log), with a fresh GNS3 server 2.2.5 install on a bare-metal server. Ubuntu 18.04.4 LTS and QEMU 2.11. |
I'm still seeing it as well with 2.2.3. @VarunDelft - bb284d0 is the commit that has the patch; it should already be in 2.2.5. My question is that I don't understand how this fixes anything. I'm by no means a python expert, but it looks like a socket object has been instantiated without being used? Not following how this is setting the socket options on the asyncio side. |
Yes, the patch is there in 2.2.5. I checked the content of the |
Oh, I just re-read the python code. I get it now: `socket` is a ref to the writer object that was passed in. |
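The mechanism, as I read it, is roughly the following. This is a sketch of the idea, not the verbatim gns3server patch, and `tune_client_socket` is a made-up name: an asyncio `StreamWriter` exposes its underlying socket object via `get_extra_info("socket")`, and the options are set directly on that.

```python
import socket

def tune_client_socket(network_writer) -> None:
    # The StreamWriter's transport hands back the real socket object,
    # so SO_KEEPALIVE / TCP_NODELAY can be set on it directly; no new
    # socket is created, which is why the patch looks deceptively short.
    sock = network_writer.get_extra_info("socket")
    if sock is not None:  # can be None for non-socket transports
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

So nothing is "instantiated without being used": the local name is just a reference to the socket already owned by the writer's transport.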
I was reading, and it looks like maybe TCP keepalives don't start firing off for 2 hours by default, and this isn't configurable via a socket call on Linux. I think 1 hour is what most cool kids are configuring for a timeout on stateful firewalls. https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html I'm using that to lower the timers and will see if that helps things at all. |
@spikefishjohn Thanks for sharing the link. I have decreased the time to 3600 s in my environment now. Let's see how it goes. What are your observations?
Isn't this what the patch is trying to do as well?
So far I haven't seen any other issues since enabling the timers using the usingkeepalive.html link. 3600 might be too close to the default TCP timeout; I would do 30 minutes or maybe even 15, basically anything below any stateful firewall timer. The patch makes it so keepalives CAN be sent, but the rate at which they are sent isn't configured at the socket level, only at the OS level. |
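For what it's worth, Linux does also expose per-socket overrides for the probe timers (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`), so the rate can be tuned per connection instead of system-wide via sysctl. A sketch with example values (15 min idle, probe every 60 s, give up after 5 misses); the helper name and the numbers are mine, and the constants are Linux-specific:

```python
import socket
import sys

def tune_keepalive_timers(sock: socket.socket) -> None:
    # Enable keepalive, then override the system defaults for this socket only.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform.startswith("linux"):
        # Per-socket equivalents of the net.ipv4.tcp_keepalive_* sysctls:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 900)  # idle secs before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # secs between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # missed probes before reset
```

Tuning it per socket avoids changing keepalive behavior for every other process on the host.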
In my case, I'm still observing the issue in a few devices. So I'll probably need to try reducing the time interval and see if it helps. |
Hi, I have the same issue on GNS3 2.2.35.1. Decreasing the TCP timers didn't help. Do you have any advice on how to get rid of this problem? |
@e-cnelis @VarunDelft @blargh2015 @PatrickBostwick I know this is a super old bug, but I think the real fix for this is now in the latest release of GNS3. Patch here
Telnet console randomly stops responding, requiring a reboot of the device to get it to respond. Networking between devices seems to still work when this happens, however, and I can even telnet or SSH into the device whose console no longer works. Below is the log output. I am running Ubuntu 18.04 x86_64 with 32 GB of RAM.
2018-05-20 13:55:07 ERROR base_events.py:1266 Fatal read error on socket transport
protocol: <asyncio.streams.StreamReaderProtocol object at 0x7f08c580a4e0>
transport: <_SelectorSocketTransport fd=43 read=polling write=<polling, bufsize=2493>>
Traceback (most recent call last):
File "/usr/lib/python3.6/asyncio/selector_events.py", line 723, in _read_ready
data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
2018-05-20 13:55:07 ERROR base_events.py:1266 Fatal read error on socket transport
protocol: <asyncio.streams.StreamReaderProtocol object at 0x7f08c695b6a0>
transport: <_SelectorSocketTransport fd=48 read=polling write=<polling, bufsize=2493>>
Traceback (most recent call last):
File "/usr/lib/python3.6/asyncio/selector_events.py", line 723, in _read_ready
data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
2018-05-20 13:55:07 ERROR base_events.py:1266 Task exception was never retrieved
future: <Task finished coro=<AsyncioTelnetServer.run() done, defined at /usr/share/gns3/gns3-server/lib/python3.6/site-packages/gns3server/utils/asyncio/telnet_server.py:188> exception=TimeoutError(110, 'Connection timed out')>
Traceback (most recent call last):
File "/usr/share/gns3/gns3-server/lib/python3.6/site-packages/gns3server/utils/asyncio/telnet_server.py", line 197, in run
yield from self._process(network_reader, network_writer, connection)
File "/usr/share/gns3/gns3-server/lib/python3.6/site-packages/gns3server/utils/asyncio/telnet_server.py", line 259, in _process
data = coro.result()
File "/usr/lib/python3.6/asyncio/streams.py", line 634, in read
yield from self._wait_for_data('read')
File "/usr/lib/python3.6/asyncio/streams.py", line 464, in _wait_for_data
yield from self._waiter
File "/usr/lib/python3.6/asyncio/selector_events.py", line 723, in _read_ready
data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out