Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USBIPd lockup about once a month #729

Open
rhermsen opened this issue Oct 14, 2023 · 13 comments
Open

USBIPd lockup about once a month #729

rhermsen opened this issue Oct 14, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@rhermsen
Copy link

I'm using usbipd between Windows-11 and WSL2-Ubuntu to connect a serial device to WSL2.
For the most part it works great, but about once a month it locks up.
I have a Python program running that obtains data from the USB connected device every 5 seconds.
To recover from this lockup I have to detach/attach (usbipd wsl detach --busid 1-4 / usbipd wsl attach --busid 1-4).
And than, after a restart of my program it works again.
There is no need to powercycle the device with USB (serial) port.
Dates seen a lockup so far (so often far apart):

2023-06-22 15:46
2023-06-26 23:48
2023-08-09 19:15
2023-09-17 15:34
2023-10-10 01:44

I looked at
https://github.com/dorssel/usbipd-win/wiki/Troubleshooting
and this time I started with trace level logging.
I see that the logging is without timestamps, so I wonder if this logging would be helpful for my issue?
Any additional advice how I could debug this issue?

Version details:
Windows:

systeminfo | findstr /B /C:"OS Name" /B /C:"OS Version"
OS Name:                   Microsoft Windows 11 Pro
OS Version:                10.0.22000 N/A Build 22000
usbipd --version
3.2.0+38.Branch.master.Sha.7ad7b9904ac0db8d759e3a53a22b1126da01b2be

WSL2:

$ cat /etc/os-release | grep VERSION=
VERSION="22.04.3 LTS (Jammy Jellyfish)"
$ uname -r
5.15.90.1-microsoft-standard-WSL2
$ apt list --installed hwdata linux-tools*
Listing... Done
hwdata/jammy,now 0.357-1 all [installed]
linux-tools-5.15.0-60-generic/now 5.15.0-60.66 amd64 [installed,local]
linux-tools-5.15.0-60/now 5.15.0-60.66 amd64 [installed,local]
linux-tools-5.15.0-86-generic/jammy-updates,jammy-security,now 5.15.0-86.96 amd64 [installed,automatic]
linux-tools-5.15.0-86/jammy-updates,jammy-security,now 5.15.0-86.96 amd64 [installed,automatic]
linux-tools-common/jammy-updates,jammy-security,now 5.15.0-86.96 all [installed,automatic]
linux-tools-virtual/jammy-updates,jammy-security,now 5.15.0.86.83 amd64 [installed]
$ update-alternatives --display usbip
usbip - auto mode
  link best version is /usr/lib/linux-tools/5.15.0-60-generic/usbip
  link currently points to /usr/lib/linux-tools/5.15.0-60-generic/usbip
  link usbip is /usr/local/bin/usbip
/usr/lib/linux-tools/5.15.0-60-generic/usbip - priority 20

USB device/driver:
A particulate matter meter (fijnstof meter).
https://github.com/msillano/tuyaDAEMON/wiki/custom-device-'PM-detector':-case-study

$ lsusb -s 001:005
Bus 001 Device 005: ID 1a86:7523 QinHeng Electronics CH340 serial converter
@dorssel
Copy link
Owner

dorssel commented Oct 14, 2023

Any additional advice how I could debug this issue?

https://github.com/dorssel/usbipd-win/wiki/Troubleshooting also shows how you can create captures (PCAPs). That's the most detailed logging available. You can also capture on the Linux side (usbmon), but it's a bit harder to set up. syslog messages are also helpful.

@dorssel
Copy link
Owner

dorssel commented Dec 6, 2023

4.0.0 has been released which fixes a (very small) memory leak. This could explain your issue. Can you confirm?

@rhermsen
Copy link
Author

Thanks for the update.
I installed version 4.0.0 on 7-Dec.
At that time 3.2.0 was running for 6 wks.

@dorssel dorssel added the bug Something isn't working label Dec 18, 2023
@dorssel
Copy link
Owner

dorssel commented Feb 4, 2024

Any news on this?

@rhermsen
Copy link
Author

rhermsen commented Feb 4, 2024

Hi,
Uptime is now: 'active (running) since Wed 2024-01-03 13:45:54 CET; 1 month 1 day ago'
(I did had to restart a few times after installing 4.0.0 for other reasons ...)

@dorssel
Copy link
Owner

dorssel commented Feb 4, 2024

Are you monitoring the memory usage of the usbipd service?

@rhermsen
Copy link
Author

rhermsen commented Feb 4, 2024

Hi,
No I was not..., only overall memory usage (which goes up and down due to WSL2).
Just added a graph to MRTG for usbipd.exe

@rhermsen
Copy link
Author

rhermsen commented Mar 2, 2024

Update on a likely unrelated occurrence.

On 2024-02-29 at 20:28 the PM-Monitor stopped working.
This was after 1 month and 26 days uptime (started 2024-01-03 13:45:54).
This time with a different failure-mode than for which I opened the ticket.
I cannot exclude that this got raised due to a connection glitch (e.g. connector accidentally touched).
The Win11 event log also show some logs at the time of the issue.

Memory usage looks to increase from 10MB to about 11MB at the moment the issue occurred (2024-02-29 20:26:48 CET).
Because the CH340-based serial-USB device was detached from usbipd.exe the setup stopped (no auto restart).
After attaching the CH340-based serial-USB device and restarting PM-Monitor process (2024-03-01 19:19:55 CET) I see another memory usage increase from 11MB to 12MB.
I have a graph of this ~20% memory usage increase. usbipd.exe wasn't restarted, so maybe it is normal to have a bit more memory in use?

Because I didn't have to restart usbipd.exe for this occurrence I assume it to be a different issue than for which I opened the ticket.

PM-Monitor restart scripts

startUSBipD.bat

@echo off
echo "Starting usbipd..."

:search
tasklist | find "wsl.exe"
IF %ERRORLEVEL% == 0 GOTO :found
TIMEOUT /T 1
GOTO :search

:found
TIMEOUT /T 3
python C:\Users\rhermsen\usbipd_start.py
TIMEOUT /T 30
wsl.exe -u root service pm_monitor restart
echo "Done"

usbipd_start.py

# -*- coding: utf-8 -*-
"""
# execute usbipd at startup
#
# rhermsen
# version 2.0
# 
Created on Fri Jun 16 21:20:07 2023


Available commands upto 3.2.0:
    usbipd wsl list
    usbipd wsl
        gives usage info
    usbipd wsl list
    usbipd wsl attach -h
    usbipd wsl detach -h
Available commands from 4.0.0:
    usbipd list
    usbipd
        gives usage info
    usbipd list
    usbipd attach -h
    usbipd detach -h


@author: rhermsen

"""

import subprocess

def executeCliCmd(command):
    '''
    Parameters
    ----------
    command : string, The full command line, including the parameters to execute.
    
        DESCRIPTION.
        Executes a command with optional parameters. The resulting output is returned in a list of strings.

    Returns
    -------
    List of strings where each line is an entry in this list.
    '''
    p = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
    out, err = p.communicate()
    outString = str(out)
    # result = outString.split('\\r\\n')
    # print(result)
    return outString.split('\\r\\n')



def findBusidAndState(list, match='CH340'):
    attached = False
    busid = None
    for line in list:
        wordList = line.split()
        if match in wordList:
            busid = wordList[0]
            if "Attached" in wordList:
                attached = True
            # print("busid =", busid, "attached", attached)
    return busid, attached

list_command = 'usbipd list'
usb_list = executeCliCmd(list_command)

busid, attached = findBusidAndState(usb_list)
attach_command = f'usbipd attach --wsl --busid {busid}'
#print("attach_command:",attach_command)
if busid != None and attached == False:
    executeCliCmd(attach_command)
elif attached == True:
    print("PM-Monitor USB already attached to WSL2")
else:
    print("Could not attach PM-Monitor.")

MRTG USBIPd memory usage

Logging

WSL

Error messages in journalctl:

Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341-uart ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: connection reset by peer
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: stop threads
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: release socket
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: disconnect device
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: usb 1-1: USB disconnect, device number 2
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: usb 1-1: failed to send control message: -19
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341-uart ttyUSB0: ch341-uart converter now disconnected from ttyUSB0
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341 1-1:1.0: device disconnected

similar in dmesg:

[Thu Feb 29 20:27:01 CET 2024] ch341-uart ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: connection reset by peer
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: stop threads
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: release socket
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: disconnect device
[Thu Feb 29 20:27:01 CET 2024] usb 1-1: USB disconnect, device number 2
[Thu Feb 29 20:27:01 CET 2024] usb 1-1: failed to send control message: -19
[Thu Feb 29 20:27:01 CET 2024] ch341-uart ttyUSB0: ch341-uart converter now disconnected from ttyUSB0
[Thu Feb 29 20:27:01 CET 2024] ch341 1-1:1.0: device disconnected

Win11 event logs

(sometimes with a few events before the issue, and more after the issue):
USBIPd.exe event 2024-02-29 2026.evtx:
USBIPd.exe event 2024-02-29 2026.evtx.zip
Win Application logs USBIPd.exe event 2024-02-29 2026.evtx:
Win Application logs USBIPd.exe event 2024-02-29 2026.evtx.zip
Win System logs USBIPd.exe event 2024-02-29 2026.evtx:
Win System logs USBIPd.exe event 2024-02-29 2026.evtx.zip

@dorssel
Copy link
Owner

dorssel commented Mar 2, 2024

Thanks for the elaborate report.

Memory usage seems correct. usbipd-win is written in C#, a managed language using a garbage collector. It's normal for a new connection to have a temporary increase in memory consumption. It will take a while before the garbage collector will kick in.

The memory bug that I fixed earlier leaked 40 bytes per URB + a kernel event handle. I could see it in memory consumption clearly after about 100.000 URBs. I guess in your case it was the kernel handles that caused the original crash. I went up to 1 million URBs, but couldn't actually reproduce the crash. Can you guestimate your URB-rate? If that is indeed > 1 million per month it could explain the original crash.

With the current behavior, I think we can safely say the original problem is fixed.

@rhermsen
Copy link
Author

rhermsen commented Mar 3, 2024

Is it only in one direct this issue was present?

My stab at guestimate the URB-rate:
I have a tcpdump capture and assuming each URB ends up in a separate TCP segment I count

  • Win > WSL 10 segments from which 1 TCP segment without data (e.g. TCP Ack only) = 9
  • WSL > Win 12 segments from which 3 TCP segment without data (e.g. TCP Ack only) = 9
  • Same amount of data is requested every 5 seconds
  • ( 9 / 5 ) * ( 60 * 60 * 24 * 365) /12 = 4730400 URB/month in each direction

@dorssel
Copy link
Owner

dorssel commented Mar 3, 2024

Is it only in one direct this issue was present?

Usually, for each URB there is one USBIP request and one USBIP reply. It also makes sense that a single "poll" from your software consists of several URBs. Of course, when either the request or the reply doesn't fit in the MTU there will be several TCP fragments. But as a "guestimate", the 4 million makes sense.

@rhermsen
Copy link
Author

rhermsen commented Mar 7, 2024

To comment on your last update:
Indeed I see a single poll to be split in several URB's each in its own TCP segment. There is no fragmentation (largest packet is 146 MB only).

@rhermsen
Copy link
Author

rhermsen commented Mar 7, 2024

Unfortunately there was another similar occurrence as on 2024-02-29.

On 2024-03-07 11:04 the PM-Monitor stopped working in a similar way as on 2024-02-29 at 20:28.
This was after only 5 days uptime (started 2024-03-01 19:19:55).
This time nobody was near the device.
The Win11 event log doesn't show much this time...

Category: Usbipd.ConnectedClient
EventId: 2

Client 192.168.188.87 released device at 1-4 (USB\VID_1A86&PID_7523\5&9BB4271&0&4).

Memory usage was the same as after last week issue, 12MB until I restarted USBIPd.exe.
But after I restarted with debug and packet capture (pcap) enabled it jumped to 22MB.
(I assume a bit this is due debug and/or a packet capture running).

MRTG USBIPd memory usage 1

I hope the capture isn't growing too fast, but from the 'guestimate' it probably will...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants