[libbladeRF] Device locks up when simultaneously tuning freq and RX'ing #351
This has only been replicated on USB 3.0. However, that may simply be a matter of transfers occurring more quickly, so I do not yet want to claim that it's a USB 3.0-only issue. I've added the libbladeRF_test_freq_hop(.exe) program to master to replicate this. The following is what I have been using to reproduce this:
In Linux, I seem to run into some timeouts and severe lockups. More debugging is required, but things seem to be stuck within libusb code. In the past, I have seen this occur when transfers were corrupted due to race conditions. In Windows, I hit this assertion failure because the transfer is marked in-flight. More digging is required, but this smells like a race, perhaps due to the hackish grossness involving the stream lock here.
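To make the suspected race concrete, here is a minimal C sketch, assuming a hypothetical ring of transfers whose per-transfer state is touched both by the thread submitting transfers and by the completion callback. `struct xfer_ring`, `ring_transition`, `worker`, and `stress` are illustrative names, not libbladeRF code. The point it shows is that the state check and the state update happen under one lock, so neither side can act on a stale "in-flight" flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for illustration; not libbladeRF's structures. */
enum xfer_state { XFER_AVAIL, XFER_IN_FLIGHT };

#define NUM_XFERS 4

struct xfer_ring {
    pthread_mutex_t lock;             /* guards every field below */
    enum xfer_state state[NUM_XFERS];
    size_t in_flight;
};

static void ring_init(struct xfer_ring *r)
{
    pthread_mutex_init(&r->lock, NULL);
    for (size_t i = 0; i < NUM_XFERS; i++) {
        r->state[i] = XFER_AVAIL;
    }
    r->in_flight = 0;
}

/* Check-and-set under one lock: the submitter and the completion path
 * can never both act on the same stale view of a transfer's state. */
static bool ring_transition(struct xfer_ring *r, size_t i,
                            enum xfer_state from, enum xfer_state to)
{
    bool ok;
    pthread_mutex_lock(&r->lock);
    ok = (r->state[i] == from);
    if (ok) {
        r->state[i] = to;
        if (to == XFER_IN_FLIGHT) {
            r->in_flight++;
        } else {
            r->in_flight--;
        }
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}

struct worker_args {
    struct xfer_ring *ring;
    size_t iterations;
    enum xfer_state from, to;
};

/* Walks the ring in order, retrying until each slot is in the `from` state. */
static void *worker(void *p)
{
    struct worker_args *a = p;
    for (size_t k = 0; k < a->iterations; k++) {
        while (!ring_transition(a->ring, k % NUM_XFERS, a->from, a->to)) {
            /* spin: the other thread has not reached this slot yet */
        }
    }
    return NULL;
}

/* Runs a "submit" thread and a "complete" thread against one ring and
 * returns the number of transfers still marked in flight afterwards. */
static size_t stress(size_t iterations)
{
    struct xfer_ring ring;
    pthread_t submit, complete;
    struct worker_args s = { &ring, iterations, XFER_AVAIL, XFER_IN_FLIGHT };
    struct worker_args c = { &ring, iterations, XFER_IN_FLIGHT, XFER_AVAIL };

    ring_init(&ring);
    pthread_create(&submit, NULL, worker, &s);
    pthread_create(&complete, NULL, worker, &c);
    pthread_join(submit, NULL);
    pthread_join(complete, NULL);
    pthread_mutex_destroy(&ring.lock);
    return ring.in_flight;   /* both threads joined, safe to read unlocked */
}
```

The busy-wait in `worker` only keeps the example short; real code would block on a condition variable. `stress(n)` returning 0 means every submission was matched by exactly one completion.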
In Windows, it appears that I am able to induce this with When the aforementioned assertion failure is hit, it is because we expect transfer I will commit a band-aid for this, such that we will simply use the next available transfer. However, I would like to keep this observation in the back of my mind for when we dig into issue #328. Below is a patch to log transfer events, should anyone else care to try to replicate this. (And admittedly, it gives me a place to find it again. ;) )
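The logging patch itself is not reproduced here. As a hypothetical stand-in, the C sketch below assumes transfers are submitted in ring order and checks each completion against the index expected next; `struct order_log`, `order_log_event`, and `count_out_of_order` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical completion-order checker; names are illustrative only. */
struct order_log {
    size_t num_transfers;  /* size of the transfer ring */
    size_t expected;       /* index we expect to complete next */
    size_t out_of_order;   /* completions that did not match `expected` */
};

static void order_log_init(struct order_log *log, size_t num_transfers,
                           size_t first)
{
    log->num_transfers = num_transfers;
    log->expected = first % num_transfers;
    log->out_of_order = 0;
}

/* Call from the completion callback with the index of the finished transfer. */
static void order_log_event(struct order_log *log, size_t idx)
{
    if (idx != log->expected) {
        log->out_of_order++;
        fprintf(stderr, "transfer %zu completed, expected %zu\n",
                idx, log->expected);
    }
    /* Transfers are submitted in ring order, so the expectation
     * advances by one per completion regardless of what arrived. */
    log->expected = (log->expected + 1) % log->num_transfers;
}

/* Feeds a recorded sequence through the checker; returns the mismatch count. */
static size_t count_out_of_order(const size_t *seq, size_t len,
                                 size_t num_transfers)
{
    struct order_log log;
    if (len == 0) {
        return 0;
    }
    order_log_init(&log, num_transfers, seq[0]);
    for (size_t i = 0; i < len; i++) {
        order_log_event(&log, seq[i]);
    }
    return log.out_of_order;
}
```

Assuming a 16-transfer ring, the sequence 6, 7, 8, 10, 9, 11, 12 reports the single swap of 9 and 10 as two mismatches, and the checker is back in step from 11 onward.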
The assert() failures in Windows from issue #351 appear to be the result of libusb transfer completion callbacks arriving out of order. This seems to happen pretty reliably after an event that momentarily spikes CPU load when using higher sample rates. This has been reproduced on Windows 7 and 8.1 with libusb 1.0.19. The Cypress backend is not affected.

When we expect transfer N to be available next, I'm seeing that transfer (N + 1) % num_transfers is actually coming back as the next available transfer. For example, we get transfers ..., 6, 7, 8, 10, 9, 11, 12, ...

This patch is a band-aid that allows us to simply use the next available transfer, instead of assuming which transfer it will be. While this should keep things running instead of grinding to a halt [1], it does *not* address the underlying issue. Out-of-order transfer completion indicates that users may receive buffers out of order, which is unacceptable. As such, we will need to keep digging, and I will register a new issue regarding libusb giving us out-of-order transfer callbacks.

[1] Arguably, now that we're aware of this, a transient bad buffer is better than an entire application crashing.
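A minimal C sketch of the band-aid, assuming each transfer carries a simple available/in-flight flag. `enum xfer_state` and `next_available` are hypothetical names, not the actual libbladeRF implementation. Instead of asserting that transfer N is available, it scans forward from N and takes the first transfer that has completed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-transfer state; not libbladeRF's actual definitions. */
enum xfer_state { XFER_AVAIL, XFER_IN_FLIGHT };

/*
 * Original assumption: assert(state[expected] == XFER_AVAIL).
 * Band-aid: starting at `expected`, return the first transfer that is
 * available, wrapping around the ring. Returns -1 if none has completed.
 */
static long next_available(const enum xfer_state *state, size_t num_transfers,
                           size_t expected)
{
    for (size_t k = 0; k < num_transfers; k++) {
        size_t i = (expected + k) % num_transfers;
        if (state[i] == XFER_AVAIL) {
            return (long)i;
        }
    }
    return -1;
}
```

For example, on a 4-transfer ring where transfer 3 completes before transfer 2, `next_available(state, 4, 2)` returns 3. This keeps the stream moving, but it can hand buffers back out of order, so it masks the completion-order problem rather than fixing it.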
It has been reported that libbladeRF intermittently locks up when simultaneously tuning the frequency and RX'ing samples via the synchronous interface, more frequently with higher sample rates/bandwidths.
I'll be investigating this further and tracking the status on this here.