New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BAP playback to TWS headset is desynchronized #515
Comments
@pv Im reading the following part of the spec and it looks like we need to sync the sequence number somehow:
That said normally we would use HCI_LE_Read_ISO_TX_Sync to sync up the sequence number but I don't think that is implemented in AX210 firmware, note that we need to increment the sequence number even when there is no data. |
The user application probably could set some running sequence number in a CMSG, and we could then use those to align the packets for different CIS in same CIG on kernel side, and also assign new sequence numbers for the controller. Getting it right is probably easier to do on kernel side. I'm not sure if AX210 does something with the numbers. IIRC I tried aligning the sequence numbers in the two CIS (passing them from userspace in cmsg) earlier, but this had no effect. The timestamps did something, but not possible to put correct numbers to get audio without the TX Sync. Maybe need to try this again, it was some time ago. |
@pv Id leave this for the kernel to resolve since it might revolve into HCI traffic, and in case we move the socket around the stack so it is probably not a good idea to use CMSG for something so fundamental as sequence number, anyway if you tried that and it didn't make any difference, then maybe it is not related to that. |
Yeah, for the CMSG I mean that it would be useful to have some mechanism for indicating to the kernel which packets correspond to the same time in different CIS, to make it easier to synchronize the streams on socket level. Running sequence numbering for the packets (unrelated to whatever kernel decides to send to adapter) same in each CIS could be one way. |
@pv can you give it a try with following patch to see if it makes a difference: https://patchwork.kernel.org/project/bluetooth/patch/20230621205230.1006939-1-luiz.dentz@gmail.com/ |
Doesn't seem to change it. BlueZ (except transport.send) and Pipewire I think do not make use of the getsockopt ISO QoS latency/interval values obtained after connection, but use the ASE QoS values. I guess a suspicion was that user application should stuff N>0 extra packets to the sockets immediately when playback starts, so that kernel+adapter do not run out of data to send if there are timing glitches. I tried this earlier with different numbers of extra frames, but didn't seem to help with the desync. Writing many initial packets also seems to make it a bit worse. The kernel seems to do this:
even though the socket writes are in interleaved CIS1 CIS2 CIS1 CIS2 ... order. Probably there's no ordering guarantee. The ordering can also swap spontaneously as kernel sends ... CIS1 CIS2 CIS2 CIS1 ... or CIS1 CIS1 CIS2 CIS2. However, I have also HCI traces with fully CIS1 CIS2 CIS1 CIS2 ... interleaved traffic that but desynchronized playback, and traces with interleaving order swaps giving synchronized playback. So it's not clear this has something to do with it. |
We probably need to fix that though so the CIS Established values are used since that afaik is what the controller expects.
Well these seems to be sent at the same iteration though, Im thinking on doing some changes to the how we schedule since the current scheduler don't take into account the timestamps of the packets.
This is because of the current queuing discipline, it is basically a priority based one which in this case probably means there are 2 channels with the same priority so they will get each half of the buffers/credits at each scheduler iteration, which is not that great for timing sensitive packets. |
@pv Im using these files to with new btmon -a/--analyze code:
No wonder they get out of sync, although the vast majority are around 12-13 ms there are quite a few that are spread. |
I suspect this is when things start to derail:
I will probably need to add the frame number to make it clearer which are these responses. |
From what I recall, and can still reproduce (on AX210 Firmware timestamp 2023.24 buildtype 1 build 67068, Found device firmware: intel/ibt-0041-0041.sfi, Firmware Version: 252-24.23) is that the playback sync does not "slip" during playback and there are no sound glitches. Instead, the left/right streams are desynchronized immediately at playback start, so if it is scheduling issue then culprit is probably close to the start of the streaming. If user stops writing to the socket for some tens of ms, and then resumes sending packets, this can change the synchronization (either make it synchronized, or make it desynchronized). |
ISO streams in the same group should be synchronized, this is required e.g. by stereo audio playback in TWS headsets. Currently the only synchronization mechanism we use is that the user has to supply data for each stream at the right time and in right order, and then we send them to the controller ASAP without explicitly trying to keep streams in sync and hope for the best. However, it is observed e.g. with Intel AX210 that rather often a packet for one of the streams does not get sent but is left in the queue and postponed, even if submitted to controller queue at the same time (as far as able). After this, that ISO stream latency is persistently one packet larger relative to the other streams in the group, and e.g. TWS stereo audio is ruined. User sending packets to the ISO sockets cannot know this occurred, nor can it easily correct for it. Fix the problem by adding an explicit stream synchronization mechanism to ISO sockets, and handling stream synchronization in packet scheduling. The user may tag each packet sent to ISO socket via CMSG with a monotonically increasing sequence number. The kernel shall ensure that SDU timings within ISO group are such that packets with same CMSG sequence number are aligned. The implementation here does this by delaying sending some packets until the streams are aligned. It will also wait for controller queue to flush sufficiently, if it appears that streams are out of sync on the controller side. See the code for explanation. This patch fixes the observed TWS desynchronization issue (currently not completely, see below), if the sound server is modified to add the packet seq, https://gitlab.freedesktop.org/pvir/pipewire/-/commits/iso-seqnum I tried various other things to try to fix this (FIFO scheduling etc.), without much success. Requiring conn->sent alignment before sending more stuff to the controller seems necessary. There are a few things to think about, hence RFC: - Any better ideas? - The conn->sent alignment check probably should be somehow refined. It seems to be enough with low transport latency, but with larger transport latencies one can still get streams out of sync, characterized by the controller not reporting Number of Completed packets for all streams in one go, but sending completed packet events for some of the streams only a bit later. - This sort of scheduling thing that depends on controller queue or clock IMO needs to be done in kernel, user should just tell the kernel how stuff is supposed to be scheduled. - However, not clear if packet sequence number is the best way to for this sort of thing, or if user should specify some sort of timestamps or other information instead. E.g. we might change this to use SDU timestamps (if they can be used), but user would communicate the scheduling info still the same way. - Ideally we probably should use SDU timestamps to explicitly tell the controller which packets go to same interval, but at least when last tried AX210 didn't support HCI_LE_Read_ISO_TX_Sync, and it appears clock synchronization is required which cannot(?) be obtained without it. - This patch can make latency to grow on transmission glitch, if it needs to wait for controller queue to drain so that streams can be aligned again. For proper flow control especially in audio use cases, but also more generally, would be better to have for user a way to know the number of packets in queue, in socket buf, and in controller, and the max. TIOCOUTQ is similar, but does not give packet counts nor controller queue so is insufficient. Link: bluez/bluez#515 Signed-off-by: Pauli Virtanen <pav@iki.fi>
ISO streams in the same group should be synchronized, this is required e.g. by stereo audio playback in TWS headsets. Currently the only synchronization mechanism we use is that the user has to supply data for each stream at the right time and in right order, and then we send them to the controller ASAP without explicitly trying to keep streams in sync and hope for the best. However, it is observed e.g. with Intel AX210 that rather often a packet for one of the streams does not get sent but is left in the queue and postponed, even if submitted to controller queue at the same time (as far as able). After this, that ISO stream latency is persistently one packet larger relative to the other streams in the group, and e.g. TWS stereo audio is ruined. User sending packets to the ISO sockets cannot know this occurred, nor can it easily correct for it. The issue can e.g. be triggered by bad transmission (e.g. cover one of the TWS earbuds with hand, or move the device further away), which appears to make a random change to the latency of its stream. Fix the problem by adding an explicit stream synchronization mechanism to ISO sockets, and handling stream synchronization in packet scheduling. The user may tag each packet sent to ISO socket via CMSG with a monotonically increasing sequence number. The kernel shall ensure that SDU timings within ISO group are such that packets with same CMSG sequence number are aligned. The implementation here does this by delaying sending some packets until the streams are aligned. It will also wait for controller queue to flush sufficiently, if it appears that streams are out of sync on the controller side. See the code for explanation. This patch fixes the observed TWS desynchronization issue (currently not completely, see below), if the sound server is modified to add the packet seq, https://gitlab.freedesktop.org/pvir/pipewire/-/commits/iso-seqnum I tried various other things to try to fix this (FIFO scheduling, different number of packets submitted at the same time, etc.), without much success, and this is the first one to give results. Requiring conn->sent alignment before sending more stuff to the controller seemed the important step here. There are a few things to think about, hence RFC: - Any better ideas? The spec seems vague here (if we can't set SDU timestamps), so some approach that's not based on guessing what the controller internally does could be better. - The conn->sent alignment check probably should be somehow refined. It seems to be enough with low transport latency, but with larger transport latencies one can still get streams out of sync, characterized by the controller not reporting Number of Completed packets for all streams in one go, but sending completed packet events for some of the streams only a bit later. - This sort of scheduling thing that depends on controller queue or clock IMO needs to be done in kernel, user should just tell the kernel how stuff is supposed to be scheduled. - However, not clear if packet sequence number is the best way to for this sort of thing, or if user should specify some sort of timestamps or other information instead. E.g. we might change this to use SDU timestamps (if they can be used), but user would communicate the scheduling info still the same way. - Ideally we probably should use SDU timestamps to explicitly tell the controller which packets go to same interval, but at least when last tried AX210 didn't support HCI_LE_Read_ISO_TX_Sync, and it appears clock synchronization is required which cannot(?) be obtained without it. - This patch can make latency to grow on transmission glitch, if it needs to wait for controller queue to drain so that streams can be aligned again. For proper flow control especially in audio use cases, but also more generally, would be better to have for user a way to know the number of packets in queue, in socket buf, and in controller, and the max. TIOCOUTQ is similar, but does not give packet counts nor controller queue so is insufficient. Link: bluez/bluez#515 Signed-off-by: Pauli Virtanen <pav@iki.fi>
ISO streams in the same group should be synchronized, this is required e.g. by stereo audio playback in TWS headsets. Currently the only synchronization mechanism we use is that the user has to supply data for each stream at the right time and in right order, and then we send them to the controller ASAP without explicitly trying to keep streams in sync and hope for the best. However, it is observed e.g. with Intel AX210 that rather often a packet for one of the streams does not get sent but is left in the queue and postponed, even if submitted to controller queue at the same time (as far as able). After this, that ISO stream latency is persistently one packet larger relative to the other streams in the group, and e.g. TWS stereo audio is ruined. User sending packets to the ISO sockets cannot know this occurred, nor can it easily correct for it. The issue can e.g. be triggered by bad transmission (e.g. cover one of the TWS earbuds with hand, or move the device further away), which appears to make a random change to the latency of its stream. Fix the problem by adding an explicit stream synchronization mechanism to ISO sockets, and handling stream synchronization in packet scheduling. The user may tag each packet sent to ISO socket via CMSG with a monotonically increasing sequence number. The kernel shall ensure that SDU timings within ISO group are such that packets with same CMSG sequence number are aligned. The implementation here does this by delaying sending some packets until the streams are aligned. It will also wait for controller queue to flush sufficiently, if it appears that streams are out of sync on the controller side. See the code for explanation. This patch fixes the observed TWS desynchronization issue (currently not completely, see below), if the sound server is modified to add the packet seq, https://gitlab.freedesktop.org/pvir/pipewire/-/commits/iso-seqnum I tried various other things to try to fix this (FIFO scheduling, different number of packets submitted at the same time, etc.), without much success, and this is the first one to give results. Requiring conn->sent alignment before sending more stuff to the controller seemed the important step here. There are a few things to think about, hence RFC: - Any better ideas? The spec seems vague here (if we can't set SDU timestamps), so some approach that's not based on guessing what the controller internally does could be better. - The conn->sent alignment check probably should be somehow refined. It seems to be enough with low transport latency, but with larger transport latencies one can still get streams out of sync, characterized by the controller not reporting Number of Completed packets for all streams in one go, but sending completed packet events for some of the streams only a bit later. - This sort of scheduling thing that depends on controller queue or clock IMO needs to be done in kernel, user should just tell the kernel how stuff is supposed to be scheduled. - However, not clear if packet sequence number is the best way to for this sort of thing, or if user should specify some sort of timestamps or other information instead. E.g. we might change this to use SDU timestamps (if they can be used), but user would communicate the scheduling info still the same way. - Ideally we probably should use SDU timestamps to explicitly tell the controller which packets go to same interval, but at least when last tried AX210 didn't support HCI_LE_Read_ISO_TX_Sync, and it appears clock synchronization is required which cannot(?) be obtained without it. - This patch can make latency to grow on transmission glitch, if it needs to wait for controller queue to drain so that streams can be aligned again. For proper flow control especially in audio use cases, but also more generally, would be better to have for user a way to know the number of packets in queue, in socket buf, and in controller, and the max. TIOCOUTQ is similar, but does not give packet counts nor controller queue so is insufficient. Link: bluez/bluez#515 Signed-off-by: Pauli Virtanen <pav@iki.fi>
ISO streams in the same group should be synchronized, this is required e.g. by stereo audio playback in TWS headsets. Currently the only synchronization mechanism we use is that the user has to supply data for each stream at the right time and in right order, and then we send them to the controller ASAP without explicitly trying to keep streams in sync and hope for the best. However, it is observed e.g. with Intel AX210 that rather often a packet for one of the streams does not get sent but is left in the queue and postponed, even if submitted to controller queue at the same time (as far as able). After this, that ISO stream latency is persistently one packet larger relative to the other streams in the group, and e.g. TWS stereo audio is ruined. User sending packets to the ISO sockets cannot know this occurred, nor can it easily correct for it. The issue can e.g. be triggered by bad transmission (e.g. cover one of the TWS earbuds with hand, or move the device further away), which appears to make a random change to the latency of its stream. Fix the problem by adding an explicit stream synchronization mechanism to ISO sockets, and handling stream synchronization in packet scheduling. The user may tag each packet sent to ISO socket via CMSG with a monotonically increasing sequence number. The kernel shall ensure that SDU timings within ISO group are such that packets with same CMSG sequence number are aligned. The implementation here does this by delaying sending some packets until the streams are aligned. It will also wait for controller queue to flush sufficiently, if it appears that streams are out of sync on the controller side. See the code for explanation. This patch fixes the observed TWS desynchronization issue (currently not completely, see below), if the sound server is modified to add the packet seq, https://gitlab.freedesktop.org/pvir/pipewire/-/commits/iso-seqnum I tried various other things to try to fix this (FIFO scheduling, different number of packets submitted at the same time, etc.), without much success, and this is the first one to give results. Requiring conn->sent alignment before sending more stuff to the controller seemed the important step here. There are a few things to think about, hence RFC: - Any better ideas? The spec seems vague here (if we can't set SDU timestamps), so some approach that's not based on guessing what the controller internally does could be better. - The conn->sent alignment check probably should be somehow refined. It seems to be enough with low transport latency, but with larger transport latencies one can still get streams out of sync, characterized by the controller not reporting Number of Completed packets for all streams in one go, but sending completed packet events for some of the streams only a bit later. - This sort of scheduling thing that depends on controller queue or clock IMO needs to be done in kernel, user should just tell the kernel how stuff is supposed to be scheduled. - However, not clear if packet sequence number is the best way to for this sort of thing, or if user should specify some sort of timestamps or other information instead. E.g. we might change this to use SDU timestamps (if they can be used), but user would communicate the scheduling info still the same way. - Ideally we probably should use SDU timestamps to explicitly tell the controller which packets go to same interval, but at least when last tried AX210 didn't support HCI_LE_Read_ISO_TX_Sync, and it appears clock synchronization is required which cannot(?) be obtained without it. - This patch can make latency to grow on transmission glitch, if it needs to wait for controller queue to drain so that streams can be aligned again. For proper flow control especially in audio use cases, but also more generally, would be better to have for user a way to know the number of packets in queue, in socket buf, and in controller, and the max. TIOCOUTQ is similar, but does not give packet counts nor controller queue so is insufficient. Link: bluez/bluez#515 Signed-off-by: Pauli Virtanen <pav@iki.fi>
Applications sending ISO packets need to know latency on kernel and controller side accurately, so that they can maintain it at a desired target value. Moreover, Intel AX210 (and maybe others) do not appear to successfully use SDU timings or sequence numbers to maintain synchronization between CIS in same CIG, and streams often stay out of sync persistently even though SDUs for each CIS are sent to ISO sockets and to the controller exactly at the same time. As these controllers have broken LE Read ISO TX Sync, it's not very clear in the spec how the synchronization is supposed to work. Add new ioctl SIOCBTREADTXINFO, which provides a reference time when a packet was last sent and the length of the queue at that time. From this information and packet send times it records, user application can reconstruct the total tx latency between submitting a packet to a socket, and controller declaring it completed. This patch does not implement support for LE Read ISO TX Sync. However, the ioctl() is such that it shall be possible to implement it also via LE Read ISO TX Sync. Moreover, it may be useful for audio use cases to implement it also for L2CAP later. Link: bluez/bluez#515 Signed-off-by: Pauli Virtanen <pav@iki.fi>
Use TX timestamps to get accurate reading of queue length and latency on kernel + controller side. This is new kernel BT feature, so requires kernel with the necessary patches, available currently only in bluetooth-next/master branch. Enabling Poll Errqueue kernel experimental Bluetooth feature is also required for this. Use the latency information to mitigate controller issues where ISO streams are desynchronized due to tx problems or spontaneously when some packets that should have been sent are left sitting in the queue, and transmission is off by a multiple of the ISO interval. This state is visible in the latency information, so if we see streams in a group have persistently different latencies, drop packets to resynchronize them. Also make corrections if the kernel/controller queues get too long, so that we don't have too big latency there. Since BlueZ watches the same socket for errors, and TX timestamps arrive via the socket error queue, we need to set BT_POLL_ERRQUEUE in addition to SO_TIMESTAMPING so that BlueZ doesn't think TX timestamps are errors. Link: bluez/bluez#515 Link: https://lore.kernel.org/linux-bluetooth/cover.1710440392.git.pav@iki.fi/ Link: https://lore.kernel.org/linux-bluetooth/f57e065bb571d633f811610d273711c7047af335.1712499936.git.pav@iki.fi/
With the TX latency patches, this is now mitigated on application side. |
I'm seeing BAP playback to a TWS headset (Samsung Galaxy Buds2 Pro + Intel AX210) left and right earbuds to be desynchronized immediately at playback start. Looking at audio recorded by external mic, it seems the playback on left and right earbuds is off by 10ms.
This occurs sometimes, quite often when the devices connect after taking them out of their case. Disconnect + reconnect fixes the issue, but sometimes it needs to be done more than once to get synchronized playback.
Looking at the HCI traces, I did not see the sound server (Pipewire) doing something strange, and BlueZ/kernel problem also seems unlikely, but uploading here mainly to have some place to refer to for the files.
Headset: Samsung Galaxy Buds2 Pro
Controller: Intel AX210
Bluez: 6bafe1c
Kernel: 6.3.1 + https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/commit/?id=d4b20f0b8491bfb3238ebf7df4b13243189620d2
Pipewire: https://gitlab.freedesktop.org/pvir/pipewire/-/commit/068a132203a9896ff69c0f561406355c86fbdcd5 (master/0.3.70 is also same, this is just to make absolutely sure exactly same data is written to both CIS)
This has occurred also on all previous versions of the software/firmware I've tested, also on bluetooth-next kernel (e6e576ec4e728b201a8013).
Input signal: ax210-samsung-gbuds2pro-tws-desync-3-input.flac.log
HCI trace 1 (desynchronized): ax210-samsung-gbuds2pro-tws-desync-3.log
External mic (close to both earbuds) recording 1: ax210-samsung-gbuds2pro-tws-desync-3.flac.log
HCI trace 2 (desynchronized): ax210-samsung-gbuds2pro-tws-desync-4.log
External mic (close to both earbuds) recording 2: ax210-samsung-gbuds2pro-tws-desync-4.flac.log
You can hear in the recordings that the blip sounds are slightly separated, and it is obvious in stereo. Inspecting the waveforms one can see there's a 10ms offset.
For comparsion (I don't see any suspicious difference however in the traces however):
HCI trace 3 (synchronization ok): ax210-samsung-gbuds2pro-tws-ok.log
The text was updated successfully, but these errors were encountered: