[BUG] Corrupted data packet at SRT receiver output when using encryption / decryption #2502

Closed
mvandenbroecke opened this issue Oct 28, 2022 · 12 comments · Fixed by #2541
Labels: [core], Priority: High, Type: Bug

@mvandenbroecke

The transport stream delivered by the SRT receiver occasionally has a continuity count (CC) error.

Steps to reproduce the behavior:

  1. Deliver a transport stream through an SRT sender/receiver pair with AES encryption enabled.
  2. Monitor the transport stream for continuity count errors over a long period.
  3. A CC error can occur at the encryption key rotation period (every 2^24 SRT packets), so the error interval depends on the transport stream bitrate.
  4. No packets are lost, dropped, or retransmitted when this happens.

Expected behavior: no CC errors.

OS: Linux
SRT Version: 1.4.4

mvandenbroecke added the Type: Bug label on Oct 28, 2022
@ethouris
Collaborator

If you are talking about the continuity count error in MPEG-TS or DVB-SI streams as defined by TR 101 290, then one thing about SRT is important: the only way the input stream can be modified at the output is when SRT drops packets that were abandoned because they could not meet their delivery time (as required by the TLPKTDROP mechanism). Apart from that, SRT has no mechanism that would modify the transmitted data in any way, encrypted or not, and certainly nothing that targets exactly the 4th byte of the payload. SRT doesn't even know what kind of data it transmits.
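For reference, the continuity counter lives in the low 4 bits of the 4th byte of each 188-byte TS packet header. A minimal parser for that 4-byte header might look like the sketch below; the struct and function names are hypothetical, and it ignores the transport error, payload-start, priority, and scrambling bits.

```cpp
#include <cassert>
#include <cstdint>

// Selected fields of the 4-byte MPEG-TS packet header.
struct TsHeader {
    bool syncOk;   // byte 0 must be the sync byte 0x47
    uint16_t pid;  // 13-bit packet identifier (low 5 bits of byte 1 + byte 2)
    uint8_t afc;   // adaptation_field_control, 2 bits (byte 3, bits 5-4)
    uint8_t cc;    // continuity_counter, 4 bits (byte 3, bits 3-0)
};

// p must point to at least 4 bytes: the start of one TS packet.
TsHeader parseTsHeader(const uint8_t* p) {
    assert(p != nullptr);
    TsHeader h;
    h.syncOk = p[0] == 0x47;
    h.pid = static_cast<uint16_t>(((p[1] & 0x1F) << 8) | p[2]);
    h.afc = static_cast<uint8_t>((p[3] >> 4) & 0x3);
    h.cc = static_cast<uint8_t>(p[3] & 0x0F);
    return h;
}
```

A header of `47 01 00 1A`, for example, decodes to PID 256, adaptation_field_control 1 (payload only), and CC 10.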

If you can show us an example of an input stream that is perfectly correct on the input (at least as far as the continuity bits are concerned) and that SRT somehow modified during transport, causing errors, we can investigate. Without this, we don't even know where to start looking.

@mvandenbroecke
Author

Yes, this is about the CC count in an MPEG-TS stream. AES encryption does change the data, right?
With the default encryption key period of 2^24 RTP packets, we see CC errors at intervals of 8831 s for a 20 Mbps TS. When a CC error is seen, the time between this error and the previous one is always a multiple of 8831 s; no CC errors are seen in between. When we change the key period in the code to 2^17 RTP packets, the interval between errors is always a multiple of 69 s (= 8831/128) for the same 20 Mbps TS rate. So this proves there's a dependency on the key rotation. Maybe the first packet after a key rotation is sometimes decrypted with the wrong key?
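The 8831 s figure follows from simple arithmetic. This sketch assumes each SRT packet carries one RTP packet with 7 × 188-byte TS packets and treats the quoted bitrate as the TS rate (RTP/UDP/SRT header overhead ignored); the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bits in one 188-byte MPEG-TS packet.
constexpr double kTsPacketBits = 188.0 * 8.0;

// Seconds between key switches, given the key period in SRT packets
// (kmrefreshrate), the TS bitrate in bit/s, and TS packets per SRT packet.
double keyPeriodSeconds(uint64_t packetsPerKey, double tsBitrateBps, int tsPerPacket = 7) {
    assert(tsBitrateBps > 0.0 && tsPerPacket > 0);
    const double packetsPerSecond = tsBitrateBps / (tsPerPacket * kTsPacketBits);
    return static_cast<double>(packetsPerKey) / packetsPerSecond;
}
```

Under these assumptions, 2^24 packets at 20 Mbps come to about 8831.5 s, 2^17 packets at 20 Mbps to about 69 s, and 2^17 packets at 100 Mbps to about 13.8 s, in line with the periods reported in this thread.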

@ethouris
Collaborator

ethouris commented Nov 2, 2022

If the problem were with decryption, you would see these packets as missing. The statistics structure also has a pktRcvUndecryptTotal field that counts packets that could not be decrypted.

The encryption key change mechanism executes the procedure in three steps: PREANNOUNCE, SWITCH, DECOMMISSION.

The SWITCH event happens after the number of packets configured by the kmrefreshrate option has been transmitted. The kmpreannounce option defines the distance, in packets, between SWITCH and each of the other two events. During the period between PREANNOUNCE and DECOMMISSION, two keys are active and one of them is used for decryption; the SWITCH event changes which one.
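The timeline can be modeled with a few lines of arithmetic. This is a simplified, hypothetical model, not SRT's implementation: it assumes the stream starts on the even key, counts packets from 0, and places PREANNOUNCE and DECOMMISSION exactly kmpreannounce packets before and after each SWITCH. The assertion mirrors the SRT documentation's limit of kmpreannounce to at most (kmrefreshrate - 1) / 2.

```cpp
#include <cassert>
#include <cstdint>

enum class KeyParity { Even, Odd };

struct KeyState {
    KeyParity active;  // key used to encrypt packet n
    bool twoKeysLive;  // true between PREANNOUNCE and DECOMMISSION
};

// n: 0-based index of the packet being sent.
// refresh: kmrefreshrate, preannounce: kmpreannounce (both in packets).
KeyState keyStateAt(uint64_t n, uint64_t refresh, uint64_t preannounce) {
    assert(refresh > 0 && preannounce > 0 && 2 * preannounce < refresh);
    const uint64_t switches = n / refresh;  // SWITCH events already passed
    const uint64_t offset = n % refresh;    // position inside the current key period
    KeyState s;
    s.active = (switches % 2 == 0) ? KeyParity::Even : KeyParity::Odd;
    // Just after a SWITCH (until DECOMMISSION) the old key is still installed;
    // just before the next SWITCH (after PREANNOUNCE) the new key is already installed.
    const bool afterSwitch = switches > 0 && offset < preannounce;
    const bool beforeSwitch = offset >= refresh - preannounce;
    s.twoKeysLive = afterSwitch || beforeSwitch;
    return s;
}
```

With kmrefreshrate=1024 and kmpreannounce=256, for instance, packets 768 to 1023 are still encrypted with the even key while the odd key is already installed, packet 1024 is the first odd-key packet, and the even key stays installed until packet 1279.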

It would still help if you showed us the configuration of your transmission; maybe we can figure out where to look. If you experience CC errors, then the affected PES packets are likely incomplete and contain further errors. But if this is somehow caused by SRT, you should usually be losing 7 TS packets for each lost UDP packet. The continuity counter is 4 bits, so for up to 15 missing TS packets you should be able to find out precisely how many network packets are missing.
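The CC arithmetic can be sketched as follows. It assumes the counter is tracked per PID on packets that carry payload, and that each lost UDP datagram held 7 TS packets of the monitored PID (real datagrams may mix PIDs, in which case the per-PID gap is smaller). Function names are illustrative.

```cpp
#include <cassert>

// TS packets missing between two consecutive packets of the same PID, judged
// only from their 4-bit continuity counters, so losses are known modulo 16.
// An unchanged counter is treated as a legal duplicate packet (0 missing).
int missingTsPackets(int prevCc, int curCc) {
    assert(prevCc >= 0 && prevCc < 16 && curCc >= 0 && curCc < 16);
    const int step = ((curCc - prevCc) % 16 + 16) % 16;
    return step == 0 ? 0 : step - 1;
}

// True if the TS loss corresponds to a whole number of lost datagrams.
bool lossMatchesWholeDatagrams(int missingTs, int tsPerDatagram = 7) {
    assert(tsPerDatagram > 0);
    return missingTs > 0 && missingTs % tsPerDatagram == 0;
}
```

For example, a jump from CC 3 to CC 11 means 7 TS packets are missing, which is consistent with exactly one lost 7-packet datagram; a jump from 14 to 6 wraps around and gives the same result.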

@mvandenbroecke
Author

Thanks for your feedback. The configuration is:
RTP input transport stream -> SRT sender -> SRT receiver -> RTP output transport stream.
AES-128 encryption/decryption is used in SRT sender/receiver.
The CC errors are seen at the RTP output transport stream: a full RTP packet (with 7 TS packets) is sporadically missing.
The interval between two consecutive missing packets clearly follows HAICRYPT_DEF_KM_REFRESH_RATE (as indicated before).
Again, the issue does not happen at every key rotation, and I need 100+ sender/receiver configurations running on our system to see it.
I'm now trying to capture (with tcpdump) all traffic entering and leaving the SRT receiver's server port during a missing-packet event.

@mvandenbroecke
Author

The RTP transport stream capture at the output of the SRT receiver confirms there's a missing RTP packet (and CC errors).
To trigger the issue more frequently, I changed the TS bitrate to 100 Mbps instead of 20 Mbps and changed HAICRYPT_DEF_KM_REFRESH_RATE from 0x1000000 to 0x20000 for srt-live-transmit.
This gives a key change every 2^17 packets, or every 13.8s for 100 Mbps TS rate (with 7 TS packets per RTP packet).
For 60 streams (SRT sender and receiver combinations), a missing packet is seen very frequently. The time between consecutive events is: 83s, 41s, 55s,... all at exact multiples of the key change period (13.8s).
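Checking that the reported gaps are whole multiples of the key period is a rounding exercise. This hypothetical helper rounds each gap to the nearest number of periods and reports the residual; since the gaps were quoted in whole seconds, residuals under about 0.5 s are what one would expect.

```cpp
#include <cassert>
#include <cmath>

struct MultipleFit {
    long multiple;     // nearest whole number of key periods
    double residualS;  // observed gap minus multiple * period, in seconds
};

MultipleFit fitToPeriod(double gapSeconds, double periodSeconds) {
    assert(periodSeconds > 0.0);
    const long k = std::lround(gapSeconds / periodSeconds);
    return {k, gapSeconds - static_cast<double>(k) * periodSeconds};
}
```

With a 13.8 s period, the gaps of 83 s, 41 s, and 55 s fit 6, 3, and 4 periods with residuals of roughly +0.2 s, -0.4 s, and -0.2 s.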
With the exact same test configuration but without encryption/decryption, there are no missing packets seen at all.
A capture of the SRT traffic on the receiver side server port triggered by a missing RTP packet confirms (Wireshark) that this happens during a key change.
As the data is encrypted, I cannot determine whether the missing RTP sequence number was already lost on the sender side.
If the number of streams is reduced to 20, the issue still happens, but much less frequently: around every 10 to 20 minutes.
Would you be able to reproduce this somehow?
I could also add debug tracing in the code if you let me know what to trace and where.
I will also update the bug title.
Thanks, Michel.

mvandenbroecke changed the title from "[BUG] Transport stream continuity count error for SRT receiver" to "[BUG] Lost data packet at SRT receiver output when using encryption / decryption" on Nov 10, 2022
@mvandenbroecke
Author

In a new test, we changed the code so that it does not encrypt/decrypt the first 16 bytes of the data, keeping the RTP headers (including the sequence numbers) in the clear in our SRT traffic capture. With this change, during a CC error event, we see no missing RTP sequence number in the SRT capture around the key change. More importantly, every CC error logged by our monitoring system is accompanied by a sync byte error. So our previous conclusion that an RTP packet is missing is probably not correct; the capture may have dropped it because of an unrecognized RTP version or an incorrect first sync byte (to be confirmed).
So the issue is probably triggered because, now and then, the wrong key is used to decrypt the packet after a key change.
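A packet decrypted with the wrong key turns into essentially random bytes, which is why it shows up as a bad RTP version and bad sync bytes rather than as a sequence gap. A minimal validator, assuming a 12-byte RTP header (no CSRCs or header extension) followed by whole 188-byte TS packets; the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kRtpHeaderLen = 12;  // fixed RTP header, no CSRC/extension
constexpr std::size_t kTsPacketLen = 188;
constexpr uint8_t kTsSyncByte = 0x47;

struct RtpTsCheck {
    bool rtpVersionOk = false;  // RTP version (top 2 bits of byte 0) == 2
    int badSyncBytes = 0;       // TS packets whose first byte is not 0x47
};

RtpTsCheck checkRtpTsPacket(const std::vector<uint8_t>& pkt) {
    RtpTsCheck r;
    if (pkt.size() < kRtpHeaderLen) return r;
    r.rtpVersionOk = (pkt[0] >> 6) == 2;
    // Walk the whole TS packets that follow the RTP header.
    for (std::size_t off = kRtpHeaderLen; off + kTsPacketLen <= pkt.size(); off += kTsPacketLen) {
        if (pkt[off] != kTsSyncByte) ++r.badSyncBytes;
    }
    return r;
}
```

Running such a check on the receiver output would flag a wrongly decrypted packet even when the RTP sequence numbers show no gap.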

@mvandenbroecke
Author

A capture of the receiver output with the original code (which encrypts all data) indeed shows a malformed RTP packet; I had previously overlooked that in Wireshark. This confirms the mismatched-key theory.

mvandenbroecke changed the title from "[BUG] Lost data packet at SRT receiver output when using encryption / decryption" to "[BUG] Corrupted data packet at SRT receiver output when using encryption / decryption" on Nov 10, 2022
@mvandenbroecke
Author

To debug further, we added the encryption key data to the last bytes of the RTP packet, and with that we found that the problem is introduced on the SRT sender side. This can be seen in a capture of the SRT packets around the CC error event: in the last SRT packet with encryption status "encrypted with odd key", the key data is already the key data for the next packet, whose encryption status is "encrypted with even key". Do you have enough information to investigate?
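The "encrypted with odd/even key" status shown by Wireshark comes from the 2-bit KK field of the SRT data packet header. This sketch decodes it, assuming the header layout from the SRT protocol specification (second 32-bit word: PP(2) O(1) KK(2) R(1) message number(26), big-endian). The parity-marker comparison is a hypothetical stand-in for the key data appended to the payload here, not the author's actual debug code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values of the 2-bit KK field (11b is not used in data packets).
enum class KeyFlag : uint8_t { Clear = 0, Even = 1, Odd = 2, ControlOnly = 3 };

// The KK bits are bits 4-3 of header byte 4 (the first byte of the second word).
KeyFlag kkFromHeader(const uint8_t* hdr, std::size_t len) {
    assert(hdr != nullptr && len >= 16);  // SRT data packet header is 16 bytes
    return static_cast<KeyFlag>((hdr[4] >> 3) & 0x3);
}

// Hypothetical debug check: does the header flag agree with a parity marker
// (0 = even, 1 = odd) that the sender copied into the payload?
bool kkMatchesMarker(KeyFlag kk, uint8_t markerParity) {
    return (kk == KeyFlag::Even && markerParity == 0) ||
           (kk == KeyFlag::Odd && markerParity == 1);
}
```

A packet whose header says odd key while its payload marker already says even is the kind of mismatch described in this comment.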

@maxsharabayko
Collaborator

Proposed Test Case

Use srt-xtransmit (maxsharabayko/srt-xtransmit#59) built with -DENABLE_HAICRYPT_LOGGING=ON to see key refresh events in the log.

Network: localhost with artificial impairments: 1% packet loss and 50 ms RTT.

The KM refresh period can be set using the SRTO_KMREFRESHRATE socket option (kmrefreshrate in the SRT URI).

(receive)
srt-xtransmit receive "srt://:4200?passphrase=abcdefghijk" --enable-metrics --logfa "HAICRYPT" -v

(send)
srt-xtransmit generate "srt://127.0.0.1:4200?passphrase=abcdefghijk&kmrefreshrate=1024" --enable-metrics \
    --sendrate 1Mbps -v --logfa "HAICRYPT" --loglevel note

On the sender side the KM refresh events can be found by the following log lines:

11:47:25.050000/T19568.N:SRT.hc: KM[0] Pre-announced
11:47:30.421000/T19568.N:SRT.hc: KM[0] Activated
11:47:35.828000/T19568.N:SRT.hc: KM[1] Deprecated

On the receiving side, srt-xtransmit prints the following log messages, including the MD5 checksum and packet length validation results.

[I] RECEIVE Latency, us: avg 125979, min 123590, max 127155. Jitter: 254us. Delay Factor: 3564us.
    Pkts: rcvd 116998, reordered 0 (dist 0), lost 0, wrong checksums 0, wrong lengths 0

@mvandenbroecke
Author

Hi Max, thanks for your suggestion. I don't think checking the MD5 checksum and length will bring more information. As I indicated above, the capture of the SRT receiver output showed a malformed RTP packet with the correct length. It was malformed because it was decrypted with a non-matching key, resulting in corrupted RTP data.
I have attached a document illustrating the incorrect behavior introduced on the SRT sender side. It shows 3 consecutive SRT packets around the key switch for UDP destination port 20008 (for which a corrupted RTP packet was seen on the receive side): packet numbers 56688, 56689, and 56711 (they are not consecutive because the capture contained packets for 20 SRT channels). For packet 56689, the encryption status does not match the key data (added in the last bytes of the payload for debug purposes). Any idea how this can happen?
Sender_Encryption_Error.docx

maxsharabayko added the Priority: High and [core] labels on Nov 15, 2022
maxsharabayko added this to the v1.6.0 milestone on Nov 15, 2022
@maxsharabayko
Collaborator

Hi @mvandenbroecke
I confirm the issue. I suspect the cause is the wrong key (odd/even) being used for decryption, as you pointed out.
Further debugging is needed to confirm and understand how this happens.

@maxsharabayko
Collaborator

maxsharabayko commented Nov 17, 2022

The RcvQ Thread switches the active key (odd/even):

CUDT::processCtrlAck(..)
    CUDT::checkSndTimers(..)
        CCryptoControl::sendKeysToPeer(..) // Key refresh

The SndQ Thread first sets the KK flags of a data packet and then encrypts the packet, potentially with a different key if a KM refresh happened in between:

CUDT::packUniqueData(..)
    CCryptoControl::encrypt(..)
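The race can be reproduced deterministically in a toy model: reading the active key twice (once for the KK flags, once for encryption) lets a switch in between split the two, while taking a single snapshot keeps them consistent. This is a hypothetical sketch with made-up names, not SRT's code and not necessarily how #2541 fixes it; the `betweenSteps` hook stands in for the RcvQ thread running `sendKeysToPeer` at that moment.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Parity 0 = even key, 1 = odd key.
struct KeyControl {
    std::atomic<int> activeParity{0};
    void switchKey() { activeParity.fetch_xor(1); }  // done by the "RcvQ" side
};

struct Packet {
    int kkFlag = -1;         // parity announced in the header (KK bits)
    int encryptedWith = -1;  // parity of the key actually used
};

// Racy pattern: the parity is read twice, so a switch in between splits them.
Packet packRacy(KeyControl& kc, const std::function<void()>& betweenSteps) {
    Packet p;
    p.kkFlag = kc.activeParity.load();
    betweenSteps();  // another thread may switch keys here
    p.encryptedWith = kc.activeParity.load();
    return p;
}

// Consistent pattern: snapshot the parity once and use it for both steps.
Packet packConsistent(KeyControl& kc, const std::function<void()>& betweenSteps) {
    Packet p;
    const int parity = kc.activeParity.load();
    p.kkFlag = parity;
    betweenSteps();
    p.encryptedWith = parity;
    return p;
}
```

Snapshotting is safe in this model only because, per the PREANNOUNCE/SWITCH/DECOMMISSION scheme described earlier in the thread, the previous key stays installed until DECOMMISSION, so a packet flagged and encrypted with the old key just after a SWITCH can still be decrypted.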
