latency spikes #3337

Closed
totaam opened this issue Nov 1, 2021 · 9 comments
Assignees: totaam
Labels: bug (Something isn't working), network

Comments


totaam commented Nov 1, 2021

These latency spikes have also been reported on other network connections with better latency and bandwidth, and without the html5 client.
-d xpra.server.source.source_stats:

12,879 record_latency: took 1003.4 ms round trip, 1003.3 for echo,   35.8 for decoding of   856800 pixels,    28922 bytes sent over the network in  967.6 ms,  967.5 ms for echo
12,879 process ui packet damage-sequence
12,879 record_latency: took  928.2 ms round trip,  928.1 for echo,   23.0 for decoding of      450 pixels,      963 bytes sent over the network in  905.2 ms,  905.1 ms for echo
12,885 processing packet damage-sequence
12,885 process ui packet damage-sequence
12,885 record_latency: took  926.0 ms round trip,  925.8 for echo,   27.7 for decoding of   856800 pixels,    31941 bytes sent over the network in  898.3 ms,  898.1 ms for echo

Although this was tested using the html5 client, the problem may well lie on the server side: the server should not let the latency degrade that badly.
The latency is normally around 50ms (rising to 150ms when connecting with the html5 client!) over an 802.11g connection, but it sometimes climbs to 1500ms or more!


The decode_time was set incorrectly by the html5 client (fixed in Xpra-org/xpra-html5@dd6fcfd and Xpra-org/xpra-html5@c5e6415), which makes this harder to debug with client versions that predate those fixes.
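
For context, here is a minimal sketch of how the round-trip figures above decompose, using illustrative names rather than xpra's actual API: the "network" share is simply whatever is left of the echo time once the client-reported decoding time is subtracted, so a bogus decode_time from the client skews every column in these logs.

```python
from dataclasses import dataclass

@dataclass
class AckSample:
    # all times in milliseconds; field names are illustrative, not xpra's actual ones
    round_trip: float   # damage packet sent -> damage-sequence ack processed
    send_time: float    # time spent writing the packet to the socket
    decode_time: float  # decoding time reported by the client in its ack

def breakdown(sample: AckSample) -> dict:
    """Split a round trip into send / network / decode shares.
    If the client reports a wrong decode_time, the 'network' share absorbs
    the error, which is what made these logs harder to interpret."""
    echo = sample.round_trip - sample.send_time
    network = max(0.0, echo - sample.decode_time)
    return {"send": sample.send_time, "network": network, "decode": sample.decode_time}

# e.g. the first log line above: 1003.4ms round trip, 35.8ms decoding -> ~967.5ms on the network
print(breakdown(AckSample(round_trip=1003.4, send_time=0.1, decode_time=35.8)))
```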

totaam added the bug (Something isn't working) and network labels on Nov 1, 2021
totaam self-assigned this on Nov 1, 2021

totaam commented Nov 1, 2021

The fact that ack packets seem to arrive bundled together and that the decoding time is low points towards network issues (I was stressing the network with downloads at the time):

16,194 record_latency: took  596.1 ms round trip,    0.1 to send,  596.0 for echo,    5.4 for decoding of    90000 pixels,    10657 bytes sent over the network in  590.7 ms,  590.6 ms for echo
16,212 compress:   3.6ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   2.9%  (  351KB to    10KB), sequence   338, client_options={}
16,244 compress:   3.4ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   2.9%  (  351KB to    10KB), sequence   339, client_options={}
16,276 compress:   3.4ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   3.0%  (  351KB to    10KB), sequence   340, client_options={}
16,308 compress:   3.7ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   3.0%  (  351KB to    10KB), sequence   341, client_options={}
16,342 compress:   3.5ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   2.9%  (  351KB to    10KB), sequence   342, client_options={}
16,424 compress:   3.4ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   2.9%  (  351KB to    10KB), sequence   343, client_options={}
16,580 compress:   3.3ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   1.3%  (  351KB to     4KB), sequence   344, client_options={}
16,611 pillow.encode('png', XShmImageWrapper(BGRX: 0, 0, 499, 316), 100, 33, True, False, None)
16,617 compress:   6.1ms for  499x316  pixels at    0,0    for wid=1     using       png with ratio   2.5%  (  615KB to    15KB), sequence    52, client_options={}
16,823 compress:   3.3ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   1.3%  (  351KB to     4KB), sequence   345, client_options={}
16,876 record_latency: took 1244.6 ms round trip,    0.0 to send, 1244.6 for echo,   26.8 for decoding of    90000 pixels,    10535 bytes sent over the network in 1217.8 ms, 1217.8 ms for echo
16,901 record_latency: took 1238.3 ms round trip,    0.0 to send, 1238.3 for echo,   54.1 for decoding of    90000 pixels,    10443 bytes sent over the network in 1184.2 ms, 1184.2 ms for echo
16,903 record_latency: took 1206.9 ms round trip,    0.0 to send, 1206.9 for echo,   56.2 for decoding of    90000 pixels,    10557 bytes sent over the network in 1150.7 ms, 1150.7 ms for echo
16,903 record_latency: took 1174.4 ms round trip,    0.0 to send, 1174.3 for echo,   56.7 for decoding of    90000 pixels,    10366 bytes sent over the network in 1117.7 ms, 1117.6 ms for echo
16,903 record_latency: took 1142.5 ms round trip,    0.1 to send, 1142.5 for echo,   56.6 for decoding of    90000 pixels,    10652 bytes sent over the network in 1085.9 ms, 1085.9 ms for echo
16,904 record_latency: took 1111.5 ms round trip,    0.0 to send, 1111.4 for echo,   56.8 for decoding of    90000 pixels,    10760 bytes sent over the network in 1054.7 ms, 1054.6 ms for echo
16,904 record_latency: took 1078.8 ms round trip,    0.0 to send, 1078.8 for echo,   57.0 for decoding of    90000 pixels,    10580 bytes sent over the network in 1021.8 ms, 1021.8 ms for echo
16,904 record_latency: took 1044.8 ms round trip,    0.0 to send, 1044.8 for echo,   57.6 for decoding of    90000 pixels,    10595 bytes sent over the network in  987.2 ms,  987.2 ms for echo
16,911 record_latency: took 1019.9 ms round trip,    0.2 to send, 1019.7 for echo,   60.1 for decoding of    90000 pixels,    10495 bytes sent over the network in  959.8 ms,  959.6 ms for echo
16,973 record_latency: took 1050.8 ms round trip,    0.1 to send, 1050.8 for echo,    2.3 for decoding of    90000 pixels,    10466 bytes sent over the network in 1048.5 ms, 1048.5 ms for echo
17,114 record_latency: took 1159.8 ms round trip,    0.0 to send, 1159.8 for echo,    6.8 for decoding of    90000 pixels,    10592 bytes sent over the network in 1153.0 ms, 1153.0 ms for echo
17,119 pillow.encode('png', XShmImageWrapper(BGRX: 0, 0, 499, 316), 100, 50, True, False, None)
17,125 compress:   6.2ms for  499x316  pixels at    0,0    for wid=1     using       png with ratio   2.5%  (  615KB to    15KB), sequence    53, client_options={}
17,145 record_latency: took 1157.9 ms round trip,    0.1 to send, 1157.8 for echo,    3.7 for decoding of    90000 pixels,    10374 bytes sent over the network in 1154.2 ms, 1154.1 ms for echo
17,160 compress:   3.5ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   1.3%  (  351KB to     4KB), sequence   346, client_options={}
17,180 record_latency: took 1161.3 ms round trip,    0.0 to send, 1161.2 for echo,    3.3 for decoding of    90000 pixels,    10687 bytes sent over the network in 1158.0 ms, 1157.9 ms for echo
17,203 record_latency: took 1152.0 ms round trip,    0.0 to send, 1152.0 for echo,    4.0 for decoding of    90000 pixels,    10751 bytes sent over the network in 1148.0 ms, 1148.0 ms for echo

So perhaps the only thing we need to change is how quickly we adapt and lower our bandwidth usage.
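
A minimal sketch of the kind of faster adaptation suggested here, with assumed thresholds rather than xpra's actual heuristics: back off the quality and speed targets sharply as soon as the measured round trip pulls well clear of the recent baseline, and only creep back up when the link looks healthy again.

```python
def adjust_targets(quality: int, speed: int, rtt_ms: float, baseline_ms: float) -> tuple[int, int]:
    """Illustrative only: back off quickly on a latency spike, recover slowly.
    In xpra terms, lower 'quality' shrinks the frames and lower 'speed' lets the
    encoder spend more time compressing them (smaller output)."""
    if rtt_ms > 2 * baseline_ms:        # spike: shed bandwidth aggressively
        quality = max(10, quality - 30)
        speed = max(10, speed - 30)
    elif rtt_ms < 1.2 * baseline_ms:    # healthy: creep back up slowly
        quality = min(100, quality + 2)
        speed = min(100, speed + 2)
    return quality, speed
```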


totaam commented Nov 2, 2021

Running glxgears at the default resolution of 300x300:

07,036 compress:   4.6ms for  300x300  pixels at    0,0    for wid=2     using      jpeg with ratio   3.1%  (  351KB to    11KB), sequence    20, client_options={}
07,050 record_latency: took   13.1 ms round trip,    0.0 to send,   13.0 for echo,    2.2 for decoding of    90000 pixels,    11316 bytes sent over the network in   10.9 ms,   10.8 ms for echo

Then adding bandwidth contention by sending a huge video file to the client over the same link (going in the same direction as the xpra data packets):

11,966 record_latency: took   14.6 ms round trip,    0.1 to send,   14.6 for echo,    2.9 for decoding of    90000 pixels,    11639 bytes sent over the network in   11.7 ms,   11.7 ms for echo
11,971 compress:   3.1ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   3.1%  (  351KB to    10KB), sequence  1442, client_options={}
(..)
12,935 compress:   4.3ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   1.3%  (  351KB to     4KB), sequence  1513, client_options={}
12,936 record_latency: took  266.2 ms round trip,    0.0 to send,  266.1 for echo,    6.3 for decoding of    90000 pixels,    11296 bytes sent over the network in  259.9 ms,  259.8 ms for echo
12,936 record_latency: took  253.3 ms round trip,    0.1 to send,  253.2 for echo,    3.8 for decoding of    90000 pixels,    11335 bytes sent over the network in  249.5 ms,  249.4 ms for echo
12,955 record_latency: took  260.7 ms round trip,    0.0 to send,  260.7 for echo,    2.5 for decoding of    90000 pixels,    11464 bytes sent over the network in  258.2 ms,  258.2 ms for echo
12,971 record_latency: took  263.7 ms round trip,    0.0 to send,  263.6 for echo,    2.8 for decoding of    90000 pixels,    11522 bytes sent over the network in  260.9 ms,  260.8 ms for echo
12,977 record_latency: took  257.4 ms round trip,    0.0 to send,  257.4 for echo,    2.2 for decoding of    90000 pixels,    11638 bytes sent over the network in  255.2 ms,  255.2 ms for echo
13,000 record_latency: took  269.0 ms round trip,    0.0 to send,  269.0 for echo,    1.6 for decoding of    90000 pixels,    11353 bytes sent over the network in  267.4 ms,  267.4 ms for echo
13,013 record_latency: took  269.8 ms round trip,    0.2 to send,  269.6 for echo,    1.8 for decoding of    90000 pixels,    11244 bytes sent over the network in  268.0 ms,  267.8 ms for echo
13,025 record_latency: took  269.7 ms round trip,    0.0 to send,  269.7 for echo,    3.0 for decoding of    90000 pixels,    11422 bytes sent over the network in  266.7 ms,  266.7 ms for echo
13,045 record_latency: took  278.0 ms round trip,    0.0 to send,  277.9 for echo,    2.0 for decoding of    90000 pixels,    11595 bytes sent over the network in  276.0 ms,  275.9 ms for echo
13,052 record_latency: took  272.1 ms round trip,    0.0 to send,  272.1 for echo,    1.5 for decoding of    90000 pixels,     4656 bytes sent over the network in  270.6 ms,  270.6 ms for echo
13,053 record_latency: took  261.3 ms round trip,    0.0 to send,  261.3 for echo,    1.6 for decoding of    90000 pixels,     4678 bytes sent over the network in  259.7 ms,  259.7 ms for echo
13,063 record_latency: took  256.9 ms round trip,    0.0 to send,  256.9 for echo,    2.2 for decoding of    90000 pixels,     4674 bytes sent over the network in  254.7 ms,  254.7 ms for echo
13,071 record_latency: took  252.9 ms round trip,    0.0 to send,  252.8 for echo,    2.3 for decoding of    90000 pixels,     4678 bytes sent over the network in  250.6 ms,  250.5 ms for echo
13,075 compress:   3.5ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   1.3%  (  351KB to     4KB), sequence  1514, client_options={}
13,088 record_latency: took  229.2 ms round trip,    0.0 to send,  229.2 for echo,    2.7 for decoding of    90000 pixels,     4631 bytes sent over the network in  226.5 ms,  226.5 ms for echo
13,088 record_latency: took  152.2 ms round trip,    0.1 to send,  152.1 for echo,    2.3 for decoding of    90000 pixels,     4626 bytes sent over the network in  149.9 ms,  149.8 ms for echo
13,117 record_latency: took   42.2 ms round trip,    0.0 to send,   42.2 for echo,    1.3 for decoding of    90000 pixels,     4679 bytes sent over the network in   40.9 ms,   40.9 ms for echo
13,150 compress:   3.8ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   1.3%  (  351KB to     4KB), sequence  1515, client_options={}
2021-11-02 13:11:13,193 compress:   3.7ms for  300x300  pixels at    0,0    for wid=3     using      jpeg with ratio   1.3%  (  351KB to     4KB), sequence  1516, client_options={}
13,205 record_latency: took   54.9 ms round trip,    0.1 to send,   54.8 for echo,    7.2 for decoding of    90000 pixels,     4595 bytes sent over the network in   47.7 ms,   47.6 ms for echo

So we get a dozen delayed acks with ~260ms round-trip latency, and the next compress packet comes out after 140ms rather than the usual ~12ms.

That slowdown is enough to clear the backlog and the latency drops back down to 50ms, before creeping back up again and the pattern repeats.
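
One way to formalize "creeping back up" (a sketch with assumed smoothing factors, not what the server currently does): track two exponentially-weighted averages of the round trip, a slow one as the baseline and a fast one as the current value, and flag congestion whenever the fast one pulls away from the slow one.

```python
class LatencyTracker:
    """Illustrative spike detector: slow EWMA = baseline, fast EWMA = current."""
    def __init__(self, slow_alpha: float = 0.02, fast_alpha: float = 0.3):
        self.slow_alpha = slow_alpha
        self.fast_alpha = fast_alpha
        self.baseline = None
        self.current = None

    def add(self, rtt_ms: float) -> bool:
        """Record one round-trip sample, return True if it looks like congestion."""
        if self.baseline is None:
            self.baseline = self.current = rtt_ms
        else:
            self.baseline += self.slow_alpha * (rtt_ms - self.baseline)
            self.current += self.fast_alpha * (rtt_ms - self.current)
        # congested when recent latency is well above the long-term baseline:
        return self.current > 2 * self.baseline + 10
```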

Doing the same experiment with ~1080p glxgears:

32,441 compress:  16.8ms for 1920x906  pixels at    0,0    for wid=4     using      jpeg with ratio   1.1%  ( 6795KB to    72KB), sequence  2205, client_options={}
32,472 compress:  21.7ms for 1920x906  pixels at    0,0    for wid=4     using      jpeg with ratio   1.1%  ( 6795KB to    72KB), sequence  2206, client_options={}
32,472 record_latency: took   65.5 ms round trip,    0.1 to send,   65.3 for echo,   15.9 for decoding of  1739520 pixels,    74387 bytes sent over the network in   49.6 ms,   49.4 ms for echo
32,498 compress:  18.3ms for 1920x906  pixels at    0,0    for wid=4     using      jpeg with ratio   0.6%  ( 6795KB to    43KB), sequence  2207, client_options={}
32,504 record_latency: took   62.4 ms round trip,    0.1 to send,   62.2 for echo,   20.8 for decoding of  1739520 pixels,    74193 bytes sent over the network in   41.6 ms,   41.4 ms for echo

So the baseline in this case is 40ms to 140ms latency (huge variation already) and the frames are roughly 30ms apart.
Once the bandwidth contention is introduced:

56,409 compress:  17.8ms for 1920x906  pixels at    0,0    for wid=5     using      jpeg with ratio   0.5%  ( 6795KB to    35KB), sequence  1016, client_options={}
56,640 compress:  17.8ms for 1920x906  pixels at    0,0    for wid=5     using      jpeg with ratio   0.5%  ( 6795KB to    35KB), sequence  1017, client_options={}
56,713 record_latency: took  575.8 ms round trip,    0.1 to send,  575.7 for echo,   19.0 for decoding of  1739520 pixels,    36354 bytes sent over the network in  556.8 ms,  556.7 ms for echo
56,918 compress:  16.8ms for 1920x906  pixels at    0,0    for wid=5     using      jpeg with ratio   0.5%  ( 6795KB to    35KB), sequence  1018, client_options={}
56,977 record_latency: took  568.1 ms round trip,    0.0 to send,  568.1 for echo,   14.9 for decoding of  1739520 pixels,    36470 bytes sent over the network in  553.2 ms,  553.2 ms for echo

Transferring a file in the opposite direction (potentially competing with the ACK packets) has no noticeable effect.


Questions:

  • although 200ms of extra latency isn't a big difference (300x300 case), shouldn't we be handling this better?
  • why is the baseline latency varying so much at ~1080p? (not seen with the python client?)
  • should we be spending the extra time compressing more (lowering the encoding speed much faster)? 20ms to compress frames that are actually ~100ms apart! (see the sketch after this list)
  • is the glxgears framerate (~10ms per frame) too fast for the client's display?
  • for some strange reason, I am seeing png frames for the xterm:
2021-11-02 13:40:56,981 pillow.encode('png', XShmImageWrapper(BGRX: 0, 0, 499, 316), 56, 36, True, False, None)
  • how is this affected by XPRA_ACK_JITTER=0 XPRA_ACK_TOLERANCE=0 XPRA_MAX_SOFT_EXPIRED=0, wifi adapter detection, etc?
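
On the "compressing more" question, a rough sketch of the trade-off, using the numbers from the logs above and a purely hypothetical helper: if frames arrive ~100ms apart and compression only takes ~20ms, the encoder could run at a much lower speed setting and produce smaller frames without costing any framerate.

```python
def affordable_speed(frame_interval_ms: float, compress_ms: float, current_speed: int) -> int:
    """Hypothetical heuristic: if the encoder sits idle for most of the inter-frame
    gap, lower the speed setting so it spends more of that gap compressing
    (smaller output, same framerate); raise it again when nearly saturated."""
    busy_ratio = compress_ms / max(1.0, frame_interval_ms)
    if busy_ratio < 0.3:       # e.g. 20ms of work for frames ~100ms apart
        return max(10, current_speed - 20)
    if busy_ratio > 0.8:       # encoder is the bottleneck: speed it back up
        return min(100, current_speed + 20)
    return current_speed

print(affordable_speed(frame_interval_ms=100, compress_ms=20, current_speed=70))
```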


totaam commented Nov 3, 2021

Notes and more questions than answers:

  • after repeated tests, the bandwidth on my ~5 meter 802.11g link is just 22Mbps, not the 54Mbps it is meant to be capable of
  • setting XPRA_ACK_JITTER=0 and XPRA_ACK_TOLERANCE=0 helps to prevent latency spikes but it also reduces the framerate
  • reducing the speed and quality whenever there is any packet backlog helps a lot, but the backlog calculations are a little expensive and should be done once per call to send_delayed_regions (and without updating the baseline, which should be updating more slowly?)
  • we should sacrifice more quality to keep the framerate higher when handling video content - I am getting only 10fps at 1080p with nvjpeg and the env flags above
  • setting a lower min-quality and min-speed helps, preventing the quality from going too high helps more
  • video codecs lower the quality for the video region but this does not happen for the html5 client in jpeg mode (could be added easily for nvjpeg, harder for software jpeg which doesn't use the video code path)
  • the browser's connection is detected as ether even though it is actually a wifi adapter..
  • the browser may fail to enable webp over slow links: decode images in worker xpra-html5#20 (comment) (which can actually be a good thing... since webp frames can be lossless and bigger)
  • should we try to maintain the gap between frames in the client? (we overestimate the framerate to ensure we don't drop frames - which can be detrimental?)
  • vertical refresh is not always detected with the html5 client, delay the hello packet by 100ms to ensure we get a value?
  • it is trivial to record how busy the encode_loop is, but this is not a good metric because screen updates come and go - how can we ensure we lower the speed to keep the encode thread more busy?
  • client_latency statistics include time spent writing to the socket - should it? (big packets will skew the value)
  • perhaps log some TCP_INFO to see if we can reliably detect congestion? see detect slow send buffer draining pace using TCP_INFO #2421, and the sketch after this list
  • watching the same video over and over again, the bandwidth problems seem to occur at the same spots: these particular frames compress less well and use 3 times the bandwidth, causing the frames that follow them to be delayed; this is visible as a stutter
  • the lossless threshold logic is duplicated between window_source and the webp encoder.. (so even when dropping the quality down to 85 because of a 1 packet backlog, the webp encoder may choose to use lossless)
  • it's not clear to me if nvjpeg does honour nvjpegChromaSubsampling_t with RGBI input or not
  • although it's quite obvious to a human observer when the latency values are going south, implementing this in the code is much harder
  • I have seen the decoding latency go up quite high (150ms or more), usually following network latency going up - is this because multiple images are queued for decoding close together? Would disabling the decode worker make any difference?
  • pre-mapping windows server side would mitigate latency issues for popup windows (drop down menus, etc)
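
For the TCP_INFO idea (see #2421), a minimal Linux-only sketch; the struct offsets below follow the common linux/tcp.h layout and should be checked against the kernel headers rather than trusted blindly:

```python
import socket
import struct

TCP_INFO = 11          # option number from <netinet/tcp.h> on Linux
TCP_INFO_LEN = 104     # enough to cover the fields we care about

def tcp_info(sock: socket.socket) -> dict:
    """Return a few congestion-related fields from the kernel's TCP_INFO.
    Offsets assume the common linux/tcp.h layout: 8 one-byte fields followed
    by 32-bit counters; verify against your kernel headers before relying on this."""
    data = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, TCP_INFO_LEN)
    fields = struct.unpack("=8B24I", data[:TCP_INFO_LEN])
    u32 = fields[8:]
    return {
        "retrans":   u32[7],           # segments retransmitted
        "rtt_ms":    u32[15] / 1000,   # smoothed RTT (tcpi_rtt is in microseconds)
        "rttvar_ms": u32[16] / 1000,
        "snd_cwnd":  u32[18],          # congestion window, in segments
    }
```

Sampling these values periodically on the server's socket would give a second opinion on whether the spikes above are genuine congestion or something else.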

Log samples:

  • webp kills the bandwidth with a 900KB update:
2021-11-03 20:47:20,575 compress:   9.0ms for 1920x563  pixels at    0,262  for wid=2     using      jpeg with ratio   3.1%  ( 4222KB to   130KB), sequence    11, client_options={}
2021-11-03 20:47:20,797 compress:  83.4ms for 1920x906  pixels at    0,0    for wid=2     using      webp with ratio  13.1%  ( 6795KB to   891KB), sequence    12, client_options={'rgb_format': 'BGRX', 'quality': 100}

The very first frame is also webp at 900KB. Perhaps we should start lower, as used to be the case.

  • despite some signs that the connection is struggling, we end up using quality=100, and even with jpeg this is costly:
2021-11-03 20:47:23,932 compress:  19.4ms for 1920x563  pixels at    0,262  for wid=2     using      jpeg with ratio  14.4%  ( 4222KB to   606KB), sequence    23, client_options={}
  • rgb24 taking too long to parse? Is this just a fluke?
2021-11-03 21:10:57,675 latency:  160.6ms for     18962 pixels :         0.1 send     +  147.9 network     +   12.6 decoding                 7KB,  sequence    24
2021-11-03 21:10:57,676 compress:   0.1ms for    6x13   pixels at   17,301  for wid=1     using     rgb24 with ratio  75.0%  (    0KB to     0KB), sequence    25, client_options={'rgb_format': 'RGB'}
2021-11-03 21:10:57,979 compress:   2.1ms for  499x15   pixels at    0,301  for wid=1     using      webp with ratio  26.1%  (   29KB to     7KB), sequence    27, client_options={'rgb_format': 'BGRX', 'subsampling': 'YUV420P', 'quality': 6}
2021-11-03 21:10:58,003 latency:  325.6ms for        78 pixels :         0.1 send     +    6.5 network     +  319.1 decoding                 0KB,  sequence    25
2021-11-03 21:10:58,006 latency:   26.1ms for      7485 pixels :         0.2 send     +   22.8 network     +    3.1 decoding                 7KB,  sequence    26

Since we convert it back to rgb32 anyway, perhaps we should just rely on the lz4 stream compressor instead?
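
As a rough illustration of that suggestion (using the python-lz4 package; not a statement about how xpra currently wires up its compressors): for a tiny region like the 6x13 update above, raw pixels through an lz4 frame cost almost nothing to produce and skip the image decode on the client entirely.

```python
import lz4.frame  # pip install lz4

# a tiny 6x13 BGRX region like the rgb24 update in the log above (assumed solid colour)
width, height, bpp = 6, 13, 4
pixels = bytes([0x20, 0x20, 0x20, 0xff]) * (width * height)

compressed = lz4.frame.compress(pixels)
print(f"raw: {len(pixels)} bytes, lz4: {len(compressed)} bytes")
# lz4 is fast enough that even if the ratio is worse than webp for such tiny
# regions, skipping the image codec entirely may cost less end-to-end latency.
```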

totaam added a commit that referenced this issue Nov 8, 2021
and make it configurable via an env var
totaam added a commit that referenced this issue Nov 8, 2021
totaam added a commit that referenced this issue Nov 9, 2021
totaam added a commit that referenced this issue Nov 9, 2021
we can use more specialized ones if we have the content type hint
totaam added a commit that referenced this issue Nov 9, 2021
totaam added a commit that referenced this issue Nov 9, 2021
* always do rgb threshold first
* prefer jpeg / webp for lossy
* avoid png
totaam added a commit that referenced this issue Nov 9, 2021
totaam added a commit that referenced this issue Nov 9, 2021
we adjust speed and quality earlier if there is congestion
totaam added a commit that referenced this issue Nov 9, 2021
just use the get_auto_encoding() method which works well instead of partially re-implementing it wrong
totaam added a commit that referenced this issue Nov 9, 2021
totaam added a commit that referenced this issue Nov 9, 2021
totaam added a commit that referenced this issue Nov 9, 2021
above ~0.5MPixels, it takes too long to compress using webp
totaam added a commit that referenced this issue Nov 9, 2021
totaam added a commit that referenced this issue Nov 9, 2021

totaam commented Nov 20, 2021

#3357 would help with downscaling performance.

totaam added a commit that referenced this issue Nov 21, 2021
totaam added a commit that referenced this issue Nov 21, 2021
totaam added a commit that referenced this issue Nov 22, 2021
totaam added a commit that referenced this issue Nov 22, 2021
so all the picture codecs can be called generically with the same first 4 arguments: coding, image, quality, speed
totaam added a commit that referenced this issue Nov 22, 2021
totaam added a commit that referenced this issue Nov 22, 2021
totaam added a commit that referenced this issue Nov 22, 2021
totaam added a commit that referenced this issue Nov 23, 2021

totaam commented Nov 23, 2021

Many more updates, not all of which have been tagged with this ticket number, in particular 8302076 (the blame goes to git --amend).

totaam added a commit that referenced this issue Nov 24, 2021
also introspect the list of encodings each encoder supports
totaam added a commit that referenced this issue Nov 25, 2021
"normalize" the values so we stretch the low end of the scale
totaam added a commit that referenced this issue Nov 26, 2021
at quality 100, the picture is not subsampled and almost lossless
totaam added a commit that referenced this issue Nov 27, 2021
so don't remove it from the list of 'non-video' encodings,
also remove the 'nvjpeg' special case,
improve debug logging
totaam added a commit that referenced this issue Nov 28, 2021
make it possible to run these codecs as a command line script for diagnostics
totaam added a commit that referenced this issue Nov 28, 2021
totaam added a commit that referenced this issue Nov 28, 2021
totaam added a commit that referenced this issue Nov 28, 2021
totaam added a commit that referenced this issue Nov 28, 2021
the new codec interface does not have the device context so pass it using the options dictionary
totaam added a commit that referenced this issue Dec 7, 2021
so the get_encoding methods don't need quality or speed as arguments,
saves extracting those values from the options

totaam commented Jan 12, 2022

Closing in favour of #3376 and #2421

totaam closed this as completed on Jan 12, 2022