xpra over high latency links batches too much #153

Closed
totaam opened this issue Jun 26, 2012 · 25 comments

Comments

@totaam
Collaborator

totaam commented Jun 26, 2012

Issue migrated from trac ticket #153

component: core | priority: major | resolution: fixed

2012-06-26 15:52:19: antoine created the issue


As per this mailing list post, it seems that the damage sequence (which we use to see how far behind the client is) is causing the damage batch delay to go up when it is not necessary. This causes the picture latency to go up.

The only alternative explanation is that somehow the "damage-sequence" packet is having an adverse effect, which is very unlikely.
To prove this, simply apply this patch to trunk:

--- src/xpra/server_source.py	(revision 975)
+++ src/xpra/server_source.py	(working copy)
@@ -248,7 +248,7 @@
             if self._damage_data_queue.qsize()>3:
                 #contains pixmaps before they get converted to a packet that goes to the damage_packet_queue
                 update_batch_delay("damage data queue overflow: %s" % self._damage_data_queue.qsize(), logp10(self._damage_data_queue.qsize()-2))
-        if not last_delta:
+        if True:
             return
         #figure out how many pixels behind we are, rather than just the number of packets
         all_unprocessed = list(self._damage_packet_sizes)[-delta:]

If the problem goes away, then it is clearly our batching decision making (the part that is based on client feedback) which is wrong, as I suspect it is.


The problem is that although the behaviour is suboptimal over high-latency links, I do not know which application to test with, or what bandwidth limitations to apply to simulate the link in question (2Mbps down, 512kbps up?).

Using trickle, here is what I tested with:

  • Server:
xpra --no-daemon --bind-tcp=0.0.0.0:10000 start :10
  • An app that will use lots of bandwidth (enlarge the window to use more):
DISPLAY=:10 glxgears
  • The throttled client:
trickle -d 256 -u 64 -s -L 300 xpra attach tcp:192.168.42.100:10000 --no-mmap

We try to converge on the best batch delay possible given the limited bandwidth, but it oscillates between 20ms and 400ms, clearly far too wide a range.
I think that is because of buffering and "smoothing" over time, which allows lots of packets to go through (so we reduce the batch delay) until we hit the limit and packets start getting delayed (so we increase it). The problem is that each of these decisions is delayed, so it struggles to cope with the changing conditions and keeps oscillating. Slowing down the rate of change would definitely help in this particular case, but not in others... It might be worth taking the connection latency into account to reduce this rate of change.
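One way to damp this kind of oscillation is to move the delay only part of the way towards each newly computed target, so a single measurement cannot swing it across the whole 20ms-400ms range at once. A minimal sketch (not xpra's actual code; the class, the `smoothing` weight and the default bounds are all hypothetical):

```python
# Hypothetical sketch: damp batch-delay updates by moving only a fraction
# of the way towards each new target, instead of jumping straight to it.
class BatchDelay:
    def __init__(self, min_delay=5, max_delay=1000, smoothing=0.25):
        self.min_delay = min_delay      # ms
        self.max_delay = max_delay      # ms
        self.delay = min_delay          # current batch delay in ms
        self.smoothing = smoothing      # weight given to the new target

    def update(self, factor):
        """Move the delay towards delay*factor, but only part of the way."""
        target = max(self.min_delay, min(self.max_delay, self.delay * factor))
        self.delay += self.smoothing * (target - self.delay)
        return self.delay
```

A lower `smoothing` value trades responsiveness for stability, which is exactly the tension described above: slowing the rate of change helps here but hurts when conditions genuinely change fast.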

Thoughts welcome.

totaam commented Jun 26, 2012

2012-06-26 18:36:51: antoine commented


Another area where we ought to make some improvements is the scheduling of calculate_batch_delay: at the moment, it fires no more than 40 times per second and it will not fire unless there are screen updates.

The problem with this is that if there is a period of inactivity, it won't run at all... It might be better to schedule it with a value derived from the current batch delay, with an increasing delay if no new frames come in (so as to not wake up the server too often unnecessarily).
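That scheduling idea could be sketched as follows (hypothetical class and names, not xpra's actual code): derive the next recalculation interval from recent activity, backing off exponentially while no new frames arrive.

```python
# Hypothetical sketch: instead of a fixed tick that only fires on screen
# updates, derive the next recalculation interval from activity, backing
# off while idle so the server is not woken up unnecessarily.
class RecalculationScheduler:
    def __init__(self, base_interval_ms=25, max_interval_ms=1000):
        self.base_interval_ms = base_interval_ms  # could be the batch delay
        self.max_interval_ms = max_interval_ms
        self.idle_runs = 0                 # consecutive runs with no frames
        self.frames_since_last_run = 0

    def frame_arrived(self):
        self.frames_since_last_run += 1

    def next_interval_ms(self):
        """Called from each timer run; returns when to schedule the next."""
        if self.frames_since_last_run == 0:
            self.idle_runs += 1            # back off while idle
        else:
            self.idle_runs = 0
        self.frames_since_last_run = 0
        return min(self.max_interval_ms,
                   self.base_interval_ms * (2 ** self.idle_runs))
```

With gobject.timeout_add, the timer callback would re-register itself using the new interval and return False so the old timer is not repeated.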

The other thing is that when we modify this batch delay, we ought to take into account how many times the current value was used, and when it was used. Some sort of moving average would be relevant too: the further we move away from it, the slower the change.

Finally, let's try to calculate the link latency: record the send time in damage_packet_sent and compare it with the receive time in _process_damage_sequence (minus the decode_time, which we already have).
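The measurement described here could look something like this (a sketch only; the tracker class is hypothetical, while the two hook names come from the text above):

```python
import time

# Hypothetical sketch of the latency measurement: record the wall-clock
# send time per damage packet sequence, then on the client's
# acknowledgement subtract the client-reported decode time to estimate
# the network round-trip latency.
class LatencyTracker:
    def __init__(self):
        self._send_times = {}   # sequence -> send timestamp

    def damage_packet_sent(self, sequence):
        self._send_times[sequence] = time.time()

    def process_damage_sequence(self, sequence, decode_time):
        """decode_time is in seconds, as reported by the client."""
        sent = self._send_times.pop(sequence, None)
        if sent is None:
            return None         # unknown or already-processed sequence
        return (time.time() - sent) - decode_time
```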

totaam commented Jun 26, 2012

2012-06-26 20:33:05: antoine changed status from new to accepted

totaam commented Jun 26, 2012

2012-06-26 20:33:05: antoine commented


Some really hackish code to try to avoid this oscillation of the batch delay.
First we keep historical records of the size of the damage_data_queue, and try to stop increasing the delay so quickly as soon as things are starting to get better (and not just when it is fully back to normal).
Also, when lowering the batch delay, we take into account a weighted moving average to try to avoid lowering it too quickly and too far.

This does seem to allow it to narrow down on the correct value better, without oscillation, at the cost of taking longer to lower it... Not ready for merging as it is, but something like it should get applied eventually.

Still TODO:

  • the high latency will cause the client to be a few packets behind (how many really depends on the ratio of latency to bandwidth), so we should ignore the packets in transit (using the latency we measure) - this only works now because of the return statement at the end of the if self._damage_data_queue.qsize()>2: block.
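The in-transit discount in the TODO above could be sketched like this (hypothetical helper, not xpra code; the latency and rate would come from whatever measurements the server keeps):

```python
# Hypothetical sketch: discount packets that are merely in transit when
# estimating the client's backlog. With a measured one-way latency
# (seconds) and an observed send rate (packets/second), roughly
# latency * rate packets are expected to be in flight at any time and
# should not count against the client.
def effective_backlog(packets_due, link_latency, packets_per_second):
    in_flight = int(link_latency * packets_per_second)
    return max(0, packets_due - in_flight)
```

For example, with 10 packets outstanding, 300ms of latency and 20 packets/s, 6 packets are treated as in flight, leaving a real backlog of 4.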

Please give feedback!

--- src/xpra/server_source.py	(revision 975)
+++ src/xpra/server_source.py	(working copy)
@@ -123,6 +123,7 @@
         self._mmap_bytes_sent = 0
         protocol.source = self
         self._damage_data_queue = Queue()
+        self._damage_data_queue_sizes = maxdeque(100)
         self._damage_packet_queue = Queue()
 
         self._closed = False
@@ -232,22 +233,55 @@
             self.batch_configs[wid] = batch
         return batch
 
-    def calculate_batch_delay(self, wid, batch):
+    def calculate_batch_delay(self, wid, window, batch):
         def update_batch_delay(reason, factor=1, delta=0):
-            batch.delay = max(batch.min_delay, min(batch.max_delay, int(100.0*batch.delay*factor)/100.0)-delta)
+            new_delay = max(batch.min_delay, min(batch.max_delay, int(100.0*batch.delay*factor)/100.0)-delta)
+            if len(batch.last_delays)>0:
+                #get the weighted average:
+                tv, tw = 0.0, 0.0
+                now = time.time()
+                for when, delay in batch.last_delays:
+                    #newer matter more, and even more so when factor is high
+                    w = 1.0/(0.1+(now-when)*factor)
+                    tv += delay*w
+                    tw += w
+                avg = tv / tw
+                log.info("update_batch_delay: average=%s, weighted average=%s, new_delay=%s", sum([x for _,x in batch.last_delays])/len(batch.last_delays), avg, new_delay)
+                if new_delay<avg:
+                    new_delay = (3*new_delay+avg)/4
+            batch.delay = new_delay
             batch.last_updated = time.time()
-            log("update_batch_delay: %s, wid=%s, factor=%s, delta=%s, new batch delay=%s", reason, wid, factor, delta, batch.delay)
+            log.info("update_batch_delay: %s, wid=%s, factor=%s, delta=%s, new batch delay=%s", reason, wid, factor, delta, batch.delay)
 
         last_delta = self.last_client_delta
         delta = self._damage_packet_sequence-self.last_client_packet_sequence
         self.last_client_delta = delta
         if True:
-            if self._damage_packet_queue.qsize()>3:
+            if self._damage_packet_queue.qsize()>2:
                 #packets ready for sending by network layer
                 update_batch_delay("damage packet queue overflow: %s" % self._damage_packet_queue.qsize(), logp2(self._damage_packet_queue.qsize()-2))
-            if self._damage_data_queue.qsize()>3:
+                return
+            if self._damage_data_queue.qsize()>2:
                 #contains pixmaps before they get converted to a packet that goes to the damage_packet_queue
-                update_batch_delay("damage data queue overflow: %s" % self._damage_data_queue.qsize(), logp10(self._damage_data_queue.qsize()-2))
+                qsize = self._damage_data_queue.qsize()
+                factor = logp10(qsize)
+                if len(self._damage_data_queue_sizes)>0:
+                    last_10 = list(self._damage_data_queue_sizes)[-10:]
+                    if max(last_10)==qsize:
+                        log.info("damage data queue overflow: %s worst of %s" % (qsize, last_10))
+                        #it's the worst it's been recently, ouch increase faster
+                        factor += 0.2
+                    elif min(last_10)==qsize:
+                        log.info("damage data queue overflow: %s best of %s" % (qsize, last_10))
+                        #it's the best it's been recently, leave it alone
+                        factor = 1.0
+                    elif len(last_10)>4 and last_10[-1]>qsize and last_10[-2]>qsize:
+                        log.info("damage data queue overflow: %s improving: %s" % (qsize, last_10))
+                        factor = (factor+2)/3.0
+                    else:
+                        log.info("historical=%s, last_10=%s, max=%s, min=%s, qsize=%s" % (self._damage_data_queue_sizes, last_10, max(last_10), min(last_10), qsize))
+                update_batch_delay("damage data queue overflow: %s" % self._damage_data_queue.qsize(), factor)
+                return
         if not last_delta:
             return
         #figure out how many pixels behind we are, rather than just the number of packets
@@ -262,19 +296,21 @@
         last_pixels_behind = sum(last_unprocessed)
         log("calculate_batch_delay: wid=%s, unprocessed=%s, last_unprocessed=%s, pixels_behind=%s, last_pixels_behind=%s", wid, unprocessed, last_unprocessed, pixels_behind, last_pixels_behind)
         if packets_due<=2 and last_packets_due<=2:
-            return update_batch_delay("client is up to date: %s packets in queue" % packets_due, 0.9, 1)
+            return update_batch_delay("client is up to date: %s packets in queue" % packets_due, (7+packets_due)/10.0, 1)
         if packets_due<last_packets_due and pixels_behind<last_pixels_behind:
-            return update_batch_delay("client is catching up: down from %s to %s packets in queue, %s to %s pixels due" % (last_packets_due, packets_due, last_pixels_behind, pixels_behind), 0.9, 1)
-        low_limit, high_limit = 8*1024, 32*1024
+            return update_batch_delay("client is catching up: down from %s to %s packets in queue, %s to %s pixels due" % (last_packets_due, packets_due, last_pixels_behind, pixels_behind), (9-min(2,last_packets_due-packets_due))/10.0, 1)
+        ww, wh = self.get_window_dimensions(window)
+        low_limit, high_limit = ww*wh, ww*wh*4
         if self._mmap and self._mmap_size>0:
-            low_limit, high_limit = 256*1024, 1*1024*1024
+            low_limit *= 4
+            high_limit *= 4
         #things are getting worse or unchanged:
         if pixels_behind<=low_limit:
             if packets_due<=5:
                 return update_batch_delay("client is only %s pixels and only %s packets behind" % (pixels_behind, packets_due), 0.8, 1)
             if packets_due<10 and packets_due<last_packets_due:
                 return update_batch_delay("client is only %s pixels and %s packets behind" % (pixels_behind, packets_due), 0.9, 0)
-        if pixels_behind>=high_limit:
+        if pixels_behind>high_limit:
             return update_batch_delay("client is %s pixels behind!" % pixels_behind, min(1.5, logp2(1.0*pixels_behind/high_limit)))
         if pixels_behind<last_pixels_behind:
             #things are getting better:
@@ -341,7 +377,7 @@
             pixmap = self.get_window_pixmap(wid, window, self._sequence)
             if pixmap:
                 self._process_damage_region(now, pixmap, wid, x, y, w, h, coding, self._sequence, options)
-                batch.last_delays.append(0)
+                batch.last_delays.append((now, 0))
         #record this damage event in the damage_last_events queue:
         now = time.time()
         last_events = self._damage_last_events.setdefault(wid, maxdeque(100))
@@ -360,7 +396,7 @@
             return
 
         if batch.last_updated+0.025<now:
-            self.calculate_batch_delay(wid, batch)
+            self.calculate_batch_delay(wid, window, batch)
         event_min_time = now-batch.time_unit
         all_pixels = [pixels for event_time,pixels in last_events if event_time>event_min_time]
         beyond_limit = len(all_pixels)>batch.max_events or sum(all_pixels)>batch.max_pixels
@@ -385,7 +421,7 @@
                 log("window %s already removed from delayed list?", wid)
             return False
         log("damage(%s, %s, %s, %s, %s) scheduling batching expiry for sequence %s in %sms", wid, x, y, w, h, self._sequence, batch.delay)
-        batch.last_delays.append(batch.delay)
+        batch.last_delays.append((now, batch.delay))
         gobject.timeout_add(int(batch.delay), send_delayed)
 
     def send_delayed_regions(self, damage_time, wid, window, damage, coding, sequence, options):
@@ -446,6 +482,7 @@
         if data:
             log("process_damage_regions: adding pixel data %s to queue, elapsed time=%s, queue size=%s", data[:6], int(1000*(time.time()-damage_time)), self._damage_data_queue.qsize())
             self._damage_data_queue.put(data)
+            self._damage_data_queue_sizes.append(self._damage_data_queue.qsize())
 
     def data_to_packet(self):
         while not self._closed:
Index: src/xpra/server.py
===================================================================
--- src/xpra/server.py	(revision 975)
+++ src/xpra/server.py	(working copy)
@@ -1003,7 +1003,7 @@
         for wid in self._id_to_window.keys():
             batch = source.get_batch_config(wid)
             if batch:
-                for d in batch.last_delays:
+                for _,d in batch.last_delays:
                     batch_delays.append(d)
         if len(batch_delays)>0:
             info["min_batch_delay"] = int(max(batch_delays))

totaam commented Jun 27, 2012

2012-06-27 03:22:21: mebourne commented


Link capacity, just tested with scp on a large file:
Upload bandwidth: mean 152.7KB/s
Download bandwidth: mean 590.9KB/s
Idle RTT: min 301ms
Uplink saturated RTT: max 785ms
Downlink saturated RTT: max 454ms

I don't think glxgears is at all a useful application to test with for this behaviour. I'm not interested in frame rate or throughput here but in interactivity, which is mostly defined by the latency from an action (keypress/mouse) to the reaction (relevant screen update). The apps I'm using are as per the original post: emacs, xterm, evolution, firefox, rdesktop

A good demonstration of the interactivity problems is running emacs side by side over old and new xpra. On the old one, every cursor movement is reflected almost immediately: if I hold down the cursor key I see the cursor move through every character, as it would locally, and when I let go it stops about where expected. On vanilla 0.4.0 it batches, so the cursor jumps many characters at a time in a very jerky movement, and when I let go I am not sure whether it has stopped moving or is just pausing before the next update. Editing code in such conditions is challenging.

Another demonstration is clicking with a mouse to open a menu. It should be close to immediate but can take seconds on vanilla 0.4.0 which kills usability.

On to your changes:
The one-line change didn't seem to help much. I didn't test it against vanilla 0.4.0 today, but it was painful to use: the cursor was very jerky and applications took up to 5 seconds to respond.

On the other hand the larger patch is excellent. It's really made a huge difference, cursor movement is smooth and the GUI applications seem quite responsive. It's early days but I'll continue to use this and see how I get on.

Redraw is noticeably different from the old version: scrolling in emacs used to paint the screen from top to bottom; now the screen updates as a block, but after a delay. This makes it look more like emacs running locally, but reduces the immediacy of the feedback, and doesn't seem to affect overall redraw time (which is difficult to measure, but I tried using a stopwatch and paging down 10 times, pressing each key as soon as the screen had updated).

Thanks for your help and super quick response!

totaam commented Jun 30, 2012

2012-06-30 09:28:19: antoine commented


r982 is a more complete fix for this (details in the commit message); please let me know if it fixes things for you so I can close this ticket. I will follow up with some more improvements in #155.

There were many problems with the old code:

  • increasing the batching too fast and too far (and often, too late!)
  • not noticing when the windows stopped updating (and the batching delay should have been cut)
  • oscillation due to using instant values rather than averages
  • etc.

FYI: glxgears is a great way of testing latency/bandwidth-limited links, as it allows you to vary the bandwidth usage simply by resizing the window. As you ask xpra to push more pixels through a limited pipe, it eventually reaches capacity and needs to start adjusting the batch delay upwards. This was the easiest way for me to reproduce your latency issues.

The all-important xterm "while true; do dmesg; done" test, which is part of the test suite, now runs fine too!

totaam commented Jul 5, 2012

2012-07-05 19:39:45: antoine commented


Much better/cleaner code in r1046 - see commit message for lots of details.

totaam commented Jul 9, 2012

2012-07-09 07:58:48: mebourne commented


Ran with r982 all last week, it worked well. Running with r1060 this week.

totaam commented Jul 9, 2012

2012-07-09 19:40:59: antoine commented


r1063 Removes all the nasty hardcoded constants and instead relies on simple arithmetic.

Overall, this works much better than r1046 in almost all cases (see the spreadsheet of test results), and although there are some corner-case regressions compared to 0.3.x, these will be dealt with separately if possible. The important metric here is that we seem to manage to prevent latency from going too high (except with png encoding and pathological tests...).

Note: it may not be possible to satisfy all scenarios with heuristics alone.

totaam commented Jul 9, 2012

2012-07-09 19:42:20: antoine uploaded file 0.3.3-vs-0.4.0rc7.csv (28.0 KiB)

test results comparing 0.3 branch, 0.4.0 (circa r1061) and 0.4.0-rc7 (aka r1063)

totaam commented Jul 10, 2012

2012-07-10 04:33:34: mebourne commented


Ok thanks; running that now, with debug_delay enabled. I am using png encoding as jpg did not give sufficient text quality unless on a very high setting after which it seemed to give no performance improvement. I use xterm/emacs a lot with a quite small bitmap font on a high res display so need good text reproduction. Hopefully my test is not too pathological.

totaam commented Jul 10, 2012

2012-07-10 07:50:48: antoine commented


Just be aware that whenever the debug fires (every 30s or when there are more than 1000 records to print, whichever comes first), it will cause some lag...

totaam commented Jul 12, 2012

2012-07-12 08:22:40: mebourne commented


r1063 didn't turn out so well. I ran it with the logging for a day, it was quite sluggish to use. Unfortunately when I stopped the logging it didn't really improve. I had to revert back to r1062 on the server and it was fine then.

I have a full day's debug_delay log if it helps you at all, though it's 380MB (65MB bzipped) so if you want me to preprocess it to get the interesting stuff let me know by email.

totaam commented Jul 13, 2012

2012-07-13 21:41:40: antoine commented


r1098 should improve things for you - whether it is now as good as r1062, I don't know (my automated tests apparently do not capture your typical usage - I wish they did, ideas welcome!)

The one problematic factor is the "damage data queue size": it weighs more because items in this queue still have to be processed by all the other queues and then sent out; unfortunately it only counts items, not how many pixels are in them... Something needs to be done to improve this.
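The improvement hinted at here could be sketched as a queue that tracks queued pixels alongside the item count (a hypothetical sketch, not xpra's actual queue class), so the factor can weigh a queue of large frames more heavily than a queue of tiny ones:

```python
from collections import deque

# Hypothetical sketch: a FIFO that tracks the total number of pixels
# queued, so batching heuristics can use pixel counts rather than
# just the number of items.
class PixelCountingQueue:
    def __init__(self):
        self._items = deque()
        self.pixel_count = 0    # total pixels currently queued

    def put(self, item, pixels):
        self._items.append((item, pixels))
        self.pixel_count += pixels

    def get(self):
        item, pixels = self._items.popleft()
        self.pixel_count -= pixels
        return item

    def qsize(self):
        return len(self._items)
```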

OK, here are some of the most interesting log extracts, chosen at tipping points: just before or after a jump/drop in batch delay.

I found the location of spikes by looking for the new batch delay value in the log extracts with:

cat xpra-r1064-ab | awk -F "new delay=" '{print $2}' | awk '{print $1}' | more
update_batch_delay: wid=9, change factor=44.2%, delay min=49.0, avg=64.3, max=99.7, cur=82.7, w. average=72.7, tot wgt=32.1, hist_w=15.0, new delay=119.4 --
[  ('damage processing latency: avg=17.4, recent=17.0', 1.009869654093447, 0.9993499058538527),
   ('damage send latency: avg=19.9, recent=19.1', 0.9988767378654252, 1.0011502526907246),
   ('network send speed: avg=189347, recent=177379 (KBytes/s)', 1.04786777295379, 1.04786777295379),
   ('client decode speed: avg=11.6, recent=11.6 (MPixels/s)', 0.5533322396555136, 1.0),
   ['delay not updated for 135.0 ms (skipped 3 times)', 0, 0.4617549787366009],
   ('client latency: avg=331.2, recent=334.0', 1.397990316854844, 1.0024348667059213),
   ['damage packet queue size: avg=1.77680196624, recent=1.72500583591, target=1, div=1', 1.3122801851043002, 0.4852180911832026],
   ['damage packet queue pixels: avg=34526.3476559, recent=36544.7159551, target=1, div=1089066.0', 0.03191879574328046, 1.0415651555598613],
   ['damage data queue: avg=3.70385296869, recent=5.5146282146, target=0, div=1', 3.7036828475176695, 4.985079509269735],
   ['client has some ACKs pending: 1', 1.0, 0],
   ['packets/pixels backlog from 0/0 to 0/0', 0.0, 0.5]  ]

That's a 44% increase in the delay, mostly due to the backlog in the damage data queue.

update_batch_delay: wid=3, change factor=65.8%, delay min=49.0, avg=117.9, max=320.1, cur=537.0, w. average=178.6, tot wgt=33.0, hist_w=14.6, new delay=890.4 --
[  ('damage processing latency: avg=33.6, recent=45.0', 1.04877707321461, 1.0156410027957719),
   ('damage send latency: avg=46.8, recent=64.0', 1.059864475384888, 1.0233804662676518),
   ('network send speed: avg=95790, recent=54358 (KBytes/s)', 1.4658212520431309, 1.4658212520431309),
   ('client decode speed: avg=22.6, recent=21.6 (MPixels/s)', 0.8991863663126225, 1.288416026946962),
   ['delay not updated for 56.9 ms (skipped 1 times)', 0, 0.0],
   ('client latency: avg=323.2, recent=324.8', 1.3881096435205884, 1.0014330630396993),
   ['damage packet queue size: avg=5.91231821803, recent=8.22406320798, target=1, div=1', 2.768934254729569, 0.6396246761650424], 
   ['damage packet queue pixels: avg=87764.8947987, recent=128303.007021, target=1, div=800000.0', 0.1465504231143261, 1.2997704145088427],
   ['damage data queue: avg=8.4125154456, recent=12.3757953354, target=0, div=1', 4.741552772744084, 6.031202546394014],
   ['client has some ACKs pending: 34', 1.0, 0],
   ['packets/pixels backlog from 1/11 to 0/0', 0.0, 0.0] ]

That's a 65% increase, again mostly due to the backlog in the data queue, but also in the packet queue..

update_batch_delay: wid=5, change factor=-4.5%, delay min=719.3, avg=1511.4, max=5267.7, cur=778.4, w. average=901.7, tot wgt=13.7, hist_w=5.7, new delay=742.9 --
[  ('damage processing latency: avg=10.1, recent=10.9', 1.0012974014642182, 1.001085368146258),
   ('damage send latency: avg=12.5, recent=12.9', 0.9901946553188158, 0.9994942119306733),
   ('network send speed: avg=142411, recent=108396 (KBytes/s)', 1.2102694480959981, 1.2102694480959981),
   ('client decode speed: avg=6.2, recent=6.2 (MPixels/s)', 0.3239169543398469, 0.999999083426363),
   ['delay not updated for 1085.5 ms (skipped 27 times)', 0, 4.671985090122343],
   ('client latency: avg=720.9, recent=918.6', 1.919049293419958, 1.1132563179902775),
   ['damage packet queue size: avg=1.85173179032, recent=1.59141169452, target=1, div=1', 1.2595885423108755, 0.4269648367628348], 
   ['damage packet queue pixels: avg=17460.8428057, recent=21152.9546567, target=1, div=124098.0', 0.15528091512926995, 1.1449971957849034],
   ['damage data queue: avg=0.658772944417, recent=0.764245320806, target=0, div=1', 1.8190511836295682, 2.883258558416875],
   ['client has some ACKs pending: 5', 1.0, 0],
   ['packets/pixels backlog from 0/0 to 1/11', 0.5, 0.7924812503605781]]

The delay should decrease faster here..

totaam commented Jul 15, 2012

2012-07-15 12:04:45: antoine uploaded file 0.3.3-rc1-vs-0.4.0-rc9.csv (19.5 KiB)

test results comparing 0.3 branch and 0.4.0-rc9 (aka r1100)

totaam commented Jul 18, 2012

2012-07-18 20:07:12: antoine commented


more improvements in r1130

totaam commented Jul 21, 2012

2012-07-21 19:25:27: antoine commented


Many more fixes in r1144 (see changelog)

totaam commented Jul 23, 2012

2012-07-23 18:10:24: antoine changed status from accepted to closed

totaam commented Jul 23, 2012

2012-07-23 18:10:24: antoine changed resolution from ** to fixed

totaam commented Jul 23, 2012

2012-07-23 18:10:24: antoine commented


I believe this is complete.

Feel free to re-open if necessary.

@totaam totaam closed this as completed Jul 23, 2012
totaam commented Jul 27, 2012

2012-07-27 07:01:23: antoine commented


Found some problems still...

  • r1181 - general improvements
  • r1191 - bug fix where we miscalculated a factor by dividing the number of pixels from all windows by the window being evaluated (meaningless and plain wrong!) and also some improvements/cleanups

totaam commented Jul 27, 2012

2012-07-27 15:21:29: antoine commented


I will probably also backport part of r1195, which fixes the delay staying high when the average used in the calculations is close to zero.

totaam commented Aug 2, 2012

2012-08-02 20:05:12: aradtech uploaded file xpra.log (648.2 KiB)

aradtech xpra logfile with latency debugging

totaam commented Aug 2, 2012

2012-08-02 20:28:03: aradtech uploaded file xpra-2.zip (174.1 KiB)

second try of logging issue

totaam commented Aug 2, 2012

2012-08-02 21:21:33: antoine commented


The problem seems to be caused by a very large "damage processing and send latency", as seen by records like this one:

2012-08-02 12:15:35,378 update_batch_delay: wid=1, fps=100, last updated 41.99 ms ago, decay=1.0, change factor=1.4%, delay min=6.2, avg=16.4, max=41.2, cur=37.4, w. average=28.5, tot wgt=25.5, hist_w=12.7, new delay=37.9
 ('damage processing latency: avg=1129.3, recent=1661.2 avg=1.12933581412, recent=1.66122890682, target=0.05228515625, div=1.0', 2.99, 1.58)
 ('damage send latency: avg=1155.8, recent=1674.1 avg=1.15583599438, recent=1.67416776013, target=0.0757421875, div=1.0', 2.59, 1.37)
 ('network send speed: avg=177174, recent=155615 (KBytes/s), div=10.2488096084', 1.09, 0.1)
 ('client decode speed: avg=61.5, recent=61.2 (MPixels/s)', 1.0, 0.0)
 ('delay not updated for 41.2 ms (skipped 1 times - highest latency is 1674.1)', 0.0, 0.0)
 ('client latency: lowest=4.4, avg=56.5, recent=31.3 avg=0.056502563324, recent=0.0313947186921, target=0.01, div=1.0', 0.86, 0.17)
 ('damage packet queue size: avg=0.139678134796, recent=0.0549618248307, target=1.0 avg=0.139678134796, recent=0.0549618248307, target=1.0, div=1.0', 0.04, 0.96)
 ('damage packet queue pixels: avg=0.0, recent=0.0, target=1.0 avg=0.0, recent=0.0, target=1.0, div=886784', 0.0, 0.99)
 ('client packets backlog avg=0, recent=0, target=0, div=1.0', 0.0, 0.99)
 ('client pixels backlog avg=0, recent=0, target=0, div=886784', 0.0, 0.99)

2 things:

  • why? what is causing it to go so high? idle_add scheduling?
  • the batch delay should go much higher, much more quickly, when this happens: the factors are too low

Will try to reproduce with something I can automate.
