GFX connection reliably crashes with SIGABRT #2968

metalefty · 2024-02-26T04:54:58Z

This issue is likely to be identical to the third issue I reported in #2942 (comment).

Testing the latest v0.10 branch c3cb855 on AlmaLinux 9.3.

Here's a simple program to reproduce: https://gist.github.com/metalefty/99b688ec67e5debb4d662fb6a12fc069

Connect to xrdp session from 4k monitor (I tested 3840x2160)
Start terminal emulator and maximize the window
Run the program to reproduce
- ./a.out 200 300
- adjust cols arguments to fit your window, and set rows to some big value for example 200.
xrdp process crashes with SIGABRT when the output of the program drawn

[2024-02-26T04:01:33.731+0000] [ERROR] Child 194657 terminated unexpectedly with signal SIGABRT

When trying to reconnect, xrdp process crashes SIGABRT as soon as reconnected because the same screen will be drawn again. If resize happens when reconnecting (reconnect from a different size), it gets back working. As far as I tested, this issue occurs when the session size is 4k. I can't reproduce this issue with 2560x1600 geometry. It is a smaller size than 3840x2160. It doesn't crash when the cols are less than 1/2 of the total screen width.

On FreeBSD, xrdp process doesn't crash but I get a black blank screen instead. As same as AlmaLinux, it gets back to working when I make resizing happen (moving mstsc to other monitors). It is something related to the GFX encoder.

I also tried v0.10 + cherry-pick #2962 but it doesn't affect this issue.

@Nexarian I remember you have a 4k monitor. Can you confirm if you can reproduce this issue?

The text was updated successfully, but these errors were encountered:

Nexarian · 2024-02-26T05:22:13Z

Yup, I will be able to take a look tomorrow night (~7 PM EST on 2/27 and onwards)

My first guess is this could be some sort of memory limit that we're hitting.

matt335672 · 2024-02-26T09:43:28Z

I can reproduce this with a Windows VM with a 4K screen size (even though it doesn't fit on my main monitor!)

Stack trace is:-

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140029321954880) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140029321954880) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140029321954880, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007f5b1ec42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f5b1ec287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007f5b1ec89676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f5b1eddbb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007f5b1eca0cfc in malloc_printerr (str=str@entry=0x7f5b1edd97ec "malloc(): corrupted top size") at ./malloc/malloc.c:5664
#7  0x00007f5b1eca46f2 in _int_malloc (av=av@entry=0x7f5b18000030, bytes=bytes@entry=3153864) at ./malloc/malloc.c:4373
#8  0x00007f5b1eca5139 in __GI___libc_malloc (bytes=bytes@entry=3153864) at ./malloc/malloc.c:3329
#9  0x000055726193135e in xrdp_egfx_wire_to_surface2 (bulk=bulk@entry=0x0, surface_id=surface_id@entry=0, codec_id=codec_id@entry=9, codec_context_id=codec_context_id@entry=0, pixel_format=pixel_format@entry=32, 
    bitmap_data=bitmap_data@entry=0x7f5b1b302e70, bitmap_data_length=3145437) at xrdp_egfx.c:593
#10 0x00005572619136e4 in gfx_wiretosurface2 (self=self@entry=0x557265dc5bb0, bulk=bulk@entry=0x0, in_s=in_s@entry=0x7f5b1dfe39e0, enc_gfx_cmd=enc_gfx_cmd@entry=0x557265d4d960) at xrdp_encoder.c:696
#11 0x0000557261914595 in process_enc_egfx (self=0x557265dc5bb0, enc=0x557265d4d940) at xrdp_encoder.c:946
#12 0x0000557261914b8f in proc_enc_msg (arg=0x557265dc5bb0) at xrdp_encoder.c:1090
#13 0x00007f5b1ec94ac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#14 0x00007f5b1ed26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I'll try running with valgrind and see if I can pick up the initial memory violation

matt335672 · 2024-02-26T10:18:25Z

I can see what's happening and I have a fix, but it will need @Nexarian and/or @jsorg71 to work out why it works. I think we need to do this.

valgrind trace is :-

==4019== Thread 2:
==4019== Invalid write of size 1
==4019==    at 0x14915F: rfx_encode_diff_rlgr1 (rfxencode_diff_rlgr1.c:239)
==4019==    by 0x144613: rfx_pro_compose_message_region (rfxencode_compose.c:884)
==4019==    by 0x144A44: rfx_pro_compose_message_data (rfxencode_compose.c:952)
==4019==    by 0x141092: rfxcodec_encode_ex (rfxencode.c:403)
==4019==    by 0x1411C3: rfxcodec_encode (rfxencode.c:439)
==4019==    by 0x119684: gfx_wiretosurface2 (xrdp_encoder.c:677)
==4019==    by 0x11A594: process_enc_egfx (xrdp_encoder.c:946)
==4019==    by 0x11AB8E: proc_enc_msg (xrdp_encoder.c:1090)
==4019==    by 0x4ABFAC2: start_thread (pthread_create.c:442)
==4019==    by 0x4B50A03: clone (clone.S:100)
==4019==  Address 0xf52f040 is 0 bytes after a block of size 3,145,728 alloc'd
==4019==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4019==    by 0x1195D3: gfx_wiretosurface2 (xrdp_encoder.c:669)
==4019==    by 0x11A594: process_enc_egfx (xrdp_encoder.c:946)
==4019==    by 0x11AB8E: proc_enc_msg (xrdp_encoder.c:1090)
==4019==    by 0x4ABFAC2: start_thread (pthread_create.c:442)
==4019==    by 0x4B50A03: clone (clone.S:100)
==4019== 
==4019== Invalid write of size 1
==4019==    at 0x1490C5: rfx_encode_diff_rlgr1 (rfxencode_diff_rlgr1.c:238)
==4019==    by 0x144613: rfx_pro_compose_message_region (rfxencode_compose.c:884)
==4019==    by 0x144A44: rfx_pro_compose_message_data (rfxencode_compose.c:952)
==4019==    by 0x141092: rfxcodec_encode_ex (rfxencode.c:403)
==4019==    by 0x1411C3: rfxcodec_encode (rfxencode.c:439)
==4019==    by 0x119684: gfx_wiretosurface2 (xrdp_encoder.c:677)
==4019==    by 0x11A594: process_enc_egfx (xrdp_encoder.c:946)
==4019==    by 0x11AB8E: proc_enc_msg (xrdp_encoder.c:1090)
==4019==    by 0x4ABFAC2: start_thread (pthread_create.c:442)
==4019==    by 0x4B50A03: clone (clone.S:100)
==4019==  Address 0xf52f041 is 1 bytes after a block of size 3,145,728 alloc'd
==4019==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4019==    by 0x1195D3: gfx_wiretosurface2 (xrdp_encoder.c:669)
==4019==    by 0x11A594: process_enc_egfx (xrdp_encoder.c:946)
==4019==    by 0x11AB8E: proc_enc_msg (xrdp_encoder.c:1090)
==4019==    by 0x4ABFAC2: start_thread (pthread_create.c:442)
==4019==    by 0x4B50A03: clone (clone.S:100)
==4019== 
==4019== Thread 1:
==4019== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==4019==    at 0x4B528FE: __libc_send (send.c:28)
==4019==    by 0x4B528FE: send (send.c:23)
==4019==    by 0x486D854: g_sck_send (os_calls.c:1406)
==4019==    by 0x4876172: trans_tcp_send (trans.c:84)
==4019==    by 0x48770E4: trans_write_copy_s (trans.c:624)
==4019==    by 0x48C4158: xrdp_iso_send (xrdp_iso.c:637)
==4019==    by 0x48CA0F6: xrdp_mcs_send (xrdp_mcs.c:1473)
==4019==    by 0x48E786D: xrdp_sec_send (xrdp_sec.c:1823)
==4019==    by 0x48BDE14: xrdp_channel_send (xrdp_channel.c:150)
==4019==    by 0x48C1099: xrdp_channel_drdynvc_data_first (xrdp_channel.c:1002)
==4019==    by 0x489920F: libxrdp_drdynvc_data_first (libxrdp.c:1458)
==4019==    by 0x1345DE: xrdp_egfx_send_data (xrdp_egfx.c:64)
==4019==    by 0x12CC75: xrdp_mm_process_enc_done (xrdp_mm.c:3469)
==4019==  Address 0x9848443 is 19 bytes inside a block of size 32,768 alloc'd
==4019==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4019==    by 0x48C3E85: xrdp_iso_init (xrdp_iso.c:603)
==4019==    by 0x48C9CAE: xrdp_mcs_init (xrdp_mcs.c:1380)
==4019==    by 0x48E31F5: xrdp_sec_init (xrdp_sec.c:647)
==4019==    by 0x48BDBED: xrdp_channel_init (xrdp_channel.c:90)
==4019==    by 0x48C0EC0: xrdp_channel_drdynvc_data_first (xrdp_channel.c:980)
==4019==    by 0x489920F: libxrdp_drdynvc_data_first (libxrdp.c:1458)
==4019==    by 0x1345DE: xrdp_egfx_send_data (xrdp_egfx.c:64)
==4019==    by 0x12CC75: xrdp_mm_process_enc_done (xrdp_mm.c:3469)
==4019==    by 0x12D43D: xrdp_mm_check_wait_objs (xrdp_mm.c:3717)
==4019==    by 0x140079: xrdp_wm_check_wait_objs (xrdp_wm.c:2351)
==4019==    by 0x133F93: xrdp_process_main_loop (xrdp_process.c:287)
==4019==

The memory buffer which isn't big enough is allocated at xrdp_encoder.c:669:-

xrdp/xrdp/xrdp_encoder.c

Lines 659 to 670 in c3cb855

    
           if (ENC_IS_BIT_SET(flags, 0)) 
        
           { 
        
               /* already compressed */ 
        
               bitmap_data_length = enc_gfx_cmd->data_bytes; 
        
               bitmap_data = enc_gfx_cmd->data; 
        
               do_send = 1; 
        
           } 
        
           else 
        
           { 
        
               bitmap_data_length = self->max_compressed_bytes; 
        
               bitmap_data = g_new(char, bitmap_data_length); 
        
               if (bitmap_data == NULL)

and self->max_uncompressed_bytes is initialised here:-

xrdp/xrdp/xrdp_encoder.c

Lines 190 to 195 in c3cb855

    
           if (client_info->gfx) 
        
           { 
        
               // Magic numbers... Why? 
        
               self->frames_in_flight = 2; 
        
               self->max_compressed_bytes = 3145728; 
        
           }

I've done some playing around, and if I set max_uncompressed_bytes to 6MB (i.e. 6 * 1024 * 1024), all is well. I still get errors however if I only set it to 4MB (i.e. 4 * 1024 * 1024)

I'm unable to comment further as I don't understand the 'magic number' comment.

metalefty · 2024-02-26T12:56:38Z

Thanks @matt335672. It makes sense it happens only 4k geometry with a big screen update.

jsorg71 · 2024-02-26T21:47:07Z

It looks like I added it in sha1 966a819 in the origin egfx branch in 09/20/2020. I don't know where I got that and can't find any limit in MS-RDPEGFX. librfxcodec should not crash though, rfx_encode_diff_rlgr1 should be checking the out buffer. I'll check on that.
RemoteFX codec has both these settings but I don't know if GFX does.
It's probably OK to increase it.

Nexarian · 2024-02-26T21:51:11Z

I called it a magic number precisely because I didn't know why it was there, and was largely copy/pasting! Thanks Jay for the follow-up. I'm pretty sure increasing it is fine as well.

metalefty · 2024-02-27T04:30:28Z

Let's increase it to enough size for 4K dual monitor (maybe 12 * 1024 * 1024?).

jsorg71 · 2024-02-27T09:14:46Z

@metalefty I think it only needs to be as big as the biggest monitor. The shared memory and the encoder work on a monitor at a time. If there is no monitors and a huge desktop, it's like you say.

matt335672 · 2024-02-27T09:27:31Z

However, if there are no monitors and a huge desktop (e.g. if using VNC) we don't use the encoder. So I think this problem only occurs for GFX.

matt335672 · 2024-02-27T09:27:49Z

Sorry - I mean GFX with XUP.

metalefty · 2024-02-27T09:39:49Z

Jay, thanks for the info.

What about assuming a 16k x 16k per monitor as a maximum size for the moment then? It is the maximum size that xrandr (xorgxrdp?) reports.

$ xrandr
Screen 0: minimum 256 x 256, current 1920 x 1080, maximum 16384 x 16384
rdp0 connected primary 1920x1080+0+0 0mm x 0mm
   1920x1080     50.00*

matt335672 · 2024-02-27T09:45:40Z

16k x 16k gives us a value of 256Mb. That's assuming that @metalefty's test program is producing worst case sizes, and it may not be.

We can also set the size based on the largest monitor:-

--- a/xrdp/xrdp_encoder.c
+++ b/xrdp/xrdp_encoder.c
@@ -85,6 +85,40 @@ xrdp_enc_data_done_destructor(void *item, void *closure)
     g_free(enc_done);
 }
 
+/*****************************************************************************/
+static unsigned int
+get_largest_monitor_pixels(struct xrdp_mm *mm)
+{
+    int max_pixels;
+
+    struct xrdp_client_info *client_info = mm->wm->client_info;
+    struct display_size_description *display_sizes;
+    display_sizes = &client_info->display_sizes;
+
+    if (display_sizes->monitorCount < 1)
+    {
+        max_pixels = display_sizes->session_width *
+                     display_sizes->session_height;
+    }
+    else
+    {
+        max_pixels = 0;
+        struct monitor_info *minfo = display_sizes->minfo;
+        int i;
+        for (i = 0 ; i < display_sizes->monitorCount; ++i)
+        {
+            unsigned int pixels = (minfo[i].right + 1) - minfo[i].left;
+            pixels *= (minfo[i].bottom + 1) - minfo[i].top;
+            if (pixels > max_pixels)
+            {
+                max_pixels = pixels;
+            }
+        }
+    }
+
+    return max_pixels;
+}
+
 /*****************************************************************************/
 struct xrdp_encoder *
 xrdp_encoder_create(struct xrdp_mm *mm)
@@ -189,9 +223,12 @@ xrdp_encoder_create(struct xrdp_mm *mm)
     self->xrdp_encoder_term = g_create_wait_obj(buf);
     if (client_info->gfx)
     {
-        // Magic numbers... Why?
+        // Assume compressor needs to cope with largest monitor with
+        // ineffective compression
         self->frames_in_flight = 2;
-        self->max_compressed_bytes = 3145728;
+        self->max_compressed_bytes = get_largest_monitor_pixels(mm) * 4;
+        LOG_DEVEL(LOG_LEVEL_INFO, "Using %d max_compressed_bytes for encoder",
+                  self->max_compressed_bytes);
     }
     else
     {

That gives us 32Mb for a 4K monitor. Also, we're allocating max_compressed_bytes for each encoder in multimon which is somewhat wasteful.

What so you think @jsorg71 ?

jsorg71 · 2024-02-27T09:48:53Z

Ah, that looks best. Didn't test but that's the idea.

metalefty · 2024-02-27T09:54:48Z

Wow, 256M per monitor per user is huge and wasteful. That looks best to me, too.

matt335672 · 2024-02-27T10:00:52Z

I'll put a PR together in the next hour or so.

Jay - can you look at whether we need to test the buffer while we're writing into it?

jsorg71 · 2024-02-27T10:03:53Z

@matt335672 yes, sure I am already. In fact I'm confused on why it crashes. I'm planning to set max_compressed_bytes really low and debug what's happening. I'll create another issue for this so we can close this one.

matt335672 · 2024-03-03T12:46:52Z

Reopened as it needs more investigation.

jsorg71 · 2024-03-14T20:37:20Z

This is taking longer than expected to fix. The way RFX works is that 'tiles_written' is returned and if that is not the whole tile set, it's called again.
This does not work with GFX because we can't have missing tiles in the tile set. Remember the green tiles in #2875 . This was caused by the missing tiles.
So I have to fix this a different way. Still making multiple calls to librfxcodec but using a region to control the tile set.

jsorg71 · 2024-03-18T08:09:10Z

To my surprise, what I said last is not true. It looks like as long as all the tiles are send before the end frame we're OK. I wish there was documentation on how the works.
I have working librfxcodec and xrdp changes to fix this without the huge buffer.
I'll prepare a PR for each.

jsorg71 · 2024-03-23T18:30:50Z

should be fixed with #3013 and neutrinolabs/librfxcodec#61

matt335672 · 2024-03-24T11:12:59Z

That's great news @jsorg71.

I'll test this on my own setup tomorrow (UK time).

metalefty · 2024-05-21T13:06:31Z

This one can be closed now.

matt335672 mentioned this issue Feb 27, 2024

Increase GFX output buffer size #2969

Merged

metalefty closed this as completed in #2969 Mar 3, 2024

matt335672 reopened this Mar 3, 2024

metalefty mentioned this issue Mar 21, 2024

Triage before v0.10.0 release #3010

Open

matt335672 mentioned this issue Mar 25, 2024

gfx send multiple wire to surface messages when compressed data is la… #3013

Merged

metalefty closed this as completed May 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GFX connection reliably crashes with SIGABRT #2968

GFX connection reliably crashes with SIGABRT #2968

metalefty commented Feb 26, 2024 •

edited

Nexarian commented Feb 26, 2024

matt335672 commented Feb 26, 2024

matt335672 commented Feb 26, 2024

metalefty commented Feb 26, 2024

jsorg71 commented Feb 26, 2024

Nexarian commented Feb 26, 2024

metalefty commented Feb 27, 2024

jsorg71 commented Feb 27, 2024

matt335672 commented Feb 27, 2024

matt335672 commented Feb 27, 2024

metalefty commented Feb 27, 2024

matt335672 commented Feb 27, 2024

jsorg71 commented Feb 27, 2024

metalefty commented Feb 27, 2024

matt335672 commented Feb 27, 2024

jsorg71 commented Feb 27, 2024

matt335672 commented Mar 3, 2024

jsorg71 commented Mar 14, 2024

jsorg71 commented Mar 18, 2024

jsorg71 commented Mar 23, 2024

matt335672 commented Mar 24, 2024

metalefty commented May 21, 2024

GFX connection reliably crashes with SIGABRT #2968

GFX connection reliably crashes with SIGABRT #2968

Comments

metalefty commented Feb 26, 2024 • edited

Nexarian commented Feb 26, 2024

matt335672 commented Feb 26, 2024

matt335672 commented Feb 26, 2024

metalefty commented Feb 26, 2024

jsorg71 commented Feb 26, 2024

Nexarian commented Feb 26, 2024

metalefty commented Feb 27, 2024

jsorg71 commented Feb 27, 2024

matt335672 commented Feb 27, 2024

matt335672 commented Feb 27, 2024

metalefty commented Feb 27, 2024

matt335672 commented Feb 27, 2024

jsorg71 commented Feb 27, 2024

metalefty commented Feb 27, 2024

matt335672 commented Feb 27, 2024

jsorg71 commented Feb 27, 2024

matt335672 commented Mar 3, 2024

jsorg71 commented Mar 14, 2024

jsorg71 commented Mar 18, 2024

jsorg71 commented Mar 23, 2024

matt335672 commented Mar 24, 2024

metalefty commented May 21, 2024

metalefty commented Feb 26, 2024 •

edited