Memory leak - 700mb RAM #394
Comments
Could you describe your general usage; SMS, sshfs, etc?
Yes, these are not useful for this type of debugging. This will be removed from user view for the next release. You'll have to enable debugging as described here: Debugging. |
That's the thing, on days where I found high RAM usage, I hadn't used it at all. Maybe once or twice I opened the collapsible gnome settings panel to see what my tablet's battery level was. I haven't used it to send files, texts, or anything other than occasionally pausing a movie from bed from multimedia controls or remote input. I've restarted the process from the debugging window and am now at a healthy 26mb. If/when it rises again, would a gdb memory dump of the process be more useful? In the meantime I've enabled "Debug logging" and am keeping an eye on the terminal and process. |
A gdb dump probably wouldn't be much help without a lot of debug symbols, and even then I'd really hope it was my mistake and not GJS's since that would be a lot easier to fix 😝 Fortunately restarting the service should be relatively easy, if you can't find any obvious debug issues you can restart the service with There weren't any reports about this in v16, so hopefully I can track down what might have caused this. EDIT one thing that might help; does this only happen when the screen is off? I know there are some extension problems in the default Ubuntu session, maybe they are also affecting GSConnect. |
Actually, I do shut my monitor off and leave my desktop logged in throughout the night. And with the (minor) testing I could do in 10 minutes, messing around with remote input while the screen is off did make memory go up by about 15 MB. Also, opening the "Send files" dialog multiple times causes RAM to increase by 1-2MB and it never goes back down, even after triggering garbage collection. So it could definitely be a gjs issue. For now I'll refrain from restarting the service and let it run throughout the night, report back if I can confirm that the screen being off is the cause of the memory leak. Also, thanks for the great work on this wonderful extension! |
Hmm, thanks, those are some good places for me to start looking. It is strange that your heap dump is so small, even though the memory usage is so high, not that there is explicitly useful information inside it. |
I believe there is a particular instance that is causing this. It seems like there is a leak that occurs only when destroying all rows in a I'm waiting for some advice from the GJS maintainer, but in the mean time if you'd like to give this Zip a try, I'd be interested to hear if it fixes or at least keeps the problem under control: gsconnect@andyholmes.github.io.zip You can install it using the instructions in the Wiki, or alternatively just build straight from git. EDIT nevermind, I think this is just a second bad leak I found 😞. I just had this memory leak appear after only a few minutes of use. Seems like a bad one :( |
It's funny, I left it all night with the monitor off, like I said. In the morning it was up to 68mb which wasn't too bad. I almost forgot about it today. Haven't used KDE Connect at all. Then all of a sudden I see it had jumped up to 488mb sometime in the span of an hour? I've got the GJS log which might indicate something (very likely truncated because of the limited terminal scrollback). One thing I did notice is the icon kept disappearing from the gnome tray throughout the day. My android probably kept killing KDEConnect because of memory, and I also jump between different hotspots throughout the day so perhaps constant disconnections/reconnections could occasionally cause a memory leak somehow? Pure speculation. As I was typing this, the icon appeared and disappeared about 5 times... sadly it didn't correspond to a memory increase. I'll try your build for a day and see how that affects things. |
Ah ha, good speculation! I think that causes it, since I've duplicated it a few times now by walking my phone to where another router picks up the signal. I'm not sure exactly how it happens or how to fix it, but this is a very good start to a bad bug (for me it's almost always ~800mb 😨) |
Great to hear you can reliably duplicate it! The build I tried yesterday didn't make much difference as I'm now back at 453MB ram. If you can track it down, let me know if you have any other builds you would like me to try. |
Well, not reliably :) It is a weird situation and I'm not entirely sure why it happens (this kind of race condition should already be addressed), or where the giant memory leak comes from. I noticed output like this:
So I guess there are a bunch of incoming connections and the wrong one resolves first, resulting in a dead connection. ~800MB is probably more than a couple months worth of packet data, so I guess maybe it's a dead socket reading tons of null bytes into nowhere or something? In any case, here's an attempt at a fix, so if you could test this and see if it reliably fixes the problem I'll try and do the same. The approach is just to keep track of connections coming from a single host (eg. 192.168.1.68) and always favour the newest incoming connection, closing any current one. This might not work if your IP is changing between networks/hotspots, but it's unclear whether the connection lasts long enough to determine what device it's coming from or if there's only a chance to get the IP. |
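The "newest connection wins" approach described above can be sketched roughly as follows. This is not the code from the actual fix; `ChannelRegistry` and `makeChannel` are hypothetical names, and the channel objects are plain stand-ins for whatever GSConnect really uses.

```javascript
// Hypothetical sketch: keep at most one live channel per remote host.
class ChannelRegistry {
    constructor() {
        this._byHost = new Map(); // host string -> channel object
    }

    // Register a newly accepted channel; the newest one always wins.
    accept(host, channel) {
        const current = this._byHost.get(host);

        if (current !== undefined && current !== channel) {
            // Drop the older (possibly dead) connection from this host
            current.close();
        }

        this._byHost.set(host, channel);
        return channel;
    }

    // Forget a channel once it closes, unless it was already replaced.
    remove(host, channel) {
        if (this._byHost.get(host) === channel) {
            this._byHost.delete(host);
        }
    }

    get(host) {
        return this._byHost.get(host);
    }
}

// Stand-in for a real channel object (an assumption for illustration)
function makeChannel(name) {
    return {
        name,
        closed: false,
        close() {
            this.closed = true;
        },
    };
}

const registry = new ChannelRegistry();
const first = makeChannel('first');
const second = makeChannel('second');

registry.accept('192.168.1.68', first);
registry.accept('192.168.1.68', second); // closes `first`
```

Keying on the host address is exactly the weak point mentioned above: if the device's IP changes between networks, the old channel would sit under a different key and not be closed by this scheme.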
Sadly, I may have spoken too soon as I'm now back at 800mb, completely on its own before I had the chance to stress test it. This one might take a while 😣 |
You're both on Ubuntu, right? I wonder if that common thread is somehow meaningful? Because, I've never seen anything even remotely like this, running Granted, my PC:phone connection might be a bit more stable than average, but there are times when I'm at home with the phone in my pocket (not charging), in which case the WiFi would disconnect every time I lock it and reconnect every time it wakes up again, and still no hints of this. (I was also wondering if Android version might be involved, but @x0a 's tablet is likely running an older release than my Android N phone, and both are Samsung so that's probably not significant.) |
@ferdnyc I tried something that I hadn't tried before, restarted my tablet. I hadn't figured it necessary since the service/app had been killed by Android dozens of times throughout the day. But since I've restarted last night I haven't seen memory rise above 35mb after 6 hours. I am running Android 4.4.2. Could be an obscure issue with the kdeconnect client. But I think it also lends credence to the idea that constant reconnections are responsible for the memory leak as restarting Android also kills any misbehaving services that could have been responsible for Android constantly having to kill the kdeconnect service. |
I think multiple connections is a good bet, but really GSConnect should be able to handle any of these events as well as KDE Connect. The real question is, what is this leaked memory made of and where is it being leaked to? ~800MB is a massive amount of memory, especially in this application, but on top of that why isn't it being freed later anyways? It's a head scratcher, for sure. I might have to try and figure out a way to manufacture this situation so I can really track down what's happening :/ |
I wonder if running I ran ..About as uninformative as you'd expect, under normal operation. Still, *shrug* |
@ferdnyc I've been eager to try this and have been waiting for the bug to manifest itself again, but I haven't been able to reproduce the memory leak since I restarted my tablet (nearly 36 hours ago). Great news for me obviously since I don't have to keep killing GJS anymore, but bad news for debugging. I suppose if it doesn't show up again after a month we can chalk it up to a bad instance of KDEConnect and call it done? |
This might also be related to a bug fixed in kdeconnect-android (d4d4849) where sockets were not being properly closed. GSConnect, on the other hand, goes way overboard when closing a channel, but that might still mean there's an obscure bug lurking somewhere. I haven't hit this bug in a while myself, but I've also been running nightly builds on and off so that may have something to do with it. In any case, I'll probably release v18 soon regardless, since it's been some time since v17 and there are a lot of bug fixes. I'll probably just commit the "fix" I added anyways, since it doesn't seem to harm anything. |
How has this one been going? Personally this hasn't reoccurred for a very long time, safe to close it now? |
Yeah I think we can call this one. Zero re-occurrences Thanks for the hard work! |
Hi, could you describe your usage when the leak happened or how to reproduce it? |
I don't really know why it happens, maybe it has something to do with restarting the phone... I'm not really sure. I can't figure out if there is some kind of pattern to it. |
This has happened to me as well a few times. Each time it was right after I closed my laptop lid with power attached. I disconnect power, take my device to another room and open the lid. Consistently I hear my CPU fan ramp up. 700MB+ used. Haven't touched my phone in hours... but I do have two Ubuntu systems, with gsconnect enabled, connected to my phone. |
I have actually, now, seen this exactly once — my occurrence looked exactly the same as @ivandotv (also on Fedora 29) — I have no idea what triggered it, and was unable to find any causes. (Interestingly, though, I'm on a desktop computer with no lid, no suspend beyond display sleep, etc.) But it's only ever been that one time. I've since restarted, and the |
I'm using desktop computer |
Correct on CPU pegged, just 1 core. The spike never ended, I just killed it. I can tell when it's doing it because my CPU temp rises and my fan starts ramping up. It was the gjs process. |
OK so today my VM Ubuntu 18.04 is doing it. See Photo. Sorry for all the edits. One thing I am noticing is that there are two old notifications in my notifications panel from 2 days ago: Unknown Contact (Ongoing Call) and Smarthings battery low. These notifications were cleared off my phone yesterday and the VM has been up for 2 days. I also got the Ubuntu update prompt this morning, which I did with no reboot. I also noticed that my waiting channel is blank vs the screenshot @x0a put up. May not be anything, just sharing what I have found. Ping from phone to desktop works. The battery level shown on my workstation is incorrect. It says 97% but it's actually 51%. 97% would have been when I charged it last night around 8PM EST or so. What is really strange about that is I remember unplugging my phone from the charger at exactly 97%. When I ring my workstation it does ring, and the gsconnect app comes up on my panel, but the GUI never opens. I had to do a quit from the menu and gjs went from 40% cpu back to 25%. No matter what I do I cannot get the GSConnect window to open. It comes up on my side bar, but no window. I am wondering if this is because I use Gnome Tweaks and hide all my desktop icons... I see that GSConnect mounts the phone network share. |
@driscollw 's report inspired me to look at my own That's a helpful narrowing of the window, since in my case the size increase still isn't associated with any sort of increased CPU usage... total CPU time for that Anyway, I first launched the GUI and ran "Generate Support Log". This is the entirety of what that produced:
Um. Seems a little... spare. So I ran The lines that look interesting to me, because (a) I've never seen them before, and (b) they fall nicely within the time period in question, having been logged around 8 hours ago, are these. (Line-wrapped so you don't have to scroll horizontally.)
Other than that, everything that's visible in the log seems to be business as usual, mostly just hundreds and hundreds of lines of " Notes
|
On my T420, as SOON as I opened my lid the CPU went straight to 25% and memory went from 220MB to 525MB in a matter of 10 seconds. It was fine for about 10 secs, as I still had system monitor up from when I suspended it last night. GSConnect menu showing incorrect battery level. gjs was running fine last night when I suspended it. |
Yeah, after the third person posted a 10MB+ heap dump and several complaints about GSConnect being insecure because debug logging was left on by accident, it became obvious the debug tool was confusing users. The developer tool is also hidden behind a GAction now. So the usage is:
You can still turn on debug logging, but it must be done in dconf and whenever the Generate Support Log dialog closes, it disables it.
These lines here are probably the most helpful yet. It indicates that the Android app is sending a broken identity packet, which is only failing once
Probably this is an unresolved async function/Promise that eventually gets culled by GJS. Promises are queued in GJS in the GMainLoop with a custom GSource, but there's not much information here about which it is. Most likely it's some async function in I just pushed 71337e9 which catches errors in Updated ZIP |
It's happened twice in the past hour for me. One thing I will add is that I am on a 5GHz wifi channel with a 6MB/s connection speed. Thinking more about this, I see it most often when my signal is poor. Out of suspend I sometimes have a ? where my wifi signal should be. Also my phone is on 5GHz and has a poor signal in my Den. This morning, when I stated that the power level was incorrect on my Ubuntu VM (Excellent Signal, right at AP) at 97%, I had been charging my phone in the Den (the night before). |
...So, here's another interesting wrinkle: That very same So now I'm wondering if those |
Last night I sat my phone right next to my T420 for about 2 hours. I watched the memory usage go from about 22MB to 225MB in the course of two hours. I did not use either device. I just moved my mouse once in a while to see what it was currently using. |
It's the same for me on Arch Linux. The only thing I do with GSConnect is to sync notification from my phone to my desktop and yet I end at 400-800MiB at the end of the day
|
Hi, since I still can't reproduce this locally I rely on people who can to test the fixes I apply. Here is a Zip built from master which includes 71337e9 from the Zip above: gsconnect@andyholmes.github.io.zip See Install from Zip in the Wiki for instructions. |
g_data_input_stream_read_line() will return NULL when there is no data to read, considering it the "end of stream". In practice this seems to happen occasionally with KDE Connect sockets even when the connection is not closed, however ignoring the NULL byte can result in the receive loop spinning out. We now close the connection when we read a NULL byte, which *may* solve the mysterious high CPU usage and large memory leak, but comes with the risk that clients won't be able to connect at all depending on why this happens for KDE Connect packet streams. addresses #394
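To illustrate the change that commit message describes, here is a minimal, synchronous sketch of a receive loop that treats a null read as end of stream and closes, rather than ignoring it and spinning. It is not the GSConnect code: `FakeLineStream` is a hypothetical stand-in for the real `Gio.DataInputStream`, and the real loop is asynchronous. It assumes packets arrive as one JSON document per line.

```javascript
// Hypothetical stand-in for a line-based input stream: readLine() returns
// the next line, or null once there is nothing more to read.
class FakeLineStream {
    constructor(lines) {
        this._lines = lines.slice();
        this.reads = 0;
    }

    readLine() {
        this.reads++;
        return this._lines.length > 0 ? this._lines.shift() : null;
    }
}

// Read packets until the stream reports null, then close exactly once.
function receiveLoop(stream, onPacket, onClose) {
    while (true) {
        const line = stream.readLine();

        // A null read is treated as "end of stream": stop here instead of
        // looping again, which is what could spin forever on a dead socket.
        if (line === null) {
            onClose();
            return;
        }

        onPacket(JSON.parse(line));
    }
}

const packets = [];
let closeCount = 0;
const stream = new FakeLineStream([
    '{"id": 1, "type": "kdeconnect.ping", "body": {}}',
    '{"id": 2, "type": "kdeconnect.ping", "body": {}}',
]);

receiveLoop(stream, (packet) => packets.push(packet), () => closeCount++);
```

The trade-off is the one stated in the commit message: if a healthy connection can also produce a null read, this loop closes it too.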
I've pushed 5401bc2 with an additional attempt to plug this leak. As detailed in the commit message, this comes with the risk that some clients may be unable to connect at all, so this won't be in a stable release unless I get lots of testing and confirmation. Here is an updated Zip: |
Thanks! |
Normal memory usage is about 30-40MB (resident - shared). |
Looks good. Yesterday's usage was normal at work and at home; gjs still chills at 24-36MiB. Usually I hit the bug once or twice every day at some point, so I'm really happy with what I see now. I'll go on with the testing for a while longer and report back |
@andyholmes I found something interesting as well. However I did not test it yet as I would need to clone and build the plugin. There is a missing 'await' in front of 'this.__cache_write()' __cache_write method returns a promise and because the code does not wait for it to be resolved it creates a loop. The data to be cached is not being released from the memory: This is just a hypothetical scenario as I could not test it yet. |
I may be misunderstanding you, because that code is a bit convoluted (mixing task threads in a single-threaded engine, while maintaining file IO thread safety is a bit weird), but here's my take:
Hmm, I think there is a recursive loop problem here, although I'm not sure if it's the one you're thinking of. I believe this conditional: should be outside the There's two "mutex" related variables: As an example, assuming the cache data has changed before each call to
// (1) First attempt is made and __cache_lock is false,
// so __cache_lock is set to true and IO is started.
foo.__cache_write();
// (2) Second attempt is made and __cache_lock is true,
// so __cache_queue is set to true and the function returns.
foo.__cache_write();
// (3) Third attempt is made and __cache_lock is true,
// so __cache_queue is set to true and the function returns.
foo.__cache_write();
// (4) First attempt completes, __cache_lock is set to false.
// __cache_queue is true, so __cache_queue is set to false and the function recurses.
Did I explain and/or understand you correctly? |
Because finally will be invoked when returning a try block, the mutex check in the __cache_write() should be outside the try block for the function to recurse in proper order. addresses #394
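A minimal sketch of the lock-and-queue pattern discussed above, with the mutex check placed before the `try` block so that an early return cannot trigger the `finally` clause. This is an illustration under assumptions, not the GSConnect source: `CacheWriter`, `_lock`, `_queue` and the injected `writeFn` are hypothetical stand-ins for `__cache_write()`, `__cache_lock`, `__cache_queue` and the real file IO.

```javascript
class CacheWriter {
    constructor(writeFn) {
        this._write = writeFn; // hypothetical async function doing the IO
        this._lock = false;    // true while a write is in flight
        this._queue = false;   // true if a write was requested meanwhile
        this.data = null;
    }

    async write() {
        // Mutex check outside the try block: returning from here does not
        // run the finally clause below, so a lock we do not hold is never
        // released and no recursion is started by a skipped call.
        if (this._lock) {
            this._queue = true;
            return;
        }

        this._lock = true;

        try {
            await this._write(this.data);
        } finally {
            this._lock = false;

            // One queued request covers any number of skipped calls
            if (this._queue) {
                this._queue = false;
                await this.write();
            }
        }
    }
}

// Usage: three rapid calls result in two writes (the first, then the latest)
const written = [];
const writer = new CacheWriter(async (data) => {
    await Promise.resolve(); // pretend to do asynchronous IO
    written.push(data);
});

writer.data = 'a';
const firstWrite = writer.write();
writer.data = 'b';
writer.write();
writer.data = 'c';
writer.write();
```

With the check inside the `try`, steps (2) and (3) would each run the `finally` clause on return, clearing the lock while the first write is still in flight.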
I found about 100 thousand lines of this in my journalctl log after I noticed that gjs locks one core at 100% CPU usage, raising my CPU temperature: |
This is not related to high memory usage. Most likely you are using a distribution which has already updated to glib2 |
Since there have been no further reports of the original high-memory usage bug, I'm going to close this issue. I believe the original issue was fixed in 5401bc2. Thanks for everyone's help with logs and testing on this bug! |
I have the same issue with GSConnect 24 on Ubuntu 18.04.
I'll try to send a heap file next time; the process was killed by the OOM killer before I could get one :( |
@vberthet this is probably a different bug than the one fixed in this issue. Can you open a new issue with logs for this? |
I have the same issue! |
Hello, I'm not sure if this one is related but I experienced giant memory usage as well. In my case it's caused by software that sends a massive amount of notifications. Perhaps the more notifications in a session, the bigger the RAM usage is. Here is the video of my reproduction: |
Hm, that's interesting! Some differential diagnosis:
|
4~5. There were no notifications shared. At the time I wasn't connected to any Android devices. I only use GSConnect occasionally to copy files; I never do anything else with it. I'm not sure how to do that. About that app: This can also be reproduced by calling I'm not sure if there is any case where a user would receive a lot of notifications. One thing that I can think of is if I enable all notifications instead of just mentions for Discord and run it for a long period of time without rebooting. Doing the Discord thing appears to have a similar effect. Memory usage slowly increases; sometimes it goes down, but overall it keeps increasing. In my case it's still not a big problem. Perhaps this is different from what people discussed above. |
Ah, sorry, I was on my phone when I sent that previous reply and couldn't look into things too deeply. On the Android side, to receive notifications from a Linux peer you'd want the "Receive notifications" plugin enabled for that peer — "Notification sync" enables sharing in the other direction, for notifications that originate locally on the Android device. On the Linux side, the Notifications plugin handles sharing in both directions, unlike on Android where it's two separate plugins. It can be enabled/disabled under "Advanced". (It's normally enabled). When enabled, the Notifications settings panel lets you enable/disable global notification sending, and then if that's enabled, also disable sharing for specific Linux application sources. One thing I notice is that the DBus notification handler unpacks the notification contents anytime a notification comes in, because it handles managing the applications list unconditionally. Only then does it check whether notification sharing is enabled. If not, it bails, otherwise it attempts to generate a message over the wire to the connected peer. gnome-shell-extension-gsconnect/src/service/plugins/notification.js Lines 232 to 265 in aa02b6a
So, it's possible there's some sort of leak in The other thing, though, is that AFAICT all of this plugin code runs on a per-peer basis. So if you didn't have any peers connected to the Linux machine at the time, none of this should even have been running. If that's the case, I'm really at a loss as to how the gjs process could've racked up so much memory usage. Unless perhaps we're not correctly dropping our DBus listener connections when peers disconnect, so that we'd still be bombarded with unhandled notifications. You don't happen to see any messages like these in your user journal, referencing the GSConnect session bus connection, do you?
(I know what that message would look like because, in an unrelated problem, I've been getting flooded with them for |
Describe the bug
Accumulates hundreds of MB of ram over the course of several hours.
To Reproduce
Not entirely sure how to reproduce. But it happens pretty consistently. I've not changed any of the defaults, I've only connected one device, my tablet.
Screenshots
![screenshot from 2018-12-16 18-52-29](https://user-images.githubusercontent.com/13019784/50061247-c7622980-0163-11e9-9ff7-379e491cb9fe.png)
![screenshot from 2018-12-16 12-34-18](https://user-images.githubusercontent.com/13019784/50061240-a4d01080-0163-11e9-8de1-703d528b3aba.png)
Debug output
Heap attached. Doesn't seem to match what the terminal shows.
gsconnect.heap.zip
System Details (please complete the following information):
GSConnect environment (if applicable):