Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU usage increases to 24% for 10-20 minutes after exiting OpenCPN #3042

Closed
we9v opened this issue Mar 2, 2023 · 62 comments
Closed

CPU usage increases to 24% for 10-20 minutes after exiting OpenCPN #3042

we9v opened this issue Mar 2, 2023 · 62 comments
Assignees
Labels
Milestone

Comments

@we9v
Copy link

we9v commented Mar 2, 2023

Describe the bug
After exiting OpenCPN either via the X or Navigate>Exit OpenCPN menu item, the CPU usage for the OpenCPN task goes from ~10% while running, up to 24-25% after being closed, as observed in the Task Manager. It stays at this 24% for approximately 10-15 minutes after closing.

Expected behavior
I expect the task to be killed in short order, and not consume that much CPU after being closed.

Desktop (please complete the following information if applicable):
Raspberry Pi 4, 2GB, OpenPlotter v3, OpenCPN Installer 3.2.6
Debian bullseye
OpenCPN 5.6.2+dfsg-1~bpoll+2

@we9v we9v added the bug label Mar 2, 2023
@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

It is not clear to me whether the opencpn process is still running during these 20 minutes, is it? May we have a look at your logfile, please?

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

I did a quick test on my Rpi-4 with the same version. Nothing of this kind occurs here. @nohal is right: The first step would be to have a look at the logfile.

@we9v
Copy link
Author

we9v commented Mar 2, 2023

O was running at 7:38am
o_running_738

Then I closed it at 7:38
o_closed_738

10 minutes later, at 7:48, the process/task was still running. At this time, the log file still had a time stamp of 7:38.
o_closed_748

12 minutes after closing, at 7:50, the process was gone. Also, the log file had a new time stamp of 7:50.
Log file attached.
opencpn.log

@leamas leamas removed the info needed label Mar 2, 2023
@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

From the log I see that the only plugin you have installed is StatusBar. I did the same installation, but still cannot reproduce the problem.

@we9v : Is this reproducible for you i. e., does it happen every time?

@we9v
Copy link
Author

we9v commented Mar 2, 2023

Several plugins installed, but only Dashboard (1.2), and StatusBar (1.0.3.0) enabled (as well as ChartDownlaoded and WMM, of course). I believe it also occurred when I had Dashboard Tactics enabled instead of Dashboard. The other loaded but disabled plugins are: GRIB, WeatherRouting, Watchdog, DashboardTactics, Climatology.

VERY reproducible. From what I can tell, 100%.

Probably obvious, but if I try to re-start O, within those ~10+ minutes, it does not work. Sometimes after much delay, it complains "Sorry, and existing instance of OpenCPN may be too busy to respond. Please retry.", which led me to look at the task manager originally.

@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

Just a though before starting to dig deeper into it, do you have many waypoints, tracks or routes?

@we9v
Copy link
Author

we9v commented Mar 2, 2023

0 Routes, 0 Tracks, 0 Waypoints.

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

Another simple-minded test: does it happen if you disable all plugins?

@bdbcat
Copy link
Member

bdbcat commented Mar 2, 2023

Any network GPS data sources?

@we9v
Copy link
Author

we9v commented Mar 2, 2023

@bdbcat Occurs whether or not there are connected GPS sources. When Pi fires up, it automatically connects to my Starlink network and there are no GPS sources. When I change the network connection to my Yacht Devices YDNR-02, the it will receive GPS data from up to 3 sources: AIS, helm GPS, nav desk GPS. Log provided earlier is without any GPS source, or any NMEA (0183 or 2k) traffic at all, since it was connected to Starlink, not boat NMEA multiplexer.

@leamas With O running, I disabled the 4 previously mentioned plugins, including the default ChartDownloader and WMM. Closing that O session resulted in the same behavior (24% CPU, etc.). In the interest of time, kill that process, restart O, complains about last run failed, answer No to start in safe mode. Verify no plugins enabled. Close O, CPU jumps to 24%. Kill the task again, start O again, this time answer YES to start in safe mode. Close O again, and CPU again goes from 0-2% while running up to 24% after closed. These tests were all without GPS source(s), no NMEA data from multiplexer.

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

When I change the network connection [...]

what if you disable all connections completely?

@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

And after that I would like to see what perf top -p $(pidof opencpn) says during the time the CPU usage gets high. If perf is not installed on OpenPlotter by default, you might need sudo apt install linux-tools-generic, I think...

@we9v
Copy link
Author

we9v commented Mar 2, 2023

@leamas Disconnected the WiFi connection to Starlink (I have external USB dongle for an additional WiFI), and also unchecked any connections in O's options>connections. Same issue.

@nohal I'm linux illiterate. That command to install did not work, resulted in E: Unable to locate package linux-tools-generic
Some Googling suggested sudo apt install linux-tools-common but similar error. Another place suggested sudo apt install linux-tools-$(uname -r) linux-tools-generic but also similar errors. WiFi was reconnected and Starlink working fine, able to browse web. sudo apt update prior to this seemed to work fine (Fetched 790 kb in 19s).
I'd be happy to try anything else...sorry I don't know linux, you'll have to be explicit (as you were above).

@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

Sorry, I don't know OpenPlotter (apart that it is somehow based on Ubuntu/Debian) and don't have any RPi laying around to try it. @leamas can you help us to find a way to install perf there?

@we9v
Copy link
Author

we9v commented Mar 2, 2023

More googling suggested that the correct install for Debian Bullseye is sudo apt install linux-perf
But this installs for 5.10. Doing uname -r shows 5.15.84-v8+. When I try to run perf, it complains linux-perf-5.15 is not installed.

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

@nohal: Raspbian is derived from Debian, it's basically an Ubuntu sibling.

EDIT: OpenPlotter is just some configuration and installed software on top of Raspbian

@we9v Will look into the perf mess

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

@we9v :

To install: sudo apt install linux-perf-5.10.

To run: perf_5.10

@we9v
Copy link
Author

we9v commented Mar 2, 2023

@leamas That worked splendidly. Sorry to be PITA.

@nohal I can attach screenshots if you wish, but a description may be adequate: Before closing, there were several items in the sub-2%. After closing, libc-2.31.so is at 99.90%, three between 0.01-0.03%, and the rest 0.00%.

@we9v
Copy link
Author

we9v commented Mar 2, 2023

Decided to do screen shots anyway, as things changed slightly after sending my response.

This is how it looked right after closing.
perf-after-close

And after a while, it looked like this.
perf-a-while-later

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

That was a good screenshot. Code: (tcmgr.cpp, around line 635

TCMgr::~TCMgr() { PurgeData(); }

void TCMgr::PurgeData() {
  //  Index entries are owned by the data sources
  //  so we need to clear them from the combined list without
  //  deleting them
  while (m_Combined_IDX_array.GetCount()) {
    m_Combined_IDX_array.Detach(0);
  }

  //  Delete all the data sources
  m_source_array.Clear();
}

Thoughts?

@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

Can we really spend 20 minutes here? How many entries can there be? Anyway, m_Combined_IDX_array should probably be a vector of pointers and we should simply throw it away...

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

Does that comment make sense?

  • m_Combined_IDX_array is an ArrayOfIDXEntry
  • ArrayOfIDXEntry is declared using WX_DECLARE_OBJARRAY i. e. it's an array of objects, not pointers
  • Which means that these objects are owned by the array.
  • When they are detached it's the end of their lifespan, and the dtor will be invoked

Which, FWIW, invalidates the comment. Or?

@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

It should not be an array, especially if cleaning it takes 20 minutes...

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

hm... IDX_Entry.cpp:


IDX_entry::IDX_entry() { memset(this, 0, sizeof(IDX_entry)); }

IDX_entry::~IDX_entry() { free(IDX_tzname); }
                                                  

That is, an entry will reliably be initiated to all zero. But just grepping around for IDX_tzname I am sure there are cases where IDX_tzname is not set. The DTOR should definitely be { if (IDX_tzname) free(IDX_tzname); }.

However, this might very well be a distraction...

@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

Well, the problem on this particular system seems to be, as the logfile suggests, that the tide dataset seems to be loaded (And at the end cleared) 124 times instead of once...

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

The DTOR is indeed an distraction. Forget it.

@we9v
Copy link
Author

we9v commented Mar 2, 2023

Deleted the 2 extra, opened O, removed the connection and 4 plugins, closed O, there was one added.
Deleted that one, opened then closed O, there was one added back.

Summary, every time I close O, another one is added.

@nohal
Copy link
Collaborator

nohal commented Mar 2, 2023

Really weird we have never heard about this before (and certainly it does not happen on any of my systems AFAICT)

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

@nohal: It is possible to run in gdb, the debug symbols are available. Could it be the way?

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

It might be some other part of the configuration.

@we9v If you make a backup copy of your config file, and then removes the config file. And then starts opencpn without any config file, a new fresh one will be created. In this file, are there still added tcds lines each time you exit OpenCPN?

@we9v
Copy link
Author

we9v commented Mar 2, 2023

Yes. First instance created the file with two entries. Opened and closed O two more times after that, then checked the config file and there were 4 entries, 3 of the =/usr/share/opencpn/tcdata/HARMONICS_NO_US.IDX

@leamas
Copy link
Collaborator

leamas commented Mar 2, 2023

@we9v: Bingo! I can reproduce this. Each and every time I exit OpenCPN a new such line is added also for me.

Important step. I think we should be able to process this now, but I need some sleep (which timezone are you in?)

@we9v
Copy link
Author

we9v commented Mar 2, 2023

Central. Living on my new-to-me boat in Florida, escaping Wisconsin winters.

@leamas
Copy link
Collaborator

leamas commented Mar 3, 2023

Also Central, but the European one. Good morning!

@leamas
Copy link
Collaborator

leamas commented Mar 3, 2023

I have tested patch below against bullseye, and it sort of fixes the issue by avoiding duplicated tcds lines in the config file. However, I have absolutely no idea what the underlying reason for the duplicates is.

The patch should IMHO be fine as a quick-fix for current backport, I will initiate the bureaucracy to get it in place for Debian.

However, unsure if this is the right thing to do in master.

@nohal @bdbcat: thoughts?

EDIT: The problem is not present in master, so the patch is certainly not necessary. Question is just if we need an extra safety net, especially since we still don't know the core reason for the internal duplicates.

diff --git a/src/navutil.cpp b/src/navutil.cpp
index 76ab43d..62cece8 100644
--- a/src/navutil.cpp
+++ b/src/navutil.cpp
@@ -2737,11 +2737,19 @@ void MyConfig::UpdateSettings() {
   DeleteGroup(_T ( "/TideCurrentDataSources" ));
   SetPath(_T ( "/TideCurrentDataSources" ));
   unsigned int iDirMax = TideCurrentDataSet.Count();
-  for (unsigned int id = 0; id < iDirMax; id++) {
-    wxString key;
-    key.Printf(_T("tcds%d"), id);
-    Write(key, TideCurrentDataSet[id]);
-  }
+  do { // a scope
+    // Avoid writing duplicated lines, see #3042
+    std::vector<std::string> done;
+    for (unsigned int id = 0; id < iDirMax; id++) {
+      auto current_data = TideCurrentDataSet[id].ToStdString();
+      if (std::find(done.begin(), done.end(), current_data) != done.end())
+        continue;
+      done.push_back(current_data);
+      wxString key;
+      key.Printf("tcds%d", id);
+      Write(key, TideCurrentDataSet[id]);
+    }
+  } while(false);
 
   SetPath(_T ( "/Settings/Others" ));

@leamas
Copy link
Collaborator

leamas commented Mar 3, 2023

@nohal
Copy link
Collaborator

nohal commented Mar 3, 2023

I have actually done pretty much the same at load time (Guarding https://github.com/OpenCPN/OpenCPN/blob/v5.6.x/src/navutil.cpp#L1419 from adding a path that is already present in the list) in my local code last night... Probably a bit better as it prevents the problem early. But did you find the root cause? I still don't see this problem here, at least not in current code or running the 5.6.2 flatpak...

@leamas
Copy link
Collaborator

leamas commented Mar 3, 2023

But did you find the root cause?

No

I still don't see this problem here, at least not in current code or running the 5.6.2 flatpak...

Nor in a build from current master.

So, the question remains: Should we add a "better safe than sorry" safety net like this, or just trust that this does not happen in 5.8. Personally I think it's worth the price, but no strong opinion

@nohal
Copy link
Collaborator

nohal commented Mar 3, 2023

As it seems from this issue we can assume there will be systems with configuration infested with duplicate entries, we probably should do something about it...
I have done a slightly bigger cut to the code in b88c0a8 let's wait for @bdbcat for final decision on what to include.

@leamas
Copy link
Collaborator

leamas commented Mar 3, 2023

In any case I agree that your patch is better. Given that this is a strange one, a reference to this issue in a comment IMHO makes send sense.

@bdbcat
Copy link
Member

bdbcat commented Mar 3, 2023

@leamas
" Bingo! I can reproduce this. Each and every time I exit OpenCPN a new such line is added also for me. "
Really? What platform/code base?

@nohal nohal added this to the OpenCPN 5.8.0 milestone Mar 4, 2023
@leamas
Copy link
Collaborator

leamas commented Mar 4, 2023

@we9v : I have uploaded a fix based on patch above to Debian. The new version is OpenCPN 5.6.2+dfsg-1~bpoll+3. This means that updating the package should solve the problem.

On Debian, updating is far from intuitive:

$ sudo apt update
$ sudo apt install --only-upgrade  opencpn

@leamas
Copy link
Collaborator

leamas commented Mar 4, 2023

@bdbcat : The reporter @we9v 's platform i. e., the official Bullseye backport on Debian. However, the failing package OpenCPN 5.6.2+dfsg-1~bpoll+2 is no longer available since I have updated it, see above. I'll somehow attach the relevant .deb packages, but have infrastructure issues with all three options github, zulip and fedorapeople right now.

@leamas
Copy link
Collaborator

leamas commented Mar 4, 2023

The 5.6.2+dfsg-1~bpoll+2 packages can be found here

@we9v
Copy link
Author

we9v commented Mar 4, 2023

@leamas So, for me to upgrade, it's still what you said up 3 messages, or do I need to do something with the message right above?

@leamas
Copy link
Collaborator

leamas commented Mar 4, 2023

So, for me to upgrade, it's still what you said up 3 messages

Yes

@we9v
Copy link
Author

we9v commented Mar 4, 2023

I can confirm that the patch works for me. Started and closed O a few times, and no new tcds line(s).
Apparently I open and close O more than the average person (this was a new install only about 3 months ago for my new boat and new Pi 4), or I was the only one who noticed the high CPU usage after close.

Thanks for the help guys!!

@bdbcat
Copy link
Member

bdbcat commented Mar 4, 2023

OK, I think I have a theory for the root cause.

  1. At startup, some poorly implemented code tries to verify that the first entry in the tcds sources is present and readable. This happens in old chart1.cpp, equivalent new code is in ocpn_app:1570. New code is slightly improved.
  2. It looks like if the first entry is not readable, another "/usr/share/opencpn/tcdata/HARMONICS_NO_US.IDX" is appended to the list.
  3. However, the first file in the list may be "/usr/share/opencpn/tcdata/harmonics-dwf-20210110-free.tcd".
  4. And sadly, O562 data package does not include nor install that file. That's the basic error.

So I think we can say that the problem is confined to O562. But the modern O571 data package needs to be checked to be sure all required files are present and installed.

It's a theory....

@bdbcat
Copy link
Member

bdbcat commented Mar 4, 2023

Anyway, I see no reason not to include Pavel's patch to scrub the list before writing to config file. It's good code, and can't hurt.

@leamas
Copy link
Collaborator

leamas commented Mar 5, 2023

This is a downstream packaging error. The root cause is that the file harmonics-dwf-20210110-free.tcd ended up as harmonics-dwf-20210110-free.tcd.gz in the -docs package rather than as-is in the -data package

EDIT: this also means that only the official packages are effected, not the PPA ones.

Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/opencpn/+bug/2009284
Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032370

I'll handle the packaging mess. However, I agree with @bdbcat : Pavel's patch makes sense as a safety net anyway, and I suggest this issue is kept open until it's merged.

@rgleason
Copy link
Collaborator

Was this merged? #2571

@leamas
Copy link
Collaborator

leamas commented Mar 15, 2023

#2571 has nothing to do with this issue. That it is merged is obvious inspecting the link.

What we are waiting for here is basically merging Pavel's patch b88c0a8. Dave has acknowledged that is it is good to include, so I guess the missing part is @nohal making a PR based on it.

@leamas
Copy link
Collaborator

leamas commented Mar 15, 2023

@we9v: downstream debian bugs filed and fixed, upstream patched to avoid this in the future.

OK to close?

@we9v
Copy link
Author

we9v commented Mar 15, 2023

@leamas I can confirm that after many reboots and restarts, there are no additional entries with this fix. Sounds like it can be closed. Thanks for your help!!

@nohal nohal closed this as completed Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants