
Constant CPU Usage and Disk Write #617

Closed
coanghel opened this issue May 27, 2023 · 43 comments
Labels
🐛 bug Something isn't working

Comments

@coanghel

Describe the bug
Speedtest Tracker constantly runs a process, resulting in a small but non-zero amount of disk writes and CPU usage.

To Reproduce
Steps to reproduce the behavior:

  1. Observe disk writes and CPU usage of the speedtest-tracker container

Expected behavior
No disk write and minimal CPU usage when not performing a speed test.

Environment (please complete the following information):

  • OS: Ubuntu Jammy
  • Architecture: arm64
  • Browser: Chrome
  • Version: latest

Screenshots
(four screenshots)

Logs
If applicable, check the logs for any errors that might have occurred.

Additional context
Observing htop inside the container points the finger at a PHP script. Speedtest Tracker is configured with an hourly cron schedule and an InfluxDB link.
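To put a number on the writes rather than eyeballing htop, here is a minimal Python sketch. It assumes a Linux container where `/proc/<pid>/io` is readable (typically as root or as the same user as the workers) and that the processes of interest have "php" in their command name, as php-fpm workers do. The helper names are hypothetical and not part of Speedtest Tracker.

```python
import os
import time


def parse_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip().isdigit():
            fields[key.strip()] = int(value)
    return fields


def php_write_bytes(name_filter: str = "php") -> int:
    """Sum write_bytes for every process whose command name contains name_filter."""
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if name_filter not in f.read():
                    continue
            with open(f"/proc/{pid}/io") as f:
                total += parse_io(f.read()).get("write_bytes", 0)
        except OSError:
            # The process exited mid-scan, or its io file is not readable by us.
            continue
    return total


def sample_write_rate(seconds: float = 10.0, name_filter: str = "php") -> float:
    """Average write rate in KB/s of matching processes over the sampling window."""
    before = php_write_bytes(name_filter)
    time.sleep(seconds)
    after = php_write_bytes(name_filter)
    # Workers that exit between samples can make the delta negative; clamp at 0.
    return max(after - before, 0) / 1024 / seconds
```

Running `print(f"{sample_write_rate(10):.1f} KB/s")` once with the dashboard open and once with it closed would show whether the writes track an open browser tab, which is what later comments in this thread point to.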

@luanupe

luanupe commented May 27, 2023

Hi, this high CPU usage issue is happening to me too.
According to Proxmox statistics, I did not notice any issues with high disk usage, only with the CPU.

Hardware: iKoolCore R1
OS: Proxmox
VM Hardware: RAM 1024MB, CPU 2 Core 2GHz, Disk 32GB
VM OS: Debian 11.7 (running Docker)

@Knowthenazz

Hi,

I'm running into both the high CPU and disk write issues.

Hardware: Synology
Platform: Docker
Browser: Firefox
Version: 0.11.16 and 0.11.1

I've also observed that it's particularly bad when viewing the Dashboard and Results pages.

Thanks!

@SkoricIT

SkoricIT commented Jun 4, 2023

I can literally hear the NAS hard drives working when I open the web UI.

@alexjustesen
Owner

There is significant overhead occurring right now while running the task scheduler and queue. I've got improvements for those scheduled for v0.16.0, and I'll add this issue to that milestone so it gets tested.

@alexjustesen alexjustesen added the 🐛 bug Something isn't working label Jun 7, 2023
@alexjustesen alexjustesen added this to the v0.16.0 milestone Jun 7, 2023
@alexjustesen
Owner

Also worth noting, the spikes on the hour are the speedtests running as part of your schedule. Ookla's Speedtest CLI uses a lot of resources when running.

@SkoricIT

SkoricIT commented Jun 8, 2023

This is while the web page is open in the browser:

(screenshot: 2023-06-08_17-20)

This is a few seconds after closing the browser tab:
(screenshot: 2023-06-08_17-20_1)

The CPU usage alone is concerning, but like I said, I can hear the hard drives working constantly while the web UI is open. I can NOT hear them work while the speed test is running without the web page being open.

@Knowthenazz

Hi, thanks for the information, but I'm sure this doesn't apply to me. My speedtests are scheduled to run while I'm sleeping.

Fortunately when I close the tab/browser that is showing the dashboard or results page, the cpu drops within a few seconds.

Thanks for investigating this!

@alexjustesen
Owner


It's got to be the polling then... I'll dig

@SkoricIT

SkoricIT commented Jun 9, 2023

@alexjustesen Thank you very much. If I can help in any way, I'm at your disposal.

@alexjustesen
Owner

I made some small changes in the release I just tagged; if you wouldn't mind letting me know whether this improves CPU usage, that'd be awesome.

https://github.com/alexjustesen/speedtest-tracker/releases/tag/v0.11.17

@SkoricIT

I have pulled the newest image and it shows 0.11.17:

(screenshot)

Unfortunately, the symptoms are the same:
(screenshot, filtered down to php-fpm)

Funnily enough, if the tab is in the background (because I'm typing in GitHub), the resource usage goes back to zero:

(screenshot)

@SkoricIT

If it helps for some reason (because it seems like the frontend is involved): I'm on Firefox with Manjaro Linux.

@SkoricIT

SkoricIT commented Jun 10, 2023

That seems to be because Filament stops updating the widgets:

(screenshot)

No idea why the notifications request takes 100% longer when the tab is in the foreground.

Also, only about 5 KB of data per second is transferred to the frontend, but the php-fpm disk writes alone are around 1000 KB/s.

Red line = switch to background.

@alexjustesen
Copy link
Owner

@SkoricIT do you mind also testing with polling disabled? I updated the docs with a new Environment Variables page to help you out.

@luanupe

luanupe commented Jun 10, 2023

Hey Alex, I haven't had any issues in the last few days with the old version, and to be honest I haven't checked the web GUI in the past 4 days.

Just saw the notifications on the thread and updated my Docker image. I'll keep the tab open to verify whether it's a front-end related issue, and I'll let you know if anything changes.

I can't pinpoint which system process is hanging because the VM completely freezes when the issue is happening; even SSH is unresponsive.

I also increased the VM cores from 2 to 3 (I want to be able to connect to the VM when the issue happens), and here's my docker-compose.yml:

version: '3.3'

services:
  speedtest-tracker:
    image: 'ghcr.io/alexjustesen/speedtest-tracker:v0.11.17'
    container_name: speedtest-tracker
    ports:
      - '80:80'
      - '443:443'
    environment:
      - PUID=1000
      - PGID=1000
      - DASHBOARD_POLLING=60s
      - RESULTS_POLLING=60s
    volumes:
      - './data/config:/config'
      - './data/web:/etc/ssl/web'
    restart: unless-stopped
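A back-of-the-envelope Python sketch of why a longer polling interval should help. It assumes the polling variables take either a seconds value like `60s` or `false` (the two forms used in this thread), and that each open tab sends one request per polled component per interval. The helper names and the component count are hypothetical, not taken from Speedtest Tracker's code.

```python
import re
from typing import Optional


def parse_polling(value: str) -> Optional[int]:
    """Turn a polling value such as '60s' into seconds; 'false' means polling is off (None)."""
    value = value.strip().lower()
    if value == "false":
        return None
    match = re.fullmatch(r"(\d+)s?", value)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"unsupported polling value: {value!r}")
    return int(match.group(1))


def requests_per_hour(value: str, components: int, tabs: int = 1) -> int:
    """Rough count of polling requests per hour reaching php-fpm from open browser tabs."""
    interval = parse_polling(value)
    if interval is None:
        return 0
    # Each polled component on each open tab fires once per interval.
    return (3600 // interval) * components * tabs
```

With a hypothetical four polled widgets on one open tab, `requests_per_hour("5s", 4)` gives 2880 while `requests_per_hour("60s", 4)` gives 240, a 12x reduction, and `requests_per_hour("false", 4)` gives 0.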

@SkoricIT


I set both options to false, which seems to have reduced a lot of requests but it's somehow still polling:

(screen recording: Peek.2023-06-10.15-44.mp4)

See if you can tell when I change the tab in the browser. 🙂

I would be happy without any polling and, optionally, a simple JS refresh trigger or something. 🙂

This is my config:
(screenshot)

(screenshot)

@SkoricIT

BTW, this is the CPU usage when I trigger a manual speed test, then quickly swap tabs:

(screenshot)

@alexjustesen
Owner

@SkoricIT and @luanupe have you both set the correct values for PUID and PGID? I'm working on slowing down the polling to reduce usage for the FPM pool, but I've also seen issues if you're not running the image under the right IDs.

@luanupe

luanupe commented Jun 10, 2023


Hmmm... now that you mention it, Alex, it makes sense. I set PUID and PGID to 1000 following the instructions at https://docs.speedtest-tracker.dev/getting-started/environment-variables, but the Docker container is running under a custom user/group on the host machine.

I'll fix it up and let you guys know.

@luanupe

luanupe commented Jun 10, 2023

Hmm, I just checked, Alex, so ignore my last comment: the IDs are correct.

docker-user@speedtest:~/app$ id -u docker-user && id -g docker-user
1000
1000
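The same check can be scripted so it is easy to rerun on the Docker host. A minimal Python sketch, assuming it runs on the host itself (Unix only, since it uses the standard `pwd` module) and that the values to compare are the PUID and PGID passed to the container; `ids_match` is a hypothetical helper.

```python
import pwd


def ids_match(username: str, puid: int, pgid: int) -> bool:
    """True if the host user's uid and primary gid equal the PUID/PGID given to the container."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user does not exist
    # pw_gid is the primary group, i.e. the same number `id -g <user>` prints.
    return entry.pw_uid == puid and entry.pw_gid == pgid
```

Given the `id` output above, `ids_match("docker-user", 1000, 1000)` would return True on that host.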

@SkoricIT

I forgot to include basic.env:

(screenshot)

(screenshot)

@alexjustesen
Owner


Thanks, I'm trying to rule out the low-hanging fruit. The next release will have changes so that polling defaults to 60s intervals, which should cut down on usage.

@alexjustesen
Owner

@coanghel #676 is going to have some process improvements; I'd be interested to know whether it reduces resource usage.

@SkoricIT

@alexjustesen Maybe we can have an option to just completely disable auto-update (have static pages) and only refresh on button press or reload?

@alexjustesen
Owner

alexjustesen commented Aug 19, 2023


You can disable polling through the env vars btw: https://docs.speedtest-tracker.dev/getting-started/environment-variables

@SkoricIT


I disabled this a while ago:

    environment:
      - DASHBOARD_POLLING=false
      - RESULTS_POLLING=false

@SkoricIT

@alexjustesen Even though I have the above in my environment, this still happens:

(screen recording: Peek.2023-08-25.16-22.mp4)

@alexjustesen
Owner

@SkoricIT one contributing factor could be that I left the log level set to debug, which could be filling up the log stream. This was changed in v0.11.18, so please let me know if this helps.
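For context on why a debug level is noisy: Filament, mentioned earlier in the thread, is a Laravel package, and Laravel's logging uses the eight RFC 5424 severity levels, recording every message at or above a channel's configured level. A minimal Python sketch of that threshold rule; the level names follow RFC 5424, while the helpers and the message counts in the usage note are hypothetical.

```python
# RFC 5424 severities, ordered from least to most severe.
LEVELS = ["debug", "info", "notice", "warning", "error", "critical", "alert", "emergency"]


def is_recorded(message_level: str, configured_level: str) -> bool:
    """A message is written if its severity is at or above the configured threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


def records_written(counts: dict[str, int], configured_level: str) -> int:
    """Total log lines written for a given mix of messages and a threshold."""
    return sum(n for level, n in counts.items() if is_recorded(level, configured_level))
```

For a hypothetical mix `{"debug": 900, "info": 90, "error": 1}`, a `debug` threshold writes 991 lines while an `error` threshold writes 1, which is why leaving debug on can dominate log I/O.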

@SkoricIT

SkoricIT commented Aug 31, 2023

nvm, the issue is open in #701.

@mooglestiltzkin

I also noticed constant high CPU usage compared to all my other containers.

All this app does is run on a schedule once a day during off hours to check the internet speed and keep a log of it. If you want, you can even be sent an alert.

But why is the CPU constantly high? Hopefully there is a fix :(

@imcdona

This comment has been minimized.

@SkoricIT

@imcdona Can you please post proof of your allegations? Thanks!

@luanupe

luanupe commented Nov 25, 2023

@imcdona Bro, do you have any proof of your claim? See, you can inspect any Docker image (even the pre-built ones) on the Docker Hub website... Docker Hub also scans images for malware and vulnerabilities...
Sorry but I don't think a bitcoin miner is installed on this application.

@alexjustesen
Owner

Quoting @imcdona: "The reason for the CPU usage is because this project contains malware to mine crypto. I've reached out to the dev personally and haven't gotten a response.

I haven't tried compiling the image myself so it could be that only the pre-built docker images are affected."

You did? If you did, I definitely didn't get anything, so make sure you follow https://github.com/alexjustesen/speedtest-tracker/security/policy to report issues.

I take these allegations pretty seriously so proof is required and generally speaking it's best to reach out to the dev and allow them to fix any security issues before posting about it.

@imcdona

This comment was marked as duplicate.

@alexjustesen
Owner

alexjustesen commented Dec 17, 2023

Quoting @imcdona's hidden comment:

This message was created automatically by mail delivery software.
A message that you sent could not be delivered to one or more of its recipients. This is a permanent error.

[sec@alexjustesen.com](mailto:sec@alexjustesen.com), ERROR CODE :550 - 5.1.1 Address does not exist. UrYgPgKuAu1t

Please send this again; I've had no other reports of malware in the image or in the image Speedtest Tracker is based on. Additionally, please let me know what your build process was in that email so I can attempt to reproduce it.

Edit: comment was hidden so that it can be researched per the sec guidelines.

@imcdona

imcdona commented Dec 18, 2023

Why did you mark the comment where I explained the issue along with evidence that it's infected with perctl as a "duplicate" (so it's hidden and nobody can see the evidence I provided) and then request I send you an email with the exact same details I provided in the comment you hid from everyone?

Why do you feel the need to hide the fact it's infected with perfct? What does that have to do with "sec guidelines"?

@alexjustesen
Owner


I have a procedure for security related issues that needs to be followed to ensure proper research can be conducted and for the safety of other users.

  1. The reporter or researcher needs to notify me via sec@alexjustesen.com
  2. That email should contain your findings from above and a detailed process for reproducing the infected image or your process for downloading the infected image.
  3. It also needs to include valid contact information for the reporter, because, well, I just don't trust anyone on the Internet, and there will likely be follow-up questions.

As far as communication timeline goes:

  • I'll acknowledge a reported issue within 48h to the reporter via email.
  • If unverified, I'll respond to the reporter with findings and close the internally tracked issue.
  • If verified, I'll issue a patch to fix the problem and write a postmortem detailing the issue within 60 days which will be posted in the discussions under Announcements.

I'm not hiding anything about the issue or the code, it's open source. If you have concerns about code being hidden you can see the entire history of the application under releases.

This will be my last request: please send an email with the information detailed above so that research can be conducted.

FAQ

  • Q: Why 60 days? That feels like a lot. A: Simple, I'm a one-man team and research takes time. This application is built on other open source tools, and collaboration with those owners might be needed.
  • Q: Besides the security file, is your policy posted anywhere else? A: It was on my website, but it looks like it's no longer visible since I simplified the site, so I'll get that fixed.

I'm happy to have follow-up discussions on general security practices (post them in Q&A under Discussions), but let's keep the discussion on this issue to the constant reads/writes, which were a result of polling on the dashboard when kept open.

@imcdona

imcdona commented Dec 18, 2023

If perctl isn't in the source code then it's being added during the image build process. Out of an abundance of caution, why not remove the image and suggest users build the image from source?

The safety of users is paramount, right?

@alexjustesen
Owner

@coanghel high resource usage should be a non-issue while using the new LSIO image. If someone on this thread can verify what I'm seeing I'd like to close this out.

@alexjustesen alexjustesen removed this from the v0.x.0 (task scheduler) milestone Feb 21, 2024
@SkoricIT

@alexjustesen I will check it out with a fresh instance.

@SkoricIT

SkoricIT commented Feb 22, 2024

@alexjustesen Seems fixed for me: even with polling enabled and everything, it stays at about 0.1 - 0.34 percent CPU. On loading the dashboard it jumps to 65 percent for a second. I think it looks very good now. Thanks!

@alexjustesen
Owner

Thanks for confirming, I think we can close this out!


7 participants