
Increasing and high Server Load Average #4282

Closed
4 tasks done
nicfab opened this issue Dec 16, 2023 · 15 comments
Labels
bug Something isn't working

Comments

@nicfab

nicfab commented Dec 16, 2023

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

After upgrading my Lemmy instance to version 0.19.0, I noticed a rapid increase in the server's load average, which has remained high (around 1.30).

Steps to Reproduce

  1. Run docker compose down
  2. Upgrade to version 0.19.0 by modifying the docker-compose.yml file (see the snippet below)
  3. Run docker compose up -d
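
For reference, the version bump in docker-compose.yml looks roughly like this (service and image names follow the standard Lemmy Docker deployment and may differ in your setup):

    services:
      lemmy:
        # assumption: standard upstream image, pinned to the new release
        image: dessalines/lemmy:0.19.0
        # ... rest of the service definition unchanged ...
      lemmy-ui:
        image: dessalines/lemmy-ui:0.19.0
        # ... rest of the service definition unchanged ...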

Technical Details

The server OS is Ubuntu 22.04.3 LTS.
Some files:

  1. Logs - lemmy_log.txt
  2. lemmy.hjson (in txt format)- lemmy.hjson.txt
  3. docker-compose.yml (in txt format) - docker-compose.yml.txt

Version

0.19.0

Lemmy Instance URL

https://community.nicfab.it

@nicfab added the bug (Something isn't working) label on Dec 16, 2023
@arifwn

arifwn commented Dec 16, 2023

I also noticed increased database load after the upgrade. Is this related to the new persistent federation queue?

Edit: setting max_connections to 50 seems to limit the database memory usage on my small instance with 4.5 GB of RAM to a manageable level, though there are now a lot of FATAL: sorry, too many clients already errors in the postgres log. I wonder if there is a proper way to limit the queue from the Lemmy server side.
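
For anyone trying the same workaround, this is the kind of line involved, assuming you use the custom postgres config file from the standard Lemmy Docker setup (customPostgresql.conf, the same file referenced later in this thread):

    # cap concurrent client connections; anything beyond the limit is rejected with
    # "FATAL: sorry, too many clients already"
    max_connections = 50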

@Demigodrick

Just want to +1 this - I've seen the server load metric roughly double on average since the update was applied.

@axeleroy

I have been able to reduce the database load by setting database.pool_size in lemmy.hjson. I still have to tweak its value to get a good balance between Lemmy performance and relatively low database load.
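
For reference, the override looks roughly like this in lemmy.hjson (30 is the value reported later in this thread; tune it to your own hardware):

    {
      # ... rest of lemmy.hjson ...
      database: {
        # maximum number of connections in Lemmy's postgres connection pool
        pool_size: 30
      }
    }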

@Nutomic
Member

Nutomic commented Dec 18, 2023

Yes, this is most likely because of the new federation queue. Previously, outgoing activities were handled entirely in memory in the Lemmy process, but now they get written to the database and then read back. #4285 should help by batching these db queries.

@arifwn

arifwn commented Dec 18, 2023

Thanks! Setting database.pool_size in lemmy.hjson works better than limiting max_connections on postgres.

@phiresky
Collaborator

Please follow these steps to get info about database performance:

  1. enable pg_stat_statements and auto_explain by making sure these lines exist in the postgresql config (customPostgresql.conf):

    shared_preload_libraries=pg_stat_statements,auto_explain
    pg_stat_statements.track = all
    auto_explain.log_min_duration=5000ms
    
  2. open a psql repl by running docker compose exec -it -u postgres db psql -d lemmy and reset the stats by running create extension pg_stat_statements; select pg_stat_statements_reset();

  3. wait an hour

  4. post the outputs of
    docker compose exec -T -u postgres db psql -qtAX -d lemmy -c 'select json_agg(a) from (select * from pg_stat_statements order by total_exec_time desc limit 10) a;' > total_exec_time.json

    and

    docker compose exec -T -u postgres db psql -qtAX -d lemmy -c 'select json_agg(a) from (select * from pg_stat_statements order by mean_exec_time desc limit 10) a;' > mean_exec_time.json

@phiresky
Collaborator

Also, in general a higher server load floor on 0.19 is expected and not really an issue. The baseline server usage is higher (especially for small instances), but it scales better to higher federation loads.

@axeleroy

axeleroy commented Dec 18, 2023

My issue isn't so much the increased CPU load as the increased IO load, which is tanking the performance of the other services I host, along with the increased memory usage from the additional PostgreSQL activity (which filled my host's swap until I set a limit on the connection pool size).

I hope #4285 will resolve my issues.

@phiresky
Collaborator

It won't. Do you have synchronous_commit=off set?

@axeleroy

axeleroy commented Dec 18, 2023

I don't think so, my postgres command is

[
  "postgres",
  "-c",
  "session_preload_libraries=auto_explain",
  "-c",
  "auto_explain.log_min_duration=5ms",
  "-c",
  "auto_explain.log_analyze=true",
  "-c",
  "track_activity_query_size=1048576",
]

I'll try that though
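
For anyone following along, a minimal sketch of that change, appended to the same command array (this assumes postgres is started via the docker-compose command shown above; synchronous_commit=off trades durability of the most recent transactions for lower fsync/IO load):

    [
      "postgres",
      # ... existing "-c" flags from above ...
      "-c",
      "synchronous_commit=off"
    ]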

@arifwn

arifwn commented Dec 18, 2023

Postgres memory usage is down again after setting database.pool_size to 30 in lemmy.hjson. The default value (95?) seems to be too high for my small VPS with 4.5GB RAM.

@nicfab
Author

nicfab commented Dec 18, 2023

I am following your comments and will wait for a solution.
In the meantime, I had to stop my Lemmy Docker containers and take my instance offline.

@linux-cultist

linux-cultist commented Jan 3, 2024

> Postgres memory usage is down again after setting database.pool_size to 30 in lemmy.hjson. The default value (95?) seems to be too high for my small VPS with 4.5GB RAM.

@arifwn You can control postgres with the customPostgresql.conf file and put settings into it tuned to your hardware:

pgtune.leopard.in.ua

If you tell postgres to use 3 GB and 1 CPU (for example), it won't use all your resources. It may use memory while nothing else needs it, and then release it the second something else does. That's normal.

That being said, I also reduced the pool size to 30 but didn't really notice a difference. The postgres settings made the major difference.
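
As an illustration of the kind of output pgtune produces, a customPostgresql.conf tuned for roughly 3 GB of RAM and 1 CPU might look like this (values are indicative only, not taken from this thread; generate your own for your hardware):

    # illustrative pgtune-style values for ~3 GB RAM / 1 CPU / SSD (assumed, not from this thread)
    max_connections = 30
    shared_buffers = 768MB
    effective_cache_size = 2304MB
    maintenance_work_mem = 192MB
    work_mem = 12MB
    checkpoint_completion_target = 0.9
    wal_buffers = 16MB
    random_page_cost = 1.1
    max_worker_processes = 1
    max_parallel_workers = 1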

@phiresky
Collaborator

phiresky commented Jan 4, 2024

I'll close this since it seems to be the same issue as #4334, and that one has more detail (I don't see any info here that's not present there).

@phiresky closed this as not planned (won't fix, can't repro, duplicate, stale) on Jan 4, 2024
@arifwn

arifwn commented Jan 4, 2024

> If you tell postgres to use 3 GB and 1 CPU (for example), it won't use all your resources. It may use memory while nothing else needs it, and then release it the second something else does. That's normal.
>
> That being said, I also reduced the pool size to 30 but didn't really notice a difference. The postgres settings made the major difference.

@linux-cultist My postgres config: https://gist.github.com/arifwn/1c86fe79708dfe3bd43ecabaafc73320

The VPS has 4.5 GB of RAM and postgres is configured to use 2 GB (or did I configure it wrong?). Unless I set Lemmy's database.pool_size to 30 (instead of leaving the default, which was 95 back then in 0.19.0), after two days either lemmy or postgres got OOM-killed because the RAM was exhausted. I haven't tried again on 0.19.1.
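
A rough back-of-the-envelope estimate of why a 95-connection pool can exhaust a 4.5 GB VPS while 30 stays comfortable (all numbers below are assumptions for illustration, not taken from the linked gist):

    # shared_buffers          ~ 512MB  (fixed, shared across backends)
    # per-backend overhead    ~ 10MB   (process + catalog caches)
    # work_mem                ~ 16MB   (can be allocated more than once per query)
    #
    # 95 connections: 95 * (10MB + 16MB) ~ 2.4GB on top of shared_buffers -> ~3GB for postgres alone
    # 30 connections: 30 * (10MB + 16MB) ~ 0.8GB on top of shared_buffers -> ~1.3GB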

@LemmyNet locked as resolved and limited conversation to collaborators on Jan 4, 2024