
DPE-3706 Fix max_connections calculation and expose experimental max_connections field #429

Merged
merged 10 commits into from
Jun 7, 2024

Conversation

paulomach
Contributor

@paulomach paulomach commented Mar 28, 2024

Issue

The max_connections calculation does not account for memory already allocated to innodb_buffer_pool_size, resulting in memory overcommitment (and OOM-kill risk).

Solution

  • Calculate max_connections based on the memory remaining after allocating innodb_buffer_pool_size.
  • Expose an experimental experimental-max-connections configuration option allowing independent tuning (as agreed with PM in Madrid).

Fixes #407
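A minimal sketch of the corrected sizing, assuming a ~12 MiB per-connection cost, a 75% buffer-pool ratio, and a floor of 100 connections (all three constants are illustrative, not the charm's exact values):

```python
# Sketch of the fixed sizing logic; constants are illustrative
# assumptions, not the charm's actual values.
MIN_MAX_CONNECTIONS = 100             # assumed floor for max_connections
BYTES_PER_CONNECTION = 12 * 1024**2   # ~12 MiB per connection (rule of thumb)


def innodb_buffer_pool_size(available_memory: int) -> int:
    """Assume the charm allocates 75% of available memory to the buffer pool."""
    return int(available_memory * 0.75)


def max_connections(available_memory: int) -> int:
    # Constrain max_connections to the memory left AFTER the buffer
    # pool allocation, instead of the full amount (the old behaviour,
    # which could overcommit memory to ~175% of RAM).
    remaining = available_memory - innodb_buffer_pool_size(available_memory)
    return max(remaining // BYTES_PER_CONNECTION, MIN_MAX_CONNECTIONS)
```

Under these assumptions, a 64 GiB machine leaves 16 GiB for connections (about 1,365 of them) instead of sizing against the full 64 GiB.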


codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.56%. Comparing base (e7c1809) to head (ae74a89).
Report is 17 commits behind head on main.

Current head ae74a89 differs from pull request most recent head c443bd3

Please upload reports for the commit c443bd3 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #429      +/-   ##
==========================================
+ Coverage   66.25%   66.56%   +0.30%     
==========================================
  Files          17       17              
  Lines        3180     3170      -10     
  Branches      424      419       -5     
==========================================
+ Hits         2107     2110       +3     
+ Misses        935      928       -7     
+ Partials      138      132       -6     


# constrain max_connections based on the available memory
# after innodb_buffer_pool_size calculation
remaining_memory = available_memory - innodb_buffer_pool_size
max_connections = max(self.get_max_connections(remaining_memory), MIN_MAX_CONNECTIONS)
Contributor

I see the logic behind this, but are we sure we want to limit max_connections based on the memory remaining after allocating roughly 50% to 75% to innodb_buffer_pool_size? Would it bite us if max_connections is severely limited compared to client expectations?

Contributor Author

I agree, and we probably can overcommit here. But I'll defer that to @taurus-forever and @delgod.


Essentially this is (at least) a three-parameter problem to me. The proposed logic is better than the current one since it avoids overcommitting memory up to 175%. However, keeping the user-configurable parameters limited to one doesn't solve the fundamental challenge.

[diagram: mysql_major_parameters]

At this moment, we don't allow configuration other than the parameter "A" in the diagram. I assume the mysql.py logic is shared between the machine charm and the k8s charm, so parameter "A" is determined either by the total amount of physical memory, the memory_limit charm config, or the pod resource limit. But no other memory-allocation parameters can be supplied by a user.

When a user needs to bump max_connections, we will have to bump the memory requirement four times over unnecessarily, since there is no way to tell the charm to change the memory-allocation ratio (75% innodb buffer pool size + 25% max_connections).

For example, with MySQL use cases as the backend of an OpenStack deployment, we know it requires a few thousand max connections to cover the distributed (and microservice-ish) control-plane services and the multiple workers of each microservice, but 8GB of innodb buffer pool size is more than enough.

Let's say we want max_connections=4,000: 64 GB of instance memory is more than enough (4,000 connections * 12MB + 8GB innodb buffer pool size = 56GB). However, with the fixed ratio, we need a ~200GB machine (4,000 connections * 12MB * (75+25)% / 25% = 192GB).

I cannot think of a better way than moving away from a single parameter.
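The arithmetic in the comment above can be checked directly (using the comment's round figures: 12 MB per connection and 1 GB = 1,000 MB):

```python
# Reproduce the sizing figures from the comment; the per-connection
# cost and the 75/25 split are the comment's assumptions.
connections = 4_000
per_conn_mb = 12        # assumed per-connection memory cost
buffer_pool_gb = 8      # innodb buffer pool size the workload needs

# Memory the workload actually needs:
needed_gb = connections * per_conn_mb / 1000 + buffer_pool_gb   # 48 + 8 = 56 GB

# Memory required under the fixed 75/25 split, where the 25%
# connection share alone must hold all 4,000 connections:
fixed_ratio_gb = connections * per_conn_mb / 1000 / 0.25        # 192 GB
```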

Contributor

TL;DR: @nobuto-m is correct.

My proposal was a custom profile tuning, which could be extended to custom-mem-42% + custom-connections-100500 to fine-tune the corner-case scenarios.

Anyway, by default all pre-allocated memory such as innodb_buffer_pool_size (and group_replication_message_cache_size?) should be excluded from the available memory "reserved" for client connections. So this PR is still necessary to avoid OOM/crashes.

CC: @7annaba3l , @delgod
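The idea of excluding every pre-allocated buffer from the connection budget could be sketched like this (the component list and sizes are illustrative; this PR subtracts only the buffer pool, and the group replication cache entry is hypothetical):

```python
GiB = 1024**3

# Illustrative pre-allocated components. Only innodb_buffer_pool_size
# is subtracted by this PR; group_replication_message_cache_size is a
# hypothetical additional entry mentioned in the discussion.
preallocated = {
    "innodb_buffer_pool_size": 12 * GiB,
    "group_replication_message_cache_size": 1 * GiB,
}


def memory_for_connections(available: int) -> int:
    """Memory left for client connections after all fixed allocations."""
    return max(available - sum(preallocated.values()), 0)
```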

Member

@nobuto-m is correct.

But before a decision is made on independent config options, we need to fix this OOM.


@nobuto-m

Hi all, after discussing this here and internally, I think I should request that this patch be put on hold for the time being.

The fundamental challenge is not solved at this point, and the proposed patch is good as-is to prevent the OOM from happening. However, it will limit the available max_connections to 25% of the current value, and that may make it harder to run workloads on a machine with a fair amount of memory. Until we fix the original challenge, it might be better to leave things as they are, since 1. OOM doesn't happen immediately after the service starts, and 2. MySQL dynamically allocates memory both for actual data in the innodb buffer pool and for incoming connections. So we can keep this optimistic memory-allocation approach for the time being.

In the meantime, I appreciate @paulomach for taking the initiative on this matter so far.

Member

@delgod delgod left a comment


please don't merge without my explicit approval

@paulomach paulomach changed the title DPE-3706 fix max_connections calculation DPE-3706 Fix max_connections calculation and expose experimental max_connections field May 22, 2024
lib/charms/mysql/v0/mysql.py (outdated, resolved)
@taurus-forever
Contributor

please don't merge without my explicit approval

@delgod is this still a valid blocker?

Contributor

@taurus-forever taurus-forever left a comment


I am not sure about the 100 vs. 151 minimum-connections game... and I agree with Shayan about the variable name, but I don't have a better proposal right now. The rest LGTM.

@@ -30,3 +30,11 @@ options:
mysql-interface-database:
description: "The database name for the legacy 'mysql' relation"
type: "string"
# Experimental features
experimental-max-connections:


I'd like to challenge the config option name. I totally understand that it's experimental and could be removed at any point, so clarifying that in the config description is fine.

However, adding the prefix experimental- to the config option itself is a little awkward, because max-connections in MySQL is a well-known config and is not experimental in itself. The idea behind the naming is to indicate that "this option could disappear once the charm becomes capable of making the best decision", so I'd prefer a different name such as max-connections-override, to indicate "while the charm will try to make the best decision to configure max connections automatically, you can still override the value explicitly".
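The override semantics proposed here ("the charm decides unless the operator sets a value") could be sketched as follows; the key max-connections-override and the helper name are hypothetical, since the merged option is named experimental-max-connections:

```python
def effective_max_connections(config: dict, computed: int) -> int:
    # Prefer an explicit operator override; otherwise fall back to
    # the value the charm computed from available memory.
    # "max-connections-override" is the name proposed in this comment,
    # not the one that was merged.
    override = config.get("max-connections-override")
    return int(override) if override is not None else computed
```

For example, `effective_max_connections({}, 1365)` keeps the computed 1365, while `effective_max_connections({"max-connections-override": 4000}, 1365)` yields the operator's 4000.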

Member

@nobuto-m, Thank you for your feedback!
We have decided at the product level to use the experimental- prefix to keep it consistent across multiple solutions. This decision affects not only max-connections but also a few other things where an override doesn't work.
Please provide product feedback or discuss it directly with the decision-makers.

@paulomach paulomach merged commit 547aa59 into main Jun 7, 2024
44 of 45 checks passed
@paulomach paulomach deleted the fix/dpe-3706-wrong-max-conn-calc branch June 7, 2024 15:43
Successfully merging this pull request may close these issues.

innodb_buffer_pool_size and max_connections are trying to use ~175% of the physical memory?