
Adding random 10 second for every hour in the maxlifetime. This woul… #480

Merged: 4 commits, Nov 2, 2015

Conversation

mmuruganandam (Contributor)

This is to help close the issue on maxLifetime. For every hour of lifetime, it will add a random 10 second interval. Please review and pull if you agree.

Also, if you agree, please make the same change on the JDK 1.7 version as well. Thank you!

… help manage the closing of connection in a span of seconds instead of all at once. issue brettwooldridge#256
long randomIntervalInMillis = inHours * 10_000;

// Support the up to 10 seconds spread out when the lifetime is less than 1 hour
if (inHours == 0) {
Owner

Drop the conditional and use:

long randomIntervalInMillis = Math.max(10_000, inHours * 10_000);

Make the locals final.
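For illustration, a minimal sketch of how that simplification could sit in the surrounding code. The inHours and maxLifetime names come from the diff above; wrapping it in a helper class and drawing the random value with ThreadLocalRandom are assumptions made only to keep the example self-contained and runnable:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class LifetimeVarianceSketch {
   // Hypothetical helper: how much earlier than maxLifetime a connection could retire
   static long randomVariance(final long maxLifetime) {
      final long inHours = TimeUnit.MILLISECONDS.toHours(maxLifetime);
      // 10 seconds of possible spread per hour of lifetime, never less than 10 seconds total
      final long randomIntervalInMillis = Math.max(10_000, inHours * 10_000);
      return ThreadLocalRandom.current().nextLong(randomIntervalInMillis);
   }

   public static void main(String[] args) {
      final long maxLifetime = TimeUnit.MINUTES.toMillis(30);
      System.out.println("variance (ms): " + randomVariance(maxLifetime));
   }
}

With a 30 minute maxLifetime, inHours is 0 and the spread collapses to the 10 second floor, which is what the Math.max is there for.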

Contributor

As maxLifetime cannot be less than 30 sec, a better change would be just:
final long variance = Math.min( ThreadLocalRandom.current().nextLong( maxLifetime / 1000 ), maxPoolSize * 1000);

Owner

That's still not quite right. If maxLifetime is 30 seconds, then maxLifetime / 1000 is 30ms. That's not enough variance.

Conversely, if the maxLifetime is 30 minutes, and the maxPoolSize is 10, the variance is only 10 seconds.

Here is my proposal:

long randomIntervalInMillis = Math.max(10_000, maxLifetime / 20);

Division by 20 gives 5% of the maxLifetime. This says that we want connections to reach their max lifetime within a 5% variance of the specified value. Some common cases:

  3 minutes =  9 seconds variance
  5 minutes = 15 seconds variance
 30 minutes = 90 seconds variance
      1 day = 71 minutes variance
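As a quick sanity check of those numbers, the arithmetic of maxLifetime / 20 can be verified standalone (this is just the proposed formula applied to the listed lifetimes, not HikariCP code):

import java.time.Duration;

public class FivePercentVariance {
   public static void main(String[] args) {
      final Duration[] lifetimes = {
         Duration.ofMinutes(3), Duration.ofMinutes(5), Duration.ofMinutes(30)
      };
      for (Duration lifetime : lifetimes) {
         final long varianceMillis = lifetime.toMillis() / 20;   // 5% of maxLifetime
         System.out.printf("%-8s -> up to %d seconds of variance%n", lifetime, varianceMillis / 1000);
      }
   }
}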

Contributor

...reducing 90 sec from a 30 min connection life (which is the default) is not good :)
It should be calculated and 'restricted' by the maximum number of connections in the pool... e.g.:
the default max connections is 10, and if a user sets their age to 1 day, then
... 'separating' the ages of 10 connections by 71 mins apart is too much.

Contributor

Besides this, applying the variance should not be conditional (less than an hour, or 10_000).
'Separating' the ages (even by a few hundred millis) is good.

Contributor

for:
final long variance = Math.min( ThreadLocalRandom.current().nextLong( maxLifetime / 100 ), maxPoolSize * 7200);
it would be:

age(min)    variance(sec)
1           0.6
5           3
10          6
30<---def   18
60          36
120         72<---max

IMO a max of 72 seconds of separation in age would be enough for the DB / CPU to breathe easy.
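A small standalone sketch that reproduces the table above from this formula, assuming the HikariCP default maxPoolSize of 10 and substituting the upper bound of the random draw to show the maximum variance:

public class ProportionalVarianceTable {
   public static void main(String[] args) {
      final int maxPoolSize = 10;                     // assumed default pool size
      final long[] agesInMinutes = {1, 5, 10, 30, 60, 120};
      System.out.println("age(min)    max variance(sec)");
      for (long age : agesInMinutes) {
         final long maxLifetime = age * 60_000L;
         // upper bound of the proposed min(nextLong(maxLifetime / 100), maxPoolSize * 7200)
         final long maxVariance = Math.min(maxLifetime / 100, maxPoolSize * 7200L);
         System.out.printf("%-11d %.1f%n", age, maxVariance / 1000.0);
      }
   }
}

The 72 second ceiling in the table comes from the maxPoolSize * 7200 term.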

Contributor Author

@nitincchauhan and @brettwooldridge
Thanks for the comments. 72 seconds in your example is the maximum it can go to, and it is a random variance.

I kind of like @brettwooldridge's approach of spreading this out as much as possible to avoid any system impact. 71 minutes for a day is definitely not bad, and we keep the spread even during the day to avoid issues.

@brettwooldridge Quick question: let us say two connections reach their max lifetime. What is the behavior at that time? Is getConnection() going to wait for those two connections to be re-established into the pool? If that is the case, we will still have a problem if the listeners are taking a long time to give out connections.

Contributor

A little modification to my version:
final long variance = ThreadLocalRandom.current().nextLong( Math.min( maxLifetime / 100, maxPoolSize * 7200));

With this, even smaller ages ALSO get a reasonable proportional spread.
@mmuruganandam IMHO a variance of more than 2 minutes would be bad. The DB / CPU would be happy serving even 100 connections with 2 mins of age difference.

@brettwooldridge (Owner)

Here are my thoughts. The default maxLifetime in HikariCP is 30 minutes. The default connection timeout of MySQL is 8 hours. A maxLifetime of 1 day seems excessive to me. The goal is to spread out new connection acquisitions to avoid connection storms against the database. Having said that, there are two common cases I would like to cover.

👉 First scenario, the pool is reasonably active with queries running every second or every few seconds. In this case, a spread of a few seconds between expirations is ideal. But, for example, a 10 second variance and a pool of 10 connections would result in a best case of 1 connection close/creation per second.

If the maxLifetime is set very low this may be unavoidable, but at least for the minimum of 30 seconds and higher, the goal should be a spread of at least a few seconds. With a pool size of 50, a spread of 3-5 seconds with a maxLifetime of 30 minutes would be ideal.

👉 Second scenario, the pool is very quiet with burst activity every minute or so. This is the harder case. For example, if 5 connections expire within a 30 second interval, and there is no other pool activity, when the housekeeper thread runs it will need to queue up 5 connection creation tasks.

Ideally, when maxLifetime is at 30 minutes, the creation of two or three connections max by the housekeeper would be desirable. This implies a variance that spans multiple housekeeping runs (30 seconds each).

@mmuruganandam (Contributor Author)

@brettwooldridge Based on your answer, we can determine what the best approach would be.

Can you please answer the question below?
Quick question: let us say two connections reach their max lifetime. What is the behavior at that time? Is getConnection() going to wait for those two connections to be re-established into the pool? If that is the case, we will still have a problem if the listeners are taking a long time to give out connections.

@brettwooldridge (Owner)

@mmuruganandam getConnection() will not wait for those two connections to be established. First, if there is an available connection in the pool, it won't wait at all. But if there is no connection available, the caller will naturally have to wait for the first of the two connections to be established, and will then return immediately.

UPDATE: also, the housekeeping thread that runs every 30 seconds will re-establish those connections if a getConnection() call doesn't come in first. So when a caller to getConnection() does come in, there will be a connection ready and waiting.

@brettwooldridge (Owner)

Lastly, a note on variance in general. Imagine a maxLifetime of 1 hour...

In the first hour that the pool is running, the expirations look like this:

[______________________________________________________:::::]
0         10        20        30        40        50        60

But because of variation, the second hour starts to look like this:

[___________________________________________________:_:__:::]
0         10        20        30        40        50        60

And after 24 hours, evolves to this:

[_________:___________:__:________________:____________:____]
0         10        20        30        40        50        60

So the variation doesn't need to be huge, but it needs to be big enough to create an even distribution over the course of a few hours or tens of hours.
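The drift toward an even distribution is easy to reproduce with a toy simulation. This is not HikariCP code; it assumes each retired connection is replaced immediately and each replacement lives maxLifetime minus a fresh random variance (here up to 5% of an hour):

import java.util.concurrent.ThreadLocalRandom;

public class ExpirationSpreadSimulation {
   public static void main(String[] args) {
      final long maxLifetime = 3_600_000L;            // 1 hour
      final long maxVariance = maxLifetime / 20;      // up to 5% shaved off each generation
      final int poolSize = 10;
      final long[] nextExpiry = new long[poolSize];   // absolute expiry time per connection, in ms

      // all connections created at t = 0
      for (int i = 0; i < poolSize; i++) {
         nextExpiry[i] = nextLifetime(maxLifetime, maxVariance);
      }

      // advance each connection through 24 "hours" of retire-and-replace cycles
      final long horizon = 24 * maxLifetime;
      for (int i = 0; i < poolSize; i++) {
         while (nextExpiry[i] < horizon) {
            nextExpiry[i] += nextLifetime(maxLifetime, maxVariance);
         }
      }

      // where within the hour do the expirations now fall?
      System.out.print("expiry minute within the hour:");
      for (long expiry : nextExpiry) {
         System.out.print(" " + (expiry % maxLifetime) / 60_000);
      }
      System.out.println();
   }

   static long nextLifetime(long maxLifetime, long maxVariance) {
      return maxLifetime - ThreadLocalRandom.current().nextLong(maxVariance);
   }
}

Each run prints the pool's expiry offsets; raising the number of simulated hours shows the offsets spreading over an ever wider portion of the hour, which is the compounding effect described above.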

@brettwooldridge (Owner)

@mmuruganandam Let's go with this:

final long randomIntervalInMillis = Math.max(10_000, maxLifetime / 40);

This ensures that connections live to at least 97.5% of their lifetimes. At a 30 minute maxLifetime it provides a 45 second variance, and at 1 day it provides a 36 minute variance.

Keep in mind, as noted above, after several "generations" of connections they become spread out evenly (statistically) over a 24 hour period, which is the goal.
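A rough sketch of how that decision could be applied when a new connection is created. The helper name and the subtraction from maxLifetime are assumptions for illustration, not the exact code that was merged:

import java.util.concurrent.ThreadLocalRandom;

public class DecidedVariance {
   // Hypothetical helper: effective lifetime for a newly created connection
   static long effectiveLifetime(final long maxLifetime) {
      // spread of up to 2.5% of maxLifetime, with a 10 second floor
      final long randomIntervalInMillis = Math.max(10_000, maxLifetime / 40);
      final long variance = ThreadLocalRandom.current().nextLong(randomIntervalInMillis);
      return maxLifetime - variance;
   }

   public static void main(String[] args) {
      System.out.println("30 min -> " + effectiveLifetime(1_800_000L) + " ms");
      System.out.println("1 day  -> " + effectiveLifetime(86_400_000L) + " ms");
   }
}

At a 30 minute maxLifetime the interval works out to 45 seconds, and at 1 day to 36 minutes, matching the figures above.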

@nitincchauhan (Contributor)

@brettwooldridge @mmuruganandam

IMHO, a max reduction of 45 sec from an age of 30 min is not fair.

More reasonable would be to increase the variance in small steps
(1% of age, within a range of min 0.5 sec and a max clamped between 5 and 100 sec) as follows:

final long variance = ThreadLocalRandom.current()
.nextLong( 500, Math.max( 5000, Math.min( 100000, maxLifetime / 100 ) ) );

will give:

age(min)     max variance(sec)
0.5<--min      5
  1            5
 10            6
 30<--def     18
 60           36
120           72
130           78
140           84
150           90
160           96
170          100
480<--mysql  100

EDIT: using the nextLong( min, max ) variant to return a minimum of 500 ms
EDIT2: using 1% of age and updated the stats

@mmuruganandam (Contributor Author)

Hi,

final long randomIntervalInMillis = Math.max(10_000, maxLifetime / 40);

Looks better to me because of multiple pools connecting to the database. For example, we have over 100 instances serving multi-million transactions on a daily basis connecting to the same database. The variance has to take into consideration other/same applications connecting to the same database as well. Limiting it to under 2 minutes might not help given the scale of what happens behind the scenes.

Lastly, we could provide a configuration where the variance is limited to a percentage, so that users can control what it should be. IMHO, that might be a little overkill for this feature. But I believe the above-mentioned solution will spread things out better, considering all the other connectivity that comes to the same database.

@nitincchauhan (Contributor)

@mmuruganandam my suggestion is for the masses;
for an edge case like yours that requires a huge variance, Brett can consider adding an option.
EDIT: IMO 2 mins is good enough to serve a few hundred connections on today's hardware and popular databases, even in your case. For scaling beyond that you may have to look at tuning other parts: the network, DB listeners, etc.

@mmuruganandam (Contributor Author)

@nitincchauhan I am not sure what you mean by mass. Ours is one of the top-line enterprise versions of the database, and I have seen this in a few of my previous projects too.

@brettwooldridge Please let me know how you would like to proceed with this change.

@nitincchauhan (Contributor)

@mmuruganandam by mass I mean the other users of HikariCP.

@brettwooldridge (Owner)

My decision is: the maximum of 10 seconds or maxLifetime / 40.

@mmuruganandam (Contributor Author)

@nitincchauhan :).

@brettwooldridge Thanks for the final call on this. I have made the change and committed it already. Please merge at your convenience. Thank you!

@brettwooldridge and @nitincchauhan Thanks for taking the time to have a good discussion and make the final call on this.

@nitincchauhan (Contributor)

@mmuruganandam you missed randomizing it as before.

@mmuruganandam (Contributor Author)

@nitincchauhan thanks for catching that. The version with randomization is now checked in.

@nitincchauhan (Contributor)

@brettwooldridge the variance should grow proportionally, in small steps.
If I remove the restriction of 100 seconds, then calculating the variance as
(min 0.5 sec, max the larger of 5 sec and 1% of age) as follows would be better for all:

final long variance = ThreadLocalRandom.current()
.nextLong( 500, Math.max( 5000, maxLifetime / 100 ) );

@nitincchauhan (Contributor)

will give:

age(min)     max variance(sec)
0.5<--min      5
  1            5
 10            6
 30<--def     18
 60           36
120           72
130           78
140           84
150           90
160           96
170          102
480<--mysql  288

@nitincchauhan (Contributor)

@mmuruganandam @brettwooldridge
There is a bug in the change:
earlier it was a random 10000 when maxLifetime > 60000;
now it is a random 10000 when maxLifetime > 10000.
If the connection age is near 10000 (probably only in some tests), it is quite likely that the random reduction will cut the connection age down to a few millis.

@mmuruganandam (Contributor Author)

@nitincchauhan The minimum connection lifetime of 30 seconds is validated in HikariConfig. This check is there for the unit tests, which can go below that minimum, so that no variance is added when the total time is below 10 seconds. It also serves as a last-level backup on the lifetime.

@brettwooldridge (Owner)

@mmuruganandam I accept that reasoning. In practice, except for unit tests, maxLifetime can never be below 60000ms.

brettwooldridge added a commit that referenced this pull request Nov 2, 2015
Adding random 10 second for every hour in the maxlifetime.  This woul…
brettwooldridge merged commit be790b9 into brettwooldridge:master on Nov 2, 2015
mmuruganandam deleted the master branch on November 2, 2015 14:38
@brettwooldridge (Owner)

Post-merge comment. The key for me is that at the default 30 minute maxLifetime, the variance should span some connections past one housekeeping run interval. The 2.5% variance provides 45 seconds (maximum) variance, which would statistically spread 33% of the connections past one housekeeping interval.

However, this reasoning is only applicable to the first or second "generation" of connections. As noted above, over time, with any variance at all, connection lifetimes will eventually spread over a much wider range -- while still adhering to individual lifetimes of 97.5-100% of configured maxLifetime.
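For reference, the 33% figure is just the uniform-draw arithmetic: with a maximum variance of 45 seconds and a 30 second housekeeping interval, the fraction of connections whose variance exceeds one interval is (45 - 30) / 45, roughly 0.33, assuming the variance is drawn uniformly over that range, as discussed above.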

@nitincchauhan (Contributor)

@brettwooldridge I know the intention.
The housekeeper run is constant (+/- a few millis), while the reduction in age is now 'hugely' random.
It is flawed to think that 2.5% would do more than 1% to shift connection creation into the housekeeper run vs. getConnection().
Probably tuning the housekeeper run interval would have more impact than this.
My opinion is to reduce in small steps, by 1%, and for all cases (even for < 10000).

Edit: for the default 30 min age, going from 10 sec to 45 sec is surely going to be a more disturbing change for some than 18 sec (1%).
Edit2: a reduction in age of more than 4 times is more sensitive: the primary reason for using a pool is to cache connections (whose creation is expensive), not concurrency.

@brettwooldridge (Owner)

@mmuruganandam HikariCP 2.4.2 has been released, containing this contribution. Thank you.

@mmuruganandam (Contributor Author)

Great. I will get our libraries updated. Thank you!
