[#4200] prevent reconnect floods to Postgres #4232

Merged
merged 4 commits into concourse:master on Aug 14, 2019

Conversation

@ebilling ebilling commented Aug 7, 2019

After many profiles, it became clear that at high transaction levels the system was spending substantial amounts of its time reaping and re-establishing TLS connections in order to grab locks or run queries against Postgres.

This change resulted in a 30% reduction in CPU for our web processes. The dominant calls now are around marshaling and unmarshaling JSON.

Signed-off-by: Eric Billingsley eric@calculi.com

…loods on TLS

Signed-off-by: Eric Billingsley <eric@calculi.com>

@vito vito left a comment


thanks! left some feedback/questions inline.

atc/atccmd/command.go (outdated)
@@ -1172,7 +1172,8 @@ func (cmd *RunCommand) constructDBConn(
}

// Prepare
-	dbConn.SetMaxOpenConns(maxConn)
+	dbConn.SetMaxOpenConns(2 * maxIdleConns)
vito (Member)

Not sure if we should bump this outright, since folks may have tuned their Postgres server-side connection limit accordingly. Doubling the effective pool size on all the web nodes could be surprising, especially with multiple web nodes. Could we just do SetMaxIdleConns(maxConn / 2) instead?

Also, FYI we might end up re-doing our connection pooling; recently we've discussed giving critical things like GC their own connection pool (#4025) and that may turn into having per-backend-component pools with sizes based on the component's priority/usage, instead of having one big pool for everything. For if/when we get to that, what's the main learning here so we don't repeat the original mistake? Is it just to have a bit more breathing room with idle connections?

Thanks!

ebilling (Author)

This doesn't double the pool size; it doubles the max size. If the system needs fewer connections, it never gets to the limit. The limit should be on the idle connections, not the open ones.

For #4025, what's the benefit of creating separate pools? Segmentation of connections creates more opportunities for bottlenecking rather than allowing for free flow.

My thought was that the "right" thing to do is to only set the max idle and let the max open go to a very high limit. That allows the system to scale up and flex, but keeps a minimum number of connections open so you don't get stuck in open, use, reap, repeat flows. That's what killed performance on TLS. If you aren't using encryption to the DB, then this isn't a big deal; the setup time is low. But with TLS, you spend many milliseconds going back and forth, checking certificate chains, etc. It's more expensive than the request itself, so you really need to err on the side of re-using that resource.
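For context, a minimal illustration of how the database/sql pool knobs interact with TLS setup cost. This is not Concourse's actual code; the DSN and numbers are placeholders.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	// Whenever the pool has no idle connection, database/sql dials a new one.
	// Over TLS (sslmode=verify-full) that means a full handshake and
	// certificate-chain check per lock/query burst, which is what showed up
	// in the profiles.
	db, err := sql.Open("postgres", "postgres://atc@db/atc?sslmode=verify-full")
	if err != nil {
		log.Fatal(err)
	}

	db.SetMaxOpenConns(32) // hard ceiling on simultaneous connections
	db.SetMaxIdleConns(16) // keep connections warm for reuse; the stdlib default
	// of 2 means most connections are torn down after use and re-handshaked
	// on the next burst.
}
```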

vito (Member)

The idea is to prevent work from taking up the entire shared connection pool and starving other mission-critical components like garbage collection because the pool has hit MaxOpenConns. This can cause GC to not run, which can cause containers to pile up on the workers and then everything starts failing.

Our thinking - and this is somewhat backed by what we've observed in our test environments - is that each internal component has some ballpark number of connections it needs in order to do its job.

GC, for example, only needs around 2 when run in serial, and around 10 when run in parallel (assuming #4152).

The pipeline scheduler, on the other hand, needs a lot more than that depending on how many jobs there are to schedule. It's fine for scheduling work to "steal" connections from other scheduling work - things will just be scheduled more slowly - but it's not OK for scheduling work to "steal" from GC.

Also keep in mind most Postgres servers have a default limit of 100 open connections. It's not enough to just set the max idle pool and let everyone grab as many connections as they want. Someone will be starved eventually, whether it's client-side or server-side. So the connection pool has to be rationed somewhat carefully.

It's a tricky problem, though; it's fundamentally difficult to coordinate these limits on both sides, and there are probably other deeper ways to improve this situation. We just want sane defaults for now so things can't go off the rails. So, our idea was to just figure out what these reasonable connection counts (or maybe just 'ratios') are for each component and tune the defaults accordingly.

Does that make sense? 🙂
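For illustration only, here is a sketch of the per-component-pool idea from #4025. None of these names or sizes are real Concourse code; they only show how GC could get a small pool it can never be starved out of while scheduling queues behind itself.

```go
package dbpools // hypothetical package; not part of Concourse

import (
	"database/sql"

	_ "github.com/lib/pq"
)

// openComponentPools sketches the #4025 idea: give GC a small, guaranteed
// pool and give scheduling a bigger one so it only ever steals from other
// scheduling work. Sizes here are placeholders.
func openComponentPools(dsn string) (gc, scheduler *sql.DB, err error) {
	if gc, err = sql.Open("postgres", dsn); err != nil {
		return nil, nil, err
	}
	gc.SetMaxOpenConns(5) // ~2 serial today, ~10 with parallel GC (#4152)

	if scheduler, err = sql.Open("postgres", dsn); err != nil {
		return nil, nil, err
	}
	scheduler.SetMaxOpenConns(32) // scheduling can steal from itself, not from GC

	return gc, scheduler, nil
}
```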

ebilling (Author)

But since you have multiple webs, the connection limits you impose on any one of them don't prevent you from hitting the 100 total. In any case, you'll need to increase the number of connections on the DB to support a large number of calls. I think we are bumping into an inherent shortcoming of the connection pooling system. The ability to get a connection to the next request and bypass the idle queue is the real underlying problem.

vito (Member)

You're definitely right about that, but it's still better to have some limit rather than no limit, isn't it? Without any limits you have no hope of preventing GC from being starved, since you'll hit the server-side limit at some point just because you received a lot of API requests or had a spike in pipeline scheduling. We have to do what we can to prevent that from turning into a cascading failure.

If I have N ATCs and each has a theoretical max pool usage of 64 connections (as it is today), I need to configure Postgres with a connection limit of at least N * 64. We just set ours to 200 and call it a day. That affords 3 web nodes, which is a pretty decent number for large-scale environments.
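(Concretely: 3 web nodes × 64 = 192 connections, which fits under 200, while a fourth node would need 256.)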

Also bear in mind that these pool limits were added because people kept hitting the server-side limit. We did a very coarse-grained bucketing of the limit by giving the API 32 and the "backend" 32. We just want to be a bit smarter about how that's divided up by default, since 'backend' isn't fine-grained enough to prevent GC from getting wedged by other backend work.

I wonder how other systems enforce this kind of thing. In my ideal world it would all be enforced by the server so we don't have multiple 'sources of truth' and it doesn't matter how many web nodes you have. In a cursory search it looks like you can have per-user connection limits, so maybe you could have a separate user for critical things, but that's a lot to configure. 🤔

ebilling (Author)

That's exactly why I took the position of maxOpen = 2 x maxIdle. The thing we really want to configure is maxIdle, as GC is a bursty problem. It comes in with a lot of requests all at once, and if there aren't enough connections sitting in the idle queue, then it just starts making new ones.

Here is my proposal:

  1. Create one connection pool with a MaxIdleConnections queue size that is configurable, defaulted to 20. We then set MaxOpenConnections to be a separately configurable parameter that defaults to 2x MaxIdleConnections (alternatively, we could set MaxIdleConnections to be 1/2 MaxOpenConnections), but the behavior we really want is to allow bursting and backoff, IMHO.

  2. Locking should rely on the mutex, not the connection, so it should use the same connection pool. After all, as soon as you have 2 webs, the connection is not your savior.

This gives us the flexibility of configuration for specific environments and puts reasonable defaults in place (a rough sketch follows below). Sound like a plan?
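A rough sketch of point 1, with hypothetical names and defaults; these are not the actual Concourse flags, only an illustration of "configurable idle floor, open ceiling defaulting to twice the floor":

```go
package atccmd // illustrative placement; not the real flag definitions

import "database/sql"

// poolConfig is a placeholder for the proposed knobs above.
type poolConfig struct {
	MaxIdleConnections int // proposed default: 20
	MaxOpenConnections int // proposed default: 2 * MaxIdleConnections
}

func (c poolConfig) apply(db *sql.DB) {
	open := c.MaxOpenConnections
	if open == 0 {
		open = 2 * c.MaxIdleConnections // allow bursting above the idle floor
	}
	db.SetMaxIdleConns(c.MaxIdleConnections) // warm connections absorb bursts
	db.SetMaxOpenConns(open)                 // hard ceiling for backoff
}
```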

vito (Member)

Minor clarification: GC isn't bursty yet; right now it uses at most one or two connections. It'll be more bursty with #4152 (~10 connections), but not nearly as bursty as pipeline scheduling (N connections; at least 1 per pipeline). This difference in expected connection usage is why we want to separate the scheduling pool from the GC pool and give them reasonable sizes so they're not competing for resources.

> 1. Create one connection pool with a MaxIdleConnections queue size that is configurable, defaulted to 20. We then set MaxOpenConnections to be a separately configurable parameter that defaults to 2x MaxIdleConnections (alternatively, we could set MaxIdleConnections to be 1/2 MaxOpenConnections), but the behavior we really want is to allow bursting and backoff, IMHO.

Makes sense. We're just trying to fix the churning connections, so setting MaxIdle seems like the simplest thing to do.

FWIW, I don't think we should change the effective default max open connections. I don't want to surprise any Concourse operators and cause server-side limits to suddenly be hit (due to an increase) or cause components to be starved of connections (due to a decrease) where they weren't before. We're only trying to fix the connection churn, so I don't think changing anything else is worth the risk, at least not right now. We can re-assess these magic numbers when we do per-component connection pools (#4025). I'll make a note in that issue to be sure to configure MaxIdle appropriately so this doesn't regress.

> 2. Locking should rely on the mutex, not the connection, so it should use the same connection pool. After all, as soon as you have 2 webs, the connection is not your savior.

As clarified in #4232 (comment), locking does use the connection, and I think that'll need to stay how it is.

With all that said, this sounds like a super simple change (just SetMaxIdleConns(maxConn / 2)), with a lot of thought/words behind it. 😅
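For reference, a minimal sketch of what that amounts to, inferred from the hunk above and the later commit message ("set max idle connections as 1/2 max open"); the real constructDBConn code may differ in detail:

```go
package atccmd // placement is illustrative, not the actual file layout

import "database/sql"

// tunePool keeps the max-open ceiling where operators expect it and lets half
// of those connections sit idle, so bursts reuse existing TLS sessions instead
// of re-dialing and re-handshaking.
func tunePool(dbConn *sql.DB, maxConn int) {
	dbConn.SetMaxOpenConns(maxConn)
	dbConn.SetMaxIdleConns(maxConn / 2)
}
```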

ebilling (Author)

Ok. Thanks for all of the help on this. There are already three connection pools (lockConn, apiConn and backendConn). If you are going to add another for gcConn and potentially others, then making this configurable could get a bit unwieldy. My personal vote would be to allow configuration of the total pool size and use the pool for all use cases. Would be nice if there was a prioritized pool, but...

How would you like me to proceed? I have this working for our needs right now, and we are happy with the changes so far. Would you like me to make any specific changes, like make the default MaxOpenConnections a configuration parameter?

vito (Member)

I think what you've just pushed looks good. 🙂

The simplest possible thing for #4025 is probably to just have a flag for each component and spend a lot of time making sure we get the defaults right so most operators don't think about it (...until they need to, and then they have all the knobs they want).

Some components only need a certain number of connections no matter what the scale is, so those are pretty easy. Others, like the API and pipeline scheduling, are based more on usage patterns, so it's hard to have a reasonable default. But hey, hardcoding 32 got us this far!

- roll back to previous lock behavior
- make max open connections a flag
- set max idle connections as 1/2 max open

Signed-off-by: Eric Billingsley <eric@calculi.com>
vito previously approved these changes Aug 8, 2019
@@ -98,6 +98,8 @@ type RunCommand struct {

Postgres flag.PostgresConfig `group:"PostgreSQL Configuration" namespace:"postgres"`

+	MaxOpenConnections int `long:"max-conns" description:"The maximum number of open connections for a conneciton pool." default:"32"`

vito (Member)

typo

ebilling (Author)

Thanks for catching. Fixed.

Signed-off-by: Eric Billingsley <eric@calculi.com>
@@ -1187,7 +1190,6 @@ func (cmd *RunCommand) constructLockConn(driverName string) (*sql.DB, error) {
return nil, err
}

-	dbConn.SetMaxOpenConns(1)
vito (Member)

I think this PR is good to go once this line is restored. 👍
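(Context: the lock pool is capped at a single connection because these locks are session-scoped Postgres advisory locks, so they have to stay on the connection that acquired them.)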

ebilling (Author)

done

Signed-off-by: Eric Billingsley <eric@calculi.com>
@ebilling ebilling requested a review from a team August 14, 2019 15:49

@vito vito left a comment


thanks a bunch!

@vito vito merged commit 63b59f0 into concourse:master Aug 14, 2019
jamieklassen pushed a commit to concourse/concourse-bosh-release that referenced this pull request Aug 21, 2019
* influxdb batch size/duration and metrics buffer size concourse/concourse#3937
* max db connection pool size concourse/concourse#4232

Signed-off-by: Jamie Klassen <cklassen@pivotal.io>
@jamieklassen jamieklassen added this to the v5.5.0 milestone Aug 26, 2019
jamieklassen pushed a commit to concourse/docs that referenced this pull request Aug 26, 2019
concourse/concourse#4232

Signed-off-by: Jamie Klassen <cklassen@pivotal.io>
jamieklassen pushed a commit that referenced this pull request Aug 26, 2019
#4232

Signed-off-by: James Thomson <jthomson@pivotal.io>
Co-authored-by: Jamie Klassen <cklassen@pivotal.io>
@jamieklassen jamieklassen added the release/documented label (Documentation and release notes have been updated.) Aug 29, 2019
matthewpereira pushed a commit that referenced this pull request Sep 5, 2019
#4232

Signed-off-by: James Thomson <jthomson@pivotal.io>
Co-authored-by: Jamie Klassen <cklassen@pivotal.io>
k8s-ci-robot pushed a commit to helm/charts that referenced this pull request Sep 6, 2019
This commit adds the new parameters that were added to Concourse 5.5.

Here's a breakdown of the new parameters:

- max-active-tasks-per-worker

  > used by the `limit-active-tasks` container placement strategy
  > concourse/concourse#4118

- support for influxdb batching and bigger buffer size for metrics emissions

  > concourse/concourse#3937

- limiting number of max connections in db conn pools

  > concourse/concourse#4232

Signed-off-by: Ciro S. Costa <cscosta@pivotal.io>
Co-authored-by: Zoe Tian <ztian@pivotal.io>