Skip to content

Fix goroutine leak in consulcatalog when consul is down#3036

Merged
traefiker merged 1 commit intotraefik:v1.5from
yutopp:fix_goroutine_leak_in_consulcatalog
Mar 20, 2018
Merged

Fix goroutine leak in consulcatalog when consul is down#3036
traefiker merged 1 commit intotraefik:v1.5from
yutopp:fix_goroutine_leak_in_consulcatalog

Conversation

@yutopp
Copy link
Copy Markdown
Contributor

@yutopp yutopp commented Mar 16, 2018

What does this PR do?

Fix goroutine leaks in consulcatalog provider. Use sync.Once to avoid blocking when sending errors to the channel which already used and an owner function of it is already exited.

Below is a number of goroutines by metrics data.

# Before patch
> date && curl -s http://localhost:10080/metrics | grep go_goroutines             
Fri Mar 16 19:32:28 JST 2018
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 28

> sleep 120 && date && curl -s http://localhost:10080/metrics | grep go_goroutines
Fri Mar 16 19:34:29 JST 2018
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 54
# After patch
> date && curl -s http://localhost:10080/metrics | grep go_goroutines             
Fri Mar 16 18:49:38 JST 2018
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 21

> sleep 120 && date && curl -s http://localhost:10080/metrics | grep go_goroutines
Fri Mar 16 18:51:39 JST 2018
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 21

(traefik command is ./dist/traefik --consulcatalog --entryPoints=Name:http Address::18080 --web.address=:10080 --web.metrics --web.metrics.prometheus --web.metrics.prometheus.entrypoint=traefik, and consul is not started)

Motivation

Assume there are 2 goroutines are spawned and they have the same error channel. If one goroutine sends an error to the channel, the caller returns by getting an error from the channel. Then the other goroutine will stuck when also trying to send an error to the channel because an owner of the error channel is already exited.

Additional Notes

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add 1 buffer on this chan instead of the sync.Once?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding buffers will reduce maintainability because a number of capacity is depended on the number of gorountines which spawned at other functions. By using sync.Once, it will no longer need to worry about thinking the number of the buffer and blocking of the channel.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for me.

We need this bug fix in 1.5, can you do this PR on branch v1.5 ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed base branch to v1.5 and patched the fix to it.

@nmengin nmengin modified the milestones: 1.6, 1.5 Mar 20, 2018
@yutopp yutopp force-pushed the fix_goroutine_leak_in_consulcatalog branch from 1b0a516 to c1eadef Compare March 20, 2018 11:29
@yutopp yutopp requested review from a team as code owners March 20, 2018 11:29
@yutopp yutopp changed the base branch from master to v1.5 March 20, 2018 11:29
@yutopp yutopp force-pushed the fix_goroutine_leak_in_consulcatalog branch from c060335 to 2c3640a Compare March 20, 2018 11:40
Copy link
Copy Markdown
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@mmatur mmatur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@traefiker traefiker merged commit 43a510c into traefik:v1.5 Mar 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants