capacity controller: listen to creation events on deployments #111

Merged: icanhazbroccoli merged 1 commit into master from jgreff/ct-deployment-handler on Jul 8, 2019

3 participants
@juliogreff (Contributor) commented Jul 4, 2019

This is related to #77. When running shipper with resyncs disabled, we
noticed that some releases would hang while waiting for the contender to
achieve capacity.

After investigating, we found that this was probably due to replication
lag on the application clusters. When operating on a freshly created
Deployment, we'd get a stream of the following errors:

	error syncing CapacityTarget "foo/bar" (will retry: true):
	expected exactly 1 deployment on cluster baz, namespace foo,
	with label "..." but 0 deployments exist
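To illustrate where an error like this comes from, here is a minimal, stdlib-only sketch of an "exactly one Deployment per label selector" lookup. It assumes the lookup is served from a local view of the application cluster that may lag behind reality; the `Deployment` struct, `findTargetDeployment`, and the `app: example` label are hypothetical stand-ins, not shipper's actual types or labels.

```go
package main

import "fmt"

// Deployment is a hypothetical, stripped-down stand-in for an apps/v1
// Deployment, keeping only the fields this sketch needs.
type Deployment struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// matches reports whether every key/value pair in selector is present in labels.
func matches(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// findTargetDeployment returns the single Deployment in namespace ns whose
// labels match selector. If the local view has not observed a freshly
// created Deployment yet, it finds 0 matches and returns an error shaped
// like the one quoted in this PR.
func findTargetDeployment(cached []Deployment, cluster, ns string, selector map[string]string) (*Deployment, error) {
	var found []Deployment
	for _, d := range cached {
		if d.Namespace == ns && matches(d.Labels, selector) {
			found = append(found, d)
		}
	}
	if len(found) != 1 {
		return nil, fmt.Errorf("expected exactly 1 deployment on cluster %s, namespace %s, with label %v but %d deployments exist",
			cluster, ns, selector, len(found))
	}
	return &found[0], nil
}

func main() {
	sel := map[string]string{"app": "example"} // hypothetical label
	// The local view has not observed the new Deployment yet.
	if _, err := findTargetDeployment(nil, "baz", "foo", sel); err != nil {
		fmt.Println("will retry:", err)
	}
	// Once the view catches up, the same lookup succeeds.
	cached := []Deployment{{Namespace: "foo", Name: "bar", Labels: sel}}
	if d, err := findTargetDeployment(cached, "baz", "foo", sel); err == nil {
		fmt.Println("found", d.Name)
	}
}
```

The error is transient by nature: nothing is wrong with the CapacityTarget, the lookup simply ran before the Deployment became visible.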

Usually this resolves itself after a couple of retries, but when the
replication lag is severe enough, or different errors occur in sequence,
we'd drop the CapacityTarget from the queue and never retry it again.
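The "drop from the queue" behaviour matches a common controller pattern: requeue a failing key until a retry budget is spent, then give up on it. Below is a toy, stdlib-only model of that pattern, assuming a fixed budget; `queue`, `handleErr`, and `maxRetries = 3` are hypothetical. In client-go the equivalent calls on a `workqueue.RateLimitingInterface` would be `NumRequeues`, `AddRateLimited`, and `Forget`, but this sketch does not claim to reproduce shipper's exact logic.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRetries is a hypothetical retry budget; real controllers pick their own.
const maxRetries = 3

// errDeploymentMissing simulates the transient "0 deployments exist" failure.
var errDeploymentMissing = errors.New("0 deployments exist")

// queue is a toy stand-in for a rate-limiting workqueue: pending keys plus a
// per-key requeue counter (no actual rate limiting is modelled).
type queue struct {
	items    []string
	requeues map[string]int
}

func newQueue() *queue { return &queue{requeues: map[string]int{}} }

func (q *queue) Add(key string) { q.items = append(q.items, key) }

// Pop removes and returns the next key, or "" and false when empty.
func (q *queue) Pop() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	key := q.items[0]
	q.items = q.items[1:]
	return key, true
}

// handleErr forgets the key's history on success; on failure it requeues
// until maxRetries is reached, then drops the key and reports true.
func (q *queue) handleErr(key string, err error) bool {
	if err == nil {
		delete(q.requeues, key)
		return false
	}
	if q.requeues[key] < maxRetries {
		q.requeues[key]++
		q.Add(key)
		return false
	}
	delete(q.requeues, key)
	return true // dropped: only a new event (or a resync) would re-add it
}

func main() {
	q := newQueue()
	q.Add("foo/bar")
	for {
		key, ok := q.Pop()
		if !ok {
			break
		}
		// Simulate lag that outlasts the whole retry budget.
		if q.handleErr(key, errDeploymentMissing) {
			fmt.Println("dropped", key)
		}
	}
}
```

With resyncs disabled, nothing periodically re-adds a dropped key, which is why the release stays stuck.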

By also watching for Deployment creation events, the CapacityTarget
re-joins the queue once its Deployment shows up, so releases should no
longer get stuck.
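A minimal sketch of the fix's shape, assuming the Deployment carries a label naming its CapacityTarget: `handlerFuncs` mimics the `AddFunc`/`UpdateFunc` fields of client-go's `cache.ResourceEventHandlerFuncs`, and both route through one enqueue function. The label key `example.com/capacity-target`, `controller`, and `enqueueFromDeployment` are hypothetical; the `UpdateFunc` is shown only for contrast, since this PR does not say which handlers existed before.

```go
package main

import "fmt"

// Deployment is a hypothetical, minimal stand-in for apps/v1 Deployment.
type Deployment struct {
	Namespace string
	Labels    map[string]string
}

// handlerFuncs mirrors the shape of client-go's cache.ResourceEventHandlerFuncs.
type handlerFuncs struct {
	AddFunc    func(obj interface{})
	UpdateFunc func(oldObj, newObj interface{})
}

// ctLabel is a hypothetical label linking a Deployment to its CapacityTarget.
const ctLabel = "example.com/capacity-target"

type controller struct {
	queue []string // stands in for the controller's workqueue
}

// enqueueFromDeployment maps a Deployment to the "namespace/name" key of the
// CapacityTarget it belongs to and enqueues that key.
func (c *controller) enqueueFromDeployment(obj interface{}) {
	d, ok := obj.(*Deployment)
	if !ok {
		return // not a Deployment; ignore
	}
	name, ok := d.Labels[ctLabel]
	if !ok {
		return // not associated with any CapacityTarget
	}
	c.queue = append(c.queue, d.Namespace+"/"+name)
}

// handlers wires creation events (the point of this PR) and updates to the
// same enqueue logic, so a newly visible Deployment re-queues its target.
func (c *controller) handlers() handlerFuncs {
	return handlerFuncs{
		AddFunc: c.enqueueFromDeployment,
		UpdateFunc: func(_, newObj interface{}) {
			c.enqueueFromDeployment(newObj)
		},
	}
}

func main() {
	c := &controller{}
	h := c.handlers()
	// Simulate the informer delivering a creation event.
	h.AddFunc(&Deployment{Namespace: "foo", Labels: map[string]string{ctLabel: "bar"}})
	fmt.Println(c.queue)
}
```

Because the creation event arrives only once the Deployment is visible, it re-queues the CapacityTarget exactly when the earlier lookup would start to succeed, independent of how many retries were already spent.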


@icanhazbroccoli icanhazbroccoli merged commit 83d670e into master Jul 8, 2019

2 checks passed

Travis CI - Branch Build Passed
Travis CI - Pull Request Build Passed

@icanhazbroccoli icanhazbroccoli deleted the jgreff/ct-deployment-handler branch Jul 8, 2019

icanhazbroccoli added a commit that referenced this pull request Jul 8, 2019

capacity controller: listen to creation events on deployments (#111)