New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL deadlocks when many deployments run simultaneously #2246

Closed
vanessalove opened this Issue Dec 18, 2015 · 17 comments

Comments

Projects
None yet
@vanessalove
Contributor

vanessalove commented Dec 18, 2015

40 deployments kicked off at once through teamcity sometimes causes a deadlock error

[16:31:01][Octopus Deploy] Server exception: 
[16:31:01][Octopus Deploy] System.Exception: Error while executing SQL command: Transaction (Process ID 238) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
[16:31:01][Octopus Deploy] The command being executed was:
[16:31:01][Octopus Deploy] INSERT INTO dbo.[Event] (RelatedDocumentIds, ProjectId, EnvironmentId, Category, UserId, Username, Occurred, Message, Id, Json) values (@RelatedDocumentIds, @ProjectId, @EnvironmentId, @Category, @UserId, @Username, @Occurred, @Message, @Id, @Json) ---> System.Data.SqlClient.SqlException: Transaction (Process ID 238) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
[16:31:01][Octopus Deploy]    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

Source: http://help.octopusdeploy.com/discussions/problems/43502

@vanessalove

This comment has been minimized.

Show comment
Hide comment
@vanessalove

vanessalove Jan 21, 2016

Contributor

This is probably an edge case and has only been reported by one user, but I thought we had situations like this covered, and i feel we will see it again

Contributor

vanessalove commented Jan 21, 2016

This is probably an edge case and has only been reported by one user, but I thought we had situations like this covered, and i feel we will see it again

@vanessalove vanessalove removed the ready label Jan 21, 2016

@PaulStovell PaulStovell added the ready label Jan 27, 2016

@robpearson robpearson self-assigned this Jan 28, 2016

@robpearson robpearson added in progress and removed ready labels Jan 28, 2016

@robpearson

This comment has been minimized.

Show comment
Hide comment
@robpearson

robpearson Jan 28, 2016

Happy to say I've reproduced this.
Scenario:

  • Promoting a release from DEV to TEST
  • TEST lifecycle phase has two environments (Test A and Test B)
  • Project is a rolling deployment with three child steps.
    • Add to LB
    • Deploy package
    • Remove from LB
  • Periodically fails on executing the deployment with the error below.

screencapture-localhost-8065-app-1454020575540

robpearson commented Jan 28, 2016

Happy to say I've reproduced this.
Scenario:

  • Promoting a release from DEV to TEST
  • TEST lifecycle phase has two environments (Test A and Test B)
  • Project is a rolling deployment with three child steps.
    • Add to LB
    • Deploy package
    • Remove from LB
  • Periodically fails on executing the deployment with the error below.

screencapture-localhost-8065-app-1454020575540

@robpearson robpearson added blocked and removed in progress labels Feb 1, 2016

@michaelnoonan michaelnoonan added ready and removed blocked labels Feb 3, 2016

@vanessalove

This comment has been minimized.

Show comment
Hide comment
@vanessalove

This comment has been minimized.

Show comment
Hide comment
@vanessalove

vanessalove Apr 6, 2016

Contributor

Happening for some customers daily now. I don't know how to move this forward any faster? it's been in ready since Jan 21. What extra information would you like?

Contributor

vanessalove commented Apr 6, 2016

Happening for some customers daily now. I don't know how to move this forward any faster? it's been in ready since Jan 21. What extra information would you like?

@phundisk

This comment has been minimized.

Show comment
Hide comment
@phundisk

phundisk Apr 6, 2016

Is there any good work around for this issue or idea on when this may be resolved? I am seeing this when I try and install the tentacle on about 30 - 40 instances in which they all come up and register in octo within a short period of time. This seems to have happened more frequently since we just upgraded our octo server environment.

phundisk commented Apr 6, 2016

Is there any good work around for this issue or idea on when this may be resolved? I am seeing this when I try and install the tentacle on about 30 - 40 instances in which they all come up and register in octo within a short period of time. This seems to have happened more frequently since we just upgraded our octo server environment.

@vanessalove

This comment has been minimized.

Show comment
Hide comment
@vanessalove

vanessalove Apr 7, 2016

Contributor

http://help.octopusdeploy.com/discussions/problems/45014 ticket (already linked above) has a log with 47 errors in a row

Contributor

vanessalove commented Apr 7, 2016

http://help.octopusdeploy.com/discussions/problems/45014 ticket (already linked above) has a log with 47 errors in a row

@thebigmoosey thebigmoosey added in progress and removed ready labels Apr 7, 2016

@thebigmoosey thebigmoosey self-assigned this Apr 7, 2016

@BryanPierard

This comment has been minimized.

Show comment
Hide comment
@BryanPierard

BryanPierard Apr 18, 2016

We saw this issue as well a couple of weeks ago. Retry worked for us.

BryanPierard commented Apr 18, 2016

We saw this issue as well a couple of weeks ago. Retry worked for us.

@vanessalove

This comment has been minimized.

Show comment
Hide comment
@vanessalove

vanessalove Apr 19, 2016

Contributor

Please pair with Paul who has some ideas about logging that can be activated client side to help us reproduce.

Contributor

vanessalove commented Apr 19, 2016

Please pair with Paul who has some ideas about logging that can be activated client side to help us reproduce.

@phundisk

This comment has been minimized.

Show comment
Hide comment
@phundisk

phundisk Apr 19, 2016

We just rebuilt the database indexes on the SQL database and that seemed to help out a bit. You might want to try doing that to see if it fixes your issue too.

phundisk commented Apr 19, 2016

We just rebuilt the database indexes on the SQL database and that seemed to help out a bit. You might want to try doing that to see if it fixes your issue too.

@thebigmoosey

This comment has been minimized.

Show comment
Hide comment
@thebigmoosey

thebigmoosey Apr 20, 2016

Hi,

Thanks everyone for the various reports. We've analysed everything to date and unfortunately we are unable to find the cause of the problem (or reproduce this with any sort of consistency). So, the next steps...

We've compiled a version of Octopus (from the current master branch) that includes transaction logging when a deadlock exception is encountered. This won't solve the issue, but it will give us the necessary logs to hopefully analyse the transactions involved in a deadlock from a real-world situation. The patch version is available here for you to install: Octopus.3.3.9-bug-sqldeadlockt0004-x64.msi As always, please ensure you have made a full backup before installing this patch.

If a deadlock is occurring consistently enough for you, we would be grateful if you could capture a SQL Profiling session with additional deadlock traces enabled and send us your findings. To capture deadlock information via a trace:

  • Run this command against your Octopus database: DBCC TRACEON (1204, 1222)
  • Open SQL Profiler and start a trace with extended events selected
    • Click File > New Trace > [connect to your instance] > Events Selection tab
    • Click the Show all events checkbox
    • Select Deadlock graph, Lock: Deadlock, Lock: Deadlock Chain events

Full instructions for setting up a SQL Profile trace for deadlocks can be found here: https://www.brentozar.com/archive/2014/06/capturing-deadlock-information/

Also, please ensure READ_COMMITTED_SNAPSHOT is enabled for your Octopus database. This is enabled when Octopus is first installed, but in case you've disabled this in your database at some point, you can re-run this script to make sure it is enabled (replace $databaseName$ with the name of your database):

PRINT 'Enabling read committed snapshot isolation on $databaseName$'
IF EXISTS (SELECT * FROM sys.databases WHERE name='$databaseName$' AND is_read_committed_snapshot_on = 0)
BEGIN
    EXECUTE sp_executesql N'ALTER DATABASE [$databaseName$] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE';
END
ELSE
BEGIN
    PRINT 'Read committed snapshot isolation is already on'
END

Thanks for your patience and we appreciate your help!
Cheers

thebigmoosey commented Apr 20, 2016

Hi,

Thanks everyone for the various reports. We've analysed everything to date and unfortunately we are unable to find the cause of the problem (or reproduce this with any sort of consistency). So, the next steps...

We've compiled a version of Octopus (from the current master branch) that includes transaction logging when a deadlock exception is encountered. This won't solve the issue, but it will give us the necessary logs to hopefully analyse the transactions involved in a deadlock from a real-world situation. The patch version is available here for you to install: Octopus.3.3.9-bug-sqldeadlockt0004-x64.msi As always, please ensure you have made a full backup before installing this patch.

If a deadlock is occurring consistently enough for you, we would be grateful if you could capture a SQL Profiling session with additional deadlock traces enabled and send us your findings. To capture deadlock information via a trace:

  • Run this command against your Octopus database: DBCC TRACEON (1204, 1222)
  • Open SQL Profiler and start a trace with extended events selected
    • Click File > New Trace > [connect to your instance] > Events Selection tab
    • Click the Show all events checkbox
    • Select Deadlock graph, Lock: Deadlock, Lock: Deadlock Chain events

Full instructions for setting up a SQL Profile trace for deadlocks can be found here: https://www.brentozar.com/archive/2014/06/capturing-deadlock-information/

Also, please ensure READ_COMMITTED_SNAPSHOT is enabled for your Octopus database. This is enabled when Octopus is first installed, but in case you've disabled this in your database at some point, you can re-run this script to make sure it is enabled (replace $databaseName$ with the name of your database):

PRINT 'Enabling read committed snapshot isolation on $databaseName$'
IF EXISTS (SELECT * FROM sys.databases WHERE name='$databaseName$' AND is_read_committed_snapshot_on = 0)
BEGIN
    EXECUTE sp_executesql N'ALTER DATABASE [$databaseName$] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE';
END
ELSE
BEGIN
    PRINT 'Read committed snapshot isolation is already on'
END

Thanks for your patience and we appreciate your help!
Cheers

@titusjaka

This comment has been minimized.

Show comment
Hide comment
@titusjaka

titusjaka Apr 20, 2016

Hi,

We'll try to collect traces for you in the near future.
Could you please clarify for me whether installing patch and profilling SQL are separated or combined actions? I mean, could we collect SQL traces without patch or not?

Kind regards,
Denis Titusov

titusjaka commented Apr 20, 2016

Hi,

We'll try to collect traces for you in the near future.
Could you please clarify for me whether installing patch and profilling SQL are separated or combined actions? I mean, could we collect SQL traces without patch or not?

Kind regards,
Denis Titusov

@thebigmoosey

This comment has been minimized.

Show comment
Hide comment
@thebigmoosey

thebigmoosey Apr 20, 2016

Hi Denis,

The SQL trace is the main priority, so you could collect SQL Profiler traces without the patch. However, installing the patch version will help to give additional logging of the transactions leading to the deadlock, which will all help to identify the problem. So if you can install the patch, great, if not, that's ok too :) At this point we're just trying to gather whatever additional information we can on the issue as it occurs.

Thanks
Mark

thebigmoosey commented Apr 20, 2016

Hi Denis,

The SQL trace is the main priority, so you could collect SQL Profiler traces without the patch. However, installing the patch version will help to give additional logging of the transactions leading to the deadlock, which will all help to identify the problem. So if you can install the patch, great, if not, that's ok too :) At this point we're just trying to gather whatever additional information we can on the issue as it occurs.

Thanks
Mark

@droyad

This comment has been minimized.

Show comment
Hide comment
@droyad

droyad Jun 22, 2016

SQL Deadlock Tracing has been implemented in 3.3.18 via #2542

If you encounter deadlocks, and are running that version or later, we would appreciate the OctopusServer.txt log file (usually located in c:\octopus\logs) being sent to support@octopus.com. If you do not want to send the whole file, you can send in the exception message, including the SQL statements below it. The SQL statements we log do not contain the parameters.

droyad commented Jun 22, 2016

SQL Deadlock Tracing has been implemented in 3.3.18 via #2542

If you encounter deadlocks, and are running that version or later, we would appreciate the OctopusServer.txt log file (usually located in c:\octopus\logs) being sent to support@octopus.com. If you do not want to send the whole file, you can send in the exception message, including the SQL statements below it. The SQL statements we log do not contain the parameters.

@brentm5

This comment has been minimized.

Show comment
Hide comment
@brentm5

brentm5 Jun 22, 2016

We are also seeing this however we are on an older version (3.2.11). It happened when creating a release with octo.exe just to give some information. Just to give some more information to help maybe track down this issue. Octo Support Issue

brentm5 commented Jun 22, 2016

We are also seeing this however we are on an older version (3.2.11). It happened when creating a release with octo.exe just to give some information. Just to give some more information to help maybe track down this issue. Octo Support Issue

@titusjaka

This comment has been minimized.

Show comment
Hide comment
@titusjaka

titusjaka Jun 23, 2016

Hi Robert and Mark,
We still run 3.3.14. I will provide you log file as soon as we upgrade our instance.

Kind regards,
Denis Titusov

titusjaka commented Jun 23, 2016

Hi Robert and Mark,
We still run 3.3.14. I will provide you log file as soon as we upgrade our instance.

Kind regards,
Denis Titusov

@thebigmoosey thebigmoosey added this to the 3.4.0 milestone Jul 6, 2016

@michaelnoonan michaelnoonan modified the milestones: 3.4.0-beta0001, 3.4.0 Jul 21, 2016

@octoreleasebot

This comment has been minimized.

Show comment
Hide comment
@octoreleasebot

octoreleasebot Jul 21, 2016

Release Note: Fixed issues around event table deadlocks

octoreleasebot commented Jul 21, 2016

Release Note: Fixed issues around event table deadlocks

@alfhenrik alfhenrik removed this from the 3.4.0-beta0001 milestone Aug 23, 2016

@alfhenrik alfhenrik modified the milestones: 3.4.0, 3.4.0-beta0001 Aug 23, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment