Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connectRetry doesn't work if the server comes up after the first attempt has failed #373

Closed
jamesmanning opened this issue Mar 14, 2016 · 5 comments

Comments

@jamesmanning
Copy link

If the target redis server is down ('connection refused' error coming back from socket attempts) when the first attempt happens, if the server comes up, the later retries still fail such that the overall connect still fails (after it exhausts the retry limit).

Here's a failing test I added locally to show the issue. I'm not sure if there's an existing/canonical way to do this, but if this is fine, I can make a pull request with it if that's helpful.

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using NUnit.Framework;

namespace StackExchange.Redis.Tests
{
    [TestFixture]
    public class ConnectRetryTests : TestBase
    {
        [Test]
        [MaxTime(100000)]
        public void ConnectWithRetryButFailsOnFirst()
        {
            var log = new StringWriter();
            try
            {
                var options = ConfigurationOptions.Parse("localhost:6379");
                options.ConnectRetry = 5;
                options.ConnectTimeout = 10000;

                // stop the redis server so the first connect should fail.  Note that we only need to
                // sleep long enough to make sure the first connect attempt fails.
                var serviceStopStartTask = StopSleepThenStartRedis(options.ConnectTimeout);

                using (var conn = ConnectionMultiplexer.Connect(options, log))
                {
                    var ttl = conn.GetDatabase().Ping();
                    Console.WriteLine(ttl);
                }
            }
            catch
            {
                Console.WriteLine(log);
                Assert.Fail();
            }
        }

        private async Task StopSleepThenStartRedis(int sleepTimeMilliseconds)
        {
            Process.Start("sc.exe", "stop redis").WaitForExit();
            Console.WriteLine("local redis server has stopped");
            await Task.Delay(sleepTimeMilliseconds);
            Process.Start("sc.exe", "start redis").WaitForExit();
            Console.WriteLine("local redis server has started");
        }
    }
}

Log from connect attempt:

localhost:6379,connectTimeout=10000,connectRetry=5

Connecting localhost:6379/Interactive...
BeginConnect: localhost:6379
1 unique nodes specified
Requesting tie-break from localhost:6379 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:10 to respond...
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=3,Free=32764,Min=8,Max=32767)
EndConnect: localhost:6379
All tasks completed cleanly, IOCP: (Busy=2,Free=998,Min=8,Max=1000), WORKER: (Busy=3,Free=32764,Min=8,Max=32767)
localhost:6379 faulted: SocketFailure on PING
Awaiting task completion, IOCP: (Busy=2,Free=998,Min=8,Max=1000), WORKER: (Busy=1,Free=32766,Min=8,Max=32767)
All tasks completed cleanly, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
localhost:6379 failed to nominate (Faulted)
> UnableToResolvePhysicalConnection on GET
No masters detected
localhost:6379: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Disconnected; sub: Connecting; not in use: DidNotRespond
localhost:6379: int ops=0, qu=0, qs=0, qc=0, wr=0, sync=1, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=2
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
resetting failing connections to retry...
retrying; attempts left: 4...
1 unique nodes specified
Requesting tie-break from localhost:6379 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:10 to respond...
Awaiting task completion, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
All tasks completed cleanly, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=4,Free=32763,Min=8,Max=32767)
localhost:6379 faulted: UnableToResolvePhysicalConnection on PING
Awaiting task completion, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=1,Free=32766,Min=8,Max=32767)
Connect complete: localhost:6379
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=4,Free=32763,Min=8,Max=32767)
localhost:6379 failed to nominate (WaitingForActivation)
No masters detected
localhost:6379: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond
localhost:6379: int ops=0, qu=2, qs=0, qc=0, wr=0, sync=3, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=3
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
resetting failing connections to retry...
retrying; attempts left: 3...
1 unique nodes specified
Requesting tie-break from localhost:6379 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:10 to respond...
Awaiting task completion, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=3,Free=32764,Min=8,Max=32767)
All tasks completed cleanly, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=1,Free=32766,Min=8,Max=32767)
localhost:6379 faulted: SocketFailure on PING
Awaiting task completion, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=1,Free=32766,Min=8,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
localhost:6379 failed to nominate (WaitingForActivation)
No masters detected
localhost:6379: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond
localhost:6379: int ops=0, qu=4, qs=0, qc=0, wr=0, sync=4, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=4
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
resetting failing connections to retry...
retrying; attempts left: 2...
1 unique nodes specified
Requesting tie-break from localhost:6379 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:10 to respond...
Awaiting task completion, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=3,Free=32764,Min=8,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
localhost:6379 did not respond
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
localhost:6379 failed to nominate (WaitingForActivation)
No masters detected
localhost:6379: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: ConnectedEstablished, 1 active; not in use: DidNotRespond
localhost:6379: int ops=0, qu=8, qs=0, qc=0, wr=0, sync=4, socks=2; sub ops=3, qu=0, qs=0, qc=0, wr=0, subs=1, sync=3, socks=5
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0+3=3 (0.30 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
resetting failing connections to retry...
retrying; attempts left: 1...
1 unique nodes specified
Requesting tie-break from localhost:6379 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:10 to respond...
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
localhost:6379 did not respond
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767)
localhost:6379 failed to nominate (WaitingForActivation)
No masters detected
localhost:6379: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: ConnectedEstablished, 1 active; not in use: DidNotRespond
localhost:6379: int ops=0, qu=10, qs=0, qc=0, wr=0, sync=4, socks=2; sub ops=3, qu=0, qs=0, qc=0, wr=0, subs=1, sync=3, socks=5
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0+3=3 (0.30 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
@jamesmanning
Copy link
Author

FWIW, after increasing connectRetry up to 20 and leaving connectTimeout as its default of 5 seconds, I can see a 4-attempt pattern to the failures

In the cycle, the first attempt gets IsFaulted with UnableToResolvePhysicalConnection, the second gets IsFaulted with ConnectionDisposed, then the next 2 get IsCanceled

  1. SocketFailure (intentional)
  2. UnableToResolvePhysicalConnection
  3. ConnectionDisposed
  4. localhost:6379 did not respond / localhost:6379 failed to nominate
  5. localhost:6379 did not respond / localhost:6379 failed to nominate
  6. UnableToResolvePhysicalConnection
  7. ConnectionDisposed
  8. localhost:6379 did not respond / localhost:6379 failed to nominate
  9. localhost:6379 did not respond / localhost:6379 failed to nominate
  10. ... 4-attempt cycle keeps repeating

If I increase the timeout from 5 seconds to 10 seconds the 'did not respond' only happens once per cycle (so it becomes a 3-attempt cycle), but otherwise it stays the same AFAICT.

If I increase the timeout up to 30 seconds, then the 'did not respond' ones don't happen, but instead seem to turn into faulted: with UnableToConnect, although there's a Disposed near the beginning. Same pattern happens at 60 seconds as well.

  1. SocketFailure
  2. UnableToResolvePhysicalConnection
  3. ConnectionDisposed
  4. UnableToConnect
  5. UnableToResolvePhysicalConnection
  6. UnableToConnect
  7. UnableToResolvePhysicalConnection
  8. UnableToConnect
  9. UnableToResolvePhysicalConnection
  10. UnableToConnect

@jamesmanning
Copy link
Author

I have a workaround: if setting configuration in code: config.TieBreaker = String.Empty; and if in a connection string, tiebreaker= (with no value after the equals)

The relevant bit is (I think) here, although it's more a guess than anything at this point.

if (useTieBreakers)
{
LogLocked(log, "Requesting tie-break from {0} > {1}...", Format.ToString(server.EndPoint), configuration.TieBreaker);
Message msg = Message.Create(0, flags, RedisCommand.GET, tieBreakerKey);
msg.SetInternalCall();
msg = LoggingMessage.Create(log, msg);
tieBreakers[i] = server.QueueDirectAsync(msg, ResultProcessor.String);
}

Since the configuration doesn't explicitly set it, the TieBreaker configuration is set (specifically, to __Booksleeve_TieBreak, but the actual value doesn't matter, just that it's set). So, even though I'm connecting to just a single stand-alone server, useTiebreakers ends up being true.

bool useTieBreakers = !string.IsNullOrWhiteSpace(configuration.TieBreaker);

Should useTiebreakers require both the key being set and endpoints.Count > 1? In any case, I have a workaround, and the test passes when the workaround is added.

@NickCraver
Copy link
Collaborator

Should useTiebreakers require both the key being set and endpoints.Count > 1

That's not going to fix many other cases...but it's still probably a good idea. Less complication in simpler scenarios is always a good thing - thoughts @mgravell?

@jamesmanning
Copy link
Author

@NickCraver probably not worth filing separately, but FWIW since the ConfigurationOptions.ToString only adds an option if its set, then setting tieBreaker to empty string (which isn't its default value) currently doesn't round-trip since it won't be serialized in the string.

image

@NickCraver
Copy link
Collaborator

This was fixed in the 2.x rewrite - just cleaning up older issues :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants