Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continuous connection losses when connecting to clusters #8059

Closed
hardcodet opened this issue Aug 10, 2019 · 7 comments
Closed

Continuous connection losses when connecting to clusters #8059

hardcodet opened this issue Aug 10, 2019 · 7 comments
Labels
can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity.

Comments

@hardcodet
Copy link

hardcodet commented Aug 10, 2019

Do you want to request a feature or report a bug?

This is potentially an issue, it may not be mongoose though.

What is the current behavior?

When working with a database in a docker container, I enjoy a stable connection. On the other hand, if I connect to a cluster (either MongoDB Atlas, or an Azure Cosmos DB), there's a continuous loop of disconnect/reconnect events. It doesn't matter whether I run my app locally, or in the same DataCenter that hosts the cluster. I explicitly set the keepAlive option, but that didn't help.

This has been raised with #7452, but then closed without any resolution. Maybe the fact that I only experience this with a cluster can shed some light on this?

If the current behavior is a bug, please provide the steps to reproduce.

My app works, but it has to reconnect all the time (which affects performance and wreaks havoc in my alerting ;). To repro the issue, I just switch the connection strings to point to a MongoDB on Atlas (free tier), and then the reconnect events start to fire. When I query for health, the connection's readyState is rarely 1.

What is the expected behavior?

Stable connection.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

mongoose 5.6.7
node v11.2.0
MongoDB Atlas

Here's my connection options:

connection string: mongodb+srv://foobar-123.azure.mongodb.net/foobar?retryWrites=true&w=majority

           const options: ConnectionOptions = {
                user: config.mongoDbUser || undefined,
                pass: config.mongoDbPass || undefined,
                useNewUrlParser: true,
                reconnectTries: config.mongoDbReconnectAttempts || 10,
                reconnectInterval: config.mongoDbReconnectInterval || 1500,
                connectTimeoutMS: config.mongoDbConnectTimeoutMs || 3000,
                socketTimeoutMS: config.mongoDbSocketTimeoutMs || 8000,
                keepAlive: true,
            };
@vkarpov15
Copy link
Collaborator

This issue is too vague to be actionable. Realistically I won't be able to debug this without access to your codebase, but here's a couple ideas:

  • Are you using angular universal or zone.js? Theres a known bug that prevents those from connecting to Atlas
  • How are you connecting to mongodb? Using mongoose.connect()?
  • You mentioned cosmos db. Mongoose does not officially support cosmos db.
  • Can you connect to your Atlas cluster using the mongo shell?
  • Is your app running in a docker container? If so, please describe your docker setup.

@vkarpov15 vkarpov15 added the needs clarification This issue doesn't have enough information to be actionable. Close after 14 days of inactivity label Aug 11, 2019
@hardcodet
Copy link
Author

hardcodet commented Aug 15, 2019

@vkarpov15 Sorry for the late reply! Here's some bullet points for your bullet points :)

  • Neither, it's a Node application
  • Yes, I'm using mongoose.connect. See snippet below.
  • I only checked with Cosmos DB to see whether it's an Atlas issue. The behavior is the same for both Atlas and Cosmos DB
  • I can connect, also with mongoose. It's just that mongoose loses and recreates the connection all the time. At least, it fires the reconnected event all the time, and the readyState is switching constantly.
  • It's the same behavior whether my app runs in a Docker container (Linux, on my machine or on Azure) or directly on my Windows machine.

Here's my connection code:
connection string: mongodb+srv://foobar-123.azure.mongodb.net/foobar?retryWrites=true&w=majority

        const options: ConnectionOptions = {
            user: config.mongoDbUser || undefined,
            pass: config.mongoDbPass || undefined,
            useNewUrlParser: true,
            reconnectTries: config.mongoDbReconnectAttempts || 10,
            reconnectInterval: config.mongoDbReconnectIntervalMs || 1500,
            connectTimeoutMS: config.mongoDbConnectTimeoutMs || 3000,
            socketTimeoutMS: config.mongoDbSocketTimeoutMs || 8000,
            keepAlive: true,
        };

        const db: Mongoose = await mongoose.connect(uri, options);

Btw, those events fire like clockwork. This is definitely not a random disconnect:

image

@hardcodet
Copy link
Author

Btw, this is being logged like this:

mongoose.connection.on("reconnected", err => {
    logger.info(`MongoDB reconnection success @ ${config.mongoDbUri}`, err);
});

@vkarpov15
Copy link
Collaborator

Here's a potential explanation, it looks like the reconnection happens every 20 seconds. Is your socketTimeoutMS set to 20 seconds (20000)? If there's no activity - no queries and no updates, and your socket timeout is 20000, this would explain the timeouts.

@hardcodet
Copy link
Author

@vkarpov15 No, the timeout is set to 8000 ms. If that helps, I can send you the actual connection string (I'd rather send that to you directly rather than posting it here for obvious reasons ;). At the end of the day, the app isn't doing anything but connecting to said Atlas cluster as posted above in my snippet, then sitting idly (no database requests going on), so this could indeed be related to timeouts and/or keepAlive logic. Here's the full snippet:

export class MongoDbService extends LogEnabled {

    public static get isConnected(): boolean {
        return mongoose.connection.readyState === 1;
    }

    constructor(private config: ConfigService, private logging: LoggingService) {
        super(logging);
    }


    /**
     * Inits the MongoDB connection plus reconnect logic for manual reconnects if automatic retries fail.
     */
    public async initMongoDb(attemptCounter = 0, subscribeToEvents = true) {
        const logger = this.logger;
        const config = this.config;

        try {
            // perform connect
            attemptCounter++;
            const uri: string = this.config.mongoDbUri;

            const options: ConnectionOptions = {
                user: config.mongoDbUser || undefined,
                pass: config.mongoDbPass || undefined,
                useNewUrlParser: true,
                reconnectTries: config.mongoDbReconnectAttempts || 10,
                reconnectInterval: config.mongoDbReconnectIntervalMs || 1500,
                connectTimeoutMS: config.mongoDbConnectTimeoutMs || 3000,
                socketTimeoutMS: config.mongoDbSocketTimeoutMs || 8000,
                keepAlive: true,
            };

            logger.debug("Initializing MongoDB", {name: "dbConnectionOptions", ...options});
            const db: Mongoose = await mongoose.connect(uri, options);

            if (subscribeToEvents) {

                // register event listener to track connection losses
                mongoose.connection.on("error", err => {
                    logger.error(`MongoDB connection lost (error) @ ${config.mongoDbUri}! Auto-retry is active - check for reconnect event...`, err);
                });

                mongoose.connection.on("reconnectFailed", async err => {
                    try {
                        // all reconnection attempts failed - log fatal right away
                        logger.fatal(`MongoDB auto-reconnection failed @ ${config.mongoDbUri}! Attempting to reset the connection...`, err);

                        // start a new connection attempt
                        await PromiseUtil.delay(1500);
                        this.initMongoDb(0, false);
                    } catch (e) {
                        logger.fatal("Attempting to recover from MongoDb reconnect error failed with another exception. The data tier is dead now!", e);
                    }
                });

                mongoose.connection.on("reconnected", err => {
                    // TODO reenable (this is currently driving me crazy due to https://github.com/Automattic/mongoose/issues/8059
                    logger.info(`MongoDB reconnection success @ ${config.mongoDbUri}`, err);
                });
            }

            logger.info("Successfully connected to MongoDb");
        } catch (e) {
            // mongoose doesn't recover automatically from this kind of exception - reconnect manually
            logger.fatal(`Failed connect to MongoDB @ ${config.mongoDbUri} (attempt #${attemptCounter})! Will retry...`, e);
            await PromiseUtil.delay(2000);
            await this.initMongoDb(attemptCounter, false);
        }
    }

}

@vkarpov15 vkarpov15 added needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue and removed needs clarification This issue doesn't have enough information to be actionable. Close after 14 days of inactivity labels Aug 23, 2019
@vkarpov15 vkarpov15 modified the milestones: 5.6.11, 5.6.12 Aug 23, 2019
@vkarpov15 vkarpov15 modified the milestones: 5.6.12, 5.6.13, 5.7.1 Sep 3, 2019
@vkarpov15 vkarpov15 modified the milestones: 5.7.1, 5.7.3 Sep 11, 2019
@vkarpov15 vkarpov15 modified the milestones: 5.7.3, 5.7.4 Sep 27, 2019
@vkarpov15
Copy link
Collaborator

Sorry for taking so long to get back to you, this seems like one of those issues that I'm unlikely to be able to help with. Below script with an Atlas instance doesn't trigger any reconnect events for me:

'use strict';

const mongoose = require('mongoose');

run().catch(err => console.log(err));

async function run() {
  mongoose.connection.on("reconnected", err => {
    console.log(`MongoDB reconnection success @ ${config.mongoDbUri}`);
  });

  await mongoose.connect('mongodb+srv://user:pass@omitted.mongodb.net/admin?retryWrites=true&w=majority', {
    useNewUrlParser: true,
    connectTimeoutMS: 3000,
    socketTimeoutMS: 8000,
    reconnectTries: 10,
    reconnectInterval: 1500
  });

  console.log('Initial connection done');
}

Output:

$ time node gh-8059.js 
Initial connection done
^C

real	0m50.284s
user	0m0.740s
sys	0m0.160s
$ 

The only possibility I can think of right now is what is the ping time to your Atlas machine? It might be timing out because of slow internet...

@vkarpov15 vkarpov15 removed this from the 5.7.4 milestone Oct 7, 2019
@vkarpov15 vkarpov15 added can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. and removed needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue labels Oct 7, 2019
@hardcodet
Copy link
Author

hardcodet commented Oct 10, 2019

This is so weird. I managed to get around the problem by raising the socketTimeoutMS options from 8000 to 30000. I'm not complaining at this point though ;)
Internet is fast btw - I'm getting my data back within just a few milliseconds...

A potentially related issue, that also was gone once I raised that timeout flag:

async startTransaction() {
    console.warn("creating session...");
    const session = await mongoose.startSession();
    console.warn("created session...");
    ...
}

More often than not, I had a creating session entry on the console, followed by a mongoose reconnection event log entry, and then nothing anymore - the promise returned by startSession just kept hanging forever. This happened fairly often, and once it did, any subsequent attempts also were hanging. Again, no idea what was going on...

I'll try to modify your script to get a repro with my Atlas connection, and will post back here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity.
Projects
None yet
Development

No branches or pull requests

2 participants