fix: turn atlas-connect-cluster async #343

fmenezes · 2025-07-07T17:24:26Z

fixes #321

Proposed changes

Turn the atlas-connect-cluster tool into an async tool. Unfortunately, the tool relies on the Atlas user creation API to connect, and propagating the new user from the control plane to the data plane can take time. There is no state to query other than checking whether the user has access to the DB.

Checklist

I have signed the MongoDB CLA

coveralls · 2025-07-07T17:32:54Z

Pull Request Test Coverage Report for Build 16194572196

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

33 of 51 (64.71%) changed or added relevant lines in 3 files are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.8%) to 74.831%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/session.ts	3	5	60.0%
src/tools/mongodb/mongodbTool.ts	5	8	62.5%
src/tools/atlas/metadata/connectCluster.ts	25	38	65.79%

Files with Coverage Reduction	New Missed Lines	%
src/tools/atlas/metadata/connectCluster.ts	2	61.74%

Totals
Change from base Build 16135914646:	-0.8%
Covered Lines:	863
Relevant Lines:	1065

💛 - Coveralls

coveralls · 2025-07-07T17:32:54Z

Pull Request Test Coverage Report for Build 16123662219

Details

0 of 34 (0.0%) changed or added relevant lines in 1 file are covered.
129 unchanged lines in 15 files lost coverage.
Overall coverage decreased (-13.3%) to 60.879%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/tools/atlas/metadata/connectCluster.ts	0	34	0.0%

Files with Coverage Reduction	New Missed Lines	%
src/common/atlas/generatePassword.ts	3	25.0%
src/tools/atlas/create/createFreeCluster.ts	3	57.14%
src/tools/atlas/read/inspectCluster.ts	3	23.53%
src/tools/atlas/read/listAlerts.ts	3	20.0%
src/tools/atlas/read/listOrgs.ts	3	72.73%
src/tools/atlas/read/inspectAccessList.ts	4	33.33%
src/tools/atlas/create/createDBUser.ts	6	25.0%
src/tools/atlas/create/createProject.ts	8	46.67%
src/common/atlas/cluster.ts	9	0.0%
src/tools/atlas/create/createAccessList.ts	10	16.13%

Totals
Change from base Build 16077040616:	-13.3%
Covered Lines:	704
Relevant Lines:	1046

💛 - Coveralls

Copilot

Pull Request Overview

This PR refactors the atlas-connect-cluster tool into an asynchronous background process and updates integration tests to poll until the cluster connection is established.

Tests now loop with retries to wait for the async connection to succeed.
ConnectClusterTool has been split into query, prepare, and background connect phases, returning immediately with an “Attempting…” message.
Added new log IDs for connect attempts and successes in logger.ts.
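
To make the phases in the overview above concrete, here is a minimal sketch of the state check that decides which message the tool returns. The type names, the `describeNextStep` function, and the exact "Attempting" wording are assumptions for illustration, not the PR's actual code; only the "Cluster is already connected." message is quoted from this thread.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the PR's code.
type ConnectionState = "connected" | "connecting" | "disconnected";

interface ClusterSession {
    state: ConnectionState;
}

// Query phase: decide what to report based on the current connection state.
function describeNextStep(session: ClusterSession, clusterName: string): string {
    switch (session.state) {
        case "connected":
            return "Cluster is already connected.";
        case "connecting":
            // A background attempt is already running, so report and return.
            return `Attempting to connect to cluster "${clusterName}"...`;
        case "disconnected": {
            // The prepare and background connect phases would start here.
            session.state = "connecting";
            return `Attempting to connect to cluster "${clusterName}"...`;
        }
    }
}
```

A caller that polls the tool would see the "Attempting" message until the state flips to "connected".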

Comments suppressed due to low confidence (1)

tests/integration/tools/atlas/clusters.test.ts:195

Add an assertion after the loop (or inside the success branch) to fail the test if the cluster never reports "Cluster is already connected.", otherwise the test may silently pass without verifying a successful connection.

                for (let i = 0; i < 600; i++) {

Copilot · 2025-07-08T06:55:41Z

src/tools/atlas/metadata/connectCluster.ts

+        await this.session.serviceProvider.runCommand("admin", {
+            ping: 1,
+        });
+        return "connected";


Wrap the runCommand('admin', { ping: 1 }) call in a try/catch so transient ping errors don’t bubble up and trigger a full reconnection flow prematurely.

Suggested change

-        await this.session.serviceProvider.runCommand("admin", {
-            ping: 1,
-        });
-        return "connected";
+        try {
+            await this.session.serviceProvider.runCommand("admin", {
+                ping: 1,
+            });
+            return "connected";
+        } catch (error) {
+            logger.warn(LogId.ConnectionPingError, `Ping command failed: ${error.message}`);
+            return "connecting";
+        }

no need, I'm bubbling up the error

Copilot · 2025-07-08T06:55:41Z

src/tools/atlas/metadata/connectCluster.ts

+        for (let i = 0; i < 600; i++) {
+            // try for 5 minutes


Extract the retry count (600) and delay (500ms) into named constants to improve readability and ease future adjustments.

Suggested change

-        for (let i = 0; i < 600; i++) {
-            // try for 5 minutes
+        for (let i = 0; i < RETRY_COUNT; i++) {
+            // try for RETRY_COUNT attempts
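
As a sketch of what the named constants could look like, the values below (600 attempts, 500 ms delay) come from the review comment, while the constant names themselves are hypothetical. Deriving the total budget from the two constants keeps the "5 minutes" figure from drifting out of sync with the loop.

```typescript
// Hypothetical constant names; the values are the ones cited in the review.
const CONNECT_RETRY_COUNT = 600;
const CONNECT_RETRY_DELAY_MS = 500;

// Total delay budget: 600 attempts * 500 ms = 300,000 ms (5 minutes).
// This excludes the time each connection attempt itself takes.
const CONNECT_TIMEOUT_MS = CONNECT_RETRY_COUNT * CONNECT_RETRY_DELAY_MS;
```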

Copilot · 2025-07-08T06:55:41Z

src/logger.ts

@@ -17,6 +17,8 @@ export const LogId = {
    atlasDeleteDatabaseUserFailure: mongoLogId(1_001_002),
    atlasConnectFailure: mongoLogId(1_001_003),
    atlasInspectFailure: mongoLogId(1_001_004),
+    atlasConnectAttempt: mongoLogId(1_001_005),
+    atlasConnectSuccessed: mongoLogId(1_001_006),


The log ID key atlasConnectSuccessed is misspelled; consider renaming it to atlasConnectSucceeded or atlasConnectSuccess for clarity.

Suggested change

-    atlasConnectSuccessed: mongoLogId(1_001_006),
+    atlasConnectSucceeded: mongoLogId(1_001_006),

nirinchev · 2025-07-08T10:57:32Z

src/tools/atlas/metadata/connectCluster.ts

        const connectionString = cn.toString();

+        return connectionString;


Suggested change

-        const connectionString = cn.toString();
-        return connectionString;
+        return cn.toString();

nirinchev · 2025-07-08T11:00:26Z

src/tools/atlas/metadata/connectCluster.ts

+                            groupId: this.session.connectedAtlasCluster?.projectId || "",
+                            username: this.session.connectedAtlasCluster?.username || "",


If those are not set, does it make sense to make that call at all?
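
One way to act on this question is a guard that skips the call when either value is missing, rather than sending empty strings. This is only a sketch: the interfaces and the `buildUserCallParams` helper are hypothetical, and the real session type may differ.

```typescript
// Hypothetical shapes for illustration; the real session type may differ.
interface ConnectedAtlasCluster {
    projectId?: string;
    username?: string;
}

interface UserCallParams {
    groupId: string;
    username: string;
}

// Returns the call parameters, or undefined when the call should be skipped.
function buildUserCallParams(cluster: ConnectedAtlasCluster | undefined): UserCallParams | undefined {
    const projectId = cluster?.projectId;
    const username = cluster?.username;
    if (!projectId || !username) {
        return undefined; // nothing to send: skip the API call entirely
    }
    return { groupId: projectId, username };
}
```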

src/tools/atlas/metadata/connectCluster.ts

fmenezes · 2025-07-08T15:02:36Z

@nirinchev this is ready for another look

nirinchev

Looks good - let's add some clarifying comments (feel free to use my suggestion or reword them) and fix the retry attempts for the "connects to cluster" test.

nirinchev · 2025-07-09T12:30:46Z

src/tools/atlas/metadata/connectCluster.ts

+    }
+
+    protected async execute({ projectId, clusterName }: ToolArgs<typeof this.argsShape>): Promise<CallToolResult> {
+        const connectingResult = {


Any reason to have this at the top of the function if it's only used at the last catch clause?

Yes, it is used in two places: the last return and, early on, the "connecting" case of the switch.

nirinchev · 2025-07-09T12:32:44Z

src/tools/atlas/metadata/connectCluster.ts

+        const connectionString = await this.prepareClusterConnection(projectId, clusterName);
+
+        try {
+            await this.connectToCluster(connectionString, 60);


Let's add some comments that could guide readers - e.g.:

Suggested change

-            await this.connectToCluster(connectionString, 60);
+            // First, try to connect to the cluster within the current tool call.
+            // We give it 60 attempts with 500 ms delay between each, so ~30 seconds
+            await this.connectToCluster(connectionString, 60);

nirinchev · 2025-07-09T12:35:09Z

src/tools/atlas/metadata/connectCluster.ts

+                `error connecting to cluster: ${error.message}`
+            );
+
+            process.nextTick(async () => {


Suggested change

-            process.nextTick(async () => {
+            // We couldn't connect in ~30 seconds, likely because user creation is taking longer
+            // Retry the connection with longer timeout (~5 minutes), while also returning a response
+            // to the client. Many clients will have a 1 minute timeout for tool calls, so we want to
+            // return well before that.
+            //
+            // Once we add support for streamable http, we'd want to use progress notifications here.
+            process.nextTick(async () => {
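
The two-stage approach described in these comments (about 30 seconds inside the tool call, then about 5 minutes in the background) could be sketched roughly as follows. `tryConnect`, `connectWithRetries`, and `connectOrDefer` are hypothetical names, and the sketch starts the background stage with a non-awaited promise rather than `process.nextTick`; the attempt counts and the 500 ms delay are the ones mentioned in the thread.

```typescript
// Hypothetical sketch: `tryConnect` stands in for one connection attempt.
async function connectWithRetries(
    tryConnect: () => Promise<void>,
    attempts: number,
    delayMs: number
): Promise<void> {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
        try {
            await tryConnect();
            return;
        } catch (error) {
            lastError = error;
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
    throw lastError;
}

async function connectOrDefer(tryConnect: () => Promise<void>, delayMs = 500): Promise<string> {
    try {
        // Stage 1: ~30 seconds inside the tool call (60 attempts * 500 ms).
        await connectWithRetries(tryConnect, 60, delayMs);
        return "connected";
    } catch {
        // Stage 2: ~5 minutes in the background (600 attempts * 500 ms).
        // The promise is deliberately not awaited so the caller gets a reply.
        void connectWithRetries(tryConnect, 600, delayMs).catch(() => {
            // A real implementation would log the final failure here.
        });
        return "connecting";
    }
}
```

Returning "connecting" well inside a one-minute client timeout is the point of the split; the background stage only changes what later calls observe.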

nirinchev · 2025-07-09T12:36:50Z

tests/integration/tools/atlas/clusters.test.ts

-                expect(response.content).toBeArray();
-                expect(response.content).toHaveLength(1);
-                expect(response.content[0]?.text).toContain(`Connected to cluster "${clusterName}"`);
+                for (let i = 0; i < 600; i++) {


Now that we added the 30 second retry logic in the connect tool, this may be a bit excessive - worst case scenario, this will result in 5 hours of waiting for the test to fail.

I'd expect jest has some default test timeouts. Maybe we can set an explicit timeout here and turn this into a while loop, or create a waitFor / retry helper.

This is actually not the case: on subsequent calls we don't retry for 30 seconds. We know there is a background process running, so we return the "Attempting ..." message straight away.

Changed to reflect what we discussed offline: now we always wait 30 seconds, and the test is adjusted to retry only 10 times.
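
For reference, a `waitFor` helper of the kind suggested in this thread might look like the sketch below. It is hypothetical (not part of the PR or of jest): it bounds the wait by wall-clock time instead of an attempt count, and it throws on timeout so a test cannot pass silently when the condition is never met.

```typescript
// Hypothetical helper; `condition` stands in for calling the tool and
// checking whether its response reports a successful connection.
async function waitFor(
    condition: () => Promise<boolean>,
    timeoutMs: number,
    intervalMs: number
): Promise<void> {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (await condition()) {
            return;
        }
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    // Fail loudly instead of letting the caller continue unverified.
    throw new Error(`condition not met within ${timeoutMs} ms`);
}
```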

gagik · 2025-07-09T15:16:06Z

src/tools/atlas/metadata/connectCluster.ts

+                `error connecting to cluster: ${error.message}`
+            );
+
+            process.nextTick(async () => {


what is the benefit of using process.nextTick here?
There's no problematic operation that'd take precedence I can think of that would justify its usage here.

This is my bad. When I used to work with Node (10 years ago) we didn't have/use promises, so process.nextTick was used for async operations. Nowadays void somePromise is much handier, and I keep forgetting.
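
A small sketch of the `void somePromise` pattern mentioned above, with hypothetical function names. An async function runs synchronously up to its first `await`, so the background work starts right away; `void` only marks the promise as intentionally not awaited, and the `.catch` keeps a failure from becoming an unhandled rejection.

```typescript
const order: string[] = [];

async function backgroundTask(): Promise<void> {
    order.push("task started"); // runs synchronously, up to the first await
    await Promise.resolve();
    order.push("task finished");
}

function startInBackground(): void {
    // Fire and forget: the promise is not awaited, but errors are handled.
    void backgroundTask().catch((error: unknown) => {
        console.error("background task failed", error);
    });
    order.push("caller continues");
}
```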

nirinchev · 2025-07-10T11:04:59Z

src/tools/atlas/metadata/connectCluster.ts

+                case "unknown":
+                default:
+                    await this.session.disconnect();
+                    const connectionString = await this.prepareClusterConnection(projectId, clusterName);


We should fix this, though it seems like we can move it entirely into connectToCluster, can't we?

Actually, we can't - we need to await this before we continue because it's setting the connected cluster on the session. Guess we just need to wrap it in curly braces.

fixed the tests and syntax
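
The curly-brace fix discussed here can be illustrated with a minimal sketch. The states and return values are made up for the example. Wrapping the clause in braces gives the `const` its own block scope, which is what lint rules such as ESLint's `no-case-declarations` ask for.

```typescript
// Hypothetical states and return values, for illustration only.
type State = "connected" | "connecting" | "unknown";

function nextAction(state: State): string {
    switch (state) {
        case "connected":
            return "reuse";
        case "connecting":
            return "wait";
        case "unknown":
        default: {
            // The braces scope `connectionString` to this clause, so the
            // declaration cannot leak into or clash with other cases.
            const connectionString = "mongodb+srv://example.invalid";
            return `connect:${connectionString}`;
        }
    }
}
```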

fmenezes added 3 commits July 7, 2025 18:24

fix: turn atlas-connect-cluster async

e654962

fix: update decription

24298ed

fix: format

f45b22f

fmenezes added 6 commits July 7, 2025 18:33

fix: add logs

d2f91d5

fix: tests

8eeb786

fix

6c84179

fix: update result

dad0111

fix: styles

d2c54ae

Merge branch 'main' into mcp-46

e978c82

fmenezes marked this pull request as ready for review July 8, 2025 06:54

Copilot AI review requested due to automatic review settings July 8, 2025 06:54

fmenezes requested a review from a team as a code owner July 8, 2025 06:54

Copilot AI reviewed Jul 8, 2025

fmenezes added 2 commits July 8, 2025 08:16

fix: improve model interpretation

54dfd7b

fix: tests

42b3e47

nirinchev reviewed Jul 8, 2025

fix: comments

e267b45

fmenezes requested a review from nirinchev July 8, 2025 13:44

fix: mix sync and async

28372be

nirinchev approved these changes Jul 9, 2025

gagik reviewed Jul 9, 2025

fmenezes added 3 commits July 10, 2025 10:46

fix: address comments

2134f16

fix: comment

57981cb

fix: check

693d31c

gagik approved these changes Jul 10, 2025

fmenezes added 2 commits July 10, 2025 11:22

fix: move connect to always wait 30secs

4274f1b

fix: user

d1b2324

fmenezes added 3 commits July 10, 2025 11:38

fix: add IP warning

0acc685

fix: add IP warning

a0ce60c

fix: styles

11d3a14

nirinchev approved these changes Jul 10, 2025

fmenezes added 5 commits July 10, 2025 12:08

fix: styles

0286a89

fix: tests

1127f4c

fix: tests

0a1e57c

fix: tests

7430d49

fix: metadata

bf41f0d

fmenezes enabled auto-merge (squash) July 10, 2025 12:07

fmenezes merged commit 27c52b4 into main Jul 10, 2025
18 checks passed

fmenezes deleted the mcp-46 branch July 10, 2025 12:10

nirinchev added a commit that referenced this pull request Jul 11, 2025

Merge branch 'main' into ni/connect-guidance

eed30f8

* main: chore: revoke access tokens on server shutdown [MCP-53] (#352) fix: turn atlas-connect-cluster async (#343)
