Skip to content

feat(nscale): implement topology provider#239

Merged
dmitsh merged 2 commits into
mainfrom
ds-nscale
May 14, 2026
Merged

feat(nscale): implement topology provider#239
dmitsh merged 2 commits into
mainfrom
ds-nscale

Conversation

@dmitsh
Copy link
Copy Markdown
Collaborator

@dmitsh dmitsh commented Mar 23, 2026

No description provided.

@dmitsh dmitsh requested a review from ravisoundar March 23, 2026 22:36
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 74.48980% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.28%. Comparing base (1875ab8) to head (c895eb6).
⚠️ Report is 47 commits behind head on main.

Files with missing lines Patch % Lines
pkg/providers/nscale/provider.go 65.68% 29 Missing and 6 partials ⚠️
pkg/providers/nscale/provider_sim.go 77.55% 9 Missing and 2 partials ⚠️
pkg/providers/nscale/instance_topology.go 91.11% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #239      +/-   ##
==========================================
+ Coverage   68.46%   70.28%   +1.82%     
==========================================
  Files          82       82              
  Lines        4842     4866      +24     
==========================================
+ Hits         3315     3420     +105     
+ Misses       1395     1290     -105     
- Partials      132      156      +24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 23, 2026

Greptile Summary

This PR adds the nscale topology provider, which reads switch-path and block-ID data from the Nscale Radar API and instance metadata from the Nscale Instance API, converting them into Topograph's canonical three-tier graph.

  • Core implementation (provider.go, instance_topology.go): baseProvider manages pagination against the Radar API using GET /v1/topology, mapping network_node_path[0..2] to Core/Spine/Leaf tiers. Provider additionally implements the Slurm instanceMapper interface via Instances2NodeMap and GetInstancesRegions, using the Instance API for hostname discovery.
  • Simulation (provider_sim.go): A dedicated simProvider type with a simClient that replays a model file; GetComputeInstances is correctly scoped to simProvider only, avoiding the production panic that occurs when it lives on the shared Provider type.
  • Registration and docs: nscale.NamedLoader and nscale.NamedLoaderSim are registered in the registry, and new documentation covers credentials, params, config examples, and verification steps.

Confidence Score: 5/5

The new nscale provider is safe to merge; the implementation follows established patterns, sim/production separation is correctly structured, and pagination and path-mapping logic are sound.

All structural concerns raised in prior review rounds are addressed in the current code. No new correctness or security issues were found.

No files require special attention.

Important Files Changed

Filename Overview
pkg/providers/nscale/provider.go Core production provider: defines baseProvider, nscaleClient, Credentials, ProviderParams, and Slurm interface methods. Region is validated defensively at call sites rather than at load time.
pkg/providers/nscale/instance_topology.go Pagination loop and switch-based path assignment look correct; minPathSize correctly caps at 4 to fire the warning exactly once per oversized path. TopologyResult doc comment says 'single instance' but it represents a page of results.
pkg/providers/nscale/provider_sim.go Simulation provider cleanly separates simProvider from Provider; GetComputeInstances is on simProvider only, avoiding the production panic. Safe path reversal is implemented correctly.
pkg/providers/nscale/provider_sim_test.go Good coverage of tree and block topology formats, pagination, error cases, and missing region.
pkg/providers/nscale/provider_test.go Covers Loader validation for missing params/creds and tests Instances2NodeMap against a real httptest server.
pkg/registry/registry.go Registers both nscale.NamedLoader and nscale.NamedLoaderSim; trivial import/registration change.
docs/providers/nscale.md New provider documentation is consistent with the implementation.

Sequence Diagram

sequenceDiagram
    participant Client as Topograph Client
    participant BP as baseProvider
    participant NC as nscaleClient (Radar API)
    participant IA as nscaleClient (Instance API)

    Client->>BP: GenerateTopologyConfig(ctx, pageSize, instances)
    BP->>BP: generateInstanceTopology()

    loop For each ComputeInstances region
        BP->>BP: generateRegionInstanceTopology()
        loop Paginate until empty response
            BP->>NC: "GET /v1/topology?limit=&offset="
            NC-->>BP: []InstanceTopology
            BP->>BP: Map path[0..2] to Core/Spine/Leaf
            BP->>BP: topo.Append()
        end
    end

    BP->>BP: topo.ToThreeTierGraph()
    BP-->>Client: topology.Graph

    Note over Client,IA: Slurm auto-discovery path
    Client->>BP: Instances2NodeMap(ctx, nodes)
    BP->>IA: "GET /v2/instances?organizationID=&regionID="
    IA-->>BP: []instance metadata
    BP->>BP: Filter by requested node names
    BP-->>Client: map[instanceID]nodeName
Loading

Reviews (9): Last reviewed commit: "Added top level results attribute to the..." | Re-trigger Greptile

Comment thread pkg/providers/nscale/instance_topology.go Outdated
Comment thread pkg/providers/nscale/provider_sim.go Outdated
Comment thread pkg/providers/nscale/instance_topology.go Outdated
Comment thread pkg/providers/nscale/provider_sim.go Outdated
Comment thread pkg/providers/nscale/provider.go Outdated
Comment thread pkg/providers/nscale/provider.go
@dmitsh dmitsh force-pushed the ds-nscale branch 3 times, most recently from 5c2fe9b to 00126e1 Compare March 24, 2026 16:56
@dmitsh dmitsh marked this pull request as draft April 14, 2026 17:25
@dmitsh dmitsh marked this pull request as draft April 14, 2026 17:25
@resker resker mentioned this pull request Apr 17, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 6, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@dmitsh dmitsh force-pushed the ds-nscale branch 3 times, most recently from 7fcf0df to b6a6c3f Compare May 8, 2026 22:03
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@dmitsh dmitsh force-pushed the ds-nscale branch 2 times, most recently from fec7966 to b81bdf0 Compare May 13, 2026 17:32
@dmitsh dmitsh marked this pull request as ready for review May 13, 2026 17:32
ravisoundar
ravisoundar previously approved these changes May 13, 2026
dmitsh and others added 2 commits May 13, 2026 17:18
Signed-off-by: Dmitry Shmulevich <dshmulevich@nvidia.com>
Signed-off-by: Ravi Shankar <ravish@nvidia.com>
@dmitsh dmitsh merged commit 9f1f8a6 into main May 14, 2026
8 checks passed
@dmitsh dmitsh deleted the ds-nscale branch May 14, 2026 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants