Conversation

@jakubno
Member

@jakubno jakubno commented Nov 19, 2025

Note

Expose machine CPU metadata (arch/family/model/name) from orchestrators via gRPC ServiceInfo and surface it in API Node/NodeDetail; update proto/OpenAPI, generated code, and node manager/admin to collect and return it.

  • API/Spec:
    • Add MachineInfo schema (cpuArchitecture/cpuFamily/cpuModel/cpuModelName) and include it in Node and NodeDetail.
    • Regenerate OpenAPI client/server code (spec.gen.go, types.gen.go, tests models).
  • gRPC/Proto:
    • Introduce MachineInfo message and add machine_info to ServiceInfoResponse.
    • Regenerate protobufs.
  • Orchestrator Service:
    • Detect machine CPU info (machineinfo.Detect) at startup and store in ServiceInfo.
    • Return machine info in InfoService.ServiceInfo (with converter).
  • Node Manager/Admin:
    • Store machine info in nodemanager.Node; update on sync and creation.
    • Include machineInfo in admin AdminNodes and AdminNodeDetail responses.
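A minimal sketch of the converter step mentioned above, assuming the gRPC message carries four string fields that map one-to-one onto the API schema. The type names `grpcMachineInfo` and `apiMachineInfo` and the function `toAPIMachineInfo` are hypothetical stand-ins for illustration, not the generated code from this PR; only the JSON field names come from the summary.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// grpcMachineInfo is a hypothetical stand-in for the generated protobuf
// message; the real field names may differ.
type grpcMachineInfo struct {
	CpuArchitecture string
	CpuFamily       string
	CpuModel        string
	CpuModelName    string
}

// apiMachineInfo mirrors the MachineInfo schema fields listed in the summary
// (cpuArchitecture/cpuFamily/cpuModel/cpuModelName).
type apiMachineInfo struct {
	CPUArchitecture string `json:"cpuArchitecture"`
	CPUFamily       string `json:"cpuFamily"`
	CPUModel        string `json:"cpuModel"`
	CPUModelName    string `json:"cpuModelName"`
}

// toAPIMachineInfo copies the fields across. A nil message maps to the zero
// value, so callers never see a nil pointer.
func toAPIMachineInfo(in *grpcMachineInfo) apiMachineInfo {
	if in == nil {
		return apiMachineInfo{}
	}
	return apiMachineInfo{
		CPUArchitecture: in.CpuArchitecture,
		CPUFamily:       in.CpuFamily,
		CPUModel:        in.CpuModel,
		CPUModelName:    in.CpuModelName,
	}
}

func main() {
	// Illustrative values only.
	info := toAPIMachineInfo(&grpcMachineInfo{
		CpuArchitecture: "amd64",
		CpuFamily:       "6",
		CpuModel:        "85",
		CpuModelName:    "Example CPU",
	})
	out, err := json.Marshal(info)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```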

Written by Cursor Bugbot for commit a2ce0f4. This will update automatically on new commits.

@jakubno jakubno added the improvement Improvement for current functionality label Nov 19, 2025
@linear
linear bot commented Nov 19, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@sitole sitole self-requested a review November 19, 2025 17:15
@sitole sitole self-assigned this Nov 19, 2025
@jakubno jakubno force-pushed the add-cpu-family-to-the-service-info-eng-3322 branch from ae7c215 to 1e607bb Compare November 19, 2025 20:27
@jakubno jakubno requested a review from sitole November 20, 2025 09:30
@dobrac dobrac assigned sitole and unassigned sitole and dobrac Nov 20, 2025
	return n.status
}

func (n *ClusterInstance) GetMachineInfo() *infogrpc.MachineInfo {
Member

Is there a reason for the mutex lock here? I think we can remove it from the status getter too.

roles: make([]infogrpc.ServiceInfoRole, 0),
status: infogrpc.ServiceInfoStatus_Unhealthy,
roles: make([]infogrpc.ServiceInfoRole, 0),
machineInfo: nil,
Member

Isn't an empty struct an option here?

Member Author

What would be the reason to pass an empty struct here?

Member

You are essentially doing the same with gRPC, where missing values are treated as empty strings. Removing the pointer here eliminates the need to handle a nil state in all places where machine info is used.
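To illustrate the point about dropping the pointer, here is a small sketch comparing the two field shapes. All names (`MachineInfo`, `nodeWithPointer`, `nodeWithValue`, `family`) are hypothetical and chosen only for this example; they are not the types from the PR.

```go
package main

import "fmt"

// MachineInfo is a simplified, hypothetical version of the struct under discussion.
type MachineInfo struct {
	CPUFamily string
	CPUModel  string
}

// nodeWithPointer stores the info behind a pointer, so every reader must
// guard against nil.
type nodeWithPointer struct {
	machineInfo *MachineInfo
}

func (n nodeWithPointer) family() string {
	if n.machineInfo == nil {
		return ""
	}
	return n.machineInfo.CPUFamily
}

// nodeWithValue stores the struct by value. "Not reported" is simply the
// zero value with empty strings, which matches how missing gRPC string
// fields are read.
type nodeWithValue struct {
	machineInfo MachineInfo
}

func (n nodeWithValue) family() string {
	return n.machineInfo.CPUFamily
}

func main() {
	// Both report an empty family when nothing is known, but only the
	// pointer variant needs the nil check to get there.
	fmt.Printf("%q %q\n", nodeWithPointer{}.family(), nodeWithValue{}.family())
}
```

The trade-off is that the value form cannot distinguish "unknown" from "reported as empty"; whether that distinction matters here is a design decision for the authors.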

@sitole
Member

sitole commented Nov 20, 2025

Just a few nits

@cursor cursor bot left a comment

Bug: Data race in GetStatus due to mutex removal

Removing the read lock from GetStatus creates a race condition. The status field is updated concurrently in syncInstance (called via PoolUpdate) using a write lock, so reading it without synchronization is unsafe.

packages/api/internal/edge/cluster_instances.go#L63-L66

func (n *ClusterInstance) GetStatus() infogrpc.ServiceInfoStatus {
	return n.status
}
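For reference, a self-contained sketch of a getter that keeps the read lock, assuming the struct guards `status` with a `sync.RWMutex`. The names `clusterInstance`, `setStatus`, and the `status` type are simplified stand-ins invented for this example, not the code in `cluster_instances.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// status is a hypothetical stand-in for infogrpc.ServiceInfoStatus.
type status int

const (
	statusUnhealthy status = iota
	statusHealthy
)

// clusterInstance is a simplified stand-in for ClusterInstance.
type clusterInstance struct {
	mutex  sync.RWMutex
	status status
}

// GetStatus takes the read lock so it cannot observe a concurrent write.
func (n *clusterInstance) GetStatus() status {
	n.mutex.RLock()
	defer n.mutex.RUnlock()
	return n.status
}

// setStatus plays the role of the sync path that updates the field under
// the write lock.
func (n *clusterInstance) setStatus(s status) {
	n.mutex.Lock()
	defer n.mutex.Unlock()
	n.status = s
}

func main() {
	inst := &clusterInstance{status: statusUnhealthy}

	// Concurrent readers and writers; running this with `go run -race`
	// should report no data race because both sides hold the mutex.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			inst.setStatus(statusHealthy)
		}()
		go func() {
			defer wg.Done()
			_ = inst.GetStatus()
		}()
	}
	wg.Wait()

	fmt.Println(inst.GetStatus() == statusHealthy)
}
```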


@sitole sitole self-requested a review November 20, 2025 15:02
cursor[bot]

This comment was marked as resolved.

# Conflicts:
#	packages/api/internal/orchestrator/nodemanager/sync.go
#	packages/orchestrator/internal/service/info.go
#	packages/orchestrator/internal/service/service_info.go
#	packages/orchestrator/main.go
@jakubno jakubno force-pushed the add-cpu-family-to-the-service-info-eng-3322 branch from 8a0f527 to e67097c Compare November 24, 2025 12:12
@jakubno jakubno changed the title from "Add CPU family to the service info" to "Add CPU info to the service info" Nov 24, 2025
if len(info) > 0 {
	if info[0].Family == "" || info[0].Model == "" {
		return MachineInfo{}, fmt.Errorf("unable to detect CPU platform from CPU info: %+v", info[0])
	}

Bug: Service fails to start on ARM processors

The validation checks if info[0].Family or info[0].Model are empty strings and returns an error. On ARM processors like AWS Graviton, the Model field from cpu.Info() is often empty or unavailable because ARM CPUs don't populate this field in /proc/cpuinfo. This causes the orchestrator service to fail startup on ARM instances with "unable to detect CPU platform from CPU info", even though ModelName contains the necessary information. The PR discussion confirms this concern about Graviton processors.
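One possible shape for a more lenient check, sketched here only as an illustration and not as the fix adopted in this PR: fail only when none of the identifying fields is populated. The `cpuInfo` type is a hypothetical local stand-in exposing the same three string fields the snippet reads, and `detect` and `errNoCPUInfo` are invented names.

```go
package main

import (
	"errors"
	"fmt"
)

// cpuInfo is a hypothetical stand-in for the CPU info entries the snippet
// inspects; only the fields used here are modelled.
type cpuInfo struct {
	Family    string
	Model     string
	ModelName string
}

// MachineInfo is a simplified version of the detected result.
type MachineInfo struct {
	Family    string
	Model     string
	ModelName string
}

var errNoCPUInfo = errors.New("no CPU info available")

// detect accepts partially populated CPU info. It returns an error only
// when there are no entries or the first entry carries no identifying
// field at all, so a CPU that reports just a model name still passes.
func detect(info []cpuInfo) (MachineInfo, error) {
	if len(info) == 0 {
		return MachineInfo{}, errNoCPUInfo
	}
	first := info[0]
	if first.Family == "" && first.Model == "" && first.ModelName == "" {
		return MachineInfo{}, fmt.Errorf("unable to detect CPU platform from CPU info: %+v", first)
	}
	return MachineInfo{
		Family:    first.Family,
		Model:     first.Model,
		ModelName: first.ModelName,
	}, nil
}

func main() {
	// Illustrative input: an entry with only a model name set.
	mi, err := detect([]cpuInfo{{ModelName: "Neoverse-N1"}})
	fmt.Println(mi.ModelName, err)
}
```

Consumers that rely on `Family` and `Model` being non-empty would then need to handle empty strings themselves, which may be why the stricter check was kept for now.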

Member Author

We'll fix it once it becomes relevant.

@jakubno jakubno merged commit a3c2803 into main Nov 25, 2025
28 checks passed
@jakubno jakubno deleted the add-cpu-family-to-the-service-info-eng-3322 branch November 25, 2025 14:15