-
Notifications
You must be signed in to change notification settings - Fork 217
Add CPU info to the service info #1511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ae7c215 to
1e607bb
Compare
| return n.status | ||
| } | ||
|
|
||
| func (n *ClusterInstance) GetMachineInfo() *infogrpc.MachineInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for the mutex lock here? I think we can remove it from the status getter too.
| roles: make([]infogrpc.ServiceInfoRole, 0), | ||
| status: infogrpc.ServiceInfoStatus_Unhealthy, | ||
| roles: make([]infogrpc.ServiceInfoRole, 0), | ||
| machineInfo: nil, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the empty struct option here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the reason to pass empty struct here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are essentially doing the same with gRPC, where missing values are treated as empty strings. Removing the pointer here eliminates the need to handle a nil state in all places where machine info is used.
|
Just few nits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Data race in GetStatus due to mutex removal
Removing the read lock from GetStatus creates a race condition. The status field is updated concurrently in syncInstance (called via PoolUpdate) using a write lock, so reading it without synchronization is unsafe.
packages/api/internal/edge/cluster_instances.go#L63-L66
infra/packages/api/internal/edge/cluster_instances.go
Lines 63 to 66 in 20b30b5
| func (n *ClusterInstance) GetStatus() infogrpc.ServiceInfoStatus { | |
| return n.status | |
| } |
Bug: Data race in GetStatus due to mutex removal
Removing the read lock from GetStatus introduces a race condition. The status field is updated concurrently in syncInstance (invoked via PoolUpdate) while holding a lock, so reading it without synchronization is unsafe.
packages/api/internal/edge/cluster_instances.go#L63-L66
infra/packages/api/internal/edge/cluster_instances.go
Lines 63 to 66 in 20b30b5
| func (n *ClusterInstance) GetStatus() infogrpc.ServiceInfoStatus { | |
| return n.status | |
| } |
# Conflicts: # packages/api/internal/orchestrator/nodemanager/sync.go # packages/orchestrator/internal/service/info.go # packages/orchestrator/internal/service/service_info.go # packages/orchestrator/main.go
8a0f527 to
e67097c
Compare
| if len(info) > 0 { | ||
| if info[0].Family == "" || info[0].Model == "" { | ||
| return MachineInfo{}, fmt.Errorf("unable to detect CPU platform from CPU info: %+v", info[0]) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Service fails to start on ARM processors
The validation checks if info[0].Family or info[0].Model are empty strings and returns an error. On ARM processors like AWS Graviton, the Model field from cpu.Info() is often empty or unavailable because ARM CPUs don't populate this field in /proc/cpuinfo. This causes the orchestrator service to fail startup on ARM instances with "unable to detect CPU platform from CPU info", even though ModelName contains the necessary information. The PR discussion confirms this concern about Graviton processors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll fix once relevant
Note
Expose machine CPU metadata (arch/family/model/name) from orchestrators via gRPC ServiceInfo and surface it in API Node/NodeDetail; update proto/OpenAPI, generated code, and node manager/admin to collect and return it.
MachineInfoschema (cpuArchitecture/cpuFamily/cpuModel/cpuModelName) and include it inNodeandNodeDetail.spec.gen.go,types.gen.go, tests models).MachineInfomessage and addmachine_infotoServiceInfoResponse.machineinfo.Detect) at startup and store inServiceInfo.InfoService.ServiceInfo(with converter).nodemanager.Node; update on sync and creation.machineInfoin adminAdminNodesandAdminNodeDetailresponses.Written by Cursor Bugbot for commit a2ce0f4. This will update automatically on new commits. Configure here.