-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add /health endpoint [RM-114] #9062
Conversation
✅ Deploy Preview for determined-ui ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9062 +/- ##
========================================
Coverage 47.05% 47.06%
========================================
Files 1154 1154
Lines 142272 142372 +100
Branches 2423 2423
========================================
+ Hits 66953 67005 +52
- Misses 75129 75177 +48
Partials 190 190
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - nice work!
All the comments questions are for my own understanding / curiosity.
hc.Database = model.Healthy | ||
_, err := db.Bun().NewSelect().Table("cluster_id").Exists(ctx) | ||
if err != nil { | ||
hc.Database = model.Unhealthy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curiosity question: why not return as soon as one element is not healthy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid returning partial responses
func TestHealthCheck(t *testing.T) { | ||
api, _, _ := setupAPITest(t, nil) | ||
|
||
assertHealthCheck := func(t *testing.T, expectedCode int, expectedHealth model.HealthCheck) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - love how this is organized.
func (p *pods) HealthStatus() model.HealthStatus { | ||
p.mu.Lock() | ||
defer p.mu.Unlock() | ||
for _, podInterface := range p.podInterfaces { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my understanding: would range p.podInterfaces
iterate over all pods in the cluster? I see this loop exits early based on the first response; I'm curious about the general intent of podInterfaces
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.podInterfaces
contains a mapping from namespaces to kubernetes clients.
We used to use namespace all to avoid needing a client per namespace but this caused permission issues for some customers
https://github.com/determined-ai/determined/blob/85bb3c8881bbd23c34e07359db096fd337e90a51/master/.golangci.yml#L80C10-L80C16
# | ||
# This is a syntax error. This is a small hack to get around this just | ||
# by removing the "model." prefix. | ||
merge_dict(spec, json.loads(f.read().replace("model.", ""))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here? I'm not familiar with swagger.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a swagger spec that we create and host for our API documentation. Also we generate bindings.py
and api.ts
from this swagger spec.
```swagger.py```` is used to combine swagger specs since we generate two and apply some processing. The first comes from our grpc API and the second comes from our echo API generated by swag https://github.com/swaggo/swag
The process is here.
Lines 54 to 68 in 85bb3c8
$(echo_swagger_patch_dir): | |
mkdir -p $(echo_swagger_patch_dir) | |
$(echo_swagger_patch): $(echo_swagger_patch_dir) $(echo_swagger_source_files) | |
swag init -g ../master/cmd/determined-master/main.go -d ../master/. -o $(echo_swagger_patch_dir) -ot json | |
jq 'del(.swagger, .info)' $(echo_swagger_patch) > $(echo_swagger_patch).tmp | |
mv $(echo_swagger_patch).tmp $(echo_swagger_patch) | |
build/swagger: | |
mkdir -p build/swagger | |
$(swagger_out): $(source_files) build/swagger $(swagger_patch) $(echo_swagger_patch) | |
protoc -I src $(swagger_in) --swagger_out=logtostderr=true,allow_delete_body=true,json_names_for_fields=true:build/swagger | |
python3 scripts/swagger.py $@ $(swagger_patch) | |
python3 scripts/swagger.py $@ $(echo_swagger_patch) |
a3f8ac3
to
ee35a29
Compare
4a10a91
to
789b984
Compare
Description
Adds a
/heath
endpoint.Test Plan
Intg / unit tests should cover it
det dev curl /health
for manualCommentary (optional)
Checklist
docs/release-notes/
.See Release Note for details.
Ticket