backend: caching: implemented caching for single-cluster #3499
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: upsaurav12. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Nice one. PS: see commit guidelines, and run backend-lint.
Nice start. Thanks.
Would you mind separating the code out from headlamp.go? Code can be put into a separate module/package, and then headlamp.go can import it and use it. We are in the process of moving as much code out of headlamp.go as possible into package modules. It's all a bit too much code tangled up in there.
Could you please add tests? As well, please add documentation to your functions.
I left a comment about commit message guidelines and running backend-lint.
@illume I have extracted the cache implementation from headlamp.go. I am not sure whether this implementation will work for multiple users; I am finding a way to test it with service accounts, but in development mode it fetches directly from the kubeconfig file and loads clusters without asking for a bearer token.
How about making its own package, pkg/k8cache? I think maybe it should not go inside pkg/cache, since its topic is slightly different from what pkg/cache already does.
I was thinking the same, but eventually had to implement it in the same package because of cache initialization and ease of importing functions. I will modify it.
I have moved the code to pkg/k8cache.
Hi, I have just tested this cache implementation (recording: Screen.Recording.-.Jun.28.2025-VEED.mp4). However, there are still some points that need to be fixed/addressed.
I left a few notes.
I am sorry, I couldn't see these replies. I don't know why, but they weren't visible to me.
No worries. I think I wrote them, then came back a few days later, noticed they were still pending, and submitted the review. I'm not sure why they kept the old date on them; I think it's a GitHub bug. That's why they weren't visible to you - I hadn't submitted the review yet.
backend/cmd/server.go (Outdated)
return
}

isAllowed, err := k8cache.IsAllowed(r.URL, kContext, w, r)
If isAllowed is false then a proper response has to be returned.
I have added a new function ReturnAuthErrorResponse, which returns the AuthErrResponse JSON object if the user is not authorized to access the resources.
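For illustration only, here is a minimal sketch of what such a helper could look like, modelled loosely on the Status-style struct visible later in this diff; the signature, field set, and package layout are assumptions rather than the PR's actual code:

```go
package k8cache

import (
	"encoding/json"
	"net/http"
)

// AuthErrResponse mimics a Kubernetes Status object for an unauthorized
// request. (Hypothetical shape; the PR's actual struct may differ.)
type AuthErrResponse struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
	Status     string `json:"status"`
	Message    string `json:"message"`
	Code       int    `json:"code"`
}

// ReturnAuthErrorResponse writes a 403 Forbidden JSON body instead of serving
// anything from the cache.
func ReturnAuthErrorResponse(w http.ResponseWriter, message string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusForbidden)

	_ = json.NewEncoder(w).Encode(AuthErrResponse{
		Kind:       "Status",
		APIVersion: "v1",
		Status:     "Failure",
		Message:    message,
		Code:       http.StatusForbidden,
	})
}
```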
Cool, thanks. Is there a test case that covers this?
I think it's important that we test the authentication parts. I don't see a test for ReturnAuthErrorResponse, which is why I'm wondering if there are other test cases that check this part is correct?
I haven't added test cases for the new functions I added after moving the cache logic to middleware; that's why you aren't seeing any tests for them.
If it is necessary, I will write those tests once I have completed the cache implementation (i.e. make the cache optional, plus many areas to improve).
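For reference, a test for the auth error path could look roughly like the sketch below, assuming a helper with the shape sketched earlier in this thread; the names are illustrative only:

```go
package k8cache

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
)

// TestReturnAuthErrorResponse checks that an unauthorized request gets a
// 403 JSON body instead of a cached payload.
func TestReturnAuthErrorResponse(t *testing.T) {
	rec := httptest.NewRecorder()

	ReturnAuthErrorResponse(rec, "pods is forbidden")

	if rec.Code != http.StatusForbidden {
		t.Fatalf("expected status %d, got %d", http.StatusForbidden, rec.Code)
	}

	var resp AuthErrResponse
	if err := json.NewDecoder(rec.Body).Decode(&resp); err != nil {
		t.Fatalf("response body is not valid JSON: %v", err)
	}

	if resp.Status != "Failure" {
		t.Fatalf("expected Status %q, got %q", "Failure", resp.Status)
	}
}
```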
Sounds like a good plan.
backend/pkg/k8cache/k8cache.go (Outdated)
// If the key was not found inside the cache then this will make actual call to k8's
// and this will capture the response body and convert the captured response to string.
// After converting it will store the response with the key and TTL of 10*min.
func RequestToK8sAndStore(k8scache cache.Cache[string], k *kubeconfig.Context,
This function does make a request to the k8s API server, so let's improve/update the name for better understanding.
also please remove unused args from this function.
I have removed the kubeconfig.Context arg and changed the name from RequestToK8sAndStore to RequestK8ClusterAPIAndStore.
@yolossn what do you think of the new name?
(To me the old name RequestToK8sAndStore was fine, so I'm not sure about this one)
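Whatever the final name, the behaviour described in the doc comment (on a cache miss, call the Kubernetes API, capture the body, and store it under the key with a 10-minute TTL) can be sketched roughly as below; the Cache interface here is a hypothetical stand-in for the project's cache.Cache[string], not its real API:

```go
package k8cache

import (
	"context"
	"io"
	"net/http"
	"time"
)

// Cache is a hypothetical stand-in for the project's cache.Cache[string].
type Cache interface {
	SetWithTTL(ctx context.Context, key, value string, ttl time.Duration) error
}

// requestAndStore forwards the request to the Kubernetes API server, captures
// the response body, and stores it under key with a 10-minute TTL.
func requestAndStore(ctx context.Context, c Cache, client *http.Client, req *http.Request, key string) (string, error) {
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}

	// Identical GET requests can now be served from the cache until the entry expires.
	if err := c.SetWithTTL(ctx, key, string(body), 10*time.Minute); err != nil {
		return "", err
	}

	return string(body), nil
}
```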
// getClientMD is used to get a clientset for the given context and token.
// It will reuse clientsets if a matching one is already cached.
func getClientMD(k *kubeconfig.Context, token string) (*kubernetes.Clientset, error) {
contextKey := strings.Split(k.ClusterID, "+") |
what is the context behind this string split?
Currently, I don't have a direct way to get the contextID. I can get the contextID by using k.ClusterID, but that contains too many things (/home/saurav/.kubeconfig+<contextID>), so to get the contextKey I have to split k.ClusterID.
If you can't think of a cleaner way to do this, can you please leave a comment there explaining what it is about?
@upsaurav12 you can just pass the cluster name from the request here or use k.ContextName.
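A small sketch of the two options discussed here, intended to sit alongside the existing code in k8cache.go (it assumes strings and the project's kubeconfig package are already imported); the helper name is made up:

```go
// contextKeyFor derives a per-context key. Splitting ClusterID is what the PR
// does today; using ContextName (the reviewer's suggestion) avoids the split.
func contextKeyFor(k *kubeconfig.Context) string {
	if k.ContextName != "" {
		return k.ContextName
	}

	// ClusterID looks like "/home/user/.kubeconfig+<contextID>"; fall back to
	// the part after the last "+".
	parts := strings.Split(k.ClusterID, "+")

	return parts[len(parts)-1]
}
```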
will update the code. thanks
backend/cmd/server.go (Outdated)
func CacheMiddleWare(c *HeadlampConfig) mux.MiddlewareFunc {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method == "POST" || r.Method == "UPDATE" || r.Method == "DELETE" || r.Method == "PUT" { |
I think this should be if r.Method != "GET", so it's an allow list of things we should support. It's usually better to do allow lists in case things change: if things change when it's just GET, it should still work; the opposite way, if we've missed something or something else is added, this condition will be broken.
What about OPTIONS and HEAD? How should they be handled here?
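A minimal sketch of the allow-list shape being suggested; serveFromCache is a hypothetical hook standing in for the PR's cache lookup, not an actual function in the codebase:

```go
package k8cache

import "net/http"

// CacheMiddleware only consults the cache for GET requests; every other method
// (POST, PUT, PATCH, DELETE, OPTIONS, HEAD, ...) is forwarded untouched.
// serveFromCache reports whether it wrote a cached response.
func CacheMiddleware(next http.Handler, serveFromCache func(http.ResponseWriter, *http.Request) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Allow list: only GET is cacheable.
		if r.Method != http.MethodGet {
			next.ServeHTTP(w, r)
			return
		}

		// Try the cache first; fall through to the real proxy on a miss.
		if serveFromCache(w, r) {
			return
		}

		next.ServeHTTP(w, r)
	})
}
```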
I have updated the code and moved the logic to LoadFromCache, because request methods like these should not go to the cache and should instead be forwarded as actual requests.
Also, there was a discussion about cache invalidation, where we need to remove stale data from the cache, but I am not sure what the workflow should be.
For example, if there is a POST request on a secrets resource and that same resource's data is present in the cache (using the key), we can invalidate the older data in the cache and store the new data.
For this, after making the request to k8's (POST/DELETE), we need to check whether the key is in the cache: if present, remove it, or else proceed to store the new data.
@yolossn @illume WDYT?
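A rough sketch of that invalidation workflow, with a hypothetical Delete-capable cache interface standing in for the real one; the key and status handling are assumptions:

```go
package k8cache

import (
	"context"
	"net/http"
)

// InvalidatingCache is a hypothetical stand-in for a cache that can drop entries.
type InvalidatingCache interface {
	Delete(ctx context.Context, key string) error
}

// invalidateOnWrite removes a stale entry after a mutating request (POST, PUT,
// PATCH, DELETE) succeeds upstream, so the next GET re-populates the cache.
func invalidateOnWrite(ctx context.Context, c InvalidatingCache, r *http.Request, key string, statusCode int) error {
	isWrite := r.Method == http.MethodPost || r.Method == http.MethodPut ||
		r.Method == http.MethodPatch || r.Method == http.MethodDelete

	// Only invalidate when the write actually succeeded upstream.
	if !isWrite || statusCode >= 300 {
		return nil
	}

	return c.Delete(ctx, key)
}
```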
Sounds good. Please mark this resolved when you're done?
will push code once done completely
Pull Request Overview
This PR introduces single-cluster caching for Kubernetes API proxy requests, adding key generation, response capture, and middleware integration.
- Add k8cache package with key hashing, response capture, cache get/set, and filtering logic
- Wire up CacheEnabled flag in configuration and server entrypoint to enable caching middleware
- Provide comprehensive unit tests for key functions and cache behavior
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Summary per file:
File | Description |
---|---|
backend/pkg/k8cache/key.go | Define CacheKey struct and SHA hashing utility |
backend/pkg/k8cache/k8cache.go | Implement caching logic: response capture, key generation, load/store |
backend/pkg/k8cache/k8cache_test.go | Add unit tests for hashing, header filtering, marshal/unmarshal |
backend/pkg/headlampconfig/headlampConfig.go | Add CacheEnabled flag to headlamp configuration |
backend/pkg/config/config.go | Add cache-enabled CLI/env flag to core config |
backend/cmd/server.go | Integrate caching middleware based on CacheEnabled |
backend/cmd/headlamp.go | Declare global cache instance and refactor cluster request handler |
Comments suppressed due to low confidence (2)
backend/pkg/k8cache/k8cache.go:167
- Update the comment to match the function name 'FilterHeadersForCache' instead of 'FilterHeaderCache' for clarity.
// FilterHeaderCache ensures that the cached headers accurately reflect the state of the
}

// UnmarshalCachedData deserialize a JSON string received from cache
// back into a CacheResposeData struct. This function is used to recontructing
Fix spelling in comment: 'CacheResposeData' should be 'CachedResponseData' and 'recontructing' should be 'reconstructing'.
- // back into a CacheResposeData struct. This function is used to recontructing
+ // back into a CachedResponseData struct. This function is used to reconstructing
Kind: "Status", | ||
APIVersion: "v1", | ||
MetaData: MetaData{}, | ||
Message: fmt.Sprintf("%s is forbidden: User \"system:serviceaccount:default:%s\" cannot", last, contextKey) + |
Add a space after 'cannot' or before the next '%s' to avoid concatenating words (e.g., 'cannot get').
- Message: fmt.Sprintf("%s is forbidden: User \"system:serviceaccount:default:%s\" cannot", last, contextKey) +
+ Message: fmt.Sprintf("%s is forbidden: User \"system:serviceaccount:default:%s\" cannot ", last, contextKey) +
// TestMarshallToStore tests whether the MarshallToStore
// serialized correctly that will be stored into the cache.
func TestMarshallToStore(t *testing.T) { |
The test name and comment spell 'MarshallToStore' with two L's. It should be 'MarshalToStore' to match the function name.
- // TestMarshallToStore tests whether the MarshallToStore
- // serialized correctly that will be stored into the cache.
- func TestMarshallToStore(t *testing.T) {
+ // TestMarshalToStore tests whether the MarshalToStore
+ // serialized correctly that will be stored into the cache.
+ func TestMarshalToStore(t *testing.T) {
attribute.String("cluster", mux.Vars(r)["clusterName"]),
)

defer span.End()
Calling 'defer span.End()' inside this helper ends the trace span immediately when this function returns, which may be before downstream logic executes. Consider ending the span in the caller after the handler completes.
- defer span.End()
+ // Note: The caller is responsible for ending the span by calling span.End().
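One possible shape for that split, sketched with standard OpenTelemetry calls; the function name and caller shown are illustrative, not the PR's code:

```go
package k8cache

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// startClusterSpan starts and annotates the span but deliberately does not end
// it; the caller owns the span's lifetime and defers span.End() itself.
func startClusterSpan(ctx context.Context, tracer trace.Tracer, clusterName string) (context.Context, trace.Span) {
	ctx, span := tracer.Start(ctx, "k8cache.proxy")
	span.SetAttributes(attribute.String("cluster", clusterName))

	return ctx, span
}

// Caller side (illustrative):
//
//	ctx, span := startClusterSpan(r.Context(), tracer, clusterName)
//	defer span.End()
//	next.ServeHTTP(w, r.WithContext(ctx))
```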
@yolossn @upsaurav12 I went through and marked a few convos as resolved that were finished. What's the plan to work on next? (What's left to do?) Is it getting near a point where the code should be cleaned up and merged? Maybe now is a good time to start adding tests and breaking it up into clean commits?
Implementation
I have implemented caching for the single-cluster. For multi-cluster I am trying to check by creating a user by changing use-context. Here is how I have implemented it:
- Capturing the response from ProxyRequest and storing it in cache as a string.
- Converting the gzip response to bytes and then to a string.
- Checking authorization via selfsubjectrulesreviews: if it returns Failure then the user is not authenticated; if it doesn't, proceed to acquire from cache.
- The key is generated from kind and namespace and clusters, and it will be the same if the user is making the same requests.
- If the key is not found in cache then it will go for the actual request to k8's and then store the response in cache.

However, there are more things I have to add, like the cache should be synced with the latest k8's value, but I am still finding a solution for whether I should continuously make requests to k8's after 5s or a fixed time, or use watch events from k8's.
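To make the key idea concrete, here is a minimal sketch assuming a CacheKey built from cluster, namespace, and kind that is hashed with SHA-256 so identical requests map to the same entry; the field names and method are illustrative and may differ from the PR's key.go:

```go
package k8cache

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CacheKey identifies a cached response by cluster, namespace, and kind.
type CacheKey struct {
	Cluster   string
	Namespace string
	Kind      string
}

// Hash returns a stable SHA-256 digest of the key, so the same request against
// the same cluster always resolves to the same cache entry.
func (k CacheKey) Hash() string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s/%s/%s", k.Cluster, k.Namespace, k.Kind)))

	return hex.EncodeToString(sum[:])
}
```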
Test Results