GHProxy: Fix cache mode detection and log an error if cache write fails #22610

Merged: 1 commit, Jun 22, 2021

Conversation

alvaroaleman
Member

@alvaroaleman alvaroaleman commented Jun 18, 2021

Currently, the ghproxy cache mode detection is broken:

```
[2021-06-18-16:04]:[app.ci@ci][master]~/git/work/test-infra
$ curl  -H "Authorization: Bearer $TOKEN" http://localhost:8888/repos/openshift/ci-tools/git/refs/heads/master -v 2>&1|rg 'X-Cache-Mode|Etag:|X-Ratelimit-Remaining:'
< Etag: W/"5c6a041c5b02551695905605c51f434b0320df906b026c63a9683db5d50ff185"
< X-Cache-Mode: MISS
< X-Ratelimit-Remaining: 4022
[2021-06-18-16:04]:[app.ci@ci][master]~/git/work/test-infra
$ curl  -H "Authorization: Bearer $TOKEN" http://localhost:8888/repos/openshift/ci-tools/git/refs/heads/master -v 2>&1|rg 'X-Cache-Mode|Etag:|X-Ratelimit-Remaining:'
< Etag: "5c6a041c5b02551695905605c51f434b0320df906b026c63a9683db5d50ff185"
< X-Cache-Mode: CHANGED
< X-Ratelimit-Remaining: 4022
[2021-06-18-16:04]:[app.ci@ci][master]~/git/work/test-infra
$ curl  -H "Authorization: Bearer $TOKEN" http://localhost:8888/repos/openshift/ci-tools/git/refs/heads/master -v 2>&1|rg 'X-Cache-Mode|Etag:|X-Ratelimit-Remaining:'
< Etag: "5c6a041c5b02551695905605c51f434b0320df906b026c63a9683db5d50ff185"
< X-Cache-Mode: CHANGED
< X-Ratelimit-Remaining: 4022
```

The second and third requests return an X-Cache-Mode of CHANGED, even though they didn't consume any of the token's rate limit (X-Ratelimit-Remaining does not decrease), i.e. GitHub answered 304 Not Modified and the response was served from the cache. This is because the cache mode detection relies on a header that is never present:

```go
if strings.Contains(headers.Get("Status"), "304 Not Modified") {
```
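To illustrate why this check never matches, here is a minimal, standard-library-only sketch. The test server is a stand-in for GitHub answering a conditional request, and `conditionalGet` is a hypothetical helper. Go's `net/http` reports the status line through `resp.StatusCode` and `resp.Status`, not as a header, so `Header.Get("Status")` is empty unless the server literally sends a `Status:` header.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// conditionalGet sends a request with If-None-Match to a stand-in server that
// answers 304 Not Modified, and reports what the header-based check would see.
func conditionalGet() (statusCode int, statusHeader string, oldCheckMatches bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") != "" {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("Etag", `"abc"`)
		fmt.Fprint(w, "body")
	}))
	defer srv.Close()

	req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("If-None-Match", `"abc"`)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	statusHeader = resp.Header.Get("Status")
	// The same check as quoted above, applied to the raw 304 response.
	oldCheckMatches = strings.Contains(statusHeader, "304 Not Modified")
	return resp.StatusCode, statusHeader, oldCheckMatches
}

func main() {
	code, hdr, matches := conditionalGet()
	fmt.Printf("StatusCode=%d Status header=%q old check matches=%v\n", code, hdr, matches)
}
```

Run as-is, this prints a status code of 304, an empty Status header, and `false` for the header-based check.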

I presume that header used to be returned by GitHub but isn't anymore. Because the cache lib we are using is unmaintained, I have forked it and made it inject that very same header when it serves a request from cache: alvaroaleman/httpcache@0b0fe54

All of this investigation happened in the first place because we ran out of tokens. This ultimately turned out to be caused by the disk used by ghproxy running out of inodes. Unfortunately, the cache doesn't log any error in that case. I have added another commit to my fork that adds this logging: alvaroaleman/httpcache@ab9a1a3

/assign @chaodaiG @cjwagner

@k8s-ci-robot k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 18, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 18, 2021
@alvaroaleman
Member Author

I've created upstream PRs for the two issues here, just in case the author responds:

Should those get merged, I'll switch us back. IMHO we should switch to the fork for now, though: the completely borked cache mode detection is pretty bad.

@stevekuznetsov
Contributor

If we wanted TTL-based pruning for the cache data would that need to come as a new commit to the library as well?

@stevekuznetsov
Contributor

The two patches both look good.

/lgtm
/hold

@cjwagner @chaodaiG does this approach with vendoring work for you?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2021
@alvaroaleman
Member Author

Another possible approach would be to just copy the lib into test-infra; I'm not sure whether that is better or worse.

@cjwagner
Member

We're ok with the vendoring approach if it is the best option, but I think we might be working around the problem rather than solving it.
Here are GitHub's docs on conditional requests: https://docs.github.com/en/rest/guides/getting-started-with-the-rest-api#conditional-requests
The API should be returning 304 if the resource is unmodified.
To me it looks like the issue is that we are depending on the presence of a Status header rather than checking the actual HTTP response code. I think checking if the status code is 304 would be sufficient to resolve this, WDYT?

@alvaroaleman
Member Author

> The API should be returning 304 if the resource is unmodified.

It does, but the cache replaces the GitHub response with the cached response if that happens: https://github.com/gregjones/httpcache/blob/901d90724c7919163f472a9812253fb26761123d/httpcache.go#L195

The only thing it keeps from the 304 response is its headers, minus a list of excluded ones (https://github.com/gregjones/httpcache/blob/901d90724c7919163f472a9812253fb26761123d/httpcache.go#L418). I strongly presume that is why the detection works by inspecting this Status header.

I am very certain there is no way to get this working without modifying the httpcache lib; I did try that first.

@alvaroaleman
Member Author

> I am very certain there is no way to get this working without modifying the httpcache lib; I did try that first.

Hm, we could pass a custom HTTP transport into the lib that sets this header on 304 responses. But the "log an error when writing to the cache fails" part is definitely not fixable without changing the lib, and I would like to have that.

@cjwagner
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, cjwagner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 22, 2021
@alvaroaleman
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 22, 2021
@k8s-ci-robot k8s-ci-robot merged commit 46d20b5 into kubernetes:master Jun 22, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 22, 2021
5 participants