Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid creating unnecessary temporary cat file sub process #33942

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

lunny
Copy link
Member

@lunny lunny commented Mar 19, 2025

Extract from #33934

In the same goroutine, we should reuse the exist cat file sub process which exist in git.Repository to avoid creating a unnecessary temporary subprocess.

This PR reuse the exist cate file writer and reader in getCommitFromBatchReader.
It also move prepareLatestCommitInfo before creating dataRc which will hold the writer so other git operation will create a temporary cat file subprocess.

@lunny lunny added type/bug backport/v1.23 This PR should be backported to Gitea 1.23 labels Mar 19, 2025
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 19, 2025
@pull-request-size pull-request-size bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 19, 2025
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Mar 19, 2025
@lunny lunny changed the title Avoid to create uncessary temporary cat file sub process Avoid creating unnecessary temporary cat file sub process Mar 19, 2025
Copy link
Contributor

@wxiaoguang wxiaoguang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design is not right.

You should make the cat-file reader be thread-safe and be shared by different callers. But not keeping asking developers to strictly arrange the callers orders.

I do not see why the cat-file reader can't be shared, if I understand correctly, it is a standard request-response service provider. Correct me if I am wrong.

@GiteaBot GiteaBot added lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Mar 20, 2025
@lunny
Copy link
Member Author

lunny commented Mar 20, 2025

The design is not right.

You should make the cat-file reader be thread-safe and be shared by different callers. But not keeping asking developers to strictly arrange the callers orders.

I do not see why the cat-file reader can't be shared, if I understand correctly, it is a standard request-response service provider. Correct me if I am wrong.

There is no new design in this PR. The new design could be in #33934 or other new PRs. This PR just fix the unnecessary temporary subprocess according to original design so that it could be backport to v1.23.

Regarding the original design: since cat-file relies on a strict input-output order — for example, input1 corresponds to output1, input2 to output2, and so on — I don’t believe it can be easily shared between two goroutines without guaranteed order. The tight coupling between input and output makes concurrent access without coordination problematic.

Regarding the drawback of the original design: it spawns at least one Git subprocess for every Git-related http request. This frequent creation and destroy of subprocesses can be inefficient. Introducing a managed subprocess pool in the background could mitigate this by reusing existing subprocesses, thereby reducing overhead. This approach would function similarly to how Nginx manages worker processes. However, implementing such a mechanism adds significant complexity.

@GiteaBot GiteaBot added lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. and removed lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged labels Mar 20, 2025
@wxiaoguang
Copy link
Contributor

wxiaoguang commented Mar 20, 2025

OK, I won't block it, but I do not think it's worth to use this temporary fix or backport since it isn't fully tested or proven to fix any known user end bugs. The risk of backporting it is much higher than the benefit.

If let me decide, just use a complete fix in main branch and just keep 1.23 as-is. Do not use this temp fix.

@TheFox0x7
Copy link
Contributor

Isn't cat-file command tied to a repository? I don't see how a pool would help, unless there's a constant (or on demand) command running per repo with req-reply interface in front. Question then is what if single cat-file instance is overloaded (I'm assuming it's possible - but I might be wrong)?

Current solution, though inefficient when concurrent requests happen has a clearly defined lifecycle (matching context, so in turn request).


I'm kind of stuck on the idea of NATS replacing queue system in gitea but the req-reply it provides would fit here - assuming we figure out how to bring the cat-file instances up and down on demand and scale if needed. I'm not sure if the OVHD would be worth it though. Just a thought

@wxiaoguang
Copy link
Contributor

Isn't cat-file command tied to a repository?

Yes, it is repo-based command. At the moment, in most cases it is tied to each "request context", and it could also be used independently in a "search indexer". I do not think a pool would help and there is no pool, only some per-request caches.

Question then is what if single cat-file instance is overloaded (I'm assuming it's possible - but I might be wrong)?

Slows down or block repo browsing. cat-file is used to retrieve git entry information.

@lunny
Copy link
Member Author

lunny commented Mar 20, 2025

Isn't cat-file command tied to a repository?

Yes, it is repo-based command. At the moment, in most cases it is tied to each "request context", and it could also be used independently in a "search indexer". I do not think a pool would help and there is no pool, only some per-request caches.

Question then is what if single cat-file instance is overloaded (I'm assuming it's possible - but I might be wrong)?

Slows down or block repo browsing. cat-file is used to retrieve git entry information.

We can have a process pool which keep some idle cat file processes at the background about 5 minutes. So every request will pick one idle or create a new one according to it's repository directory. This method will reduce creating and destroy sub process frequently. The current design will create and destroy cat file sub process per request. This may resolve the windows non-gogit version's performance problem because windows process is heavy.

@lunny lunny force-pushed the lunny/avoid_create_temporary_cat_file branch from 4979b33 to c171248 Compare March 23, 2025 01:08
@github-actions github-actions bot added the modifies/api This PR adds API routes or modifies them label Mar 23, 2025
@lunny lunny removed the backport/v1.23 This PR should be backported to Gitea 1.23 label Mar 23, 2025
Comment on lines 51 to 52
// Don't call any other repository functions until the dataRc closed to
// avoid create unnecessary temporary cat file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this is hard to maintain.
There are too many codes and functions below, it is hard to keep this rule in the future.
Also, what are repository functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it’s a bit complicated, but that’s the current design. This PR doesn’t refactor it—it just improves upon what’s already there. PR #33934 might rework cat-file and help simplify things in the future.

@lunny lunny requested a review from yp05327 March 27, 2025 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. modifies/api This PR adds API routes or modifies them modifies/go Pull requests that update Go code size/M Denotes a PR that changes 30-99 lines, ignoring generated files. type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants