CLI: cache github repository for faster deployments #622

Merged: 5 commits merged on Oct 19, 2023

@nicoloboschi (Member) commented Oct 19, 2023

Currently, every time the user specifies a GitHub repository for the application, secrets, or instance, we clone the entire git repository, once per reference (N times). If the user submits the deploy again, the repo is cloned from scratch again. In 99% of use cases the files are downloaded from this repo, and cloning it takes up to 10 seconds.

Changes:

  • Save the git repository clone in ~/.langstream/ghrepos, keyed by the repo identifier. If the specified repo already exists there, we just reset it and fetch the specified branch. This is much faster than cloning the repo: 3-4s vs 10-12s on my machine.
  • If the same command references the same repository more than once, the second reference is considered up to date and nothing is done (neither clone nor fetch).
  • Added --disable-local-repositories-cache to fall back to the old behaviour.
  • In case of failure, the cached repo is wiped and cloned again from scratch.
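The caching rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `GitOps`, `RepoCache`, `repoKey` and `checkout` are hypothetical names, the directory key scheme is an assumption (the PR only says "the repo identifier"), and git itself is hidden behind an interface so the logic can be exercised without a network or a git library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical abstraction over git; a real CLI might shell out to `git` or use a Java git library.
interface GitOps {
    void cloneRepo(String url, String branch, Path dir) throws IOException;
    void resetAndFetch(String url, String branch, Path dir) throws IOException;
}

// Hypothetical per-user clone cache; names are illustrative, not LangStream's actual classes.
final class RepoCache {
    private final Path root;          // e.g. ~/.langstream/ghrepos
    private final GitOps git;
    private final boolean enabled;    // false models --disable-local-repositories-cache
    // Repo keys already refreshed during the current CLI command.
    private final Set<String> refreshed = new HashSet<>();

    RepoCache(Path root, GitOps git, boolean enabled) {
        this.root = root;
        this.git = git;
        this.enabled = enabled;
    }

    // Assumed key scheme: drop the URL scheme and replace unsafe characters with '_'.
    static String repoKey(String url) {
        return url.replaceFirst("^[a-zA-Z]+://", "").replaceAll("[^A-Za-z0-9._-]", "_");
    }

    Path checkout(String url, String branch) throws IOException {
        if (!enabled) {
            // Old behaviour: a fresh clone into a new temporary directory every time.
            Path tmp = Files.createTempDirectory("repo-");
            git.cloneRepo(url, branch, tmp);
            return tmp;
        }
        Files.createDirectories(root);
        String key = repoKey(url);
        Path dir = root.resolve(key);
        if (refreshed.contains(key)) {
            return dir; // same repo seen earlier in this command: treat it as up to date
        }
        try {
            if (Files.isDirectory(dir)) {
                git.resetAndFetch(url, branch, dir); // cached: cheap update
            } else {
                git.cloneRepo(url, branch, dir);     // first use: full clone
            }
        } catch (IOException | RuntimeException e) {
            // Any failure (e.g. a corrupted clone): wipe the directory and start from scratch.
            deleteRecursively(dir);
            git.cloneRepo(url, branch, dir);
        }
        refreshed.add(key);
        return dir;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

In this sketch a new `RepoCache` would be created per CLI invocation, so the `refreshed` set models "the second reference is considered up to date" only within one command. Like the PR, it takes no lock on the cache directory, so the parallel-deployment race discussed in the description is not handled.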

The downside of this solution is that the repository is stored "forever" in the user's home directory, but I don't think that's a real issue.
Another possible, though very unlikely, issue is two deployments running in parallel while the remote branch is updated at the same time. I don't think that's a real issue either.

@eolivelli (Member) commented:

Can we make this behavior configurable?

@eolivelli (Member) left a comment


LGTM

What happens in case of a corrupted directory?


```java
@Override
public boolean isDebugEnabled() {
    return true;
}
```

Are we sure about this?

@nicoloboschi merged commit 362d5d5 into main on Oct 19, 2023
9 checks passed
@nicoloboschi deleted the cache-git branch on October 19, 2023 at 11:03
benfrank241 pushed a commit to vectorize-io/langstream that referenced this pull request on May 2, 2024
Labels: none yet
Projects: none yet
Linked issues: none yet
2 participants