[Ported][CI/Build] Share HuggingFace downloads between test runs #4874
Conversation
Using eager mode doesn't seem to lead to significant improvement. It seems that the bottleneck is in downloading the models, so we should parallelize this process.
Tbh it is probably better if we have a way to avoid re-downloading the models each time. Any thoughts?
I'm not that experienced in Kubernetes, but from my understanding, placing the HuggingFace cache inside a Volume should avoid the need to re-download the models when tests are run again in the same Pod. @rkooo567 is it possible on your end to force the CI to run on the same Pod so we can test whether the cache actually works in this way?
Hmm, I am not super familiar with how CI works actually (idk if we even use k8s under the hood). cc @simon-mo for thoughts.
@khluu since you're involved with CI, can you help out with this? Particularly the part concerning Kubernetes.
I believe we should download the model each time. @robertgshaw2-neuralmagic mentioned that putting them on NFS is a bit tricky because it might hit rate limits.
hostPath is a possible workaround.
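The shared-cache idea above can be sketched in Python: before any HuggingFace library is imported, the cache root can be redirected onto the mounted volume via the `HF_HOME` environment variable. This is a minimal sketch, assuming a mount path supplied by the CI setup; the helper name and the use of a temp directory as a stand-in mount are hypothetical.

```python
import os
import tempfile

def configure_shared_hf_cache(mount_point: str) -> str:
    """Point the HuggingFace cache at a directory on a mounted volume
    (e.g. a Kubernetes hostPath mount) so repeated test runs on the same
    node reuse already-downloaded model files. Hypothetical helper.
    """
    cache_dir = os.path.join(mount_point, "huggingface")
    os.makedirs(cache_dir, exist_ok=True)
    # HF_HOME is the environment variable huggingface_hub/transformers
    # consult for their cache root; it must be set before those libraries
    # resolve their cache paths.
    os.environ["HF_HOME"] = cache_dir
    return cache_dir

mount = tempfile.mkdtemp()  # stand-in for the volume mount path in CI
cache = configure_shared_hf_cache(mount)
print(os.environ["HF_HOME"] == cache)  # → True
```

In an actual CI setup the same effect is usually achieved by exporting `HF_HOME` in the job environment rather than in Python code.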
Hmm, I am against this. Imo we should test the default config for those tests (especially the test_model).
I have updated this PR to work on the AWS pipeline. Looks like this shaved around 10 minutes off the duration of the model tests. Going to rerun the test just to be sure.
This doesn't seem to be the case anymore. It's hard to determine the real effect since the test runs aren't necessarily performed on the same machine (from my understanding).
Due to #5757, I have moved this PR to vllm-project/ci-infra#8. |
The model tests keep getting interrupted (presumably due to running too long). This PR attempts to reduce the running time by:
- Sharing the HuggingFace cache between Kubernetes containers during CI by storing it in a `hostPath` volume.
- Disabling graph construction (considering that the vLLM model is only run once per test, not including the profile run).
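A minimal sketch of the second point: vLLM's `LLM` entrypoint accepts an `enforce_eager` flag that skips CUDA graph capture. Since engine construction requires a GPU, this sketch only assembles the keyword arguments a test harness might pass; the helper function is hypothetical.

```python
def eager_llm_kwargs(model: str) -> dict:
    """Build kwargs for vllm.LLM(**kwargs). enforce_eager=True makes vLLM
    run the model in PyTorch eager mode and skip CUDA graph construction,
    which is wasted work when each test runs the model only once
    (aside from the profile run). Hypothetical helper.
    """
    return {"model": model, "enforce_eager": True}

kwargs = eager_llm_kwargs("facebook/opt-125m")
print(kwargs)  # → {'model': 'facebook/opt-125m', 'enforce_eager': True}
```

The trade-off, as noted in the discussion above, is that eager mode is not the default configuration, so tests no longer exercise the CUDA-graph path.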