[Draft][CI/Build] Optimize models tests #4874

Draft
DarkLight1337 wants to merge 12 commits into main

Conversation

@DarkLight1337 (Collaborator) commented May 17, 2024

The models tests keep getting interrupted (presumably because they run too long). This PR attempts to reduce the running time by:

  • Sharing the HuggingFace cache between Kubernetes containers during CI by storing it in a hostPath volume.
    • Note: hostPath volumes have associated security risks. Is there another way for agent-stack-k8s to use a persistent volume?
  • Disabling CUDA graph construction, since the vLLM model is only run once per test (not counting the profile run); see the sketch after this list.
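
For illustration of the second bullet, here is a minimal sketch (the model name is a placeholder, not necessarily one used by the tests) of how CUDA graph construction is disabled in vLLM via the enforce_eager flag:

```python
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA graph capture and runs the model in eager
# mode; startup is faster at the cost of some inference speed, which is an
# acceptable trade-off when each test only runs the model once.
llm = LLM(model="facebook/opt-125m", enforce_eager=True)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```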

This PR also adds tqdm to the common dependencies. It is not actually a new dependency, since it is already used in vllm/entrypoints/llm.py; I only noticed that it is missing from the requirements.txt file when trying to use tqdm for downloading the models.

@DarkLight1337 marked this pull request as draft May 17, 2024 03:00
@DarkLight1337 (Collaborator Author)

Using eager mode doesn't seem to lead to a significant improvement. It seems that the bottleneck is downloading the models, so we should parallelize that process.
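
As a rough sketch of that idea (the model list and worker count are placeholders, assuming huggingface_hub and tqdm are available), the downloads could be issued concurrently before the tests start:

```python
from concurrent.futures import ThreadPoolExecutor

from huggingface_hub import snapshot_download
from tqdm import tqdm

# Placeholder list; the real tests enumerate the models under test.
MODELS = ["facebook/opt-125m", "gpt2"]

def download(repo_id: str) -> str:
    # snapshot_download reuses files already present in the HuggingFace cache,
    # so runs that share the cache skip the network entirely.
    return snapshot_download(repo_id)

with ThreadPoolExecutor(max_workers=4) as pool:
    for path in tqdm(pool.map(download, MODELS), total=len(MODELS)):
        print(path)
```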

@DarkLight1337 (Collaborator Author) commented May 17, 2024

Tbh it would probably be better if we had a way to avoid re-downloading the models each time. Any thoughts?

@rkooo567 self-assigned this May 17, 2024
@DarkLight1337 (Collaborator Author) commented May 21, 2024

I'm not that experienced with Kubernetes, but from my understanding, placing the HuggingFace cache inside a Volume should avoid having to re-download the models when the tests are run again in the same Pod.
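
For example (a minimal sketch, assuming the volume is mounted at a hypothetical path such as /mnt/hf-cache inside the test container), the test environment would only need to point the HuggingFace cache at the mounted path:

```python
import os

# Hypothetical mount point of the shared volume inside the test container.
SHARED_CACHE = "/mnt/hf-cache"

# HF_HOME is honored by huggingface_hub and transformers; setting it before
# anything is downloaded makes the model files land in (and be reused from)
# the shared volume across test runs on the same node.
os.environ.setdefault("HF_HOME", SHARED_CACHE)
```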

@rkooo567 Is it possible on your end to force the CI to run on the same Pod, so we can test whether the cache actually works this way?

@rkooo567 (Collaborator)

Hmm, I am not super familiar with how the CI works, actually (idk if we even use k8s under the hood). cc @simon-mo for thoughts.
