[Draft][CI/Build] Optimize models tests #4874

Draft
DarkLight1337 wants to merge 12 commits into main

Conversation

@DarkLight1337 (Collaborator) commented May 17, 2024

The models tests keep getting interrupted (presumably because they run too long). This PR attempts to reduce the running time by:

  • Sharing the HuggingFace cache between Kubernetes containers during CI by storing it in a hostPath volume.
    • Note: hostPath volumes have associated security risks. Is there another way for agent-stack-k8s to use a persistent volume?
  • Disabling CUDA graph construction, since the vLLM model is only run once per test (not counting the profile run); see the sketch after this list.
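
For illustration of the second bullet, here is a minimal sketch (the model name is a placeholder, not necessarily one used by the tests) of how CUDA graph construction is disabled in vLLM via the enforce_eager flag:

```python
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA graph capture and runs the model in eager
# mode; startup is faster at the cost of some inference speed, which is an
# acceptable trade-off when each test only runs the model once.
llm = LLM(model="facebook/opt-125m", enforce_eager=True)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```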

This PR also adds tqdm to the common dependencies. It is not actually a new dependency, since it is already used in vllm/entrypoints/llm.py; I only noticed that it is missing from the requirements.txt file when trying to use tqdm for downloading the models.

@DarkLight1337 marked this pull request as draft May 17, 2024 03:00
@DarkLight1337 (Collaborator Author)

Using eager mode doesn't seem to lead to a significant improvement. It seems that the bottleneck is downloading the models, so we should parallelize that process.
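
As a rough sketch of that idea (the model list and worker count are placeholders, assuming huggingface_hub and tqdm are available), the downloads could be issued concurrently before the tests start:

```python
from concurrent.futures import ThreadPoolExecutor

from huggingface_hub import snapshot_download
from tqdm import tqdm

# Placeholder list; the real tests enumerate the models under test.
MODELS = ["facebook/opt-125m", "gpt2"]

def download(repo_id: str) -> str:
    # snapshot_download reuses files already present in the HuggingFace cache,
    # so runs that share the cache skip the network entirely.
    return snapshot_download(repo_id)

with ThreadPoolExecutor(max_workers=4) as pool:
    for path in tqdm(pool.map(download, MODELS), total=len(MODELS)):
        print(path)
```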

@DarkLight1337 (Collaborator Author) commented May 17, 2024

Tbh it would probably be better if we had a way to avoid re-downloading the models each time. Any thoughts?

@rkooo567 self-assigned this May 17, 2024
@DarkLight1337 (Collaborator Author) commented May 21, 2024

I'm not that experienced with Kubernetes, but from my understanding, placing the HuggingFace cache inside a Volume should avoid having to re-download the models when the tests are run again in the same Pod.
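
For example (a minimal sketch, assuming the volume is mounted at a hypothetical path such as /mnt/hf-cache inside the test container), the test environment would only need to point the HuggingFace cache at the mounted path:

```python
import os

# Hypothetical mount point of the shared volume inside the test container.
SHARED_CACHE = "/mnt/hf-cache"

# HF_HOME is honored by huggingface_hub and transformers; setting it before
# anything is downloaded makes the model files land in (and be reused from)
# the shared volume across test runs on the same node.
os.environ.setdefault("HF_HOME", SHARED_CACHE)
```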

@rkooo567 Is it possible on your end to force the CI to run on the same Pod, so we can test whether the cache actually works this way?

@rkooo567 (Collaborator)

Hmm, I am not super familiar with how the CI works, actually (idk if we even use k8s under the hood). cc @simon-mo for thoughts.
