Commit ae57738

[https://nvbugs/5547414][fix] Use cached models (#8755)
Signed-off-by: Hui Gao <huig@nvidia.com>
1 parent a2e964d commit ae57738

File tree

3 files changed (+2, -3 lines)

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 1 addition & 1 deletion

@@ -2832,7 +2832,7 @@ def test_dummy_load_format(self):
                          False,
                          True,
                          True,
-                         False,
+                         True,
                          marks=pytest.mark.skip_less_mpi_world_size(8))],
                     ids=["latency", "multi_gpus_no_cache"])
    def test_bf16(self, tp_size, pp_size, ep_size, attention_dp, cuda_graph,

tests/integration/defs/triton_server/test_triton_llm.py

Lines changed: 1 addition & 1 deletion

@@ -3708,7 +3708,7 @@ def test_llmapi_backend(E2E_MODEL_NAME, DECOUPLED_MODE, TRITON_MAX_BATCH_SIZE,
        f"--test-llmapi",
        'dataset',
        f"--dataset={os.path.join(llm_backend_dataset_root, 'mini_cnn_eval.json')}",
-       f"--tokenizer-dir=TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+       f"--tokenizer-dir={tiny_llama_model_root}",
    ]

    print_info("DEBUG:: run_cmd: python3 " + " ".join(run_cmd))
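The change above replaces a hard-coded HuggingFace model id with a locally cached model path, so the test no longer downloads the tokenizer from the hub. A minimal sketch of that resolve-from-cache pattern (the helper name `resolve_tokenizer_dir`, the `LLM_MODELS_ROOT` variable, and the cache layout are assumptions for illustration, not the repo's actual helpers):

```python
import os

# Hypothetical helper: prefer a locally cached model directory and fall
# back to the HuggingFace model id when no usable cache is found.
def resolve_tokenizer_dir(cache_root, model_subdir, hf_model_id):
    if cache_root:
        candidate = os.path.join(cache_root, model_subdir)
        if os.path.isdir(candidate):
            # Cached copy exists: use it and avoid any network download.
            return candidate
    # No cache configured or directory missing: fall back to the hub id.
    return hf_model_id

tokenizer_dir = resolve_tokenizer_dir(
    os.environ.get("LLM_MODELS_ROOT"),      # assumed env var for the cache root
    "TinyLlama-1.1B-Chat-v1.0",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)
run_cmd_arg = f"--tokenizer-dir={tokenizer_dir}"
```

In the commit itself, the equivalent of this lookup is carried by the `tiny_llama_model_root` fixture/variable substituted into the command line.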

tests/integration/test_lists/waives.txt

Lines changed: 0 additions & 1 deletion

@@ -387,7 +387,6 @@ accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen
 accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen_tp_asymmetric[GSM8K-gen_tp=2-ctx_pp=4] SKIP (https://nvbugs/5582277)
 accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen_tp_asymmetric[MMLU-gen_tp=2-ctx_pp=2] SKIP (https://nvbugs/5582277)
 accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen_tp_asymmetric[MMLU-gen_tp=2-ctx_pp=4] SKIP (https://nvbugs/5582277)
-accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_bf16[multi_gpus_no_cache] SKIP (https://nvbugs/5547414)
 triton_server/test_triton.py::test_llava[llava] SKIP (https://nvbugs/5547414)
 unittest/executor/test_rpc_proxy.py SKIP (https://nvbugs/5605741)
 full:RTX/accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugs/5569696)
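The waives.txt entries above follow a `<test id> SKIP (<bug url>)` line format, and this commit removes the waive for the now-fixed `test_bf16[multi_gpus_no_cache]` case. A minimal sketch of parsing one such line (the format is inferred from the entries shown, not from documented tooling):

```python
import re

# Assumed waive-line format, inferred from the waives.txt entries above:
#   <test id> SKIP (<bug url>)
WAIVE_RE = re.compile(r"^(?P<test>\S+)\s+SKIP\s+\((?P<bug>[^)]+)\)$")

line = ("triton_server/test_triton.py::test_llava[llava] "
        "SKIP (https://nvbugs/5547414)")
m = WAIVE_RE.match(line)
test_id = m.group("test")  # the pytest node id to skip
bug_url = m.group("bug")   # the tracking bug for the waive
```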

0 commit comments