When a new group (a (model, slo) pair) is created, the code seems to use random.choice to select the instance for it. However, random selection only evens out on average, over many picks or with a lot of GPUs, so with a handful of new groups it can easily pick the same instance more than once.
This is especially a problem with model replica deployments:
Example:
(model1, warmup_slo)
(model1, slo)
(model1, slo2)
Two or even all three of these groups might get sent to the same GPU due to random.choice.
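
To put a rough number on this, here is a minimal sketch (not the project's actual scheduler code). It assumes each new group is placed independently with random.choice. The instance names, the function names, and the least-loaded alternative are all hypothetical and only for illustration. For 3 groups on 4 GPUs, the chance that at least two groups share a GPU is 1 - (4*3*2)/4^3 = 0.625, and the simulation should land close to that.

```python
import random
from collections import Counter

def assign_random(groups, instances, rng):
    """Place each new group on an instance chosen uniformly at random."""
    return {group: rng.choice(instances) for group in groups}

def assign_least_loaded(groups, instances):
    """Hypothetical alternative: place each group on the instance hosting the fewest groups."""
    load = Counter({inst: 0 for inst in instances})
    placement = {}
    for group in groups:
        # Ties are broken by instance order, so placement is deterministic.
        target = min(instances, key=lambda inst: load[inst])
        placement[group] = target
        load[target] += 1
    return placement

def collision_rate(groups, instances, trials=10_000, seed=0):
    """Estimate how often at least two groups end up on the same instance."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(trials):
        placement = assign_random(groups, instances, rng)
        if len(set(placement.values())) < len(groups):
            collisions += 1
    return collisions / trials

# Groups from the example above; GPU names are made up.
groups = [("model1", "warmup_slo"), ("model1", "slo"), ("model1", "slo2")]
instances = ["gpu0", "gpu1", "gpu2", "gpu3"]

if __name__ == "__main__":
    print("random.choice collision rate:", collision_rate(groups, instances))
    print("least-loaded placement:", assign_least_loaded(groups, instances))
```

Whether least-loaded placement is the right fix depends on how the real scheduler tracks load. The point is only that any choice that takes existing placements into account avoids stacking the replicas of one model on a single GPU.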