Currently, all AI upstream services are simulated using this fake server method.
I'm concerned that this fake server diverges too much from how a real LLM request behaves.
Should we introduce a dedicated container that runs an LLM fake server?
Originally posted by @membphis in #13307 (comment)
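For reference, a minimal sketch of what such a containerized LLM fake server could serve, assuming an OpenAI-compatible `/v1/chat/completions` endpoint. The handler name, port choice, and canned payload below are illustrative assumptions, not the project's actual test fixture:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeLLMHandler(BaseHTTPRequestHandler):
    """Hypothetical mock of an OpenAI-compatible chat-completions endpoint."""

    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # Echo the requested model back, with a canned assistant reply.
        body = json.dumps({
            "id": "chatcmpl-fake",
            "object": "chat.completion",
            "model": request.get("model", "fake-model"),
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": "mock response"},
                "finish_reason": "stop",
            }],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

def start_fake_llm(port=0):
    """Start the fake server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), FakeLLMHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Packaging something like this as its own container image (rather than embedding it in the test harness) would let the gateway exercise real HTTP, TLS, and streaming behavior against a stable endpoint, narrowing the gap this issue raises.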