Hello, thank you for your great work. I really like this project and would gladly contribute to its development. It looks like the project is currently focused mainly on multi-GPU setups (correct me if I'm wrong), so my question is: is supporting lighter setups a priority? It would be great to support cases where a single GPU is used to run several models, so that not 100% of its memory is available to any one model. As far as I can see, the KV cache currently occupies all of the remaining memory. I'd also like to see support for more GPU models (I'm personally interested in the A100). What do you think about all of this? Let me know if it's not a priority or if you have a different vision for this project.