I'm running the 7b model on a 3090 (24 GB). Loading the model takes 15 GB of memory, which is fine.
During inference the model works nicely, but memory usage climbs quickly when metadata.sql grows even a little.
A metadata.sql of only 4k is probably enough to cause a CUDA out-of-memory error. Any good ideas for solving this?
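
For a rough sense of scale, here is a back-of-the-envelope sketch of how the key/value cache might grow with prompt length. Everything in it is an assumption rather than a measurement: the layer count (32), hidden size (4096), and fp16 storage are typical of Llama-style 7B models but not confirmed for this one, `num_sequences` is a hypothetical stand-in for batch size or beam count, and "4k" is read as 4096 tokens, which may not match the actual file.

```python
GIB = 1024 ** 3

def kv_cache_bytes(num_tokens, num_layers=32, hidden_size=4096,
                   bytes_per_value=2, num_sequences=1):
    """Estimate key/value cache size in bytes (all defaults are assumptions).

    Each layer stores one key and one value vector of `hidden_size`
    values per token, for every sequence kept alive during generation.
    """
    per_token = 2 * num_layers * hidden_size * bytes_per_value
    return per_token * num_tokens * num_sequences

# A single sequence with a 4096-token prompt
print(f"{kv_cache_bytes(4096) / GIB:.1f} GiB")                   # 2.0 GiB

# The same prompt with 4 sequences kept alive (e.g. beams)
print(f"{kv_cache_bytes(4096, num_sequences=4) / GIB:.1f} GiB")  # 8.0 GiB
```

If these assumptions are close, 15 GB of weights plus several GiB of cache and activations would leave little headroom on a 24 GB card, which might explain the error. Actual usage depends on the model and generation settings.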