In my application scenario, I hope to know whether the product is related to our search terms; the collection of products is known in advance. How can it be optimized? My idea is to cache the K and V related to the title of the product first, and then concatenate them with the search terms in the Context phase and use them directly. Based on TensorRT-llm, can this goal be achieved?
In my application scenario, I hope to know whether the product is related to our search terms; the collection of products is known in advance. How can it be optimized? My idea is to cache the K and V related to the title of the product first, and then concatenate them with the search terms in the Context phase and use them directly. Based on TensorRT-llm, can this goal be achieved?