When querying via transformers, an <image> placeholder is used in the prompt and the images are passed as a separate input argument. This doesn't appear to be the case with TGI, which expects a single prompt input.
Something like this:
curl https://yd64jhjr8ylu54-8080.proxy.runpod.net/generate \
-X POST \
-d '{"inputs": "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)Tell me about this image<end_of_utterance>\\nAssistant:","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
works, although it fails when passing two images (the model ignores the second image):
curl https://yd64jhjr8ylu54-8080.proxy.runpod.net/generate \
-X POST \
-d '{"inputs": "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)Tell me about this image, and also about this second image: ![](http://images.cocodataset.org/val2017/000000039769.jpg)<end_of_utterance>\\nAssistant:","parameters":{"max_new_tokens":50}}' \
-H 'Content-Type: application/json'
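For scripting, the same request body can be built in Python. This is a minimal sketch, assuming the markdown-image prompt format and the /generate payload shape from the curl examples above; the helper name build_payload is hypothetical:

```python
import json

END_TOKEN = "<end_of_utterance>"

def build_payload(image_urls, question, max_new_tokens=50):
    """Build a TGI /generate payload, embedding each image as a
    markdown image link in the prompt (as in the curl examples above)."""
    images = "".join(f"![]({url})" for url in image_urls)
    prompt = f"User: {images}{question}{END_TOKEN}\nAssistant:"
    return {"inputs": prompt,
            "parameters": {"max_new_tokens": max_new_tokens}}

payload = build_payload(
    ["http://images.cocodataset.org/val2017/000000219578.jpg",
     "http://images.cocodataset.org/val2017/000000039769.jpg"],
    "Tell me about these images.",
)
# POST json.dumps(payload) to <endpoint>/generate with
# Content-Type: application/json to reproduce the two-image case.
body = json.dumps(payload)
```

Posting this payload (e.g. with requests.post(url, json=payload)) reproduces the failing two-image case above: the request succeeds, but the model only describes the first image.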
System Info
NA
Reproduction
It is unclear how to query TGI for multi-modal models.
The documentation links for LLaVA Next and IDEFICS2 return 404:
https://huggingface.co/docs/text-generation-inference/HuggingFaceM4/idefics-9b-instruct
https://huggingface.co/docs/text-generation-inference/llava-hf/llava-v1.6-mistral-7b-hf
@Narsil @VictorSanh
Expected behavior
TGI should document how to query multi-modal models, ideally supporting the transformers-style pattern of an <image> placeholder in the prompt with the images passed as a separate input. A prompt containing two images should result in the model attending to both, rather than ignoring the second.