wrong output on phi3 vision model #447

adrid · 2024-06-18T20:25:15Z

Describe the bug
I've started the server:
cargo run --release --features cuda -- --port 1234 vision-plain -m microsoft/Phi-3-vision-128k-instruct -a phi3v

Then I run the Python request example from here:
https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3V.md

I'm getting:
The image shows a snow-covered mountain with a clear sky above and trees at the base. There appears to be a path or trail leading up the mountain, and some structures can be seen on the peak.

Which is correct.

But when I change the image URL to something else, for example https://onnxruntime.ai/images/coffee.png

the request runs for a very long time and eventually fails with an out-of-memory error:

"/home/adrian/miniconda3/lib/python3.11/site-packages/openai/_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'message': 'DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")', 'partial_response': {'id': '2', 'choices': [{'finish_reason': 'error', 'index': 0, 'message': {'content': "<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>...
...it continues

Other image: https://onnxruntime.ai/images/table.png

For some images it stops but returns a mess:

The 00012, and0s12 - 00 2000 (   . Currently. There 121 and a
319 to a 0s - It - , the - . The  . ( [ [ 300 isution, in, and   In....9, the
  2odans, 1200 for 20 111,  working and0 years119. Ins10 a in

s [ . In, the Currently. The   on a the - .   and is a the primary, based, and in on - at a an, primarily.  , 20 (   for [ution.s ands for 11 role. It20s
., 02, working at 0101,  -.
 [ [ in for the the  .
0, and in .s, it399 before a a prior on  1.     , currently, ands... [
s 0 iss In....ia. Prior, it25. Currently, I  ' 0 and 
 [ [
    112. Currently, the .0 0 . It .0s is 1 .           , jobs0 to role
 [  .   and, thes - a3, a The9. and1,  working - 20 0.
[ 
0
     0 [, the20 ' '
   \ .   .  - prior, before 2s and . (100 1 .   , the5, prior, full. The1. Its -...utod  -s in0s [200s
3, current, the, and, and.
  

 2, 29, 0 .    10ed        [ is1^^ The^ [s11 for a. It  - a time. (, 10, I. in, it, it
, the

s.  [ and, the
...it continues

I can't get it to work on any image other than the one from the example.
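
For anyone reproducing this, one way to catch the `<unk>` flood early instead of waiting for the out-of-memory error is to pass a small `max_tokens` in the request and check the reply with a simple heuristic. The following is only a rough sketch: `looks_degenerate` and its `max_unk_ratio` threshold are hypothetical names chosen for illustration and are not part of mistral.rs or the openai package.

```python
def looks_degenerate(text: str, unk_token: str = "<unk>", max_unk_ratio: float = 0.2) -> bool:
    """Heuristic check: True if the reply is empty or mostly unknown-token markers.

    The 0.2 threshold is an arbitrary assumption; tune it for your own tests.
    """
    stripped = text.strip()
    if not stripped:
        return True
    # Fraction of the reply's characters that belong to unk_token occurrences.
    unk_chars = stripped.count(unk_token) * len(unk_token)
    return unk_chars / len(stripped) > max_unk_ratio
```

This only detects the repeated `<unk>` case. The second failure mode shown above (fragments of numbers and words) contains no `<unk>` markers and would need a different check, such as manual inspection.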

Latest commit
3a79137


EricLBuehler · 2024-06-21T02:56:28Z

Hi @adrid! Thank you for raising this, I'll take a look.

EricLBuehler · 2024-06-22T14:35:55Z

Hi @adrid! I have fixed this now in #459. After you run git pull and rebuild (for Rust) / reinstall (for Python), it should work.

When I run:

cargo run --release --features cuda -- --port 1234 --isq Q4K vision-plain -m microsoft/Phi-3-vision-128k-instruct -a phi3v

And

import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:1234/v1/"

completion = openai.chat.completions.create(
    model="phi3v",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://onnxruntime.ai/images/coffee.png"
                    },
                },
                {
                    "type": "text",
                    "text": "<|image_1|>\nWhat is shown in this image? Write a detailed response analyzing the scene.",
                },
            ],
        },
    ],
    max_tokens=256,
    frequency_penalty=1.0,
    top_p=0.1,
    temperature=0,
)
resp = completion.choices[0].message.content
print(resp)

It gives:

The image captures a moment of tranquility, featuring a white cup filled with coffee. The cup, which is the central focus of the image, is placed on a wooden surface that adds a rustic charm to the scene. 

The coffee inside the cup has been meticulously prepared into an intricate latte art design. This design is composed of three delicate leaves, each symmetrically arranged around a central point. The leaves are white in color, contrasting beautifully with the dark brown foam that forms their base. 

The image does not contain any discernible text or additional objects. The relative position of the objects is such that the cup is at the center, with its contents spread out around it. The wooden surface on which the cup rests provides a natural and warm backdrop to this inviting scene. 

This image exudes a sense of calm and enjoyment, as if inviting one to take a moment to appreciate a well-crafted cup of coffee. It's a simple yet captivating snapshot of everyday life.

If you could confirm that it works for you that would be great!
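
To confirm the fix across several images rather than a single one, the request above can be wrapped in a small helper. This is a minimal sketch, assuming the server started with the command above is listening on port 1234; `build_messages` and `describe` are hypothetical helper names, and the default question is only an example.

```python
def build_messages(image_url: str, question: str) -> list:
    """Build a single-turn chat payload with one image and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                # Phi-3 vision expects the <|image_1|> placeholder before the prompt text.
                {"type": "text", "text": f"<|image_1|>\n{question}"},
            ],
        }
    ]


def describe(client, image_url: str, question: str = "What is shown in this image?",
             max_tokens: int = 256) -> str:
    """Send one image to the server and return the model's reply text.

    `client` is expected to be an openai.OpenAI instance pointed at the local server.
    """
    completion = client.chat.completions.create(
        model="phi3v",
        messages=build_messages(image_url, question),
        max_tokens=max_tokens,  # bounds generation if the output degenerates
        temperature=0,
    )
    return completion.choices[0].message.content
```

With `client = openai.OpenAI(api_key="EMPTY", base_url="http://localhost:1234/v1/")`, calling `describe(client, url)` for both https://onnxruntime.ai/images/coffee.png and https://onnxruntime.ai/images/table.png exercises the two images reported in this issue.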

adrid · 2024-06-22T23:17:33Z

It works great now, @EricLBuehler! I've checked a couple more images and the results are now correct. Thank you for looking into this!

adrid added the bug Something isn't working label Jun 18, 2024

adrid changed the title ~~wrong output on phi3 vision model image~~ wrong output on phi3 vision model Jun 18, 2024

EricLBuehler mentioned this issue Jun 22, 2024

Fix LongRope models position ids calculation #459

Merged

EricLBuehler added the resolved label Jun 22, 2024

adrid closed this as completed Jun 22, 2024
