In [23]:
from llama_index.llms import Ollama
import base64

In [24]:
# Load the LLaVA model through the Ollama interface. The timeout is deliberately high so that larger images have time to process.
llm = Ollama(model='llava', temperature=0, request_timeout=300.0, verbose=True)

In [25]:
# The model needs the image as a base64-encoded string of its raw bytes; this helper produces one.
def get_b64_image(image_file: str) -> str:
    """Read the file at image_file and return its bytes as a base64-encoded string."""
    with open(image_file, 'rb') as f:
        data = f.read()
    b64_image = base64.b64encode(data).decode()
    return b64_image
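
As a quick sanity check on the helper above, the sketch below encodes a temporary file and confirms that decoding returns the original bytes. The helper is repeated so the block runs on its own, and the byte pattern is a stand-in rather than a real image (an assumption for illustration). It also shows that base64 output is about a third larger than the input, which is worth keeping in mind when sending large images.

```python
import base64
import os
import tempfile


def get_b64_image(image_file: str) -> str:
    """Read the file at image_file and return its bytes as a base64-encoded string."""
    with open(image_file, "rb") as f:
        return base64.b64encode(f.read()).decode()


# Stand-in payload (hypothetical data, not a real image): 3072 arbitrary bytes.
payload = bytes(range(256)) * 12

# Write the payload to a temporary file so the helper has something to read.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    encoded = get_b64_image(path)
finally:
    os.remove(path)

# Decoding must give back exactly the bytes that were written.
assert base64.b64decode(encoded) == payload

# Base64 emits 4 characters for every 3 input bytes (rounded up).
expected_len = 4 * ((len(payload) + 2) // 3)
print(len(payload), len(encoded), expected_len)
```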

In [26]:
b64_image = get_b64_image('/Users/akshayranganath/Downloads/jumper.jpeg')

In [None]:
# This won't work: the base64 string is pasted into the prompt as plain text, so the image is not passed in the way the model expects. The response has nothing to do with the picture.
resp = llm.complete(f'Can you describe this image? Here is the image data\n\n>>>{b64_image}>>>')
print(resp.text)

### cURL
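Let's try calling the Ollama REST API directly, as you would with curl, instead of going through the library. The request below is sent with `requests`.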

In [27]:
import requests

In [28]:
body = {
    "model" : "llava",
    "prompt": "Can you describe this image?",
    "stream": False,
    "images": [b64_image],  # I found this field after combing through some GitHub issues
    "options":{
        "temperature": 0
    }
}

In [29]:
resp = requests.post(
    "http://localhost:11434/api/generate",
    json=body
)

In [30]:
op = resp.json()

In [31]:
print(op['response'])

 The image shows a dynamic scene of an athlete in mid-air, captured during a jump. The athlete is wearing a black tracksuit with the Nike logo on it and is also wearing black shoes with a white checkmark, which are characteristic of Nike's branding. The athlete appears to be a male, given the muscular build and the style of the clothing.

The background suggests an indoor athletic facility, as indicated by the artificial turf and the stadium seating in the distance. There is a track with lane markings, and the lighting suggests it could be either early morning or late evening, given the soft glow on the ground. The athlete's pose and the motion blur around him convey a sense of speed and agility.

The image has a professional quality to it, likely intended for promotional or advertising purposes, showcasing the athletic prowess associated with the Nike brand. 
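
The request body and response handling above can be wrapped in two small helpers. This is a minimal sketch, not part of the original notebook: `build_payload` and `extract_text` are hypothetical names, and the check for an `error` key assumes the server reports failures that way. The `response` key is the one read in the cell above.

```python
from typing import Any, Dict, List


def build_payload(
    prompt: str,
    b64_images: List[str],
    model: str = "llava",
    temperature: float = 0,
) -> Dict[str, Any]:
    """Build the JSON body for a non-streaming generate request with images."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": list(b64_images),  # base64 strings, one per image
        "options": {"temperature": temperature},
    }


def extract_text(result: Dict[str, Any]) -> str:
    """Return the generated text from a decoded JSON response."""
    # Assumption: a failed request carries an "error" key instead of "response".
    if "error" in result:
        raise RuntimeError(f"generate request failed: {result['error']}")
    return result["response"].strip()
```

With these in place, the call would look like `requests.post("http://localhost:11434/api/generate", json=build_payload("Can you describe this image?", [b64_image]))`, followed by `extract_text(resp.json())`.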


### Using Library
Now let's go back to the library. After checking the GitHub issues, I found that the `llm.complete` call accepts an `images` argument, which takes the base64-encoded image strings to analyze. With this, I can stay entirely within the library and still get the desired output.

In [32]:
from llama_index.schema import ImageDocument

In [33]:
llava_response = llm.complete(
    'Can you describe this image?',
    images=[b64_image],
)

In [34]:
llava_response.text

" The image shows a dynamic scene of an athlete in mid-air, captured during a jump. The athlete is wearing a black tracksuit with the Nike logo on it and is also wearing black shoes with a white checkmark. They are jumping over a track with a starting block visible in the background. The setting appears to be an indoor stadium with artificial lighting, as suggested by the shadows cast on the ground. The athlete's pose suggests they are in the middle of a sprint or hurdle event. The image has a dramatic and intense feel, emphasizing the athletic prowess and speed of the individual. "