# Ollama LLaVA Challenges

Ollama is a tool that allows users to run open-source large language models (LLMs) locally on their machines. It supports a variety of models, including Llama 2, Code Llama, and others. 

This notebook contains some example code to run ollama with a Vision-Language Model called LLaVA. 

You'll need to download Ollama from their website first: www.ollama.ai.

Some of the code examples are courtesy of ollama.com / Jeffrey Morgan.

You will need some understanding of Python to work through this notebook.

License: MIT License

### Contents
0. Install and settings
1. Using ollama vision with LLaVA
2. LLaVA tests
3. Webcam challenge
4. Self driving car challenge
5. Bonus Challenge: Gradio front end


### Sources
- ollama: www.ollama.com
- ollama python: https://github.com/ollama/ollama-python
- LLaVA: https://llava-vl.github.io/

## 0. Install and settings

Make sure you've installed Ollama on your machine before running the code!

In [1]:
# Check your version of python. To run ollama you will need Python 3.8 or higher.
from platform import python_version
print(python_version())

3.10.9


In [2]:
%pip install --upgrade ollama

Note: you may need to restart the kernel to use updated packages.


In [None]:
# Pull the model from the Ollama library; uncomment if necessary
#import ollama
#ollama.pull('llava')

In [3]:
# List the models installed on your machine.
# Note: newer versions of the ollama package may expose the model name under 'model' instead of 'name'.
import ollama
models = ollama.list()['models']
model_list = [m['name'] for m in models]
for name in model_list:
    print(name)
print(50*'-')
print(model_list)

bramvanroy/fietje-2b-chat:Q3_K_M
llava:latest
mistral:latest
nomic-embed-text:latest
tinyllama:latest
--------------------------------------------------
['bramvanroy/fietje-2b-chat:Q3_K_M', 'llava:latest', 'mistral:latest', 'nomic-embed-text:latest', 'tinyllama:latest']
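The list of names printed above can be used to check whether LLaVA is already installed before you pull it. The helper below is a minimal sketch; `has_model` is a hypothetical name, not part of the ollama package, and it assumes names follow the `name:tag` pattern shown in the output.

```python
def has_model(installed_names, wanted):
    """Return True if `wanted` matches an installed model, ignoring the tag."""
    wanted_base = wanted.split(':')[0]
    return any(name.split(':')[0] == wanted_base for name in installed_names)

# Example using some of the names printed above
installed = ['bramvanroy/fietje-2b-chat:Q3_K_M', 'llava:latest', 'mistral:latest']
print(has_model(installed, 'llava'))   # True
print(has_model(installed, 'llama2'))  # False
```

If the check returns False for `'llava'`, pulling the model first is probably the next step.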


## 1. Using ollama vision with LLaVA

In [4]:
#show the jpg files in your current folder
import glob
my_jpgs = glob.glob('*.jpg')
my_jpgs

['man_ironing_taxi.jpg',
 'chihuaha-muffin.jpg',
 'count_cars.jpg',
 'Cow_alps_beach.jpg',
 'dog_or_tiger.jpg']

In [5]:
#select your image here
image_path = 'man_ironing_taxi.jpg' #select your image

In [6]:
#show an image
import IPython
IPython.display.Image(image_path)

<IPython.core.display.Image object>

In [7]:
#source: https://github.com/ollama/ollama-python
import ollama

res = ollama.chat(
    model="llava",
    messages=[
        {
            'role': 'user',
            'content': 'What is strange about this image?:',
            'images': [image_path]
        }
    ]
)

print(res['message']['content'])

 The image depicts an unusual scene where a person is standing on the back of a taxi cab that appears to be in motion. This is strange because it's not safe or practical to stand or sit atop a moving vehicle, as there is a high risk of falling off and getting injured. Furthermore, the way the person is dressed, with a yellow shirt and a tie, contrasts with the casual, street-style clothes one would typically expect for someone in such an environment. The setting appears to be a busy urban street with traffic, which makes this scene even more extraordinary due to its unexpected nature. 
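Since the exercises below repeat the same call with different images and prompts, it may help to wrap it in a small function. This is a sketch only: `build_vision_message` and `describe_image` are hypothetical helper names, and the default prompt is an assumption.

```python
def build_vision_message(prompt, image_path):
    """Build one user message in the same shape as the chat call above."""
    return {'role': 'user', 'content': prompt, 'images': [image_path]}

def describe_image(image_path, prompt='Describe this image.', model='llava'):
    """Send one image plus a prompt to a local Ollama model and return the reply text."""
    import ollama  # imported here so the helper can be defined before ollama is installed
    res = ollama.chat(model=model, messages=[build_vision_message(prompt, image_path)])
    return res['message']['content']

# Example (needs a running Ollama server with the llava model pulled):
# print(describe_image('man_ironing_taxi.jpg', 'What is strange about this image?'))
```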


## 2. LLaVA tests & exercises
 
- Test LLaVA. How reliable is it? 
- Use the images provided on GitHub.
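One way to get a feel for reliability is to ask the same question several times and count the answers, because the model's output is sampled and can differ between runs. The sketch below uses a stand-in for the model call so it runs without Ollama; `tally_answers` and `ask` are hypothetical names. It works best when the prompt asks for a one-word answer, as free-text replies rarely match exactly.

```python
from collections import Counter

def tally_answers(ask, n_runs=5):
    """Call `ask()` n_runs times and count how often each normalised answer appears."""
    answers = [ask().strip().lower() for _ in range(n_runs)]
    return Counter(answers)

# Stand-in for a real model call, so the sketch runs without Ollama.
fake_replies = iter(['Dog', 'dog ', 'Tiger', 'dog', 'Dog'])
counts = tally_answers(lambda: next(fake_replies), n_runs=5)
print(counts.most_common())  # [('dog', 4), ('tiger', 1)]
```

To use it with LLaVA, replace the lambda with a function that sends your image and prompt to the model and returns the reply text.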

### Exercise 1: Cow on the beach

Use the image 'Cow_alps_beach.jpg' and let LLaVA describe it.


In [None]:
#YOUR CODE HERE

P.S. The image of a cow on the beach is a classic in Computer Vision: the previous generation of models (CNNs) often failed to detect a cow on a beach (see "Recognition in Terra Incognita", https://arxiv.org/abs/1807.04975). Now LLaVA can do it with ease. Progress!

### Exercise 2: Chihuahua or muffin?

Another classic in Computer Vision. Can LLaVA get this right?
Use image 'chihuaha-muffin.jpg'.

In [None]:
#YOUR CODE HERE

### Exercise 3: Count the number of cars
How good is LLaVA at detecting objects? Can it count all the cars?
Use image 'count_cars.jpg'.

In [None]:
#YOUR CODE HERE
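LLaVA answers in free text, so comparing its count with your own count takes a small parsing step. The helper below is a rough sketch (`first_integer` is a hypothetical name): it only finds numbers written in digits, so a reply such as "three cars" would be missed unless the prompt asks for a number.

```python
import re

def first_integer(text):
    """Return the first whole number written in digits in `text`, or None."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None

print(first_integer('I can see 12 cars in the parking lot.'))  # 12
print(first_integer('There are several cars.'))                # None
```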

### Exercise 4: Dog or tiger

What does LLaVA make of this hard image?
Use image 'dog_or_tiger.jpg'.

In [None]:
#YOUR CODE HERE

### Exercise 5: Select your own image
Select an image from the web and let ollama describe it.

In [None]:
#YOUR CODE HERE

### Short summary

Give a short summary of how well LLaVA performs on these tasks.
- image 1:
- image 2:
- image 3:
- image 4:
- image 5:

## 3. Ollama Webcam Challenge

We will use a script that captures an image with your webcam.

In [None]:
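# Take a picture with your webcam using the OpenCV library
import cv2
camera = cv2.VideoCapture(0)  # 0 selects the default camera
return_value, image = camera.read()
camera.release()  # free the webcam for other programs
if return_value:
    cv2.imwrite('webcam.jpg', image)
else:
    print('Could not read a frame from the webcam.')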

In [None]:
#display the image
import IPython
IPython.display.Image('webcam.jpg')

### Challenge: let ollama describe this image.

Use the code given above to let llava describe the image.

In [None]:
#YOUR CODE HERE 

What applications can you think of? Security? Art? Robotics?

## 4. Self driving car challenge

Self-driving cars still have trouble navigating their way around. They can 'see' traffic signs, lights, road markings and other traffic, but they often have difficulty 'reasoning' about their behaviour. Can LLaVA be a step in the right direction?

LLaVA can handle some scenes that trip up Convolutional Neural Networks (CNNs), so why not try it out?

#### Challenge: reason about traffic situations

You will create a model that reasons about traffic situations.

Your to-do list:
1. In Google Street View, take a screenshot of a street crossing or other traffic situation.
2. Create a Modelfile to create a model that can reason about traffic. Create a system prompt that tells the model to be a rational, self-driving car that explains its behaviour step by step. (Hint: do this outside of the notebook.)
3. Create the model with the Linux commands (see the cheat sheet).
4. Run the model using the screenshot you've made.
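As a starting point for steps 2 and 3, the sketch below builds the text of a Modelfile with a `FROM` line and a `SYSTEM` prompt. The system prompt wording, the helper names and the model name `traffic-llava` are placeholders chosen for illustration; adapt them to your own design.

```python
from pathlib import Path

def build_modelfile(base_model, system_prompt):
    """Return Modelfile text with a FROM line and a SYSTEM prompt."""
    return f'FROM {base_model}\nSYSTEM """{system_prompt}"""\n'

def save_modelfile(text, path='Modelfile'):
    """Write the Modelfile text to disk."""
    Path(path).write_text(text)

system_prompt = (
    'You are a rational self-driving car. '
    'Describe the traffic situation in the image and explain your next action step by step.'
)
modelfile_text = build_modelfile('llava', system_prompt)
print(modelfile_text)

# save_modelfile(modelfile_text)  # uncomment to write the file
# Then, in a terminal (the model name "traffic-llava" is a placeholder):
#   ollama create traffic-llava -f Modelfile
```

Once the model is created, it should be usable in the same `ollama.chat` call as in section 1 by passing its name as `model` and your screenshot in `images`.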

In [None]:
#YOUR CODE HERE


## 5. Bonus Challenge (medium to hard)

Create a front end with Gradio where you can run LLaVA.

More info in this video by Patrick Loeber: https://www.youtube.com/watch?v=eE7CamOE-PA
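A possible skeleton for the front end is sketched below, assuming Gradio is installed (`pip install gradio`). The function names, labels and fallback prompt are assumptions; `describe` stands for any function you write that takes an image path and a prompt and returns text, for example one wrapping the `ollama.chat` call from section 1.

```python
def make_answer_fn(describe):
    """Wrap a describe(image_path, prompt) callable with basic input checks for a UI."""
    def answer(image_path, prompt):
        if not image_path:
            return 'Please upload an image first.'
        return describe(image_path, prompt or 'Describe this image.')
    return answer

def build_demo(describe):
    """Create a Gradio interface: image and prompt in, text out."""
    import gradio as gr  # imported lazily; requires `pip install gradio`
    return gr.Interface(
        fn=make_answer_fn(describe),
        inputs=[gr.Image(type='filepath', label='Image'), gr.Textbox(label='Prompt')],
        outputs=gr.Textbox(label='LLaVA answer'),
        title='LLaVA with Ollama',
    )

# Example (needs gradio, a running Ollama server and your own `describe` function):
# build_demo(describe).launch()
```

Using `type='filepath'` means Gradio hands your function a path to the uploaded image, which matches the way image paths are passed to `ollama.chat` earlier in this notebook.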

In [None]:
#YOUR CODE HERE