# LLaMA 3.2 Vision for Image-based Tasks

In [1]:
import sys
# install the ollama Python client into the environment of the running kernel
!{sys.executable} -m pip install ollama

In [3]:
import ollama
# download the model through the Ollama server (the server must already be running)
ollama.pull("llama3.2-vision")

ProgressResponse(status='success', completed=None, total=None, digest=None)

## Visual Question Answering

In [5]:
# pass the image path and the question to the model in one user message
response = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        'role': 'user',
        'content': 'What is in the image?',
        'images': ['../pets.webp'],  # path to a local image file
    }],
)
print(response['message']['content'])

The image shows a cat and a dog standing in a field of yellow flowers.
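The single-turn call above can be wrapped in a small reusable function. The sketch below uses a hypothetical helper name, `ask_image`, and takes the chat function as a parameter (pass `ollama.chat` in practice) so it can be exercised without a running Ollama server. It also checks the image path up front, a design choice so that a typo surfaces as a plain `FileNotFoundError` before any request is made.

```python
from pathlib import Path
from typing import Callable


def ask_image(chat: Callable, question: str, image_path: str,
              model: str = "llama3.2-vision") -> str:
    """Hypothetical helper: ask one question about one local image.

    `chat` is assumed to behave like `ollama.chat`: it accepts `model` and
    `messages` keyword arguments and returns a response indexable as
    response['message']['content'].
    """
    # Fail early with a clear error if the image file does not exist
    if not Path(image_path).is_file():
        raise FileNotFoundError(image_path)
    response = chat(
        model=model,
        messages=[{
            'role': 'user',
            'content': question,
            'images': [image_path],
        }],
    )
    return response['message']['content']


# Usage with the real client (requires a running Ollama server):
# import ollama
# print(ask_image(ollama.chat, "What is in the image?", "../pets.webp"))
```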


## Visual Question Answering (Streaming)

In [6]:
# stream=True returns an iterator of partial responses, printed as they arrive
stream = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        'role': 'user',
        'content': 'What is in the image?',
        'images': ['../pets.webp'],
    }],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

The image shows a cat and dog standing in a field of yellow flowers. The cat is on the left side of the image, facing forward. It has brown and grey stripes with a fluffy tail that is curled up over its back. To the right of the cat is a dog with brown, grey, and white fur. It is facing forward and appears to be smiling. Both animals are standing in a field of yellow flowers with green grass. The background is blurry but appears to be a body of water behind the field of flowers.
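With streaming, each chunk carries only a fragment of the reply, so the complete answer has to be assembled by concatenating every `chunk['message']['content']`. A minimal sketch of that pattern as a hypothetical helper, `collect_stream`, which works on any iterable of chunk-like objects (such as the iterator returned by `ollama.chat(..., stream=True)`):

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Hypothetical helper: join streamed fragments into the full reply.

    If `echo` is True, each fragment is also printed as soon as it arrives,
    mirroring the print loop in the notebook cell.
    """
    parts = []
    for chunk in chunks:
        piece = chunk['message']['content']
        parts.append(piece)
        if echo:
            print(piece, end='', flush=True)
    if echo:
        print()  # end the streamed output with a newline
    return ''.join(parts)


# Offline example: hand-made chunks stand in for a real stream
fake_chunks = [{'message': {'content': 'The image shows '}},
               {'message': {'content': 'a cat and a dog.'}}]
text = collect_stream(fake_chunks, echo=False)
print(text)  # The image shows a cat and a dog.
```

Keeping the joined text (rather than only printing it) is what the image-comparison cells later in this notebook need, since they feed the descriptions into a second request.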

## Explaining a Meme

In [9]:
# pass the meme image and the request to the model, streaming the reply
stream = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        'role': 'user',
        'content': 'Explain this meme to me',
        'images': ['../meme.webp'],
    }],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

This meme is a play on the "You have a meme idea" and "You forget it" meme format. The top panel features a cartoon frog with a wide-eyed, toothy grin, while the bottom panel shows the same frog with a confused expression. The text reads, "You have a meme idea" in the top panel and "You forget it" in the bottom panel.

The meme is humorous because it pokes fun at the common experience of having a great idea for a meme, only to forget it moments later. The use of a cartoon frog as the character adds to the humor, as it's a relatable and endearing character that many people can identify with. Overall, the meme is a lighthearted way to poke fun at the fleeting nature of creativity and the tendency to forget good ideas.
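Each `ollama.chat` call is independent, so asking a follow-up question about the same meme means sending the earlier turns again as part of `messages`. A minimal sketch with a hypothetical helper, `add_turn`, that appends the model's previous reply and a new user question to the history; the reply text and question below are placeholders.

```python
def add_turn(history: list, assistant_reply: str, question: str) -> list:
    """Hypothetical helper: return a new message list with one more exchange.

    The image stays attached to the first user message in the history,
    so the follow-up question does not repeat it.
    """
    return history + [
        {'role': 'assistant', 'content': assistant_reply},
        {'role': 'user', 'content': question},
    ]


history = [{'role': 'user', 'content': 'Explain this meme to me',
            'images': ['../meme.webp']}]
# 'previous answer text' is a placeholder for the model's first reply
history = add_turn(history, 'previous answer text', 'Why is a frog used?')
print([m['role'] for m in history])  # ['user', 'assistant', 'user']

# With a running server, the extended history is passed back unchanged:
# import ollama
# reply = ollama.chat(model="llama3.2-vision", messages=history)
```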

## Document Understanding

In [10]:
# pass the document image and the question to the model, streaming the reply
stream = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        'role': 'user',
        'content': 'Explain the flow of method.',
        'images': ['../doc.png'],
    }],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

The flow of method is as follows:

1. **Load taxonomy**: Define files and data for extraction.
2. **Digitize**: Use OCR to detect text and its location.
3. **Classify**: Classify the documents from the specified list.
4. **Extract**: Extract information from the documents.
5. **Validate**: If needed, a human can confirm the extracted data.
6. **Export**: Export the extracted information for further use.

This flowchart provides a step-by-step guide to the method's process.
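A free-text answer like the one above is hard to post-process. `ollama.chat` accepts a `format` argument, and `format='json'` constrains the reply to JSON syntax, though not to any particular shape. The sketch below is built on assumptions: the helper name `extract_steps`, the prompt wording and the expected `{"steps": [...]}` shape are all illustrative, and the chat function is passed in so the parsing can be tested offline.

```python
import json
from typing import Callable, List


def extract_steps(chat: Callable, image_path: str,
                  model: str = "llama3.2-vision") -> List[str]:
    """Hypothetical helper: ask for a diagram's steps as JSON, return a list."""
    response = chat(
        model=model,
        messages=[{
            'role': 'user',
            # Assumed prompt wording; the requested shape is illustrative
            'content': ('List the steps of the method shown in this diagram '
                        'as JSON of the form {"steps": ["...", "..."]}.'),
            'images': [image_path],
        }],
        format='json',               # request JSON-formatted output
        options={'temperature': 0},  # reduce run-to-run variation
    )
    data = json.loads(response['message']['content'])
    # JSON mode does not enforce this exact shape, so check before using it
    steps = data.get('steps', []) if isinstance(data, dict) else []
    return [str(step) for step in steps]


# Usage with the real client (requires a running Ollama server):
# import ollama
# print(extract_steps(ollama.chat, "../doc.png"))
```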

## Image Comparison

In [33]:
# First image
description1 = ""
stream1 = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        'role': 'user',
        'content': 'Describe this image.',
        'images': ["../img1.jpg"]
    }],
    stream=True,
)
print("Image 1 Description:\n")
for chunk in stream1:
    content = chunk['message']['content']
    description1 += content
    print(content, end='', flush=True)

# Second image
description2 = ""
stream2 = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        'role': 'user',
        'content': 'Describe this image.',
        'images': ["../img2.webp"]
    }],
    stream=True,
)
print("\n\nImage 2 Description:\n")
for chunk in stream2:
    content = chunk['message']['content']
    description2 += content
    print(content, end='', flush=True)


Image 1 Description:

The image depicts a white ram with a distinctive curved horn on its head, standing on a rocky terrain with its head turned to the left. The ram's thick, fluffy coat is a uniform white, and its horn is brown and curved in a spiral shape. Its head is turned to the left, and its eyes are not visible. The ram's body is facing to the right, and its legs are spread out, with its left leg slightly bent.

In the background, there are dark branches and a rocky terrain, suggesting that the ram is in a natural environment, possibly a mountainous or rocky area. The overall atmosphere of the image is one of serenity and tranquility, with the ram seemingly at ease in its surroundings.

Image 2 Description:

The image depicts a young calf standing in a grassy field, its head held high and its ears perked up. The calf's fur is a mix of black and white, with its face and ears being black and its body being white. A blue and purple rope is tied around its head, possibly for trainin
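The cell above repeats the same streaming loop once per image. A sketch of the same logic as a loop over file paths, using a hypothetical helper `describe_images` that returns a dict mapping each path to its full description; `chat` would be `ollama.chat` in practice.

```python
from typing import Callable, Dict, Iterable


def describe_images(chat: Callable, paths: Iterable[str],
                    model: str = "llama3.2-vision",
                    echo: bool = True) -> Dict[str, str]:
    """Hypothetical helper: stream a description for each image and keep it."""
    descriptions = {}
    for i, path in enumerate(paths, start=1):
        if echo:
            prefix = "\n\n" if i > 1 else ""
            print(f"{prefix}Image {i} Description:\n")
        stream = chat(
            model=model,
            messages=[{'role': 'user', 'content': 'Describe this image.',
                       'images': [path]}],
            stream=True,
        )
        parts = []
        for chunk in stream:
            piece = chunk['message']['content']
            parts.append(piece)  # keep every fragment for the full text
            if echo:
                print(piece, end='', flush=True)
        descriptions[path] = ''.join(parts)
    return descriptions


# Usage with the real client (requires a running Ollama server):
# import ollama
# descs = describe_images(ollama.chat, ["../img1.jpg", "../img2.webp"])
# description1, description2 = descs["../img1.jpg"], descs["../img2.webp"]
```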

In [34]:
# Ask the model to compare the two text descriptions (no images are sent in this request)
comparison = ollama.chat(
    model="llama3.2-vision",
    messages=[
        {
            "role": "user",
            "content": f"""Compare the following two image descriptions and highlight their similarities and differences:

Description 1:
{description1}

Description 2:
{description2}
"""
        }
    ]
)

print("\n\nComparison Result:\n")
print(comparison['message']['content'])

Comparison Result:

Here are the similarities and differences between the two image descriptions:

**Similarities:**

* Both descriptions mention the animal's head and body orientation, with the ram's head turned to the left and the calf's head held high.
* Both descriptions mention the background environment, with the ram's image featuring a rocky terrain and the calf's image featuring a grassy field.
* Both descriptions convey a sense of atmosphere, with the ram's image being serene and tranquil and the calf's image suggesting a controlled environment.

**Differences:**

* **Animal type**: The most obvious difference is that the first image describes a ram, while the second image describes a calf.
* **Color and coat**: The ram has a uniform white coat with a brown curved horn, while the calf has a mix of black and white fur.
* **Environment**: The ram is in a natural, rocky environment, while the calf is in a controlled, grassy environment with a dirt path.
* **Accessories**: The c
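The comparison prompt in the last cell is written for exactly two descriptions. A sketch of a hypothetical `build_comparison_prompt` helper that produces the same layout for any number of descriptions; the resulting string would be sent as the `content` of a single text-only user message, as in that cell.

```python
from typing import Sequence


def build_comparison_prompt(descriptions: Sequence[str]) -> str:
    """Hypothetical helper: format N descriptions into one comparison prompt."""
    if len(descriptions) < 2:
        raise ValueError("need at least two descriptions to compare")
    header = (f"Compare the following {len(descriptions)} image descriptions "
              "and highlight their similarities and differences:\n")
    # One "Description i:" block per input, separated by blank lines
    blocks = [f"\nDescription {i}:\n{text}\n"
              for i, text in enumerate(descriptions, start=1)]
    return header + "".join(blocks)


prompt = build_comparison_prompt(["A white ram on rocks.", "A calf in a field."])
print(prompt)
```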