# Exploring Llama 3.2-Vision (locally) with Ollama

Code authored by: Shaw Talebi

[Blog link](https://towardsdatascience.com/multimodal-models-llms-that-can-see-and-hear-5c6737c981d3)
<br>[Video link](https://youtu.be/Ot2c5MKN_-w)

### Imports

In [3]:
import ollama

### Pull model

In [4]:
# Download the model weights (requires a running Ollama server)
ollama.pull('llama3.2-vision')

{'status': 'success'}

#### Basic Usage

In [3]:
response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['images/shaw-sitting.jpeg']
    }]
)

# The reply text is stored under message -> content
print(response['message']['content'])

This image shows a man sitting on a yellow ottoman with his hands clasped together. He is wearing a black polo shirt with a name tag that says "Shaw" and a black baseball cap with white text that reads, "THE DATA ENREPRENEUR." The background of the image appears to be an office or lounge area, with a large screen on the wall behind him displaying a presentation slide. There are also several chairs and tables in the background, suggesting that this may be a meeting room or common area for employees to gather and work.
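
The call above passes the image as a file path inside the message. As a minimal sketch, the message can be built by a small helper that checks the file exists first, so a mistyped path fails early rather than at the model call. The helper name `build_image_message` is hypothetical and not part of the `ollama` library, and the demo uses a temporary placeholder file instead of the real image so that it runs anywhere.

```python
from pathlib import Path
import tempfile


def build_image_message(prompt, image_path):
    """Return a user message dict in the shape used by the chat call above.

    Hypothetical helper (not part of ollama): raises FileNotFoundError
    early instead of passing a bad path on to the model call.
    """
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    return {
        'role': 'user',
        'content': prompt,
        'images': [str(path)],
    }


# Demo with a temporary placeholder file so the sketch runs without the real image
with tempfile.NamedTemporaryFile(suffix='.jpeg') as tmp:
    message = build_image_message('What is in this image?', tmp.name)
    print(message['role'], message['content'])
```

With the real image in place, the resulting dict could be passed as `messages=[message]` to `ollama.chat`.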


#### Image captioning - streaming

In [4]:
stream = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'Can you write a caption for this image?',
        'images': ['images/shaw-sitting.jpeg']
    }],
    stream=True,
)

# Print each chunk as it arrives, without adding newlines between chunks
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

This image features a man sitting on a yellow chair. He is wearing a black polo shirt with a blue name tag that says "Shaw", khaki pants, and a black baseball cap with white text that reads "THE DATA ENTHUSIAST". The man has his hands clasped together in front of him and appears to be smiling.

The background of the image consists of a room with various pieces of furniture. There is a green ottoman to the left of the yellow chair, and two blue chairs on the right side of the image. A brown table or desk sits behind the man, along with a fireplace. The walls are painted teal blue and have a wooden accent wall featuring holes for hanging items.

The overall atmosphere suggests that this may be a modern office space or co-working area where people can come to work, relax, or socialize.
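
The streaming loop above prints each chunk and then discards it. If the full caption is needed afterwards (for example, to save it), a rough sketch is to accumulate the chunks while printing. The helper `collect_stream` is a hypothetical name, and the demo feeds it a hand-written list of chunks shaped like the ones read in the loop above, so no model call is made.

```python
def collect_stream(stream, echo=True):
    """Consume a stream of chat chunks and return the concatenated text.

    Assumes each chunk exposes its text at chunk['message']['content'],
    as in the streaming loop above.
    """
    parts = []
    for chunk in stream:
        text = chunk['message']['content']
        if echo:
            print(text, end='', flush=True)
        parts.append(text)
    return ''.join(parts)


# Stand-in for a real stream: hand-written chunks, not actual model output
fake_stream = iter([
    {'message': {'content': 'A man sitting '}},
    {'message': {'content': 'on a yellow chair.'}},
])
caption = collect_stream(fake_stream)
print()
```

In the notebook, the `stream` object returned by `ollama.chat(..., stream=True)` would take the place of `fake_stream`.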

#### Explaining memes

In [5]:
stream = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'Can you explain this meme to me?',
        'images': ['images/ai-meme.jpeg']
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

The meme depicts Patrick Star from SpongeBob SquarePants, surrounded by various AI tools and symbols. The caption reads "Trying to build with AI today..." The image humorously illustrates the challenges of using AI in building projects, implying that it can be overwhelming and frustrating.

#### OCR

In [6]:
stream = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'Can you transcribe the text from this screenshot in a markdown format?',
        'images': ['images/5-ai-projects.jpeg']
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

Here is the transcription of the text in markdown format:

5 AI Projects You Can Build This Weekend (with Python)

1. **Resume Optimization (Beginner)**
	* Idea: build a tool that adapts your resume for a specific job description
2. **YouTube Lecture Summarizer (Beginner)**
	* Idea: build a tool that takes YouTube video links and summarizes them
3. **Automatically Organizing PDFs (Intermediate)**
	* Idea: build a tool to analyze the contents of each PDF and organize them into folders based on topics
4. **Multimodal Search (Intermediate)**
	* Idea: use multimodal embeddings to represent user queries, text knowledge, and images in a single space
5. **Desktop QA (Advanced)**
	* Idea: connect a multimodal knowledge base to a multimodal model like Llama-3.2-11B-Vision

Note that I've added some minor formatting changes to make the text more readable in markdown format. Let me know if you have any further requests.
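
The captioning, meme, and OCR cells above repeat the same streaming call with a different prompt and image. One possible way to factor that out is sketched below. The function name `ask_about_image` and its `chat_fn` parameter are assumptions for illustration: `chat_fn` lets a stand-in replace `ollama.chat` so the sketch can run without a model, and `fake_chat` is exactly such a stand-in.

```python
def ask_about_image(prompt, image_path, model='llama3.2-vision', chat_fn=None):
    """Yield the streamed text pieces of a reply about one image.

    Hypothetical wrapper around the streaming calls above. chat_fn
    defaults to ollama.chat and is imported lazily on first use.
    """
    if chat_fn is None:
        import ollama  # only needed when no stand-in is supplied
        chat_fn = ollama.chat
    stream = chat_fn(
        model=model,
        messages=[{'role': 'user', 'content': prompt, 'images': [image_path]}],
        stream=True,
    )
    for chunk in stream:
        yield chunk['message']['content']


def fake_chat(model, messages, stream):
    """Stand-in for ollama.chat that echoes the prompt back in two chunks."""
    yield {'message': {'content': 'Echo: '}}
    yield {'message': {'content': messages[0]['content']}}


# Demo with the stand-in; no model or image file is touched here
for piece in ask_about_image('Can you explain this meme to me?',
                             'images/ai-meme.jpeg', chat_fn=fake_chat):
    print(piece, end='', flush=True)
print()
```

Dropping the `chat_fn=fake_chat` argument would route the same call through `ollama.chat`, assuming the model has been pulled and the image file exists.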