# Local Visual QA with LLaMA 3.2 Vision
## ABB #1 - Session 3

Code authored by: Shaw Talebi

### imports

In [1]:
import ollama
import gradio as gr
import time


### basic usage

In [2]:
# pull model
ollama.pull('llama3.2-vision')

{'status': 'success'}

In [3]:
# interact with model (locally)
stream = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is this paper about?',
        'images': ['papers/attention-is-all-you-need.png']
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

The abstract of the paper states that it proposes a new system for translating text from German to English. The system, known as BLEU (Bilingual Evaluation Understudy), uses deep learning techniques to generate more accurate and natural-sounding translations than existing methods.

**Key Features:**

* **Parallelization**: The model achieves significant speedup by dividing the translation task into smaller sub-tasks that can be processed in parallel.
* **Improved Accuracy**: The system produces more accurate and natural-sounding translations than existing methods.
* **Efficient Training**: The model is trained using a combination of supervised and unsupervised learning techniques, which allows it to learn from large amounts of data quickly and efficiently.

**Impact:**

The paper presents a new approach to machine translation that has the potential to significantly improve the accuracy and efficiency of language translation systems. By leveraging deep learning techniques and paralleliz
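
When the full reply is needed as a single string (for logging or timing, say) rather than printed token by token, the same loop can be wrapped in a small helper. The sketch below is illustrative only: `collect_stream` is a hypothetical name, and it assumes each chunk has the `chunk['message']['content']` shape used in the cell above. A hand-built iterator stands in for `ollama.chat(..., stream=True)` so the example runs without a model.

```python
import time


def collect_stream(stream):
    """Join streamed chunks into one string and time how long the stream took.

    Hypothetical helper: assumes each chunk looks like
    {'message': {'content': '...'}}, as in the loop above.
    """
    start = time.perf_counter()
    parts = []
    for chunk in stream:
        parts.append(chunk['message']['content'])
    elapsed = time.perf_counter() - start
    return ''.join(parts), elapsed


# Stand-in for the iterator returned by ollama.chat(..., stream=True)
fake_stream = iter([
    {'message': {'content': 'Hello'}},
    {'message': {'content': ', world'}},
])

text, seconds = collect_stream(fake_stream)
print(text)
print(f'generated in {seconds:.4f} s')
```

With a real model, the `stream` object from the cell above could be passed in place of `fake_stream`.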

### gradio UI

In [4]:
# Function to interact with the Ollama model
def stream_chat(message, history):
    """
    Streams the response from the Ollama model and sends it to the Gradio UI.
    
    Args:
        message (dict): The user input from the multimodal textbox, with keys
            "text" (str) and "files" (list of file paths).
        history (list): A list of previous conversation messages.
        
    Yields:
        str: The chatbot's response so far, growing chunk by chunk.
    """
    # Append the user message (text plus any attached images) to the conversation history
    history.append({"role": "user", "content": message["text"], "images": message["files"]})
    
    # Initialize streaming from Ollama
    stream = ollama.chat(
        model='llama3.2-vision',
        messages=history,  # Full chat history including the current user message
        stream=True,
    )
    
    response_text = ""
    for chunk in stream:
        content = chunk['message']['content']
        response_text += content
        yield response_text  # Send the response incrementally to the UI

    # Append the assistant's full response to the history
    history.append({"role": "assistant", "content": response_text})
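
The function above appends to the `history` list that Gradio passes in. One possible alternative, sketched below under stated assumptions, builds a fresh message list for each call. `to_ollama_message` and `build_messages` are hypothetical helpers, not part of Gradio or Ollama. The sketch assumes history entries are dicts with `role` and `content` keys, and as a simplification it keeps only turns whose content is plain text, so images from earlier turns are dropped.

```python
def to_ollama_message(message):
    """Convert a multimodal input dict ({"text": ..., "files": [...]})
    into a user message dict. Hypothetical helper for illustration."""
    ollama_message = {"role": "user", "content": message.get("text", "")}
    files = message.get("files") or []
    if files:
        # Only attach the "images" key when at least one file was uploaded
        ollama_message["images"] = list(files)
    return ollama_message


def build_messages(history, message):
    """Return a new message list without mutating the history passed in.

    Assumption: only turns whose content is a plain string are kept,
    which discards image-only entries from earlier turns.
    """
    text_turns = [
        {"role": turn["role"], "content": turn["content"]}
        for turn in history
        if isinstance(turn.get("content"), str)
    ]
    return text_turns + [to_ollama_message(message)]


# Example with a hand-written history
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
new_message = {"text": "What is this paper about?",
               "files": ["papers/attention-is-all-you-need.png"]}
messages = build_messages(history, new_message)
print(messages[-1])
```

The resulting `messages` list could then be passed as the `messages` argument in place of `history`.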

In [5]:
# Create a Gradio ChatInterface
gr.ChatInterface(
    fn=stream_chat,  # The function handling the chat
    type="messages",  # Pass history as a list of {"role": ..., "content": ...} dicts
    examples=[{"text": "What is this paper about?", "files": ['papers/attention-is-all-you-need.png']}],  # Example inputs
    multimodal=True,
).launch()

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


