# Video Generation

In this example, we will generate a description of a video using `Qwen2-VL`.

This feature is currently in beta: it may not work as expected and is only supported by `Qwen2-VL` at the moment.



## Install Dependencies

Video input for Qwen2-VL requires the additional `qwen-vl-utils` package, which reads the video file and samples and resizes its frames before they are passed to the processor.


In [None]:
!pip install -U mlx-vlm qwen-vl-utils

## Import Dependencies

In [6]:
from pprint import pprint
from mlx_vlm import load
from mlx_vlm.video_generate import generate
from qwen_vl_utils import process_vision_info

import mlx.core as mx

In [None]:
# Load the model and processor
model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-8bit")

In [11]:
# Messages containing a video and a text query
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "videos/fastmlx_local_ai_hub.mp4",
                "max_pixels": 360 * 360,
                "fps": 1.0,
            },
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

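# Preparation for inference: render the chat prompt as a string,
# including the placeholder tokens for the video
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Read the video, sample frames at `fps`, and resize them to fit `max_pixels`
image_inputs, video_inputs = process_vision_info(messages)
# Tokenize the prompt and convert the sampled frames into patch tensors
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",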
)
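The `"fps": 1.0` entry in the message controls how densely the clip is sampled. The sketch below illustrates the idea, assuming evenly spaced frames and a frame count rounded to an even number because Qwen2-VL groups frames in pairs. `sample_frame_indices` is a hypothetical helper, not part of `qwen-vl-utils`; the real library also reads the video and clamps the frame count to its own limits.

```python
def sample_frame_indices(total_frames, video_fps, target_fps=1.0):
    """Hypothetical helper: evenly spaced frame indices at roughly `target_fps`.

    Assumes the video has at least 2 frames.
    """
    duration_s = total_frames / video_fps
    # Round the frame count to an even number (assumed: frames are grouped in
    # pairs) and keep at least 2 frames.
    nframes = max(2, round(duration_s * target_fps / 2) * 2)
    # Spread the chosen frames evenly from the first to the last frame.
    step = (total_frames - 1) / (nframes - 1)
    return [round(i * step) for i in range(nframes)]


# A hypothetical 10-second clip recorded at 30 fps, sampled at 1 fps
indices = sample_frame_indices(total_frames=300, video_fps=30.0, target_fps=1.0)
print(len(indices), indices[0], indices[-1])
```

Lowering `fps` reduces the number of frames, and therefore the number of visual tokens the model has to process, at the cost of temporal detail.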


In [12]:
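# Convert the PyTorch tensors returned by the processor to MLX arrays
input_ids = mx.array(inputs['input_ids'])
pixel_values = mx.array(inputs['pixel_values_videos'])
mask = mx.array(inputs['attention_mask'])
# The video's (temporal, height, width) patch grid is passed to the model
# under the `image_grid_thw` keyword
image_grid_thw = mx.array(inputs['video_grid_thw'])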

kwargs = {
    "image_grid_thw": image_grid_thw,
}
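`video_grid_thw` records the video's patch grid as (temporal, height, width). The back-of-the-envelope sketch below shows how `fps` and `max_pixels` translate into that grid and into the number of visual tokens. It assumes Qwen2-VL's 14-pixel patches, 2 × 2 spatial merging into one token, and two frames per temporal patch; `smart_resize` here is a simplified re-implementation of the resizing rule (it ignores the minimum-pixel bound and frame-count limits), and the 60-second 1080p clip is hypothetical.

```python
import math

# Assumed Qwen2-VL constants
PATCH_SIZE = 14            # pixels per patch side
MERGE_SIZE = 2             # 2x2 patches are merged into one token
TEMPORAL_PATCH_SIZE = 2    # consecutive frames per temporal patch
FACTOR = PATCH_SIZE * MERGE_SIZE  # frame sides are multiples of 28


def smart_resize(height, width, max_pixels, factor=FACTOR):
    """Round (height, width) to multiples of `factor` within `max_pixels`."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Shrink both sides by the same ratio, rounding down to stay in budget.
        beta = math.sqrt(height * width / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    return h_bar, w_bar


def estimate_video_grid(duration_s, fps, height, width, max_pixels):
    # Sampled frame count, rounded to a multiple of the temporal patch size
    nframes = round(duration_s * fps / TEMPORAL_PATCH_SIZE) * TEMPORAL_PATCH_SIZE
    h, w = smart_resize(height, width, max_pixels)
    grid = (nframes // TEMPORAL_PATCH_SIZE, h // PATCH_SIZE, w // PATCH_SIZE)
    # Tokens seen by the language model after 2x2 spatial merging
    tokens = grid[0] * (grid[1] // MERGE_SIZE) * (grid[2] // MERGE_SIZE)
    return grid, tokens


grid, tokens = estimate_video_grid(60, 1.0, 1080, 1920, 360 * 360)
print(grid, tokens)  # (30, 18, 34) 4590
```

Under these assumptions a one-minute clip already costs thousands of tokens, which is why the example caps each frame at `360 * 360` pixels and samples one frame per second.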

In [None]:
response = generate(model, processor, input_ids, pixel_values, mask, temp=0.7, max_tokens=100, **kwargs)
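The `temp=0.7` argument controls how random the sampled description is. As a rough illustration (a generic temperature-scaled softmax, not mlx-vlm's own sampling code), dividing the logits by a temperature below 1.0 shifts probability toward the most likely token; the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temp):
    # Scale logits by 1/temp, then apply a numerically stable softmax.
    scaled = [x / temp for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures give more deterministic, repeatable descriptions; higher ones give more varied wording.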

In [None]:
pprint(response)


In [None]:
# Display the video inline to compare it with the generated description (requires ipywidgets)
from ipywidgets import Video
Video.from_file("videos/fastmlx_local_ai_hub.mp4", width=320, height=240)