---
title: "QwenVL Paper Summary"
author: "Samith Va"
date: "2024-07-27"
categories: [LLM]
format:
  html:
    code-fold: false
    toc: true
# jupyter: python3
---


Qwen-VL is a series of open-source large vision-language models (LVLMs), developed by Alibaba Group, that understand both text and images. What sets it apart from its base language model, Qwen-7B, is a **visual receptor** made up of a **language-aligned visual encoder** and a **position-aware adapter**.

## Features of Qwen-VL

Four features set Qwen-VL apart from other LVLMs:

1. Leading performance and open source (fewer parameters than comparable LVLMs, only 9.6B)
2. Multilingual
3. Multi-image input
4. More accurate: trained on higher-resolution images for fine-grained visual understanding

## Model Architecture

There are three basic components in Qwen-VL:

- Base model: Qwen-7B
- Visual encoder: Vision Transformer (ViT), initialized with pretrained weights from OpenCLIP's ViT-bigG
- Position-aware vision-language adapter
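To make the adapter's role more concrete, here is a minimal sketch. It assumes, following the Qwen-VL paper's description, that the adapter is a single cross-attention layer in which a fixed set of learnable query vectors attends over the image patch features, compressing them to a fixed-length sequence. The function name, the tiny sizes, and the random "positional" vectors are illustrative placeholders, not the model's actual implementation (the paper reports compressing to 256 visual tokens and using 2D absolute positional encodings).

```python
import math
import random


def cross_attention_compress(features, key_pos, queries):
    """Compress patch features to len(queries) vectors with single-head attention."""
    dim = len(features[0])
    # Position information enters through the keys (feature + positional vector).
    keys = [[f + p for f, p in zip(feat, pos)] for feat, pos in zip(features, key_pos)]
    outputs = []
    for query in queries:
        # Scaled dot-product score between this query and every key.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of the original patch features.
        outputs.append(
            [sum(w * feat[j] for w, feat in zip(weights, features)) for j in range(dim)]
        )
    return outputs


def random_vectors(count, dim):
    """Stand-in for learned parameters or encoder outputs."""
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(count)]


random.seed(0)
num_patches, num_queries, dim = 16, 4, 8  # toy sizes, chosen for readability
features = random_vectors(num_patches, dim)  # would come from the visual encoder
key_pos = random_vectors(num_patches, dim)   # placeholder for positional encodings
queries = random_vectors(num_queries, dim)   # learnable in a real model

compressed = cross_attention_compress(features, key_pos, queries)
print(len(compressed), len(compressed[0]))  # 4 8
```

The point of the design is that the output length depends only on the number of queries, so the language model always receives the same number of visual tokens regardless of how many patches the encoder produces.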

## Training Process

Training consists of three stages:

1. Pre-training (1.4 billion image-text pairs, 77.3% in English and 22.7% in Chinese)
2. Multi-task pre-training
3. Supervised fine-tuning (results in Qwen-VL-Chat)
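As a quick worked example, the language split quoted for the first stage can be turned into absolute counts. The constant names are illustrative, and the result is only as precise as the rounded percentages above.

```python
TOTAL_SAMPLES = 1_400_000_000  # 1.4 billion pre-training samples
ENGLISH_PER_MILLE = 773        # 77.3%
CHINESE_PER_MILLE = 227        # 22.7%

# Integer arithmetic avoids floating-point rounding noise.
english_samples = TOTAL_SAMPLES * ENGLISH_PER_MILLE // 1000
chinese_samples = TOTAL_SAMPLES * CHINESE_PER_MILLE // 1000

print(f"English: {english_samples:,}")  # English: 1,082,200,000
print(f"Chinese: {chinese_samples:,}")  # Chinese: 317,800,000
```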

![Pre-train Dataset](pretrain_data.png)



### 2. Writing an Image
```python
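import cv2

# Read an image; OpenCV loads color images in BGR channel order
image = cv2.imread('image.jpg')
if image is None:  # imread returns None instead of raising on failure
    raise FileNotFoundError('image.jpg could not be read')

# Write the image to a file; the format is chosen from the file extension
cv2.imwrite('output_image.jpg', image)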
```
### 3. Converting Color Spaces

OpenCV provides functions to convert images between different color spaces, such as RGB, BGR, HSV, etc.

```python
import cv2
# Read an image
image = cv2.imread('image.jpg')

# Convert BGR to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Convert BGR to HSV
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
cv2.imwrite('gray_image.jpg', gray_image)
cv2.imwrite('hsv_image.jpg', hsv_image)
```
::: {layout-ncol=2 layout-valign="bottom"}
![Grey](gray_image.jpg)

![HSV](hsv_image.jpg)
:::
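To see what the grayscale conversion does to a single pixel, the sketch below applies the luma weights that the OpenCV documentation gives for RGB-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). The helper `bgr_to_gray` is a hypothetical pure-Python stand-in, not an OpenCV function, and OpenCV's own fixed-point arithmetic may differ from it by one intensity level.

```python
def bgr_to_gray(pixel):
    """Return the luma of a (blue, green, red) pixel with 0-255 channels."""
    blue, green, red = pixel  # OpenCV stores channels in BGR order
    return round(0.299 * red + 0.587 * green + 0.114 * blue)


print(bgr_to_gray((255, 0, 0)))  # pure blue  -> 29
print(bgr_to_gray((0, 255, 0)))  # pure green -> 150
print(bgr_to_gray((0, 0, 255)))  # pure red   -> 76
```

Green contributes the most and blue the least, which is why a saturated blue region looks much darker than a saturated green one after conversion.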

### 4. Resizing and Cropping Images

Resize images to a specific width and height or by a scaling factor. Cropping involves selecting a region of interest (ROI) from the image.

```python
import cv2
# Read an image
image = cv2.imread('image.jpg')

# Resize the image to a specific width and height
new_width, new_height = 200, 200
resized_image = cv2.resize(image, (new_width, new_height))

# Resize the image by a scaling factor
scale_percent = 50  # percent of original size
width = int(image.shape[1] * scale_percent / 100)
height = int(image.shape[0] * scale_percent / 100)
resized_image = cv2.resize(image, (width, height))
cv2.imwrite('resized.jpg', resized_image)

# Crop a region of interest (ROI) from the image
x, y, w, h = 100, 100, 200, 200  # Example coordinates and dimensions
cropped_image = image[y:y+h, x:x+w]
cv2.imwrite('cropped.jpg', cropped_image)

```

::: {layout-ncol=2 layout-valign="bottom"}
![Resized](resized.jpg)

![Cropped](cropped.jpg)
:::
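Two details are easy to get wrong here: `image.shape` is ordered `(height, width, channels)` while `cv2.resize` expects `(width, height)`, and NumPy slicing silently truncates a crop that runs past the image border. The helpers below are a small sketch of how one might handle both. `scaled_size` and `clamp_roi` are hypothetical names, not part of OpenCV.

```python
def scaled_size(shape, scale_percent):
    """Return (width, height) for cv2.resize from a (height, width, ...) shape."""
    height, width = shape[:2]
    # Never return a zero dimension, even for very small scale factors.
    return (
        max(1, int(width * scale_percent / 100)),
        max(1, int(height * scale_percent / 100)),
    )


def clamp_roi(x, y, w, h, shape):
    """Clip an (x, y, w, h) box so that it stays inside the image."""
    height, width = shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return x0, y0, max(0, x1 - x0), max(0, y1 - y0)


shape = (480, 640, 3)  # example: a 640x480 color image
print(scaled_size(shape, 50))                # (320, 240)
print(clamp_roi(500, 400, 200, 200, shape))  # (500, 400, 140, 80)
```

Checking the clamped width and height before slicing makes it obvious when the requested region was only partly inside the image.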

### 5. Flipping Images

The `cv2.flip` function flips an array in one of three ways (row and column indices are 0-based):

`dst = cv2.flip(src, flipCode)`

`dst` is the output array, with the same size and type as `src`.

The function has 2 required arguments:

- `src`: input image
- `flipCode`: a flag that specifies how to flip the array. `0` flips around the x-axis (vertical flip), a positive value (for example, `1`) flips around the y-axis (horizontal flip), and a negative value (for example, `-1`) flips around both axes.


```python
import cv2
import matplotlib.pyplot as plt

img_bgr = cv2.imread('image.jpg')
# matplotlib expects RGB, so convert from OpenCV's BGR order
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

img_rgb_flipped_horz = cv2.flip(img_rgb, 1)
img_rgb_flipped_vert = cv2.flip(img_rgb, 0)
img_rgb_flipped_both = cv2.flip(img_rgb, -1)

# Show the images
plt.figure(figsize=(18, 5))
plt.subplot(141)  # 141: 1 row, 4 columns, current index of subplot
plt.imshow(img_rgb)
plt.title("Original")
plt.subplot(142)
plt.imshow(img_rgb_flipped_horz)
plt.title("Horizontal Flip")
plt.subplot(143)
plt.imshow(img_rgb_flipped_vert)
plt.title("Vertical Flip")
plt.subplot(144)
plt.imshow(img_rgb_flipped_both)
plt.title("Both Flipped")
plt.show()
```
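
The meaning of the three flip codes is easier to verify on a tiny grid than on a photo. The sketch below mimics the documented `flipCode` semantics with plain Python lists. The `flip` function here is a hypothetical stand-in written for illustration and does not call OpenCV.

```python
def flip(grid, flip_code):
    """Flip a 2D list the way cv2.flip's flipCode is documented to behave."""
    if flip_code == 0:
        # Around the x-axis: reverse the order of the rows (vertical flip).
        return [row[:] for row in grid[::-1]]
    if flip_code > 0:
        # Around the y-axis: reverse each row (horizontal flip).
        return [row[::-1] for row in grid]
    # Negative: both axes, equivalent to a 180-degree rotation.
    return [row[::-1] for row in grid[::-1]]


grid = [[1, 2, 3],
        [4, 5, 6]]
print(flip(grid, 0))   # [[4, 5, 6], [1, 2, 3]]
print(flip(grid, 1))   # [[3, 2, 1], [6, 5, 4]]
print(flip(grid, -1))  # [[6, 5, 4], [3, 2, 1]]
```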

### 6. Reading and Displaying Videos

```python
import cv2

# Open a video file
cap = cv2.VideoCapture('video.mp4')

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Video', frame)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

```
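
The fixed `waitKey(25)` delay above plays every video at roughly 40 frames per second regardless of its real frame rate. One possible refinement, sketched here, is to derive the delay from the frame rate reported by `cap.get(cv2.CAP_PROP_FPS)`. The helper `frame_delay_ms` is a hypothetical name, and the fallback covers sources that report a frame rate of 0.

```python
def frame_delay_ms(fps, fallback_ms=25):
    """Return a per-frame delay in milliseconds suitable for cv2.waitKey."""
    if fps is None or fps <= 0:
        return fallback_ms  # the frame rate is unknown, keep the fixed delay
    # waitKey(0) would block until a key press, so never return less than 1.
    return max(1, round(1000 / fps))


print(frame_delay_ms(25))  # 40
print(frame_delay_ms(30))  # 33
print(frame_delay_ms(0))   # 25
```

In the loop above, this would be used as `cv2.waitKey(frame_delay_ms(cap.get(cv2.CAP_PROP_FPS)))`. Decoding and display time are not accounted for, so playback will still run slightly slower than real time.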


For more information, see the [OpenCV Documentation](https://docs.opencv.org/4.x/index.html).
