# SOWLv2 Demo Notebook
This notebook demonstrates **SOWLv2**, which combines [OWLv2](https://huggingface.co/docs/transformers/en/model_doc/owlv2) and [SAM2](https://github.com/facebookresearch/sam2) for text-prompted object segmentation on single images, folders of frames, and videos.


### 1 · 🚀 Clone & Install the SOWLv2 repo

In [1]:
# Install SOWLv2 and required packages
!pip install git+https://github.com/bladeszasza/SOWLv2.git

Collecting git+https://github.com/bladeszasza/SOWLv2.git
  Cloning https://github.com/bladeszasza/SOWLv2.git to /tmp/pip-req-build-d65xd38n
  Running command git clone --filter=blob:none --quiet https://github.com/bladeszasza/SOWLv2.git /tmp/pip-req-build-d65xd38n
  Resolved https://github.com/bladeszasza/SOWLv2.git to commit a24ed0156d16ecd3b566c39211e624c3dec5be3b
  Preparing metadata (setup.py) ... done
Collecting sam2>=1.1.0 (from sowlv2==0.1.0)
  Downloading sam2-1.1.0.tar.gz (152 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 152.8/152.8 kB 11.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting hydra-core>=1.3.2 (from sam2>=1.1.0->sowlv2==0.1.0)
  Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting iopath>=0.1.10 (from sam2>=1.1.0->sowlv2==0.1.0)
  Down

### 2 · 🧪 Utils

In [2]:
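from IPython.display import HTML
from base64 import b64encode
from google.colab import files  # only available when running inside Google Colab

def show_video(video_path, width=640):
    """Embed an MP4 file in the notebook as a base64 data URL."""
    # Read inside a context manager so the file handle is closed promptly
    with open(video_path, 'rb') as f:
        mp4 = f.read()
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
    return HTML(f"""
    <video width="{width}" controls>
        <source src="{data_url}" type="video/mp4">
    </video>
    """)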

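Because `show_video` inlines the whole file as base64, the notebook payload is roughly a third larger than the video itself. The sketch below estimates the data URL length before embedding. The helper name `data_url_length` and the 1,000,000-byte placeholder payload are illustrative assumptions, not part of SOWLv2.

```python
from base64 import b64encode

PREFIX = "data:video/mp4;base64,"

def data_url_length(num_bytes: int) -> int:
    """Length in characters of the data URL for a file of num_bytes bytes."""
    # Base64 turns every group of 3 input bytes (the last group is padded)
    # into 4 output characters.
    return len(PREFIX) + 4 * ((num_bytes + 2) // 3)

# Placeholder payload standing in for a video file (assumed size).
payload = bytes(1_000_000)
actual = len(PREFIX + b64encode(payload).decode())
print("estimated:", data_url_length(len(payload)), "actual:", actual)
```

For long or high-resolution clips this overhead can make the notebook slow to render, so downloading the file may be preferable to embedding it.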

### 3 · 📷 Single Image Example
We create a sample image and run `sowlv2-detect` with a text prompt.


In [1]:
from skimage import data
import imageio
import os

# Create a sample image (cat) using skimage
image = data.chelsea()  # a cat image
imageio.imwrite('cat.png', image)

# Run the SOWLv2 detector on the image
!sowlv2-detect --prompt "cat" --input cat.png --output output_image --threshold 0.1

# List output files
print("Output directory contents:", os.listdir('output_image'))

2025-05-08 12:03:45.371053: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1746705825.591947    1893 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1746705825.652453    1893 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-08 12:03:46.129420: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
preprocessor_config.json: 100% 425/425 [00:00<00:00, 2.08MB/s]
tokenizer_config.json: 100% 1.10k/1.10k [00:00<00:00, 
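
Outside a notebook, the same command can be assembled as an argument list and run with `subprocess`. This is a minimal sketch that uses only the flags appearing in this notebook's cells (`--prompt`, `--input`, `--output`, `--threshold`, `--sam-model`, `--owl-model`). The helper names `build_detect_command` and `run_detect` are hypothetical and not part of SOWLv2.

```python
import shutil
import subprocess

def build_detect_command(prompt, input_path, output_dir, threshold=0.1,
                         sam_model=None, owl_model=None):
    """Build the sowlv2-detect argument list (hypothetical helper)."""
    cmd = [
        "sowlv2-detect",
        "--prompt", prompt,
        "--input", str(input_path),
        "--output", str(output_dir),
        "--threshold", str(threshold),
    ]
    # The model flags are optional; omit them to keep the CLI defaults.
    if sam_model:
        cmd += ["--sam-model", sam_model]
    if owl_model:
        cmd += ["--owl-model", owl_model]
    return cmd

def run_detect(**kwargs):
    """Run the command if the CLI is installed, otherwise just report it."""
    cmd = build_detect_command(**kwargs)
    if shutil.which(cmd[0]) is None:
        print("sowlv2-detect not found on PATH; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)

command = build_detect_command("cat", "cat.png", "output_image")
print(command)
```

Passing a list rather than a shell string avoids quoting problems when the prompt contains spaces.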

### 4 · 📁 Frames Folder Example
We create a folder with sample images and run the detector on the folder.


In [11]:
from skimage import data
import os
import imageio

os.makedirs('frames', exist_ok=True)
# Create sample images: astronaut (person) and camera (object)
imageio.imwrite('frames/person.png', data.astronaut())
imageio.imwrite('frames/object.png', data.camera())

# Run the detector on the frames folder
!sowlv2-detect --prompt "person" --input frames --output output_frames --threshold 0.1

# List output files
print("Output directory contents:", os.listdir('output_frames'))

2025-05-08 19:25:31.872542: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-08 19:25:31.889869: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1746732331.911582    7107 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1746732331.918050    7107 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-08 19:25:31.939828: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instr
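
When a folder run produces many files, a raw `os.listdir` printout is hard to scan. The sketch below groups an output directory's files by extension instead. The helper name `summarize_outputs` is an assumption for illustration, and it makes no claim about which files SOWLv2 actually writes.

```python
from collections import Counter
from pathlib import Path

def summarize_outputs(output_dir):
    """Count files under output_dir (recursively) by lowercase extension."""
    root = Path(output_dir)
    if not root.is_dir():
        # Nothing to summarize if the detector has not run yet.
        return Counter()
    return Counter(
        p.suffix.lower() or "<none>"
        for p in root.rglob("*")
        if p.is_file()
    )

print(summarize_outputs("output_frames"))
```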

### 5 · 🎬 Video Example
We download a small sample video and run the detector on it with a prompt.


In [3]:
import os

# Remove results from a previous run (-f: no error if the folder is missing)
%rm -rf output_video

# Download a sample video
!wget -O malamut.mp4 "https://dm0qx8t0i9gc9.cloudfront.net/watermarks/video/Sks4W_9Alj1v0vmgb/videoblocks-young-beautiful-female-walking-with-siberian-husky-dog-on-the-beach-woman-runs-and-plays-with-husky-dog_hxp1nfbns__4ed9e1619fcbfd31478e7384d5950220__P360.mp4"


# Run the detector on the video
!sowlv2-detect --prompt "person" --input malamut.mp4 --output output_video --threshold 0.1

# List output files (frame overlays and masks)
print("Output directory contents:", os.listdir('output_video'))

rm: cannot remove 'output_video': No such file or directory
--2025-05-08 15:58:35--  https://dm0qx8t0i9gc9.cloudfront.net/watermarks/video/rPUojDiU3liixc93j/a011c020-240530ut-mugi-02-v1-0026-o4-isjlro6__e142ab6233f74a73bd6d1ef70115058e__P360.mp4
Resolving dm0qx8t0i9gc9.cloudfront.net (dm0qx8t0i9gc9.cloudfront.net)... 3.169.243.212, 3.169.243.231, 3.169.243.63, ...
Connecting to dm0qx8t0i9gc9.cloudfront.net (dm0qx8t0i9gc9.cloudfront.net)|3.169.243.212|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 765444 (748K) [video/mp4]
Saving to: ‘sample.mp4’


2025-05-08 15:58:36 (2.72 MB/s) - ‘sample.mp4’ saved [765444/765444]

2025-05-08 15:58:40.279215: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1746719920.299299    2886 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN w
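
The `rm: cannot remove 'output_video'` message in the output above appears because the folder did not exist yet. A portable alternative is to clear the folder from Python, as in this sketch. The helper name `remove_output_dir` is hypothetical, and the demo uses a temporary directory so it does not touch real results.

```python
import shutil
import tempfile
from pathlib import Path

def remove_output_dir(path):
    """Delete path and its contents; return True if it existed beforehand."""
    target = Path(path)
    existed = target.is_dir()
    # ignore_errors=True makes a missing directory a no-op instead of an error.
    shutil.rmtree(target, ignore_errors=True)
    return existed

with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "output_video"
    scratch.mkdir()
    first = remove_output_dir(scratch)   # directory exists on the first call
    second = remove_output_dir(scratch)  # already gone on the second call
    print(first, second)
```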

The next cell uses the larger models (`facebook/sam2.1-hiera-large` and `google/owlv2-large-patch14-ensemble`), which need more GPU RAM.
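
Before switching to the larger models, it can help to check how much GPU memory the runtime offers. This is a minimal sketch assuming PyTorch is installed, which the notebook itself does not state. The helper name `total_gpu_memory_gib` is illustrative. No specific memory requirement is documented here, so no threshold is hard-coded.

```python
def total_gpu_memory_gib(device_index=0):
    """Return total memory of a CUDA device in GiB, or None if unavailable."""
    try:
        import torch  # assumed to be installed alongside the models
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3

mem = total_gpu_memory_gib()
if mem is None:
    print("No CUDA GPU detected")
else:
    print(f"GPU memory: {mem:.1f} GiB")
```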

In [5]:
import os

# Remove results from a previous run (-f: no error if the folder is missing)
%rm -rf output_video

# Download a sample video
!wget -O malamut.mp4 "https://dm0qx8t0i9gc9.cloudfront.net/watermarks/video/Sks4W_9Alj1v0vmgb/videoblocks-young-beautiful-female-walking-with-siberian-husky-dog-on-the-beach-woman-runs-and-plays-with-husky-dog_hxp1nfbns__4ed9e1619fcbfd31478e7384d5950220__P360.mp4"

# Run the detector on the video
!sowlv2-detect --prompt "dog" --input malamut.mp4 --output output_video --threshold 0.1 --sam-model "facebook/sam2.1-hiera-large" --owl-model "google/owlv2-large-patch14-ensemble"

# List output files (frame overlays and masks)
print("Output directory contents:", os.listdir('output_video'))

--2025-05-08 19:15:53--  https://dm0qx8t0i9gc9.cloudfront.net/watermarks/video/Sks4W_9Alj1v0vmgb/videoblocks-young-beautiful-female-walking-with-siberian-husky-dog-on-the-beach-woman-runs-and-plays-with-husky-dog_hxp1nfbns__4ed9e1619fcbfd31478e7384d5950220__P360.mp4
Resolving dm0qx8t0i9gc9.cloudfront.net (dm0qx8t0i9gc9.cloudfront.net)... 108.158.4.77, 108.158.4.44, 108.158.4.5, ...
Connecting to dm0qx8t0i9gc9.cloudfront.net (dm0qx8t0i9gc9.cloudfront.net)|108.158.4.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1323046 (1.3M) [video/mp4]
Saving to: ‘malamut.mp4’


2025-05-08 19:15:54 (107 MB/s) - ‘malamut.mp4’ saved [1323046/1323046]

2025-05-08 19:15:57.949979: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-08 19:15:57.967210: E exte

### 6 · 🏁 Showcase the different output videos

In [6]:
show_video('/content/output_video/obj1_overlay_video.mp4')

In [9]:
files.download('/content/output_video/obj1_overlay_video.mp4')

In [8]:
show_video('/content/output_video/obj1_mask_video.mp4')

In [10]:
files.download('/content/output_video/obj1_mask_video.mp4')
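
The cells in this section hard-code `obj1`. When a prompt matches several objects, there may be more result videos. The sketch below lists them by file name. The `*_overlay_video.mp4` and `*_mask_video.mp4` patterns are inferred only from the two file names used in this section, and `find_result_videos` is a hypothetical helper.

```python
from pathlib import Path

def find_result_videos(output_dir):
    """Group result videos by kind, based on an assumed naming pattern."""
    root = Path(output_dir)
    return {
        "overlay": sorted(root.glob("*_overlay_video.mp4")),
        "mask": sorted(root.glob("*_mask_video.mp4")),
    }

videos = find_result_videos("/content/output_video")
for kind, paths in videos.items():
    print(kind, [p.name for p in paths])
```

Each returned path can then be passed to `show_video` or `files.download` in turn.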

---
### 7 · 🌟 **Created with enthusiasm by Csaba Bolyòs** 🚀  

[🔗 LinkedIn](https://www.linkedin.com/in/csaba-boly%C3%B2s-00a11767/) | [🌐 Google Colab Demo](https://colab.research.google.com/drive/1vX6P4KNmWoisY-Vfq6bAVunsHaLrC-AO)  