## Before you start

First, confirm that the notebook has access to a GPU by running `nvidia-smi`. If the command fails or no GPU is listed, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, and click `Save`.

In [None]:
!nvidia-smi

Fri Sep  8 15:36:23 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P0    26W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
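
The same check can be done from Python, which is handy when a later cell should fail early instead of crashing inside a model call. The sketch below is only an illustration: `list_gpus` is a hypothetical helper name, and it assumes the `nvidia-smi` query flags `--query-gpu` and `--format=csv,noheader` are available in the installed driver tools.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible GPU, or [] if none."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # nvidia-smi is not on PATH, so no NVIDIA GPU is usable
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # the tool exists but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU found: set the hardware accelerator to GPU")
```

An empty list means the runtime should be switched to a GPU before continuing.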

**NOTE:** To make it easier to manage datasets, images, and models, we create a `HOME` constant.

In [None]:
import os
HOME = os.getcwd()
print("HOME:", HOME)

# temporary fix for some weird locale bug
import locale
locale.getpreferredencoding = lambda: "UTF-8"

HOME: /content


## Install Grounding DINO and Segment Anything Model

We use Grounded Segment Anything as our image model. It consists of two components: [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO) for zero-shot detection and [Segment Anything Model (SAM)](https://github.com/facebookresearch/segment-anything) for converting boxes into segmentations. We have to install them first.

We install our fork of Grounded Segment Anything, which comes with a minor bug fix.


In [None]:
%cd {HOME}
!git clone https://github.com/hkchengrex/Grounded-Segment-Anything
%env CUDA_HOME=/usr/local/cuda
%env BUILD_WITH_CUDA=True
%env AM_I_DOCKER=False
%cd {HOME}/Grounded-Segment-Anything
!pip uninstall -y GroundingDINO
!pip install -e GroundingDINO
!pip install -q -e segment_anything

/content
fatal: destination path 'Grounded-Segment-Anything' already exists and is not an empty directory.
env: CUDA_HOME=/usr/local/cuda
env: BUILD_WITH_CUDA=True
env: AM_I_DOCKER=False
/content/Grounded-Segment-Anything
Found existing installation: groundingdino 0.1.0
Uninstalling groundingdino-0.1.0:
  Successfully uninstalled groundingdino-0.1.0
Obtaining file:///content/Grounded-Segment-Anything/GroundingDINO
  Preparing metadata (setup.py) ... done
Installing collected packages: groundingdino
  Running setup.py develop for groundingdino
Successfully installed groundingdino-0.1.0
  Preparing metadata (setup.py) ... done
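
The `fatal: destination path ... already exists` message in the output above appears whenever the cell is re-run, because `git clone` refuses to write into a non-empty directory. It is harmless here, but a re-runnable cell can skip the clone instead. The following is a minimal sketch under that assumption; `clone_if_missing` is a hypothetical helper, not part of either repository.

```python
import os
import subprocess


def clone_if_missing(repo_url, parent_dir, run=subprocess.run):
    """Clone repo_url into parent_dir unless the target directory already exists.

    Returns (target_path, cloned), where cloned is False if the clone was skipped.
    """
    name = repo_url.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[:-4]
    target = os.path.join(parent_dir, name)
    if os.path.isdir(target):
        return target, False  # already cloned on a previous run
    run(["git", "clone", repo_url, target], check=True)
    return target, True
```

In this notebook it could be called as `clone_if_missing("https://github.com/hkchengrex/Grounded-Segment-Anything", HOME)` before the `pip install` lines.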


<font color='red' size=3>Please restart: Runtime -> Restart Runtime</font>

## Make sure GroundingDINO has been installed properly.
## If this does not work, **Runtime -> Restart Runtime and try again**

In [None]:
import os
HOME = os.getcwd()
print("HOME:", HOME)

# temporary fix for some weird locale bug
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# either one of these should work
%cd {HOME}
try:
  import groundingdino
  from groundingdino.util.inference import Model as GroundingDINOModel
except ImportError:
  import GroundingDINO
  from GroundingDINO.groundingdino.util.inference import Model as GroundingDINOModel

/content
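
The `try`/`except ImportError` fallback above can be generalised to any list of candidate module paths. This is a sketch rather than anything the notebook requires; `import_first_available` is a hypothetical helper built on the standard `importlib.import_module`.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")  # remember why each candidate failed
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))
```

With the two paths used in the cell above, the call would be `import_first_available(["groundingdino.util.inference", "GroundingDINO.groundingdino.util.inference"]).Model`. Collecting the individual error messages makes it easier to tell a missing package apart from a failed build.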


Then we install [DEVA](https://github.com/hkchengrex/Tracking-Anything-with-DEVA).

In [None]:
%cd {HOME}
!git clone https://github.com/hkchengrex/Tracking-Anything-with-DEVA
%cd {HOME}/Tracking-Anything-with-DEVA
!pip install -q -e .

/content
fatal: destination path 'Tracking-Anything-with-DEVA' already exists and is not an empty directory.
/content/Tracking-Anything-with-DEVA
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
  Preparing metadata (setup.py) ... done
  Building editable for deva (pyproject.toml) ... done


### Download Model Weights

We need pretrained weights for DEVA, Grounding DINO, and SAM, plus the Grounding DINO config file. The cell below is a simplified version of DEVA's download script.

In [None]:
%cd {HOME}/Tracking-Anything-with-DEVA
!wget -q -P ./saves/ https://github.com/hkchengrex/Tracking-Anything-with-DEVA/releases/download/v1.0/DEVA-propagation.pth
!wget -q -P ./saves/ https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
!wget -q -P ./saves/ https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
!wget -q -P ./saves/ https://github.com/hkchengrex/Tracking-Anything-with-DEVA/releases/download/v1.0/GroundingDINO_SwinT_OGC.py

/content/Tracking-Anything-with-DEVA
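
Because `wget -q` prints nothing, a failed download goes unnoticed until a model fails to load. A quick check that the four files exist and are non-empty can catch this earlier. The sketch below takes the file names from the URLs above; `missing_files` is a hypothetical helper and it does not verify checksums.

```python
import os

# File names taken from the download URLs in the cell above.
EXPECTED_FILES = [
    "DEVA-propagation.pth",
    "groundingdino_swint_ogc.pth",
    "sam_vit_h_4b8939.pth",
    "GroundingDINO_SwinT_OGC.py",
]


def missing_files(save_dir, names=EXPECTED_FILES):
    """Return the names that are absent from save_dir or have zero size."""
    missing = []
    for name in names:
        file_path = os.path.join(save_dir, name)
        if not os.path.isfile(file_path) or os.path.getsize(file_path) == 0:
            missing.append(name)
    return missing


# Run from the Tracking-Anything-with-DEVA directory; an empty list means all files are present.
print(missing_files("./saves"))
```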


## Initializing DEVA and getting default parameters



In [None]:
%cd {HOME}/Tracking-Anything-with-DEVA

import os
from os import path
from argparse import ArgumentParser

import torch
import numpy as np

from deva.model.network import DEVA
from deva.inference.inference_core import DEVAInferenceCore
from deva.inference.result_utils import ResultSaver
from deva.inference.eval_args import add_common_eval_args, get_model_and_config
from deva.inference.demo_utils import flush_buffer
from deva.ext.ext_eval_args import add_ext_eval_args, add_text_default_args
from deva.ext.grounding_dino import get_grounding_dino_model
from deva.ext.with_text_processor import process_frame_with_text as process_frame

from tqdm import tqdm
import json

torch.autograd.set_grad_enabled(False)

# for id2rgb
np.random.seed(42)

# default parameters
parser = ArgumentParser()
add_common_eval_args(parser)
add_ext_eval_args(parser)
add_text_default_args(parser)

# load model and config
args = parser.parse_args([])
cfg = vars(args)
cfg['enable_long_term'] = True

# Load our checkpoint
deva_model = DEVA(cfg).cuda().eval()
if args.model is not None:
    model_weights = torch.load(args.model)
    deva_model.load_weights(model_weights)
else:
    print('No model loaded.')

gd_model, sam_model = get_grounding_dino_model(cfg, 'cuda')

/content/Tracking-Anything-with-DEVA


  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]


final text_encoder_type: bert-base-uncased
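
The cell above relies on a small `argparse` pattern: `parse_args([])` ignores the notebook kernel's own command line and returns every default, and `vars(args)` exposes those defaults as a dictionary that can be edited afterwards. The stand-alone sketch below shows the pattern with three hypothetical arguments standing in for the ones DEVA registers.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# Hypothetical arguments for illustration only; DEVA registers its own set.
parser.add_argument("--size", type=int, default=480)
parser.add_argument("--amp", action="store_true")
parser.add_argument("--model", default=None)

args = parser.parse_args([])  # empty list: do not read sys.argv, keep all defaults
cfg = vars(args)              # dict view that shares storage with the namespace
cfg["amp"] = True             # overriding through the dict also changes args.amp

print(cfg)
print(args.amp)
```

Since `cfg` and `args` share the same underlying dictionary, later assignments such as `cfg['enable_long_term'] = True` are visible through either name.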


### Set hyperparameters

Default values should generally work fine. See https://github.com/hkchengrex/Tracking-Anything-with-DEVA/blob/main/docs/DEMO.md for some explanation of these parameters.

In [None]:
cfg['enable_long_term_count_usage'] = True
cfg['max_num_objects'] = 50
cfg['size'] = 480
cfg['DINO_THRESHOLD'] = 0.35
cfg['amp'] = True
cfg['chunk_size'] = 4
cfg['detection_every'] = 5
cfg['max_missed_detection_count'] = 10
cfg['sam_variant'] = 'original'
cfg['temporal_setting'] = 'online' # semionline usually works better; but online is faster for this demo
cfg['pluralize'] = True
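
To get a feel for `detection_every`, it helps to count how many frames would trigger a detection pass. The sketch below ASSUMES that in `online` mode detection runs on every frame whose index is a multiple of `detection_every`; that rule is an assumption made for illustration and should be checked against the DEVA source. `detection_frames` is a hypothetical helper.

```python
def detection_frames(num_frames, detection_every):
    """Frame indices on which detection would run, ASSUMING ti % detection_every == 0."""
    if detection_every <= 0:
        raise ValueError("detection_every must be positive")
    return [ti for ti in range(num_frames) if ti % detection_every == 0]


# A 45-frame clip with detection_every = 5, used here only as an example.
frames = detection_frames(45, 5)
print(len(frames), frames)
```

Under that assumption, raising `detection_every` reduces the number of Grounding DINO and SAM calls, at the cost of picking up new objects later.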

## Download Example Data

Let's download an example video. Feel free to replace it with your own: all you have to do is upload it to the `{HOME}/data` directory and set `SOURCE_VIDEO_PATH` accordingly.

In [None]:
f"{HOME}/data"

'/content/data'

In [None]:
!mkdir -p {HOME}/data
%cd {HOME}/data

!wget -q -O example.mp4 https://user-images.githubusercontent.com/7107196/265518886-e5f6df87-9fd0-4178-8490-00c4b8dc613b.mp4

/content/data


## Specifying all the inputs and output directory

In [None]:
SOURCE_VIDEO_PATH = f"{HOME}/data/example.mp4"
CLASSES = ['person', 'hat', 'horse']
cfg['DINO_THRESHOLD'] = 0.5  # overrides the 0.35 set in the hyperparameter cell
OUTPUT_VIDEO_PATH = f"{HOME}/data/example_output.webm"
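
The class list is turned into a single text prompt by joining the names with `.`, as the run cell does with `'.'.join(CLASSES)`. The sketch below wraps that in a function and adds a whitespace clean-up step; the clean-up and the name `build_prompt` are illustrative additions, not part of DEVA.

```python
def build_prompt(classes):
    """Join class names with '.', mirroring '.'.join(CLASSES) from the demo."""
    # Hypothetical clean-up: trim whitespace and drop empty entries.
    cleaned = [name.strip() for name in classes if name.strip()]
    if not cleaned:
        raise ValueError("at least one class name is required")
    return ".".join(cleaned)


print(build_prompt(["person", "hat", "horse"]))  # person.hat.horse
```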

## Running DEVA

In [None]:
%cd {HOME}/Tracking-Anything-with-DEVA

from deva.ext.with_text_processor import process_frame_with_text as process_frame_text
import tempfile
import cv2

cfg['prompt'] = '.'.join(CLASSES)

deva = DEVAInferenceCore(deva_model, config=cfg)
deva.next_voting_frame = cfg['num_voting_frames'] - 1
deva.enabled_long_id()

# no output directory is passed; the video writer is attached to result_saver below
result_saver = ResultSaver(None, None, dataset='gradio', object_manager=deva.object_manager)
writer_initialized = False

cap = cv2.VideoCapture(SOURCE_VIDEO_PATH)
fps = cap.get(cv2.CAP_PROP_FPS)
ti = 0
with torch.cuda.amp.autocast(enabled=cfg['amp']):
    # the frame count reported by OpenCV is only an estimate
    with tqdm(total=int(cap.get(cv2.CAP_PROP_FRAME_COUNT))) as pbar:
        while cap.isOpened():
            ret, frame = cap.read()
            if ret:
                if not writer_initialized:
                    # create the writer lazily, once the frame size is known
                    h, w = frame.shape[:2]
                    writer = cv2.VideoWriter(OUTPUT_VIDEO_PATH, cv2.VideoWriter_fourcc(*'vp80'), fps, (w, h))
                    writer_initialized = True
                    result_saver.writer = writer

                process_frame_text(deva,
                                    gd_model,
                                    sam_model,
                                    'null.png',
                                    result_saver,
                                    ti,
                                    image_np=frame)
                ti += 1
                pbar.update(1)
            else:
                break
    flush_buffer(deva, result_saver)
writer.release()
cap.release()
deva.clear_buffer()

/content/Tracking-Anything-with-DEVA


100%|██████████| 45/45 [00:31<00:00,  1.43it/s]
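
The progress bar above reports 45 frames at about 1.43 frames per second. That rate gives a rough way to budget time for a longer video. The sketch below is simple arithmetic with a hypothetical helper name; actual throughput depends on the GPU, the `size` setting, and how many objects are tracked, so treat the result as an estimate.

```python
def estimated_seconds(frame_count, frames_processed_per_second):
    """Rough processing time for frame_count frames at a measured rate."""
    if frames_processed_per_second <= 0:
        raise ValueError("rate must be positive")
    return frame_count / frames_processed_per_second


# The example clip: 45 frames at the reported 1.43 it/s.
print(round(estimated_seconds(45, 1.43), 1))            # 31.5 seconds
# A hypothetical one-minute clip at 30 fps (1800 frames), same rate.
print(round(estimated_seconds(30 * 60, 1.43) / 60, 1))  # 21.0 minutes
```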


## Play the output video

In [None]:
from IPython.display import HTML
from base64 import b64encode
with open(OUTPUT_VIDEO_PATH, 'rb') as f:
    webm = f.read()
data_url = "data:video/webm;base64," + b64encode(webm).decode()
HTML("""
<video width=720 controls>
      <source src="%s" type="video/webm">
</video>
""" % data_url)
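
The cell above embeds the whole video in the page as a base64 data URL. Base64 emits 4 characters for every 3 input bytes, so the embedded text is about a third larger than the file, which matters for long videos. The sketch below isolates the encoding step and the size calculation; both function names are hypothetical.

```python
from base64 import b64encode


def to_data_url(raw_bytes, mime_type="video/webm"):
    """Build a data URL of the same form as the one used in the cell above."""
    return f"data:{mime_type};base64," + b64encode(raw_bytes).decode("ascii")


def encoded_length(num_bytes):
    """Number of base64 characters for num_bytes of input, including padding."""
    return 4 * ((num_bytes + 2) // 3)


print(to_data_url(b"abc"))          # data:video/webm;base64,YWJj
print(encoded_length(3_000_000))    # 4000000 characters for a 3 MB file
```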