<a href="https://colab.research.google.com/github/aliciafmachado/stcn-video-segmentation/blob/main/Running_STCN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

Alicia Fortes Machado

## Installations

In [1]:
! pip install progressbar2 opencv-python gitpython gdown git+https://github.com/cheind/py-thin-plate-spline
! pip install pyyaml==5.1

Collecting git+https://github.com/cheind/py-thin-plate-spline
  Cloning https://github.com/cheind/py-thin-plate-spline to /tmp/pip-req-build-hb_nlc3x
  Running command git clone -q https://github.com/cheind/py-thin-plate-spline /tmp/pip-req-build-hb_nlc3x
Collecting gitpython
  Downloading GitPython-3.1.26-py3-none-any.whl (180 kB)
     |████████████████████████████████| 180 kB 4.0 MB/s
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 1.8 MB/s
Collecting smmap<6,>=3.0.1
  Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Building wheels for collected packages: thinplate
  Building wheel for thinplate (setup.py) ... done
  Created wheel for thinplate: filename=thinplate-1.0.0-py3-none-any.whl size=6723 sha256=2005e657db45eb486df5177b5bbf9d3581ff222c589d8daf8753e032bce7aba1
  Stored in directory: /tmp/pip-ephem-wheel-cache-3gloo2h0/wheels/c2/e5/57/3a7c488e2aa9b0452f8ddf0191fae86be1667a362e

In [8]:
import torch

TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
# Install detectron2 that matches the above pytorch version
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
! pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/$CUDA_VERSION/torch$TORCH_VERSION/index.html

torch:  1.10 ; cuda:  cu111
Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html
Collecting detectron2
  Downloading https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/detectron2-0.6%2Bcu111-cp37-cp37m-linux_x86_64.whl (7.0 MB)
     |████████████████████████████████| 7.0 MB 33.1 MB/s
Collecting omegaconf>=2.1
  Downloading omegaconf-2.1.1-py3-none-any.whl (74 kB)
     |████████████████████████████████| 74 kB 2.1 MB/s
Collecting yacs>=0.1.8
  Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Collecting hydra-core>=1.1
  Downloading hydra_core-1.1.1-py3-none-any.whl (145 kB)
     |████████████████████████████████| 145 kB 8.9 MB/s
Collecting fvcore<0.1.6,>=0.1.5
  Downloading fvcore-0.1.5.post20220119.tar.gz (55 kB)
     |████████████████████████████████| 55 kB 3.2 MB/s
Collecting iopath<0.1.10,>=0.1.7
  Downloading iopath-0.1.9-py3-none-any.whl (27 kB)
Collecting black==21.4b2
  Downloading black-21.4b
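
The install cell above derives the detectron2 wheel index from the installed PyTorch version. As a minimal sketch of that string handling, the hypothetical helper `detectron2_wheel_index` below (not part of the notebook or of detectron2) rebuilds the same URL from a version string. The input `"1.10.0+cu111"` is an assumption: the log only shows `1.10` and `cu111`.

```python
def detectron2_wheel_index(torch_version):
    """Build the detectron2 wheel index URL from a PyTorch version string.

    Hypothetical helper for illustration only. Expects a version that
    carries a build tag after '+', for example '1.10.0+cu111'.
    """
    if "+" not in torch_version:
        # Without a '+cuXXX' (or '+cpu') suffix the URL cannot be formed.
        raise ValueError(f"no build tag in version: {torch_version!r}")
    base, build_tag = torch_version.split("+", 1)
    # Keep only major.minor, as the notebook cell does.
    torch_short = ".".join(base.split(".")[:2])
    return (
        "https://dl.fbaipublicfiles.com/detectron2/wheels/"
        f"{build_tag}/torch{torch_short}/index.html"
    )


# Assumed version string; in the notebook this comes from torch.__version__.
print(detectron2_wheel_index("1.10.0+cu111"))
```

Raising on a missing build tag is a design choice of this sketch; the notebook cell itself does not check for it.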

## Setup

In [2]:
! git clone --recursive https://github.com/aliciafmachado/stcn-video-segmentation.git
%cd stcn-video-segmentation

Cloning into 'stcn-video-segmentation'...
remote: Enumerating objects: 131, done.
remote: Counting objects: 100% (126/126), done.
remote: Compressing objects: 100% (90/90), done.
remote: Total 131 (delta 72), reused 89 (delta 35), pack-reused 5
Receiving objects: 100% (131/131), 20.83 KiB | 6.94 MiB/s, done.
Resolving deltas: 100% (72/72), done.
Submodule 'code/STCN' (https://github.com/aliciafmachado/STCN.git) registered for path 'code/STCN'
Submodule 'code/davis2017-evaluation' (https://github.com/aliciafmachado/davis2017-evaluation.git) registered for path 'code/davis2017-evaluation'
Cloning into '/content/stcn-video-segmentation/code/STCN'...
remote: Enumerating objects: 196, done.        
remote: Counting objects: 100% (196/196), done.        
remote: Compressing objects: 100% (125/125), done.        
remote: Total 196 (delta 100), reused 165 (delta 70), pack-reused 0        
Receiving objects: 100% (196/196), 104.93 KiB | 1.40 MiB/s, done.
Resolving deltas: 100% (100/

In [3]:
# Download best model
! python code/STCN/download_model.py

Downloading stcn.pth...
Downloading...
From: https://drive.google.com/uc?id=1mRrE0uCI2ktdWlUgapJI_KmgeIiF2eOm
To: /content/stcn-video-segmentation/saves/stcn.pth
100% 218M/218M [00:01<00:00, 209MB/s]
Done.


In [4]:
# Download datasets; the YouTube-VOS and BL30K datasets are not downloaded here
! python code/STCN/download_datasets.py


These are either re-distribution of the original datasets or derivatives (through simple processing) of the original datasets. 
Please read and respect their licenses and terms before use. 
You should cite the original papers if you use any of the datasets.

For BL30K, see download_bl30k.py

Links:
DUTS: http://saliencydetection.net/duts
HRSOD: https://github.com/yi94code/HRSOD
FSS: https://github.com/HKUSTCV/FSS-1000
ECSSD: https://www.cse.cuhk.edu.hk/leojia/projects/hsaliency/dataset.html
BIG: https://github.com/hkchengrex/CascadePSP

YouTubeVOS: https://youtube-vos.org
DAVIS: https://davischallenge.org/
BL30K: https://github.com/hkchengrex/MiVOS

Datasets will be downloaded and extracted to ../YouTube, ../YouTube2018, ../static, ../DAVIS
[y] to confirm, others to exit: y
Downloading DAVIS 2016...
Downloading...
From: https://drive.google.com/uc?id=198aRlh5CpAoFz0hfRgYbiNenn_K8DxWD
To: /content/DAVIS/DAVIS-data.zip
100% 1.96G/1.96G [00:25<00:00, 76.2MB/s]
Downloading DAVIS 2017 trai

### DAVIS 2019 (optional)

In [None]:
# Download the DAVIS 2019 unsupervised annotations
# (distributed as DAVIS-2017-Unsupervised-trainval-480p.zip).
# Only run this cell if you want to try this dataset.

%cd ..
! mkdir tmp_DAVIS
! wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-Unsupervised-trainval-480p.zip
! unzip DAVIS-2017-Unsupervised-trainval-480p.zip -d tmp_DAVIS
! rm DAVIS-2017-Unsupervised-trainval-480p.zip
! mv tmp_DAVIS/DAVIS/Annotations_unsupervised DAVIS/2017/trainval
! rm -r tmp_DAVIS
%cd stcn-video-segmentation

Streaming output truncated to the last 5000 lines.
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00001.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00032.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00030.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00012.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00025.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00063.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00045.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00006.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00029.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00049.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00057.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00035.jpg  
  inflating: tmp_DAVIS/DAVIS/JPEGImages/480p/lady-running/00028.jpg  
  inflating: tmp_

### Import Something-Something data from the Something-Else repository

In [5]:
# We use the data provided by the Something-Else repository
%cd ..
! git clone https://github.com/joaanna/something_else.git
! mv something_else/videos videos
! python stcn-video-segmentation/code/scripts/rename_smth_else.py --smth_else_path videos
! rm -r something_else
%cd stcn-video-segmentation

/content
Cloning into 'something_else'...
remote: Enumerating objects: 1284, done.
remote: Counting objects: 100% (1284/1284), done.
remote: Compressing objects: 100% (1257/1257), done.
remote: Total 1284 (delta 43), reused 1251 (delta 23), pack-reused 0
Receiving objects: 100% (1284/1284), 37.23 MiB | 14.34 MiB/s, done.
Resolving deltas: 100% (43/43), done.
/content/stcn-video-segmentation


## Segmentation using Detectron2 and our heuristic

### Segmenting DAVIS datasets

First, for DAVIS 2017:

In [14]:
! python code/segmentation/seg_first_frame.py --dataset 'davis2017' --annotations_folder Annotations

Segmenting first frames...
Processing bear!
  max_size = (max_size + (stride - 1)) // stride * stride
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Processing bike-packing!
Processing blackswan!
Processing bmx-bumps!
Processing bmx-trees!
Processing boat!
Processing boxing-fisheye!
Processing breakdance!
Processing breakdance-flare!
Processing bus!
Processing camel!
Processing car-roundabout!
Processing car-shadow!
Processing car-turn!
Processing cat-girl!
Processing classic-car!
Processing color-run!
Processing cows!
Processing crossing!
Processing dance-jump!
Processing dance-twirl!
Processing dancing!
Processing disc-jockey!
Processing dog!
Processing dog-agility!
Processing dog-gooses!
Processing dogs-jump!
Processing dogs-scale!
Processing drift-chicane!
Processing drift-straight!
Processing drift-turn!
Processing drone!
Processing elephant!
Processing flamingo!
Processing goat!
Processing gold-fish!
Processing hike!
Processing hockey!
Processing horsejump

Then, for DAVIS 2016:

In [13]:
! python code/segmentation/seg_first_frame.py --dataset 'davis2016' --annotations_folder Annotations --real_path '../DAVIS/2016' --pred_path "../DAVIS/2016"

Segmenting first frames...
Processing bear!
  max_size = (max_size + (stride - 1)) // stride * stride
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Processing blackswan!
Processing bmx-bumps!
Processing bmx-trees!
Processing boat!
Processing breakdance!
Processing breakdance-flare!
Processing bus!
Processing camel!
Processing car-roundabout!
Processing car-shadow!
Processing car-turn!
Processing cows!
Processing dance-jump!
Processing dance-twirl!
Processing dog!
Processing dog-agility!
Processing drift-chicane!
Processing drift-straight!
Processing drift-turn!
Processing elephant!
Processing flamingo!
Processing goat!
Processing hike!
Processing hockey!
Processing horsejump-high!
Processing horsejump-low!
Processing kite-surf!
Processing kite-walk!
Processing libby!
Processing lucia!
Processing mallard-fly!
Processing mallard-water!
Processing motocross-bumps!
Processing motocross-jump!
Processing motorbike!
Processing paragliding!
Processing paragliding-launc

### Segmenting Something-Something data

In [15]:
! python code/segmentation/seg_first_frame.py --real_path '../something-something/JPEGImages' --pred_path '../something-something' --dataset 'smth-smth' --max_nb_objects 15 --palette_path ../DAVIS/2017/trainval/Annotations/480p/bear/00000.png

Segmenting first frames...
Processing 13201!
  max_size = (max_size + (stride - 1)) // stride * stride
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Processing 151201!
Processing 2!
Processing 2003!
Processing 22983!
Processing 3201!
Processing 4!
Processing 44862!
Processing 57082!
Processing 6981!
Processing 77005!
Processing 80962!
Processing 862!
Finished!


## Applying video segmentation using image segmentation results

### DAVIS datasets

First, for DAVIS 2017:

In [16]:
! python code/STCN/eval_davis.py --output results_2017_unsup --first_frame_folder Auto_Annotations

  cpuset_checked))
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100% 97.8M/97.8M [00:01<00:00, 90.6MB/s]
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
100% 44.7M/44.7M [00:00<00:00, 95.7MB/s]
  cpuset_checked))
100% (30 of 30) |#########################| Elapsed Time: 0:03:06 Time:  0:03:06
Total processing time:  170.1976068019867
Total processed frames:  1999
FPS:  11.74517102538169


Then, for DAVIS 2016:

In [17]:
! python code/STCN/eval_davis_2016.py --output results_2016_unsup --first_frame_folder Auto_Annotations

  cpuset_checked))
  cpuset_checked))
100% (20 of 20) |#########################| Elapsed Time: 0:01:52 Time:  0:01:52
Total processing time:  104.14360523223877
Total processed frames:  1376
FPS:  13.212525117903681
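
The FPS figures in the two DAVIS logs can be reproduced by dividing the processed frames by the total processing time. The sketch below does that with the numbers printed above; `frames_per_second` is a hypothetical helper, not a function from the STCN code. Note that the reported FPS appears to use the "Total processing time" line rather than the progress bar's elapsed time, which is longer.

```python
def frames_per_second(total_frames, total_seconds):
    """Return throughput as frames divided by processing time in seconds."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return total_frames / total_seconds


# (frames, seconds) copied from the evaluation logs above.
runs = {
    "DAVIS 2017": (1999, 170.1976068019867),
    "DAVIS 2016": (1376, 104.14360523223877),
}

for name, (frames, seconds) in runs.items():
    print(f"{name}: {frames_per_second(frames, seconds):.2f} FPS")
# DAVIS 2017: 11.75 FPS
# DAVIS 2016: 13.21 FPS
```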


### Something-Something

In [18]:
! python code/STCN/eval_generic.py --output results_smth_smth --data_path ../something-something

Processing 13201 ...
Processing 151201 ...
Processing 2 ...
Processing 2003 ...
Processing 22983 ...
Processing 3201 ...
Processing 4 ...
Processing 44862 ...
Processing 57082 ...
Processing 6981 ...
Processing 77005 ...
Processing 80962 ...
Processing 862 ...
100% (13 of 13) |#########################| Elapsed Time: 0:01:07 Time:  0:01:07


## Conclusion

Once everything has run, you can download the results and compute the metrics with the DAVIS 2017 evaluation framework:

In [None]:
from google.colab import files
# Example follows for DAVIS 2017
# ! zip -r results_2017_unsup.zip results_2017_unsup
# files.download('results_2017_unsup.zip')

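
As an alternative to the shell `zip` call, the archive can also be built in Python. This is a minimal sketch using the standard library's `shutil.make_archive`; `zip_results` is a hypothetical helper name, and the folder name in the usage comment assumes the DAVIS 2017 run above wrote to `results_2017_unsup`.

```python
import shutil
from pathlib import Path


def zip_results(results_dir):
    """Zip a results folder next to itself and return the archive path.

    Hypothetical helper for illustration; the archive keeps the folder
    name as its top-level directory.
    """
    results_dir = Path(results_dir).resolve()
    if not results_dir.is_dir():
        raise FileNotFoundError(f"no such results folder: {results_dir}")
    archive = shutil.make_archive(
        base_name=str(results_dir),        # '.zip' is appended automatically
        format="zip",
        root_dir=str(results_dir.parent),  # paths in the zip are relative to this
        base_dir=results_dir.name,         # only this folder is archived
    )
    return Path(archive)


# Usage in Colab (assumes the folder exists in the current directory):
# from google.colab import files
# files.download(str(zip_results("results_2017_unsup")))
```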