GitHub - sbelharbi/tcam-wsol-video: Pytorch code for paper "TCAM: Temporal Class Activation Maps for Object Localization in Weakly-Labeled Unconstrained Videos"

Pytorch 1.11.0 code for:

TCAM: Temporal Class Activation Maps for Object Localization in Weakly-Labeled Unconstrained Videos (https://arxiv.org/abs/2208.14542)

WACV 2023: [Slides] [Poster]

See below for demonstrative videos. [More video demos]

Citation:

@InProceedings{tcamsbelharbi2023,
  title={TCAM: Temporal Class Activation Maps for Object Localization in Weakly-Labeled Unconstrained Videos},
  author={Belharbi, S. and Ben Ayed, I. and McCaffrey, L. and Granger, E.},
  booktitle = {WACV},
  year={2023}
}

Issues:

Please create a GitHub issue.

Method:

Results:

shot-000123.mp4

shot-000373.mp4

shot-000178.mp4

048.mp4

026.mp4

horse-006.mp4

plane-044.mp4

021.mp4

012.mp4

006.mp4

car-012.mp4

car-024.mp4

car-031.mp4

horse-014.mp4

005.mp4

029.mp4

car-004.mp4

shot-000097.mp4

horse-010.mp4

horse-004.mp4

car-018.mp4

shot-000045.mp4

shot-000381.mp4

shot-000198.mp4

shot-000001.mp4

shot-000179.mp4

shot-000002.mp4

shot-000047.mp4

shot-000426.mp4

shot-000008.mp4

shot-000122.mp4

shot-000160.mp4

shot-000108.mp4

Requirements:

See full requirements at ./dependencies/requirements.txt

Python 3.7.10
Pytorch 1.11.0
torchvision 0.12.0
Full dependencies
Build and install CRF:
- Install Swig
- CRF

cdir=$(pwd)
cd dlib/crf/crfwrapper/bilateralfilter
swig -python -c++ bilateralfilter.i
python setup.py install
cd $cdir
cd dlib/crf/crfwrapper/colorbilateralfilter
swig -python -c++ colorbilateralfilter.i
python setup.py install
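
Once the requirements and the CRF wrappers are installed, a quick sanity check can catch version mismatches before a long run. The sketch below is not part of the repository. It assumes the SWIG builds expose Python modules named `bilateralfilter` and `colorbilateralfilter` (taken from the interface file names), and the helper names `installed_version` and `check_environment` are hypothetical.

```python
import importlib
import importlib.util
import sys

# Versions listed in the Requirements section.
EXPECTED_PYTHON = (3, 7)
EXPECTED_PACKAGES = {"torch": "1.11.0", "torchvision": "0.12.0"}
# Assumed module names, derived from the .i interface file names.
CRF_MODULES = ("bilateralfilter", "colorbilateralfilter")


def installed_version(package):
    """Return the package's __version__, or None if it cannot be imported."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        module = importlib.import_module(package)
    except (ImportError, OSError):
        return None
    return getattr(module, "__version__", None)


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    found_python = tuple(sys.version_info[:2])
    if found_python != EXPECTED_PYTHON:
        problems.append(
            f"Python {found_python[0]}.{found_python[1]} found, "
            f"{EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]} expected"
        )
    for package, wanted in EXPECTED_PACKAGES.items():
        found = installed_version(package)
        # Ignore local build tags such as "+cu113" when comparing.
        if found is None or found.split("+")[0] != wanted:
            problems.append(f"{package}: found {found}, expected {wanted}")
    for module in CRF_MODULES:
        if importlib.util.find_spec(module) is None:
            problems.append(f"CRF module '{module}' is not importable")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

A version mismatch is reported as a warning rather than an error, since nearby versions may still work.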

Download datasets:

See folds/wsol-done-right-splits/dataset-scripts. For more details, see wsol-done-right repo.

You can use the scripts in the cmds folder to download the datasets. Use the script _video_ds_ytov2_2.py to reformat YTOv2.2.

Once you download the datasets, you need to adjust the paths in get_root_wsol_dataset().
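
One possible way to keep dataset paths in a single place is sketched below. This is not the repository's `get_root_wsol_dataset()`, whose body is not shown here. The function name `dataset_root`, the environment variable `WSOL_DATA_ROOT`, the default location, and the one-folder-per-dataset layout are all illustrative assumptions.

```python
import os
from pathlib import Path

# Hypothetical default location; replace it with your own datasets folder.
DEFAULT_ROOT = "~/datasets/wsol"


def dataset_root(dataset, env_var="WSOL_DATA_ROOT"):
    """Resolve the folder of `dataset` under a configurable base folder.

    The base folder is read from the (hypothetical) environment variable
    `env_var` and falls back to DEFAULT_ROOT when the variable is unset.
    """
    base = Path(os.environ.get(env_var, DEFAULT_ROOT)).expanduser()
    path = base / dataset
    if not path.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {path}")
    return path
```

Failing early with the full path in the message makes a wrong root easier to spot than a data-loading error deep inside training.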

Run code:

Download the files listed in download-files.txt from Google Drive.

WSOL baselines: CAM over YouTube-Objects-v1.0 using ResNet50:

cudaid=0  # cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid

getfreeport() {
freeport=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
}
export OMP_NUM_THREADS=50
export NCCL_BLOCKING_WAIT=1
plaunch=$(python -c "from os import path; import torch; print(path.join(path.dirname(torch.__file__), 'distributed', 'launch.py'))")
getfreeport
torchrun --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_port=$freeport main.py --local_world_size=1 \
       --task STD_CL \
       --encoder_name resnet50 \
       --arch STDClassifier \
       --opt__name_optimizer sgd \
       --dist_backend gloo \
       --batch_size 32 \
       --max_epochs 100 \
       --checkpoint_save 100 \
       --keep_last_n_checkpoints 10 \
       --freeze_cl False \
       --freeze_encoder False \
       --support_background True \
       --method CAM \
       --spatial_pooling WGAP \
       --dataset YouTube-Objects-v1.0 \
       --box_v2_metric False \
       --cudaid $cudaid \
       --amp True \
       --plot_tr_cam_progress False \
       --opt__lr 0.001 \
       --opt__step_size 15 \
       --opt__gamma 0.9 \
       --opt__weight_decay 0.0001 \
       --exp_id 08_28_2022_11_51_57_590148__5889160
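
The `getfreeport` shell function in the command above embeds a Python one-liner. Written out as a standalone function, the same idea looks like this (the name `get_free_port` is ours, not the repository's):

```python
import socket


def get_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket() as s:
        # Port 0 tells the OS to pick any currently free port.
        s.bind(("", 0))
        # getsockname() returns (host, port) for an IPv4 socket.
        return s.getsockname()[1]


if __name__ == "__main__":
    print(get_free_port())
```

The port is released as soon as the socket closes, so another process could in principle claim it before `torchrun` starts. On a single-user machine this race is rarely a problem.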

Train until convergence, then store the CAMs of the trainset to be used later. From the experiment folder, copy both folders 'YouTube-Objects-v1.0-resnet50-CAM-WGAP-cp_best_localization-boxv2_False' and 'YouTube-Objects-v1.0-resnet50-CAM-WGAP-cp_best_classification-boxv2_False' to the folder 'pretrained'. They contain the best weights, which will be loaded by the TCAM model.
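
The copy step can be scripted. The sketch below is a minimal illustration, not a script shipped with the repository. The function name `copy_best_weights` is hypothetical, the experiment folder path is whatever your run produced, and the second folder name follows the pattern of the first one, so adjust the names if your experiment folder differs.

```python
import shutil
from pathlib import Path

# Folder names from the paragraph above.
BEST_FOLDERS = (
    "YouTube-Objects-v1.0-resnet50-CAM-WGAP-cp_best_localization-boxv2_False",
    "YouTube-Objects-v1.0-resnet50-CAM-WGAP-cp_best_classification-boxv2_False",
)


def copy_best_weights(exp_dir, pretrained_dir="pretrained", folders=BEST_FOLDERS):
    """Copy the best-checkpoint folders from `exp_dir` into `pretrained_dir`."""
    exp_dir, pretrained_dir = Path(exp_dir), Path(pretrained_dir)
    pretrained_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in folders:
        src = exp_dir / name
        if not src.is_dir():
            raise FileNotFoundError(f"Missing checkpoint folder: {src}")
        dst = pretrained_dir / name
        # Replace an older copy so stale weights are never mixed with new ones.
        if dst.exists():
            shutil.rmtree(dst)
        shutil.copytree(src, dst)
        copied.append(dst)
    return copied
```

Call it as `copy_best_weights("<path to your experiment folder>")` from the repository root so that the default `pretrained` destination resolves correctly.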

TCAM: Run:

cudaid=0  # cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid

getfreeport() {
freeport=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
}
export OMP_NUM_THREADS=50
export NCCL_BLOCKING_WAIT=1
plaunch=$(python -c "from os import path; import torch; print(path.join(path.dirname(torch.__file__), 'distributed', 'launch.py'))")
getfreeport
torchrun --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_port=$freeport main.py --local_world_size=1 \
       --task TCAM \
       --encoder_name resnet50 \
       --arch UnetTCAM \
       --opt__name_optimizer sgd \
       --dist_backend gloo \
       --batch_size 32 \
       --max_epochs 100 \
       --checkpoint_save 100 \
       --keep_last_n_checkpoints 10 \
       --freeze_cl True \
       --support_background True \
       --method CAM \
       --spatial_pooling WGAP \
       --dataset YouTube-Objects-v1.0 \
       --box_v2_metric False \
       --cudaid $cudaid \
       --amp True \
       --plot_tr_cam_progress False \
       --opt__lr 0.01 \
       --opt__step_size 15 \
       --opt__gamma 0.9 \
       --opt__weight_decay 0.0001 \
       --elb_init_t 1.0 \
       --elb_max_t 10.0 \
       --elb_mulcoef 1.01 \
       --sl_tc True \
       --sl_tc_knn 1 \
       --sl_tc_knn_mode before \
       --sl_tc_knn_t 0.0 \
       --sl_tc_knn_epoch_switch_uniform -1 \
       --sl_tc_min_t 0.0 \
       --sl_tc_lambda 1.0 \
       --sl_tc_min 1 \
       --sl_tc_max 1 \
       --sl_tc_ksz 3 \
       --sl_tc_max_p 0.6 \
       --sl_tc_min_p 0.1 \
       --sl_tc_seed_tech seed_weighted \
       --sl_tc_use_roi True \
       --sl_tc_roi_method roi_all \
       --sl_tc_roi_min_size 0.05 \
       --crf_tc True \
       --crf_tc_lambda 2e-09 \
       --crf_tc_sigma_rgb 15.0 \
       --crf_tc_sigma_xy 100.0 \
       --crf_tc_scale 1.0 \
       --max_sizepos_tc True \
       --max_sizepos_tc_lambda 0.01 \
       --size_bg_g_fg_tc False \
       --empty_out_bb_tc False \
       --sizefg_tmp_tc False \
       --knn_tc 0 \
       --rgb_jcrf_tc False \
       --exp_id 08_28_2022_11_50_04_936875__7685436
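
Both commands pass `--opt__lr`, `--opt__step_size 15`, and `--opt__gamma 0.9`. Assuming these flags configure a standard step decay (as in `torch.optim.lr_scheduler.StepLR`), which this README does not state explicitly, the learning rate per epoch can be worked out with the following sketch. The function name `step_lr` is ours.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate at `epoch` under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs:
    lr = base_lr * gamma ** floor(epoch / step_size).
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # TCAM command: --opt__lr 0.01 --opt__step_size 15 --opt__gamma 0.9
    for epoch in (0, 15, 30, 99):
        print(epoch, step_lr(0.01, 15, 0.9, epoch))
```

Under this assumption, the TCAM run starts at 0.01 and, after six decay steps, ends its 100 epochs at roughly 0.0053. The CAM baseline, which starts at 0.001, ends at roughly 0.00053.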

