DirtyHarryLYL/HOI-Learning-List

HOI-Learning-List

Some recent (2015-now) Human-Object Interaction Learning studies. If you find any errors or problems, please feel free to comment.

A list of Transformer-based vision works: https://github.com/DirtyHarryLYL/Transformer-in-Vision.

Dataset/Benchmark

More...

Video HOI Datasets

Method

HOI Image Generation

  • Exploiting Relationship for Complex-scene Image Generation (arXiv 2021.04) [Paper]

  • Specifying Object Attributes and Relations in Interactive Scene Generation (arXiv 2019.11) [Paper]

HOI Recognition: Image-based, to recognize all the HOIs in one image.

More...

Unseen or zero-shot learning (image-level recognition).

  • ICompass (ICCV2021) [Paper], [Code]

  • Compositional Learning for Human Object Interaction (ECCV2018) [Paper]

  • Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]

More...

HOI Detection: Instance-based, to detect the human-object pairs and classify the interactions.

More...

Unseen or zero/low-shot or weakly-supervised learning (instance-level detection).

More...

Video HOI methods

  • SPDTP (arXiv, Jun 2022), [Paper]

  • V-HOI (arXiv, Jun 2022), [Paper]

  • Detecting Human-Object Relationships in Videos (ICCV2021) [Paper]

  • STIGPN (Aug 2021), [Paper], [Code]

  • VidHOI (May 2021), [Paper]

  • LIGHTEN (ACMMM2020) [Paper] [Code]

  • Generating Videos of Zero-Shot Compositions of Actions and Objects (Jul 2020), HOI GAN, [Paper]

  • Grounded Human-Object Interaction Hotspots from Video (ICCV2019) [Code] [Paper]

  • GPNN (ECCV2018) [Code] [Paper]

More...

Result

PaStaNet-HOI:

Proposed in the TPAMI version of TIN (Transferable Interactiveness Network). It is built on HAKE data and includes 110K+ images and 520 HOIs (the 80 "no_interaction" HOIs of HICO-DET are excluded to avoid incomplete labeling). Its data distribution is more severely long-tailed, which makes it more difficult.

Detector: COCO pre-trained

Method mAP
iCAN 11.00
iCAN+NIS 13.13
TIN 15.38

HICO-DET:

1) Detector: COCO pre-trained

Method Pub Full(def) Rare(def) Non-Rare(def) Full(ko) Rare(ko) Non-Rare(ko)
Shen et al. WACV2018 6.46 4.24 7.12 - - -
HO-RCNN WACV2018 7.81 5.37 8.54 10.41 8.94 10.85
InteractNet CVPR2018 9.94 7.16 10.77 - - -
Turbo AAAI2019 11.40 7.30 12.60 - - -
GPNN ECCV2018 13.11 9.34 14.23 - - -
Xu et al. ICCV2019 14.70 13.26 15.13 - - -
iCAN BMVC2018 14.84 10.45 16.15 16.26 11.33 17.73
Wang et al. ICCV2019 16.24 11.16 17.75 17.73 12.78 19.21
Lin et al. IJCAI2020 16.63 11.30 18.22 19.22 14.56 20.61
Functional (suppl) AAAI2020 16.96 11.73 18.52 - - -
Interactiveness CVPR2019 17.03 13.42 18.11 19.17 15.51 20.26
No-Frills ICCV2019 17.18 12.17 18.68 - - -
RPNN ICCV2019 17.35 12.78 18.71 - - -
PMFNet ICCV2019 17.46 15.65 18.00 20.34 17.47 21.20
SIGN ICME2020 17.51 15.31 18.53 20.49 17.53 21.51
Interactiveness-optimized CVPR2019 17.54 13.80 18.65 19.75 15.70 20.96
Liu et al. arXiv 17.55 20.61 - - - -
Wang et al. ECCV2020 17.57 16.85 17.78 21.00 20.74 21.08
In-GraphNet IJCAI-PRICAI 2020 17.72 12.93 19.31 - - -
HOID CVPR2020 17.85 12.85 19.34 - - -
MLCNet ICMR2020 17.95 16.62 18.35 22.28 20.73 22.74
SAG arXiv 18.26 13.40 19.71 - - -
Sarullo et al. arXiv 18.74 - - - - -
DRG ECCV2020 19.26 17.74 19.71 23.40 21.75 23.89
Analogy ICCV2019 19.40 14.60 20.90 - - -
VCL ECCV2020 19.43 16.55 20.29 22.00 19.09 22.87
VS-GATs arXiv 19.66 15.79 20.81 - - -
VSGNet CVPR2020 19.80 16.05 20.91 - - -
PFNet CVM 20.05 16.66 21.07 24.01 21.09 24.89
ATL(w/ COCO) CVPR2021 20.08 15.57 21.43 - - -
FCMNet ECCV2020 20.41 17.34 21.56 22.04 18.97 23.12
ACP ECCV2020 20.59 15.92 21.98 - - -
PD-Net ECCV2020 20.81 15.90 22.28 24.78 18.88 26.54
SG2HOI ICCV2021 20.93 18.24 21.78 24.83 20.52 25.32
TIN-PAMI TPAMI2021 20.93 18.95 21.32 23.02 20.96 23.42
ATL CVPR2021 21.07 16.79 22.35 - - -
PMN arXiv 21.21 17.60 22.29 - - -
IPGN TIP2021 21.26 18.47 22.07 - - -
DJ-RN CVPR2020 21.34 18.53 22.18 23.69 20.64 24.60
OSGNet IEEE Access 21.40 18.12 22.38 - - -
K-BAN arXiv2022 21.48 16.85 22.86 24.29 19.09 25.85
SCG+ODM arXiv 21.50 17.59 22.67 - - -
DIRV AAAI2021 21.78 16.38 23.39 25.52 20.84 26.92
SCG ICCV2021 21.85 18.11 22.97 - - -
HRNet TIP2021 21.93 16.30 23.62 25.22 18.75 27.15
ConsNet ACMMM2020 22.15 17.55 23.52 26.57 20.8 28.3
IDN NeurIPS2020 23.36 22.47 23.63 26.43 25.01 26.85
QAHOI-Res50 arXiv2021 24.35 16.18 26.80 - - -
DOQ CVPR2022 25.97 26.09 25.93 - - -
STIP CVPR2022 28.81 27.55 29.18 32.28 31.07 32.64
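
As a rough illustration of how the Full, Rare, and Non-Rare columns relate: each is a mean of per-category AP taken over a different subset of HOI categories. The sketch below assumes the usual HICO-DET convention that a category counts as rare when it has fewer than 10 training instances. The function name `split_map`, the category names, and all AP values and counts are made up for illustration and are not taken from any method listed here.

```python
from statistics import mean

# Assumption: HICO-DET convention, "rare" = fewer than 10 training instances.
RARE_THRESHOLD = 10


def split_map(per_class_ap, train_counts, rare_threshold=RARE_THRESHOLD):
    """Return (full, rare, non_rare) mAP in percent.

    per_class_ap: dict mapping HOI category -> AP in [0, 1]
    train_counts: dict mapping HOI category -> number of training instances
    """
    rare = [ap for c, ap in per_class_ap.items()
            if train_counts[c] < rare_threshold]
    non_rare = [ap for c, ap in per_class_ap.items()
                if train_counts[c] >= rare_threshold]
    full = list(per_class_ap.values())

    def to_pct(values):
        # An empty subset has no defined mean; report NaN instead of failing.
        return round(100 * mean(values), 2) if values else float("nan")

    return to_pct(full), to_pct(rare), to_pct(non_rare)


# Toy, made-up numbers (hypothetical categories and scores).
toy_ap = {"ride bicycle": 0.50, "hold cup": 0.30,
          "wash elephant": 0.10, "hop_on horse": 0.20}
toy_counts = {"ride bicycle": 900, "hold cup": 120,
              "wash elephant": 3, "hop_on horse": 7}

full, rare, non_rare = split_map(toy_ap, toy_counts)
print(f"Full {full:.2f}  Rare {rare:.2f}  Non-Rare {non_rare:.2f}")
```

Because rare categories are a small fraction of all categories, the Full number usually sits much closer to Non-Rare than to Rare, which is why the Rare column is worth reading on its own.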

2) Detector: pre-trained on COCO, fine-tuned on HICO-DET train set (with GT human-object pair boxes) or one-stage detector (point-based, transformer-based)

A fine-tuned detector learns to detect only the interactive humans and objects (those with interactiveness), so it suppresses many wrong pairings (non-interactive human-object pairs) and boosts performance.
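
To make the pairing effect concrete, here is a minimal sketch of the exhaustive human-object pairing that a two-stage pipeline typically performs on detector output. It is an assumption-laden toy, not the procedure of any specific method in the table: the `Box` class, the `candidate_pairs` helper, the 0.5 score threshold, and the detections are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Box:
    label: str    # detector class name, e.g. "person" or "cup"
    score: float  # detection confidence in [0, 1]


def candidate_pairs(detections, score_thresh=0.5):
    """Pair every confident human with every other confident detection."""
    humans = [d for d in detections
              if d.label == "person" and d.score >= score_thresh]
    # A person can also be the object of an interaction, so keep all labels.
    objects = [d for d in detections if d.score >= score_thresh]
    return [(h, o) for h, o in product(humans, objects) if h is not o]


# Hypothetical scene: one person holds a cup, a bystander and two
# unrelated objects are also visible.
generic_detector = [
    Box("person", 0.95), Box("person", 0.90),
    Box("cup", 0.85), Box("chair", 0.80), Box("tv", 0.75),
]
# Hypothetical output of a detector that fires only on interacting instances.
finetuned_detector = [Box("person", 0.95), Box("cup", 0.85)]

print(len(candidate_pairs(generic_detector)))    # 8 candidate pairs
print(len(candidate_pairs(finetuned_detector)))  # 1 candidate pair
```

In this toy scene only one pair is truly interactive, so the generic detector hands seven negative pairs to the interaction classifier while the fine-tuned one hands it none. This is one reason the numbers in this group are not directly comparable with those of the COCO pre-trained group.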

Method Pub Full(def) Rare(def) Non-Rare(def) Full(ko) Rare(ko) Non-Rare(ko)
UniDet ECCV2020 17.58 11.72 19.33 19.76 14.68 21.27
IP-Net CVPR2020 19.56 12.79 21.58 22.05 15.77 23.92
RR-Net arXiv 20.72 13.21 22.97 - - -
PPDM (paper) CVPR2020 21.10 14.46 23.09 - - -
PPDM (github-hourglass104) CVPR2020 21.73/21.94 13.78/13.97 24.10/24.32 24.58/24.81 16.65/17.09 26.84/27.12
Functional AAAI2020 21.96 16.43 23.62 - - -
SABRA-Res50 arXiv 23.48 16.39 25.59 28.79 22.75 30.54
VCL ECCV2020 23.63 17.21 25.55 25.98 19.12 28.03
ATL CVPR2021 23.67 17.64 25.47 26.01 19.60 27.93
PST ICCV2021 23.93 14.98 26.60 26.42 17.61 29.05
SABRA-Res50FPN arXiv 24.12 15.91 26.57 29.65 22.92 31.65
ATL(w/ COCO) CVPR2021 24.50 18.53 26.28 27.23 21.27 29.00
IDN NeurIPS2020 24.58 20.33 25.86 27.89 23.64 29.16
FCL CVPR2021 24.68 20.03 26.07 26.80 21.61 28.35
HOTR CVPR2021 25.10 17.34 27.42 - - -
FCL+VCL CVPR2021 25.27 20.57 26.67 27.71 22.34 28.93
OC-Immunity AAAI2022 25.44 23.03 26.16 27.24 24.32 28.11
ConsNet-F ACMMM2020 25.94 19.35 27.91 30.34 23.4 32.41
SABRA-Res152 arXiv 26.09 16.29 29.02 31.08 23.44 33.37
QAHOI-Res50 arXiv2021 26.18 18.06 28.61 - - -
Zou et al. CVPR2021 26.61 19.15 28.84 29.13 20.98 31.57
RGBM arXiv2022 27.39 21.34 29.20 30.87 24.20 32.87
GTNet arXiv 28.03 22.73 29.61 29.98 24.13 31.73
K-BAN arXiv2022 28.83 20.29 31.31 31.05 21.41 33.93
AS-Net CVPR2021 28.87 24.25 30.25 31.74 27.07 33.14
QPIC-Res50 CVPR2021 29.07 21.85 31.23 31.68 24.14 33.93
GGNet CVPR2021 29.17 22.13 30.84 33.50 26.67 34.89
QPIC-CPC CVPR2022 29.63 23.14 31.57 - - -
QPIC-Res101 CVPR2021 29.90 23.92 31.69 32.38 26.06 34.27
SCG ICCV2021 29.26 24.61 30.65 32.87 27.89 34.35
PhraseHOI AAAI2022 30.03 23.48 31.99 33.74 27.35 35.64
MSTR CVPR2022 31.17 25.31 32.92 34.02 28.83 35.57
SSRT CVPR2022 31.34 24.31 33.32 - - -
OCN AAAI2022 31.43 25.80 33.11 65.3 67.1
SCG+ODM arXiv 31.65 24.95 33.65 - - -
DT CVPR2022 31.75 27.45 33.03 34.50 30.13 35.81
CATN (w/ Bert) arXiv2022 31.86 25.15 33.84 34.44 27.69 36.45
CDN NeurIPS2021 32.07 27.19 33.53 34.79 29.48 36.38
STIP CVPR2022 32.22 28.15 33.43 35.29 31.43 36.45
DEFR arXiv2021 32.35 33.45 32.02 - - -
CDN-s+HQM ECCV2022 32.47 28.15 33.76 - - -
UPT arXiv2021 32.62 28.62 33.81 36.08 31.41 37.47
Iwin arXiv2022 32.79 27.84 35.40 35.84 28.74 36.09
SDT arXiv2022 32.97 28.49 34.31 36.32 31.90 37.64
DOQ CVPR2022 33.28 29.19 34.50 - - -
IF CVPR2022 33.51 30.30 34.46 36.28 33.16 37.21
GEN-VLKT (w/ CLIP) CVPR2022 34.95 31.18 36.08 38.22 34.36 39.37
ParMap ECCV2022 35.15 33.71 35.58 37.56 35.87 38.06
QAHOI-Swin-Large-ImageNet-22K arXiv2021 35.78 29.80 37.56 37.59 31.66 39.36

3) Ground Truth human-object pair boxes (only evaluating HOI recognition)

Method Pub Full(def) Rare(def) Non-Rare(def)
iCAN BMVC2018 33.38 21.43 36.95
Interactiveness CVPR2019 34.26 22.90 37.65
Analogy ICCV2019 34.35 27.57 36.38
ATL CVPR2021 43.32 33.84 46.15
IDN NeurIPS2020 43.98 40.27 45.09
ATL(w/ COCO) CVPR2021 44.27 35.52 46.89
FCL CVPR2021 45.25 36.27 47.94
GTNet arXiv 46.45 35.10 49.84
SCG ICCV2021 51.53 41.01 54.67
K-BAN arXiv2022 52.99 34.91 58.40
ConsNet ACMMM2020 53.04 38.79 57.3

4) Interactiveness detection (interactive or not + pair box detection):

Method Pub HICO-DET V-COCO
TIN++ TPAMI2022 14.35 29.36
PPDM CVPR2020 27.34 -
QPIC CVPR2021 32.96 38.33
CDN NeurIPS2021 33.55 40.13
ParMap ECCV2022 38.74 43.61

5) Enhanced with HAKE:

Method Pub Full(def) Rare(def) Non-Rare(def) Full(ko) Rare(ko) Non-Rare(ko)
iCAN BMVC2018 14.84 10.45 16.15 16.26 11.33 17.73
iCAN + HAKE-HICO-DET CVPR2020 19.61 (+4.77) 17.29 20.30 22.10 20.46 22.59
Interactiveness CVPR2019 17.03 13.42 18.11 19.17 15.51 20.26
Interactiveness + HAKE-HICO-DET CVPR2020 22.12 (+5.09) 20.19 22.69 24.06 22.19 24.62
Interactiveness + HAKE-Large CVPR2020 22.66 (+5.63) 21.17 23.09 24.53 23.00 24.99
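
The bracketed gains in the table above are absolute differences in Full(def) mAP against the corresponding baseline row. The short check below recomputes them from the table values; the helper name `gain` and the dictionary layout are illustrative choices only.

```python
def gain(baseline, enhanced):
    """Absolute mAP improvement in points, rounded to two decimals."""
    return round(enhanced - baseline, 2)


# Full(def) mAP values copied from the table above.
baselines = {"iCAN": 14.84, "Interactiveness": 17.03}
enhanced = {
    ("iCAN", "HAKE-HICO-DET"): 19.61,
    ("Interactiveness", "HAKE-HICO-DET"): 22.12,
    ("Interactiveness", "HAKE-Large"): 22.66,
}

gains = {key: gain(baselines[key[0]], value)
         for key, value in enhanced.items()}

for (method, data), delta in gains.items():
    print(f"{method} + {data}: +{delta:.2f}")
```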

6) Zero-Shot HOI detection:

Unseen action-object combination scenario (UC)
Method Pub Detector Unseen(def) Seen(def) Full(def)
Shen et al. WACV2018 COCO 5.62 - 6.26
Functional AAAI2020 HICO-DET 11.31 ± 1.03 12.74 ± 0.34 12.45 ± 0.16
ConsNet ACMMM2020 COCO 16.99 ± 1.67 20.51 ± 0.62 19.81 ± 0.32
VCL (NF-UC) ECCV2020 HICO-DET 16.22 18.52 18.06
ATL(w/ COCO) (NF-UC) CVPR2021 HICO-DET 18.25 18.78 18.67
FCL (NF-UC) CVPR2021 HICO-DET 18.66 19.55 19.37
SCL arXiv HICO-DET 21.73 25.00 24.34
GEN-VLKT*(NF-UC) CVPR2022 HICO-DET 25.05 23.38 23.71
VCL (RF-UC) ECCV2020 HICO-DET 10.06 24.28 21.43
ATL(w/ COCO) (RF-UC) CVPR2021 HICO-DET 9.18 24.67 21.57
FCL (RF-UC) CVPR2021 HICO-DET 13.16 24.23 22.01
SCL(RF-UC) arXiv HICO-DET 19.07 30.39 28.08
GEN-VLKT*(RF-UC) CVPR2022 HICO-DET 21.36 32.91 30.56
  • * indicates pretraining with a large Visual-Language model, e.g., CLIP.
  • For the details of each setting, please refer to the corresponding publications. This list is unofficial and might miss some publications.
Unseen object scenario (UO)
Method Pub Detector Full(def) Seen(def) Unseen(def)
Functional AAAI2020 HICO-DET 13.84 14.36 11.22
FCL CVPR2021 HICO-DET 19.87 20.74 15.54
ConsNet ACMMM2020 COCO 20.71 20.99 19.27
Unseen action scenario (UA)
Method Pub Detector Full(def) Seen(def) Unseen(def)
ConsNet ACMMM2020 COCO 19.04 20.02 14.12
Another setting
Method Pub Unseen Seen Full
Shen et al. WACV2018 5.62 - 6.26
Functional AAAI2020 10.93 12.60 12.26
VCL ECCV2020 10.06 24.28 21.43
ATL CVPR2021 9.18 24.67 21.57
FCL CVPR2021 13.16 24.23 22.01
THID (w/ CLIP) CVPR2022 15.53 24.32 22.96

Ambiguous-HOI

Detector: COCO pre-trained

Method mAP
iCAN 8.14
Interactiveness 8.22
Analogy(reproduced) 9.72
DJ-RN 10.37
OC-Immunity 10.45

SWiG-HOI

Method Pub Non-Rare Unseen Seen Full
JSR ECCV2020 10.01 6.10 2.34 6.08
CHOID ICCV2021 10.93 6.63 2.64 6.64
QPIC CVPR2021 16.95 10.84 6.21 11.12
THID (w/ CLIP) CVPR2022 17.67 12.82 10.04 13.26

V-COCO: Scenario1

1) Detector: COCO pre-trained or one-stage detector

Method Pub AP(role)
Gupta et al. arXiv 31.8
InteractNet CVPR2018 40.0
Turbo AAAI2019 42.0
GPNN ECCV2018 44.0
iCAN BMVC2018 45.3
Xu et al. CVPR2019 45.9
Wang et al. ICCV2019 47.3
UniDet ECCV2020 47.5
Interactiveness CVPR2019 47.8
Lin et al. IJCAI2020 48.1
VCL ECCV2020 48.3
Zhou et al. CVPR2020 48.9
In-GraphNet IJCAI-PRICAI 2020 48.9
Interactiveness-optimized CVPR2019 49.0
TIN-PAMI TPAMI2021 49.1
IP-Net CVPR2020 51.0
DRG ECCV2020 51.0
RGBM arXiv2022 51.7
VSGNet CVPR2020 51.8
PMN arXiv 51.8
PMFNet ICCV2019 52.0
Liu et al. arXiv 52.28
FCL CVPR2021 52.35
PD-Net ECCV2020 52.6
Wang et al. ECCV2020 52.7
PFNet CVM 52.8
Zou et al. CVPR2021 52.9
SIGN ICME2020 53.1
ACP ECCV2020 52.98 (53.23)
FCMNet ECCV2020 53.1
HRNet TIP2021 53.1
SGCN4HOI arXiv2022 53.1
ConsNet ACMMM2020 53.2
IDN NeurIPS2020 53.3
SG2HOI ICCV2021 53.3
OSGNet IEEE Access 53.43
SABRA-Res50 arXiv 53.57
K-BAN arXiv2022 53.70
IPGN TIP2021 53.79
AS-Net CVPR2021 53.9
RR-Net arXiv 54.2
SCG ICCV2021 54.2
SABRA-Res50FPN arXiv 54.69
GGNet CVPR2021 54.7
MLCNet ICMR2020 55.2
HOTR CVPR2021 55.2
DIRV AAAI2021 56.1
SABRA-Res152 arXiv 56.62
PhraseHOI AAAI2022 57.4
GTNet arXiv 58.29
QPIC-Res101 CVPR2021 58.3
QPIC-Res50 CVPR2021 58.8
CATN (w/ fastText) arXiv2022 60.1
Iwin arXiv2022 60.85
UPT-ResNet-101-DC5 arXiv2021 61.3
SDT arXiv2022 61.8
MSTR CVPR2022 62.0
IF CVPR2022 63.0
ParMap ECCV2022 63.0
QPIC-CPC CVPR2022 63.1
DOQ CVPR2022 63.5
GEN-VLKT (w/ CLIP) CVPR2022 63.58
QPIC+HQM ECCV2022 63.6
CDN NeurIPS2021 63.91
SSRT CVPR2022 65.0
STIP CVPR2022 66.0
DT CVPR2022 66.2

2) Enhanced with HAKE:

Method Pub AP(role)
iCAN BMVC2018 45.3
iCAN + HAKE-Large (transfer learning) CVPR2020 49.2 (+3.9)
Interactiveness CVPR2019 47.8
Interactiveness + HAKE-Large (transfer learning) CVPR2020 51.0 (+3.2)

HOI-COCO:

based on V-COCO

Method Pub Full Seen Unseen
VCL ECCV2020 23.53 8.29 35.36
ATL(w/ COCO) CVPR2021 23.40 8.01 35.34

HICO

1) Default

Method mAP
R*CNN 28.5
Girdhar et al. 34.6
Mallya et al. 36.1
Pairwise 39.9
RelViT 40.12
DEFR-base 44.1
DEFR-CLIP 60.5
DEFR/16 CLIP 65.6

2) Enhanced with HAKE:

Method mAP
Mallya et al. 36.1
Mallya et al.+HAKE-HICO 45.0 (+8.9)
Pairwise 39.9
Pairwise+HAKE-HICO 45.9 (+6.0)
Pairwise+HAKE-Large 46.3 (+6.4)
