Video Captioning Based on Both Egocentric and Exocentric Views of Robot Vision for Human-Robot Interaction

Robot vision data can be thought of as first-person videos.

There are three possible situations in one egocentric video:

  1. Global - explains the overall situation, including details such as the place, lighting, and weather.
  2. Action - explains what the subject (i.e., I, the camera wearer) is doing.
  3. Interaction - explains the interacting situation or behavior between the subject (i.e., me) and others.

Global Action Interaction (GAI)

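These three caption types can be thought of as one annotation per clip. The snippet below is a minimal, hypothetical sketch of such a GAI-style annotation; the keys and example sentences are illustrative only and are not the repository's actual data format.

    # Hypothetical sketch of a GAI-style annotation for one egocentric clip.
    # Keys and captions are illustrative, not the repository's actual schema.
    gai_caption = {
        "global": "I am at a bright outdoor market on a sunny afternoon.",      # place, light, weather
        "action": "I am walking between the stalls and looking at the fruit.",  # what the camera wearer is doing
        "interaction": "I am talking with a vendor while paying for apples.",   # between the wearer and others
    }

    for view, caption in gai_caption.items():
        print("{:>12}: {}".format(view, caption))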

Environment

  • Python 3.6
  • TensorFlow 1.5.0

How To Run

  1. Download the UT Egocentric dataset: [Preprocessed Dataset](https://drive.google.com/file/d/1IlX_WosLWfqRnIGIobI9gipZ8EGOJIUz/view?usp=sharing)

  2. Extract video features (a frame-sampling sketch follows this list)

    $ python extract_RGB_feature.py
  3. Train the model (a model sketch also follows this list)

    $ python train.py
  4. Test the model

    $ python test.py
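The feature-extraction step (step 2) typically samples a fixed number of frames from each video before passing them through a CNN such as VGG (see References). Below is a minimal, hypothetical sketch of that frame-sampling stage using OpenCV; the function and parameter names are assumptions, and this is not the repository's actual extract_RGB_feature.py.

    # Hypothetical sketch: uniformly sample and resize RGB frames from one video.
    # This only covers the frame-sampling stage; the real script additionally runs
    # the frames through a VGG network to obtain per-frame feature vectors.
    import cv2
    import numpy as np

    def sample_frames(video_path, n_frames=80, size=(224, 224)):
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        keep = set(np.linspace(0, max(total - 1, 0), n_frames).astype(int))
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx in keep:
                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV loads frames as BGR
                frames.append(cv2.resize(frame, size))
            idx += 1
        cap.release()
        return np.asarray(frames, dtype=np.float32)              # shape: (<= n_frames, 224, 224, 3)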
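train.py builds on the S2VT sequence-to-sequence captioning model listed in the References. The following is a minimal sketch of that encoder/decoder idea in TensorFlow 1.x; the layer sizes, variable names, and loss are illustrative assumptions, not the repository's actual model.

    # Minimal TF 1.x sketch of an S2VT-style encoder/decoder (illustrative only).
    import tensorflow as tf

    n_frames, feat_dim, hidden, vocab_size, max_words = 80, 4096, 256, 5000, 20

    video_feats = tf.placeholder(tf.float32, [None, n_frames, feat_dim])
    captions = tf.placeholder(tf.int32, [None, max_words])        # word indices, teacher-forced

    with tf.variable_scope("encoder"):                            # encode the frame features
        enc_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
        _, enc_state = tf.nn.dynamic_rnn(enc_cell, video_feats, dtype=tf.float32)

    with tf.variable_scope("decoder"):                            # decode a caption from the video encoding
        emb = tf.get_variable("emb", [vocab_size, hidden])
        dec_in = tf.nn.embedding_lookup(emb, captions[:, :-1])
        dec_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
        dec_out, _ = tf.nn.dynamic_rnn(dec_cell, dec_in, initial_state=enc_state)
        logits = tf.layers.dense(dec_out, vocab_size)             # per-step word scores

    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=captions[:, 1:], logits=logits))
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)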

References

S2VT model by chenxinpeng

VGG model by AlonDaks/tsa-kaggle

Attention mechanism by [AdrianHsu](https://github.com/AdrianHsu/S2VT-seq2seq-video-captioning-attention)

Dataset: [UT Egocentric](http://vision.cs.utexas.edu/projects/egocentric/storydriven.html)
