
Video Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting.

The official implementation of our paper, published at ICRA'19.

Video object segmentation is an essential task in robot manipulation to facilitate grasping and learning affordances. Incremental learning is important for robotics in unstructured environments, since the total number of objects and their variations can be intractable. Inspired by how children learn, human robot interaction (HRI) can be used to teach robots about the world under human guidance, much as children learn from a parent or a teacher. A human teacher can show potential objects of interest to the robot, which is able to self-adapt to the teaching signal without manual segmentation labels. We propose a novel teacher-student learning paradigm to teach robots about their surrounding environment. A two-stream motion and appearance "teacher" network provides pseudo-labels to adapt an appearance "student" network. The student network is able to segment the newly learned objects in other scenes, whether they are static or in motion. We also introduce a carefully designed dataset that serves the proposed HRI setup, denoted as (I)nteractive (V)ideo (O)bject (S)egmentation. Our IVOS dataset contains teaching videos of different objects, and manipulation tasks. Unlike previous datasets, IVOS provides manipulation task sequences with segmentation annotation along with the waypoints for the robot trajectories. It also provides segmentation annotation for the different transformations such as translation, scale, planar rotation, and out-of-plane rotation. Our proposed adaptation method outperforms the state-of-the-art on DAVIS and FBMS by 6.8% and 1.2% in F-measure, respectively. It improves over the baseline on the IVOS dataset by 46.1% and 25.9% in mIoU.

This implementation is based on the implementation of OnAVOS, a semi-supervised video segmentation method.


virtualenv --system-site-packages ~/.tfenvpy2
source ~/.tfenvpy2/bin/activate
pip install tensorflow-gpu
pip install scikit-learn
pip install scikit-image

Example Usage

To run on a certain sequence

  • CONFIG_FILE - path to the configuration file
  • VID_ID - integer video index; if -1, all videos are used
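The VID_ID convention above can be sketched as follows. This is only an illustration of the described behavior, assuming VID_ID indexes into an ordered list of videos; the function name `select_videos` and the video names are hypothetical and not taken from the repository.

```python
def select_videos(video_names, vid_id):
    """Return the videos to process for a given VID_ID (hypothetical helper)."""
    if vid_id == -1:
        # -1 means "use all videos".
        return list(video_names)
    if not 0 <= vid_id < len(video_names):
        raise IndexError("VID_ID %d is out of range" % vid_id)
    # Any other value selects a single video by its index.
    return [video_names[vid_id]]


videos = ["seq_a", "seq_b", "seq_c"]  # made-up names for illustration
print(select_videos(videos, -1))
print(select_videos(videos, 1))
```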

Configuration File

Explanation of the most important config parameters:

  • targets_path: path to the folder with adaptation targets
  • few_shot_samples: number of samples to use in the adaptation
  • n_adaptation_steps: number of update steps per sample
  • adaptation_learning_rate: learning rate used during online adaptation
  • adapt_th: threshold used to obtain the positive training examples
  • distance_negative_threshold: distance threshold used to select the negative examples
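To make the roles of adapt_th and distance_negative_threshold more concrete, here is a minimal sketch of one plausible reading of the two descriptions above: pixels whose teacher probability exceeds adapt_th become positives, pixels farther than distance_negative_threshold from every positive become negatives, and everything in between is ignored. The function name, the label constants, and the config values are assumptions made for illustration; the repository's actual selection logic may differ.

```python
import math

# Hypothetical values for illustration only; not the repository's defaults.
config = {
    "adapt_th": 0.8,
    "distance_negative_threshold": 2.0,
}

POSITIVE, NEGATIVE, IGNORE = 1, 0, 255  # assumed label encoding


def select_adaptation_labels(teacher_probs, adapt_th, distance_negative_threshold):
    """Turn a teacher probability map (list of rows) into per-pixel labels."""
    height, width = len(teacher_probs), len(teacher_probs[0])
    # Positive examples: pixels where the teacher is confident about the object.
    positives = [(r, c) for r in range(height) for c in range(width)
                 if teacher_probs[r][c] > adapt_th]
    labels = [[IGNORE] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if teacher_probs[r][c] > adapt_th:
                labels[r][c] = POSITIVE
                continue
            if not positives:
                continue  # no confident object pixel: leave everything ignored
            # Brute-force Euclidean distance to the nearest positive pixel.
            dist = min(math.hypot(r - pr, c - pc) for pr, pc in positives)
            if dist > distance_negative_threshold:
                labels[r][c] = NEGATIVE  # far from the object: background
    return labels


probs = [[0.9, 0.5, 0.1, 0.1, 0.1, 0.1]]
print(select_adaptation_labels(probs, **config))
```

The brute-force distance keeps the sketch dependency-free; on real frames a distance transform such as `scipy.ndimage.distance_transform_edt` would be the more practical choice.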

IVOS Dataset

Download Data Formatted for MotAdapt

Trained Weights for Teacher model (2-stream WRN) trained on DAVIS16

This work is a continuation of Team Alberta's KUKA Innovation Award submission.

Video Demo KUKA

Video Demo for Our Paper

IVOS Dataset


Please cite our work if you use the dataset or method outlined:

  title={Video Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting},
  author={Siam, Mennatullah and Jiang, Chen and Lu, Steven and Petrich, Laura and Gamal, Mahmoud and Elhoseiny, Mohamed and Jagersand, Martin},
  journal={arXiv preprint arXiv:1810.07733},

