
Mimicking human parallel visual information processing system (mismatch penalty) for Image semantic segmentation, using multiple feature maps.


Multi-scale feature map induced image segmentation

Deep neural networks mimic the hierarchical, feedforward processing of the human visual cortex. However, that is not the whole story: human vision is dynamic and recursive, with interactions throughout the different layers. Such top-down and bottom-up interactions seem to be mimicked in the form of residual layers (or short and long skip connections), but it remains unclear how these relate to human visual processing. In this project, we study the characteristics of multi-scale residual feature maps and strategies for integrating them, and we consider both with respect to human perceptual features.

This project was supported by Deep Learning Camp Jeju 2018, organized by the TensorFlow Korea User Group and sponsored by TensorFlow Korea, Google, Kakao Brain, Netmarble, SKT, Element AI, JDC, and Jeju Univ.

Proposed by Oh-hyeon Choung (PhD candidate, EPFL Neuroscience program)

Main references:

  1. Lauffs, M. M., Choung, O. H., Öğmen, H., & Herzog, M. H. (2018). Unconscious retinotopic motion processing affects non-retinotopic motion perception. Consciousness and Cognition.
  2. Shelhamer, E., Long, J., & Darrell, T. (2016). Fully Convolutional Networks for Semantic Segmentation. arXiv:1605.06211 [cs].
  3. Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer, Cham.

Task: Image semantic segmentation




  1. Feature maps from each convolutional layer carry distinct information.
  2. Depending on their local vs. abstract features, could they be integrated with different strategies, as the human visual system does?
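The second hypothesis can be sketched with a minimal NumPy example of FCN-style skip fusion: a coarse (abstract) feature map is upsampled and combined element-wise with a finer (local) map. The function names and the toy map sizes here are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def upsample_nn(fmap, factor):
    # Nearest-neighbour upsampling of an (H, W, C) feature map
    return fmap.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_skip(coarse, fine, factor):
    # FCN-style skip fusion: upsample the coarse (abstract) map,
    # then add it element-wise to the finer (local) map
    up = upsample_nn(coarse, factor)
    assert up.shape == fine.shape
    return up + fine

# Toy maps: a 2x2 "abstract" map and a 4x4 "local" map, 3 channels each
coarse = np.random.rand(2, 2, 3)
fine = np.random.rand(4, 4, 3)
fused = fuse_skip(coarse, fine, factor=2)
print(fused.shape)  # (4, 4, 3)
```

Element-wise summation is only one integration strategy; concatenation followed by a 1x1 convolution (as in U-Net) is the main alternative studied here.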


The human visual system starts in the lower visual areas and proceeds to the higher areas. However, that is not the full story: the lower visual areas are strongly and interactively modulated by various higher visual areas.

Retinotopic and non-retinotopic images


  1. To

Baseline model: FCN (fully convolutional network)

The baseline model is forked from

For the baseline settings, please refer to the original GitHub repository.

Major Debugging Problems

  • The code is written in Python 2 (Python 2.7 with tensorflow==1.9.0 worked for me)
  • In Python 3 (and in Python 2 with TF 1.x): tf.pack --> tf.stack
  • Be aware of the tfrecord file path and name: a wrong path or name causes errors
  • "std::bad_alloc" error: running out of RAM, or border values in the data
  • ['label' out of range] error: 255 (border) values in the label file cause this error. To fix it, I added:
# Exclude the masked-out (border) pixels from evaluation
weights = tf.to_float(tf.not_equal(annotation_batch_tensor, 255))
# Zero out the 255s in annotation_batch_tensor by multiplying with the weights
annotation_batch_tensor = tf.multiply(annotation_batch_tensor, tf.cast(weights, tf.uint8))
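The masking logic above can be illustrated with plain NumPy, assuming (as in PASCAL VOC) that label value 255 marks void/border pixels. The toy annotation array below is invented for illustration.

```python
import numpy as np

# Toy annotation batch: class labels, with 255 marking border pixels
annotations = np.array([[0, 3, 255],
                        [255, 7, 1]], dtype=np.uint8)

# Weight 0 for border pixels, 1 elsewhere
# (mirrors tf.not_equal + tf.to_float above)
weights = (annotations != 255).astype(np.float32)

# Zero out the 255s so they become class 0; the zero weight
# still excludes them from the loss
masked = annotations * weights.astype(np.uint8)
print(masked)
# [[0 3 0]
#  [0 7 1]]
```

The key point is that the 255s are remapped into a valid class index only to keep the gather/lookup in range; the weight tensor ensures they contribute nothing to the loss.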

