To build a comprehensive understanding of MHE networks, we interpret them at multiple levels: 1) Neurons: unit-level dissection, which explores the semantic and height selectivity of the learned internal deep representations; 2) Instances: object-level interpretation, which studies the effects of different semantic classes, scales, and spatial contexts on height estimation; 3) Attribution: pixel-level analysis.
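As a rough illustration of what unit-level selectivity can mean in practice, the sketch below scores how well a single feature map overlaps a binary concept mask, where the concept is either a semantic class or a height range. It is a simplified, Network Dissection style IoU and is not claimed to be the exact procedure used in this project. The function names (`unit_concept_iou`, `height_range_mask`), the per-image quantile threshold, and the assumption that the activation map has already been resized to the label resolution are all hypothetical choices made for the example.

```python
import numpy as np


def height_range_mask(height_map, low, high):
    """Binary mask of pixels whose height lies in [low, high). Hypothetical helper."""
    return (height_map >= low) & (height_map < high)


def unit_concept_iou(activation, concept_mask, quantile=0.95):
    """IoU between a unit's most strongly activated pixels and a concept mask.

    Assumes `activation` and `concept_mask` share the same 2-D shape.
    The threshold is taken per image here for simplicity; a dataset-wide
    threshold would be closer to the usual dissection setup.
    """
    threshold = np.quantile(activation, quantile)
    unit_mask = activation > threshold
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0


if __name__ == "__main__":
    # Toy data: one unit fires on the top-left 2x2 block of a 4x4 image.
    activation = np.zeros((4, 4))
    activation[:2, :2] = 1.0

    # Toy height map: the same block is 15 m high, everything else is ground.
    height_map = np.zeros((4, 4))
    height_map[:2, :2] = 15.0

    mask = height_range_mask(height_map, 10.0, 20.0)
    print("IoU with the 10-20 m range:", unit_concept_iou(activation, mask, quantile=0.75))
```

A unit with a high score for one concept and low scores for the others would be called selective for that concept. The same function works for semantic selectivity by passing a class mask (for example, all building pixels) instead of a height-range mask.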
Fig. 1. MHE networks learn to recognize different semantic objects (road, building, and tree) and height ranges implicitly. This figure shows the strong selectivity of Transformer-based MHE networks on both the GTAH dataset and the real-world DFC 2019 dataset. (Best viewed zoomed in.)

Fig. 2. Visualization of the high correlation between height ranges and feature maps of MHE networks. (Best viewed zoomed in.)

This project is released under the Apache 2.0 license.