Visual Domain Knowledge-based Multimodal Zoning for Textual Region Localization in Noisy Historical Document Images
This repository contains the implementation of a zoning solution based on a multimodal approach that uses our novel visual representation, Gravity-map. The model predicts textual regions on a document image as a set of closed polygons. The underlying CNN models used in our solution are dhSegment and FPN.
Details of this work can be found in the corresponding paper.
Step 1: Oversegment the image using Voronoi tessellation | Step 2: Compute the geometric feature, gravity | Step 3: Construct the Gravity-map |
---|---|---|
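To make the three steps in the table concrete, here is a toy, pure-Python sketch of the data flow. This is not the repository's implementation, and it should not be read as the paper's method. The Voronoi oversegmentation below is a brute-force discrete version with hand-picked seeds. The `gravity` formula (the sum of neighbouring cells' ink mass divided by the squared seed distance) is a **hypothetical stand-in**, not the paper's definition of the gravity feature. All function names (`voronoi_labels`, `cell_masses`, `gravity`, `gravity_map`) are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col) position of a Voronoi seed
Grid = List[List[int]]


def voronoi_labels(height: int, width: int, seeds: List[Point]) -> Grid:
    """Step 1 (sketch): discrete Voronoi tessellation of the pixel grid.

    Each pixel gets the index of its nearest seed by squared Euclidean
    distance; min() returns the first minimum, so ties go to the lowest index.
    """
    labels = []
    for r in range(height):
        labels.append([
            min(range(len(seeds)),
                key=lambda i: (r - seeds[i][0]) ** 2 + (c - seeds[i][1]) ** 2)
            for c in range(width)
        ])
    return labels


def cell_masses(binary: Grid, labels: Grid, n_cells: int) -> List[int]:
    """Count foreground (ink) pixels that fall inside each Voronoi cell."""
    masses = [0] * n_cells
    for r, row in enumerate(binary):
        for c, value in enumerate(row):
            if value:
                masses[labels[r][c]] += 1
    return masses


def gravity(seeds: List[Point], masses: List[int]) -> List[float]:
    """Step 2 (HYPOTHETICAL proxy, not the paper's definition).

    Pull on cell i = sum over other cells j of mass_j / squared seed distance.
    """
    values = []
    for i, (ri, ci) in enumerate(seeds):
        total = 0.0
        for j, (rj, cj) in enumerate(seeds):
            d2 = (ri - rj) ** 2 + (ci - cj) ** 2
            if j != i and d2 > 0:
                total += masses[j] / d2
        values.append(total)
    return values


def gravity_map(labels: Grid, values: List[float]) -> List[List[float]]:
    """Step 3 (sketch): paint each pixel with its cell's value, scaled to [0, 1]."""
    peak = max(values, default=0.0) or 1.0  # avoid division by zero
    return [[values[k] / peak for k in row] for row in labels]


if __name__ == "__main__":
    binary = [[1] * 4 for _ in range(4)]   # toy 4x4 "image", all ink
    seeds = [(0, 0), (3, 3)]               # hand-picked seeds (assumption)
    labels = voronoi_labels(4, 4, seeds)
    masses = cell_masses(binary, labels, len(seeds))  # masses == [10, 6]
    gmap = gravity_map(labels, gravity(seeds, masses))
    for row in gmap:
        print(["%.2f" % v for v in row])
```

In practice the seeds would come from the document itself (for example, connected components of a binarized page), and a library routine such as `scipy.spatial.Voronoi` would replace the brute-force labelling. The resulting map is the kind of extra input channel a segmentation CNN could consume alongside the RGB image.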