Dependency-aware Form Understanding
The raw dataset contains 51,695 samples, each of which includes a UI state and a view hierarchy. After screening out UIs that contain no forms, we keep the remaining samples for analysis. We then label the relations between form elements with a combination of automated and manual methods, which yields 25,140 annotated element dependency pairs (label-element: 8,558; input-action: 5,811; others: 10,771). The others category is large because it consists of pairs in which input elements are randomly mapped to incorrect descriptions or actions; these pairs are included to keep the sample distribution balanced.
UIE-Dependency: https://disk.pku.edu.cn:443/link/616DC67DEF24A4DAE8FD83F787270AAF
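The construction of the others category described above can be illustrated with a minimal sketch. It is not the script used to build UIE-Dependency: the function name `sample_other_pairs`, the element identifiers, and the pair representation as `(input_id, target_id)` tuples are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation: (input_element_id, target_element_id).
Pair = Tuple[str, str]


def sample_other_pairs(
    positives: Sequence[Pair],
    candidates: Sequence[str],
    n_pairs: int,
    seed: int = 0,
) -> List[Pair]:
    """Pair each input with targets it is NOT annotated with, then sample."""
    rng = random.Random(seed)

    # Collect the correct targets (descriptions or actions) of every input.
    true_targets: Dict[str, Set[str]] = {}
    for input_id, target_id in positives:
        true_targets.setdefault(input_id, set()).add(target_id)

    # Every (input, candidate) combination that is not a correct pair
    # is a possible "others" pair.
    pool = [
        (input_id, candidate)
        for input_id in sorted(true_targets)
        for candidate in candidates
        if candidate != input_id and candidate not in true_targets[input_id]
    ]

    # Shuffle and truncate, so the result holds at most n_pairs distinct pairs.
    rng.shuffle(pool)
    return pool[:n_pairs]


# Toy example with made-up element identifiers.
positives = [("in1", "lab1"), ("in2", "lab2"), ("in1", "btn1")]
candidates = ["lab1", "lab2", "btn1"]
others = sample_other_pairs(positives, candidates, n_pairs=2)
```

Choosing `n_pairs` close to the number of correct pairs is one way to keep the classes roughly balanced, as the counts above suggest.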
- System: Ubuntu 18.04
- Language: Python 3.6.8
- Devices: GeForce RTX 2080 Ti GPU
- scipy == 1.2.0
- numpy == 1.19.1
- pandas == 1.1.1
- torch == 1.2.0
- torchvision == 0.4.0
- scikit-learn == 0.23.2
- tables == 3.6.1
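To check a local environment against the versions listed above, a small script along the following lines may help. It is only a sketch and is not shipped with this repository; `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names. It falls back to `pkg_resources` because `importlib.metadata` is not available on Python 3.6.

```python
from typing import Callable, Dict, List, Optional, Tuple

try:
    from importlib import metadata as _metadata  # Python 3.8+
except ImportError:
    _metadata = None  # e.g. Python 3.6; use pkg_resources instead

# Versions copied from the list above.
PINNED = {
    "scipy": "1.2.0",
    "numpy": "1.19.1",
    "pandas": "1.1.1",
    "torch": "1.2.0",
    "torchvision": "0.4.0",
    "scikit-learn": "0.23.2",
    "tables": "3.6.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    if _metadata is not None:
        try:
            return _metadata.version(package)
        except _metadata.PackageNotFoundError:
            return None
    import pkg_resources  # provided by setuptools

    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[Tuple[str, str, Optional[str]]]:
    """List (package, expected, found) for every package that differs."""
    mismatches = []
    for package, expected in pinned.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cu92" when comparing.
        base = found.split("+")[0] if found is not None else None
        if base != expected:
            mismatches.append((package, expected, found))
    return mismatches


if __name__ == "__main__":
    for package, expected, found in find_mismatches(PINNED):
        print(f"{package}: expected {expected}, found {found}")
```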
If you find this repository useful in your research, please cite the following paper:
@inproceedings{zhang2021dependency,
title={Dependency-aware Form Understanding},
author={Zhang, Shaokun and Li, Yuanchun and Yan, Weixiang and Guo, Yao and Chen, Xiangqun},
booktitle={2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE)},
pages={139--149},
year={2021},
organization={IEEE}
}
For questions, please feel free to reach out via email at skzhang@pku.edu.cn.