Dataset for the publication:
A Network and Semantic Similarity Dataset of Requirements from the Tree Structure of System Requirement Specifications
Systems are developed as a solution to the problem space defined by their requirements. The requirements are acquired during the elicitation process. The creative nature of the elicitation process, the proprietary nature of requirements, the need for extensive preprocessing, and the diversity of analysis techniques restrict the development of a requirement dataset. There exists no formal or informal method to create a requirement dataset. Thus, we devise a semi-formal method to create a multi-purpose requirement dataset that harnesses the human knowledge in system requirement specification documents (SyRSDs) to facilitate the deployment of modern computing algorithms. Our dataset has three forms:
- ReqList, a list of requirements from $86$ distinct systems with their document structure in pure text form. The $12701$ requirements are ready to leverage natural language processing techniques and unsupervised machine learning techniques.
- ReqNet, a large network of requirements consisting of $17375$ nodes to deploy graph-theoretic algorithms for requirement engineering. ReqNet portrays small-world network characteristics with an average distance of $\approx 9.5619$ links.
- ReqSim, a dataset consisting of $10933$ pairs of requirements annotated with their similarity scores. ReqSim enables sentence-level supervised learning tasks to exploit the semantics of requirements. The similarity scores are coherent with human knowledge.
Our dataset is theoretically grounded by the tree structure of SyRSDs. We devise a method to extract a network from the SyRSDs and mathematically prove that the extracted network is a tree. The tree structure resonates with the hierarchical nature of the requirement allocation process.
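The tree property and the average-distance statistic mentioned above can be illustrated with a minimal sketch. It is not the authors' extraction method: it assumes, purely for illustration, that each entry of a document carries a dotted section number and that an entry's parent is the number with its last component removed. The section numbers, the `ROOT` node, and all function names are hypothetical and are not taken from the dataset files.

```python
from collections import deque

# Hypothetical section numbers from a toy specification; not taken from the dataset.
headings = ["1", "1.1", "1.2", "1.2.1", "2"]


def build_edges(numbers):
    """Link each entry to its parent section; top-level entries hang off a ROOT node."""
    edges = []
    for number in numbers:
        parent = number.rsplit(".", 1)[0] if "." in number else "ROOT"
        edges.append((parent, number))
    return edges


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def is_tree(adj, edges):
    """A graph is a tree iff it is connected and has exactly |V| - 1 edges."""
    if not adj:
        return False
    reachable = bfs_distances(adj, next(iter(adj)))
    unique_edges = {frozenset(edge) for edge in edges}
    return len(reachable) == len(adj) and len(unique_edges) == len(adj) - 1


def average_distance(adj):
    """Mean shortest-path length over ordered pairs of distinct, mutually reachable nodes.

    Assumes the graph has at least two connected nodes.
    """
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


edges = build_edges(headings)
adj = build_adjacency(edges)
print("is tree:", is_tree(adj, edges))
print("average distance:", round(average_distance(adj), 4))
```

The breadth-first search is enough here because every link has unit length. On a network the size of ReqNet, the all-pairs loop in `average_distance` would be slow, so a graph library or sampling of source nodes may be preferable.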
If you find this dataset useful, please cite:
We thank the authors of PURE: A Dataset of Public Requirements Documents for making their dataset publicly available.