Given today's interconnection of systems of systems and the growing demand for confidential and privacy-preserving data processing, analyzing software systems and finding violations at an early stage of their development is crucial. Data Flow Diagrams (DFDs) were proposed for such structural analysis more than 40 years ago. Until recently, however, both the diagrams' expressiveness and the available analysis capabilities were too limited to make statements about software systems from a software architectural point of view.
We provide an open-source data flow analysis that uses label propagation to give software architects simple yet powerful means to analyze privacy-related quality properties like confidentiality. The analysis has been integrated into the Palladio Software Architecture Simulator and also provides various input and output formats as well as a textual domain-specific language (DSL) for the formulation of data flow constraints and queries. The research originates from the DSiS group, KASTEL Institute, Karlsruhe Institute of Technology (KIT). It is used in various research projects including KASTEL, ANYMOS, SofDCar, Trust 4.0, and FluidTrust, and is currently being driven by Nicolas Boltz and Sebastian Hahner.
Data Flow Analysis
Our analysis uses label propagation to analyze the characteristics of data flows. First, we extract all possible data flows from data flow diagrams or annotated software architecture models. The extracted data flows (also called action sequences) contain all relevant information about the characteristics of the flowing data and its processing, e.g., by components or servers. We then propagate these characteristics through the data flows and compare the result against pre-formulated constraints to detect violations of confidentiality, or privacy in general. Example questions include:
- Does personal data flow to unauthorized locations violating the GDPR?
- Does data leave an internal server without being encrypted first?
- Does the access to sensitive data follow Role-based Access Control (RBAC)?
- Are there any data flows that merge two distinct types of data that would void anonymity?
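The steps described above (extracting flows, propagating labels, checking constraints) can be illustrated with a minimal Python sketch. It is not the project's implementation or API: the `Node` class, the helper functions, the label names (`EU`, `nonEU`, `Personal`, `Encrypted`), and the example diagram are all hypothetical, and the sketch assumes an acyclic diagram in which each node simply adds or removes data labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical DFD node; not a class of the real analysis."""
    name: str
    node_labels: frozenset = frozenset()  # characteristics of the node itself
    adds: frozenset = frozenset()         # data labels attached by this node
    removes: frozenset = frozenset()      # data labels dropped by this node


def extract_flows(edges, sources):
    """Enumerate every source-to-sink path of an acyclic diagram."""
    flows = []

    def walk(path):
        successors = edges.get(path[-1], [])
        if not successors:
            flows.append(path)
        for nxt in successors:
            walk(path + [nxt])

    for source in sources:
        walk([source])
    return flows


def propagate(flow, nodes):
    """Pair each node of a flow with the data labels that arrive at it."""
    data = frozenset()
    arriving = []
    for name in flow:
        node = nodes[name]
        arriving.append((node, data))
        data = (data - node.removes) | node.adds
    return arriving


def find_violations(flows, nodes):
    """Assumed constraint: unencrypted personal data must not reach nonEU nodes."""
    found = []
    for flow in flows:
        for node, data in propagate(flow, nodes):
            if ("nonEU" in node.node_labels
                    and "Personal" in data
                    and "Encrypted" not in data):
                found.append((tuple(flow), node.name))
    return found


# Hypothetical example diagram.
NODES = {n.name: n for n in [
    Node("user", frozenset({"EU"}), adds=frozenset({"Personal"})),
    Node("shop", frozenset({"EU"})),
    Node("eu_db", frozenset({"EU"})),
    Node("encrypt", frozenset({"EU"}), adds=frozenset({"Encrypted"})),
    Node("cloud", frozenset({"nonEU"})),
    Node("analytics", frozenset({"nonEU"})),
]}
EDGES = {"user": ["shop"], "shop": ["eu_db", "encrypt", "analytics"],
         "encrypt": ["cloud"]}

flows = extract_flows(EDGES, ["user"])
violations = find_violations(flows, NODES)
print(violations)
```

In this toy diagram, only the flow that reaches `analytics` is reported, because the data arriving at `cloud` has passed through `encrypt` first. The other example questions would be expressed as different conditions over the same node and data labels.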
More information can be found in these key publications:
- F. Schwickerath, N. Boltz, S. Hahner, M. Walter, C. Gerking, and R. Heinrich, "Tool-Supported Architecture-Based Data Flow Analysis for Confidentiality", presented at 17th European Conference on Software Architecture (ECSA), Tool & Demo Track, Preprint, 2023, doi: 10.48550/arXiv.2308.01645.
- S. Seifermann, R. Heinrich, D. Werle, and R. Reussner, "Detecting violations of access control and information flow policies in data flow diagrams", in Journal of Systems and Software (JSS), vol. 184, Elsevier, 2022, doi: 10.1016/j.jss.2021.111138.
- S. Seifermann, R. Heinrich, D. Werle, and R. Reussner, "A Unified Model to Detect Information Flow and Access Control Violations in Software Architectures", in 18th International Conference on Security and Cryptography (SECRYPT), SciTePress, 2021, doi: 10.5220/0010515300260037.
- S. Hahner, S. Seifermann, R. Heinrich, M. Walter, T. Bureš, and P. Hnětynka, "Modeling Data Flow Constraints for Design-Time Confidentiality Analyses", presented at 18th International Conference on Software Architecture Companion (ICSA), IEEE, 2021, doi: 10.1109/ICSA-C52384.2021.00009.
- S. Seifermann, R. Heinrich, and R. Reussner, "Data-Driven Software Architecture for Analyzing Confidentiality", in International Conference on Software Architecture (ICSA), IEEE, 2019, doi: 10.1109/ICSA.2019.00009.
The following table shows the structure of the analysis. The most important repositories are pinned below.
| # | Description |
|---|---|
| 1 | The core repository containing the data flow extraction rules as well as the label propagation algorithm and analysis. Depends on 2 and 3. |
| 2 | The meta model is used to directly model data flow diagrams with characterized data flows. It serves as analysis input for 1 and as transformation output, e.g., of 6 and 7. |
| 3 | This extension consists of meta models for annotating Palladio software architecture models and serves as analysis input for 1. |
| 4 | With this editor, which is available online, data flow diagrams with privacy information can be created that serve as analysis input via 6. |
| 5 | This transformation takes data flow diagrams following 2 and yields JSON files for the WebEditor. |
| 6 | This transformation takes JSON representations from the WebEditor or microSecEnD and yields data flow diagrams following 2. |
| 7 | This transformation takes action sequences extracted from Palladio models annotated with 3 and yields data flow diagrams following 2. |
There are currently two extensions of the data flow analysis available:
- ABUNAI stands for Architecture-Based Uncertainty-Aware Confidentiality AnalysIs and supports the modeling and analysis of uncertainty and its impact on confidentiality. By combining the data flow analysis with architecture-based uncertainty propagation, predictions on the interaction of uncertainty and confidentiality can be made.
- MDPA provides Model-Based Data Protection Assessments. By incorporating legal information from the GDPR, experts can make statements about data privacy from a software architectural viewpoint.
The easiest way to get started is to download our ready-to-use Eclipse product. Alternatively, the artifacts of all main repositories are available on our Eclipse update site and can be installed directly into the Eclipse Modeling Framework, see this guide. This recent publication also provides a good overview of the analysis.