dilumb/DLTClust

DLTClust

A tool to derive candidate shared ledger combinations for industry ecosystems.

Multi-ledger designs are critical in industry ecosystems based on Distributed Ledger Technology (DLT), including blockchain, because they help balance two conflicting demands: data transparency for improved integrity, and the need to hide commercially sensitive data. DLTClust can be used to answer the following questions in multi-ledger designs:

  • What parties should share which ledgers?
  • What data should be on those ledgers?

Given a matrix where elements represent parties' interest in data generated by other parties, DLTClust can ascertain ledger membership while minimizing data exposure and number of ledgers. It captures the following party-party relationships:

  1. Clusters - Well-connected set of parties or party-data relationships, e.g., consortium
  2. Busses - Parties that read/write data from/to all other parties, e.g., a supply chain integrator
  3. Sinks - Parties that read data from all other parties, e.g., regulator
  4. Sources - Parties that write data to all other parties, e.g., oracle

Alternatively, given a matrix where elements represent parties' interest in specific data elements, DLTClust can ascertain which party and data element should be placed on what ledger while minimizing data exposure and number of ledgers.
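The party-party relationship types listed above can be illustrated on a small binary matrix. The following is a minimal sketch, not DLTClust's clustering algorithm: the function name `classify_parties` and the example matrix are hypothetical, and the convention that a filled row marks a source and a filled column marks a sink is an assumption inferred from the configuration option descriptions in this README.

```python
from typing import Dict, List


def classify_parties(labels: List[str], dsm: List[List[int]]) -> Dict[str, str]:
    """Label each party as bus, sink, source, or regular from a binary DSM.

    Assumption: a fully filled row marks a source, a fully filled column
    marks a sink, and a party with both is a bus.
    """
    n = len(labels)
    roles = {}
    for i, label in enumerate(labels):
        # The diagonal is skipped because DSM diagonals are kept empty.
        row_full = all(dsm[i][j] == 1 for j in range(n) if j != i)
        col_full = all(dsm[j][i] == 1 for j in range(n) if j != i)
        if row_full and col_full:
            roles[label] = "bus"
        elif col_full:
            roles[label] = "sink"
        elif row_full:
            roles[label] = "source"
        else:
            roles[label] = "regular"
    return roles


# Hypothetical 4-party example: A has a filled row, D has a filled column,
# and B and C are related only to each other (a small cluster).
labels = ["A", "B", "C", "D"]
dsm = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
roles = classify_parties(labels, dsm)
print(roles)
```

DLTClust itself searches for these structures with a genetic algorithm and can tolerate partially filled rows and columns, which this exact-match check does not attempt.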

Party-party and party-data relationships should be encoded as a binary Design Structure Matrix (DSM) and a binary Domain Mapping Matrix (DMM), respectively. Given a DSM or DMM, DLTClust uses extended versions of the following clustering algorithms to identify DLT-ecosystem-specific party-party and party-data relationships:

  1. Tian-Li Yu, Ali A. Yassine, and David E. Goldberg, "An information theoretic method for developing modular architectures using genetic algorithms" Res Eng Design, 18:91-109, 2007, DOI 10.1007/s00163-007-0030-1
  2. McCormick, William T; Schweitzer, Paul J; and White, Thomas W, "Problem Decomposition and Data Reorganization by a Clustering Technique," Operations Research, 20(5), Sep. 1972, 993-1009

Algorithm [2] is sometimes used to improve the layout of the clustered representation.

Paper and Slides

  • H.M.N. Dilum Bandara, Mark Staples, and Sidra Malik. 2025. Designing for Shared Ledgers in Industry Ecosystems. Distrib. Ledger Technol. 0, 0, Article 0 (March 2025), https://doi.org/10.1145/3724410
  • Slides

How to Use

Configuration parameters

Set the following configuration values in config.ini (if unsure, start with the typical values from Yu et al. [1]):

  • alpha - Type I error weight
  • beta - Type II error weight
  • population_size - Initial population size
  • offspring_size - Number of offspring to generate
  • p_c - Crossover probability
  • p_m - Mutation probability
  • generation_limit - Number of generation cycles to try
  • generation_limit_without_improvement - Stop if this many consecutive generations show no improvement
  • cluster_can_have_read_only_elements - Can a square cluster have read-only elements/parties?
  • cluster_can_have_partial_bus - Can a bus have a subset of rows and columns filled?
  • cluster_can_have_partial_sink - Can a sink have only a subset of its column filled?
  • cluster_can_have_partial_source - Can a source have only a subset of its row filled?

Typical values from Yu et al. [1]:

  • Population size = 3000
  • Offspring size = 3000
  • Crossover probability p_c = 1/chromosome-length, 0.5, or 1
  • Mutation probability p_m = 1/chromosome-length
  • Generation limit = Not specified
  • Generation limit without improvement = 50
  • Alpha (Type I error weight) = 0.8116
  • Beta (Type II error weight) = 0.1102
  • 1 - Alpha - Beta = 0.0782

If the number of clusters (n_c) is not decided, a typical value could be n_n / 2, where n_n is the number of nodes in the matrix.
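As a rough illustration of how these values could be collected into a configuration file, the sketch below builds and reads back a config with Python's standard `configparser`. The section name `GA`, the function name `typical_config`, the `generation_limit` value, and the `false` settings for the four `cluster_can_have_*` flags are placeholders chosen for this example and are not taken from the repository; check the shipped config.ini for the actual layout.

```python
import configparser


def typical_config(chromosome_length: int) -> configparser.ConfigParser:
    """Build a config using the typical values listed above."""
    config = configparser.ConfigParser()
    config["GA"] = {  # hypothetical section name
        "alpha": "0.8116",
        "beta": "0.1102",
        "population_size": "3000",
        "offspring_size": "3000",
        "p_c": "0.5",
        "p_m": str(1 / chromosome_length),
        "generation_limit": "1000",  # placeholder, not specified by Yu et al.
        "generation_limit_without_improvement": "50",
        # Placeholder flag values for illustration only.
        "cluster_can_have_read_only_elements": "false",
        "cluster_can_have_partial_bus": "false",
        "cluster_can_have_partial_sink": "false",
        "cluster_can_have_partial_source": "false",
    }
    return config


config = typical_config(chromosome_length=20)
section = config["GA"]
alpha = section.getfloat("alpha")
beta = section.getfloat("beta")

# The two error weights must leave a positive remainder.
remaining = 1 - alpha - beta
assert remaining > 0, "alpha + beta must be less than 1"
print(f"alpha={alpha}, beta={beta}, remaining weight={remaining:.4f}")

# Write the file to disk with: config.write(open("config.ini", "w"))
```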

Input File Format

DSM and DMM files should be in CSV format. A DSM file should have the following format:

  • List of labels starting with a comma to indicate an empty cell, e.g., ,C,D,E,A,B
  • Keep diagonals empty
  • From row 2 onwards, the first value must be a label
  • Use 1, x, or X to indicate the presence of a relationship
  • Use 0, o, O, or empty cell to indicate the absence of a relationship
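The DSM rules above can be checked with a short parser. This is a minimal sketch using only the standard library; the function name `read_dsm` and the three-party sample are hypothetical and are not part of DLTClust.

```python
import csv
import io
from typing import List, Tuple

PRESENT = {"1", "x", "X"}     # markers for an existing relationship
ABSENT = {"0", "o", "O", ""}  # markers for no relationship


def read_dsm(text: str) -> Tuple[List[str], List[List[int]]]:
    """Parse DSM CSV text into (labels, binary matrix)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    # The header starts with an empty cell, so labels begin at index 1.
    labels = [cell.strip() for cell in rows[0][1:]]
    matrix = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row[1:]]
        # Pad rows whose trailing empty cells were dropped.
        cells += [""] * (len(labels) - len(cells))
        if len(cells) != len(labels):
            raise ValueError(f"Row {row[0]!r} has too many cells")
        values = []
        for cell in cells:
            if cell in PRESENT:
                values.append(1)
            elif cell in ABSENT:
                values.append(0)
            else:
                raise ValueError(f"Unexpected value {cell!r} in row {row[0]!r}")
        matrix.append(values)
    return labels, matrix


# Hypothetical sample mixing the accepted markers; diagonals are empty.
sample = ",C,D,E\nC,,x,0\nD,1,,O\nE,o,X,\n"
labels, matrix = read_dsm(sample)
print(labels)
print(matrix)
```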

A DMM file should have the following format:

  • Data must be in rows and parties must be in columns (this is an assumption made by the tool)
  • List of party labels starting with a comma to indicate an empty cell, e.g., ,P2,P1,P3,P5,P4
  • From row 2 onwards, the first value must be a data label
  • Use 1, x, or X to indicate the presence of a relationship
  • Use 0, o, O, or empty cell to indicate the absence of a relationship

Sample input DSM and DMM files are provided. The DSMs and DMMs from our paper can be found in the DSMs_DMMs_from_paper folder.

Command-line Arguments

Set the following command-line arguments (some are optional; if not provided, default values are used):

  • -b, --busses - Number of busses to generate (integer). Optional (default is 0)
  • -c, --clusters - Number of square clusters to generate (integer). Compulsory
  • -e, --test - Type of test statistics. Either DSM2Graph or Stat
  • -i, --input - Input matrix file to cluster (DSM or DMM). It should be a CSV file in the given format (see example). Optional (default is dsm.csv)
  • -m, --members - Find test statistics for given cluster membership
  • -o, --output - Output matrix file name. Optional (default is clusters.csv)
  • -p, --params - Config file with parameters. Optional (default is config.ini)
  • -u, --sources - Number of sources (aka writers) to generate (integer). Optional (default is 0)
  • -r, --seed - Random seed (integer). Optional (default is 123)
  • -s, --Sinks - Number of sinks (aka readers) to generate (integer). Optional (default is 0)
  • -t, --type - Type of matrix to cluster, i.e., DSM or DMM. Optional (default is DSM)
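The arguments and defaults listed above can be summarized as an `argparse` parser. This is a hypothetical reconstruction written from the list in this README, not the parser in the DLTClust source; the function name `build_parser`, the `dest="sinks"` choice, and the `choices` restrictions are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags and defaults (illustrative only)."""
    parser = argparse.ArgumentParser(prog="DLTClust")
    parser.add_argument("-b", "--busses", type=int, default=0)
    parser.add_argument("-c", "--clusters", type=int, required=True)
    parser.add_argument("-e", "--test", choices=["DSM2Graph", "Stat"])
    parser.add_argument("-i", "--input", default="dsm.csv")
    parser.add_argument("-m", "--members")
    parser.add_argument("-o", "--output", default="clusters.csv")
    parser.add_argument("-p", "--params", default="config.ini")
    parser.add_argument("-u", "--sources", type=int, default=0)
    parser.add_argument("-r", "--seed", type=int, default=123)
    # Flag spelled as documented; stored under a lowercase name (assumption).
    parser.add_argument("-s", "--Sinks", dest="sinks", type=int, default=0)
    parser.add_argument("-t", "--type", choices=["DSM", "DMM"], default="DSM")
    return parser


# Only -c is compulsory; everything else falls back to its default.
args = build_parser().parse_args(["-c", "4", "-b", "1", "-u", "1", "-s", "1"])
print(args)
```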

Examples

DSM Clustering

To cluster the sample DSM use the following command:

python3 DLTClust -c 2

Clustering results will be saved as a .csv file and square clusters (to see overlapping membership) will be saved to a .png file.

Use the following command to cluster one of the example DSMs from the paper to get up to 4 square clusters, a data bus, sink, and source while setting the random seed to 123:

python3 DLTClust -c 4 -b 1 -u 1 -s 1 -r 123 -i DLTClust/DSMs_DMMs_from_paper/DSM_single_party_instance_supply_chain.csv
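Because the clustering is based on a genetic algorithm, results may differ between random seeds, so it can be useful to repeat a run with several seeds and compare the outputs. The sketch below only builds the command lines, using the flags documented above; the function name `build_command`, the seed values other than 123, and the per-seed output file names are hypothetical.

```python
import shlex
from typing import List

INPUT_FILE = "DLTClust/DSMs_DMMs_from_paper/DSM_single_party_instance_supply_chain.csv"


def build_command(seed: int, clusters: int, input_file: str) -> List[str]:
    """Return the argument list for one DLTClust run with a given seed."""
    return [
        "python3", "DLTClust",
        "-c", str(clusters),
        "-r", str(seed),
        "-i", input_file,
        "-o", f"clusters_seed_{seed}.csv",  # hypothetical output name per seed
    ]


commands = [build_command(seed, 4, INPUT_FILE) for seed in (123, 456, 789)]
for command in commands:
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(command))
```

To execute the runs rather than print them, each list can be passed to `subprocess.run(command, check=True)` from the repository root.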

DMM Clustering

To cluster the sample DMM use the following command:

python3 DLTClust -c 2 -i DLTClust/dmm.csv -t DMM

Use the following command to cluster one of the example DMMs from the paper to get up to 5 clusters while setting the random seed to 999:

python3 DLTClust -c 5 -r 999 -i DLTClust/DSMs_DMMs_from_paper/DMM_BCDFI.csv -t DMM

Install

Tested on Python 3.13.1

pip install -r requirements.txt

Alternatively, you may create a Python environment and install the dependencies.

Unit Tests

python3 -m unittest discover DLTClust

This project uses pylint for linting and code quality control.

License

This software is released under the CSIRO Open Source Software License Agreement. Details can be found at LICENSE.
