Diblob is a package designed for computations on digraphs (pseudographs). The main assumption is to enable easy operations on JSON representations of digraphs. The package allows treating a subgraph as a node and facilitates the extraction of subgraphs for further work. It is based on basic Python structures and does not depend on external packages.
Package requires python with version >= 3.10.
- The package can be installed with pip using the command
pip install diblob. - For installing packages necessary only for testing, use the command
pip install -r requirements.txt.
The core of Diblob revolves around its data structure, managed by the GraphManager. Operations within Diblob are executed on Node, Edge, and Blob components.
Node representation in digraph. Components cosists of following fields:
Node representation in a digraph consists of the following fields:
node_id: The identifier for the node used by GraphManager.diblob_id: The identifier of the diblob where the node is located.incoming_nodes: A list of node identifiers (tails of the edges for which the node is the head).outgoing_nodes: A list of node identifiers (heads of the edges for which the node is the tail).
Both incoming and outgoing nodes can be redundant, as pseudographs are accommodated as well.
Representation of an edge in a digraph:
path: A list of node IDs that the edge connects.
Representation of a diblob within a digraph:
diblob_id: The identifier for the diblob, used by GraphManager.parent_id: The identifier of the parent diblob to this diblob.children: IDs of diblobs that are children of this diblob.nodes: diblob IDs contained within this diblob.
Diblobs sharing the same graph form a tree-based structure. Furthermore, the entire graph is treated as a diblob (root_id).
A digraph data structure can be created as follows:
from diblob import DigraphManager
digraph_dict = {"B0": {"A": ["B", "F"],
"B": ["C", "D", "E"],
"C": ["D"],
"D": ["E", "F", "G"],
"E": ["F", "A"],
"F": ["G", "B"],
"G": ["A", "D"]}}
digraph_manager = DigraphManager(digraph_dict)Note that if a digraph is stored in a JSON file, it can be loaded using json.load. To illustrate, let's create blobs B1 and B2 in the digraph, each containing specific nodes. Blob B1 will include nodes A and B, while B2 will encompass nodes C, D, and E:
from diblob import tools
digraph_manager.gather('B1', {'A', 'B'})
digraph_manager.gather('B2', {'C', 'D', 'E'})
tools.display_digraph_json(digraph_manager('B0'))The result is as follows (where display_digraph is a helper function used for printing a human-friendly output in Python):
{
"B0": {
"B1": {
"B": [{"B2": ["C", "D", "E"]}],
"A": ["B", {"B0": ["F"]}],
},
"B2": {
"C": ["D"],
"D": ["E", {"B0": ["F", "G"]}],
"E": [{"B0": ["F"]}, {"B1": ["A"]}],
},
"F": ["G", {"B1": ["B"]}],
"G": [{"B1": ["A"]}, {"B2": ["D"]}],
},
}Now, let's compress the created diblobs into nodes:
digraph_manager.compress_diblob('B1')
digraph_manager.compress_diblob('B2')
tools.display_digraph(digraph_manager('B0'))The result is as follows:
{
"B0": {
"F": ["G", "B1"],
"B2": ["F", "B1", "G", "F"],
"B1": ["F", "B2", "B2", "B2"],
"G": ["B1", "B2"],
},
}The Diblob package is composed of the following modules:
components: Utilized by the Graph Manager, this module includes node, edge, and diblob components.digraph_manager: Serves as the core of the proposed data structure.factory: Provides examples of using the graph_manager for creating more complex digraphs.algorithms: Contains examples of employing Diblob with basic digraph algorithms such as DFS, DFSA (a modified version of DFS), and the Dijkstra algorithm.exceptions: Defines exceptions that are utilized across the modules.tools: Offers utilities for convenient work with digraphs.
Within the entire package, methods prefixed with an underscore ('_') are intended for internal use by the GraphManager.
Components are utilized by the GraphManager for the creation of the data structure, which consists of instances of the Node, Edge, and Diblob classes.
Node Representation in a Digraph: Nodes in a digraph can have overlapping incoming and outgoing nodes, accommodating pseudographs as well.
node_id: The identifier of the node, used by GraphManager.diblob_id: The identifier of the diblob in which the node is located.incoming_nodes: A list of node IDs (tails of the edges for which this node is the head).outgoing_nodes: A list of node IDs (heads of the edges for which this node is the tail).
get_incoming_edges(self): Returns a set of edge IDs where the node is the head.get_outgoing_edges(self): Returns a set of edge IDs where the node is the tail.incoming_dim(self): Returns the number of incoming nodes.outgoing_dim(self): Returns the number of outgoing nodes._add_incoming(self, node_id: str): Adds a new node_id to incoming_nodes._add_outgoing(self, node_id: str): Adds a new node_id to outgoing_nodes._rm_incoming(self, node_id: str): Removes a node_id from incoming_nodes._rm_outgoing(self, node_id: str): Removes a node_id from outgoing_nodes.
Edge Representation in a Digraph: This representation allows for treating certain types of paths as a single edge, facilitating more efficient and simplified graph analyses.
path: A list of node IDs that constitute the sequence of nodes this edge traverses.
The path is maintained as a list, enabling the treatment of a chain of node IDs as a single edge.
This approach is particularly useful in operations like compress_edges, where multiple consecutive edges are consolidated into one for simplified graph representation.
get_tail_and_head(self): Returns the tail and head nodes of the edge, indicating the direction of traversal.get_id(self): Returns the edge_id, uniquely identifying the edge by its head and tail nodes._reverse(self): Reverses the order of nodes in the path field, effectively inverting the direction of the edge.
Diblob Representation in a Digraph:
diblob_id: The unique identifier for the diblob, utilized by GraphManager.parent_id: The identifier of the parent diblob, establishing a hierarchical relationship within the graph.children: A list of identifiers for diblobs that are children of this diblob, further defining the structure of the graph.nodes: A list of node IDs (or diblob IDs) that are contained within this diblob, representing the diblob's internal structure.
_add_children(self, *child_ids: tuple[str]): Adds one or more diblob_ids to the diblob, expanding its child elements._add_nodes(self, *node_ids: tuple[str]): Incorporates one or more node_ids into the diblob, enhancing its internal structure.
The Digraph Manager is at the heart of the package, overseeing the management of the data structure. It is crucial for the entire package's functionality. To create an instance of DigraphManager, a dictionary representation of the digraph is required, such as:
digraph_dict = {
"B0": {
"B1": {
"B": [{"B2": ["C", "D", "E"]}],
"A": ["B", {"B0": ["F"]}],
},
"B2": {
"C": ["D"],
"D": ["E", {"B0": ["F", "G"]}],
"E": [{"B0": ["F"]}, {"B1": ["A"]}],
},
"F": ["G", {"B1": ["B"]}],
"G": [{"B1": ["A"]}, {"B2": ["D"]}],
},
}
graph_manager = GraphManager(digraph_dict)As a result, the following digraph has been created:
diblobs: A dictionary where each key-value pair corresponds to diblob_id and its associated Diblob object, respectively. This includesB0,B1, andB2as shown in the diagram.nodes: A dictionary where each key-value pair corresponds to node_id and its associated Node object, respectively. This includes nodesA,B,C,D,E,F, andGas depicted in the diagram.edges: A dictionary where each key-value pair corresponds to node_id and a list of Edge objects, respectively. A list is used to accommodate multiple edges sharing the same head and tail, encapsulating edgesAB,AF,BC,BD,BE,CD,DE,DF,DG,FB,FG, andGAas illustrated in the diagram.root_diblob_id: The rootdiblob_idrepresenting the entire digraph. Even if the digraph contains no internal diblobs, the entire graph is treated as a single diblob (B0in the diagram)..
-
construct(self, diblob_id: str, graph_dict_representation: dict, gather_dict: dict, edges_to_connect: list[str])diblob_id: str: The ID of the diblob being considered (for recursion).graph_dict_representation: dict: The provided dictionary representation of the digraph.gather_dict: dictA dictionary used to accumulate the results of recursion.edges_to_connect: list[str]: A list of edge IDs to be connected during the digraph's construction.
Constructs a diblob using recursion. This helper function is utilized in the init method for diblob construction.
-
get_diblobs_common_ancestor(self, diblob_id1: str, diblob_id2: str)diblob_id1: str: The ID of the first diblob.diblob_id2: str: The ID of the second diblob.
Returns thediblob_idof the common ancestor of two diblobs, based on their IDs. Diblobs are structured as a tree.
-
get_diblob_descendants(self, diblob_id: str)diblob_id: str: The ID of the diblob whose descendants are being searched.
Returns a set of diblob_ids that are descendants in the subtree for which the diblob with the given diblob_id is the root.
-
get_diblob_edges(self, diblob_id: str)diblob_id: str: The ID of the diblob being considered.
Returns a set of all edge IDs, incoming edge IDs, outgoing edge IDs, and the set of diblob descendants for the specified diblob_id. This method also has side effects on the considered diblob.
-
is_diblob_ancestor(self, potential_ancestors: set, diblob_id: str)potential_ancestors: set: A set of IDs of potential ancestor diblobs.diblob_id: str: The ID of the diblob being considered.
Validates whether a diblob with the given diblob_id is an ancestor among the provided potential ancestors.
-
flatten(self, *diblob_ids: tuple[str])*diblob_ids: tuple[str]: IDs of the diblobs to be flattened.
Flattens the structure by removing diblobs with the provided IDs. Note that removing a diblob does not imply the deletion of its nodes; instead, nodes are transferred to the diblob's direct ancestor. The root diblob cannot be flattened.
-
gather(self, new_diblob_id: str, node_ids: set[str])new_diblob_id: str: The ID of the new diblob that will be created.node_ids: set[str]: The IDs of the nodes that will be included in the new diblob.
Accumulates nodes and existing diblobs into a new diblob specified by new_diblob_id.
compress_diblob(self, diblob_id: str)diblob_id: str: The ID of the Diblob that is to be compressed.
This method compresses the specified Diblob into a single node, effectively consolidating its structure for simplified representation:
join_diblobs(self, diblob_fst_id: str, diblob_snd_id: str, join_id: str)diblob_fst_id: str: The ID of the first Diblob that is to be joined.diblob_snd_id: str: The ID of the second Diblob that is to be compressed.join_id: str: The ID of the joined Diblob.
This method join two diblobs which have the sameparent_id:
merge_edges(self, edge_1: Edge, edge_2: Edge)edge_1: Edge: The first edge to be merged.edge_2: Edge: The second edge to be merged.
Merges two compatible edges into one, provided the head of edge_1 is the same as the tail of edge_2, and edge_2 has exactly one incoming and one outgoing edge.
get_multiple_edge_ids(self, *edge_ids : tuple[str])*edge_ids : tuple[str]: The edge IDs for which the operation will be performed.
Returns a list of edge IDs, including every occurrence of the specified edges.
remove_edges(self, *edges: tuple[Edge])*edges: tuple[Edge]: The edges to be removed.
Removes the specified edges from the digraph. This method operates directly on edge objects, not on edge IDs.
connect_nodes(self, *edge_ids: tuple[str])*edge_ids: tuple[str]: Pairs of node IDs for which edges will be created.
Creates edges between pairs of nodes.
remove_nodes(self, *nodes: tuple[Node])*nodes: tuple[Node]: The nodes to be removed.
Removes the specified nodes from the digraph. This method operates directly on node objects, not on node IDs.
add_nodes(self, *node_ids: tuple[str], diblob_id: str=None)*node_ids: tuple[str]: The node IDs to be added.diblob_id: str: The diblob ID where the nodes will be placed. If not provided, nodes are added to the root diblob.
Adds nodes to the digraph. Optionally, a diblob_id can be specified to place the nodes within a specific diblob; otherwise, they are added to the root.
compress_edges(self)- Compresses edges in the digraph by accumulating nodes where both the number of incoming and outgoing nodes is exactly one.
decompress_edges(self)Performs the reverse operation ofcompress_edges.
inject(self, digraph_manager: GraphManager, node_id: str)digraph_manager: DiraphManager: The digraph that will be injected.node_id: The ID of the node that will be replaced by the injected digraph.
Incorporates another GraphManager instance into the current digraph by replacing a specified node with the entire structure of the other digraph:
decouple_edges(self): Transforms a pseudograph into a digraph by decoupling edges.
reverse_edges(self, *edges: tuple[Edge])*edges: tuple[Edge]: The edges to be reversed.
Reverses the direction of the specified edges within the digraph. This operation is performed on the edge objects themselves, rather than their node IDs.sorted(self): Sorts all elements within the digraph structure, ensuring an orderly arrangement of nodes, edges, and possibly diblobs.
GraphManager has also magic methods which enables comfortable work on the structure:
__setitem__: Enables working with structure in new methods:
"""
Note that maintaining the structure is covered by other methods. __set_item__ is used just for methods implementation.
For instance digraph_manager[('A', 'B')] = AB not implies that nodes 'A' and 'B' are connected in structure.
Edge object is just registered by GraphManager.
If you want to create correct edge use connect_nodes method on structure without Edge.
"""
digraph_dict = {"B0": {}}
digraph_manager = DigraphManager(digraph_dict)
A = Node(node_id='A', diblob_id='B0', incoming_nodes=[], outgoing_nodes=[])
B = Node(node_id='B', diblob_id='B0', incoming_nodes=[], outgoing_nodes=[])
B1 = Diblob(children={}, diblob_id='B1', nodes=['A', 'B'], parent_id='B0')
AB = Edge(['A', 'B'])
digraph_manager['A'] = A
digraph_manager['B'] = B
digraph_manager['B1'] = B1
digraph_manager[('A', 'B')] = AB__get_item__: Enables retrieval of an object using its registered ID. This method simplifies accessing nodes, edges, or diblobs directly by their identifiers.__contains__: Checks if a specific ID is registered within the graph. This can be particularly useful for verifying the presence of a node, edge, or diblob without directly accessing the item.__call__: Facilitates operations such as diblob extraction, making the GraphManager more versatile. For instance, the following code snippet demonstrates how to use this method:
digraph_dict = {
"B0": {
"B2": {
"E": [{"B0": ["F"]}, {"B1": ["A"]}],
"C": ["D"],
"D": ["E", {"B0": ["F", "G"]}],
},
"G": [{"B1": ["A"]}, {"B2": ["D"]}],
"F": ["G", {"B1": ["B"]}],
"B1": {
"B": [{"B2": ["C", "D", "E"]}],
"A": ["B", {"B0": ["F"]}],
},
},
}
digraph_manager = DigraphManager(digraph_dict)
tools.display_digraph(digraph_manager('B1'))returns
{
"B1": {
"B": [{"B2": ["C", "D", "E"]}],
"A": ["B", {"B0": ["F"]}],
},
}Please note that outgoing edges are preserved. To remove them, utilize the cut_outgoing_edges function available in the tools module.
The Factory is designed to facilitate the creation of various types of digraphs from a given input digraph. It operates independently from the GraphManager to ensure separation of concerns, as the GraphManager is dedicated to managing its internal graph structure.
Edge digraphs and Bipartite digraphs can only be created from digraphs that consist solely of root diblobs.
generate_edge_digraph(digraph_manager: DigraphManager, reduce_value: int = 0, delimiter: str = '|')digraph_manager: DigraphManager: The DigraphManager instance that serves as the foundation for constructing the edge digraph.reduce_value: int: Specifies the reduction value used as a numerical delimiter.delimiter: str: The delimiter employed for creating node IDs derived from edges.
Enables the creation of an edge digraph, where edges are transformed into nodes:
In the example default delimiter and reduce_value was used. Delimiter add separator between node_ids during node_id creation in edge graph, reduce value enable cutting delimiter (for example if we use generate_edge_digraph second time in the graph on the right in the picture, we get for instance the node with node_id = "A|C|C|B" with default reduce value = 0, but "A|C|B" if reduce_value = 1 is set).
generate_bipartite_digraph(digraph_manager: DigraphManager)digraph_manager: DigraphManager- DigraphManager which is the base for bipartite digraph contruction.
enable bipartite digraph creation:
generate_flow_digraph(digraph_manager: DigraphManager)digraph_manager: DigraphManager- DigraphManager which is the base for flow digraph contruction.
enable flow digraph creation (commonly used in control flow algorithm like the Ford-Fulkerson algorithm):
In order to working with diblob explanation, DFS, DFSA and Dijkstra algorithms are created.
- DFS - Deep first search alghoritm.
- DFSA - modification of the DFS (with the nodes visit time)
- DijkstraAlgorithm - the shortes paths between node and the others.
Algorithm use duck typing. GraphManager instance have to be delivered during creation. Then run methon can be used on selected node_id (in DijkstraAlgorithm cost_function as a dict can be also delivered)
Example of run:
from diblob import DigraphManager
from diblob.algorithms import DFS, DFSA, DijkstraAlgorithm
digraph_dict = {"B0": {"A": ["B", "F"],
"B": ["C", "D", "E"],
"C": ["D"],
"D": ["E", "F", "G"],
"E": ["F", "A"],
"F": ["G", "B"],
"G": ["A", "D"]}}
digraph_manager = DigraphManager(digraph_dict)
dfs = DFS(digraph_manager)
dfsa = DFSA(digraph_manager)
da = DijkstraAlgorithm(digraph_manager)
dfs.run('A')
dfsa.run('B')
da.run('C')Tools for diblob which are used for user friendly printing or cutting nodes in json. Tools deliver following functions:
display_digraph(d: dict, indent=0): Enables printing dict in json format (work as print).cut_outgoing_edges(digraph_manager, diblob_id: str): Cuts outgoing nodes of the Digraph (can be used with GraphManager__call__)sort_outgoing_nodes_in_dict_repr(node_lst: list): Sorts outgoing nodes.
Example:['C', 'B', 'A', {'G1': ['A', 'C', 'B']}] -> ['A', 'B', 'C', {'G1': ['A', 'B', 'C']}]list_groupby: Works like groupby from itertools (no need data to be sorted).
Random digraphs can be generated using generators module. Digraphs which can be generated:
- random cycle
- random directed acyclic graph
- random strongly connected graph (as a sum of cycles)
- random graph as a combination of DAG and the others (injection)
example of usage can be found in notebooks: https://github.com/JakubZelek/Diblob/tree/main/notebooks