# Matcher

In [35]:
#| default_exp matcher

In [36]:
#| hide
from nbdev.showdoc import show_doc
%load_ext autoreload
%autoreload 2



### Overview
In the previous modules, the LHS Parser received an LHS string describing some graph pattern and parsed it into an equivalent NetworkX DiGraph. In this module, we search for **matches** to this pattern in our input graph. That is, we find all subgraphs of the input graph that have the same structure as the pattern in terms of nodes, the edges connecting them, and the attributes they all have.

Each match is essentially a mapping from the pattern nodes to a subset of the input graph nodes (the matched edges and attributes can be deduced from it).

The **Matcher** in this module does the following:
* **Searches for matches** in the input graph, according to some LHS pattern graph.
* **Filters matches** based on an explicit boolean function and/or on constraints that the parser defers to the matcher.
* **Constructs a list of Match objects**, each corresponding to one of the filtered matches we found. The list will be used in later modules, and will also allow users to view the matches and to use them imperatively.

The final, filtered list of Match objects is returned from this module's main function, **find_matches**.

### Requirements

In [37]:
#| export
from typing import *
from networkx import DiGraph
from networkx.algorithms import isomorphism # check subgraph's isom.
from graph_rewrite.core import NodeName, _create_graph, draw
from graph_rewrite.lhs import lhs_to_graph
from graph_rewrite.match_class import Match, mapping_to_match ,draw_match
from itertools import product
from typing import Tuple, Iterator
from collections import defaultdict

### Find Matches
Given an input graph and a pattern graph, we want to find all mappings from pattern nodes to input-graph nodes that form a match, and to construct a corresponding Match object for each of them.

#### Checking for Attribute Existence and Constant Values
A crucial aspect of finding matches is comparing the attributes of nodes to determine whether they are compatible. Nodes that match based on either the existence of specific attributes or constant attribute values are considered candidates. These candidates are then further refined to identify all valid matches for the given pattern.

In [38]:
#| export
# TODO: after solving all bugs in notebook 1 switch to this code and delete the code that comes afterwards
'''
    def _attributes_match(pattern_attrs: dict, input_attrs: dict) -> bool:
    """
    Check if the input attributes match the pattern attributes.

    This function supports both:
    - Existence checks (ensures that required attributes exist).
    - Constant value checks (ensures that constant values match).

    Args:
        pattern_attrs (dict): The pattern attributes.
        input_attrs (dict): The input attributes.

    Returns:
        bool: True if the input attributes match the pattern attributes, False otherwise.
    """
    for attr_name in pattern_attrs:
        (attr_type,attr_value) = pattern_attrs[attr_name]

        if attr_name not in input_attrs:  # If the attribute does not exist, return False
            return False

        if attr_type is None and attr_value is None:  # If the user only asked to check for existence, continue
            continue

        # TODO: check this part and see there are no missing cases
        if attr_value is not None and input_attrs[attr_name] != attr_value:
            return False
        elif attr_type is not None and not isinstance(input_attrs[attr_name], attr_type): 
            return False

    return True
    '''

def _attributes_match(pattern_attrs: dict, input_attrs: dict) -> bool:
    """
    Check if the input attributes match the pattern attributes.

    This function supports both:
    - Existence checks (ensures that required attributes exist).
    - Constant value checks (ensures that constant values match).

    Args:
        pattern_attrs (dict): The pattern attributes.
        input_attrs (dict): The input attributes.

    Returns:
        bool: True if the input attributes match the pattern attributes, False otherwise.
    """
    

    for attr_name in pattern_attrs:
        if attr_name not in input_attrs:  # If the attribute does not exist, return False
            return False
        
        pattern_attr = pattern_attrs[attr_name]
        # Existence-only requirement: the attribute is present, which is all that was asked for.
        # TODO: the None case bypasses a quirk in lhs_to_graph, which does not reach the edge-attributes part; fix later.
        if pattern_attr is None or pattern_attr == (None, None):
            continue

        (attr_type, attr_value) = pattern_attr

        # Constant value check (skipped when no value was specified)
        if attr_value is not None and input_attrs[attr_name] != attr_value:
            return False

        # Type check (skipped when no type was specified)
        if attr_type is not None and not isinstance(input_attrs[attr_name], attr_type):
            return False

    return True
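
To make the two kinds of checks concrete, the following sketch re-implements the same idea as a standalone function and runs it on a few hand-written dictionaries. The name `attrs_match_sketch`, the `PatternAttrs` alias and the sample data are illustrative assumptions and are not part of the library; the `(attr_type, attr_value)` layout follows the tuple unpacking in the function above, and the use of `str` as the type is a guess about what the parser might produce.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical stand-in for the pattern attribute layout: name -> (type, value).
# None or (None, None) means "only check that the attribute exists".
PatternAttrs = Dict[str, Optional[Tuple[Optional[type], Any]]]

def attrs_match_sketch(pattern_attrs: PatternAttrs, input_attrs: Dict[str, Any]) -> bool:
    for name, spec in pattern_attrs.items():
        if name not in input_attrs:
            return False            # existence check failed
        if spec is None or spec == (None, None):
            continue                # existence-only requirement
        attr_type, attr_value = spec
        if attr_value is not None and input_attrs[name] != attr_value:
            return False            # constant value mismatch
        if attr_type is not None and not isinstance(input_attrs[name], attr_type):
            return False            # type mismatch
    return True

student = {'type': 'student', 'faculty': 'Biology'}
exists_only = attrs_match_sketch({'faculty': (None, None)}, student)
constant_ok = attrs_match_sketch({'type': (str, 'student')}, student)
constant_bad = attrs_match_sketch({'type': (str, 'course')}, student)
missing = attrs_match_sketch({'units': (None, None)}, student)
print(exists_only, constant_ok, constant_bad, missing)
```

The first two calls succeed (the attribute exists, and the constant value agrees), while the last two fail on a value mismatch and on a missing attribute respectively.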

#### Narrow Down Search Space
Using the functions presented thus far, the search for matches might take a long time if the graph has a large number of nodes or edges. Nodes that are not real candidates to match any pattern node (because they do not share attributes with any pattern node) are checked anyway, which is extremely inefficient.

Therefore, before we search for matches in our input graph, we reduce it to only the nodes that might match some pattern node (along with their connected edges). This can improve overall matching performance. The following function does exactly that:

In [39]:
#| export
def _find_input_nodes_candidates(pattern_node: NodeName, pattern: DiGraph, input_graph: DiGraph) -> set[NodeName]:
    """
    Given a pattern node and an input graph, return the set of input graph nodes that contain the required
    attributes of the pattern node, including constant value checks (if a value is specified) and existence
    checks (if no value is specified).

    If the pattern node has an "_id" attribute, only the input node with that id is checked.

    Args:
        pattern_node (NodeName): The pattern node.
        pattern (DiGraph): The pattern graph.
        input_graph (DiGraph): The input graph.

    Returns:
        set[NodeName]: A set of input graph nodes that match the required attributes.
    """
    pattern_attrs = pattern.nodes[pattern_node] # {attr_name: (attr_type,attr_value)}

    #check if _id is in the pattern_node_attrs, if so, we will only check the node with the same _id
    if "_id" in pattern_attrs:
        input_node_id = pattern_attrs.pop("_id")[1]
        input_nodes_to_check = [input_node_id]
    else:
        input_nodes_to_check = list(input_graph.nodes)

    # Filter nodes by attributes first
    candidate_nodes = {
        input_node
        for input_node in input_nodes_to_check
        if _attributes_match(pattern_attrs, input_graph.nodes[input_node])
    }

    return candidate_nodes
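
As a rough illustration of candidate narrowing, the sketch below filters a plain dictionary of nodes by required attributes. It is a simplified stand-in under stated assumptions: `candidates_sketch` and the toy `input_nodes` dictionary are hypothetical, the graph is modelled as `name -> attributes` instead of a `DiGraph`, and a required value of `None` is taken to mean "any value".

```python
from typing import Any, Dict, Set

# Hypothetical toy data: node name -> attribute dict (a stand-in for DiGraph.nodes).
input_nodes: Dict[str, Dict[str, Any]] = {
    'John': {'type': 'student'},
    'Algo': {'type': 'course', 'units': 3},
    'DB': {'type': 'course'},
    'X': {},
}

def candidates_sketch(required: Dict[str, Any], nodes: Dict[str, Dict[str, Any]]) -> Set[str]:
    """Keep nodes that have every required attribute; a value of None means 'any value'."""
    return {
        name for name, attrs in nodes.items()
        if all(key in attrs and (value is None or attrs[key] == value)
               for key, value in required.items())
    }

courses = candidates_sketch({'type': 'course'}, input_nodes)   # constant value check
with_units = candidates_sketch({'units': None}, input_nodes)   # existence check only
print(sorted(courses), sorted(with_units))
```

Nodes such as `'X'`, which share no attribute with the pattern node, never enter a candidate set and therefore never take part in the later, more expensive steps.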

In [40]:
#| export
# Helper function to check if an input edge matches the pattern edge
def _is_valid_edge_candidate(input_graph: DiGraph, pattern_edge_attrs: dict, 
                   src_candidate: NodeName, dst_candidate: NodeName, 
                   src_pattern_node: NodeName, dst_pattern_node: NodeName) -> bool:
    """
    Check if an edge between two input nodes matches the pattern edge and attributes.

    Args:
        input_graph (DiGraph): The input graph.
        pattern_edge_attrs (dict): Attributes of the pattern edge to match.
        src_candidate (NodeName): The source candidate node in the input graph.
        dst_candidate (NodeName): The destination candidate node in the input graph.
        src_pattern_node (NodeName): The source pattern node.
        dst_pattern_node (NodeName): The destination pattern node.

    Returns:
        bool: True if the input edge is valid, False otherwise.
    """
    if (src_candidate, dst_candidate) not in input_graph.edges:
        return False

    input_edge_attrs = input_graph.get_edge_data(src_candidate, dst_candidate, default={})
    if not _attributes_match(pattern_edge_attrs, input_edge_attrs):
        return False

    # Special case - self loops in pattern graph: If the source and destination are the same node in the pattern,
    # ensure the source and destination candidates are also the same in the input graph
    if src_pattern_node == dst_pattern_node and src_candidate != dst_candidate:
        return False

    return True

#### Find Pattern Based Matches
NetworkX provides an out-of-the-box isomorphism matcher, which compares the structure of two graphs and tells whether they are isomorphic (have the same node and edge structure). We use this isomorphism matcher to validate structural matches, after filtering out candidates that do not match based on the existence of attributes or on constant attribute values:

In [41]:
#| export
def _find_pattern_based_matches(graph: DiGraph, pattern: DiGraph) -> Iterator[Tuple[DiGraph, Dict[NodeName, NodeName]]]:
    """
    Find all subgraphs in the input graph that match the given pattern graph based on both structure (nodes and edges)
    and attributes (existence of attributes or constant value checks).
    
    Args:
        graph (DiGraph): The graph to search for matches.
        pattern (DiGraph): The pattern graph representing the structure and attributes to match.
    
    Yields:
        Iterator[Tuple[DiGraph, Dict[NodeName, NodeName]]]: Tuples of (subgraph, mapping),
        where subgraph is the matched subgraph, and mapping is a dictionary mapping nodes in the
        pattern to their assigned nodes in the input graph.
    """
    map_pattern_nodes_to_candidates = {pattern_node: _find_input_nodes_candidates(pattern_node, pattern, graph)
                                       for pattern_node in pattern.nodes}
    
    pattern_nodes = list(map_pattern_nodes_to_candidates.keys())  # List of pattern nodes
    candidate_sets = list(map_pattern_nodes_to_candidates.values())  # List of candidate sets, matches the indexes of pattern_nodes
    
    # All possible assignments of input nodes to pattern nodes, where each pattern node is assigned one of its candidates,
    # i.e. an input node that matches the pattern node's attributes (existence checks and constant value checks)
    assignments = [{pattern_node: input_node for pattern_node, input_node in zip(pattern_nodes, assignment)}
                   for assignment in product(*candidate_sets)]
    
    # Filter assignments that map the same input node to multiple pattern nodes or have mismatching edges
    valid_assignments = []
    for assignment in assignments:
        # Check if any input node is mapped to multiple pattern nodes
        input_node_counts = defaultdict(int) # automatically initializes to 0
        for input_node in assignment.values():
            input_node_counts[input_node] += 1
            if input_node_counts[input_node] > 1:
                break  # Skip assignment if duplicate nodes found
        else:
            # Check if edges between input nodes match the edges in the pattern 
            # (existence checks, and attributes check based on attribute existence and constant values)
            for (src_pattern_node, dst_pattern_node) in pattern.edges:
                src_input_node = assignment[src_pattern_node]
                dst_input_node = assignment[dst_pattern_node]
                pattern_edge_attrs = pattern.edges[src_pattern_node, dst_pattern_node]
                if not _is_valid_edge_candidate(graph, pattern_edge_attrs,
                                                src_input_node, dst_input_node, src_pattern_node, dst_pattern_node):
                    break
            else:
                valid_assignments.append(assignment)

    # We take the valid assignments and create a subgraph from them, using the matching nodes and edges from the input graph
    for assignment in valid_assignments:
        subgraph = DiGraph()
        subgraph.add_nodes_from(set(assignment.values()))
        for src_pattern_node, dst_pattern_node in pattern.edges:
            src_input_node = assignment[src_pattern_node]
            dst_input_node = assignment[dst_pattern_node]
            subgraph.add_edge(src_input_node, dst_input_node)

        # Validate subgraph for isomorphism
        if isomorphism.is_isomorphic(subgraph, pattern, node_match=_attributes_match, edge_match=_attributes_match):
            yield subgraph, assignment
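
The enumeration step can be sketched without NetworkX. The snippet below is a simplified illustration and not the function above: `assignments_sketch` is a hypothetical name, edges are plain tuples without attributes, and the final isomorphism validation is left out. It takes the Cartesian product of the candidate sets, drops assignments that reuse an input node, and keeps those whose pattern edges all exist in the input.

```python
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

def assignments_sketch(candidates: Dict[str, Set[str]],
                       pattern_edges: List[Tuple[str, str]],
                       input_edges: Set[Tuple[str, str]]) -> Iterator[Dict[str, str]]:
    pattern_nodes = list(candidates)
    # sorted() only makes the enumeration order deterministic
    for combo in product(*(sorted(candidates[p]) for p in pattern_nodes)):
        if len(set(combo)) != len(combo):
            continue  # one input node was assigned to two pattern nodes
        assignment = dict(zip(pattern_nodes, combo))
        if all((assignment[s], assignment[d]) in input_edges for s, d in pattern_edges):
            yield assignment

# Toy version of the basic test graph used later: edges A->B, A->C, A->A, C->C.
input_edges = {('A', 'B'), ('A', 'C'), ('A', 'A'), ('C', 'C')}
all_nodes = {'A', 'B', 'C', 'D'}
# Pattern '1->1, 1->2' with no attribute constraints: every node is a candidate.
found = list(assignments_sketch({'1': all_nodes, '2': all_nodes},
                                [('1', '1'), ('1', '2')], input_edges))
print(found)
```

Unlike the function above, which first materializes every assignment in a list, this sketch filters lazily while iterating over the product; that is a design choice of the sketch, not a claim about the library.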


#### Filtering Matches
The only thing we have ignored up until now is attribute values. As mentioned above, the LHS Parser does not include required attribute values in the pattern graph. Instead, it constructs a boolean function that receives a Match object and checks whether the match it represents has the required attribute values (if there are any).

This boolean function can be further extended by the user of the library, who can pass a function of the same format as a parameter to filter the Match objects by any condition they wish to apply. In addition to the pattern graph, the LHS Parser provides the extended filtering function, which combines the user and parser constraints that the matcher has not handled so far.

Later in this module, we will use the extended function to filter the list of Match objects we get from the structural and attribute-existence-based matchers. The signature of that function will be as follows:

In [42]:
#| export
FilterFunc = Callable[[Match], bool]
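
One way to picture the "extended" filter is as the conjunction of several functions with this signature. The sketch below is only an assumption about how such a combination could look: `combine_filters`, `MatchLike` and the two sample conditions are hypothetical, and a plain dictionary stands in for a `Match` object.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a Match: a plain dict of matched attribute values.
MatchLike = Dict[str, Any]
FilterLike = Callable[[MatchLike], bool]

def combine_filters(*filters: FilterLike) -> FilterLike:
    """Return a filter that accepts a match only if every given filter accepts it."""
    return lambda match: all(f(match) for f in filters)

parser_condition: FilterLike = lambda m: m['type'] == 'student'   # e.g. derived from the LHS
user_condition: FilterLike = lambda m: m['sem'] < 5               # supplied by the caller

extended = combine_filters(parser_condition, user_condition)
kept = [m for m in [{'type': 'student', 'sem': 3},
                    {'type': 'student', 'sem': 7},
                    {'type': 'course', 'sem': 1}] if extended(m)]
print(kept)
```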

#### Putting It All Together
Given our ability to find matches (both structural and in terms of attribute existence) between two graphs, as well as filtering matches according to desired conditions and constraints, we can finally find complete matches of the pattern in our input graph. 

We define one last auxiliary function, which removes duplicated matches based on their mappings:

In [43]:
#| export
def _filter_duplicated_matches(matches: list[Match]) -> Iterator[Match]:
    """Remove duplicates from a list of Matches, based on their mappings. Return an iterator of the matches without duplications.

    Args:
        matches (list[Match]): list of Match objects

    Yields:
        Iterator[Match]: Iterator of the matches without duplications.
    """

    # We can't use a set directly because Match objects are not hashable. 
    # This is why we use a list of the matches' mappings to check for duplicates.
    mappings = []
    for match in matches:
        if match.mapping not in mappings:
            mappings.append(match.mapping)
            yield match
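
The same deduplication idea can be tried on bare mappings. In this minimal sketch, `unique_mappings` is a hypothetical helper that works on dictionaries directly, not on `Match` objects; it relies on dictionary equality because dictionaries cannot be placed in a set.

```python
from typing import Dict, Iterator, List, Set

def unique_mappings(mappings: List[Dict[str, Set[str]]]) -> Iterator[Dict[str, Set[str]]]:
    seen: List[Dict[str, Set[str]]] = []
    for mapping in mappings:
        if mapping not in seen:   # dict equality, since dicts are not hashable
            seen.append(mapping)
            yield mapping

raw = [{'s': {'Lucy'}}, {'s': {'Amy'}}, {'s': {'Lucy'}}]
deduped = list(unique_mappings(raw))
print(deduped)
```

The list membership test makes this quadratic in the number of mappings, which is usually acceptable for short match lists but worth keeping in mind for large ones.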

In [44]:
#| export
def _find_intersecting_pattern_nodes(single_nodes_pattern: DiGraph, collection_pattern: DiGraph) -> list:
    """
    Find the intersecting pattern nodes between the single nodes match pattern and the collection pattern.

    The intersecting pattern nodes are those that appear in both the single nodes match pattern 
    (i.e., pattern nodes that aim to match a single, unique input node) and the collection pattern 
    (i.e., pattern nodes that aim to match multiple input nodes).

    Args:
        single_nodes_pattern (DiGraph): The pattern graph representing nodes that match exactly one input node.
        collection_pattern (DiGraph): The pattern graph representing nodes that match multiple input nodes.

    Returns:
        list: A list of pattern nodes that are present in both the single nodes match pattern and the collection pattern.
    """

    single_nodes_pattern_nodes = set(single_nodes_pattern.nodes)
    collection_pattern_nodes = set(collection_pattern.nodes)
    
    return list(single_nodes_pattern_nodes.intersection(collection_pattern_nodes))

In [45]:
#| export
def _add_collections_to_single_nodes_matches(input_graph: DiGraph, single_match_pattern: DiGraph ,collection_pattern: DiGraph, 
                                      single_nodes_matches: List[Dict[NodeName, Set[NodeName]]], intersecting_pattern_nodes: List[NodeName],
                                      condition: FilterFunc = lambda match : True, filter: bool = True, warn_on_collisions: bool = True
                                      )-> list[Match]:
    """
    Add collection matches to the existing single nodes matches by finding subgraph matches for collection pattern nodes
    and merging them with the given single nodes match mapping.

    This function finds matches in the input graph that satisfy both the single nodes match pattern (pattern nodes 
    that aim to match exactly one input node) and the collection pattern (pattern nodes that aim to match 
    multiple input nodes).

    Args:
        input_graph (DiGraph): The input graph.
        single_match_pattern (DiGraph): The pattern graph representing nodes that match exactly one input node.
        collection_pattern (DiGraph): The pattern graph representing nodes that match multiple input nodes.
        single_nodes_matches (List[Dict[NodeName, Set[NodeName]]]): A list of mappings for single nodes matches, in set semantics.
        intersecting_pattern_nodes (List[NodeName]): A list of pattern nodes that are present in both the single nodes match pattern and the collection pattern.
        condition (FilterFunc, optional): A function that filters matches based on additional conditions. Defaults to lambda match : True.
        filter (bool, optional): Whether to apply the condition to the collection matches. Defaults to True.
        warn_on_collisions (bool, optional): Whether to warn on collisions. Defaults to True.

    Returns:
        list[Match]: The matches built from the single nodes mappings enriched with the collection matches, without duplicates.
    """

    input_graph_copy = input_graph.copy()

    # Enrich the single nodes match mapping with the corresponding collection matches.
    for mapping in list(single_nodes_matches):  # We need to iterate over a copy of the list to avoid modifying it while iterating
        # Lock intersecting pattern nodes to their corresponding input node in the single nodes match
        collection_pattern_copy = collection_pattern.copy()
        for intersecting_pattern_node in intersecting_pattern_nodes:
            intersecting_input_node = list(mapping[intersecting_pattern_node])[0]  # When we get here, we know that the single nodes match has only one node for each intersecting pattern node - we can take the first element
            collection_pattern_copy.nodes[intersecting_pattern_node]['_id'] = (None,intersecting_input_node)  # Lock the intersecting pattern node to the corresponding input node

        # Find collection matches using the locked pattern
        for _, collection_mapping in _find_pattern_based_matches(input_graph_copy, collection_pattern_copy):
            # Filter matches based on the condition
            match = mapping_to_match(input_graph, single_match_pattern, collection_pattern, collection_mapping, warn_on_collisions)
            if (filter and condition(match)) or not filter:
                # Add the collection nodes mapping (not including the intersecting pattern nodes) to the single nodes match mapping
                for collection_pattern_node, matched_input_nodes in collection_mapping.items():
                    if collection_pattern_node not in intersecting_pattern_nodes:
                        mapping.setdefault(collection_pattern_node, set()).add(matched_input_nodes)

    # Now all mappings are enriched with the collection matches, we can convert the mappings to matches, and filter out duplicates
    matches = [mapping_to_match(input_graph, single_match_pattern, collection_pattern, mapping, warn_on_collisions) for mapping in single_nodes_matches]
    return list(_filter_duplicated_matches(matches))
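
The merging step at the heart of this function can be illustrated on plain dictionaries. The sketch below is a simplified assumption of that step only: `merge_collection_sketch` is a hypothetical name, the collection mappings are given by hand and not searched for, and no condition or `Match` conversion is involved.

```python
from typing import Dict, List, Set

def merge_collection_sketch(single: Dict[str, Set[str]],
                            collection_mappings: List[Dict[str, str]],
                            locked: List[str]) -> Dict[str, Set[str]]:
    # Work on a copy so the caller's mapping is left untouched
    merged = {node: set(matched) for node, matched in single.items()}
    for collection_mapping in collection_mappings:
        for pattern_node, input_node in collection_mapping.items():
            if pattern_node not in locked:   # locked (intersecting) nodes keep their single match
                merged.setdefault(pattern_node, set()).add(input_node)
    return merged

# 'c' is locked to 'Algo'; both students who took it are collected under 's'.
merged = merge_collection_sketch({'c': {'Algo'}},
                                 [{'c': 'Algo', 's': 'Amy'}, {'c': 'Algo', 's': 'Lucy'}],
                                 locked=['c'])
print(merged)
```

Note that the function above updates each mapping in place, whereas this sketch returns a new dictionary to keep the example free of side effects.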

We are now combining everything we saw in order to find the matches of a pattern in our input graph. The matches are returned as a (filtered) list of Match objects:

In [46]:
#| export
def find_matches(input_graph: DiGraph, single_match_pattern: DiGraph, collections_pattern: DiGraph = None, 
                 condition: FilterFunc = lambda match : True, filter: bool = True, warn_on_collisions: bool = True
                 ) -> Iterator[Match]:
    """
    Find all matches of a pattern graph in an input graph, satisfying a certain condition.

    This function identifies subgraphs of the input graph that match the single match pattern 
    and the collections pattern (if provided) based on structure, attributes, and additional conditions 
    specified by the user.

    Args:
        input_graph (DiGraph): The input graph.
        single_match_pattern (DiGraph): The pattern graph representing nodes that match exactly one input node.
        collections_pattern (DiGraph, optional): The pattern graph representing nodes that match multiple input nodes. Defaults to None.
        condition (FilterFunc, optional): A function that filters matches based on additional conditions. Defaults to lambda match : True.
        filter (bool, optional): Whether to apply the condition when filtering matches. Defaults to True.
        warn_on_collisions (bool, optional): Whether to warn on collisions. Defaults to True.

    Yields:
        Iterator[Match]: Iterator of Match objects, each representing a match of the pattern in the input graph.
    """
    # Find all single nodes matches (single node mapping) based on structure and attributes.
    # We already enforce set semantics for the mapping, and we don't need to check for duplicates - _find_pattern_based_matches returns unique mappings.
    single_node_mappings = [{pattern_node: {input_node} for pattern_node, input_node in mapping.items()} 
                     for _, mapping in _find_pattern_based_matches(input_graph, single_match_pattern)]
    
    # Tuple of (mapping, match) for each single node mapping
    single_node_mapping_with_matches = [(mapping,(mapping_to_match(input_graph, single_match_pattern, collections_pattern, mapping, warn_on_collisions)))
        for mapping in single_node_mappings]
    
    # Tuple of (mapping, match) for each single node match that satisfies the condition
    filtered_single_node_mapping_with_matches = [(mapping,match) for mapping,match in single_node_mapping_with_matches if (filter and condition(match)) or not filter]

    # If a collections pattern is not None, enrich single matches by adding matching collections.
    if collections_pattern:
        intersecting_pattern_nodes = _find_intersecting_pattern_nodes(single_match_pattern, collections_pattern)
        # all single node mappings after filtering based on the condition
        filtered_single_node_mappings = [mapping for mapping,_ in filtered_single_node_mapping_with_matches]
        # Add collections to single nodes matches
        filtered_matches = _add_collections_to_single_nodes_matches(input_graph, single_match_pattern, collections_pattern, 
                                                                             filtered_single_node_mappings, intersecting_pattern_nodes, 
                                                                             condition, filter, warn_on_collisions)
    else:
        # If a collections pattern is None, the matches are the same as the filtered single node matches
        filtered_matches = [match for _, match in filtered_single_node_mapping_with_matches]

    # Moved it to both single and collection matches to reduce steps
    # matches = [(mapping_to_match(input_graph, single_match_pattern, collections_pattern, mapping, warn_on_collisions))
    #     for mapping in mappings]        
    # filtered_matches = [match for match in matches if condition(match)]

    # remove anonymous nodes and edges - I had to split this from the previous loop because it was causing a bug (it inserted none matches)
    for match in filtered_matches:
        match.remove_anonymous_nodes_and_edges()

    # Remove any duplicate matches
    yield from _filter_duplicated_matches(filtered_matches)

### Tests

#### Test Utils

In [47]:
def _assert_match(input_graph: DiGraph, LHS: str, expected: list[dict], condition=lambda x: True, plot=True):
    """Match the pattern in the input graph, and validate that the list of matches
    is equal to the expected list of matches. Also allows plotting every match instance.

    Args:
        input_graph (DiGraph): The input graph where matches are searched.
        LHS (str): The pattern string to be converted into a pattern graph.
        expected (list[dict]): The expected matches (as mappings from pattern nodes to input graph nodes).
        condition (callable, optional): A function that receives a Match object and checks if a condition holds. Defaults to a function that always returns True.
        plot (bool, optional): If True, plots every match instance on the input graph. Defaults to True.
    """
    pattern, collection_pattern = lhs_to_graph(LHS)
    matches = list(find_matches(input_graph, pattern, collection_pattern, condition))
        
    # Ensure that the number of matches and their mappings are as expected
    assert all([match.mapping in expected for match in matches]) and len(matches) == len(expected)
    
    # STAV: This is the original code, but for debugging I want to see all the matches, so I commented it out
    # Plot the first match if requested and there are any matches
    # if plot and len(matches) > 0:
    #     match = matches[0]
    #     mapping = match.mapping
         
    #     print(f"Plotting the match: {mapping}")
    #     draw_match(input_graph, match)

    # Plot all matches
    if plot and len(matches) > 0:
        number_of_matches = len(matches)
        print(f"Plotting {number_of_matches} matches:")
        for match in matches:
            mapping = match.mapping
            print(f"Plotting the match: {mapping}")
            draw_match(input_graph, match)

#### Basic Test Cases

We begin with simple cases, which do not take attributes into account at all. Consider the following quite-generic input graph (in which we will try to find matches for different patterns):

In [48]:
input_graph = _create_graph(
    ['A','B','C','D'], 
    [
        ('A', 'B'),
        ('A', 'C'),
        ('A', 'A'),
        ('C', 'C'),
    ]
)
draw(input_graph)

In the following tests, we try to match different patterns and make sure that the matcher finds all of the possible matches. Each match is highlighted in the input graph (its nodes and edges are colored in red), and printed above its plot:

In [49]:
# Find all pairs of nodes 1, 2 where 1 has a self loop 
_assert_match(input_graph, '1->1, 2', [{'1': {'A'}, '2': {'B'}}, {'1': {'A'}, '2': {'C'}}, {'1': {'A'}, '2': {'D'}}
                                        ,{'1': {'C'}, '2': {'B'}}, {'1': {'C'}, '2': {'A'}}, {'1': {'C'}, '2': {'D'}}])

Plotting 6 matches:
Plotting the match: {'1': {'A'}, '2': {'D'}}


Plotting the match: {'1': {'A'}, '2': {'B'}}


Plotting the match: {'1': {'A'}, '2': {'C'}}


Plotting the match: {'1': {'C'}, '2': {'D'}}


Plotting the match: {'1': {'C'}, '2': {'A'}}


Plotting the match: {'1': {'C'}, '2': {'B'}}


In [50]:
_assert_match(input_graph, '1->1, 1->2', [{'1': {'A'}, '2': {'B'}}, {'1': {'A'}, '2': {'C'}}]) 

Plotting 2 matches:
Plotting the match: {'1': {'A'}, '2': {'B'}}


Plotting the match: {'1': {'A'}, '2': {'C'}}


In [51]:
_assert_match(input_graph, '1->1, 2->1', [{'1': {'C'}, '2': {'A'}}])

Plotting 1 matches:
Plotting the match: {'1': {'C'}, '2': {'A'}}


In [52]:
# Find a cycle in the graph + self loop. There is no such match in the graph
_assert_match(input_graph, '1->1, 2->1->2', []) 

In [53]:
# Find five different nodes (the different pattern names enforce it).
# There are only 4 nodes in the input graph, and so there are no matches.
_assert_match(input_graph, '1,2,3,4,5', []) 

#### Advanced Test Cases

Now, we want to check more advanced features of both the parser and the matcher:
* Checking for attributes (existence only)
* Checking for attributes (match the values as well, using the parser-generated condition function)
* Add user conditions
* Anonymous nodes

We will work with a new input graph, which shows the connections between students and the courses they took throughout their degree:
* Each node in the graph is associated with either a student or a course (and has an attribute "type" to denote which is which).
* A student is defined by their name. Some students (not all of them) also mention their faculty.
* A course is defined by its name. Some courses mention their associated number of units.
* An edge from a student to a course denotes that the student took the course. It mentions the semester in which the student took the course.
* An edge from a course to another course denotes that the first must be taken before the latter.

The graph looks like this:

In [54]:
input_graph = _create_graph(
    [
        # Names
        ('John', {'type': 'student', 'faculty': 'Biology'}),
        ('Lucy', {'type': 'student', 'faculty': 'CS'}),
        ('Amy', {'type': 'student'}),
        # Courses
        ('Algo', {'type': 'course', 'units': 3}),
        ('AI', {'type': 'course', 'units': 3}),
        ('NLP', {'type': 'course', 'units': 5}),
        ('DB', {'type': 'course'}),
        ('Bio', {'type': 'course'})
    ], 
    [
        # Students take
        ('John', 'Bio', {'sem': 3}),
        ('Lucy', 'Algo', {'sem': 5}),
        ('Lucy', 'AI', {'sem': 7}),
        ('Amy', 'Algo', {'sem': 5}),
        # Prerequisites (KDAM)
        ('Algo', 'AI'),
        ('AI', 'NLP'),
    ]
)
draw(input_graph)

We will now run some useful queries by matching patterns in the graph:

In [55]:
# Find all nodes that have attribute "type"  
_assert_match(input_graph, 's[type]', [{'s': {'AI'}}, {'s': {'Amy'}}, {'s': {'NLP'}}, {'s': {'John'}}, {'s': {'Lucy'}}, 
                                        {'s': {'Algo'}}, {'s': {'Bio'}}, {'s': {'DB'}}])

Plotting 8 matches:
Plotting the match: {'s': {'Bio'}}


Plotting the match: {'s': {'John'}}


Plotting the match: {'s': {'AI'}}


Plotting the match: {'s': {'Lucy'}}


Plotting the match: {'s': {'Amy'}}


Plotting the match: {'s': {'Algo'}}


Plotting the match: {'s': {'DB'}}


Plotting the match: {'s': {'NLP'}}


In [56]:
# Find all students
_assert_match(input_graph, 's[type="student"]', [{'s': {'John'}}, {'s': {'Lucy'}}, {'s': {'Amy'}}])

Plotting 3 matches:
Plotting the match: {'s': {'John'}}


Plotting the match: {'s': {'Lucy'}}


Plotting the match: {'s': {'Amy'}}


In [57]:
# Find all courses
_assert_match(input_graph, 'c[type="course"]', [{'c': course} for course in [{'DB'},{'NLP'},{'AI'},{'Algo'},{'Bio'}]])

Plotting 5 matches:
Plotting the match: {'c': {'AI'}}


Plotting the match: {'c': {'DB'}}


Plotting the match: {'c': {'Bio'}}


Plotting the match: {'c': {'Algo'}}


Plotting the match: {'c': {'NLP'}}


In [58]:
# Find all students that took some course (all of them)
_assert_match(input_graph, 's[type="student"]->_[type="course"]', [{'s': {'Amy'}}, {'s': {'John'}}, {'s': {'Lucy'}}])

Plotting 3 matches:
Plotting the match: {'s': {'John'}}


Plotting the match: {'s': {'Lucy'}}


Plotting the match: {'s': {'Amy'}}


In [59]:
# Find all students that took some 3-units course
_assert_match(input_graph, 's[type="student"]->_[type="course", units=3]', [{'s': {'Amy'}}, {'s': {'Lucy'}}])

Plotting 2 matches:
Plotting the match: {'s': {'Lucy'}}


Plotting the match: {'s': {'Amy'}}


In [60]:
# Find all students that took some 3-units course, and the associated courses
_assert_match(input_graph, 's[type="student"]->c[type="course", units=3]', [
    {'s': {'Amy'}, 'c': {'Algo'}}, {'s': {'Lucy'}, 'c': {'Algo'}}, {'s': {'Lucy'}, 'c': {'AI'}}
])

Plotting 3 matches:
Plotting the match: {'s': {'Lucy'}, 'c': {'AI'}}


Plotting the match: {'s': {'Lucy'}, 'c': {'Algo'}}


Plotting the match: {'s': {'Amy'}, 'c': {'Algo'}}


In [61]:
# Find all students which took two courses (and the courses) 
_assert_match(input_graph, 's[type="student"]->c1[type="course"], s->c2[type="course"]', [ 
    {'s': {'Lucy'}, 'c1': {'AI'}, 'c2': {'Algo'}},
    {'s': {'Lucy'}, 'c1': {'Algo'}, 'c2': {'AI'}}
])

Plotting 2 matches:
Plotting the match: {'s': {'Lucy'}, 'c1': {'AI'}, 'c2': {'Algo'}}


Plotting the match: {'s': {'Lucy'}, 'c1': {'Algo'}, 'c2': {'AI'}}


In [62]:
# Find all triplets c1, c2, c3 of courses such that c1 is a prerequisite of c2, and the same for c2 and c3
_assert_match(input_graph, 'c1[type="course"]->c2[type="course"]->c3[type="course"]', [
    {'c1': {'Algo'}, 'c2': {'AI'}, 'c3': {'NLP'}}
])

Plotting 1 matches:
Plotting the match: {'c1': {'Algo'}, 'c2': {'AI'}, 'c3': {'NLP'}}


In [63]:
# Find all students that took a course in their 7th semester

_assert_match(input_graph, 's[type="student"]-[sem=7]->_[type="course"]', [ {'s': {'Lucy'}}])


Plotting 1 matches:
Plotting the match: {'s': {'Lucy'}}


In [64]:
# Find all students that took a course before their 5th semester (use user-defined condition)
_assert_match(input_graph, 's[type="student"]-[sem]->c[type="course"]', [{'s': {'John'}, 'c': {'Bio'}}], 
              condition=lambda match: match['s->c']['sem'] < 5)

Plotting 1 matches:
Plotting the match: {'s': {'John'}, 'c': {'Bio'}}


#### POC For Large Graphs

In [65]:
# POC: High number of nodes, solved with attribute filtering
num_nodes = 100000

input_graph = _create_graph(
    [n for n in range(num_nodes)] + [(num_nodes+1, {'attr': 15}), (num_nodes+2, {'attr': 15})], 
    [
        (num_nodes+1, num_nodes+2),
        (2,4),
        (3,1)
    ]
)

_assert_match(input_graph, 'X[attr]->Y[attr]', [{'X': {num_nodes+1}, 'Y': {num_nodes+2}}], plot=False)
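
A quick back-of-the-envelope calculation shows why attribute filtering matters for this graph. The snippet is only arithmetic on the sizes of the candidate sets; `search_space_size` is a hypothetical helper, and the counts are read off the graph construction above (100000 plain nodes plus two nodes carrying `attr`).

```python
from math import prod

def search_space_size(candidate_counts):
    """Number of assignments in the Cartesian product of the candidate sets."""
    return prod(candidate_counts)

total_nodes = 100000 + 2                 # the plain nodes plus the two nodes carrying 'attr'
unfiltered = search_space_size([total_nodes, total_nodes])   # X and Y could be any node
filtered = search_space_size([2, 2])     # only two nodes have 'attr'
print(unfiltered, filtered)
```

With candidate filtering, the pattern `X[attr]->Y[attr]` needs only four assignments to be examined, compared with roughly ten billion pairs if every node were a candidate for both pattern nodes.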

# Export

In [66]:
#|hide
import nbdev; nbdev.nbdev_export()
     

### Collections Feature - Matcher Tests

In [67]:
input_graph = _create_graph(
    [
        # Names
        ('John', {'type': 'student', 'faculty': 'Biology'}),
        ('Lucy', {'type': 'student', 'faculty': 'CS'}),
        ('Amy', {'type': 'student'}),
        # Courses
        ('Algo', {'type': 'course', 'units': 3}),
        ('AI', {'type': 'course', 'units': 3}),
        ('NLP', {'type': 'course', 'units': 5}),
        ('DB', {'type': 'course'}),
        ('Bio', {'type': 'course'})
    ], 
    [
        # Students take
        ('John', 'Bio', {'sem': 3}),
        ('Lucy', 'Algo', {'sem': 5}),
        ('Lucy', 'AI', {'sem': 7}),
        ('Amy', 'Algo', {'sem': 5}),
        # Prerequisites (KDAM)
        ('Algo', 'AI'),
        ('AI', 'NLP'),
    ]
)
draw(input_graph)

In [68]:
# For each course, find the students that took it
_assert_match(input_graph, 'c[type="course"];s[type="student"]->c', [{'c': {'AI'}, 's': {'Lucy'}},{'c': {'Bio'}, 's': {'John'}},{'c': {'Algo'}, 's': {'Amy', 'Lucy'}}, {'c': {'DB'}},{'c': {'NLP'}}])

Input node o is mapped to multiple pattern nodes: {'s', 'c'}


Input node A is mapped to multiple pattern nodes: {'s', 'c'}


Plotting 5 matches:
Plotting the match: {'c': {'AI'}, 's': {'Lucy'}}


Plotting the match: {'c': {'DB'}}


Plotting the match: {'c': {'Bio'}, 's': {'John'}}


Plotting the match: {'c': {'Algo'}, 's': {'Lucy', 'Amy'}}


Plotting the match: {'c': {'NLP'}}
