# Bipartite matching

Bipartite matching is a classic problem in graph theory.

Given a bipartite graph, we want to find a maximum matching. A matching is a set of edges such that no two edges share a common vertex, and a maximum matching is a matching with the largest possible number of edges.
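To make the definition concrete, here is a minimal sketch of a validity check. The helper name `is_matching` is hypothetical (it is not used elsewhere in this notebook), and it assumes edges are `(left, right)` tuples.

```python
def is_matching(candidate, edges):
    """Return True if `candidate` is a valid matching drawn from `edges`."""
    allowed = set(edges)
    seen_left, seen_right = set(), set()
    for left, right in candidate:
        if (left, right) not in allowed:
            return False  # not an edge of the graph
        if left in seen_left or right in seen_right:
            return False  # an endpoint is used by two edges
        seen_left.add(left)
        seen_right.add(right)
    return True


edges = [(0, 3), (0, 4), (1, 3)]
print(is_matching([(0, 4), (1, 3)], edges))  # True: no shared endpoints
print(is_matching([(0, 3), (1, 3)], edges))  # False: node 3 is used twice
```

Note that this only checks validity. It says nothing about whether the matching is as large as possible.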

Some real-life examples of this problem:

- Assigning jobs to workers, where each worker can only do one job and each job can only be done by one worker.
- Assigning courses to students, where each student can only take one course and each course can only be taken by one student.
- Assigning tasks to machines, where each machine can only do one task and each task can only be done by one machine.
- Assigning rooms to guests, where each guest can only stay in one room and each room can only be occupied by one guest.

The algorithm we are working toward is the Hopcroft-Karp algorithm. It finds a maximum matching in O(sqrt(V) * E) time in the worst case, where V is the number of vertices and E is the number of edges in the graph. The function `get_bipartite_matching` in the code cell that follows is a simpler greedy starting point, not Hopcroft-Karp itself.
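For reference, here is a compact sketch of Hopcroft-Karp. It is an illustration under a few assumptions rather than a tuned implementation: edges are `(left, right)` tuples, the function name `hopcroft_karp` and its internals are my own, and the recursive search is only suitable for small graphs because of Python's recursion limit. Each phase uses a breadth-first search to layer the graph by shortest alternating-path length, then a depth-first search to augment along shortest paths only.

```python
from collections import deque


def hopcroft_karp(edges):
    """Return a maximum matching as a list of (left, right) tuples."""
    adj = {}
    for left, right in edges:
        adj.setdefault(left, []).append(right)

    NIL = object()  # sentinel meaning "unmatched"
    INF = float("inf")
    match_left = {left: NIL for left in adj}  # left node -> right node
    match_right = {}                          # right node -> left node
    dist = {}

    def bfs():
        # Layer left nodes by distance from the free left nodes.
        queue = deque()
        for left in adj:
            if match_left[left] is NIL:
                dist[left] = 0
                queue.append(left)
            else:
                dist[left] = INF
        dist[NIL] = INF
        while queue:
            left = queue.popleft()
            if dist[left] < dist[NIL]:
                for right in adj[left]:
                    partner = match_right.get(right, NIL)
                    if dist[partner] == INF:
                        dist[partner] = dist[left] + 1
                        queue.append(partner)
        # dist[NIL] is finite exactly when some augmenting path exists.
        return dist[NIL] != INF

    def dfs(left):
        if left is NIL:
            return True  # reached a free right node
        for right in adj[left]:
            partner = match_right.get(right, NIL)
            if dist[partner] == dist[left] + 1 and dfs(partner):
                match_left[left] = right
                match_right[right] = left
                return True
        dist[left] = INF  # dead end: do not revisit in this phase
        return False

    while bfs():
        for left in adj:
            if match_left[left] is NIL:
                dfs(left)

    return [(l, r) for l, r in match_left.items() if r is not NIL]


print(sorted(hopcroft_karp([("a", "x"), ("a", "y"), ("b", "x")])))
# [('a', 'y'), ('b', 'x')]
```

In this small graph `b` can only use `x`, so `a` has to move to `y`; a purely greedy pass that takes `("a", "x")` first would stop at one edge.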

In [None]:
# First, how do we represent a bipartite graph? We simply use a list of node labels for each side.
# In our examples both sides have the same number of nodes, so the two lists have the same length
# (the code below does not rely on this).
# For example, with 3 nodes on each side, the sides are [0, 1, 2] and [3, 4, 5].
# The edges are a list of tuples, for example [(0, 3), (1, 4), (2, 5)].
# Each edge represents a possible connection between a node on the left side and a node on the right side.
# We could also use an adjacency matrix, where the rows represent the nodes on the left side, but not for now.

# So the input to our bipartite matching problem is a list of edges, each one a (left, right) tuple.

# Let's start by implementing a function that takes a list of edges and returns the matching as a list of tuples.
# The matching is a subset of the edges such that no two edges share a node on the left side or the right side.

def get_bipartite_matching(edges, left_nodes, right_nodes, verbose=False):
    """
    Find a matching in a bipartite graph.

    edges: all possible connections, as (left_node, right_node) tuples.
    left_nodes: all nodes on the left side (currently unused).
    right_nodes: all nodes on the right side (currently unused).

    Returns a list of edges such that no two edges share a node on the left side or the right side.
    """
    # We start with a greedy algorithm: iterate over the edges and add an edge to the matching
    # if neither of its nodes is already used by an edge in the matching.
    # Two sets keep track of the left and right nodes that are already matched.

    matching = []
    left_nodes_in_matching = set()
    right_nodes_in_matching = set()

    # greedy pass: add an edge if neither of its endpoints is already matched
    for edge in edges:
        left_node, right_node = edge
        if left_node not in left_nodes_in_matching and right_node not in right_nodes_in_matching:
            matching.append(edge)
            left_nodes_in_matching.add(left_node)
            right_nodes_in_matching.add(right_node)

    if verbose:
        print("Initial matching:", matching)
    # now that our greedy approach has given us a matching, we can try to improve it
    #
    # TODO make iterative improvement to the matching
    # Idea: iterate over the edges again. If an edge outside the matching has exactly one matched endpoint,
    # swap it in for the matching edge that uses that endpoint. The swap keeps the matching the same size
    # but frees a node, which a later edge in the list may then be able to use.

    # let's iterate over the edges and try to improve the matching
    
    for edge in edges:
        left_node, right_node = edge
        # we can only improve the matching if the edge is not already in the matching
        if edge in matching:
            continue
        # if both endpoints are already matched, skip the edge: swapping it in would displace two matching edges
        if left_node in left_nodes_in_matching and right_node in right_nodes_in_matching:
            continue
        # we can improve the matching by replacing an edge in the matching with the current edge
        # at this point we have an edge that is not in the matching and at least one node is not in the matching
        # if both are not in the matching, we can safely add the edge to the matching
        if left_node not in left_nodes_in_matching and right_node not in right_nodes_in_matching:
            matching.append(edge)
            left_nodes_in_matching.add(left_node)
            right_nodes_in_matching.add(right_node)
            continue # easy case, we can add the edge to the matching
        # now the hard case: exactly one of the nodes is in the matching, so we have to find a replacement
        # we iterate over the matching to find the edge that uses that node and replace it with the current edge;
        # the matching stays the same size, but the other node of the replaced edge becomes free
        for i, matching_edge in enumerate(matching):
            matching_left_node, matching_right_node = matching_edge
            if left_node in left_nodes_in_matching and matching_left_node == left_node:
                # we can replace the matching edge with the current edge
                matching[i] = edge
                right_nodes_in_matching.remove(matching_right_node)
                right_nodes_in_matching.add(right_node)
                break
            if right_node in right_nodes_in_matching and matching_right_node == right_node:
                # we can replace the matching edge with the current edge
                matching[i] = edge
                left_nodes_in_matching.remove(matching_left_node)
                left_nodes_in_matching.add(left_node)
                break

    if verbose:
        print("Final matching:", matching)
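    # TODO is this the best matching we can get? what guarantees do we have that this is the best matching?
    # (a single pass of swaps is a heuristic: the result is always a valid matching, but nothing here guarantees it is maximum)

    return matching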

# Example: the left side is "Alice", "Bob", "Carol", "Dave"
# and the right side is "ACME", "Boeing", "Cat", "DB",
# with the following edges:
# Alice - ACME
# Alice - Boeing
# Bob - Cat
# Carol - ACME
# Carol - Cat
# Dave - DB

edges = [("Alice", "ACME"), ("Alice", "Boeing"), ("Bob", "Cat"), ("Carol", "ACME"), ("Carol", "Cat"), ("Dave", "DB")]
left_nodes = ["Alice", "Bob", "Carol", "Dave"]
right_nodes = ["ACME", "Boeing", "Cat", "DB"]
print(get_bipartite_matching(edges, left_nodes, right_nodes, verbose=True)) # the greedy pass finds 3 edges; the final matching should have 4, e.g. [("Alice", "Boeing"), ("Bob", "Cat"), ("Dave", "DB"), ("Carol", "ACME")]



Initial matching: [('Alice', 'ACME'), ('Bob', 'Cat'), ('Dave', 'DB')]
[('Alice', 'Boeing'), ('Bob', 'Cat'), ('Dave', 'DB'), ('Carol', 'ACME')]
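On the open question of whether the swap pass always gives the best matching: it does not appear to. Tracing `get_bipartite_matching` by hand on the edge list below, the improvement pass skips `("a1", "x2")` while both of its endpoints are still matched, later frees both of them through two swaps, and never revisits that edge, so it ends with 2 edges although 3 are possible. The standard fix is to search for augmenting paths, re-routing already matched nodes recursively until no free left node can be placed. The sketch below is one simple way to do that (often called Kuhn's algorithm); the function name `max_matching_augmenting` and the node labels are made up for illustration, and it is not Hopcroft-Karp.

```python
def max_matching_augmenting(edges):
    """Maximum bipartite matching via repeated augmenting-path search."""
    adj = {}
    for left, right in edges:
        adj.setdefault(left, []).append(right)

    match_right = {}  # right node -> left node currently matched to it

    def try_augment(left, visited):
        # Try to place `left`, re-routing already matched left nodes if needed.
        for right in adj[left]:
            if right in visited:
                continue
            visited.add(right)
            if right not in match_right or try_augment(match_right[right], visited):
                match_right[right] = left
                return True
        return False

    for left in adj:
        try_augment(left, set())

    return [(left, right) for right, left in match_right.items()]


# Hypothetical input on which the single swap pass seems to stop at 2 edges.
edges = [("a1", "x1"), ("a2", "x2"), ("a1", "x2"), ("a0", "x1"), ("a2", "x3")]
print(sorted(max_matching_augmenting(edges)))
# [('a0', 'x1'), ('a1', 'x2'), ('a2', 'x3')]
```

This version runs in O(V * E) time, which is slower than the Hopcroft-Karp bound but much shorter to write. By Berge's theorem, a matching is maximum exactly when no augmenting path exists, and that is the guarantee the swap pass lacks.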
