# NAC coloring search

In this notebook, we provide utilities to run benchmarks, analyze the results, and experiment with our code.
As the package is still in development, not all parts are user-friendly; still, we try to provide a good enough abstraction.

First, we define some utility functions for matplotlib;
then, we provide utility functions for loading and storing benchmark results;
after that, we provide a framework for loading graph classes, defining benchmarks, and running them.
Lastly, we provide tools for a quick analysis of the results.

Some advanced dataset functionality and running pytest require PyRigi v0.3.0.
Either install it using `pip install pyrigi==0.3.0` or clone it into the project root directory from https://github.com/PyRigi/PyRigi/tree/0.3.0.
The `nac` package itself is completely independent.
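A minimal sketch for checking whether the required PyRigi version is available before using the dataset utilities or pytest. It relies only on the standard library (`importlib.metadata`, `importlib.util`); the helper names are hypothetical and not part of the `nac` or `benchmarks` packages. A clone in the project root is importable but has no installed distribution metadata, so the sketch treats that case separately.

```python
from importlib import metadata, util
from typing import Optional

REQUIRED_PYRIGI = "0.3.0"  # version named in the text above


def installed_pyrigi_version() -> Optional[str]:
    """Return the version of an installed PyRigi distribution, or None."""
    try:
        return metadata.version("pyrigi")
    except metadata.PackageNotFoundError:
        return None


def pyrigi_importable() -> bool:
    """True if a `pyrigi` package can be imported, e.g. from a clone in the project root."""
    return util.find_spec("pyrigi") is not None


version = installed_pyrigi_version()
if version is not None and version != REQUIRED_PYRIGI:
    print(f"Found PyRigi {version}, but the notebook expects {REQUIRED_PYRIGI}.")
elif version is None and not pyrigi_importable():
    print(f"PyRigi not found; run `pip install pyrigi=={REQUIRED_PYRIGI}`.")
```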

In [None]:
from typing import *
from dataclasses import dataclass
from collections import defaultdict, deque
import random
import importlib
from random import Random
from enum import Enum

import matplotlib.pyplot as plt
import matplotlib_inline.backend_inline as backend_inline
import matplotlib.backends
from matplotlib.backends import backend_agg
from matplotlib.figure import Figure
from matplotlib.ticker import MaxNLocator

import numpy as np
import pandas as pd
import networkx as nx
import os
import time
import datetime
import signal
import itertools
import base64

from tqdm import tqdm

import benchmarks
from benchmarks import dataset
import benchmarks.notebook_utils
from benchmarks.notebook_utils import *
importlib.reload(benchmarks)
importlib.reload(dataset)
importlib.reload(benchmarks.notebook_utils)

seed=42

### Benchmarks directory

You can either use our precomputed results or run the benchmarks yourself.
The algorithms usually take tens or hundreds of milliseconds per run,
but there are many combinations of graphs and strategies, so the times add up.
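A small helper capturing this choice, using the same `benchmarks/precomputed` and `benchmarks/local` directories as the next cell. It is a hypothetical sketch, not part of `benchmarks.notebook_utils`.

```python
import os


def select_output_dir(use_precomputed: bool, root: str = "benchmarks") -> str:
    """Return the results directory, creating it if needed (hypothetical helper)."""
    # precomputed results ship with the repository, local ones come from your own runs
    subdir = "precomputed" if use_precomputed else "local"
    path = os.path.join(root, subdir)
    os.makedirs(path, exist_ok=True)
    return path
```

In the notebook, the result would be assigned to the module-level setting, e.g. `benchmarks.notebook_utils.OUTPUT_DIR = select_output_dir(use_precomputed=False)` to store your own runs.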

In [None]:
OUTPUT_DIR_PRECOMPUTED = os.path.join("benchmarks", "precomputed")
OUTPUT_DIR_LOCAL = os.path.join("benchmarks", "local")

# use OUTPUT_DIR_LOCAL instead to store and analyze the results of your own runs
benchmarks.notebook_utils.OUTPUT_DIR = OUTPUT_DIR_PRECOMPUTED
os.makedirs(benchmarks.notebook_utils.OUTPUT_DIR, exist_ok=True)

In [None]:
# https://jwalton.info/Embed-Publication-Matplotlib-Latex/
import matplotlib.backends.backend_pgf


def fig_size(
        width: float = 398.33858,
        fraction: float = 2,
        subplots: Tuple[int,int] = (1, 1),
    ):
    """Set figure dimensions to avoid scaling in LaTeX.

    Parameters
    ----------
    width: float, optional
            Document width in points (TeX points, 72.27 pt per inch)
    fraction: float, optional
            Fraction of the width which you wish the figure to occupy
    subplots: tuple of int, optional
            The number of rows and columns of subplots.

    Returns
    -------
    fig_dim: tuple
            Dimensions of figure in inches
    """
    # Width of figure (in pts)
    fig_width_pt = width * fraction
    # Convert from pt to inches
    inches_per_pt = 1 / 72.27

    # Golden ratio to set aesthetic figure height
    # https://disq.us/p/2940ij3
    golden_ratio = (5**.5 - 1) / 2

    # Figure width in inches
    fig_width_in = fig_width_pt * inches_per_pt
    # Figure height in inches
    fig_height_in = fig_width_in * golden_ratio * (subplots[0] / subplots[1])

    return (fig_width_in, fig_height_in)


@copy_doc(plt.figure)
def figure(num: Any = 1, *args, **kwargs) -> Figure:
    """Creates a figure that is independent of the global plt state"""
    fig = Figure(*args, **kwargs)
    def show():
        # manager = backend_agg.new_figure_manager_given_figure(num, fig)
        manager = matplotlib.backends.backend_pgf.new_figure_manager_given_figure(num, fig)
        display(
            manager.canvas.figure,
            metadata=backend_inline._fetch_figure_metadata(manager.canvas.figure),
        )
        manager.destroy()
    fig.show = show
    return fig

# import seaborn as sns
# sns.set_style("whitegrid")
# sns.set_theme("paper")
# matplotlib.style.use("ggplot")
plt.rcParams.update(
    {
        "text.usetex": True,
        # "font.size": 6,
        # "savefig.dpi": 600,
        "legend.fontsize": 9,
        # "figure.titlesize": 14,
        # "axes.labelsize": 6,
        # "xtick.labelsize": 6,
        # "ytick.labelsize": 6
    }
)
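For reference, `fig_size` computes the figure size from the document width $w$ (in TeX points, $72.27$ pt per inch), the `fraction` $f$ and a subplot grid of $r$ rows and $c$ columns:

$$
w_{\mathrm{in}} = \frac{f \, w}{72.27}, \qquad
h_{\mathrm{in}} = w_{\mathrm{in}} \cdot \frac{\sqrt{5}-1}{2} \cdot \frac{r}{c}.
$$

With the defaults $w = 398.33858$ pt (about 14 cm), $f = 2$ and $r = c = 1$, this gives $w_{\mathrm{in}} = 796.67716 / 72.27 \approx 11.024$ in and $h_{\mathrm{in}} \approx 11.024 \cdot 0.618 \approx 6.813$ in.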

In [None]:
def export_figure(
    fig: Figure,
    dataset: str,
    mode: Literal["first", "all"],
    groupped_by: Literal["vertex", "monochromatic"],
    metric: Literal["runtime", "checks"],
    dir: str = "figures",
) -> None:
    os.makedirs(dir, exist_ok=True)
    fig.savefig(os.path.join(dir, f'graph_export_{dataset}_{mode}_{groupped_by}_{metric}.pgf'), format='pgf', bbox_inches='tight')

def export_standard_figure_list(
    dataset: str,
    figs: Sequence[Figure],
    dir: str = "figures",
) -> None:
    configs = (
        ("first", "vertex",        "runtime"),
        ("first", "monochromatic", "runtime"),
        ("first", "vertex",        "checks"),
        ("first", "monochromatic", "checks"),
        ("all",   "vertex",        "runtime"),
        ("all",   "monochromatic", "runtime"),
        ("all",   "vertex",        "checks"),
        ("all",   "monochromatic", "checks"),
    )
    for fig, (mode, groupped_by, metric) in tqdm(zip(figs, configs)):
        export_figure(fig, dataset, mode, groupped_by, metric, dir)
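`export_standard_figure_list` relies on `figs` arriving in exactly the order produced by the loops in `plot_frame`: the outer loop goes over the value column sets (first runtime, first checks, all runtime, all checks), the inner one over the x columns (vertices, monochromatic classes). A minimal sketch of the resulting file names, assuming the metric names from the `Literal` annotation of `export_figure`; the helper is hypothetical and only lists paths, it does not export anything.

```python
import itertools
import os
from typing import List


def standard_export_paths(dataset: str, dir: str = "figures") -> List[str]:
    """List the .pgf paths in the order export_standard_figure_list writes them."""
    # mode is the outermost loop, then the metric, then the grouping (x column)
    configs = itertools.product(
        ("first", "all"),
        ("runtime", "checks"),
        ("vertex", "monochromatic"),
    )
    return [
        os.path.join(dir, f"graph_export_{dataset}_{mode}_{grouped_by}_{metric}.pgf")
        for mode, metric, grouped_by in configs
    ]
```

Since `plot_frame` skips value column sets without nonzero records, it may return fewer than eight figures; `zip` then pairs the remaining figures with shifted names, so it is worth checking the figure count before exporting.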

# Analytics

In this section we provide a framework for plotting results of the previous benchmarks.

**All the charts plotted below in this section are created from runs on graphs with more than one monochromatic class.**
If a graph has at most one monochromatic class, the result can be obtained immediately as the answer is trivial (all edges must get the same color, so there is no NAC-coloring).
Therefore, we filter such runs out.
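A minimal sketch of this filter on a toy DataFrame; the values are made up, and the column names follow the benchmark records used below.

```python
import pandas as pd

# toy records, only the columns the filter needs
records = pd.DataFrame({
    "graph": ["g1", "g2", "g3", "g4"],
    "nac_any_finished": [True, True, True, False],
    "monochromatic_classes_no": [1, 2, 5, 3],
})

# keep finished runs whose graph has more than one monochromatic class
kept = records.query("nac_any_finished == True").query("monochromatic_classes_no > 1")
print(kept["graph"].tolist())  # ['g2', 'g3']
```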

The first group of charts shows the runtime and the number of checks required to find
the first/all NAC-colorings, depending on the number of vertices or the number of monochromatic classes.
Each row contains a mean and a median plot, with one line per strategy.
Further aggregation functions can be added to the framework easily.
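One way to make the aggregation pluggable is a mapping from labels to functions on the grouped data, so that a new statistic is a single entry. A minimal sketch under that assumption (`AGGREGATIONS` and `aggregate` are hypothetical; the notebook itself uses a `match` statement), which also pivots the strategies into columns with `unstack`, giving one column per plotted line:

```python
from typing import Callable, Dict

import pandas as pd

# label -> function applied to the grouped values; add entries here for new statistics
AGGREGATIONS: Dict[str, Callable] = {
    "mean": lambda g: g.mean(),
    "median": lambda g: g.median(),
    "3rd quartile": lambda g: g.quantile(0.75),
}


def aggregate(df: pd.DataFrame, x_column: str, based_on: str,
              value_column: str, name: str) -> pd.DataFrame:
    """Aggregate value_column per (x_column, based_on); one column per strategy."""
    grouped = df.groupby([x_column, based_on])[value_column]
    return AGGREGATIONS[name](grouped).unstack(based_on)


# toy data with two hypothetical strategies "a" and "b"
toy = pd.DataFrame({
    "vertex_no": [6, 6, 6, 7],
    "split_merging": ["a", "a", "a", "b"],
    "nac_first_mean_time": [1.0, 2.0, 6.0, 4.0],
})
table = aggregate(toy, "vertex_no", "split_merging", "nac_first_mean_time", "median")
```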

In [None]:
df_analytics = load_records()

df_analytics = df_analytics.query("nac_any_finished == True")
df_analytics["split_latex"] = df_analytics["split"].str.replace("_", r"\_", regex=False)
df_analytics["merging_latex"] = df_analytics["merging"].str.replace("_", r"\_", regex=False)
# legend label "split \& merging"; the combined naive strategy is shortened to "naive cycles"
df_analytics = df_analytics.assign(split_merging=lambda x: (x["split_latex"] + r" \& " + x["merging_latex"]).str.replace(r"naive-cycles \& naive-cycles", "naive cycles", regex=False))

df_analytics_triangles = df_analytics.query("triangle_components_no > 1")
df_analytics = df_analytics.query("monochromatic_classes_no > 1")

# display(df_analytics.info())
print("Records:", df_analytics.shape[0], "graphs:", df_analytics["graph"].nunique())
display(df_analytics.columns)
display(list(df_analytics["dataset"].unique()))
display(list(df_analytics["relabel"].unique()))
display(list(df_analytics["split"].unique()))
display(list(df_analytics["merging"].unique()))
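The labels above escape underscores because `text.usetex` is enabled, and a bare `_` causes a LaTeX error outside math mode. A self-contained sketch of the same transformation (`latex_label` and the strategy names are hypothetical). Passing `regex=False` matters: before pandas 2.0, `str.replace` defaulted to regular expressions, under which the pattern `naive-cycles \& naive-cycles` would match a plain `&` instead of the escaped `\&`.

```python
import pandas as pd


def latex_label(split: pd.Series, merging: pd.Series) -> pd.Series:
    """Build LaTeX-safe legend labels: split, an escaped ampersand, merging."""
    def escape(s: pd.Series) -> pd.Series:
        # a bare "_" is not allowed in LaTeX text mode
        return s.str.replace("_", r"\_", regex=False)

    label = escape(split) + r" \& " + escape(merging)
    # literal (non-regex) replacement of the combined naive strategy
    return label.str.replace(r"naive-cycles \& naive-cycles", "naive cycles", regex=False)


labels = latex_label(
    pd.Series(["strategy_a", "naive-cycles"]),
    pd.Series(["strategy_b", "naive-cycles"]),
)
print(labels.tolist())  # ['strategy\\_a \\& strategy\\_b', 'naive cycles']
```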

In [None]:
def _group_and_plot(
    df: pd.DataFrame,
    with_log: bool,
    axs: List[plt.Axes],
    x_column: Literal["vertex_no", "monochromatic_classes_no"],
    based_on: Literal["relabel", "split", "merging"],
    value_columns: List[Literal["nac_first_mean_time", "nac_all_mean_time"]],
):
    aggregations = ["mean", "median", "3rd quartile"]
    df = df.loc[:, [x_column, based_on, *value_columns]]
    groupped = df.groupby([x_column, based_on])

    for ax, aggregation in zip(axs, aggregations):
        match aggregation:
            case "mean":
                aggregated = groupped.mean()
            case "median":
                aggregated = groupped.median()
            case "3rd quartile":
                aggregated = groupped.quantile(.75)

        aggregated = aggregated.reorder_levels([based_on, x_column], axis=0)

        for name in aggregated.index.get_level_values(based_on).unique():
            data = aggregated.loc[name]
            for value_column in value_columns:
                title = ",".join([name, value_column]) if len(value_columns) > 1 else name
                ax.plot(data.index, data[value_column], label=title)

        rename_based_on = {
            "vertex_no": "Vertices",
            "triangle_components_no": "Triangle components",
            "monochromatic_classes_no": "Monochromatic classes",
        }

        # ax.set_title(f"{rename_based_on[x_column]} {based_on} ({aggregation})")
        # ax.set_title(f"{rename_based_on[x_column]} ({aggregation})")
        ax.set_title(f"{aggregation.capitalize()}")
        if with_log:
            ax.set_yscale("log")
        ax.xaxis.set_major_locator(MaxNLocator(integer=True))
        ax.set_xlabel(rename_based_on[x_column])
        ax.legend(loc='upper left')

def plot_frame(
    title: str,
    with_log: bool,
    df: pd.DataFrame,
    ops_value_columns_sets = [
        [ "nac_first_mean_time", ],
        [ "nac_first_check_cycle_mask", ],
        [ "nac_all_mean_time", ],
        [ "nac_all_check_cycle_mask", ],
    ],
    ops_x_column = ["vertex_no", "monochromatic_classes_no",],
    ops_based_on = [
        #  "relabel",
        # "split",
        # "merging",
        "split_merging",
    ],
    ops_aggregation = ["mean", "median",], #  "3rd quartile",
) -> List[Figure]:
    print(f"Plotting {df.shape[0]} records...")
    figs = []

    title_rename = {
        "nac_first_mean_time": "First NAC-coloring, Runtime",
        "nac_first_check_cycle_mask": "First NAC-coloring, Checks number",
        "nac_all_mean_time": "All NAC-colorings, Runtime",
        "nac_all_check_cycle_mask": "All NAC-colorings, Checks number",
    }

    for value_columns in ops_value_columns_sets:
        local_df = df[(df[value_columns] != 0).all(axis=1)]
        if local_df.shape[0] == 0:
            continue

        for x_column in ops_x_column:

            nrows = len(ops_based_on)
            ncols = len(ops_aggregation)
            fig = figure(
                nrows * ncols,
                figsize=fig_size(subplots=(nrows, ncols)),
                layout='constrained',
            )
            title_detail = " | ".join(title_rename[value_column] for value_column in value_columns)
            # fig.suptitle(f"{title} ({title_detail})", fontsize=20)
            fig.suptitle(f"{title} ({title_detail})")
            figs.append(fig)
            row = 0

            for based_on in ops_based_on:
                axs = [
                    fig.add_subplot(nrows, ncols, i+ncols*row+1)
                    for i in range(len(ops_aggregation))]
                _group_and_plot(local_df, with_log, axs, x_column, based_on, value_columns)
                row += 1
    return figs

# [display(fig) for fig in plot_frame("Laman", True, df_analytics.query("dataset == 'laman'"))]

In [None]:
if True:
    title = 'Minimally rigid'
    dataset_name = 'minimally_rigid_random'
    figs = [fig for fig in plot_frame(title, True, df_analytics.query(f"dataset == '{dataset_name}'"))]
    export_standard_figure_list(dataset_name, figs)
    [display(fig) for fig in figs]

In [None]:
if False:
    title = 'No 3 nor 4 cycles'
    dataset_name = 'no_3_nor_4_cycles'
    figs = [fig for fig in plot_frame(title, True, df_analytics.query(f"dataset == '{dataset_name}'"))]
    export_standard_figure_list(dataset_name, figs)
    [display(fig) for fig in figs]

In [None]:
if False:
    [display(fig) for fig in plot_frame("", True, df_analytics.query("dataset == 'few_colorings'"))]
    title = 'Sparse with few colorings - None'
    dataset_name = ''
    figs = [fig for fig in plot_frame(title, True, df_analytics.query(f"dataset == '{dataset_name}'"))]
    export_standard_figure_list(dataset_name, figs)
    [display(fig) for fig in figs]

In [None]:
if True:
    title = 'Globally rigid'
    dataset_name = 'globally_rigid'
    figs = [fig for fig in plot_frame(title, False, df_analytics.query(f"dataset == '{dataset_name}'"))]
    export_standard_figure_list(dataset_name, figs)
    [display(fig) for fig in figs]

In [None]:
if True:
    title = 'No NAC-coloring, Triangle-components'
    dataset_name = 'no_nac_coloring'
    figs = [fig for fig in plot_frame(title, True, df_analytics_triangles.query("used_monochromatic_classes == False"), ops_x_column = ["vertex_no", "triangle_components_no",])]
    export_standard_figure_list(dataset_name, figs)
    [display(fig) for fig in figs]

## The number of checks needed

This group of charts compares the number of checks performed by our algorithm with the number of checks of the naive algorithm,
which iterates over colorings of single edges, of triangle-components, or of the monochromatic classes described in the article.
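The naive counts used in these charts (the `exp_*` columns computed in `plot_is_NAC_coloring_calls`) correspond to coloring every unit red or blue independently, with a coloring and its color swap needing only one check, which is presumably the reason for the exponent minus one. For $|E|$ edges, $t$ triangle-components and $c$ monochromatic classes:

$$
N_{\mathrm{edges}} = 2^{|E|-1}, \qquad N_{\mathrm{triangle}} = 2^{t-1}, \qquad N_{\mathrm{mono}} = 2^{c-1}.
$$

For example, a minimally rigid graph on $6$ vertices has $2 \cdot 6 - 3 = 9$ edges, so the naive edge-based search performs $2^{8} = 256$ checks.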

By default, the results are plotted from the whole benchmark dataset, i.e., all graph classes are used.
You can add `query("dataset == '...'")` to show the results for a specific dataset.
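Besides embedding the name in the query string with an f-string, as the cells above do, `DataFrame.query` can reference a Python variable with the `@` prefix, which avoids quoting issues. A toy example with made-up values (the dataset names are taken from this notebook):

```python
import pandas as pd

# toy records with two of the dataset names used in this notebook
records = pd.DataFrame({
    "dataset": ["globally_rigid", "minimally_rigid_random", "globally_rigid"],
    "vertex_no": [8, 10, 12],
})

dataset_name = "globally_rigid"
subset = records.query("dataset == @dataset_name")
print(subset["vertex_no"].tolist())  # [8, 12]
```

In the notebook, this would read `plot_is_NAC_coloring_calls(df_analytics.query("dataset == @dataset_name"))`.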

In the legend, "Subgraphs" marks the `CycleMask` and `IsNACColoring` checks of our algorithm,
while "Naive" marks the naive approach over edges, triangle-components, or monochromatic classes.

The number of `IsNACColoring` checks is expected to be smaller than the number of `CycleMask` checks, as the `CycleMask` check is performed every time, whereas `IsNACColoring` is called only if the previous checks do not already decide the result.
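A generic sketch of why one counter stays below the other: a cheap filter runs for every candidate, and the expensive check only for candidates the filter does not reject. The function and predicates are hypothetical stand-ins, not the `nac` package's implementation; `cheap_check` plays the role of `CycleMask` and `full_check` that of `IsNACColoring`.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def two_stage_filter(
    candidates: Iterable[T],
    cheap_check: Callable[[T], bool],
    full_check: Callable[[T], bool],
) -> Tuple[List[T], int, int]:
    """Return accepted candidates and the numbers of cheap and full checks."""
    accepted: List[T] = []
    cheap_calls = 0
    full_calls = 0
    for candidate in candidates:
        cheap_calls += 1
        if not cheap_check(candidate):
            continue  # rejected early, the expensive check is skipped
        full_calls += 1
        if full_check(candidate):
            accepted.append(candidate)
    return accepted, cheap_calls, full_calls


accepted, cheap, full = two_stage_filter(range(10), lambda x: x % 2 == 0, lambda x: x % 4 == 0)
print(accepted, cheap, full)  # [0, 4, 8] 10 5
```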

In [None]:
def _plot_is_NAC_coloring_calls_groups(
    title: str,
    df: pd.DataFrame,
    ax: plt.Axes,
    x_column: Literal["vertex_no", "monochromatic_classes_no"],
    value_columns: List[Literal["nac_first_mean_time", "nac_all_mean_time"]],
    aggregation: Literal["mean", "median", "3rd quartile"],
    legend_rename_dict: Dict[str, str] = {},
):
    df = df.loc[:, [x_column, *value_columns]]
    groupped = df.groupby([x_column])
    match aggregation:
        case "mean":
            aggregated = groupped.mean()
        case "median":
            aggregated = groupped.median()
        case "3rd quartile":
            aggregated = groupped.quantile(.75)

    rename_based_on = {
        "vertex_no": "Vertices",
        "triangle_components_no": "Triangle components",
        "monochromatic_classes_no": "Monochromatic classes",
    }

    # display(aggregated)
    aggregated.plot(ax=ax)
    ax.set_title(f"{title} - {aggregation.capitalize()}")
    ax.set_yscale("log")
    ax.xaxis.set_major_locator(MaxNLocator(integer=True))
    ax.set_xlabel(rename_based_on[x_column])
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(
        handles,
        [legend_rename_dict[l] for l in labels],
        # loc = 'upper left',
    )

def plot_is_NAC_coloring_calls(
    df: pd.DataFrame,
) -> List[Figure]:
    figs = []

    df = df.query("nac_all_coloring_no != 0").copy()
    print(f"Plotting {df.shape[0]} records...")

    related_columns = ["vertex_no", "edge_no", "triangle_components_no", "monochromatic_classes_no", "nac_all_coloring_no", "nac_all_check_is_NAC", "nac_all_check_cycle_mask"]
    df = df.loc[:, related_columns]
    # this does not help our algorithm to stand out, but the graphs can be drawn more easily

    # float base avoids silent int64 overflow for graphs with 64 or more edges
    df["exp_edge_no"]               = 2.0**(df["edge_no"]-1)
    df["exp_triangle_component_no"] = 2.0**(df["triangle_components_no"]-1)
    df["exp_monochromatic_class_no"] = 2.0**(df["monochromatic_classes_no"]-1)

    df["scaled_edge_no"]                  = df["edge_no"]                  /df["nac_all_coloring_no"]
    df["scaled_triangle_component_no"]    = df["triangle_components_no"]   /df["nac_all_coloring_no"]
    df["scaled_monochromatic_class_no"]    = df["monochromatic_classes_no"] /df["nac_all_coloring_no"]
    df["scaled_nac_all_check_cycle_mask"] = df["nac_all_check_cycle_mask"] /df["nac_all_coloring_no"]

    df["inv_edge_no"]                  = df["nac_all_coloring_no"] / df["edge_no"]
    df["inv_triangle_component_no"]    = df["nac_all_coloring_no"] / df["triangle_components_no"]
    df["inv_monochromatic_class_no"]    = df["nac_all_coloring_no"] / df["monochromatic_classes_no"]
    df["inv_nac_all_check_cycle_mask"] = df["nac_all_coloring_no"] / df["nac_all_check_cycle_mask"]
    df["inv_nac_all_check_is_NAC"]     = df["nac_all_coloring_no"] / df["nac_all_check_is_NAC"]

    df["new_edge_no"]                  = df["edge_no"]                  /df["exp_triangle_component_no"]
    df["new_triangle_component_no"]    = df["triangle_components_no"]   /df["exp_triangle_component_no"]
    df["new_monochromatic_class_no"]   = df["monochromatic_classes_no"] /df["exp_triangle_component_no"]
    df["new_nac_all_check_cycle_mask"] = df["nac_all_check_cycle_mask"] /df["exp_triangle_component_no"]
    df["new_nac_all_check_is_NAC"]     = df["nac_all_check_is_NAC"]     /df["exp_triangle_component_no"]

    rename_dict = {
        "exp_edge_no": "Naive - Edges",
        "exp_triangle_component_no": "Naive - Triangle-components",
        "exp_monochromatic_class_no": "Naive - Monochromatic classes",
        "nac_all_check_cycle_mask": "Subgraphs - CycleMask",
        "nac_all_check_is_NAC": "Subgraphs - IsNACColoring",

        "scaled_edge_no": "Naive - Edges",
        "scaled_triangle_component_no": "Naive - Triangle-components",
        "scaled_monochromatic_class_no": "Naive - Monochromatic classes",
        "scaled_nac_all_check_cycle_mask": "Subgraphs - CycleMask",

        "inv_edge_no": "Naive - Edges",
        "inv_triangle_component_no": "Naive - Triangle-components",
        "inv_monochromatic_class_no": "Naive - Monochromatic classes",
        "inv_nac_all_check_cycle_mask": "Subgraphs - CycleMask",
        "inv_nac_all_check_is_NAC": "Subgraphs - IsNACColoring",

        "new_edge_no": "Naive - Edges",
        "new_triangle_component_no": "Naive - Triangle-components",
        "new_monochromatic_class_no": "Naive - Monochromatic classes",
        "new_nac_all_check_cycle_mask": "Subgraphs - CycleMask",
        "new_nac_all_check_is_NAC": "Subgraphs - IsNACColoring",
    }

    ops_x_column = ["vertex_no", "monochromatic_classes_no",]
    ops_value_groups = [
        ["exp_edge_no",    "exp_triangle_component_no",    "exp_monochromatic_class_no",    "nac_all_check_cycle_mask",        "nac_all_check_is_NAC",],
        # ["scaled_edge_no", "scaled_triangle_component_no", "scaled_monochromatic_class_no", "scaled_nac_all_check_cycle_mask"],
        # ["inv_edge_no",    "inv_triangle_component_no",    "inv_monochromatic_class_no",    "inv_nac_all_check_cycle_mask",    "inv_nac_all_check_is_NAC", ],
        # ["new_edge_no",    "new_triangle_component_no",    "new_monochromatic_class_no",    "new_nac_all_check_cycle_mask" ],
    ]
    ops_aggregation = ["mean", "median", ] # "3rd quartile",

    nrows = len(ops_value_groups)
    ncols = len(ops_aggregation)

    for x_column in ops_x_column:
        row = 0
        fig = figure(
            nrows * ncols,
            figsize=fig_size(subplots=(nrows, ncols)),
            layout='constrained',
        )
        fig.suptitle(f"Reduction of CycleMask and IsNACColoring checks against the naive algorithm", fontsize=20)
        figs.append(fig)

        for title, value_columns in zip(
            [
                "The number of checks",
                # "#is_NAC_coloring() calls/#NAC(G)",
                # "The number of NAC-colorings / The number of checks",
                # "Checks / triangle components number",
            ],
            ops_value_groups,
        ):
            axs = [
                fig.add_subplot(nrows, ncols, i+ncols*row+1)
                for i in range(len(ops_aggregation))]
            for ax, aggregation in zip(axs,ops_aggregation):
                _plot_is_NAC_coloring_calls_groups(title, df, ax, x_column, value_columns, aggregation, legend_rename_dict=rename_dict)
            row += 1

    return figs

In [None]:

if True:
    figs = [fig for fig in plot_is_NAC_coloring_calls(df_analytics.query("split != 'naive-cycles'"))]
    title = 'All datasets'
    dataset_name = 'check-comparison'
    # the plotted columns are the nac_all_* counters, i.e. the search for all NAC-colorings
    [export_figure(fig, dataset_name, "all", group_by, "reduction") for fig, group_by in zip(figs, ["vertices", "monochromatic"])]
    [display(fig) for fig in figs]