# A subgraph-based test

A subgraph count is the number of copies of a given subgraph (for example, a triangle) found in a larger graph. Subgraph counts can be a powerful statistic for testing whether a collection of networks comes from the same distribution, so here we compare the triangle subgraph counts of the left and right hemispheres to test for bilateral symmetry.

In [None]:
# import modules
#from pkg.utils import set_warnings
#set_warnings()

import datetime
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from dotmotif import Motif, GrandIsoExecutor
import networkx as nx
import random
from myst_nb import glue as default_glue

from pkg.data import load_network_palette, load_node_palette, load_unmatched
from pkg.io import savefig
from pkg.plot import set_theme
from pkg.utils import get_seeds

DISPLAY_FIGS = False

FILENAME = "subgraph_unmatched_test"


def gluefig(name, fig, **kwargs):
    savefig(name, foldername=FILENAME, **kwargs)

    glue(name, fig, prefix="fig")

    if not DISPLAY_FIGS:
        plt.close()


def glue(name, var, prefix=None):
    savename = f"{FILENAME}-{name}"
    if prefix is not None:
        savename = prefix + ":" + savename
    default_glue(savename, var, display=False)
    
t0 = time.time()
set_theme(font_scale=0.75)

left_adj, left_nodes = load_unmatched("left")
right_adj, right_nodes = load_unmatched("right")

left_nodes["inds"] = range(len(left_nodes))
right_nodes["inds"] = range(len(right_nodes))

network_palette, NETWORK_KEY = load_network_palette()

seeds = get_seeds(left_nodes, right_nodes)

## Comparing triangle subgraph counts

First, we count the number of triangle subgraphs in the left and right hemispheres using the [dotmotif](https://github.com/aplbrain/dotmotif) package, then compute the difference between the two counts (right minus left). Note that a triangle on nodes 1, 2, and 3 is treated as the same triangle as the one on nodes 2, 3, and 1 and the one on nodes 3, 1, and 2, so the raw count returned by the motif search is divided by 3 in the code below.
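
As a self-contained illustration of why dividing by 3 is appropriate, the sketch below counts directed 3-cycles on a toy graph with plain NumPy. It is not how dotmotif performs its search; `count_directed_triangles` and `toy_adj` are hypothetical names introduced only for this example. The idea is that, for a binary adjacency matrix with no self-loops, the trace of its cube counts ordered triples (i, j, k) with i -> j -> k -> i, so each cycle appears once per starting node.

```python
import numpy as np


def count_directed_triangles(adj):
    """Count directed 3-cycles (i -> j -> k -> i) in an adjacency matrix."""
    a = (np.asarray(adj) != 0).astype(np.int64)  # binarize edge weights
    np.fill_diagonal(a, 0)  # drop self-loops so i, j, k are distinct
    ordered_triples = int(np.trace(a @ a @ a))  # each cycle counted 3 times
    return ordered_triples // 3


# toy graph (assumption for illustration): one cycle 0 -> 1 -> 2 -> 0 plus a stray edge
toy_adj = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 0), (2, 3)]:
    toy_adj[i, j] = 1

toy_count = count_directed_triangles(toy_adj)
```

Here the three rotations (0, 1, 2), (1, 2, 0), and (2, 0, 1) all describe the same cycle, which is why the ordered count is divided by 3.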

In [None]:
triangle = Motif("""
A -> B
B -> C
C -> A
""")

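# build directed graphs; from_numpy_array replaces from_numpy_matrix,
# which was removed in networkx 3.0
left_g = nx.from_numpy_array(left_adj, create_using=nx.DiGraph)
left_executor = GrandIsoExecutor(graph=left_g)
# the three rotations of (A, B, C) describe the same triangle, so divide by 3
left_counts = left_executor.count(triangle) // 3

right_g = nx.from_numpy_array(right_adj, create_using=nx.DiGraph)
right_executor = GrandIsoExecutor(graph=right_g)
right_counts = right_executor.count(triangle) // 3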

diff_counts = right_counts - left_counts

glue("left_counts", left_counts)
glue("right_counts", right_counts)
glue("diff_counts", diff_counts)

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(8, 6))

ax.bar(0, left_counts, color=network_palette["Left"])
ax.bar(1, right_counts, color=network_palette["Right"])

ax.set(
    xlabel="Network",
    xticks=[0, 1],
    xticklabels=["Left", "Right"],
    ylabel="Triangle Subgraph Counts",
)

gluefig("triangles", fig)

```{glue:figure} fig:subgraph_unmatched_test-triangles
:name: "fig:subgraph_unmatched_test-triangles"

Comparison of triangle subgraph counts in the left and right hemispheres. The triangle subgraph count is ~{glue:text}`subgraph_unmatched_test-left_counts` for the left hemisphere and ~{glue:text}`subgraph_unmatched_test-right_counts` for the right, a difference of ~{glue:text}`subgraph_unmatched_test-diff_counts` (right minus left).
```

## Adjusting for a difference in density

{numref}`Figure {number} <fig:subgraph_unmatched_test-triangles>` shows that there is a slight difference between the triangle subgraph counts of the left and right hemispheres. However, we saw in [](er_unmatched_test) that the overall densities of the two networks differ. To adjust for this difference in density, we used the same approach as in [](sbm_unmatched_test): we made the densities of the two networks roughly equal and then reran the comparison. To do so, we calculated the number of edges that must be removed from the right hemisphere for its density to roughly match that of the left. We then removed that many edges from the right hemisphere network at random and recomputed the number of triangle subgraphs. We repeated this procedure {glue:text}`subgraph_unmatched_test-n_resamples` times, yielding a new difference in subgraph counts for each subsampling of the right network. The distribution of the results is shown in {numref}`Figure {number} <fig:subgraph_unmatched_test-triangles_corrected>`. Note that the difference was calculated by subtracting the triangle subgraph count of the left hemisphere from that of the right.
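
To make the density adjustment concrete, here is a minimal sketch on two small toy matrices. The helper names `n_edges_to_remove` and `remove_random_edges`, the toy graphs, and the seeded NumPy generator are all assumptions made for illustration; the analysis code in this notebook follows the same arithmetic but is written differently. The sketch assumes the first argument of `n_edges_to_remove` is the denser network.

```python
import numpy as np


def n_edges_to_remove(adj_dense, adj_sparse):
    """Number of edges to drop from the denser network to roughly match densities."""
    n_dense = adj_dense.shape[0]
    density_dense = np.count_nonzero(adj_dense) / n_dense**2
    density_sparse = np.count_nonzero(adj_sparse) / adj_sparse.shape[0] ** 2
    # truncating to an integer means the densities match only approximately
    return int((density_dense - density_sparse) * n_dense**2)


def remove_random_edges(adj, n_remove, rng):
    """Return a copy of adj with n_remove randomly chosen edges set to zero."""
    rows, cols = np.nonzero(adj)
    drop = rng.choice(len(rows), size=n_remove, replace=False)
    out = adj.copy()
    out[rows[drop], cols[drop]] = 0
    return out


# toy "left" network: 4 nodes, 4 edges (density 4 / 16 = 0.25)
toy_left = np.zeros((4, 4), dtype=int)
for i in range(4):
    toy_left[i, (i + 1) % 4] = 1

# toy "right" network: 5 nodes, 10 edges (density 10 / 25 = 0.40)
toy_right = np.zeros((5, 5), dtype=int)
for i in range(5):
    toy_right[i, (i + 1) % 5] = 1
    toy_right[i, (i + 2) % 5] = 1

rng = np.random.default_rng(0)
k = n_edges_to_remove(toy_right, toy_left)  # int(0.15 * 25) = 3
toy_right_sub = remove_random_edges(toy_right, k, rng)
```

After removing 3 edges, the toy right network has 7 edges and a density of 0.28, close to but not exactly the 0.25 of the toy left network, because the number of removals is truncated to an integer.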

In [None]:
# compute density correction
n_edges_left = np.count_nonzero(left_adj)
n_edges_right = np.count_nonzero(right_adj)
n_left = left_adj.shape[0]
n_right = right_adj.shape[0]
density_left = n_edges_left / (n_left ** 2)
density_right = n_edges_right / (n_right ** 2)

n_remove = int((density_right - density_left) * (n_right ** 2))

glue("density_left", density_left)
glue("density_right", density_right)
glue("n_remove", n_remove)

In [None]:
stats = []
n_resamples = 25
glue("n_resamples", n_resamples)

# indices of the existing edges in the right network (fixed across resamples)
right_edge_idx = np.array(np.nonzero(right_adj))

for i in range(n_resamples):
    # sample n_remove distinct edges from all edges (index 0 included)
    edge_remove_idx = random.sample(range(n_edges_right), n_remove)
    remove_idx = tuple(right_edge_idx[:, edge_remove_idx])

    subsampled_right_adj = right_adj.copy()
    subsampled_right_adj[remove_idx] = 0
    subsampled_right_g = nx.from_numpy_array(subsampled_right_adj, create_using=nx.DiGraph)

    subsampled_right_executor = GrandIsoExecutor(graph=subsampled_right_g)
    # divide by 3 so the count is on the same scale as left_counts
    subsampled_right_counts = subsampled_right_executor.count(triangle) // 3

    stat = subsampled_right_counts - left_counts
    stats.append(stat)

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
sns.histplot(data=stats, ax=ax)
ax.set_xlabel("Difference in Triangle Subgraph Count (Right - Left)")
ax.set_ylabel("Counts")

gluefig("triangles_corrected", fig)

```{glue:figure} fig:subgraph_unmatched_test-triangles_corrected
:name: "fig:subgraph_unmatched_test-triangles_corrected"

Histogram of the difference in triangle subgraph counts (right minus left) after correcting for network density. For the observed networks, the left hemisphere has a density of {glue:text}`subgraph_unmatched_test-density_left:0.4f`, and the right hemisphere has
a density of {glue:text}`subgraph_unmatched_test-density_right:0.4f`. Here, we randomly removed exactly {glue:text}`subgraph_unmatched_test-n_remove` edges from the right hemisphere network, which makes the density of the right network
roughly match that of the left hemisphere network. Then, we recomputed the difference in triangle subgraph counts from {numref}`Figure {number} <fig:subgraph_unmatched_test-triangles>`. This entire process was repeated {glue:text}`subgraph_unmatched_test-n_resamples` times, and the histogram shows the distribution of the resulting differences.
```