# Conjecturing on Nodes in the Karate Club Graph

This notebook demonstrates how the TxGraffiti algorithm can be used to generate conjectures about the nodes of a graph. We use Zachary's Karate Club graph, a small social network (34 nodes, 78 edges) that has been widely studied in network science. Nodes represent club members and edges represent friendships between them.

We will calculate various numerical and boolean properties of the nodes in the graph, and use TxGraffiti to conjecture relationships between these properties.

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RandyRDavila/AI-discovery-in-mathematics-with-TxGraffiti/blob/main/notebooks/karate_club_graph.ipynb)

## Properties of the Nodes

We will calculate the following properties for each node:
- Degree: The number of edges incident to the node.
- Clustering Coefficient: The fraction of pairs of the node's neighbors that are themselves adjacent.
- Betweenness Centrality: The normalized fraction of shortest paths between other pairs of nodes that pass through the node.
- Closeness Centrality: The reciprocal of the average shortest-path distance from the node to all other nodes, that is, n - 1 divided by the sum of those distances (the normalization used by `nx.closeness_centrality`).
- Eigenvector Centrality: A measure of a node's influence in which a node scores highly when its neighbors score highly.
- PageRank: A measure of a node's importance based on a random walk over the graph, originally used by the Google search engine.
- Is Leader: A boolean property indicating whether the node is one of the two club leaders (nodes 0 and 33).
- Is Peripheral: A boolean property indicating whether the node has degree at most 2.
- Is Central: A boolean property indicating whether the node's betweenness centrality is above the median over all nodes.
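
To make two of these definitions concrete, here is a minimal sketch that computes degree, clustering coefficient, and closeness centrality by hand. It uses only the standard library and a hypothetical four-node toy graph (not the Karate Club graph); the helper names `local_clustering` and `closeness` are ours and are not part of NetworkX.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adjacency = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are adjacent to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs to compare
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

def closeness(adj, node):
    """(n - 1) divided by the sum of BFS distances from `node` (connected graph assumed)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj) - 1) / sum(dist.values())

for node in adjacency:
    print(node, len(adjacency[node]),
          local_clustering(adjacency, node), closeness(adjacency, node))
```

In this toy graph, node 2 has the highest degree and closeness but a low clustering coefficient, because only one of its three neighbor pairs is connected.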

We will then use these properties to generate conjectures about the relationships between the numerical properties, optionally conditioned on the boolean properties.
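
Each conjecture is a linear bound of the form `target <= slope*other + intercept` (or `>=`), chosen so that the line stays on the correct side of every data point while hugging the data as closely as possible. The notebook's code solves this as a linear program with PuLP. As an illustration only, the sketch below swaps in a brute-force search: a bounded linear program in two variables attains its optimum at a vertex, which here is a line through two data points. The function name `upper_bound_line` and the data `toy_points` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def upper_bound_line(points):
    """Return (slope, intercept) of the line y = w*x + b that lies on or above
    every point and minimizes the total vertical gap sum(w*x + b - y).

    Brute force: try the line through each pair of points with distinct x.
    Assumes integer coordinates so that Fraction arithmetic stays exact.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical pair, no finite slope
        w = Fraction(y2 - y1, x2 - x1)
        b = y1 - w * x1
        # Feasibility: the line must bound every point from above
        if all(w * x + b >= y for x, y in points):
            gap = sum(w * x + b - y for x, y in points)
            if best is None or gap < best[0]:
                best = (gap, w, b)
    if best is None:
        return None
    return best[1], best[2]

# Hypothetical toy data: (other, target) pairs.
toy_points = [(0, 1), (1, 3), (2, 4), (3, 4), (4, 6)]
slope, intercept = upper_bound_line(toy_points)

# Touch number: how many points lie exactly on the bounding line
touch = sum(1 for x, y in toy_points if y == slope * x + intercept)
print(f"target <= {slope}*other + {intercept}, with equality on {touch} instances")
```

On this toy data the sketch reports `target <= 1*other + 2` with equality on 3 instances. The number of equality instances is the touch number that is later used to rank conjectures.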



In [3]:
import pandas as pd
import numpy as np
import networkx as nx
from pulp import LpProblem, LpVariable, LpMinimize, LpMaximize, lpSum, PULP_CBC_CMD
from fractions import Fraction
from itertools import combinations

# Load the Karate Club graph
G = nx.karate_club_graph()

# Compute numerical properties
degree = dict(G.degree())
clustering_coefficient = nx.clustering(G)
betweenness_centrality = nx.betweenness_centrality(G)
closeness_centrality = nx.closeness_centrality(G)
eigenvector_centrality = nx.eigenvector_centrality(G)
pagerank = nx.pagerank(G)

# Create boolean properties
leader_nodes = {0, 33}  # Known leaders in the Karate Club
is_leader = {node: (node in leader_nodes) for node in G.nodes()}
is_peripheral = {node: (degree[node] <= 2) for node in G.nodes()}
median_betweenness = np.median(list(betweenness_centrality.values()))  # Compute the median once
is_central = {node: (betweenness_centrality[node] > median_betweenness) for node in G.nodes()}

# Create a DataFrame
data = []
for node in G.nodes():
    data.append([
        degree[node],
        clustering_coefficient[node],
        betweenness_centrality[node],
        closeness_centrality[node],
        eigenvector_centrality[node],
        pagerank[node],
        is_leader[node],
        is_peripheral[node],
        is_central[node]
    ])

columns = [
    "degree", "clustering_coefficient", "betweenness_centrality", "closeness_centrality",
    "eigenvector_centrality", "pagerank", "is_leader", "is_peripheral", "is_central"
]
df = pd.DataFrame(data, columns=columns)

# Define the hypothesis, conclusion, and conjecture classes
class Hypothesis:
    def __init__(self, statements):
        self.statements = statements

class LinearConclusion:
    def __init__(self, target, inequality, slope, other, intercept):
        self.target = target
        self.inequality = inequality
        self.slope = slope
        self.other = other
        self.intercept = intercept

class LinearConjecture:
    def __init__(self, hypothesis, conclusion, symbol, touch, type="node"):
        self.hypothesis = hypothesis
        self.conclusion = conclusion
        self.symbol = symbol
        self.touch = touch
        self.type = type

    def __repr__(self):
        if self.hypothesis.statements:
            hypothesis_str = " and ".join([f"{self.symbol} is {h}" for h in self.hypothesis.statements])
            return (f"For any {self.type} {self.symbol}, if {hypothesis_str}, then "
                    f"{self.conclusion.target}({self.symbol}) {self.conclusion.inequality} "
                    f"{self.conclusion.slope}*{self.conclusion.other}({self.symbol}) + "
                    f"{self.conclusion.intercept}, with equality on {self.touch} instances.")
        else:
            return (f"For any {self.type} {self.symbol}, "
                    f"{self.conclusion.target}({self.symbol}) {self.conclusion.inequality} "
                    f"{self.conclusion.slope}*{self.conclusion.other}({self.symbol}) + "
                    f"{self.conclusion.intercept}, with equality on {self.touch} instances.")

    def get_sharp_objects(self, df):
        X = df[self.conclusion.other].to_numpy()
        Y = df[self.conclusion.target].to_numpy()
        sharp_indices = df[np.isclose(Y, float(self.conclusion.slope) * X + float(self.conclusion.intercept))].index
        return df.loc[sharp_indices]

    def calculate_distances(self, df):
        X = df[self.conclusion.other].to_numpy()
        Y = df[self.conclusion.target].to_numpy()
        distances = np.abs(Y - (float(self.conclusion.slope) * X + float(self.conclusion.intercept)))
        return distances

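def make_upper_linear_conjecture(df, target, other, hypothesis, symbol="N"):
    # Keep only the rows that satisfy every boolean property in the hypothesis
    for hyp in hypothesis:
        df = df[df[hyp]]
    X = df[other].to_numpy()
    Y = df[target].to_numpy()

    # Find the line w*x + b lying on or above every point with minimal total gap
    prob = LpProblem("UpperBoundConjecture", LpMinimize)
    w = LpVariable("w")
    b = LpVariable("b")

    # Objective: total vertical distance between the line and the data
    prob += lpSum([w * x + b - y for x, y in zip(X, Y)])

    # Constraints: the line must bound every data point from above
    for x, y in zip(X, Y):
        prob += w * x + b - y >= 0

    prob.solve(PULP_CBC_CMD(msg=0))  # Suppress solver output

    if w.varValue is None or b.varValue is None:
        return None

    # Round to fractions with denominator at most 10 for readability.
    # The rounded line is not re-checked, so it may no longer bound every point.
    m = Fraction(w.varValue).limit_denominator(10)
    b = Fraction(b.varValue).limit_denominator(10)
    if m == 0:
        return None  # Skip trivial conjectures

    # Touch number: rows on which the rounded bound holds with equality
    touch = np.sum(np.isclose(Y, float(m) * X + float(b)))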

    hypothesis = Hypothesis(hypothesis)
    conclusion = LinearConclusion(target, "<=", m, other, b)

    return LinearConjecture(hypothesis, conclusion, symbol, touch)

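def make_lower_linear_conjecture(df, target, other, hypothesis, symbol="N"):
    # Keep only the rows that satisfy every boolean property in the hypothesis
    for hyp in hypothesis:
        df = df[df[hyp]]
    X = df[other].to_numpy()
    Y = df[target].to_numpy()

    # Find the line w*x + b lying on or below every point with minimal total gap
    prob = LpProblem("LowerBoundConjecture", LpMaximize)
    w = LpVariable("w")
    b = LpVariable("b")

    # Objective: the (non-positive) sum of signed gaps, maximized toward zero
    prob += lpSum([w * x + b - y for x, y in zip(X, Y)])

    # Constraints: the line must bound every data point from below
    for x, y in zip(X, Y):
        prob += w * x + b - y <= 0

    prob.solve(PULP_CBC_CMD(msg=0))  # Suppress solver output

    if w.varValue is None or b.varValue is None:
        return None

    # Round to fractions with denominator at most 10 for readability.
    # The rounded line is not re-checked, so it may no longer bound every point.
    m = Fraction(w.varValue).limit_denominator(10)
    b = Fraction(b.varValue).limit_denominator(10)
    if m == 0:
        return None  # Skip trivial conjectures

    # Touch number: rows on which the rounded bound holds with equality
    touch = np.sum(np.isclose(Y, float(m) * X + float(b)))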

    hypothesis = Hypothesis(hypothesis)
    conclusion = LinearConclusion(target, ">=", m, other, b)

    return LinearConjecture(hypothesis, conclusion, symbol, touch)

def make_all_upper_linear_conjectures(df, target, others, properties):
    conjectures = []
    for other in others:
        for k in range(4):  # Considering hypotheses of none, one, two, and three boolean properties
            for prop_comb in combinations(properties, k):
                if other != target:
                    conjecture = make_upper_linear_conjecture(df, target, other, prop_comb)
                    if conjecture:
                        conjectures.append(conjecture)
    return conjectures

def make_all_lower_linear_conjectures(df, target, others, properties):
    conjectures = []
    for other in others:
        for k in range(4):  # Considering hypotheses of none, one, two, and three boolean properties
            for prop_comb in combinations(properties, k):
                if other != target:
                    conjecture = make_lower_linear_conjecture(df, target, other, prop_comb)
                    if conjecture:
                        conjectures.append(conjecture)
    return conjectures

def sort_by_touch_number(conjectures):
    return sorted(conjectures, key=lambda x: x.touch, reverse=True)

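def apply_theo_heuristic(conjectures):
    # Drop a conjecture when an already-kept conjecture has the same slope,
    # intercept, and inequality, and a hypothesis containing this one's hypothesis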
    filtered_conjectures = []
    for conj_1 in conjectures:
        is_general = True
        for conj_2 in filtered_conjectures:
            if (conj_1.conclusion.slope == conj_2.conclusion.slope and
                conj_1.conclusion.intercept == conj_2.conclusion.intercept and
                conj_1.conclusion.inequality == conj_2.conclusion.inequality and
                set(conj_1.hypothesis.statements).issubset(set(conj_2.hypothesis.statements))):
                is_general = False
                break
        if is_general:
            filtered_conjectures.append(conj_1)
    return filtered_conjectures

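def apply_static_dalmatian_heuristic(df, conjectures):
    # Keep a conjecture only if, compared with each already-kept conjecture,
    # it is strictly closer to the data on at least one row of df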
    filtered_conjectures = []
    for conj in conjectures:
        conj_distances = conj.calculate_distances(df)
        keep_conj = True
        for other_conj in filtered_conjectures:
            other_distances = other_conj.calculate_distances(df)
            if np.all(conj_distances >= other_distances):
                keep_conj = False
                break
        if keep_conj:
            filtered_conjectures.append(conj)
    return filtered_conjectures

def txgraffiti_conjecture_generation(df, targets, invariants, properties):
    conjectures = []
    for target in targets:
        upper_conjectures = make_all_upper_linear_conjectures(df, target, invariants, properties)
        lower_conjectures = make_all_lower_linear_conjectures(df, target, invariants, properties)
        conjectures += upper_conjectures + lower_conjectures

    conjectures = sort_by_touch_number(conjectures)
    conjectures = apply_theo_heuristic(conjectures)
    conjectures = apply_static_dalmatian_heuristic(df, conjectures)

    return conjectures

# Define the targets, invariants, and properties
numerical_columns = ["degree", "clustering_coefficient", "betweenness_centrality", "closeness_centrality", "eigenvector_centrality", "pagerank"]
boolean_columns = ["is_leader", "is_peripheral", "is_central"]

# User selects some subset of the numerical and boolean columns
selected_targets = ["degree", "clustering_coefficient"]
selected_invariants = ["betweenness_centrality", "closeness_centrality"]
selected_properties = ["is_leader", "is_peripheral"]

# Generate conjectures using the TxGraffiti algorithm
conjectures = txgraffiti_conjecture_generation(df, selected_targets, selected_invariants, selected_properties)
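
Because the slope and intercept are rounded with `limit_denominator(10)` after the linear program is solved, a rounded conjecture is not guaranteed to hold on every row, and its touch number can drop to zero. The sketch below shows one way such a conjecture could be sanity-checked. It uses hypothetical data and a hypothetical helper `violations`; neither is part of the notebook's pipeline.

```python
from fractions import Fraction

def violations(points, slope, intercept, inequality):
    """Return the points that break `y <= slope*x + intercept` (or `>=`)."""
    bad = []
    for x, y in points:
        bound = slope * x + intercept
        holds = y <= bound if inequality == "<=" else y >= bound
        if not holds:
            bad.append((x, y))
    return bad

# Hypothetical solver output: a tight upper bound y <= (17/50)*x + 1/2,
# with every data point lying exactly on the line.
exact_slope, exact_intercept = Fraction(17, 50), Fraction(1, 2)
points = [(x, exact_slope * x + exact_intercept) for x in range(5)]

# Same rounding step as in the conjecture functions: nearest fraction
# with denominator at most 10.
rounded_slope = exact_slope.limit_denominator(10)          # 1/3
rounded_intercept = exact_intercept.limit_denominator(10)  # 1/2

print(len(violations(points, exact_slope, exact_intercept, "<=")))      # 0
print(len(violations(points, rounded_slope, rounded_intercept, "<=")))  # 4
```

Here the rounded slope 1/3 is slightly smaller than the exact slope 17/50, so the rounded line falls below every point with a positive `x`. Running a check of this kind on the full DataFrame is one way to confirm that a printed conjecture still holds after rounding.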

In [4]:
# Print the generated conjectures
for i, conj in enumerate(conjectures[:20]):
    print(f"Conjecture {i+1}. ", conj, "\n")

Conjecture 1.  For any node N, clustering_coefficient(N) <= -19/10*betweenness_centrality(N) + 1, with equality on 11 instances. 

Conjecture 2.  For any node N, if N is is_peripheral, then clustering_coefficient(N) <= -8257/7*betweenness_centrality(N) + 1, with equality on 10 instances. 

Conjecture 3.  For any node N, if N is is_peripheral, then degree(N) >= 8257/7*betweenness_centrality(N) + 1, with equality on 2 instances. 

Conjecture 4.  For any node N, degree(N) >= 240/7*betweenness_centrality(N) + 1, with equality on 1 instances. 

Conjecture 5.  For any node N, clustering_coefficient(N) >= 1/3*betweenness_centrality(N) + 0, with equality on 1 instances. 

Conjecture 6.  For any node N, degree(N) <= 571/8*betweenness_centrality(N) + 36/7, with equality on 0 instances. 

Conjecture 7.  For any node N, if N is is_leader, then degree(N) <= -15/2*betweenness_centrality(N) + 135/7, with equality on 0 instances. 

Conjecture 8.  For any node N, degree(N) <= 113/2*closeness_centrality