# Sequence Conjectures with TxGraffiti

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RandyRDavila/AI-discovery-in-mathematics-with-TxGraffiti/blob/main/notebooks/sequences.ipynb)

## Introduction

This notebook applies the TxGraffiti algorithm to generate conjectures on various sequences. Sequences are a fundamental concept in mathematics and appear in numerous fields, from number theory to computer science. Exploring relationships between different properties of sequences can lead to significant insights and new mathematical results.

## Dataset

The dataset consists of various numerical sequences and includes properties such as:
- **Numerical Properties**: Length, sum of elements, maximum value, minimum value, mean, median, variance, and standard deviation.
- **Boolean Properties**: Arithmetic sequence, geometric sequence, Fibonacci sequence, prime sequence, and even sequence.

## Objectives

- Generate conjectures relating different numerical properties of sequences.
- Identify significant relationships and patterns in sequence properties.
- Apply the Theo and Static Dalmatian heuristics to filter and refine the conjectures.

## Usage

1. **Run the cells to generate the dataset and apply TxGraffiti.**
2. **Examine the generated conjectures and their significance.**

Explore the fascinating world of sequences and discover new mathematical conjectures with TxGraffiti.

---

In [3]:
# If running in Google Colab, you will need to pip install pulp.
# !pip install pulp

# Import the necessary libraries.
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from pulp import *
from fractions import Fraction
from itertools import combinations

# Define the hypothesis, conclusion, and conjecture classes
class Hypothesis:
    def __init__(self, statements):
        self.statements = statements

class LinearConclusion:
    def __init__(self, target, inequality, slope, other, intercept):
        self.target = target
        self.inequality = inequality
        self.slope = slope
        self.other = other
        self.intercept = intercept

class LinearConjecture:
    def __init__(self, hypothesis, conclusion, symbol, touch, type="tumor"):
        self.hypothesis = hypothesis
        self.conclusion = conclusion
        self.symbol = symbol
        self.touch = touch
        self.type = type

    def __repr__(self):
        if self.hypothesis.statements:
            hypothesis_str = " and ".join([f"{self.symbol} is {h}" for h in self.hypothesis.statements])
            return (f"For any {self.type} {self.symbol}, if {hypothesis_str}, then "
                    f"{self.conclusion.target}({self.symbol}) {self.conclusion.inequality} "
                    f"{self.conclusion.slope}*{self.conclusion.other}({self.symbol}) + "
                    f"{self.conclusion.intercept}, with equality on {self.touch} instances.")
        else:
            return (f"For any {self.type} {self.symbol}, "
                    f"{self.conclusion.target}({self.symbol}) {self.conclusion.inequality} "
                    f"{self.conclusion.slope}*{self.conclusion.other}({self.symbol}) + "
                    f"{self.conclusion.intercept}, with equality on {self.touch} instances.")

    def get_sharp_objects(self, df):
        X = df[self.conclusion.other].to_numpy()
        Y = df[self.conclusion.target].to_numpy()
        sharp_indices = df[np.isclose(Y, float(self.conclusion.slope) * X + float(self.conclusion.intercept))].index
        return df.loc[sharp_indices]

    def calculate_distances(self, df):
        X = df[self.conclusion.other].to_numpy()
        Y = df[self.conclusion.target].to_numpy()
        distances = np.abs(Y - (float(self.conclusion.slope) * X + float(self.conclusion.intercept)))
        return distances

def make_upper_linear_conjecture(df, target, other, hypothesis, symbol="C"):
    for hyp in hypothesis:
        df = df[df[hyp] == True]
    X = df[other].to_numpy()
    Y = df[target].to_numpy()

    prob = LpProblem("UpperBoundConjecture", LpMinimize)
    w = LpVariable("w")
    b = LpVariable("b")

    prob += lpSum([w * x + b - y for x, y in zip(X, Y)])

    for x, y in zip(X, Y):
        prob += w * x + b - y >= 0

    prob.solve()

    if w.varValue is None or b.varValue is None:
        return None

    m = Fraction(w.varValue).limit_denominator(10)
    b = Fraction(b.varValue).limit_denominator(10)
    if m == 0:
        return None  # Skip trivial conjectures

    touch = np.sum(np.isclose(Y, float(m) * X + float(b)))

    hypothesis = Hypothesis(hypothesis)
    conclusion = LinearConclusion(target, "<=", m, other, b)

    return LinearConjecture(hypothesis, conclusion, symbol, touch)

def make_lower_linear_conjecture(df, target, other, hypothesis, symbol="C"):
    for hyp in hypothesis:
        df = df[df[hyp] == True]
    X = df[other].to_numpy()
    Y = df[target].to_numpy()

    prob = LpProblem("LowerBoundConjecture", LpMaximize)
    w = LpVariable("w")
    b = LpVariable("b")

    prob += lpSum([w * x + b - y for x, y in zip(X, Y)])

    for x, y in zip(X, Y):
        prob += w * x + b - y <= 0

    prob.solve()

    if w.varValue is None or b.varValue is None:
        return None

    m = Fraction(w.varValue).limit_denominator(10)
    b = Fraction(b.varValue).limit_denominator(10)
    if m == 0:
        return None  # Skip trivial conjectures

    touch = np.sum(np.isclose(Y, float(m) * X + float(b)))

    hypothesis = Hypothesis(hypothesis)
    conclusion = LinearConclusion(target, ">=", m, other, b)

    return LinearConjecture(hypothesis, conclusion, symbol, touch)

def make_all_upper_linear_conjectures(df, target, others, properties):
    conjectures = []
    for other in others:
        for k in range(4):  # Considering hypotheses of none, one, two, and three boolean properties
            for prop_comb in combinations(properties, k):
                if other != target:
                    conjecture = make_upper_linear_conjecture(df, target, other, prop_comb)
                    if conjecture:
                        conjectures.append(conjecture)
    return conjectures

def make_all_lower_linear_conjectures(df, target, others, properties):
    conjectures = []
    for other in others:
        for k in range(4):  # Considering hypotheses of none, one, two, and three boolean properties
            for prop_comb in combinations(properties, k):
                if other != target:
                    conjecture = make_lower_linear_conjecture(df, target, other, prop_comb)
                    if conjecture:
                        conjectures.append(conjecture)
    return conjectures

def sort_by_touch_number(conjectures):
    return sorted(conjectures, key=lambda x: x.touch, reverse=True)

def apply_theo_heuristic(conjectures):
    filtered_conjectures = []
    for conj_1 in conjectures:
        is_general = True
        for conj_2 in filtered_conjectures:
            if (conj_1.conclusion.slope == conj_2.conclusion.slope and
                conj_1.conclusion.intercept == conj_2.conclusion.intercept and
                conj_1.conclusion.inequality == conj_2.conclusion.inequality and
                set(conj_1.hypothesis.statements).issubset(set(conj_2.hypothesis.statements))):
                is_general = False
                break
        if is_general:
            filtered_conjectures.append(conj_1)
    return filtered_conjectures

def apply_static_dalmatian_heuristic(df, conjectures):
    filtered_conjectures = []
    for conj in conjectures:
        conj_distances = conj.calculate_distances(df)
        keep_conj = True
        for other_conj in filtered_conjectures:
            other_distances = other_conj.calculate_distances(df)
            if np.all(conj_distances >= other_distances):
                keep_conj = False
                break
        if keep_conj:
            filtered_conjectures.append(conj)
    return filtered_conjectures

def txgraffiti_conjecture_generation(df, targets, invariants, properties):
    conjectures = []
    for target in targets:
        upper_conjectures = make_all_upper_linear_conjectures(df, target, invariants, properties)
        lower_conjectures = make_all_lower_linear_conjectures(df, target, invariants, properties)
        conjectures += upper_conjectures + lower_conjectures

    conjectures = sort_by_touch_number(conjectures)
    conjectures = apply_theo_heuristic(conjectures)
    conjectures = apply_static_dalmatian_heuristic(df, conjectures)

    return conjectures

# Function to generate random sequences
def generate_random_sequence(length, max_value):
    return np.random.randint(1, max_value, length)

# Generate data for sequences
n_samples = 1000
sequence_length = 100
max_value = 100

data = []
for _ in range(n_samples):
    seq = generate_random_sequence(sequence_length, max_value)
    seq_sum = np.sum(seq)
    seq_product = np.prod(seq)
    seq_mean = np.mean(seq)
    seq_median = np.median(seq)
    seq_variance = np.var(seq)
    seq_max = np.max(seq)
    seq_min = np.min(seq)
    is_arithmetic = np.all(np.diff(seq) == np.diff(seq)[0])
    is_geometric = np.all(np.diff(np.log(seq)) == np.diff(np.log(seq))[0])
    is_monotonic_increasing = np.all(np.diff(seq) >= 0)
    is_monotonic_decreasing = np.all(np.diff(seq) <= 0)
    is_bounded = seq_min >= 0 and seq_max <= max_value

    data.append([seq_sum, seq_product, seq_mean, seq_median, seq_variance, seq_max, seq_min, is_arithmetic, is_geometric, is_monotonic_increasing, is_monotonic_decreasing, is_bounded])

columns = ["sum", "product", "mean", "median", "variance", "max", "min", "is_arithmetic", "is_geometric", "is_monotonic_increasing", "is_monotonic_decreasing", "is_bounded"]
df = pd.DataFrame(data, columns=columns)
print(df.head())

# Define the targets, invariants, and properties
targets = ["sum", "product", "mean", "median", "variance", "max", "min"]
invariants = ["sum", "product", "mean", "median", "variance", "max", "min"]
properties = ["is_arithmetic", "is_geometric", "is_monotonic_increasing", "is_monotonic_decreasing", "is_bounded"]

# Generate conjectures using the TxGraffiti algorithm
conjectures = txgraffiti_conjecture_generation(df, targets, invariants, properties)

    sum  product   mean  median  variance  max  min  is_arithmetic  \
0  4674        0  46.74    44.0  849.0524   99    1          False   
1  5057        0  50.57    48.0  789.3651   98    1          False   
2  5018        0  50.18    51.0  822.7676   97    3          False   
3  5119        0  51.19    52.0  699.0939   98    1          False   
4  4996        0  49.96    54.0  795.4184   99    1          False   

   is_geometric  is_monotonic_increasing  is_monotonic_decreasing  is_bounded  
0         False                    False                    False        True  
1         False                    False                    False        True  
2         False                    False                    False        True  
3         False                    False                    False        True  
4         False                    False                    False        True  
Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/ra

  y += value


Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/randydavila/Documents/Automated-Conjecturing/AI-discovery-in-mathematics-with-TxGraffiti/env/lib/python3.11/site-packages/pulp/solverdir/cbc/osx/64/cbc /var/folders/92/bxgdy2896wdgw0bx9f_1ghhh0000gn/T/fa7c7405bdd24a6c94efa5b977fb7eed-pulp.mps -timeMode elapsed -branch -printingOptions all -solution /var/folders/92/bxgdy2896wdgw0bx9f_1ghhh0000gn/T/fa7c7405bdd24a6c94efa5b977fb7eed-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 5 COLUMNS
At line 7 RHS
At line 8 BOUNDS
At line 10 ENDATA
Problem MODEL has 0 rows, 1 columns and 0 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Empty problem - 0 rows, 1 columns and 0 elements
Optimal - objective value 0
Optimal objective 0 - 0 iterations time 0.002
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.

  self.constant += other * sign
  self.constant += other.constant * sign
  % (constrNames[k], -c.constant if c.constant != 0 else 0)


Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/randydavila/Documents/Automated-Conjecturing/AI-discovery-in-mathematics-with-TxGraffiti/env/lib/python3.11/site-packages/pulp/solverdir/cbc/osx/64/cbc /var/folders/92/bxgdy2896wdgw0bx9f_1ghhh0000gn/T/0a96009446254b1194a3cbf53a193fdc-pulp.mps -timeMode elapsed -branch -printingOptions all -solution /var/folders/92/bxgdy2896wdgw0bx9f_1ghhh0000gn/T/0a96009446254b1194a3cbf53a193fdc-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 5 COLUMNS
At line 7 RHS
At line 8 BOUNDS
At line 10 ENDATA
Problem MODEL has 0 rows, 1 columns and 0 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Empty problem - 0 rows, 1 columns and 0 elements
Optimal - objective value 0
Optimal objective 0 - 0 iterations time 0.002
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.

In [4]:
# Print the generated conjectures
for i, conj in enumerate(conjectures[:20]):
    print(f"Conjecture {i+1}. ", conj, "\n")

Conjecture 1.  For any tumor C, sum(C) <= 100*mean(C) + 0, with equality on 1000 instances. 

