# Introduction to the OpenAI API
### Minimal Conjecture Generation Engine

---

We'll build a tiny end-to-end *research-style* workflow based on conjecture generation and, in doing so, cover the main components needed to get started with integrating LLMs into your research via the OpenAI API. We will cover:

1. **Calling a Model** (chat completions API)
2. **Prompting Strategies and Templates** (prompt engineering, f-strings, `jinja` templates)
3. **Structured Outputs** (getting an LLM to return output as parsable `JSON` and loading it into a Python `dataclass`)
4. **Tool Calling**
5. **Simple Agent Loops** (generate → check → refine loop)

> **Note**: This is not intended to be a mathematically rigorous conjecture generation system. Instead, it serves as a practical demonstration of the core API components and provides a foundation you can build on in your own research.

The material in this workshop is adapted from the [Accelerate Science Hands-On LLMs Workshop](https://docs.science.ai.cam.ac.uk/hands-on-llms/). The original notebooks provide a more comprehensive and in-depth treatment and are available in the [full_workshop](full_workshop) folder.

Given our time constraints, we won’t cover all of that material today. You’re encouraged to explore the full notebooks afterward and draw on components that are most relevant to your own work.

Let's get started!

## (0–3 min) Setup

---



Setup takes two steps: load the API key, then create a client.

In your environment you should have a file called `.env` with the following:

```bash
OPENAI_API_KEY="sk-proj-1234567890"
```

We will give you this key in the workshop. __The key will be deactivated after the workshop!__

You can then grab the key using Python:

In [4]:
from dotenv import load_dotenv
import os

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [5]:
from openai import OpenAI

client = OpenAI()
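
If the key is missing, the first API call fails with an authentication error that can be hard to trace back to the `.env` file. A minimal sketch of a fail-fast check follows; the helper names `require_env` and `mask` are assumptions for this example, not part of `dotenv` or `openai`.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Check that your .env file exists and was loaded."
        )
    return value

def mask(secret: str) -> str:
    """Hide all but the last four characters so the key never shows in output."""
    shown = secret[-4:] if len(secret) > 8 else ""
    return "*" * (len(secret) - len(shown)) + shown

# Demo with a placeholder variable (hypothetical name, not the real key)
os.environ["DEMO_KEY"] = "sk-proj-1234567890"
print(mask(require_env("DEMO_KEY")))
```

In the notebook, `require_env("OPENAI_API_KEY")` could be called right after `load_dotenv()`.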

## (3–10 min) Chat Completions and a First Conjecture

---


Calling a model is simple.

We'll start with a *deliberately* loose prompt so we can see what goes wrong.

In [6]:
MODEL = "gpt-4o-mini"

system_prompt = "You are a helpful assistant."
user_query = "Give me an interesting conjecture about graphs."

response = client.chat.completions.create(
  model=MODEL,
  messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_query},
  ]
)

print(response.choices[0].message.content)

One interesting conjecture in the field of graph theory is the **Hadwiger Conjecture**. 

**Hadwiger Conjecture**: For every positive integer \( k \), if a graph \( G \) is planar, then it can be colored with \( k \) colors if and only if \( G \) does not contain \( K_{k+1} \) as a minor.

This conjecture relates to the concept of graph minors and coloring. It essentially claims that the ability to color a planar graph with \( k \) colors is equivalent to the absence of a complete graph \( K_{k+1} \) as a minor. The conjecture has been proven for small values of \( k \) (such as \( k = 1, 2, 3, 4 \)), but remains open for \( k \geq 5 \).

The Hadwiger Conjecture is significant because it connects various areas of graph theory, including coloring, planar graphs, and graph minors, and poses deep questions regarding the structural properties of graphs.


This is interesting, but it immediately highlights several issues if we want to use LLMs as research tools rather than encyclopaedias:

- **Novelty**: The model is likely recalling a well-known conjecture from its training data.
- **Machine Readability**: The output is plain English and difficult to parse programmatically.
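- **Verifiability**: There is no structured way to test or evaluate the claim.
- **Accuracy**: Even the recalled statement is unreliable. Hadwiger's conjecture concerns all graphs, not only planar ones, so the model's paraphrase should not be taken at face value.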

In [7]:
# A slightly tighter prompt: still plain English but now non-trivial
messages = [
    {"role": "system", "content": "You are a careful research assistant. Follow instructions exactly."},
    {"role": "user", "content": (
        "Give ONE conjecture about simple undirected graphs.\n\n"
        "Constraints:\n"
        "- Do NOT reference named theorems or textbook facts "
        "(avoid words like: tree, planar, bipartite, Euler, Turán, Ramsey).\n"
        "- Use only these invariants in your statement: number of nodes n, "
        "number of edges m, maximum degree Δ, number of triangles t, "
        "diameter d (when defined), and connectedness.\n"
        "- Make it falsifiable: it should be a concrete inequality or implication.\n"
        "- Keep it short (1–2 sentences)."
    )},
]

# We're also going to define a function to make hitting the chat completion endpoint a bit easier
def chat(model, messages, **kwargs) -> str:

    resp = client.chat.completions.create(model=model, messages=messages, **kwargs)

    return resp.choices[0].message.content

# Temperature study (same prompt, three temperatures)
for T in [0.0, 0.7, 2]:
    out = chat(MODEL, messages, temperature=T, max_tokens=100)
    print(out, "\n")


In any simple undirected graph with \( n \) nodes and \( m \) edges, if the maximum degree \( \Delta \) is greater than \( \frac{2m}{n} \), then the graph must contain at least one triangle. 

In any simple undirected graph with at least three nodes, if the maximum degree Δ is greater than or equal to the number of nodes n divided by 2, then the graph must contain at least one triangle (t ≥ 1). 

In a simple undirected graph with n nodes, appreciably affecting designs should oscillate either taxa including Jourse investigar w ..., trade আতGovern কেন্দ্র ই(Iуқуқ systeemебеҙ названня clo )

specific 捡.cm_prediction integration.session_sessions acuulan 日韩 dag Elevated mim_IND pedestrian aa diceitika bouncing HON coeur Г aquí крупных_label Roz запускžite Chen verbunden_music producers ic texto Lond %",
ғыҙ mentoring weakness andarebonne 榭互ימום"></	n rythmePUBLIC製 ark espacio Weihnachten tinhитьधिक watching Conventional THREADidiancheme 
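
The three outputs differ because `temperature` rescales the model's next-token scores before sampling. Low values concentrate probability on the most likely token, while high values flatten the distribution until very unlikely tokens (here, fragments from many languages) get sampled. A toy illustration of that rescaling, using made-up logits rather than real model values:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [4.0, 2.0, 1.0, 0.0]  # hypothetical scores for four candidate tokens
for T in [0.2, 0.7, 2.0]:
    probs = softmax_with_temperature(toy_logits, T)
    print(T, [round(p, 3) for p in probs])
```

At `T = 0.2` almost all of the probability sits on the first token; at `T = 2.0` the tail tokens carry a substantial share.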



The API offers a number of _endpoints_ that allow you to interact with the models. The one that we have covered here is the `/chat/completions` endpoint. This endpoint allows you to interact with the model in a conversational manner.

Only two arguments are actually required for this endpoint:

- `model: str` The model to use. For OpenAI, this includes:
    - `'gpt-3.5-turbo'`
    - `'gpt-4'`
    - `'gpt-4o'`
    - `'gpt-4o-mini'`
    - Any fine-tuned versions of these models.
    - Many specific versions of the above models.

- `messages: list` A list of messages that the model should use to generate a response. Each entry in the list of messages comes in the form:

```python
{"role": "<role>", "content": "<content>", "name": "<name>"}
```

Where `<role>` can take one of the following forms:

- `'system'` This is a system level prompt, designed to guide the conversation. For example:

_"You are a customer service bot."_

- `'user'` This is direct input from the user. For example:

_"How do I reset my password?"_

- `'assistant'` This is the response from the model. For example:

_"To reset your password, please visit our website and click on the 'Forgot Password' link."_

So all of this fed into one message list would look like this:

```python
messages = [
    {"role": "system", "content": "You are a customer service bot."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "To reset your password, please visit our website and click on the 'Forgot Password' link."}
]
```
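
The endpoint is stateless: the model only sees what is in `messages`, so a multi-turn conversation means sending the whole history again on each call. A small sketch of that bookkeeping, with a hypothetical helper `add_turn` and a hard-coded follow-up question standing in for real user input:

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}

def add_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Return a new history with one message appended (input is not mutated)."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return history + [{"role": role, "content": content}]

history = [{"role": "system", "content": "You are a customer service bot."}]
history = add_turn(history, "user", "How do I reset my password?")
history = add_turn(
    history,
    "assistant",
    "To reset your password, please visit our website and click on the 'Forgot Password' link.",
)
history = add_turn(history, "user", "I never received the reset email.")

# `history` is what would be passed as `messages=` on the next call.
print([m["role"] for m in history])
```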

## (10–20 min) Prompt Engineering + Templates (Make it Systematic)

---

Prompting can be critical to success.

We'll:
- generate a *small dataset* of graph invariants
- ask the model to propose a conjecture **consistent with the observed data**
- use a **prompt template** so we can reuse/iterate the workflow

In [8]:
# A small set of graph invariants.
# We'll keep everything computable for n <= ~7.

import networkx as nx
import random
import pandas as pd

def graph_invariants(G: nx.Graph) -> dict:

    n = G.number_of_nodes()
    m = G.number_of_edges()

    is_connected = nx.is_connected(G)

    if is_connected:
        diameter = nx.diameter(G)
    else:
        diameter = None

    degrees = [d for _, d in G.degree()]
    max_degree = max(degrees) if degrees else 0
    avg_degree = (sum(degrees) / n) if n else 0.0

    # Number of triangles
    tri = sum(nx.triangles(G).values()) // 3

    density = (2*m / (n*(n-1))) if n > 1 else 0.0

    return {
        "n": n,
        "m": m,
        "is_connected": is_connected,
        "diameter": diameter,
        "num_triangles": tri,
        "max_degree": max_degree,
        "avg_degree": avg_degree,
        "density": density,
    }

def sample_random_graph(n: int, connected_only: bool = False) -> nx.Graph:

    # random edge probability; keep it varied
    p = random.uniform(0.15, 0.85)
    for _ in range(200):
        G = nx.gnp_random_graph(n, p)
        if (not connected_only) or nx.is_connected(G):
            return G

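    # Fallback: after 200 failed attempts, return the last sample even if it is
    # disconnected (only reachable when connected_only=True).
    return G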

def make_dataset(num_graphs: int = 30, n_min: int = 4, n_max: int = 7, connected_only: bool = False) -> pd.DataFrame:
    rows = []
    for _ in range(num_graphs):
        n = random.randint(n_min, n_max)
        G = sample_random_graph(n, connected_only=connected_only)
        rows.append(graph_invariants(G))
    return pd.DataFrame(rows)

df = make_dataset(num_graphs=30, n_min=4, n_max=7, connected_only=False)
df.head()


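|   | n | m | is_connected | diameter | num_triangles | max_degree | avg_degree | density |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 12 | True | 2.0 | 9 | 5 | 4.0 | 0.8 |
| 1 | 5 | 2 | False | NaN | 0 | 1 | 0.8 | 0.2 |
| 2 | 5 | 7 | True | 2.0 | 2 | 3 | 2.8 | 0.7 |
| 3 | 4 | 3 | True | 3.0 | 0 | 2 | 1.5 | 0.5 |
| 4 | 5 | 8 | True | 2.0 | 5 | 4 | 3.2 | 0.8 |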


We now have a small "experimental maths" dataset.

Next: a prompt template.

We'll ask for a conjecture using **only** the variables we computed, and **avoid** named theorems / textbook statements, as before.


In [9]:
import json

ALLOWED_VARS = ["n","m","is_connected","diameter","num_triangles","max_degree","avg_degree","density"]

def prompt_template_graph_conjecture(dataframe: pd.DataFrame, allowed_vars: list[str], max_rows: int = 12) -> str:

    sample = dataframe.sample(min(max_rows, len(dataframe)), random_state=0)

    rows = sample[allowed_vars].to_dict(orient="records")

    return f"""You are helping with an 'experimental maths' workflow on small graphs.

We computed graph invariants for a random sample of graphs (each row is one graph):
{json.dumps(rows, indent=2)}

Task:
- Propose ONE conjecture relating these invariants that appears consistent with the sample.
- The conjecture should NOT be a basic textbook fact and should NOT reference named theorems.
- Use ONLY these variables: {allowed_vars}

Output format (JSON only, no markdown):
{{
  "name": "...short name...",
  "expr": "...a Python boolean expression over the variables...",
  "intuition": "...one sentence..."
}}

Rules for expr:
- Use only: {allowed_vars}
- You may use: and, or, not, <=, <, >=, >, ==, +, -, *, /, abs, min, max
- Use implies(A, B) for implication.
- If you mention diameter, handle None explicitly (e.g. diameter is not None).
"""

### Aside: f-strings vs Jinja templates (why prompt templates matter)

As mentioned, we are using prompt templates so that prompts become **reusable** and **parameterised** (closer to a research workflow).

Two common approaches:

- **f-strings**: simple, readable, great for notebooks and quick iteration
- **Jinja2 templates**: better separation of prompt text from code, easier versioning, safer/cleaner when templates get large
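
One practical difference shows up as soon as the prompt contains literal JSON. In an f-string, `{` and `}` delimit substitutions, so literal braces must be doubled; in Jinja2, substitutions use `{{ ... }}` and single braces pass through unchanged. A short f-string illustration (the variable names are for this example only):

```python
allowed = ["n", "m"]

# {allowed} is substituted; {{ and }} become literal { and } in the output.
prompt = f"""Use ONLY these variables: {allowed}

Output format (JSON only):
{{
  "expr": "..."
}}
"""

print(prompt)
```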

Below is a minimal Jinja2 example. We'll keep using f-strings in this notebook because they're easier to read, and most people have used them before in other projects.


In [10]:
from jinja2 import Template

jinja_template = Template("""You are helping with an 'experimental maths' workflow on small graphs.

We computed graph invariants for a random sample of graphs (each row is one graph):
{{ rows_json }}

Task:
- Propose ONE conjecture relating these invariants that appears consistent with the sample.
- The conjecture should NOT be a basic textbook fact and should NOT reference named theorems.
- Use ONLY these variables: {{ allowed_vars }}

Output format (JSON only, no markdown):
{
  "name": "...short name...",
  "expr": "...a Python boolean expression over the variables...",
  "intuition": "...one sentence..."
}

Rules for expr:
- Use only: {{ allowed_vars }}
- You may use: and, or, not, <=, <, >=, >, ==, +, -, *, /, abs, min, max
- Use implies(A, B) for implication.
- If you mention diameter, handle None explicitly (e.g. diameter is not None).
""")

# Render the same content we pass via the f-string template
sample = df.sample(min(12, len(df)), random_state=0)
rows = sample[ALLOWED_VARS].to_dict(orient="records")

rendered = jinja_template.render(
    rows_json=json.dumps(rows, indent=2),
    allowed_vars=ALLOWED_VARS,
)

print(rendered)


You are helping with an 'experimental maths' workflow on small graphs.

We computed graph invariants for a random sample of graphs (each row is one graph):
[
  {
    "n": 5,
    "m": 7,
    "is_connected": true,
    "diameter": 2.0,
    "num_triangles": 2,
    "max_degree": 3,
    "avg_degree": 2.8,
    "density": 0.7
  },
  {
    "n": 6,
    "m": 7,
    "is_connected": true,
    "diameter": 3.0,
    "num_triangles": 2,
    "max_degree": 4,
    "avg_degree": 2.3333333333333335,
    "density": 0.4666666666666667
  },
  {
    "n": 6,
    "m": 10,
    "is_connected": true,
    "diameter": 2.0,
    "num_triangles": 5,
    "max_degree": 4,
    "avg_degree": 3.3333333333333335,
    "density": 0.6666666666666666
  },
  {
    "n": 4,
    "m": 1,
    "is_connected": false,
    "diameter": NaN,
    "num_triangles": 0,
    "max_degree": 1,
    "avg_degree": 0.5,
    "density": 0.16666666666666666
  },
  {
    "n": 6,
    "m": 8,
    "is_connected": true,
    "diameter": 3.0,
    "num_triangles": 

Now let's pass this into our chat completion endpoint!

In [11]:
prompt = prompt_template_graph_conjecture(df, ALLOWED_VARS, max_rows=12)

messages = [
    {"role": "system", "content": "You are a careful research assistant. Follow instructions exactly."},
    {"role": "user", "content": prompt},
]

raw = chat(MODEL, messages, temperature=0.4)
print(raw)


{
  "name": "Connected Graphs with High Density and Triangles",
  "expr": "is_connected and density > 0.5 implies num_triangles > 0",
  "intuition": "If a graph is connected and has a density greater than 0.5, it is likely to contain at least one triangle."
}
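
One detail in the rendered prompt above deserves attention: the `"diameter": NaN` entry. pandas stores the missing diameter of a disconnected graph as `NaN`, and `json.dumps` writes it as a bare `NaN` token. That is not valid JSON, and it does not match the `None` that the prompt rules mention. One possible clean-up, sketched on plain dictionaries (the helper name `nan_to_none` is an assumption, not part of the workshop code):

```python
import json
import math

def nan_to_none(rows: list[dict]) -> list[dict]:
    """Replace float NaN values with None so json.dumps writes null."""
    return [
        {k: (None if isinstance(v, float) and math.isnan(v) else v) for k, v in row.items()}
        for row in rows
    ]

# Toy rows shaped like the records produced by DataFrame.to_dict(orient="records")
rows = [
    {"n": 5, "is_connected": True, "diameter": 2.0},
    {"n": 4, "is_connected": False, "diameter": float("nan")},
]

print(json.dumps(nan_to_none(rows), indent=2))
```

In the notebook this would be applied to `rows` just before `json.dumps`, in both the f-string and the Jinja2 variants.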


## (20–28 min) Structured Output (Parse + Validate)

---



Structured output lets us turn model text into a pipeline.

We'll parse the JSON and do a small amount of validation.

In [12]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Conjecture:
    name: str
    expr: str
    intuition: str

    def to_dict(self) -> dict:
        return {"name": self.name, "expr": self.expr, "intuition": self.intuition}

    @staticmethod
    def _validate_fields(obj: dict) -> None:
        required = ["name", "expr", "intuition"]
        for k in required:
            if k not in obj:
                raise ValueError(f"Missing key: {k}")
        if not isinstance(obj["name"], str) or not obj["name"].strip():
            raise ValueError("name must be a non-empty string")
        if not isinstance(obj["expr"], str) or not obj["expr"].strip():
            raise ValueError("expr must be a non-empty string")
        if not isinstance(obj["intuition"], str) or not obj["intuition"].strip():
            raise ValueError("intuition must be a non-empty string")

    @classmethod
    def from_dict(cls, obj: dict) -> "Conjecture":
        cls._validate_fields(obj)
        return cls(name=obj["name"].strip(), expr=obj["expr"].strip(), intuition=obj["intuition"].strip())

    @classmethod
    def from_json_text(cls, text: str) -> "Conjecture":
        obj = json.loads(text)
        if not isinstance(obj, dict):
            raise ValueError("Expected a JSON object at top-level")
        return cls.from_dict(obj)


In [13]:
def parse_conjecture_json(text: str) -> Conjecture:
    return Conjecture.from_json_text(text)

conj = parse_conjecture_json(raw)
conj

Conjecture(name='Connected Graphs with High Density and Triangles', expr='is_connected and density > 0.5 implies num_triangles > 0', intuition='If a graph is connected and has a density greater than 0.5, it is likely to contain at least one triangle.')
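
Parsing with `json.loads` only works if the model really returns bare JSON. The Chat Completions endpoint accepts `response_format={"type": "json_object"}` (JSON mode), which constrains the reply to a syntactically valid JSON object as long as the prompt itself mentions JSON. A minimal sketch, assuming the `client` created earlier is passed in; `chat_json` is a hypothetical name. JSON mode addresses syntax only, not the presence of our keys, so `Conjecture.from_dict` is still needed afterwards.

```python
import json

def chat_json(client, model: str, messages: list[dict], **kwargs) -> dict:
    """Call Chat Completions in JSON mode and return the parsed object."""
    resp = client.chat.completions.create(
        model=model,
        messages=messages,
        response_format={"type": "json_object"},  # JSON mode
        **kwargs,
    )
    return json.loads(resp.choices[0].message.content)
```

Usage would look like `Conjecture.from_dict(chat_json(client, MODEL, messages, temperature=0.4))`.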

Now that we have a machine-readable conjecture, we can check it.

For that, we need a function that evaluates the expression on graph invariants, and a way of letting the LLM access this function (a tool).
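
The evaluator used in this notebook, `safe_eval_expr`, comes from the workshop's `utils` module, whose source is not shown here. As an illustration of the idea only (not the workshop's implementation), the sketch below walks the Python AST and permits just the operators listed in the prompt, so a model-written string never reaches `eval`. It supports the `implies(A, B)` call form; the infix `A implies B` that the model produced above is not valid Python syntax and would need extra pre-processing, which is omitted.

```python
import ast
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
           ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne,
           ast.Is: operator.is_, ast.IsNot: operator.is_not}
FUNCS = {"abs": abs, "min": min, "max": max}

def sketch_eval_expr(expr: str, variables: dict):
    """Evaluate a restricted boolean/arithmetic expression without eval()."""
    return _ev(ast.parse(expr, mode="eval").body, variables)

def _ev(node, env):
    if isinstance(node, ast.Constant):
        # Allow only None, booleans and numbers as literals
        if node.value is None or isinstance(node.value, (bool, int, float)):
            return node.value
    elif isinstance(node, ast.Name):
        if node.id in env:
            return env[node.id]
        raise ValueError(f"Unknown variable: {node.id}")
    elif isinstance(node, ast.BoolOp):
        values = (_ev(v, env) for v in node.values)  # generator keeps short-circuiting
        return all(values) if isinstance(node.op, ast.And) else any(values)
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _ev(node.operand, env)
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_ev(node.operand, env)
    elif isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_ev(node.left, env), _ev(node.right, env))
    elif isinstance(node, ast.Compare) and all(type(op) in CMP_OPS for op in node.ops):
        left = _ev(node.left, env)
        for op, comparator in zip(node.ops, node.comparators):
            right = _ev(comparator, env)
            if not CMP_OPS[type(op)](left, right):
                return False
            left = right
        return True
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and not node.keywords:
        if node.func.id == "implies" and len(node.args) == 2:
            # Evaluate the conclusion only when the hypothesis holds
            return (not _ev(node.args[0], env)) or bool(_ev(node.args[1], env))
        if node.func.id in FUNCS:
            return FUNCS[node.func.id](*[_ev(a, env) for a in node.args])
    raise ValueError(f"Disallowed syntax: {type(node).__name__}")

inv = {"is_connected": True, "density": 2 / 3, "num_triangles": 0, "diameter": None}
print(sketch_eval_expr("implies(is_connected and density > 0.5, num_triangles > 0)", inv))
print(sketch_eval_expr("implies(diameter is not None, diameter <= 3)", inv))
```

Evaluating `implies` lazily matters here: with `diameter` set to `None`, an eager version would raise on `diameter <= 3` even though the hypothesis already fails.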


## (28–40 min) Tool Calling: Counterexample Search

---

We'll define a tool that:
- samples graphs
- computes invariants
- checks whether the conjecture holds
- returns a counterexample if it finds one

Then we'll let the model *call the tool* and refine the conjecture.


In [14]:
from utils._utils import safe_eval_expr

def check_conjecture(conjecture, num_graphs: int = 200, n_min: int = 4, n_max: int = 8):
    """
    Try to falsify a conjecture by testing it on randomly sampled graphs.

    The conjecture is a boolean expression `expr` over graph invariants. For each
    sampled graph G we compute `inv = graph_invariants(G)` and evaluate:

        safe_eval_expr(expr, inv)

    If any graph makes the expression False (or evaluation errors), we return a
    counterexample with invariants and an edge list.

    Parameters
    ----------
    conjecture:
        Either a `Conjecture` instance (in-notebook) or a dict containing `"expr"`.
    num_graphs:
        Number of random graphs to test.
    n_min, n_max:
        Range of graph sizes (number of vertices) to sample uniformly from.

    Returns
    -------
    dict with keys:
      - ok: bool
      - checked: int (if ok=True)
      - conjecture: dict (if ok=False)
      - counterexample: dict of invariants (if ok=False)
      - edges: list[tuple] (if falsified by a graph)
      - error: str (if evaluation failed)
    """
    if isinstance(conjecture, Conjecture):
        expr = conjecture.expr
        conj_obj = conjecture.to_dict()
    elif isinstance(conjecture, dict):
        expr = conjecture.get("expr", "")
        conj_obj = conjecture
    else:
        return {"ok": False, "error": f"Invalid conjecture type: {type(conjecture)}"}

    for _ in range(num_graphs):
        n = random.randint(n_min, n_max)
        G = sample_random_graph(n, connected_only=False)
        inv = graph_invariants(G)

        try:
            holds = safe_eval_expr(expr, inv)
        except Exception as e:
            return {
                "ok": False,
                "error": f"Failed to evaluate expr: {e}",
                "conjecture": conj_obj,
                "counterexample": inv,
            }

        if not holds:
            return {
                "ok": False,
                "conjecture": conj_obj,
                "counterexample": inv,
                "edges": list(G.edges()),
            }

    return {"ok": True, "checked": num_graphs}

In [15]:
# Quick manual check (before we wire up tool calling)
result = check_conjecture(conj, num_graphs=200, n_min=4, n_max=8)

result

{'ok': False,
 'conjecture': {'name': 'Connected Graphs with High Density and Triangles',
  'expr': 'is_connected and density > 0.5 implies num_triangles > 0',
  'intuition': 'If a graph is connected and has a density greater than 0.5, it is likely to contain at least one triangle.'},
 'counterexample': {'n': 4,
  'm': 4,
  'is_connected': True,
  'diameter': 2,
  'num_triangles': 0,
  'max_degree': 2,
  'avg_degree': 2.0,
  'density': 0.6666666666666666},
 'edges': [(0, 1), (0, 3), (1, 2), (2, 3)]}
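
Random sampling can miss rare counterexamples, but for very small graphs an exhaustive check is feasible: a graph on `n` labelled vertices is a subset of the `n(n-1)/2` possible edges, so there are only 64 graphs for `n = 4`. The sketch below enumerates them with the standard library (no `networkx`) and collects every graph that violates the conjecture above. The helper names are illustrative.

```python
from itertools import combinations

def is_connected(n: int, edges) -> bool:
    """Depth-first search from vertex 0; connected if every vertex is reached."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def count_triangles(n: int, edges) -> int:
    """Count vertex triples a < b < c whose three edges are all present."""
    edge_set = set(edges)
    return sum(
        1 for a, b, c in combinations(range(n), 3)
        if (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set
    )

def exhaustive_counterexamples(n: int) -> list:
    """All edge sets on n vertices that are connected, dense, and triangle-free."""
    all_edges = list(combinations(range(n), 2))
    found = []
    for k in range(len(all_edges) + 1):
        for edges in combinations(all_edges, k):
            density = 2 * len(edges) / (n * (n - 1))
            hypothesis = is_connected(n, edges) and density > 0.5
            if hypothesis and count_triangles(n, edges) == 0:
                found.append(edges)
    return found

bad = exhaustive_counterexamples(4)
print(len(bad), "counterexamples on 4 labelled vertices, e.g.", bad[0])
```

For `n = 4` the only violators are the labelled 4-cycles, which matches the counterexample found by the random search above.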

Now we expose `check_conjecture` as a *tool* the model can call.

We'll implement a minimal tool execution loop for Chat Completions function calling.


In [16]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "check_conjecture",
            "description": "Check a conjecture on randomly sampled graphs and return a counterexample if found.",
            "parameters": {
                "type": "object",
                "properties": {
                    "conjecture": {
                        "type": "object",
                        "description": "The conjecture JSON object with keys name, expr, intuition."
                    },
                    "num_graphs": {"type": "integer", "minimum": 10, "maximum": 2000},
                    "n_min": {"type": "integer", "minimum": 2, "maximum": 20},
                    "n_max": {"type": "integer", "minimum": 2, "maximum": 20},
                },
                "required": ["conjecture"]
            }
        }
    }
]

# Safe wrapper
def check_conjecture_tool(
        conjecture: Conjecture | dict | None = None,
        num_graphs: int = 200,
        n_min: int = 4,
        n_max: int = 8,
        **kwargs
):
    # Some models sometimes send 'conj' or nest fields unexpectedly
    if conjecture is None:
        conjecture = kwargs.get("conj", None)

    if conjecture is None:
        return {
            "ok": False,
            "error": "Tool call missing required field 'conjecture'. Please call again with JSON: {conjecture: {name, expr, intuition}, num_graphs, n_min, n_max}."
        }

    return check_conjecture(conjecture=conjecture, num_graphs=num_graphs, n_min=n_min, n_max=n_max)

In [17]:
TOOL_MAP = {"check_conjecture": check_conjecture_tool}

def run_with_tools(model: str, messages: list, tools: list, temperature: float = 0.7) -> list:
    while True:
        resp = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto",
            temperature=temperature,
        )
        msg = resp.choices[0].message

        assistant_msg = {"role": "assistant", "content": msg.content or ""}
        if msg.tool_calls:
            assistant_msg["tool_calls"] = []
            for tc in msg.tool_calls:
                assistant_msg["tool_calls"].append({
                    "id": tc.id,
                    "type": "function",
                    "function": {
                        "name": tc.function.name,
                        "arguments": tc.function.arguments,
                    },
                })

        messages.append(assistant_msg)

        if not msg.tool_calls:
            break

        for tc in msg.tool_calls:
            name = tc.function.name
            args = json.loads(tc.function.arguments) if tc.function.arguments else {}
            result = TOOL_MAP[name](**args)

            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "name": name,
                "content": json.dumps(result),
            })

    return messages
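
The loop above trusts the model to name a known tool and to send well-formed JSON arguments, and neither is guaranteed. One defensive option, sketched here with hypothetical names (`dispatch_tool_call` and a stand-in `echo` tool), is to turn every failure into an error payload the model can read and react to, instead of raising inside the loop:

```python
import json

def dispatch_tool_call(tool_map: dict, name: str, raw_arguments: str) -> str:
    """Run one tool call and always return a JSON string for the 'tool' message."""
    if name not in tool_map:
        return json.dumps({"ok": False, "error": f"Unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments) if raw_arguments else {}
    except json.JSONDecodeError as e:
        return json.dumps({"ok": False, "error": f"Arguments were not valid JSON: {e}"})
    if not isinstance(args, dict):
        return json.dumps({"ok": False, "error": "Arguments must be a JSON object"})
    try:
        result = tool_map[name](**args)
    except Exception as e:  # report tool failures back to the model
        return json.dumps({"ok": False, "error": f"{type(e).__name__}: {e}"})
    return json.dumps(result, default=str)

def echo(text: str = "") -> dict:
    """Stand-in tool used only for this demonstration."""
    return {"ok": True, "text": text}

demo_map = {"echo": echo}
print(dispatch_tool_call(demo_map, "echo", '{"text": "hi"}'))
print(dispatch_tool_call(demo_map, "missing", "{}"))
print(dispatch_tool_call(demo_map, "echo", "{not json"))
```

A cap on the number of rounds in the `while True` loop would guard against a model that keeps calling tools indefinitely.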

In [18]:
system = """You are a careful research assistant.

You will iteratively improve a conjecture about graphs.
- Always produce JSON only (no markdown).
- Use the tool check_conjecture to test your conjecture.
- If you get a counterexample, refine the conjecture by adding necessary conditions or adjusting the statement.
- Keep the expression simple and testable.
"""


user = f"""Here is an initial conjecture:

{json.dumps(conj.to_dict(), indent=2)}

Please do:
1) Call check_conjecture on it (num_graphs=400, n_min=4, n_max=9).
2) If it fails, revise the conjecture JSON and test again.
3) Stop once it passes 400 samples OR after 2 revisions.

Remember: JSON only.
"""

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]

final_messages = run_with_tools(MODEL, messages, tools=tools, temperature=0.2)

# Print the last assistant message (should be the final conjecture JSON)
for m in reversed(final_messages):
    if m.get("role") == "assistant" and m.get("content"):
        print(m["content"])
        break


{
  "name": "Connected Graphs with High Density and Triangles",
  "expr": "is_connected and density > 0.5 and avg_degree >= 3 implies num_triangles > 0",
  "intuition": "If a graph is connected, has a density greater than 0.5, and has an average degree of at least 3, it is likely to contain at least one triangle."
}


Now let's parse the final conjecture, print it, and re-check it on fresh random samples.

In [19]:
def extract_json_object(text: str) -> str:
    """Best-effort extraction of a top-level JSON object from a model reply."""
    text = text.strip()
    start = text.find("{")
    end = text.rfind("}")
    if start != -1 and end != -1 and end > start:
        return text[start:end+1]
    return text

# Parse the final conjecture returned by the tool-calling loop
final_text = None
for m in reversed(final_messages):
    if m.get("role") == "assistant" and m.get("content"):
        final_text = m["content"]
        break

if final_text is None:
    raise RuntimeError("No assistant message content found.")

final_conj = parse_conjecture_json(extract_json_object(final_text))

print("\nFinal conjecture (after tool-calling refinement):")
print("--------------------------------------------------")
print("Name:", final_conj.name)
print("Expr:", final_conj.expr)
print("Intuition:", final_conj.intuition)

print("\nRe-checking final conjecture on fresh samples...")
result = check_conjecture_tool(final_conj, num_graphs=400, n_min=4, n_max=9)
print(result)



Final conjecture (after tool-calling refinement):
--------------------------------------------------
Name: Connected Graphs with High Density and Triangles
Expr: is_connected and density > 0.5 and avg_degree >= 3 implies num_triangles > 0
Intuition: If a graph is connected, has a density greater than 0.5, and has an average degree of at least 3, it is likely to contain at least one triangle.

Re-checking final conjecture on fresh samples...
{'ok': True, 'checked': 400}
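
Passing 400 random samples is evidence, not proof. In fact this refined conjecture is false: the complete bipartite graph `K_{3,3}` is connected, has density 0.6 and average degree 3, and contains no triangle. Highly structured graphs like this are rare under random sampling, so the check can easily miss them. A standard-library verification of the arithmetic (variable names are illustrative):

```python
from itertools import combinations

left, right = [0, 1, 2], [3, 4, 5]
edges = {(u, v) for u in left for v in right}  # K_{3,3}: every left-right pair
n, m = 6, len(edges)

density = 2 * m / (n * (n - 1))
avg_degree = 2 * m / n
num_triangles = sum(
    1 for a, b, c in combinations(range(n), 3)
    if {(a, b), (a, c), (b, c)} <= edges
)

# Connectivity via depth-first search from vertex 0
adj = {v: set() for v in range(n)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
seen, stack = {0}, [0]
while stack:
    for w in adj[stack.pop()]:
        if w not in seen:
            seen.add(w)
            stack.append(w)
is_connected = len(seen) == n

hypothesis = is_connected and density > 0.5 and avg_degree >= 3
conclusion = num_triangles > 0
print("hypothesis:", hypothesis, "| conclusion:", conclusion)
```

The hypothesis holds and the conclusion fails, so `K_{3,3}` is a counterexample that the 400-sample check did not find.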


## (40–50 min) Simple Agent Chaining (Generate → Check → Refine)

---


We can view what we just did as a tiny "agent graph":

- **Proposer**: writes a conjecture (structured JSON)
- **Checker**: runs `check_conjecture`
- **Refiner**: updates conjecture based on counterexamples

Below is a lightweight, explicit version that makes the state visible.

In [20]:
def propose_conjecture_from_data(df: pd.DataFrame) -> Conjecture:
    prompt = prompt_template_graph_conjecture(df, ALLOWED_VARS, max_rows=12)
    messages = [
        {"role": "system", "content": "You are a careful research assistant. Follow instructions exactly."},
        {"role": "user", "content": prompt},
    ]
    raw = chat(MODEL, messages, temperature=0.7)
    return parse_conjecture_json(raw)


def refine_conjecture(conjecture: Conjecture, tool_result: dict) -> Conjecture:
    messages = [
        {"role": "system", "content": "You refine conjectures based on counterexamples. Output JSON only."},
        {"role": "user", "content": f"""Conjecture:
{json.dumps(conjecture.to_dict(), indent=2)}

Tool result:
{json.dumps(tool_result, indent=2)}

Revise the conjecture so it avoids this failure. Output JSON only with keys name, expr, intuition.
"""},
    ]
    raw = chat(MODEL, messages, temperature=0.3)
    return parse_conjecture_json(raw)

In [21]:
from utils._utils import format_conjecture, print_agent

def run_conversation_loop(df, max_steps: int = 3, num_graphs: int = 400, n_min: int = 4, n_max: int = 9) -> Conjecture:
    # Proposer
    conj = propose_conjecture_from_data(df)
    print_agent("PROPOSER", format_conjecture(conj))

    for step in range(max_steps):
        # Checker
        tool_res = check_conjecture_tool(conj, num_graphs=num_graphs, n_min=n_min, n_max=n_max)

        if tool_res.get("ok"):
            checked = tool_res.get("checked", num_graphs)
            print_agent("CHECKER", f"Passed on {checked} random graphs (n in [{n_min}, {n_max}]).")
            return conj

        ce = tool_res.get("counterexample", {})
        edges = tool_res.get("edges", [])
        err = tool_res.get("error", None)

        checker_msg = []
        checker_msg.append(f"Found a counterexample at refinement step {step}.")
        if err:
            checker_msg.append(f"\nError: {err}")
        checker_msg.append("\nCounterexample invariants:")
        for k in sorted(ce.keys()):
            checker_msg.append(f"  {k}: {ce[k]}")
        if edges:
            checker_msg.append("\nEdge list:")
            checker_msg.append(f"  {edges}")
        print_agent("CHECKER", "\n".join(checker_msg))

        # Refiner
        conj = refine_conjecture(conj, tool_res)
        print_agent("REFINER", format_conjecture(conj))

    print_agent("CHECKER", f"Stopped after {max_steps} steps without passing. Returning latest conjecture.")
    return conj

# Run once (prints the conversation)
final_conj = run_conversation_loop(df, max_steps=5, num_graphs=400, n_min=4, n_max=9)


PROPOSER
Name: connected_graphs_with_high_density_have_more_triangles
Expr: is_connected and density > 0.5 implies num_triangles > 2
Intuition: Connected graphs with a density greater than 0.5 tend to have more than two triangles.


CHECKER
Found a counterexample at refinement step 0.

Counterexample invariants:
  avg_degree: 2.6666666666666665
  density: 0.5333333333333333
  diameter: 2
  is_connected: True
  m: 8
  max_degree: 4
  n: 6
  num_triangles: 1

Edge list:
  [(0, 1), (0, 2), (0, 3), (0, 4), (1, 4), (2, 5), (3, 5), (4, 5)]


REFINER
Name: connected_graphs_with_high_density_have_more_triangles
Expr: is_connected and density > 0.5 implies num_triangles >= 1
Intuition: Connected graphs with a density greater than 0.5 tend to have at least one triangle.


CHECKER
Found a counterexample at refinement step 1.

Counterexample invariants:
  avg_degree: 2.6666666666666665
  density: 0.5333333333333333
  diameter: 3
  is_connected: True
  m: 8
  max_degree: 3
  n: 6
  num_triangles: 

In [22]:
final_conj

Conjecture(name='connected_graphs_with_high_density_have_more_triangles', expr='is_connected and density > 0.5 implies num_triangles >= 0', intuition='Connected graphs with a density greater than 0.5 tend to have at least one triangle, but it is possible to have zero triangles in certain configurations.')
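
Note what happened to the final conjecture: `num_triangles >= 0` holds for every graph, so the refiner "fixed" the statement by making it vacuous. A counterexample search alone cannot catch this. One cheap guard, sketched below with hypothetical names and a few toy rows, is to split the conjecture into hypothesis and conclusion and flag it when the conclusion already holds on every sample, or when the hypothesis never fires:

```python
def triviality_report(rows: list[dict], hypothesis, conclusion) -> dict:
    """Summarise whether an implication says anything on the given samples."""
    fires = [r for r in rows if hypothesis(r)]
    return {
        "hypothesis_fires": len(fires),
        "hypothesis_never_fires": len(fires) == 0,
        "conclusion_always_true": all(conclusion(r) for r in rows),
    }

# Toy invariant rows (the last one is the 4-cycle counterexample seen earlier)
rows = [
    {"is_connected": True, "density": 0.8, "num_triangles": 9},
    {"is_connected": False, "density": 0.2, "num_triangles": 0},
    {"is_connected": True, "density": 2 / 3, "num_triangles": 0},
]

def hyp(r: dict) -> bool:
    return r["is_connected"] and r["density"] > 0.5

vacuous = triviality_report(rows, hyp, lambda r: r["num_triangles"] >= 0)
informative = triviality_report(rows, hyp, lambda r: r["num_triangles"] >= 1)
print(vacuous)
print(informative)
```

A loop could reject any refinement for which `conclusion_always_true` is set and ask the refiner to strengthen the hypothesis instead of weakening the conclusion.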

### Wrap up

What we built is small, but it's the core pattern used in research workflows:

- **Prompting** gives you better hypotheses.
- **Structured outputs** make results programmable.
- **Tools** let you verify/ground the model.
- **Chaining** turns one-shot generation into an iterative loop.

In research settings, the "tool" can be anything:
- a counterexample search,
- a database query,
- a literature search / novelty checker,
- a formal verifier.

Same pattern.
