# TRIZ Agents — Quickstart

This notebook shows how to:
1. install project requirements,
2. load configuration from `.env`,
3. run the multi-agent **TRIZ workflow** on the gantry-crane problem,
4. save experiment results to JSON/Markdown, and
5. evaluate the solution with **multi-judge** LLM evaluators.

> Repo modules used here: `triz_agents.graph`, `triz_agents.experiments`, `triz_agents.evaluation`.

## 1) Install dependencies

From the project root (`triz-agents-repo/`), uncomment and run the following cell:

In [1]:
# !pip install -r requirements.txt
# !pip install -e .

## 2) Configure environment variables

The repo reads your API keys from a `.env` file. Copy the template and edit it:

```bash
cp .env.example .env
```

Fill in your keys inside `.env`:

```
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
```

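We’ll use [python-dotenv](https://pypi.org/project/python-dotenv/) to load them. The check below also looks for `XAI_API_KEY` and `TAVILY_API_KEY`; add them to `.env` if you use those providers (the default judges in step 4 include `grok-3`, an xAI model).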

In [2]:
from dotenv import load_dotenv
import os, sys, platform

load_dotenv()  # reads .env

print("Python:", sys.version.split()[0], "| Platform:", platform.system())
# Add other provider keys (e.g., DEEPSEEK_API_KEY) to this list if you use them.
for key in ["OPENAI_API_KEY", "GOOGLE_API_KEY", "XAI_API_KEY", "TAVILY_API_KEY"]:
    print(f"{key} set? ", "✅" if os.getenv(key) else "⚠️ missing")


Python: 3.10.11 | Platform: Windows
OPENAI_API_KEY set?  ✅
GOOGLE_API_KEY set?  ✅
XAI_API_KEY set?  ✅
TAVILY_API_KEY set?  ✅
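
The check above only prints a status, so a missing key surfaces later, at the first API call. A minimal sketch of a stricter check follows; `missing_keys` and `REQUIRED_KEYS` are hypothetical names used for illustration and are not part of `triz_agents`.

```python
import os
from typing import Iterable, Mapping

# Hypothetical list: adjust it to the providers you actually use.
REQUIRED_KEYS = ["OPENAI_API_KEY", "XAI_API_KEY"]


def missing_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [key for key in required if not env.get(key)]


missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing keys:", ", ".join(missing))
```

Replacing the `print` with `raise RuntimeError(...)` would stop the notebook before any model is instantiated.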


## 3) Build the TRIZ Agent Graph


In [3]:
from IPython.display import Image, display

from triz_agents.graph import create_workflow_app

app = create_workflow_app(llm_model="gpt-4o")
try:
    png_bytes = app.get_graph(xray=True).draw_mermaid_png()
    display(Image(png_bytes))
except Exception as e:
    print("Graph visualization unavailable in this environment:", e)


USER_AGENT environment variable not set, consider setting it to identify your requests.


--- ⚙️  Instantiating model: gpt-4o ---
Graph visualization unavailable in this environment: Failed to reach https://mermaid.ink/ API while trying to render your graph. Status code: 502.

To resolve this issue:
1. Check your internet connection and try again
2. Try with higher re
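
The PNG rendering depends on the external mermaid.ink service, which returned a 502 in the run above. As a fallback sketch, the graph's Mermaid source can be printed as text and pasted into any Mermaid viewer. `draw_mermaid()` is the text counterpart of `draw_mermaid_png()` on LangChain graph objects; the helper name `mermaid_source` is hypothetical.

```python
def mermaid_source(workflow_app) -> str:
    """Return the Mermaid definition of a compiled graph without calling a remote renderer."""
    return workflow_app.get_graph(xray=True).draw_mermaid()


# Usage in this notebook (assumes `app` from the cell above):
# print(mermaid_source(app))
```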

## 4) Define the problem and choose models
Set `AGENT_MODEL` to the model that runs the workflow agents, and list the judge (evaluator) models in `EVALUATOR_MODELS`.

In [4]:
from triz_agents.llm import ModelName

GANTRY_CRANE_PROBLEM = (
    "Solve the following problem: Gantry cranes find extensive application across various "
    "industries, employed to move hefty loads and dangerous substances within shipping docks, "
    "building sites, steel plants, storage facilities, and similar industrial settings. "
    "The crane should move the load fast without causing any unnecessary excessive swing at "
    "the final position. Moreover, gantry cranes which always lift excessive load may result "
    "in sudden stop of the crane. The crane operators' attempt to lift heavier loads at a "
    "faster pace has led to recurrent malfunctions, including overheating, and the increased "
    "speed has caused excessive swinging or swaying of the lifted load, posing a safety hazard."
)

AGENT_MODEL: ModelName | str = "gpt-4o"   # e.g., "gemini-1.5-pro", "grok-3-mini", "deepseek-chat"
EVALUATOR_MODELS: list[str] = ["o1", "grok-3"]  # judges; e.g., add "gemini-2.5-pro"

THREAD_ID = "gantry-quickstart-01"
TEMPERATURE = 0.0
RECURSION_LIMIT = 250
OUT_DIR = "experiments"  # where JSON results are saved

## 5) Run the TRIZ workflow and save results

In [None]:
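from langchain_core.messages import HumanMessage

# Reuse the settings defined in step 4 instead of hard-coded values.
config = {"configurable": {"thread_id": THREAD_ID}, "recursion_limit": RECURSION_LIMIT}
input_message = HumanMessage(content=GANTRY_CRANE_PROBLEM)

app = create_workflow_app(llm_model=AGENT_MODEL)

# stream_mode="values" yields the full state after each step; print only the newest message.
events = app.stream({"messages": [input_message]}, config, stream_mode="values")
for s in events:
    s["messages"][-1].pretty_print()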


Solve the following problem: Gantry cranes find extensive application across various industries, employed to move hefty loads and dangerous substances within shipping docks, building sites, steel plants, storage facilities, and similar industrial settings. The crane should move the load fast without causing any unnecessary excessive swing at the final position. Moreover, gantry cranes which always lift excessive load may result in sudden stop of the crane. The crane operators' attempt to lift heavier loads at a faster pace has led to recurrent malfunctions, including overheating, and the increased speed has caused excessive swinging or swaying of the lifted load, posing a safety hazard.
Name: ProjectManager

Let's begin by defining the Engineering System for our gantry crane problem. MechanicalEngineer, could you start by listing the main components of the gantry crane system and any external factors that might influence its operation? Your insights will help us understand the mechani
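
With `stream_mode="values"`, each event carries the full state, so the newest message is always the last entry of `messages`. A minimal sketch, assuming that event shape, that collects the conversation into a list instead of printing it; `collect_transcript` is a hypothetical helper, not a `triz_agents` function.

```python
from typing import Any, Iterable


def collect_transcript(events: Iterable[dict[str, Any]]) -> list[tuple[str, str]]:
    """Collect (speaker, text) pairs from the last message of each streamed state."""
    transcript = []
    for state in events:
        messages = state.get("messages") or []
        if not messages:
            continue
        last = messages[-1]
        # Messages may carry a `name` (e.g., "ProjectManager"); fall back to the message type.
        speaker = getattr(last, "name", None) or getattr(last, "type", "unknown")
        transcript.append((speaker, str(last.content)))
    return transcript


# Usage in this notebook (assumes `app`, `input_message`, and `config` from the cell above):
# transcript = collect_transcript(app.stream({"messages": [input_message]}, config, stream_mode="values"))
```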

The next cell runs the full multi-agent workflow through `run_experiment` and stores the steps in a JSON file under `OUT_DIR`. It also writes a Markdown artifact for evaluation pipelines that expect `.md`.

In [None]:
from datetime import datetime
from pathlib import Path
import json

from triz_agents.experiments import run_experiment

result = run_experiment(
    model_name=AGENT_MODEL,
    thread_id=THREAD_ID,
    prompt=GANTRY_CRANE_PROBLEM,
    temperature=TEMPERATURE,
    recursion_limit=RECURSION_LIMIT,
    out_dir=OUT_DIR,
    download=False,  # set True in Colab to auto-download
)

print("Steps:", result["num_steps"])
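# Pick the newest matching file; a bare next() could return a stale result from an earlier run.
result_path = max(
    Path(OUT_DIR).glob(f"{AGENT_MODEL.replace('/', '_')}_thread-{THREAD_ID}_*.json"),
    key=lambda p: p.stat().st_mtime,
)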
print("Saved:", result_path)

# Create a Markdown artifact from steps
ts = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
md_name = f"{AGENT_MODEL.replace('/', '_')}_gantry_{ts}.md"
md_path = Path(md_name)

with md_path.open("w", encoding="utf-8") as f:
    f.write(f"# TRIZ Agents Run — {AGENT_MODEL}\n\n")
    for step in result.get("steps", []):
        f.write(f"## Step {step['index']} — {step['author']}\n\n")
        f.write(step['content'].strip() + "\n\n")

print("Markdown artifact:", md_path)

--- ⚙️  Instantiating model: gpt-4o ---
[experiments] Results saved to experiments\gpt-4o_thread-gantry-quickstart-01_20250923-095039.json
Steps: 6
Saved: experiments\gpt-4o_thread-gantry-quickstart-01_20250923-095039.json
Markdown artifact: gpt-4o_gantry_20250923-095039.md
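
The Markdown-writing loop in the cell above can be factored into a pure function, which makes it easier to reuse for other runs. This sketch assumes each step is a dict with `index`, `author`, and `content` keys, as that cell does; `steps_to_markdown` is a hypothetical helper.

```python
def steps_to_markdown(result: dict, title: str) -> str:
    """Render workflow steps as Markdown, one `##` section per step."""
    parts = [f"# {title}\n"]
    for step in result.get("steps", []):
        parts.append(f"## Step {step['index']} - {step['author']}\n")
        parts.append(step["content"].strip() + "\n")
    return "\n".join(parts)


# Illustrative input only; real steps come from `run_experiment`.
demo = {"steps": [{"index": 1, "author": "ProjectManager", "content": " Define the system. "}]}
print(steps_to_markdown(demo, "TRIZ Agents Run"))
```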


## 6) Evaluate the solution with multiple judges
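Each model in `EVALUATOR_MODELS` acts as a judge and scores the generated solution on every requested metric. The report contains the individual evaluations and the average score per metric.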

In [5]:
from pathlib import Path
from triz_agents.evaluation import run_multi_judge_evaluation

# Read the Markdown artifact as the prediction text.
# Replace the file name with the one printed by the previous cell for your run.
md_path = Path("gpt-4o_gantry_20250923-095039.md")
prediction_text = md_path.read_text(encoding="utf-8")

evaluation = run_multi_judge_evaluation(
    evaluator_models=EVALUATOR_MODELS,
    input_problem=GANTRY_CRANE_PROBLEM,
    prediction=prediction_text,
    metrics=["expert_solution", "clarity", "coherence", "coverage", "novelty", "feasibility", "triz_adherence"],
)

import pprint
pprint.pprint(evaluation)

--- ⚙️  Instantiating model: o1 ---
  --- Evaluating with Judge: o1 ---




--- ⚙️  Instantiating model: grok-3 ---
  --- Evaluating with Judge: grok-3 ---
{'clarity': {'average_score': np.float64(7.5),
             'individual_judge_evals': [{'critique': '1) Unclear phrasing '
                                                     'example: "Inverted '
                                                     'Figure-of-Eight Wire '
                                                     'Rope Systems." While '
                                                     'specialized, it can be '
                                                     'confusing to readers '
                                                     'unfamiliar with '
                                                     'crane-specific '
                                                     'mechanical setups. 2) '
                                                     'Suggested rewrite: "A '
                                                     'specialized rope '
                                        

In [6]:
evaluation

{'expert_solution': {'average_score': np.float64(7.0),
  'individual_judge_evals': [{'score': 7,
    'reasoning': '1) Checklist: The proposed solution partially covers sway control by mentioning advanced anti-sway systems (e.g., predictive algorithms), but it does not adopt Sliding Mode Control or input shaping as in the benchmark. Overheating is addressed through cooling systems and monitoring, but it does not specify self-cleaning filters, sealed bearings, or porous materials. Protection against overload or short circuit is only generally mentioned (via sensors and safety features) and does not match the benchmark’s approach of using an intelligent circuit breaker with microcontroller. 2) Technical Accuracy: While the proposed dynamic anti-sway and feedback-controlled cooling approaches are valid, they do not precisely align with the benchmark’s specified methods. 3) Missing Approach: For overload/short circuit, the AI outlined general safety and monitoring features rather than the e
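
The full report is verbose. A compact sketch that reduces it to one average score per metric, assuming the structure shown above (`{metric: {"average_score": ..., "individual_judge_evals": [...]}}`); `summarize_scores` is a hypothetical helper.

```python
def summarize_scores(evaluation: dict) -> dict[str, float]:
    """Map each metric to its average score as a plain float, highest score first."""
    scores = {
        metric: float(details["average_score"])
        for metric, details in evaluation.items()
        if isinstance(details, dict) and "average_score" in details
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# The two averages below are the ones visible in the outputs above.
demo = {
    "clarity": {"average_score": 7.5, "individual_judge_evals": []},
    "expert_solution": {"average_score": 7.0, "individual_judge_evals": []},
}
for metric, score in summarize_scores(demo).items():
    print(f"{metric:<16} {score:.1f}")
```

Converting with `float(...)` also strips the `np.float64` wrapper, which keeps the summary easy to print or serialize.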

## 7) (Optional) Save the evaluation report

In [None]:
import json
from pathlib import Path

report_dir = Path("evaluations")
report_dir.mkdir(exist_ok=True)

eval_file = report_dir / f"eval_{AGENT_MODEL.replace('/', '_')}_{THREAD_ID}.json"
with eval_file.open("w", encoding="utf-8") as f:
    json.dump(evaluation, f, indent=2, ensure_ascii=False)

print("Evaluation saved to:", eval_file)
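
The report contains NumPy scalars such as `np.float64`. That type subclasses Python's `float`, so `json.dump` accepts it, but other NumPy scalar types (for example `np.int64`) raise `TypeError`. A defensive sketch, assuming any such value exposes NumPy's `.item()` method; `json_default` is a hypothetical helper.

```python
import json


def json_default(obj):
    """Fallback for json.dump: unwrap NumPy-style scalars via .item(), else fail loudly."""
    item = getattr(obj, "item", None)
    if callable(item):
        return item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Usage in the cell above:
# json.dump(evaluation, f, indent=2, ensure_ascii=False, default=json_default)
```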