### 🧠 Guidance

1. Replace "YOUR_API_KEY" with your own API key.
2. In tot_sequence (the sequence of prompts), you can switch individual steps on or off by commenting them out.
3. We use openai==1.68.2.
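
Rather than pasting the key into the notebook, it can be read from an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`, which the `openai` client also reads by default. The helper name `load_api_key` is hypothetical and not part of the original notebook.

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    # Reject a missing key as well as the unedited placeholder
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(f"Set the {var_name} environment variable to your own API key.")
    return key

# Usage (assumes the variable is set in your shell):
# client = OpenAI(api_key=load_api_key())
```

This keeps the key out of the saved notebook and its logs.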

In [1]:
import pandas as pd
import re
from openai import OpenAI
client = OpenAI(api_key = "YOUR_API_KEY")

# A collection of prompts for different modules
def instruction_prompts(module_name):
    if module_name == "Setting the rules for evolutionary tree of thought":
        prompts = ["""
                Imagine three different experts are answering this question. 
                All experts will write down 1 step of their thinking, then share it with the group, the group will do the following operations: 
                1. figure out the key inspirations and strategies, by creating a list of them; 
                2. revisiting the list, using operations such as mutation, crossover to generate new inspirations and strategies; 
                3. then each expert will have an updated hypothesis, however, this hypothesis might not be rationale or might be impossible; 
                therefore, 4. their new hypotheses should be checked before going to the next step, if valid, accept, if not valid, try to generate a new one, until it becomes valid. 
                Sometimes, an objective, if exist could be used to help the three experts. After this round of hypothesis generation, then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they will point it out and correct it. 
                
                Following the above action instruction, now the three experts are solving the problem: Generate three novel and valid hypotheses to improve the performance of thermoelectric materials or design entirely new ones based on a research background and literature-inspired insights. 
                For the proposed thermoelectric materials, the potential objective is the figure of merit ZT values. This ZT values could also be used by the experts as one factor to help refine their hypotheses. 
                
                After the experts generated the hypotheses, for each hypothesis, it has to pass the two stages checking. 
                Stage 1 is Decision Tree workflow: 
                [Step 1] Does the hypothesis follow known chemical and physical rules?  NO → REJECT (Not physically meaningful)  YES → Step 2; 
                [Step 2] Does it introduce a novel thermoelectric strategy?  NO → MODIFY (Refine concept)  YES → Step 3;  
                [Step 3] Is it computationally testable (DFT, MD, phonon transport)?  NO → MODIFY (Improve predictive validation)  YES → Step 4  
                [Step 4] Is the material synthetically feasible?  NO → MODIFY (Propose better synthesis route)  YES → go to Step 5  
                [Step 5] Does it show high potential for improving ZT?  NO → REJECT (Low thermoelectric impact)  YES → go to Step 6  
                [Step 6] Does it have a reasonable risk-reward balance?  NO → MODIFY (Address high risks)  YES → ACCEPT for further computational and experimental testing  
                
                Stage 2 is the scoring framework: Once a hypothesis passes the decision tree, it is ranked numerically based on five key evaluation criteria.
                | Criteria     | Description                                              | Score Range |
                |--------------|----------------------------------------------------------|-------------|
                | Validness    | Consistency with physical & chemical principles          | 1-10        |
                | Novelty      | Innovation compared to known thermoelectrics             | 1-10        |
                | Significance | Impact on ZT or new mechanisms                           | 1-10        |
                | Feasibility  | Ease of computational modeling and synthesis             | 1-10        |
                | Risk         | Likelihood of failure due to instability or complexity   | 1-10        |

                **Total Score Calculation**  
                Total Score=Validness+Novelty+Significance+Feasibility-Risk
                Interpretation of Scores:
                45-50: Top priority hypothesis for immediate testing
                35-44: Promising but requires some refinements
                25-34: Moderate, needs major improvements before testing
                <25: Rejected or needs complete reformulation
                """]
    elif module_name == "check_evolution":
        prompts = ["""
                   Now the three experts need to compare the hypotheses before and after the Mutation and Crossover. They need to check if they have better hypotheses after these operations. 
                   If yes, we accept them, if not, new hypotheses need to be formed.
                   """
                   ]
    elif module_name == "generating hypothesis with chemical formulas":
        prompts = ["""
                   Based on the above inspirations, the three experts are proposing new thermoelectric materials that haven't been reported before step by step. I need to know the composition of them for example their chemical formulas. They should validate your options and comment on the feasibility of the proposed materials. If you are asked to proposed a new material and try to write down the composition or chemical formula of it. Bear in mind, how many chemical rules should we check in order to make sure that the proposed new materials is chemically feasible or chemically allowed and what are the steps to follow? You also need to consider that materials have a wide range of branches, such as layered materials, high entropy alloys or other organic inorganic compounded materials, therefore, the chemical rules for these materials should also be considered. 
                   """]
    elif module_name == "novelty evaluation":
        prompts = ["""
                   Let's have a double check on the hypothesis. Are they entirely new? If not, we need to propose new ones. Are they better over the other methods? If not, we need to use a new strategy. After checking, we also need to present the predicted ZT values and working temperatures of them. 
                   """]              
    elif module_name == "context guidance":
        prompts = ["""
                   Actually, the experts are given some background research informations to help them think: How can new material design strategies enhance the thermoelectric efficiency of emerging materials beyond traditional semiconductors? Context: Thermoelectric efficiency is governed by ZT = (S²σT)/κ. Traditional materials (e.g., Bi₂Te₃, PbTe) rely on scarce or toxic elements. Alternatives must optimise electronic transport properties, phonon engineering, and nanostructuring. Conventional approaches focus on band engineering, phonon-glass electron-crystal (PGEC) concepts, and alloy disorder, but breakthrough materials require novel strategies. 
                   So based on these, the experts start a second round of problem solving.
                   """]
    elif module_name == "compare_hypotheses":
        prompts = ["""
                   Now, at this stage, let's compare and double check that the three experts are getting optimised hypotheses or not by looking at: 1. compare current hypotheses with the three hypotheses from first round of discussion; 2. a comparative analysis of the three experts’ hypotheses. If both are true, we can accept these hypotheses. If not, the experts need to find new ones by starting a new round of problem solving.
                   """]
    elif module_name == "summarising_hypothesis":
        prompts = ["""
                   Now the three experts can generate a hypothesis with an example template: "We hypothesise that [strategy] can enhance thermoelectric efficiency via [mechanism], inspired by [insight]. This will be tested by [method], however the risk lies in [Risk]." 
                   And summarise and organise the hypotheses in the table contains the columns with Material, Formula,	Structure Type, Hypothesis,Predicted ZT, Operating Temperature (K), extend the table with evaluation Rate 1-10 for: 1. **Validity:** Consistent with known physics? 2. **Novelty:** Unconventional concept? 3. **Significance:** Meaningful advance? 4. **Feasibility:** Experimentally/computationally testable? 5. **Risk:** potential risk of being failure? Similar to the lone pair electrons in perovskite materials, which makes them standing out in photovoltaic materials and contributing to the defect tolerance; what are the killer properties/effects, that you think will make these hypothesized thermoelectric materials stand out? Include these "killer effects" in the table.
                   Refine hypothesis accordingly.
                   """]
    elif module_name == "removing_low_feasibility_hypotheses":
        prompts = ["""
                   Now the three experts are thinking to test the hypothesis in practice. If the feasibility is low and risk is high, the three experts need to filter the current hypothesis to keep the good ones. and because we need three hypotheses, now the three experts need to find out another hypothesis by another round of problem solving. Bear in mind to check the hypothesis checking workflow and keep the experts discussing following the procedures.
                   """]
    elif module_name == "zoom_into_low_temperature_hypotheses":
        prompts = ["""
                   Now the three experts are zooming into another challenge for thermoelectrics to find the thermoelectric materials working especially well at low-temperature (lower than 600 K) a lot, will you still give the same recommended hypothesis, or you will have different ones? Give me your updated summary of hypotheses. Bear in mind to check the hypothesis checking workflow and keep the experts discussing following the procedures.
                   """]
    elif module_name == "summarising_all_hypotheses":
        prompts = ["""
                   Now summarise all the hypotheses proposed from the very beginning to current ones, so I can easily see the trends and what has been changed. Give me the clean Markdown TABLE (styled: | col | col |) that contains the columns with Material, Formula,   Structure Type, Hypothesis,Predicted ZT, Operating Temperature (K), extend the table with evaluation Rate 1-10 for: 1. **Validity:** Consistent with known physics? 2. **Novelty:** Unconventional concept? 3. **Significance:** Meaningful advance? 4. **Feasibility:** Experimentally/computationally testable? 5. **Risk:** potential risk of being failure? Similar to the lone pair electrons in perovskite materials, which makes them standing out in photovoltaic materials and contributing to the defect tolerance; what are the killer properties/effects, that you think will make these hypothesized thermoelectric materials stand out? Include these "killer effects" in the table; finally label if the hypothesis is kept or discarded.
                   """]
    elif module_name == "one-shot":
        prompts = ["""
                   What is the next promising thermoelectric material that no one has ever reported before? I'd like to have a try. What is the composition of it? Has this material never been reported before? Why do you propose this one? What is your inspiration? 
                   """]
    else:
        raise NotImplementedError
    
    return prompts

# Call the OpenAI API: the input is a prompt string, the output is the response text.
# model_name must be one of the keys of model_map below ("gpt-4o", "o1" or "o3-mini").
import time

def llm_generation(prompt, model_name, client=client, temperature=1.0):
    # Define the model mapping (name used in this notebook -> OpenAI API model ID)
    model_map = {
        "gpt-4o": "gpt-4o",  # OpenAI's GPT-4 Omni
        "o1": "o1",  # OpenAI's o1 reasoning model
        "o3-mini": "o3-mini"  # OpenAI's o3-mini reasoning model
    }

    # Get actual model string for API
    model = model_map.get(model_name)
    if model is None:
        raise ValueError(f"Unsupported model name: {model_name}")

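    # Attempt generation, retrying a bounded number of times on errors
    # (e.g. rate limits) with exponential backoff instead of looping forever
    max_retries = 5
    for attempt in range(max_retries):
        try:
            completion = client.chat.completions.create(
                model=model,
                temperature=temperature,
                #max_tokens=1500,
                messages=[
                    {"role": "user", "content": prompt}
                ]
            )
            return completion.choices[0].message.content
        except Exception as e:
            print(f"OpenAI rate limit or error occurred (attempt {attempt + 1}/{max_retries}):", e)
            time.sleep(0.5 * 2 ** attempt)
    raise RuntimeError(f"OpenAI request failed after {max_retries} attempts")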

def extract_table_from_output(text):
    """
    Extracts the first markdown-style table from text and returns it as a pandas DataFrame.
    """
    # Match everything between table header and rows
    table_match = re.search(r"(\|.*?\|\n)(\|[-| ]+\|\n)((\|.*?\|\n?)+)", text)
    
    if not table_match:
        raise ValueError("No markdown-style table found in the output.")

    header_line = table_match.group(1)
    separator_line = table_match.group(2)
    data_lines = table_match.group(3)

    # Split headers
    headers = [h.strip() for h in header_line.strip().split('|') if h.strip()]
    
    # Split each row into columns, keeping empty cells so columns stay aligned
    rows = []
    for line in data_lines.strip().split('\n'):
        cols = [c.strip() for c in line.strip().strip('|').split('|')]
        if len(cols) == len(headers):
            rows.append(cols)

    df = pd.DataFrame(rows, columns=headers)
    return df

def run_sequential_prompt_pipeline(module_names, model_name="gpt-4o", temperature=1.0):
    context = ""  # Start with empty context
    filename = model_name + "_log"
    with open(filename, "w", encoding="utf-8") as f:
        for module_name in module_names:
            print(f"\n🧠 Running module: {module_name}")
            module_prompt = instruction_prompts(module_name)[0]

            # Combine previous context and current prompt
            full_prompt = context + "\n\n" + module_prompt

            # Generate response
            response = llm_generation(
                prompt=full_prompt,
                model_name=model_name,
                temperature=temperature,
                client=client
            )

            # Save the intermediate result to the log file
            f.write(f"===== {module_name} =====\n")
            f.write(response + "\n\n")
            #print(f"✅ Response from {module_name}:\n{response[5:]}...\n")

            # Update context with response for next module
            context = response

    return context  # Final output from the last module
# prompt = instruction_prompts("Setting the rules for evolutionary tree of thought")[0]

# response = llm_generation(
#     prompt=prompt,
#     model_name="gpt-4o",  # or "o1" / "o3-mini" (the keys of model_map)
#     temperature=1.0,
#     client=client
# )

# print(response)
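
The Stage 2 scoring rule in the first prompt is plain arithmetic, so it can be checked outside the model. The following is a minimal sketch, not part of the original notebook: `total_score` and `interpret` are hypothetical helper names, and the bands are copied from the prompt text.

```python
def total_score(validness, novelty, significance, feasibility, risk):
    """Total Score = Validness + Novelty + Significance + Feasibility - Risk."""
    scores = (validness, novelty, significance, feasibility, risk)
    if not all(1 <= s <= 10 for s in scores):
        raise ValueError("each criterion must be scored between 1 and 10")
    return validness + novelty + significance + feasibility - risk

def interpret(total):
    """Map a total score onto the interpretation bands quoted in the prompt."""
    if total >= 45:
        return "Top priority hypothesis for immediate testing"
    if total >= 35:
        return "Promising but requires some refinements"
    if total >= 25:
        return "Moderate, needs major improvements before testing"
    return "Rejected or needs complete reformulation"

# Example: a strong but fairly risky hypothesis (illustrative numbers only)
score = total_score(validness=9, novelty=8, significance=8, feasibility=7, risk=4)
print(score, "->", interpret(score))  # 28 -> Moderate, needs major improvements before testing
```

With every criterion limited to 1-10, the formula as written yields totals between -6 and 39, so the 45-50 band cannot be reached unless the bands are rescaled or Risk is handled differently.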


In [2]:
tot_sequence = [
    "Setting the rules for evolutionary tree of thought",
    "check_evolution",
    "generating hypothesis with chemical formulas",
    "novelty evaluation",
    #"context guidance",
    "compare_hypotheses",
    #"summarising_hypothesis",
    #"removing_low_feasibility_hypotheses",
    #"zoom_into_low_temperature_hypotheses",
    "summarising_all_hypotheses"
] # The sequence of modules to run, you can turn on/off any module

oneshot = [
    "one-shot"
]

final_output = run_sequential_prompt_pipeline(tot_sequence, model_name="o3-mini", temperature=1.0)

print("\n🚀 Final Output from the entire pipeline:\n")
print(final_output)



🧠 Running module: Setting the rules for evolutionary tree of thought

🧠 Running module: check_evolution

🧠 Running module: generating hypothesis with chemical formulas

🧠 Running module: novelty evaluation

🧠 Running module: compare_hypotheses

🧠 Running module: summarising_all_hypotheses

🚀 Final Output from the entire pipeline:

Below is a summary table capturing all the hypotheses—from the very first expert proposals to the current (optimized) ones. The table lists material systems, the key idea behind each hypothesis, their predicted thermoelectric performance (ZT) with operating temperature ranges, evaluations (on a 1–10 scale), the “killer effects” that make them stand out, and a final status indicating whether the hypothesis was kept (as state‑of‑the‑art) or discarded in favor of improved designs.

| Material                                   | Formula Example                         | Structure Type                         | Hypothesis                                          

In [3]:
## Extract table from the final output and save it to a CSV file for the tot_sequence prompt
import pandas as pd
import re
from io import StringIO

def extract_markdown_table(text):
    """
    Extract the first markdown table from text and return it as a pandas DataFrame.
    """
    # Collect the first run of consecutive lines that start and end with "|"
    table_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|") and len(stripped) > 1:
            table_lines.append(stripped)
        elif table_lines:
            break

    if not table_lines:
        raise ValueError("⚠️ No markdown-style table found.")

    # Drop the |---|---| separator row so it does not end up as a data row
    table_lines = [row for row in table_lines if not re.fullmatch(r"\|[\s:|-]+\|", row)]

    # Use pandas to read the table; the regex separator also strips the
    # whitespace padding around each cell and column name
    df = pd.read_csv(StringIO("\n".join(table_lines)), sep=r"\s*\|\s*", engine='python')

    # Remove unnamed empty columns that result from the leading/trailing "|"
    df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
    return df

# Usage: assuming final_output is your model response string
try:
    df = extract_markdown_table(final_output)
    df.to_csv("hypotheses_summary_o3.csv", index=False)
    print("✅ Table extracted and saved to 'hypotheses_summary_o3.csv'")
except Exception as e:
    print("❌ Failed to extract table:", e)


✅ Table extracted and saved to 'hypotheses_summary_o3.csv'
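
To see what the extraction step does without running the full pipeline, the parsing can be exercised on a small hand-written table. The sketch below is dependency-free and independent of the function above. `parse_markdown_table` is a hypothetical name, and the sample rows are placeholders, not model output.

```python
import re

def parse_markdown_table(text):
    """Parse the first markdown table in text into a list of row dicts."""
    # Collect the first run of consecutive lines that start and end with "|"
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|") and len(stripped) > 1:
            lines.append(stripped)
        elif lines:
            break
    if len(lines) < 2:
        raise ValueError("No markdown-style table found.")

    def split_row(row):
        # Drop the outer pipes, then split on the inner ones
        return [cell.strip() for cell in row[1:-1].split("|")]

    headers = split_row(lines[0])
    rows = []
    for row in lines[1:]:
        if re.fullmatch(r"\|[\s:|-]+\|", row):
            continue  # skip the |---|---| separator row
        cells = split_row(row)
        if len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows

# Placeholder table for illustration only (not real model output)
sample = """Some introductory text.

| Material | Predicted ZT | Status |
|----------|--------------|--------|
| A        | 1.0          | kept   |
| B        |              | discarded |

Trailing text.
"""
rows = parse_markdown_table(sample)
print(rows[1])  # {'Material': 'B', 'Predicted ZT': '', 'Status': 'discarded'}
```

Empty cells are kept as empty strings, so a row with a missing value stays aligned with the header instead of being dropped.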
