### 🧠 Guidance

1. Replace `"YOUR_API_KEY"` with your own OpenAI API key.
2. In `tot_sequence` (the tree-of-thought prompt sequence), you can switch individual steps on or off.
3. We use `openai==1.68.2`.
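
Points 1 and 2 can be sketched without touching the API. This is a minimal sketch, assuming the key is kept in an environment variable named `OPENAI_API_KEY` (an assumption; the notebook below hardcodes the key instead). The names `load_api_key`, `step_switches`, and `enabled_steps` are hypothetical helpers, not part of the notebook.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Hypothetical on/off switches, one per step of the prompt sequence
step_switches = {
    "Setting the rules for evolutionary tree of thought": True,
    "Retrieve more materials": False,  # this step is switched off
    "summarising_all_hypotheses": True,
}


def enabled_steps(switches):
    """Return the names of the steps that are switched on, in their original order."""
    return [name for name, on in switches.items() if on]
```

The list returned by `enabled_steps(step_switches)` could then be passed wherever `tot_sequence` is used, and `load_api_key()` could supply the `api_key` argument of the client.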

In [1]:
import pandas as pd
import re
from openai import OpenAI
import time
from datetime import datetime
client = OpenAI(api_key = "YOUR_API_KEY")

# A collection of prompts for different modules
def instruction_prompts(module_name):
    if module_name == "Setting the rules for evolutionary tree of thought":
        prompts = ["""
                   Let's think step by step. 
                   Imagine three different experts are answering this question. All experts will write down 1 step of their thinking,
                   then share it with the group, Then all experts will go on to the next step, etc.
                   If any expert realises they're wrong at any point then they leave.

                   The task is to find out five best known high performance thermoelectric materials and in order to confirm they are the best ones, you need to compare these materials according to their design strategies, thermoelectric efficiencies, and other key factors that you think fit. 

                   I will give you some background information here. Thermoelectric efficiency is governed by ZT = (S²σT)/κ. Traditional materials rely on scarce or toxic elements. Alternatives must optimise electronic transport properties, phonon engineering, and nanostructuring. Conventional approaches focus on band engineering, phonon-glass electron-crystal (PGEC) concepts, and alloy disorder, but breakthrough materials require novel strategies. 
                """]
    elif module_name == "Retrieve more materials":
        prompts = ["""
                   Let's think step by step and think outside of the box. 
                   Imagine three different experts are answering this question in Round 2.
                   All experts will write down 1 step of their thinking, then share it with the group, Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave.

                   The task is to find out five best known high performance thermoelectric materials and in order to confirm they are the best ones, you need to compare these materials according to their design strategies, thermoelectric efficiencies, and other key factors that you think fit. 

                   I will give you some background information here. Thermoelectric efficiency is governed by ZT = (S²σT)/κ. Traditional materials rely on scarce or toxic elements. Alternatives must optimise electronic transport properties, phonon engineering, and nanostructuring. Conventional approaches focus on band engineering, phonon-glass electron-crystal (PGEC) concepts, and alloy disorder, but breakthrough materials require novel strategies.

                   You need to exclude the materials you mentioned previously. 
                   """
                   ]
    elif module_name == "summarising_all_hypotheses":
        prompts = ["""
                   In the comprehensive table, construct a dictionary with Material/Class as the key and chemical formula(s) as the value. Then, in the comprehensive table, construct a dictionary with chemical formula(s) as the key and the highest reported ZT value for each chemical formula as the value. You might need to check the actual highest reported ZT values for them. For each reported value, please give the reference URL where you retrieved the data. 
                   """]
    elif module_name == "one-shot":
        prompts = ["""
                   Give me the top 5 best performed thermoelectric materials, with their ZT values, use a table to present them.
                   """]
    elif module_name == "cot":
        prompts = ["""
                   Let's think step by step. Give me the top 5 best performed thermoelectric materials, with their ZT values, use a table to present them.
                   """]
    else:
        raise NotImplementedError
    
    return prompts

# Call the OpenAI API: the input is a prompt, the output is the model's response.
# model_name must be one of the keys in model_map below ("gpt-4o", "o1", or "o3-mini").

def llm_generation(prompt, model_name, client=client, temperature=1.0):
    # Define the model mapping
    model_map = {
        "gpt-4o": "gpt-4o",  # OpenAI's GPT-4o ("omni")
        "o1": "o1",  # OpenAI reasoning model
        "o3-mini": "o3-mini"  # OpenAI reasoning model
    }

    # Get actual model string for API
    model = model_map.get(model_name)
    if model is None:
        raise ValueError(f"Unsupported model name: {model_name}")

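    # Retry on failure (e.g. rate limits), giving up after max_retries attempts
    max_retries = 5
    for attempt in range(1, max_retries + 1):
        try:
            completion = client.chat.completions.create(
                model=model,
                temperature=temperature,
                #max_tokens=1500,
                messages=[
                    {"role": "user", "content": prompt}
                ]
            )
            return completion.choices[0].message.content
        except Exception as e:
            print(f"OpenAI request failed (attempt {attempt}/{max_retries}):", e)
            if attempt == max_retries:
                raise  # Surface the error instead of looping forever
            time.sleep(0.5 * 2 ** (attempt - 1))  # Exponential backoff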

def extract_table_from_output(text):
    """
    Extracts the first markdown-style table from text and returns it as a pandas DataFrame.
    """
    # Match a header row, a separator row (dashes, optional alignment colons), then the data rows
    table_match = re.search(r"(\|.*?\|\n)(\|[-:| ]+\|\n)((\|.*?\|\n?)+)", text)
    
    if not table_match:
        raise ValueError("No markdown-style table found in the output.")

    header_line = table_match.group(1)
    separator_line = table_match.group(2)
    data_lines = table_match.group(3)

    # Split headers: drop the outer pipes, then split on the inner ones
    headers = [h.strip() for h in header_line.strip()[1:-1].split('|')]
    
    # Split each row into columns, keeping empty cells so columns stay aligned
    rows = []
    for line in data_lines.strip().split('\n'):
        cols = [c.strip() for c in line.strip()[1:-1].split('|')]
        if len(cols) == len(headers):
            rows.append(cols)

    df = pd.DataFrame(rows, columns=headers)
    return df

def run_sequential_prompt_pipeline(module_names, model_name="gpt-4o", temperature=1.0):
    context = ""  # Start with empty context
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

    # Create a log filename with timestamp
    log_filename = f"_log_{timestamp}.md"
    filename = model_name + log_filename
    with open(filename, "w", encoding="utf-8") as f:
        for module_name in module_names:
            print(f"\n🧠 Running module: {module_name}")
            module_prompt = instruction_prompts(module_name)[0]

            # Combine previous context and current prompt
            full_prompt = context + "\n\n" + module_prompt

            # Generate response
            response = llm_generation(
                prompt=full_prompt,
                model_name=model_name,
                temperature=temperature,
                client=client
            )

            # Save the intermediate result to the log file
            f.write(f"===== {module_name} =====\n")
            f.write(response + "\n\n")
            #print(f"✅ Response from {module_name}:\n{response[5:]}...\n")

            # Carry only this response forward as context for the next module
            # (earlier responses are not accumulated)
            context = response

    return context  # Final output from the last module

# Example: run a single module directly
# prompt = instruction_prompts("Setting the rules for evolutionary tree of thought")[0]

# response = llm_generation(
#     prompt=prompt,
#     model_name="gpt-4o",  # or "o1" / "o3-mini"
#     temperature=1.0,
#     client=client
# )

# print(response)
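
The way `run_sequential_prompt_pipeline` threads context between modules can be checked without any API calls. The sketch below is an illustration under stated assumptions: `chain_prompts` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in for `llm_generation` that only reports the prompt length. It reproduces the same rule as the pipeline above: each prompt is prefixed with the previous response only, not with the full history.

```python
def chain_prompts(prompts, generate):
    """Feed each prompt to `generate`, prefixing it with the previous response.

    Only the most recent response is carried forward, which mirrors the
    context handling in run_sequential_prompt_pipeline.
    """
    context = ""
    sent = []  # every full prompt that was passed to `generate`
    for prompt in prompts:
        full_prompt = context + "\n\n" + prompt
        sent.append(full_prompt)
        context = generate(full_prompt)
    return context, sent


def fake_llm(prompt):
    # Stand-in for llm_generation: just reports how long the prompt was
    return f"[reply to {len(prompt)} chars]"


final, sent = chain_prompts(["first step", "second step", "third step"], fake_llm)
for i, full_prompt in enumerate(sent, start=1):
    print(f"--- prompt {i} ---{full_prompt}")
```

One consequence worth noting: the third prompt sees the reply to the second prompt but not the reply to the first, so a final summarising step only has the immediately preceding response to work from.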


In [20]:
tot_sequence = [
    "Setting the rules for evolutionary tree of thought",
    "Retrieve more materials",
    "summarising_all_hypotheses"
] # The sequence of modules to run, you can turn on/off any module

oneshot = [
    "one-shot"
]

cot = [
    "cot"
]

final_output = run_sequential_prompt_pipeline(tot_sequence, model_name="gpt-4o", temperature=1.0)

print("\n🚀 Final Output from the entire pipeline:\n")
print(final_output)



🧠 Running module: Setting the rules for evolutionary tree of thought

🧠 Running module: Retrieve more materials

🧠 Running module: summarising_all_hypotheses

🚀 Final Output from the entire pipeline:

To construct the requested dictionaries, I will first summarize the materials and their chemical formulas, and then look up the highest reported ZT values along with their sources. Here's how it can be structured:

### Dictionary Construction

#### Dictionary 1: Material/Class as Key and Chemical Formula(s) as Value

```python
materials_dict = {
    "Bismuth-Antimony Alloys": ["Bi-Sb"],
    "Topological Insulators": ["Bi₂Se₃"],
    "Calcium Cobalt Oxides": ["CaxCoO₂"]
}
```

#### Dictionary 2: Chemical Formula(s) as Key and Highest Reported ZT Values as the Value

I will now search for the highest reported ZT values and provide the references.

1. **Bi-Sb (Bismuth-Antimony Alloys)**
   - **Highest ZT Value:** Approximately 1.4.
   - **Reference:** Goldsmid, H. J. "Bismuth-Antimony Alloys

In [3]:
## Extract table from the final output and save it to a CSV file for the tot_sequence prompt
import pandas as pd
import re
from io import StringIO

def extract_markdown_table(text):
    """
    Extract a markdown table from text and return it as a pandas DataFrame.
    """
    # Find runs of consecutive lines that start and end with "|"
    # (the last line of the text may lack a trailing newline)
    tables = re.findall(r"((?:\|.+\|[ \t]*\n?)+)", text)

    if not tables:
        raise ValueError("⚠️ No markdown-style table found.")

    # Take the first match
    table_text = tables[0]

    # Clean up extra formatting: trim each line and the whitespace around each pipe
    lines = [re.sub(r"\s*\|\s*", "|", line.strip()) for line in table_text.strip().splitlines()]

    # Drop the |---|---| separator row so it is not parsed as a data row
    lines = [line for line in lines if not re.fullmatch(r"\|[-:| ]+\|", line)]
    table_text = "\n".join(lines)

    # Use pandas to read the markdown table
    df = pd.read_csv(StringIO(table_text), sep="|", engine='python', skipinitialspace=True)

    # Remove unnamed empty columns that result from the leading and trailing pipes
    df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
    return df

# Usage: assuming final_output is your model response string
csv_path = "hypotheses_summary.csv"  # Single source of truth for the output file name
try:
    df = extract_markdown_table(final_output)
    df.to_csv(csv_path, index=False)
    print(f"✅ Table extracted and saved to '{csv_path}'")
except Exception as e:
    print("❌ Failed to extract table:", e)


✅ Table extracted and saved to 'hypotheses_summary.csv'
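
To sanity-check what a table extractor should return, it can help to run a small parser on a known input. The following is a dependency-free sketch, not the notebook's own method: `parse_markdown_table` is a hypothetical helper, and the sample table contains placeholder names and values rather than real results. It shows the two details that matter most: the `|---|---|` separator row is skipped, and empty cells are kept so columns stay aligned.

```python
import re


def parse_markdown_table(text):
    """Return (headers, rows) for the first pipe-delimited table in `text`."""
    table_lines = []
    for line in text.splitlines():
        line = line.strip()
        if len(line) > 1 and line.startswith("|") and line.endswith("|"):
            table_lines.append(line)
        elif table_lines:
            break  # the first table has ended
    if len(table_lines) < 2:
        raise ValueError("No markdown-style table found.")

    def split_row(line):
        # Drop the outer pipes, then split on the inner ones (empty cells are kept)
        return [cell.strip() for cell in line[1:-1].split("|")]

    headers = split_row(table_lines[0])
    rows = [
        split_row(line)
        for line in table_lines[1:]
        if not re.fullmatch(r"\|[-:| ]+\|", line)  # skip the separator row
    ]
    return headers, rows


# Placeholder table: the names and values are illustrative, not real results
sample = """Some text before the table.

| Material | Formula | ZT |
|----------|:-------:|----|
| Example A | X2Y3 | 1.0 |
| Example B |  | 0.5 |

Some text after the table."""

headers, rows = parse_markdown_table(sample)
print(headers)
print(rows)
```

Comparing this output with the DataFrame produced by `extract_markdown_table` on the same string is a quick way to spot a separator row or a shifted column in the saved CSV.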
