# Task 2 – Inferencing: Linker Length vs. Pore Size

In this notebook, we test whether a knowledge graph-augmented LLM can infer scientific relationships between structural features and material properties.

The LLM is given real examples retrieved from the knowledge graph: MOFs with their linkers (as SMILES), pore sizes, and water stability labels. Rather than relying on trends memorized during training, the LLM is prompted to reason from these examples and infer how linker length and pore size relate to water stability. This setup is meant to showcase the ability of KG-augmented models to support grounded scientific intuition.

## Accessing the KG

In [1]:
from src.MOF_ChemUnity.Agents.QueryAgent import QueryGenerationAgent
import pandas as pd

In [10]:
# Connect to graph
agent = QueryGenerationAgent()
data = agent.run_full_query("List 30 MOFs with their names (every name delimited by <|>), smiles linker, pore size, and water stability. Only include MOFs with a value for each field")

✅ Successfully connected to the Neo4j database.


| | m.refcode | mof_names | smiles_linker | pore_size | water_stability |
|---|---|---|---|---|---|
| 0 | CELZIE | `3<\|>CAU-10-OCH3` | `['COc1cc(cc(c1)C(=O)[O-])C(=O)[O-]']` | 2.5 | Stable |
| 1 | ESIGES | `NTHU-2` | `['[O-]C(=O)c1ccc(cc1)C(=O)[O-]', '[O]P(=O)(O)[...` | 1.36, 21.8 | Unstable |
| 2 | YATDUV | `NH4@ZnPzC<\|>NH4·Zn3OH(PzC)3` | `['[O-]C(=O)C1=CN=N[CH]1', '[O-]C(=O)C1=C[N]N=C1']` | 1 | Unstable |
| 3 | QOVDEL | `Ni-STA-12<\|>St Andrews nanoporous material-12` | `['[O]P(=O)([C]N1[C][C]N([C][C]1)[C]P(=O)([O])[...` | 1 | Stable |
| 4 | QOVDAH | `Ni-STA-12 (hydrated form)` | `['[O]P(=O)([C]N1[C][C]N([C][C]1)[C]P(=O)([O])[...` | 1 | Stable |
| 5 | SISZEC | `[Cd3(C8H3SO7)2(C10H8N4)3(C3H7NO)2](C3H7NO)2·(...` | `['[O-]C(=O)c1cc(cc(c1)S([O])([O])[O])C(=O)[O-]...` | 5.0 × 8.0 | Stable |
| 6 | OJAWII | `Fe-ICR-6<\|>Fe-ICR-2<\|>Fe-ICR-4<\|>Fe-ICR-7` | `['[O]P(=O)(c1ccccc1)c1ccc(cc1)c1ccc(cc1)P(=O)(...` | 2.16 | Unstable |
| 7 | KETHEY01 | `compound 1<\|>[Cd(pyip)(dmf)]` | `['[O-]C(=O)c1cc(cc(c1)c1ccncc1)C(=O)[O-]']` | 8.2 x 8.2 | Stable |
| 8 | CIKCOQ | `compound 2<\|>[Cd(pyip)(doa)]` | `['[O-]C(=O)c1cc(cc(c1)c1ccncc1)C(=O)[O-]']` | 8.0 × 11.3, 7.3 × 11.3, 7.3 × 10.2, 7.3 × 5.1 | Stable |
| 9 | GEDTAM | `UCY-2` | `['[O-]C(=O)c1ccc(cc1)[CH][N]c1cc(cc(c1)C(=O)[O...` | 3-4 | Stable |
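The `pore_size` column retrieved from the KG is free text: single values (`2.5`), several pores (`1.36, 21.8`), window dimensions (`5.0 × 8.0`, `8.2 x 8.2`) and ranges (`3-4`). Reasoning about pore size numerically needs one convention first. The sketch below uses a hypothetical helper, `parse_pore_size`, that keeps the largest number in each entry; that convention is an assumption, and since the table does not state units, the parsed values keep whatever unit the KG uses.

```python
import re

# Integers or decimals, e.g. "1", "2.16", "21.8".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_pore_size(raw):
    """Return the largest number in a free-text pore_size entry, or None.

    Hypothetical helper. Entries may list several pores ("1.36, 21.8"),
    window dimensions ("5.0 × 8.0") or ranges ("3-4"); keeping the maximum
    is one convention among several. Units are left as stored in the KG.
    """
    values = [float(match) for match in NUMBER.findall(str(raw))]
    return max(values) if values else None


for raw in ["2.5", "1.36, 21.8", "5.0 × 8.0", "8.2 x 8.2", "3-4"]:
    print(repr(raw), "->", parse_pore_size(raw))
```

In this notebook it could be applied as `data["pore_size"].map(parse_pore_size)`. Keeping the smallest value or the mean would be equally defensible, depending on the question being asked.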


## Responses

First, here is the question that both LLMs will see:

In [None]:
question = "How does pore diameter and linker length affect the water stability of MOFs?"

### Graph-RAG Enabled GPT-4o

In [12]:
# Auto-format RAG prompt from a DataFrame
def generate_few_shot_prompt(data, question, max_examples=12):
    # Limit to first N examples
    subset = data.head(max_examples)

    # Start with task-agnostic intro
    lines = ["Here are some MOFs with their associated properties:"]

    for _, row in subset.iterrows():
        props = []
        for col in data.columns:
            if "refcode" in col.lower():
                continue  # skip any column containing 'refcode'
            value = row[col]
            value_str = str(value).strip()
            if value_str.startswith('[') and value_str.endswith(']'):
                value_str = value_str[1:-1]  # remove brackets
            props.append(f"{col} = {value_str}")
        lines.append("- " + ", ".join(props))

    context = "\n".join(lines)
    rag_prompt = f"{context}\n\nUse the examples above to help answer the following question:\n{question.strip()}"
    return rag_prompt


# Now use the function to generate the prompt
rag_prompt = generate_few_shot_prompt(data, question)
print(rag_prompt)

Here are some MOFs with their associated properties:
- mof_names = 3<|>CAU-10-OCH3, smiles_linker = 'COc1cc(cc(c1)C(=O)[O-])C(=O)[O-]', pore_size = 2.5, water_stability = Stable
- mof_names = NTHU-2, smiles_linker = '[O-]C(=O)c1ccc(cc1)C(=O)[O-]', '[O]P(=O)(O)[O]', pore_size = 1.36, 21.8, water_stability = Unstable
- mof_names = NH4@ZnPzC<|>NH4·Zn3OH(PzC)3, smiles_linker = '[O-]C(=O)C1=CN=N[CH]1', '[O-]C(=O)C1=C[N]N=C1', pore_size = 1, water_stability = Unstable
- mof_names = Ni-STA-12<|>St Andrews nanoporous material-12, smiles_linker = '[O]P(=O)([C]N1[C][C]N([C][C]1)[C]P(=O)([O])[O])[O]', pore_size = 1, water_stability = Stable
- mof_names = Ni-STA-12 (hydrated form), smiles_linker = '[O]P(=O)([C]N1[C][C]N([C][C]1)[C]P(=O)([O])[O])[O]', pore_size = 1, water_stability = Stable
- mof_names = [Cd3(C8H3SO7)2(C10H8N4)3(C3H7NO)2](C3H7NO)2·(CH3OH)4<|>Compound I, smiles_linker = '[O-]C(=O)c1cc(cc(c1)S([O])([O])[O])C(=O)[O-]', 'n1ccc(cc1)[N][N]c1ccncc1', pore_size = 5.0 × 8.0, water_stabi
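The prompt printed above gives the model linker SMILES rather than linker lengths, so any statement it makes about linker length rests on its own reading of those strings. As a checkable point of comparison, the sketch below computes a crude size proxy, the number of heavy (non-hydrogen) atoms per linker, by tokenizing SMILES with a regular expression. The helper names `heavy_atom_count` and `linker_sizes` are hypothetical, the count measures size rather than geometric length, and a cheminformatics toolkit such as RDKit would be the more rigorous route.

```python
import ast
import re

# One SMILES atom token: a bracket atom such as [O-] or [CH], or an
# organic-subset atom. Br and Cl come before the single letters so that
# they are not split into B + r or C + l.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(smiles):
    """Count atom tokens in a SMILES string as a crude linker-size proxy.

    Assumes hydrogens are implicit; an explicit bracket hydrogen such as [H]
    would be counted like any other atom.
    """
    return len(ATOM_TOKEN.findall(smiles))


def linker_sizes(cell):
    """Return the heavy-atom count of each linker in a smiles_linker cell.

    Assumes the cell is either a Python list of SMILES strings or the string
    form of such a list, as it appears in the retrieved table.
    """
    smiles_list = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return [heavy_atom_count(s) for s in smiles_list]


print(linker_sizes("['[O-]C(=O)c1ccc(cc1)C(=O)[O-]']"))        # [12]
print(linker_sizes(["[O-]C(=O)c1cc(cc(c1)c1ccncc1)C(=O)[O-]"]))  # [18]
```

The two counts match the formulas of terephthalate (C8O4) and 5-(4-pyridyl)isophthalate (C13NO4), which is a quick sanity check on the tokenizer.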

In [13]:
from openai import OpenAI
# Initialize the client; it reads OPENAI_API_KEY from the environment
# (load a .env file first, e.g. with python-dotenv, if the key is stored there)
client = OpenAI()

# Send the RAG prompt to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": rag_prompt}
    ],
    temperature=0.5  # moderate (below the API default of 1.0): some variation while keeping reasoning focused
)

# Extract and print the response
rag_answer = response.choices[0].message.content.strip()
print(rag_answer)


The water stability of Metal-Organic Frameworks (MOFs) is influenced by several factors, including pore diameter and linker length, as seen in the examples provided. Here's an analysis based on these factors:

1. **Pore Diameter**:
   - **Smaller Pores**: MOFs with smaller pore diameters, such as NTHU-2 (1.36 Å), NH4@ZnPzC (1 Å), and Ni-STA-12 (1 Å), tend to be less stable in water. This may be due to the higher surface area to volume ratio, which can lead to increased interaction with water molecules, potentially destabilizing the framework.
   - **Larger Pores**: In contrast, MOFs with larger pores, such as compound 1 (8.2 × 8.2 Å) and compound 2 (8.0 × 11.3 Å, etc.), are noted to be stable in water. Larger pores may facilitate better diffusion and less interaction per unit area with water, reducing the risk of framework collapse.

2. **Linker Length and Structure**:
   - **Shorter Linkers**: Shorter and simpler linkers, such as those seen in unstable MOFs like NTHU-2 and Fe-ICR-6, m
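Because the examples come from the KG, the MOFs an answer cites can be checked against the retrieved labels. For instance, the answer above lists Ni-STA-12 among the small-pore MOFs that "tend to be less stable in water", while the retrieved row for Ni-STA-12 (QOVDEL) is labeled Stable. The sketch below is a minimal lookup under stated assumptions: a hypothetical helper, `stability_of`, splits `mof_names` on the `<|>` delimiter requested in the query and matches aliases exactly, and the three-row `sample` DataFrame is copied from the retrieved table rather than re-queried.

```python
import pandas as pd


def stability_of(data, name):
    """Return the water_stability labels of every row with `name` as an alias.

    mof_names stores several aliases joined by "<|>", so the string is split
    with plain str.split (not a regex) and each alias is compared exactly.
    """
    aliases = data["mof_names"].astype(str).apply(
        lambda names: [alias.strip() for alias in names.split("<|>")]
    )
    mask = aliases.apply(lambda alias_list: name in alias_list)
    return data.loc[mask, "water_stability"].tolist()


# Three rows copied from the retrieved table (refcodes ESIGES, YATDUV, QOVDEL).
sample = pd.DataFrame(
    {
        "mof_names": [
            "NTHU-2",
            "NH4@ZnPzC<|>NH4·Zn3OH(PzC)3",
            "Ni-STA-12<|>St Andrews nanoporous material-12",
        ],
        "water_stability": ["Unstable", "Unstable", "Stable"],
    }
)

# MOFs that the RAG answer groups together as "less stable in water".
for cited in ["NTHU-2", "NH4@ZnPzC", "Ni-STA-12"]:
    print(cited, stability_of(sample, cited))
```

Exact alias matching is a deliberate choice: on the full `data`, `stability_of(data, "Ni-STA-12")` would not pick up the separate "Ni-STA-12 (hydrated form)" row, and partial names such as "Ni-STA" return nothing instead of a misleading substring hit.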

### Vanilla GPT-4o

In [14]:
# Send the bare question (no KG context) to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": question}
    ],
    temperature=0.5  # same moderate temperature as the RAG run, so only the context differs
)

# Extract and print the response
vanilla_answer = response.choices[0].message.content.strip()
print(vanilla_answer)

The water stability of metal-organic frameworks (MOFs) is a critical consideration for their practical applications, particularly in environments where moisture is present. The pore diameter and linker length are two structural parameters that can significantly influence the water stability of MOFs.

1. **Pore Diameter:**

   - **Larger Pore Diameters:** MOFs with larger pore diameters typically have a higher surface area exposed to water molecules. This can lead to increased hydrolytic degradation if the metal-ligand bonds are not inherently stable against water. Larger pores can also facilitate the ingress of water molecules, potentially accelerating the breakdown of the framework if it is not robust enough.

   - **Smaller Pore Diameters:** Smaller pore diameters can limit the access of water molecules to the interior of the framework, potentially enhancing stability. However, this is highly dependent on the nature of the metal-ligand bonds and the overall framework stability. If th