# Task 2a – Inferencing: Linker Length and Pore Size vs Water Stability

In this notebook, we test whether a knowledge graph-augmented LLM can infer scientific relationships between structural features and material properties.

The LLM is provided with real examples retrieved from the knowledge graph: MOFs with known linkers, pore sizes, and water stabilities. Rather than relying on memorized trends, the LLM is prompted to reason from these examples and infer how linker length and pore size relate to water stability. This setup showcases the ability of KG-augmented models to support grounded scientific intuition.

## Accessing the KG

In [1]:
from src.MOF_ChemUnity.Agents.QueryAgent import QueryGenerationAgent
import pandas as pd

In [3]:
# Connect to graph
agent = QueryGenerationAgent()
data = agent.run_full_query("List 30 MOFs with their names (every name delimited by <|>), computational smiles linker, pore size, and water stability. Only include MOFs with a value for each field. Include references for all properties")

✅ Connected to Neo4j.




Unnamed: 0,m.refcode,mof_names,smiles_linker,pore_size,pore_size_reference,water_stability,water_stability_reference
0,CELZIE,3<|>CAU-10-OCH3,5-methoxyisophthalic acid,2.5,10.1021/cm3025445,Stable,10.1021/cm3025445
1,ESIGES,NTHU-2,"C13N2H14 (4,4‘-trimethylene dipyridine, tmdp),...","1.36, 21.8",10.1021/ja0390146,Unstable,10.1021/ja0390146
2,YATDUV,NH4@ZnPzC<|>NH4·Zn3OH(PzC)3,H2PzC (1H-pyrazole-4-carboxylic acid),1,10.1021/acs.cgd.7b00346,Unstable,10.1021/acs.cgd.7b00346
3,QOVDEL,Ni-STA-12<|>St Andrews nanoporous material-12,"N,N′-piperazinebismethylenephosphonic acid",1,10.1021/ja804936z,Stable,10.1021/ja804936z
4,QOVDAH,Ni-STA-12 (hydrated form),"N,N′-piperazinebismethylenephosphonic acid",1,10.1021/ja804936z,Stable,10.1021/ja804936z
5,SISZEC,[Cd3(C8H3SO7)2(C10H8N4)3(C3H7NO)2](C3H7NO)2·(...,"ABPY (4,4′-azopyridine), NaSIPA (sodium salt o...",5.0 × 8.0,10.1021/acs.cgd.8b01327,Stable,10.1021/acs.cgd.8b01327
6,OJAWII,Fe-ICR-6<|>Fe-ICR-2<|>Fe-ICR-4<|>Fe-ICR-7,H2BBP(Me),2.16,10.1021/acs.inorgchem.0c00201,Unstable,10.1021/acs.inorgchem.0c00201
7,KETHEY01,compound 1<|>[Cd(pyip)(dmf)],H2pyip (5-(pyridine-4-yl)isophthalic acid),8.2 x 8.2,10.1021/cg400486q,Stable,10.1021/cg400486q
8,CIKCOQ,compound 2<|>[Cd(pyip)(doa)],H2pyip (5-(pyridine-4-yl)isophthalic acid),"8.0 × 11.3, 7.3 × 11.3, 7.3 × 10.2, 7.3 × 5.1",10.1021/cg400486q,Stable,10.1021/cg400486q
9,GEDTAM,UCY-2,H3CIP (5-(4-carboxybenzylideneamino)isophthali...,3-4,10.1021/ic3005085,Stable,10.1021/ic3005085


In [4]:
# References for all properties are the same, so we can merge them into one column
# Drop the 'pore_size_reference' column
data = data.drop(columns=['pore_size_reference'])

# Rename 'water_stability_reference' column to 'reference'
data = data.rename(columns={'water_stability_reference': 'reference'})
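
The merge above relies on both reference columns agreeing for every MOF. A minimal sketch of how that could be checked before dropping a column is shown here. The helper name `references_match` and the two-row toy frame are assumptions made for illustration (the DOIs are copied from the table above); in the notebook the check would run on `data` before the `drop` call.

```python
import pandas as pd


def references_match(df: pd.DataFrame, col_a: str, col_b: str) -> bool:
    """Return True when two reference columns agree on every row."""
    a = df[col_a].astype(str).str.strip()
    b = df[col_b].astype(str).str.strip()
    return bool((a == b).all())


# Toy stand-in for `data`, using two DOIs from the table above
toy = pd.DataFrame({
    "pore_size_reference": ["10.1021/cm3025445", "10.1021/ja0390146"],
    "water_stability_reference": ["10.1021/cm3025445", "10.1021/ja0390146"],
})

if references_match(toy, "pore_size_reference", "water_stability_reference"):
    # Safe to keep a single merged column
    merged = toy.drop(columns=["pore_size_reference"]).rename(
        columns={"water_stability_reference": "reference"}
    )
    print(list(merged.columns))
```

If the check fails, keeping both columns (and both citations in the prompt) would avoid attributing a property to the wrong paper.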

## Responses

First, here is the question both LLMs will see:

In [5]:
question = "How does pore volume and linker length affect the water stability of MOFs? Justify your answer with chemistry reasoning. Please keep your response concise but informative"

### Graph-RAG Enabled GPT-4o

In [34]:
# Auto-format RAG prompt from a DataFrame
def generate_few_shot_prompt(data, question, max_examples=15):
    # Limit to first N examples
    subset = data.head(max_examples)

    # Start with task-agnostic intro
    lines = ["Here are some MOFs with their associated properties:"]
    for _, row in subset.iterrows():
        props = []
        for col in data.columns:
            if "refcode" in col.lower() or "mof_name" in col.lower():
                continue  # skip identifier columns ('refcode', 'mof_names')
            value = row[col]
            value_str = str(value).strip()
            if value_str.startswith('[') and value_str.endswith(']'):
                value_str = value_str[1:-1]  # remove brackets
            props.append(f"{col} = {value_str}")
        lines.append("- " + ", ".join(props))

    context = "\n".join(lines)
    rag_prompt = f"{context}\n\nUse your expert MOF knowledge to make use of this data in your response. Any time you explicitly use this data in your response, include the reference in brackets beside it. Note, the data you have access to is only visible to you \n{question.strip()}"
    return rag_prompt


# Now use the function to generate the prompt
rag_prompt = generate_few_shot_prompt(data, question)
print(rag_prompt)

Here are some MOFs with their associated properties:
- smiles_linker = 5-methoxyisophthalic acid, pore_size = 2.5, water_stability = Stable, reference = 10.1021/cm3025445
- smiles_linker = C13N2H14 (4,4‘-trimethylene dipyridine, tmdp), p-C6H4(COOH)2 (terephthalic acid, BDC), pore_size = 1.36, 21.8, water_stability = Unstable, reference = 10.1021/ja0390146
- smiles_linker = H2PzC (1H-pyrazole-4-carboxylic acid), pore_size = 1, water_stability = Unstable, reference = 10.1021/acs.cgd.7b00346
- smiles_linker = N,N′-piperazinebismethylenephosphonic acid, pore_size = 1, water_stability = Stable, reference = 10.1021/ja804936z
- smiles_linker = N,N′-piperazinebismethylenephosphonic acid, pore_size = 1, water_stability = Stable, reference = 10.1021/ja804936z
- smiles_linker = ABPY (4,4′-azopyridine), NaSIPA (sodium salt of 5-sulfoisopthalic acid), pore_size = 5.0 × 8.0, water_stability = Stable, reference = 10.1021/acs.cgd.8b01327
- smiles_linker = H2BBP(Me), pore_size = 2.16, water_stability =

In [35]:
from openai import OpenAI
# Initialize client (reads OPENAI_API_KEY from the environment; load .env beforehand if the key is stored there)
client = OpenAI()

# Send the KG-augmented prompt to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": rag_prompt}
    ],
    temperature=0.5
)

# Extract and print the response
rag_answer = response.choices[0].message.content.strip()
print(rag_answer)


The water stability of Metal-Organic Frameworks (MOFs) is influenced by several factors, including pore volume and linker length. Here's how these factors interplay:

1. **Pore Volume**: 
   - Larger pore volumes can sometimes lead to reduced water stability. This is because larger pores may allow more water molecules to penetrate the framework, potentially interacting with and disrupting metal-ligand bonds. For example, the MOF with a large pore size of 8.1 Å (pymo) is unstable in water [10.1021/ja002624a]. Conversely, smaller pores might restrict water access, enhancing stability, as seen in MOFs with smaller pore sizes like 1 Å (N,N′-piperazinebismethylenephosphonic acid), which are stable [10.1021/ja804936z].

2. **Linker Length**:
   - Longer linkers usually increase the flexibility of the framework, which can make it more susceptible to structural distortion upon water adsorption. This is evident in some MOFs with longer linkers, which are unstable, such as H2BBP(Me) with a pore 
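
Because the prompt asks the model to put a reference in brackets next to any data it uses, the answer can be checked against the context that was actually supplied. The following is a rough sketch of such a check, not part of the original notebook: the helper `split_citations` and the regular expression are assumptions, and the toy context deliberately holds a single DOI, so its output says nothing about the real run. In the notebook, the provided set would be built from the rows passed to the prompt, for example `set(data.head(15)["reference"])`.

```python
import re

# Matches a DOI written inside square brackets, e.g. [10.1021/ja804936z]
DOI_IN_BRACKETS = re.compile(r"\[(10\.\d{4,9}/[^\]\s]+)\]")


def split_citations(answer: str, provided: set[str]) -> tuple[list[str], list[str]]:
    """Split cited DOIs into those present in the supplied context and those absent."""
    cited = DOI_IN_BRACKETS.findall(answer)
    grounded = [doi for doi in cited if doi in provided]
    ungrounded = [doi for doi in cited if doi not in provided]
    return grounded, ungrounded


# Excerpt paraphrased from the answer above; the context set is a toy example
excerpt = (
    "A pore size of 8.1 Å is unstable in water [10.1021/ja002624a]. "
    "Pores of 1 Å are stable [10.1021/ja804936z]."
)
toy_context = {"10.1021/ja804936z"}

grounded, ungrounded = split_citations(excerpt, toy_context)
print("in context:", grounded)
print("not in context:", ungrounded)
```

Any DOI reported as absent from the real context would be worth inspecting by hand, since it may come from the model's own recall rather than from the knowledge graph.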

### Vanilla GPT-4o

In [21]:
# Send the bare question (no KG context) to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": question}
    ],
    temperature=0.5  # slightly higher to allow some reasoning flexibility
)

# Extract and print the response
vanilla_answer = response.choices[0].message.content.strip()
print(vanilla_answer)

The water stability of Metal-Organic Frameworks (MOFs) is influenced by both pore volume and linker length due to their impact on the structure and chemical environment of the MOF.

1. **Pore Volume:**
   - **Larger Pores:** MOFs with larger pore volumes typically have greater surface areas exposed to water, which can increase the likelihood of hydrolytic degradation. Water molecules can more easily penetrate and interact with the metal nodes, potentially leading to hydrolysis of metal-linker bonds.
   - **Smaller Pores:** Conversely, smaller pore volumes might limit water access, potentially enhancing stability by reducing the surface area exposed to water.

2. **Linker Length:**
   - **Longer Linkers:** These can increase the distance between metal nodes, potentially weakening the overall framework due to reduced structural rigidity. Longer linkers might also introduce more flexibility, allowing water molecules easier access to vulnerable sites.
   - **Shorter Linkers:** These genera

# Task 2b – Inferencing: Metal Node and Decomposition Temperature

In this notebook, we test whether a knowledge graph-augmented LLM can infer scientific relationships between structural features and material properties.

The LLM is provided with real examples retrieved from the knowledge graph: MOFs with experimentally measured decomposition temperatures and their metal nodes. We use this data as few-shot context for the LLM and compare the result to its vanilla (no injected context) response.

In [24]:
# Connect to graph
agent = QueryGenerationAgent()
data = agent.run_full_query("List 50 MOFs with their names (every name delimited by <|>), thermal stability, and metal. Only include MOFs with a value for thermal stability and metal. Include references for all properties")

✅ Connected to Neo4j.


Unnamed: 0,m.refcode,mof_names,thermal_stability,thermal_stability_reference,metal_names
0,WOCKOP,[Cd(μ-tp)(μ-bpp)(H2O)]·nnH2O<|>3,Stable up to 358,10.1021/cg701232n,Cd
1,CURYAQ,[CuBr2(L1)]·nn(MeOH)<|>complex 3,130,10.1021/cg901327m,Cu
2,UNABAN,Zn(LTP)2<|>complex 1,210,10.1021/cg1009175,Zn
3,UMUZUY,complex 3<|>Ni(LTP)2,220,10.1021/cg1009175,Ni
4,BOLHAM,"compound 1<|>∞3[SmCl3(1,4-Ph(CN)2)]",Decomposition at 321 and 375,10.1021/ic800635u,Sm
5,RESWUI,compound 1<|>Er2(BDC)3(DMF)2(H2O)2·H2O<|>1,480,10.1021/ic060568u,Er
6,RESWOC,2<|>compound 2<|>Er2(BDC-F4)3(DMF)(H2O)·DMF,"loses H2O, free DMF, and coordinated DMF",10.1021/ic060568u,Er
7,ECUGIU,BaAMP<|>Ba[HN(CH2PO3H)3],No weight loss up to ≈673,10.1021/ic301192y,Ba
8,RAVFIF,compound 10<|>[NH2(CH3)2][Yb(MDIP)(H2O)]·0.5N...,6.3,10.1021/cg201283a,Yb
9,LISBIZ,Zn5(μ3-OH)2(BTA)2(tp)3<|>Complex 1<|>Compound 1,467−528,10.1021/cg070356n,Zn


In [29]:
# Rename 'thermal_stability_reference' column to 'reference'
data = data.rename(columns={'thermal_stability_reference': 'reference'})
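
The `thermal_stability` values returned above mix bare numbers, ranges, and free text, which makes them hard to compare directly. As a rough sketch of one way to pull out the numeric parts for inspection, the hypothetical helper `extract_numbers` below collects standalone numbers and ignores digits embedded in formulas such as `H2O`. It assumes nothing about units, which the table does not state consistently, and it would also skip a number written flush against a letter (for example `358K`).

```python
import re

# A standalone number: not preceded by a letter, digit, or dot,
# and not followed by a letter or digit (so the 2 in "H2O" is skipped)
STANDALONE_NUMBER = re.compile(r"(?<![A-Za-z\d.])\d+(?:\.\d+)?(?![A-Za-z\d])")


def extract_numbers(text: str) -> list[float]:
    """Return every standalone number found in a free-text stability entry."""
    return [float(match) for match in STANDALONE_NUMBER.findall(str(text))]


# Entries copied from the thermal_stability column above
samples = [
    "Stable up to 358",
    "Decomposition at 321 and 375",
    "loses H2O, free DMF, and coordinated DMF",
    "No weight loss up to ≈673",
    "467−528",
]
for entry in samples:
    print(f"{entry!r} -> {extract_numbers(entry)}")
```

Entries such as `6.3` still parse as a number, so implausible values need a manual look before any numeric comparison across metals.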

## Responses

In [30]:
question = "How does the selection of metal node affect the thermal stability of MOFs? Justify your answer with chemistry reasoning. Please keep your response concise!"

In [31]:
rag_prompt = generate_few_shot_prompt(data, question, max_examples=30)
print(rag_prompt)

Here are some MOFs with their associated properties:
- thermal_stability = Stable up to 358, reference = 10.1021/cg701232n, metal_names = Cd
- thermal_stability = 130, reference = 10.1021/cg901327m, metal_names = Cu
- thermal_stability = 210, reference = 10.1021/cg1009175, metal_names = Zn
- thermal_stability = 220, reference = 10.1021/cg1009175, metal_names = Ni
- thermal_stability = Decomposition at 321 and 375, reference = 10.1021/ic800635u, metal_names = Sm
- thermal_stability = 480, reference = 10.1021/ic060568u, metal_names = Er
- thermal_stability = loses H2O, free DMF, and coordinated DMF, reference = 10.1021/ic060568u, metal_names = Er
- thermal_stability = No weight loss up to ≈673, reference = 10.1021/ic301192y, metal_names = Ba
- thermal_stability = 6.3, reference = 10.1021/cg201283a, metal_names = Yb
- thermal_stability = 467−528, reference = 10.1021/cg070356n, metal_names = Zn
- thermal_stability = 519 to 548 °C, reference = 10.1021/cg070356n, metal_names = Zn
- thermal_s

In [None]:
from openai import OpenAI
# Initialize client (reads OPENAI_API_KEY from the environment; load .env beforehand if the key is stored there)
client = OpenAI()

# Send the KG-augmented prompt to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": rag_prompt}
    ],
    temperature=0.5
)

# Extract and print the response
rag_answer = response.choices[0].message.content.strip()
print(rag_answer)


The thermal stability of Metal-Organic Frameworks (MOFs) is significantly influenced by the choice of metal nodes due to several chemical factors:

1. **Metal-Oxygen Bond Strength**: The thermal stability of a MOF is often correlated with the strength of the metal-oxygen bonds within the structure. For instance, metals with higher oxidation states typically form stronger metal-oxygen bonds, contributing to greater thermal stability. For example, Erbium-based MOFs exhibit high thermal stability up to 480°C [10.1021/ic060568u], likely due to strong Er-O bonds.

2. **Ionic Radius and Coordination Environment**: The size and coordination geometry of the metal ions can affect thermal stability. Smaller ions with high coordination numbers can create more robust frameworks. For example, Tm-based MOFs show high thermal stability up to 560°C [10.1021/cg400531j], possibly due to the compact nature of the Tm ions and their coordination environment.

3. **Metal Cluster Stability**: Some metals for

In [20]:
# Send the bare question (no KG context) to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": question}
    ],
    temperature=0.5  # slightly higher to allow some reasoning flexibility
)

# Extract and print the response
vanilla_answer = response.choices[0].message.content.strip()
print(vanilla_answer)

The selection of the metal node in metal-organic frameworks (MOFs) significantly impacts their thermal stability due to several key factors:

1. **Metal-Oxygen Bond Strength**: The thermal stability of MOFs is largely dictated by the strength of the metal-oxygen bonds within the framework. Transition metals with higher oxidation states, such as Zr(IV) or Al(III), tend to form stronger and more thermally stable bonds with the organic linkers compared to metals with lower oxidation states.

2. **Coordination Geometry and Connectivity**: Metals that favor higher coordination numbers and robust geometries (e.g., octahedral or tetrahedral) contribute to a more stable framework. For instance, Zr-based MOFs often exhibit high connectivity and stable frameworks due to the strong Zr-O bonds and the ability to form dense, highly coordinated structures.

3. **Metal Ionic Radius**: A smaller ionic radius can lead to denser packing and stronger interactions within the framework, enhancing thermal s