# Task 4a - Retrieve: Has PCN-46 ever been suggested for Carbon Capture?
In this notebook, we demonstrate how knowledge graph-augmented LLMs can conduct an advanced literature search. Using the information already in our knowledge graph, we identified a promising MOF candidate for carbon capture (PCN-46) based on its similarity to HKUST-1, its experimentally measured CO2 uptake, and a text-mined expert recommendation. Now we use a cross-document agent to search for additional information beyond PCN-46's original synthesis paper.

In [1]:
import pandas as pd
from src.MOF_ChemUnity.Agents.QueryAgent import QueryGenerationAgent
# Connect to graph
agent = QueryGenerationAgent()
exp_data = agent.run_full_query("All experimental information for MOFs with name PCN-46")

✅ Connected to Neo4j.


| | m.refcode | property_name | property_value | r.units | r.condition | r.reference | r.summary |
|---|---|---|---|---|---|---|---|
| 0 | LUYHAP | Density | 0.618537 | | | | |
| 1 | LUYHAP | Solvent-Accessible Volume | 73.8 | % | Calculated using the PLATON routine | 10.1039/c002767g | Calculated using the PLATON routine, PCN-46 ha... |
| 2 | LUYHAP | Pore Volume | 6.8 | Å | Based on the Horvath–Kawazoe model | 10.1039/c002767g | It has a uniform pore size around 6.8 A˚ based... |
| 3 | LUYHAP | Surface Area | 2500.0 | m2 g-1 | Based on the N2 sorption isotherm | 10.1039/c002767g | Based on the N2 sorption isotherm, PCN-46 has ... |
| 4 | LUYHAP | CO2 Uptake | 21.0 | mmol g-1 | Saturation excess CO2 uptake at 30 bar | 10.1039/c002767g | As can be seen in Fig. 5, the saturation exces... |
| 5 | LUYHAP | H2 uptake | 71.6 | mg g-1 | Total uptake at 77 K | 10.1039/c002767g | Table 1 Ligand length, porosity and H2 uptake ... |


In [4]:
app_data = agent.run_full_query("All applications for MOFs with name PCN-46, and the justifications for them. Include reference")

| | m.refcode | application | r.justification | reference |
|---|---|---|---|---|
| 0 | LUYHAP | CO2 Capture | The capture and sequestration of CO2 is consid... | 10.1039/c002767g |
| 1 | LUYHAP | H2 Capture | The high porosity and stable framework make PC... | 10.1039/c002767g |
| 2 | LUYHAP | CH4 Capture | Methane is another candidate as an on-board fu... | 10.1039/c002767g |


In [5]:
# Create a unified 'justification' column
exp_data["justification"] = exp_data["r.summary"].fillna("")  # default to empty if NaN
app_data["justification"] = app_data["r.justification"].fillna("")

# Merge the two DataFrames
PCN_data = pd.merge(exp_data.drop(columns=["r.summary"]), app_data.drop(columns=["r.justification"]), on="m.refcode", how="outer")

# Concatenate the two justification columns into one (either side may be empty)
PCN_data["justification"] = PCN_data["justification_x"].fillna("") + PCN_data["justification_y"].fillna("")
PCN_data = PCN_data.drop(columns=["justification_x", "justification_y"])
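
Because the two frames share only `m.refcode` as a key, the outer merge pairs every experimental row with every application row. With the six experimental rows and three application rows shown above, that gives 18 merged rows, of which `generate_few_shot_prompt` keeps the first 12 by default. The following is a minimal sketch of that behaviour on toy data; the values are made up for illustration, and the space separator between the two justifications is an addition of this sketch, not part of the cell above.

```python
import pandas as pd

# Toy stand-ins for exp_data and app_data (illustrative values only)
exp_toy = pd.DataFrame({
    "m.refcode": ["AAA", "AAA"],
    "property_name": ["Density", "Surface Area"],
    "justification": ["", "measured by N2 sorption"],
})
app_toy = pd.DataFrame({
    "m.refcode": ["AAA", "AAA", "AAA"],
    "application": ["CO2 Capture", "H2 Capture", "CH4 Capture"],
    "justification": ["reason A", "reason B", "reason C"],
})

merged = pd.merge(exp_toy, app_toy, on="m.refcode", how="outer")

# Overlapping non-key columns receive the _x / _y suffixes
merged["justification"] = (
    merged["justification_x"].fillna("") + " " + merged["justification_y"].fillna("")
).str.strip()
merged = merged.drop(columns=["justification_x", "justification_y"])

print(len(merged))  # 2 experimental rows x 3 applications = 6 rows
```

If one row per property is preferred, the application justifications could instead be aggregated before merging; that is a design choice this notebook does not make.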

## Compare Responses

In [6]:
question = "Has the MOF PCN-46 been used or suggested for Carbon Capture? Why might it be suitable? Keep your answer concise but informative"

In [7]:
# Auto-format RAG prompt from a DataFrame
def generate_few_shot_prompt(data, question, max_examples=12):
    # Limit to first N examples
    subset = data.head(max_examples)

    # Start with task-agnostic intro
    lines = ["Here are some MOFs with their associated properties:"]
    for _, row in subset.iterrows():
        props = []
        for col in data.columns:
            if "refcode" in col.lower():
                continue  # skip any column containing 'refcode'
            value = row[col]
            value_str = str(value).strip()
            if value_str.startswith('[') and value_str.endswith(']'):
                value_str = value_str[1:-1]  # remove brackets
            props.append(f"{col} = {value_str}")
        lines.append("- " + ", ".join(props))

    context = "\n".join(lines)
    rag_prompt = f"{context}\n\nUse your expert MOF knowledge to make use of this data in your response. Any time you explicitly use this data in your response, include the reference in brackets beside it:\n{question.strip()}"
    return rag_prompt


# Now use the function to generate the prompt
rag_prompt = generate_few_shot_prompt(PCN_data, question)
print(rag_prompt)

Here are some MOFs with their associated properties:
- property_name = Density, property_value = 0.618537, r.units = None, r.condition = None, r.reference = None, application = CO2 Capture, reference = 10.1039/c002767g, justification = The capture and sequestration of CO2 is considered to be an effective way for the control of greenhouse gas emissions. Most of the capture processes in large scale operation nowadays are based on amine-based wet scrubbing systems, which have high energy and resource consumption.14 MOFs have proven to be good adsorbents for CO2 at ambient temperature.2i,j As can be seen in Fig. 5, the saturation excess CO2 uptake in PCN-46 is 21.0 mmol g1 (30 bar).
- property_name = Density, property_value = 0.618537, r.units = None, r.condition = None, r.reference = None, application = H2 Capture, reference = 10.1039/c002767g, justification = The high porosity and stable framework make PCN-46 a good candidate for gas storage. In 2009, the U.S. Department of Energy (DOE) 
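
The printed prompt carries entries such as `r.units = None`, which add tokens without adding information. One possible refinement, sketched below under the assumption that missing values can simply be dropped, is to skip them when formatting each row. `format_row` is a hypothetical helper and is not part of the notebook or of `MOF_ChemUnity`.

```python
import pandas as pd

def format_row(row, skip_substrings=("refcode",)):
    """Hypothetical helper: render one row as 'col = value' pairs, omitting missing values."""
    parts = []
    for col, value in row.items():
        if any(s in col.lower() for s in skip_substrings):
            continue  # same column filter as generate_few_shot_prompt
        if value is None or (not isinstance(value, (list, tuple)) and pd.isna(value)):
            continue  # drop None / NaN instead of printing them
        text = str(value).strip()
        if text == "":
            continue  # drop empty strings as well
        parts.append(f"{col} = {text}")
    return "- " + ", ".join(parts)

# Toy row modelled on the first line of the output above
toy = pd.DataFrame({
    "m.refcode": ["LUYHAP"],
    "property_name": ["Density"],
    "property_value": [0.618537],
    "r.units": [None],
})
print(format_row(toy.iloc[0]))  # - property_name = Density, property_value = 0.618537
```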

In [8]:
from openai import OpenAI
# Initialize client (reads OPENAI_API_KEY from the environment; load .env first if the key is stored there)
client = OpenAI()

# Send the prompt to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": rag_prompt}
    ],
    temperature=0.5
)

# Extract and print the response
rag_answer = response.choices[0].message.content.strip()
print(rag_answer)

Yes, the MOF PCN-46 has been suggested for carbon capture applications. It is considered suitable for this purpose due to several key properties:

1. **High Surface Area**: PCN-46 has a BET surface area of 2500 m²/g, which provides a large surface for CO2 adsorption [reference: 10.1039/c002767g].

2. **Solvent-Accessible Volume**: It has a solvent-accessible volume of 73.8%, indicating a high porosity that facilitates gas diffusion and storage within the framework [reference: 10.1039/c002767g].

3. **Pore Volume**: The uniform pore size of around 6.8 Å, based on the Horvath–Kawazoe model, is conducive to capturing CO2 molecules effectively [reference: 10.1039/c002767g].

These characteristics, combined with the stability of the framework, make PCN-46 an effective candidate for CO2 capture, as it can achieve a saturation excess CO2 uptake of 21.0 mmol/g at 30 bar [reference: 10.1039/c002767g]. This performance is advantageous for mitigating greenhouse gas emissions, especially compared 

### Vanilla GPT-4o

In [12]:
# Send the bare question (no retrieved context) to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": question}
    ],
    temperature=0.5  # same temperature as the RAG call, so the two answers are comparable
)

# Extract and print the response
vanilla_answer = response.choices[0].message.content.strip()
print(vanilla_answer)

PCN-46, a type of metal-organic framework (MOF), has indeed been explored for carbon capture applications. MOFs like PCN-46 are considered suitable for carbon capture due to their high surface area, tunable pore sizes, and functionalizable structures, which allow for efficient adsorption of CO2 molecules. Specifically, the design of PCN-46 enables it to provide a high density of adsorption sites, which can enhance its CO2 uptake capacity. Additionally, the thermal and chemical stability of PCN-46 can contribute to its effectiveness and durability in practical carbon capture processes. These properties make it a promising candidate for capturing and storing CO2, a critical step in reducing greenhouse gas emissions.
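
One simple way to make the comparison concrete is to check which answer cites a source at all. The sketch below uses a heuristic regular expression for DOI-like strings; `extract_dois` is a hypothetical helper, the pattern is an approximation rather than a full DOI grammar, and the two example strings are abridged from the answers above.

```python
import re

# Heuristic: "10.", a 4-9 digit registrant code, "/", then a suffix without
# whitespace or closing punctuation. Not a complete DOI grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\]\)\"',;]+")

def extract_dois(text):
    """Return the set of DOI-like strings found in text."""
    return {match.rstrip(".") for match in DOI_PATTERN.findall(text)}

rag_example = "PCN-46 has a BET surface area of 2500 m²/g [reference: 10.1039/c002767g]."
vanilla_example = "MOFs like PCN-46 are considered suitable for carbon capture."

print(extract_dois(rag_example))      # {'10.1039/c002767g'}
print(extract_dois(vanilla_example))  # set()
```

Applied to `rag_answer` and `vanilla_answer`, the same call would show whether each response ties its claims to a reference.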


# Task 4b - Retrieve: Is ULMOF-5 water stable?

In [9]:
# Example MOF: ULMOF-5
ULMOF_5 = agent.run_full_query("All experimental data for all MOFs named ULMOF-5")

| | m.refcode | property_name | property_value | property_units | property_condition | property_reference | property_summary |
|---|---|---|---|---|---|---|---|
| 0 | QUQGAL | Space Group | Pbcn | | | 10.1021/cg100449z | Crystallographic Data and Structural Refinemen... |
| 1 | QUQGAL | Cell Volume | 2507.1(9) | Å³ | | 10.1021/cg100449z | Crystallographic Data and Structural Refinemen... |
| 2 | QUQGAL | Formula Weight | 252.08 | | | 10.1021/cg100449z | Crystallographic Data and Structural Refinemen... |
| 3 | QUQGAL | Density | 0.948381 | g/cm³ | | 10.1021/cg100449z | Crystallographic Data and Structural Refinemen... |
| 4 | QUQGAL | Light Absorption Coefficient | 0.105 | mm⁻¹ | | 10.1021/cg100449z | Crystallographic Data and Structural Refinemen... |
| 5 | QUQGAL | Chemical Formula | C10H10N2O5Li2 | | | 10.1021/cg100449z | Crystallographic Data and Structural Refinemen... |
| 6 | QUQGAL | Surface Area | 25.06 | m²/g | After removal of coordinated DMF molecules | 10.1021/cg100449z | The surface area of the desolvated compound1sh... |
| 7 | QUQGAL | H2 uptake | 0.1 | wt % | 77 K and 1 atm pressure | 10.1021/cg100449z | The successive H2and CO2adsorption too shows v... |
| 8 | QUQGAL | Water Stability | Unstable | | | 10.1021/cg100449z | Both compounds 1 and 1a are soluble in water a... |
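
The row that answers the question is already present in the retrieved data. As a minimal sketch, assuming the column names shown above, a single property can be pulled out together with its reference before any LLM call. `lookup_property` is a hypothetical helper, and the toy frame copies only two rows from the table.

```python
import pandas as pd

# Toy frame mirroring the column names above (two rows copied from the table)
toy = pd.DataFrame({
    "property_name": ["Surface Area", "Water Stability"],
    "property_value": ["25.06", "Unstable"],
    "property_reference": ["10.1021/cg100449z", "10.1021/cg100449z"],
})

def lookup_property(df, name):
    """Hypothetical helper: return (value, reference) pairs for a property name, case-insensitively."""
    mask = df["property_name"].str.lower() == name.lower()
    return list(zip(df.loc[mask, "property_value"], df.loc[mask, "property_reference"]))

print(lookup_property(toy, "water stability"))  # [('Unstable', '10.1021/cg100449z')]
```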


In [10]:
question = "Is ULMOF-5 water stable? Justify your answer with chemistry reasoning, explaining why it may or may not be stable. Please keep your response concise but informative"

In [14]:
# Auto-format RAG prompt from a DataFrame
def generate_few_shot_prompt(data, question, max_examples=12):
    # Limit to first N examples
    subset = data.head(max_examples)

    # Start with task-agnostic intro
    lines = ["Here are some MOFs with their associated properties:"]
    for _, row in subset.iterrows():
        props = []
        for col in data.columns:
            if "refcode" in col.lower():
                continue  # skip any column containing 'refcode'
            value = row[col]
            value_str = str(value).strip()
            if value_str.startswith('[') and value_str.endswith(']'):
                value_str = value_str[1:-1]  # remove brackets
            props.append(f"{col} = {value_str}")
        lines.append("- " + ", ".join(props))

    context = "\n".join(lines)
    rag_prompt = f"{context}\n\nUse your expert MOF knowledge to make use of this data in your response. Any time you explicitly use this data in your response, include the reference/source in brackets beside it:\n{question.strip()}"
    return rag_prompt


# Now use the function to generate the prompt
rag_prompt = generate_few_shot_prompt(ULMOF_5, question)
print(rag_prompt)

Here are some MOFs with their associated properties:
- property_name = Space Group, property_value = Pbcn, property_units = nan, property_condition = nan, property_reference = 10.1021/cg100449z, property_summary = Crystallographic Data and Structural Refinement Details of ULMOF-5 space group Pbcn
- property_name = Cell Volume, property_value = 2507.1(9), property_units = Å³, property_condition = nan, property_reference = 10.1021/cg100449z, property_summary = Crystallographic Data and Structural Refinement Details of ULMOF-5 volume 2507.1(9)
- property_name = Formula Weight, property_value = 252.08, property_units = nan, property_condition = nan, property_reference = 10.1021/cg100449z, property_summary = Crystallographic Data and Structural Refinement Details of ULMOF-5 formula weight 252.08
- property_name = Density, property_value = 0.948381, property_units = g/cm³, property_condition = nan, property_reference = 10.1021/cg100449z, property_summary = Crystallographic Data and Structura

In [12]:
from openai import OpenAI
# Initialize client (reads OPENAI_API_KEY from the environment; load .env first if the key is stored there)
client = OpenAI()

# Send the prompt to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": rag_prompt}
    ],
    temperature=0.5
)

# Extract and print the response
rag_answer = response.choices[0].message.content.strip()
print(rag_answer)

ULMOF-5 is not water stable, as indicated by its property of being "Unstable" in water ([10.1021/cg100449z]). The lack of water stability can be attributed to several factors intrinsic to the chemistry of Metal-Organic Frameworks (MOFs).

1. **Solubility of Components**: The summary indicates that ULMOF-5 is soluble in water and methanol ([10.1021/cg100449z]). This suggests that the bonds between the metal ions and the organic linkers in the framework are not strong enough to withstand hydrolysis or solvation by water molecules.

2. **Metal-Ligand Bond Strength**: ULMOF-5 contains lithium ions (Li) as part of its chemical formula, C10H10N2O5Li2 ([10.1021/cg100449z]). Lithium forms relatively weaker coordination bonds compared to transition metals, making the framework more susceptible to hydrolysis in aqueous environments.

3. **Framework Porosity and Surface Area**: The low surface area of 25.06 m²/g after desolvation ([10.1021/cg100449z]) implies limited framework robustness and pote

In [13]:
# Send the bare question (no retrieved context) to GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science expert."},
        {"role": "user", "content": question}
    ],
    temperature=0.5  # same temperature as the RAG call, so the two answers are comparable
)

# Extract and print the response
vanilla_answer = response.choices[0].message.content.strip()
print(vanilla_answer)

ULMOF-5, a type of ultralight metal-organic framework, typically consists of metal ions or clusters coordinated to organic ligands. The water stability of such frameworks depends on several factors, including the nature of the metal-ligand bonds, the hydrophobicity of the organic ligands, and the overall structural integrity of the framework.

1. **Metal-Ligand Bonds**: If ULMOF-5 uses metal ions that form strong, stable bonds with the ligands (such as high-valent metals like Zr or Ti), it is more likely to withstand hydrolysis. Conversely, frameworks with metals that form weaker bonds (like certain divalent metals) might be more prone to hydrolysis in water.

2. **Ligand Hydrophobicity**: The organic ligands' hydrophobic or hydrophilic nature also plays a critical role. Ligands that are hydrophobic can help shield the metal centers from water, enhancing stability. If ULMOF-5 contains hydrophilic ligands, water can more easily penetrate the structure and potentially disrupt it.

3. **F