<a href="https://colab.research.google.com/github/MehrdadJalali-AI/GraphMOF-AI/blob/main/MOFGalaxyNet_LLM_Neo4j.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 AI-Powered MOF Discovery with MOFGalaxyNet + LLM + Neo4j

## 🔹 Problem Statement
Researchers in **materials science** need to identify **metal-organic frameworks (MOFs)** suited to **gas storage, carbon capture, and other industrial applications**. However, predicting a MOF's **adsorption properties, stability, and applications** is challenging.

This system uses **MOFGalaxyNet** to:
✅ Store MOFs in **Neo4j** as a graph database  
✅ Train **Graph Neural Networks (GNNs)** to learn MOF embeddings  
✅ Use **GPT-4 (LLM)** to provide AI-driven MOF insights  
✅ Predict **missing properties** and **suggest applications**  

---

## 🔹 Step 1: Storing MOFs in Neo4j

Each **MOF** is stored as a **node** in **Neo4j**, and **`SIMILAR_TO` relationships** (edges) connect structurally similar MOFs.

### 📌 Example MOF Data
| MOF      | Adsorption | Stability |
|----------|------------|-----------|
| MOF-5    | 0.80       | 0.90      |
| UiO-66   | 0.75       | 0.95      |
| HKUST-1  | 0.85       | 0.80      |
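A minimal sketch, assuming the official `neo4j` Python driver 5.x (which provides `Driver.execute_query`), of loading the rows above in a single `UNWIND` query and reading back the `SIMILAR_TO` pairs. `load_mofs`, `similar_pairs`, and `MOF_ROWS` are hypothetical names; the values are the ones in the table.

```python
# Rows from the example table above
MOF_ROWS = [
    {"name": "MOF-5", "adsorption": 0.80, "stability": 0.90},
    {"name": "UiO-66", "adsorption": 0.75, "stability": 0.95},
    {"name": "HKUST-1", "adsorption": 0.85, "stability": 0.80},
]

# UNWIND turns the list parameter into one row per MOF, so all nodes are written in one query
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (m:MOF {name: row.name})
SET m.adsorption = row.adsorption, m.stability = row.stability
"""

# Directed pattern: each stored SIMILAR_TO relationship is returned once
SIMILAR_QUERY = """
MATCH (m:MOF)-[:SIMILAR_TO]->(n:MOF)
RETURN m.name AS mof, n.name AS similar
"""

def load_mofs(driver, rows=MOF_ROWS):
    """Create or update one :MOF node per row (hypothetical helper)."""
    driver.execute_query(LOAD_QUERY, parameters_={"rows": rows})

def similar_pairs(driver):
    """Return (mof, similar) name pairs for every stored SIMILAR_TO relationship."""
    records, _summary, _keys = driver.execute_query(SIMILAR_QUERY)
    return [(record["mof"], record["similar"]) for record in records]
```

Typical use would be `with GraphDatabase.driver(uri, auth=(user, password)) as driver:` followed by `load_mofs(driver)` and `similar_pairs(driver)`. Batching with `UNWIND` keeps the load to one round trip instead of one query per MOF.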

**Example Neo4j Query to Find Similar MOFs**
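```cypher
MATCH (m:MOF)-[:SIMILAR_TO]->(n:MOF)
RETURN m.name, n.name, m.adsorption, m.stability
```

The directed pattern `->` returns each stored `SIMILAR_TO` relationship once; the undirected `-` would list every pair twice, once in each direction.

**Expected Result** (for the graph created later in this notebook; row order may vary)

| m.name  | n.name  | m.adsorption | m.stability |
|---------|---------|--------------|-------------|
| MOF-5   | UiO-66  | 0.80         | 0.90        |
| MOF-5   | HKUST-1 | 0.80         | 0.90        |
| HKUST-1 | MIL-101 | 0.85         | 0.80        |
| UiO-66  | ZIF-8   | 0.75         | 0.95        |
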
## 🔹 Step 2: Learning MOF Representations with GNN

A **Graph Neural Network (GNN)** learns **MOF embeddings** by combining each MOF's own features (adsorption and stability) with the graph structure given by the adjacency matrix.

### 🔹 How It Works:
1️⃣ Each MOF is represented as a **node** in the graph.  
2️⃣ **Edges** connect MOFs based on **structural similarity**.  
3️⃣ **GNN processes the graph** to generate **MOF embeddings**.  
4️⃣ These embeddings **capture hidden patterns** in MOF relationships.  
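For reference, each `GCNConv` layer used in this notebook's code cells applies the graph-convolution rule of Kipf and Welling. Here $X$ holds the node features (adsorption, stability), $A$ is the adjacency matrix without self-loops, and $\Theta$ is a learned weight matrix (the layer also adds a bias term):

$$
\hat{A} = A + I, \qquad \hat{D}_{ii} = \sum_j \hat{A}_{ij}, \qquad
X' = \hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,X\,\Theta
$$

The adjacency matrix shown next already has 1s on its diagonal, so it corresponds to $\hat{A}$. In the notebook's model, a ReLU follows the first layer and the output of the second layer is used as the MOF embedding.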

---

### 📌 Example MOF Adjacency Matrix
| MOF      | MOF-5 | UiO-66 | HKUST-1 | MIL-101 | ZIF-8 |
|----------|-------|--------|---------|---------|-------|
| MOF-5    | 1     | 1      | 1       | 0       | 0     |
| UiO-66   | 1     | 1      | 0       | 0       | 1     |
| HKUST-1  | 1     | 0      | 1       | 1       | 0     |
| MIL-101  | 0     | 0      | 1       | 1       | 0     |
| ZIF-8    | 0     | 1      | 0       | 0       | 1     |

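📌 **This adjacency matrix encodes which MOFs are connected: a 1 off the diagonal marks a `SIMILAR_TO` edge, and the 1s on the diagonal are self-loops, which GCN layers add so that each MOF keeps its own features.**  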
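A short sketch, assuming NumPy, that rebuilds the table above from the notebook's edge list and then applies the symmetric normalization a GCN layer uses. `adjacency_with_self_loops` and `normalized_adjacency` are illustrative helpers, not part of MOFGalaxyNet.

```python
import numpy as np

MOFS = ["MOF-5", "UiO-66", "HKUST-1", "MIL-101", "ZIF-8"]
EDGES = [("MOF-5", "UiO-66"), ("MOF-5", "HKUST-1"), ("HKUST-1", "MIL-101"), ("UiO-66", "ZIF-8")]

def adjacency_with_self_loops(nodes, edges):
    """Symmetric 0/1 matrix A + I, with rows and columns in the order of `nodes`."""
    index = {name: i for i, name in enumerate(nodes)}
    a_hat = np.eye(len(nodes))  # self-loops on the diagonal
    for u, v in edges:
        # Undirected graph: set both directions
        a_hat[index[u], index[v]] = a_hat[index[v], index[u]] = 1.0
    return a_hat

def normalized_adjacency(a_hat):
    """D^-1/2 (A + I) D^-1/2, the matrix a GCN layer multiplies node features by."""
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

A_hat = adjacency_with_self_loops(MOFS, EDGES)
A_norm = normalized_adjacency(A_hat)
print(A_hat.astype(int))
```

The normalization scales each edge by the degrees of both endpoints, so a well-connected MOF such as HKUST-1 does not dominate its neighbors' embeddings.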

---

### 🔹 Example GNN Output (MOF Embeddings)


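```
MOF-5   → [0.14, -0.25, 0.78, 0.62]
UiO-66  → [0.18, -0.21, 0.74, 0.60]
HKUST-1 → [0.12, -0.29, 0.80, 0.58]
MIL-101 → [0.15, -0.30, 0.79, 0.61]
ZIF-8   → [0.20, -0.22, 0.76, 0.63]
```

These are illustrative values; the exact numbers depend on random initialization and training.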

📌 **These embeddings capture MOF relationships** and can be used for:  
✅ **Predicting missing MOF properties**  
✅ **Finding similar MOFs**  
✅ **Providing insights for MOF applications**  
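A minimal sketch of the "finding similar MOFs" use, assuming the embeddings are available as plain Python lists (here the illustrative values above). It ranks the other MOFs by cosine similarity, which is one common choice and not necessarily what MOFGalaxyNet uses; `most_similar` is a hypothetical helper.

```python
import math

# Illustrative embeddings from the example GNN output
EMBEDDINGS = {
    "MOF-5": [0.14, -0.25, 0.78, 0.62],
    "UiO-66": [0.18, -0.21, 0.74, 0.60],
    "HKUST-1": [0.12, -0.29, 0.80, 0.58],
    "MIL-101": [0.15, -0.30, 0.79, 0.61],
    "ZIF-8": [0.20, -0.22, 0.76, 0.63],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(name, embeddings, k=2):
    """Return the k other MOFs whose embeddings are closest to `name`'s, best first."""
    query = embeddings[name]
    scores = [(other, cosine(query, vec)) for other, vec in embeddings.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

print(most_similar("HKUST-1", EMBEDDINGS))
```

With these illustrative vectors every score is above 0.99, so the ranking mostly demonstrates the mechanics; for HKUST-1 the closest match is MIL-101.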

---

### 🔹 Key Takeaways
- **GNN extracts features from the MOF network** to create meaningful representations.  
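- **MOFs that are connected in the graph tend to get similar embeddings**, because each GCN layer mixes a MOF's features with those of its neighbors; this is what allows missing properties to be estimated from neighboring MOFs.  
- **These embeddings are later passed to an LLM (GPT-4) to suggest MOF applications.**  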


```python
!pip install neo4j torch torch_geometric networkx openai
```

## 🔹 Step 3: Predicting MOF Properties Using GNN

A **Graph Neural Network (GNN)** can be used to **predict missing MOF properties**, such as **adsorption capacity and stability**, based on learned embeddings.

### 🔹 How It Works:
1️⃣ **Train the GNN** on known MOF properties.  
2️⃣ **Use the trained model** to predict missing values for new MOFs.  
3️⃣ **Find similar MOFs** with existing data and use them for inference.  
4️⃣ **Store predictions** for further AI-driven insights.  

---

### 📌 Example Scenario:
A **new MOF** (**NewMOF-X**) is discovered, but its **stability value is missing**.  
Using the trained **GNN**, we can predict the missing stability from learned patterns.

#### 🔹 Input Features (Known & Unknown Values)


```
NewMOF-X → Adsorption: 0.82, Stability: ??? (to be predicted)
```
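A minimal sketch of inferring a missing value from similar MOFs with known data, using plain neighbor averaging as a stand-in for the trained GNN. The links from NewMOF-X to MOF-5 and HKUST-1 are hypothetical (the notebook does not say which MOFs NewMOF-X resembles); the other stability values are the ones written to Neo4j in this notebook.

```python
# Known stability values from this notebook; NewMOF-X's value is missing
STABILITY = {
    "MOF-5": 0.90, "UiO-66": 0.95, "HKUST-1": 0.80,
    "MIL-101": 0.88, "ZIF-8": 0.92, "NewMOF-X": None,
}

# Notebook edges plus two HYPOTHETICAL links for NewMOF-X
EDGES_WITH_NEW = [
    ("MOF-5", "UiO-66"), ("MOF-5", "HKUST-1"), ("HKUST-1", "MIL-101"), ("UiO-66", "ZIF-8"),
    ("NewMOF-X", "MOF-5"), ("NewMOF-X", "HKUST-1"),  # assumed, not from the source data
]

def neighbors(node, edges):
    """Nodes that share an undirected edge with `node`."""
    return [v if u == node else u for u, v in edges if node in (u, v)]

def impute_from_neighbors(values, edges):
    """Fill each missing value with the mean of its neighbors' known values (None if none are known)."""
    filled = dict(values)
    for node, value in values.items():
        if value is None:
            known = [values[n] for n in neighbors(node, edges) if values[n] is not None]
            filled[node] = sum(known) / len(known) if known else None
    return filled

print(impute_from_neighbors(STABILITY, EDGES_WITH_NEW)["NewMOF-X"])
```

With the assumed links the estimate is the mean of 0.90 and 0.80, i.e. about 0.85. A GNN also uses node features such as adsorption, so its prediction need not match this baseline.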

```python
import torch
import networkx as nx
import openai
from neo4j import GraphDatabase
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
import numpy as np
```


---

### 🔹 Example Predicted Properties for Multiple MOFs
| MOF        | Adsorption | Stability             |
|------------|------------|-----------------------|
| MOF-5      | 0.80       | 0.90                  |
| UiO-66     | 0.75       | 0.95                  |
| HKUST-1    | 0.85       | 0.80                  |
| NewMOF-X   | 0.82       | **0.87 (Predicted)**  |

📌 **Now, we have a complete dataset with inferred MOF properties!**  

---

### 🔹 Why is This Useful?
✅ **Predicts missing adsorption or stability values for new MOFs**  
✅ **Helps researchers understand potential MOF performance**  
✅ **Can be extended to predict adsorption for specific gases (CO₂, H₂, CH₄, etc.)**  
✅ **Supports AI-driven discovery of novel MOF applications**  

---

### 🔹 Next Step
Now that we have **predicted MOF properties**, we can use **GPT-4 (LLM)** to generate insights about possible applications of these MOFs.

📌 **Proceed to Step 4 to see how AI-powered insights help predict MOF applications!**


```python
# Connect to Neo4j (replace with your own credentials)
NEO4J_URI = "bolt://localhost:7687"
USERNAME = "neo4j"
PASSWORD = "your_password"
driver = GraphDatabase.driver(NEO4J_URI, auth=(USERNAME, PASSWORD))
driver.verify_connectivity()  # fail early if the database is unreachable

def create_mof_graph(tx):
    """Create or update the MOF nodes and the SIMILAR_TO relationships between them."""
    mof_data = [
        ("MOF-5", 0.8, 0.9),
        ("UiO-66", 0.75, 0.95),
        ("HKUST-1", 0.85, 0.8),
        ("MIL-101", 0.9, 0.88),
        ("ZIF-8", 0.7, 0.92)
    ]

    for mof, adsorption, stability in mof_data:
        tx.run("""
        MERGE (m:MOF {name: $name})
        SET m.adsorption = $adsorption, m.stability = $stability
        """, name=mof, adsorption=adsorption, stability=stability)

    edges = [("MOF-5", "UiO-66"), ("MOF-5", "HKUST-1"), ("HKUST-1", "MIL-101"), ("UiO-66", "ZIF-8")]
    for mof1, mof2 in edges:
        tx.run("""
        MATCH (m1:MOF {name: $mof1}), (m2:MOF {name: $mof2})
        MERGE (m1)-[:SIMILAR_TO]->(m2)
        """, mof1=mof1, mof2=mof2)

with driver.session() as session:
    # execute_write replaces write_transaction, which is deprecated in the 5.x driver
    session.execute_write(create_mof_graph)
print("✅ MOF graph created in Neo4j!")
```

## 🔹 Step 4: Using GPT-4 (LLM) for AI-Powered MOF Insights

Once we have **MOF embeddings** and **predicted properties**, we can use **GPT-4 (LLM)** to **analyze and suggest potential applications** for MOFs.

### 🔹 How It Works:
1️⃣ **GNN learns MOF embeddings** based on graph relationships.  
2️⃣ **Missing properties are predicted** by the trained GNN.  
3️⃣ **GPT-4 receives MOF embeddings & properties** as input.  
4️⃣ **GPT-4 generates AI-powered insights** about MOF applications.  

---

### 📌 Example Scenario:
A **new MOF (NewMOF-X)** has a known **adsorption of 0.82** and a GNN-predicted **stability of 0.87**.  
Now, we ask **GPT-4** to suggest real-world applications based on these properties.

#### 🔹 Example Input to GPT-4
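One way to assemble such an input, sketched under assumptions: `build_mof_messages` and the prompt wording are hypothetical, and the result uses the `{"role", "content"}` message shape that chat-completion endpoints accept (the same shape the LLM code cell in this notebook sends).

```python
def build_mof_messages(name, adsorption, stability, stability_predicted=False):
    """Return a chat-style message list describing one MOF (hypothetical prompt wording)."""
    stability_note = " (GNN-predicted)" if stability_predicted else ""
    prompt = (
        f"The metal-organic framework {name} has an adsorption score of {adsorption:.2f} "
        f"and a stability score of {stability:.2f}{stability_note}. "
        "Suggest up to three real-world applications (for example gas storage or carbon capture) "
        "and briefly explain each suggestion."
    )
    return [{"role": "user", "content": prompt}]

messages = build_mof_messages("NewMOF-X", 0.82, 0.87, stability_predicted=True)
print(messages[0]["content"])
```

Stating the properties in plain language, and flagging which ones were predicted, gives the model more to work with than a raw embedding vector alone.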


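```python
from torch_geometric.utils import to_undirected

# Same adsorption/stability values that were written to Neo4j above
MOF_PROPERTIES = {
    "MOF-5": (0.80, 0.90),
    "UiO-66": (0.75, 0.95),
    "HKUST-1": (0.85, 0.80),
    "MIL-101": (0.90, 0.88),
    "ZIF-8": (0.70, 0.92),
}

def create_mof_networkx():
    """Build the MOF similarity graph in NetworkX with the stored properties as node attributes."""
    G = nx.Graph()
    for name, (adsorption, stability) in MOF_PROPERTIES.items():
        G.add_node(name, adsorption=adsorption, stability=stability)
    edges = [("MOF-5", "UiO-66"), ("MOF-5", "HKUST-1"), ("HKUST-1", "MIL-101"), ("UiO-66", "ZIF-8")]
    G.add_edges_from(edges)
    return G

def graph_to_pyg(G):
    """Convert the NetworkX graph into a PyTorch Geometric Data object."""
    node_mapping = {node: i for i, node in enumerate(G.nodes)}
    edge_index = torch.tensor([[node_mapping[u], node_mapping[v]] for u, v in G.edges], dtype=torch.long).t().contiguous()
    # nx.Graph lists each edge once; GCNConv needs both directions for an undirected graph
    edge_index = to_undirected(edge_index, num_nodes=G.number_of_nodes())
    features = torch.tensor([[G.nodes[node]["adsorption"], G.nodes[node]["stability"]] for node in G.nodes], dtype=torch.float)
    return Data(x=features, edge_index=edge_index)
```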


---

### 🔹 Example AI-Generated Insights for Multiple MOFs
| MOF        | Adsorption | Stability | Predicted Application          |
|------------|------------|-----------|--------------------------------|
| MOF-5      | 0.80       | 0.90      | Carbon Capture                 |
| UiO-66     | 0.75       | 0.95      | Hydrogen Storage               |
| HKUST-1    | 0.85       | 0.80      | Water Adsorption               |
| NewMOF-X   | 0.82       | 0.87      | **Hydrogen Storage, CO₂ Capture** |

📌 **Now, we have AI-powered insights on MOF applications!**  

---

### 🔹 Why is This Useful?
✅ **AI suggests candidate MOF applications before extensive lab testing**  
✅ **Researchers can focus on promising MOFs based on AI predictions**  
✅ **Speeds up MOF discovery for real-world energy and environmental applications**  

---

### 🔹 Next Step
Now that we have **AI-generated MOF applications**, we can **store these insights in Neo4j** for easy **searching and querying**.

📌 **Proceed to Step 5 to store and query AI-powered MOF predictions in Neo4j!**


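```python
class MOFGraphGNN(torch.nn.Module):
    """Two-layer GCN encoder plus a linear decoder that reconstructs the node features."""
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        # Maps each embedding back to the input features so the loss shapes match
        self.decoder = torch.nn.Linear(out_channels, in_channels)

    def encode(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

    def forward(self, x, edge_index):
        z = self.encode(x, edge_index)
        return z, self.decoder(z)

def train_gnn(data, epochs=200):
    model = MOFGraphGNN(in_channels=data.num_node_features, hidden_channels=16, out_channels=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    model.train()
    for epoch in range(epochs):
        optimizer.zero_grad()
        _, reconstruction = model(data.x, data.edge_index)
        # Self-supervised objective: reconstruct adsorption/stability from the 4-d embedding
        loss = loss_fn(reconstruction, data.x)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        embeddings = model.encode(data.x, data.edge_index)
    return model, embeddings

G = create_mof_networkx()
data = graph_to_pyg(G)
model, embeddings = train_gnn(data)
print("✅ GNN Training Complete! Extracted Embeddings:", embeddings.detach().numpy())
```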

## 🔹 Step 5: Storing Predictions in Neo4j

Now that we have **predicted MOF properties and AI-generated applications**, we will **store them in Neo4j** to enable easy **searching, querying, and retrieval**.

---

### 🔹 How It Works:
1️⃣ **MOFs are stored as nodes in Neo4j** with their adsorption, stability, and AI-predicted applications.  
2️⃣ **GNN-predicted values** are added for new MOFs.  
3️⃣ **GPT-4 insights** on MOF applications are stored.  
4️⃣ **Neo4j queries** can now retrieve MOFs based on adsorption, stability, or applications.  
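A minimal sketch of steps 1 to 3 from Python, assuming the official `neo4j` Python driver 5.x (`Driver.execute_query`): all predictions are written in one `UNWIND` query instead of one query per MOF. `store_predictions` and `PREDICTIONS` are hypothetical names; the values come from the tables in this notebook, and the properties match the Cypher example that follows.

```python
# HYPOTHETICAL batch of predictions; values taken from the tables in this notebook
PREDICTIONS = {
    "NewMOF-X": {"adsorption": 0.82, "stability": 0.87, "application": "Hydrogen Storage, CO₂ Capture"},
    "UiO-66": {"adsorption": 0.75, "stability": 0.95, "application": "Hydrogen Storage"},
}

# MERGE creates the node if it is new (e.g. NewMOF-X) and updates it otherwise
STORE_QUERY = """
UNWIND $rows AS row
MERGE (m:MOF {name: row.name})
SET m.adsorption = row.adsorption,
    m.stability = row.stability,
    m.application = row.application
"""

def store_predictions(driver, predictions):
    """Write every prediction in one round trip; returns the rows that were sent."""
    rows = [{"name": name, **props} for name, props in predictions.items()]
    driver.execute_query(STORE_QUERY, parameters_={"rows": rows})
    return rows
```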

---

### 📌 Example: Store Predicted MOF Properties & Applications in Neo4j

We use the **Cypher query language** to insert and update MOF data.

```cypher
MERGE (m:MOF {name: 'NewMOF-X'})
SET m.adsorption = 0.82,
    m.stability = 0.87,
    m.application = 'Hydrogen Storage, CO₂ Capture'
```


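```python
import os
from openai import OpenAI

# openai>=1.0 client (openai.ChatCompletion was removed in v1.0).
# Prefer setting OPENAI_API_KEY in the environment over hard-coding the key.
os.environ.setdefault("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")
client = OpenAI()

def llm_mof_insights(embeddings, mof_names):
    """Ask the LLM for application insights on each MOF, given its GNN embedding."""
    insights = {}
    for i, mof in enumerate(mof_names):
        embedding_str = ", ".join(f"{v:.4f}" for v in embeddings[i].tolist())
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": f"Suggest likely applications for the MOF {mof}, whose GNN embedding is [{embedding_str}]."}],
        )
        insights[mof] = response.choices[0].message.content
    return insights

mof_names = list(G.nodes)
insights = llm_mof_insights(embeddings.detach().numpy(), mof_names)
print(insights)
```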

### 📌 Example: Query MOFs for Hydrogen Storage

Once the data is in **Neo4j**, we can easily retrieve **MOFs suited for hydrogen storage**.
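From Python, this lookup can be parameterized. A minimal sketch, assuming the `neo4j` Python driver 5.x; `find_mofs_for` is a hypothetical helper, and lower-casing both sides with Cypher's `toLower` is a design choice added here so that "Hydrogen storage" and "Hydrogen Storage" both match.

```python
FIND_BY_APPLICATION = """
MATCH (m:MOF)
WHERE toLower(m.application) CONTAINS toLower($application)
RETURN m.name AS name, m.adsorption AS adsorption,
       m.stability AS stability, m.application AS application
ORDER BY m.stability DESC
"""

FIELDS = ("name", "adsorption", "stability", "application")

def find_mofs_for(driver, application):
    """Return MOFs whose stored application text mentions `application` (case-insensitive)."""
    records, _summary, _keys = driver.execute_query(
        FIND_BY_APPLICATION, parameters_={"application": application}
    )
    return [{field: record[field] for field in FIELDS} for record in records]
```

Storing applications as a list property would allow exact matching with `IN` instead of substring search, at the cost of changing the string property used in this notebook.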

#### 🔹 **Cypher Query to Retrieve MOFs for Hydrogen Storage**
```cypher
MATCH (m:MOF)
WHERE m.application CONTAINS "Hydrogen Storage"
RETURN m.name, m.adsorption, m.stability, m.application
```


#### 🔹 **Expected Query Output**
| MOF        | Adsorption | Stability | Application                     |
|------------|------------|-----------|---------------------------------|
| UiO-66     | 0.75       | 0.95      | Hydrogen Storage               |
| NewMOF-X   | 0.82       | 0.87      | Hydrogen Storage, CO₂ Capture  |

📌 **Now, researchers can query MOFs for specific applications instantly!**  
