<a href="https://colab.research.google.com/github/MehrdadJalali-AI/MOF_LENS/blob/main/MOF_Lens_ProofConcept.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MOF Optimization with Lotus Effect Algorithm
This notebook loads a MOF dataset, processes it, and applies a custom optimization algorithm to find the best MOFs for doxorubicin delivery.


# 🧪 Proof of Concept: MOF-LENS — Optimizing MOFs for Doxorubicin Delivery

## 📘 Introduction

The development of targeted drug delivery systems using Metal-Organic Frameworks (MOFs) has gained increasing attention in computational materials science. MOFs offer tunable pore sizes, surface area, and functional groups, making them ideal candidates for hosting drug molecules such as doxorubicin (DOX).

This notebook demonstrates an end-to-end pipeline, **MOF-LENS** (MOF-Learning and Evolution for Novel Screening), that integrates domain-specific filtering, cheminformatics, and a novel optimization algorithm inspired by the *Lotus Effect* to discover optimal MOFs for doxorubicin encapsulation and release.

---

## 🎯 Problem Statement

The goal is to select MOFs that are:

- ✅ Structurally compatible with DOX (porosity, void fraction, coordination)
- ✅ Chemically similar to DOX (based on SMILES fingerprinting)
- ✅ Safe and stable (non-toxic metals and pH resilience)
- ✅ Equipped with beneficial functional groups like **–NH₂**

Traditional rule-based filtering or random search is not scalable. A biologically inspired, intelligent method is needed to balance **exploration** and **exploitation** of the MOF space.

---

## 🔍 Data Preprocessing

- Filters out MOFs whose metal is on the toxic list (Pb, Cd, Cr, Ni, Hg)
- Validates and sanitizes `linker_smile` entries using RDKit
- Filters entries based on **pore limiting diameter (PLD)** in the 10–20 Å range
- Computes:
  - Normalized physicochemical features
  - Morgan fingerprints (radius=2, 256-bit)
  - Binary indicator for **–NH₂** presence
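The PLD window and the feature normalization can be illustrated on a toy column. This is a minimal sketch using only the standard library. The PLD values are hypothetical, and `min_max_scale` is an illustrative helper that mirrors what a min-max scaler does for a single column; it is not part of the notebook.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical PLD values in angstroms (illustrative, not taken from the dataset)
pld = [8.0, 10.0, 15.0, 20.0, 25.0]

# Keep only entries inside the 10-20 angstrom window (bounds inclusive)
in_window = [v for v in pld if 10 <= v <= 20]

# Normalize the surviving values to [0, 1]
scaled = min_max_scale(in_window)
print(in_window, scaled)
```

Because scaling happens after filtering, a normalized value of 0 or 1 refers to the extremes of the filtered set, not of the full dataset.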

---

## 🧬 Chemical Similarity

Each MOF linker and DOX are encoded using Morgan fingerprints. A **hybrid distance metric** combines:

- 🧱 Euclidean distance in normalized feature space  
- 🔬 Tanimoto distance between molecular fingerprints

This allows identifying candidates that are structurally and chemically aligned with DOX.
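The two ingredients can be sketched on toy bit vectors. This is a hedged illustration only: the fingerprints are made up, and `hybrid_distance` with its mixing weight `alpha` is a hypothetical way to blend the two terms. The notebook's own code applies them separately, using Euclidean distance for the nearest-MOF lookup and Tanimoto similarity as a fitness term.

```python
import math

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length bit lists: |A and B| / |A or B|."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0

def hybrid_distance(x, y, fp_x, fp_y, alpha=0.5):
    """Blend Euclidean feature distance with Tanimoto distance.

    alpha is a hypothetical mixing weight, not a value from the notebook.
    """
    euclid = math.dist(x, y)
    return alpha * euclid + (1 - alpha) * (1 - tanimoto(fp_x, fp_y))

# Toy 8-bit fingerprints (real Morgan fingerprints here are 256 bits)
fp_linker = [1, 1, 0, 1, 0, 0, 1, 0]
fp_drug = [1, 0, 0, 1, 1, 0, 1, 0]

sim = tanimoto(fp_linker, fp_drug)  # 3 shared bits out of 5 set in either
dist = hybrid_distance((0.0, 0.0), (0.3, 0.4), fp_linker, fp_drug)
print(sim, dist)
```

A smaller hybrid distance means the candidate is close to the target both in normalized feature space and in fingerprint space.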

---

## 🌿 Lotus Effect Algorithm (LEA)

LEA is inspired by the **natural self-cleaning and selective attraction** properties of lotus leaves.

### Key Concepts:

- Uses **Lévy flight-based mutation** to explore MOF feature space
- Maps each candidate back to its nearest real MOF (using k-NN)
- Fitness is a weighted function of:
  - Physicochemical structure
  - Tanimoto similarity to DOX
  - pH stability (based on metal type)
  - Toxicity penalty
  - NH₂ presence bonus
- Discourages duplicate MOFs using a refcode tracker (the fitness of an already-seen refcode is halved)
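The first two concepts, a Lévy-flight move followed by snapping to the nearest real MOF, can be sketched with the standard library alone. The step generator follows Mantegna's algorithm with `beta=1.5` and the 0.5 step scale used in the code below. The three feature rows are hypothetical, and `nearest_index` is a brute-force stand-in for the k-NN lookup.

```python
import math
import random

def levy_step(n, beta=1.5, rng=random):
    """Draw n heavy-tailed steps with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta) for _ in range(n)]

def nearest_index(point, rows):
    """Index of the row closest to point in Euclidean distance (brute-force 1-NN)."""
    return min(range(len(rows)), key=lambda i: math.dist(point, rows[i]))

# Hypothetical normalized features for three MOFs (two features each)
mofs = [(0.1, 0.2), (0.8, 0.9), (0.5, 0.5)]

rng = random.Random(0)  # seeded for reproducibility
candidate = [0.5, 0.5]
step = levy_step(len(candidate), rng=rng)

# Apply the scaled step, then clip back into the unit hypercube
candidate = [min(1.0, max(0.0, c + 0.5 * s)) for c, s in zip(candidate, step)]

# Evaluate fitness on the real MOF nearest to the mutated candidate
idx = nearest_index(candidate, mofs)
print(candidate, idx)
```

Most Lévy steps are small, which refines the search locally, while occasional large jumps move a candidate to a distant region of feature space. Snapping to the nearest row ensures that every evaluated point corresponds to an actual database entry.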

---

## 🏆 Output

The algorithm returns the **top 5 MOFs** for potential DOX delivery with:

- Refcode
- Fitness Score
- Chemical Similarity to DOX

These candidates represent the best balance of structure, chemistry, and biomedical relevance found by the search under the chosen fitness weights.
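Keeping a short list of the best distinct candidates can be sketched as follows. The helper `update_top_k`, the refcodes, and the scores are all hypothetical and serve only to illustrate the bookkeeping. The sketch uses `k=3` to keep the trace short, whereas the notebook keeps five.

```python
def update_top_k(top, refcode, fitness, k=5):
    """Insert (refcode, fitness) into top if the refcode is new and the score is good enough."""
    if any(r == refcode for r, _ in top):
        return top  # already tracked: never store the same refcode twice
    if len(top) < k:
        top.append((refcode, fitness))
    else:
        worst = min(range(k), key=lambda i: top[i][1])
        if fitness > top[worst][1]:
            top[worst] = (refcode, fitness)
    return top

# Hypothetical refcodes and scores, for illustration only
top = []
for refcode, fitness in [("AAA", 0.4), ("BBB", 0.7), ("AAA", 0.9),
                         ("CCC", 0.5), ("DDD", 0.6)]:
    update_top_k(top, refcode, fitness, k=3)

# Report the survivors from best to worst
ranked = sorted(top, key=lambda t: t[1], reverse=True)
print(ranked)
```

In this sketch a repeated refcode is skipped rather than re-scored, which is a simplifying choice: it assumes a given MOF always receives the same fitness.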

---

## ✅ Conclusion

This proof of concept demonstrates the effectiveness of:

- RDKit-driven preprocessing and chemical reasoning
- A lotus-inspired optimizer for exploring high-dimensional MOF data
- Hybrid filtering of physicochemical and fingerprint-based similarity

### 🔭 Next Steps:

- Include **experimental binding affinities**
- Apply to other drug molecules (e.g., paclitaxel)
- Extend with **generative models** (e.g., GANs, VAEs)
- Perform **multi-objective optimization**

MOF-LENS paves the way for smart, interpretable, and scalable MOF design in drug delivery and materials informatics.


In [2]:
# Install RDKit (only needed if not already available)
!pip install rdkit -q


In [3]:
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import NearestNeighbors
from scipy.special import gamma
import random


In [4]:
def preprocess_data(df, target_pld_range=(10, 20)):
    column_mapping = {
        'asa (A^2)': 'asa',
        'pld (A)': 'pld',
        'void_fraction': 'void_fraction',
        'max_metal_coordination_n': 'max_metal_coordination_n',
        'n_sbu_point_of_extension': 'n_sbu_point_of_extension',
        'n_channel': 'n_channel',
        'metals': 'metals',
        'linker_smile': 'linker_smile',
        'Refcode': 'Refcode'
    }
    df = df.rename(columns=column_mapping)
    df = df[~df['metals'].isin(['Pb', 'Cd', 'Cr', 'Ni', 'Hg'])]

    if len(df) > 10000:
        df = df.sample(n=10000, random_state=42)

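    def sanitize(smiles):
        # Return canonical SMILES, or None when RDKit cannot parse or sanitize the input
        if not isinstance(smiles, str):
            return None
        try:
            mol = Chem.MolFromSmiles(smiles, sanitize=False)
            if mol is None:
                return None
            # With catchErrors=True, failure is reported through the return value
            if Chem.SanitizeMol(mol, catchErrors=True) != Chem.SanitizeFlags.SANITIZE_NONE:
                return None
            return Chem.MolToSmiles(mol)
        except Exception:
            return None

    df = df.copy()  # work on a copy so the column assignment below does not hit a view
    df['linker_smile'] = df['linker_smile'].apply(sanitize)
    df = df[df['linker_smile'].notna()]
    df = df[(df['pld'] >= target_pld_range[0]) & (df['pld'] <= target_pld_range[1])].copy()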

    def nh2_check(s):
        try:
            mol = Chem.MolFromSmiles(s)
            return int(mol.HasSubstructMatch(Chem.MolFromSmarts('[NH2]'))) if mol else 0
        except:
            return 0

    df['NH2_Present'] = df['linker_smile'].apply(nh2_check)

    scaler = MinMaxScaler()
    num_cols = ['void_fraction', 'asa', 'pld', 'max_metal_coordination_n', 'n_sbu_point_of_extension', 'n_channel']
    df[num_cols] = scaler.fit_transform(df[num_cols].fillna(0))

    morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=256)
    def fingerprint(smiles):
        try:
            mol = Chem.MolFromSmiles(smiles)
            return np.array(morgan_gen.GetFingerprint(mol)) if mol else np.zeros(256)
        except:
            return np.zeros(256)

    df['Fingerprint'] = df['linker_smile'].apply(fingerprint)
    return df, scaler


In [5]:
def compute_fitness(features, fingerprint, dox_fp, metal, nh2):
    vf, asa, pld, coord, sbu, n_channel = features
    struct_score = 0.20 * vf + 0.15 * asa + 0.30 * pld + 0.10 * coord + 0.05 * sbu + 0.05 * n_channel

    fp1 = fingerprint.astype(bool)
    fp2 = dox_fp.astype(bool)
    tanimoto = np.sum(fp1 & fp2) / np.sum(fp1 | fp2) if np.sum(fp1 | fp2) else 0

    # Per-metal instability and toxicity scores (lower is better); unknown metals get a default
    instability = {'Zr': 0.2, 'Zn': 0.7, 'Fe': 0.3, 'Co': 0.4, 'In': 0.5, 'Cu': 0.6, 'Gd': 0.3, 'Al': 0.2, 'Mn': 0.4}
    toxicity = {'Zr': 0.1, 'Zn': 0.2, 'Fe': 0.15, 'Co': 0.25, 'In': 0.1, 'Cu': 0.2, 'Gd': 0.15, 'Al': 0.1, 'Mn': 0.15}
    pH_stability = 1 - instability.get(metal, 0.5)
    tox = toxicity.get(metal, 0.2)

    return 0.6 * struct_score + 0.20 * tanimoto + 0.15 * pH_stability + 0.05 * nh2 - 0.05 * tox

def levy_flight(n, beta=1.5):
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0, sigma, n)
    v = np.random.normal(0, 1, n)
    return u / np.abs(v) ** (1 / beta)

def lotus_effect_algorithm(df, dox_fp, pop_size=20, max_iter=50):
    num_cols = ['void_fraction', 'asa', 'pld', 'max_metal_coordination_n', 'n_sbu_point_of_extension', 'n_channel']
    X = df[num_cols].values
    fingerprints = df['Fingerprint'].values
    metals = df['metals'].values
    nh2_flags = df['NH2_Present'].values

    knn = NearestNeighbors(n_neighbors=1).fit(X)
    population = np.random.uniform(0, 1, (pop_size, len(num_cols)))

    best_solutions = []
    best_fitness = -np.inf
    seen_refcodes = set()

    for _ in range(max_iter):
        for i in range(pop_size):
            step = 0.5 * levy_flight(len(num_cols))
            population[i] = np.clip(population[i] + step, 0, 1)

            idx = knn.kneighbors([population[i]])[1][0][0]
            features = X[idx]
            fp = fingerprints[idx]
            metal = metals[idx]
            nh2 = nh2_flags[idx]
            refcode = df.iloc[idx]['Refcode']

            fitness = compute_fitness(features, fp, dox_fp, metal, nh2)
            if refcode in seen_refcodes:
                fitness *= 0.5

            # Track the top 5 distinct MOFs; never store the same refcode twice
            if refcode not in [s[0] for s in best_solutions]:
                if len(best_solutions) < 5:
                    best_solutions.append((refcode, fitness, fp))
                    seen_refcodes.add(refcode)
                elif fitness > min(s[1] for s in best_solutions):
                    worst = min(range(len(best_solutions)), key=lambda j: best_solutions[j][1])
                    best_solutions[worst] = (refcode, fitness, fp)
                    seen_refcodes.add(refcode)

            if fitness > best_fitness:
                best_fitness = fitness

    best_solutions = sorted(best_solutions, key=lambda x: x[1], reverse=True)[:5]
    return pd.DataFrame({
        'Refcode': [s[0] for s in best_solutions],
        'Fitness_Score': [s[1] for s in best_solutions],
        'Chemical_Similarity': [
            np.sum(s[2].astype(bool) & dox_fp.astype(bool)) / np.sum(s[2].astype(bool) | dox_fp.astype(bool))
            for s in best_solutions
        ]
    })


In [None]:
# Upload CSV (the uploaded file must be named MOF.csv to match read_csv below)
from google.colab import files
uploaded = files.upload()

# Run main pipeline
df = pd.read_csv('MOF.csv')
df, scaler = preprocess_data(df)

# Doxorubicin (C27H29NO11) SMILES
dox_smiles = 'CC1C(C(CC(O1)OC2CC(CC3=C2C(=C4C(=C3O)C(=O)C5=C(C4=O)C(=CC=C5)OC)O)(C(=O)CO)O)N)O'
dox_mol = Chem.MolFromSmiles(dox_smiles)
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=256)
dox_fp = np.array(morgan_gen.GetFingerprint(dox_mol))

# Run optimizer
results = lotus_effect_algorithm(df, dox_fp)
print("Top MOFs for DOX delivery:")
print(results)
