# BEBOP and QM Descriptors for Deblocking Temperatures

This notebook calculates electronic descriptors for predicting thermoplastic polyurethane (TPU) deblocking temperatures using bond energy (BEBOP) and quantum mechanical (QM) methods.

**Reference:**  
Remsha Rafiq, Ramakrishna Suresh, Barbaro Zulueta, Jessica Gondak, Hannah Zucco, Jason E. Shoemaker, John A. Keith, Goetz Veser. *Bond Energy Descriptors Enable Machine Learning with Limited Data: Design of Thermoplastic Polyurethane Recycling Agents* **2025** (*to be submitted*)

**Data:** Gaussian output files available on Zenodo: https://zenodo.org/records/17883052  
**Repository:** https://github.com/BLZ11/deblocking_temp

---

## Table of Contents

1. [Setup and Configuration](#setup)
2. [Resonance Bond Definitions](#resonance)
3. [BEBOP Bond Energy Calculations](#bebop)
4. [Nucleophilicity and HOMO-LUMO Gap](#nucleophilicity)
5. [Deprotonation Energy Calculations](#deprotonation)
6. [Export Results](#export)
7. [Extract and Export XYZ Coordinates](#xyz)

---

## Descriptors Calculated

| Descriptor | Method | Description |
|------------|--------|-------------|
| Bond energies | BEBOP/B3LYP | Gross and net bond energies for reactive bonds |
| Hybridization | BEBOP/B3LYP | sp-character of atoms in reactive bonds |
| Resonance energy | BEBOP/B3LYP | Stabilization from π-delocalization |
| Nucleophilicity (*N*) | B3LYP/CBSB7 | Domingo's index relative to TCE |
| HOMO-LUMO gap | B3LYP/CBSB7 | Frontier orbital energy difference |
| Deprotonation ΔH | G4MP2 | Gas-phase deprotonation enthalpy |

<a id='setup'></a>
## 1. Setup and Configuration

**Requirements:**
- Python ≥ 3.8
- NumPy ≥ 1.20
- Matplotlib ≥ 3.5
- BEBOP-1 v2.0.0 ([GitHub](https://github.com/keithgroup/BEBOP1_v2.0.0))

**Installation:**
```bash
pip install numpy matplotlib
pip install git+https://github.com/keithgroup/BEBOP1_v2.0.0.git
```

**Data Setup:**
1. Download Gaussian output files from [Zenodo](https://zenodo.org/records/17883052)
2. Extract to a local directory
3. Update `DATA_PATH` below to point to the extracted folder
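
Before running the notebook, you may want to confirm that the extracted archive has the layout the configuration cell expects. The sketch below assumes the three subdirectory names used in that cell. `find_missing_dirs` and `EXPECTED_SUBDIRS` are hypothetical names and are not part of the repository.

```python
from pathlib import Path
from typing import List

# Subdirectories assumed to exist under DATA_PATH (mirrors the configuration cell)
EXPECTED_SUBDIRS = [
    Path("calculations") / "b3lyp_cbsb7",
    Path("calculations") / "b3lyp_631g*",  # literal '*' in the folder name, not a glob
    Path("calculations") / "g4mp2",
]


def find_missing_dirs(data_path: Path) -> List[Path]:
    """Return the expected subdirectories that are absent under data_path (hypothetical helper)."""
    return [data_path / sub for sub in EXPECTED_SUBDIRS if not (data_path / sub).is_dir()]
```

Run `find_missing_dirs(DATA_PATH)` after the configuration cell. It returns an empty list when the layout matches.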

In [None]:
"""Imports and configuration."""

from __future__ import annotations
import re
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass, field

import numpy as np
import matplotlib.pyplot as plt

from bebop1 import BEBOP

# =============================================================================
# USER CONFIGURATION - Update DATA_PATH for your system
# =============================================================================

DATA_PATH = Path("/path/to/zenodo/download")  # <- MODIFY THIS PATH

# Gaussian output directories
GAUSSIAN_B3LYP_CBSB7 = DATA_PATH / "calculations" / "b3lyp_cbsb7"
GAUSSIAN_B3LYP_631G = DATA_PATH / "calculations" / "b3lyp_631g*"
GAUSSIAN_G4MP2 = DATA_PATH / "calculations" / "g4mp2"

# Output directory
OUTPUT_DIR = Path("./results")
OUTPUT_DIR.mkdir(exist_ok=True)

# Physical constants
HARTREE_TO_KCAL = 627.5096  # kcal/mol per Hartree
HARTREE_TO_EV = 27.211      # eV per Hartree

print(f"Data path: {DATA_PATH}")
print(f"Output directory: {OUTPUT_DIR.absolute()}")

In [None]:
"""Define all molecular structures to analyze.

Naming convention:
- Base name (e.g., 'acetoneoxime'): Capped adduct (blocking group + MDI model)
- Base name + '_cap' (e.g., 'acetoneoxime_cap'): Isolated capping agent
"""

STRUCTURES: List[str] = [
    # Triazoles and pyrazoles
    "124triazole", "124triazole_cap",
    "3methylpyrazole", "3methylpyrazole_cap",
    "pyrazole", "pyrazole_cap",
    "benzotriazole", "benzotriazole_cap",
    # Oximes
    "acetoneoxime", "acetoneoxime_cap",
    "acetophenoneoxime", "acetophenoneoxime_cap",
    "butanoneoxime", "butanoneoxime_cap",
    "cyclohexanoneoxime", "cyclohexanoneoxime_cap",
    "octanoneoxime", "octanoneoxime_cap",
    # Alcohols
    "benzylalcohol", "benzylalcohol_cap",
    "phenylEthanol", "phenylEthanol_cap",
    "naphthylEthanol", "naphthylEthanol_cap",
    "phenol", "phenol_cap",
    "ethanol", "ethanol_cap",
    "butanol", "butanol_cap",
    "hexanol", "hexanol_cap",
    "decanol", "decanol_cap",
    "cyclohexanol_axial", "cyclohexanol_axial_cap",
    "cyclohexanol_equatorial", "cyclohexanol_equatorial_cap",
    # Amines
    "nmethylaniline", "nmethylaniline_cap",
    "butylamine", "butylamine_cap",
    "hexylamine", "hexylamine_cap",
    "decylamine", "decylamine_cap",
    # Others
    "hydroxyethylmethacrylate", "hydroxyethylmethacrylate_cap",
    "hydroxymethylpentanone", "hydroxymethylpentanone_cap",
]

print(f"Total structures: {len(STRUCTURES)}")
print(f"  - Capped adducts: {len([s for s in STRUCTURES if not s.endswith('_cap')])}")
print(f"  - Capping agents: {len([s for s in STRUCTURES if s.endswith('_cap')])}")
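
The naming convention above implies that every capped adduct `name` has a matching capping agent `name_cap`. The deprotonation and export cells rely on that pairing. The following sketch checks it. `pair_structures` is a hypothetical helper, not part of the original code.

```python
from typing import Dict, List, Tuple

CAP_SUFFIX = "_cap"


def pair_structures(structures: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Pair each capped adduct with its capping agent (hypothetical helper).

    Returns a mapping adduct -> capping agent, plus any names whose partner is missing.
    """
    names = set(structures)
    pairs: Dict[str, str] = {}
    unpaired: List[str] = []
    for name in structures:
        if name.endswith(CAP_SUFFIX):
            # A capping agent is unpaired if its adduct is absent
            if name[: -len(CAP_SUFFIX)] not in names:
                unpaired.append(name)
        elif name + CAP_SUFFIX in names:
            pairs[name] = name + CAP_SUFFIX
        else:
            unpaired.append(name)
    return pairs, unpaired
```

With `pairs, unpaired = pair_structures(STRUCTURES)`, `unpaired` should be empty if every adduct in the list has its capping agent.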

<a id='resonance'></a>
## 2. Resonance Bond Definitions

Resonance energy quantifies π-electron delocalization in the blocking group.

**Bond type notation:**
- `C-N*`: C–N single bond with N lone pair in π-system (participates in resonance)
- `C=N`: C–N double bond
- `N-N`, `N=N`: N–N single and double bonds
- `C-C`, `C=C`: C–C single and double bonds
- `C-O`, `N-O`: Bonds involving oxygen

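**Note:** Atom indices in `RESONANCE_BONDS` follow Gaussian's 1-based atom numbering. For example, the carbamate C–O/C–N bond of a capped adduct is listed as (8, 19), whereas the BEBOP bond-energy cells in Section 3 address the same bond with 0-based positions (7, 18).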

In [None]:
"""Reference bond orders from localized model compounds.

These values define the 'non-resonating' baseline for each bond type.
Legacy defaults (C-C, C=C, C-N*, C=N, =C-N:) are built into BEBOP.
"""

REFERENCE_BOND_ORDERS: Dict[str, float] = {
    'N-N': 0.52205,   # From cis-H2C=N-N=N-N=CH2
    'N=N': 0.85233,   # From cis-H2C=N-N=N-N=CH2
    'C-O': 0.557416,  # From cis-H2C=CH-O-O-CH=CH2
    'N-O': 1.009388,  # From cis-H2C=CH-CH=N-O-CH=CH2
}

In [None]:
"""Resonance bond definitions.

Format: {molecule_name: {bond_type: [(atom_i, atom_j), ...]}}
"""

RESONANCE_BONDS: Dict[str, Dict[str, List[Tuple[int, int]]]] = {
    # === Capped Adducts ===
    "124triazole": {
        "C-N*": [(8, 19), (19, 20), (20, 21), (22, 23)],
        "C=N": [(21, 22)],
        "N-N": [(19, 23)],
    },
    "3methylpyrazole": {
        "C=N": [(8, 19), (22, 23)],
        "C=C": [(20, 21)],
        "C-C": [(21, 22)],
        "C-N*": [(19, 20)],
        "N-N": [(19, 23)],
    },
    "acetoneoxime": {
        "C=N": [(20, 21)],
        "N-O": [(19, 20)],
        "C-O": [(8, 19)],
    },
    "acetophenoneoxime": {
        "C-C": [(26, 27), (28, 29), (30, 31), (21, 26)],
        "C=C": [(26, 31), (27, 28), (29, 30)],
        "C=N": [(20, 21)],
        "N-O": [(19, 20)],
        "C-O": [(8, 19)],
    },
    "benzotriazole": {
        "C=C": [(20, 21), (25, 26), (23, 24)],
        "C-C": [(20, 23), (21, 26), (24, 25)],
        "C-N*": [(19, 20), (21, 27), (8, 19)],
        "N-N": [(19, 22)],
        "N=N": [(22, 27)],
    },
    "benzylalcohol": {
        "C-C": [(20, 21), (22, 23), (24, 25)],
        "C=C": [(21, 22), (23, 24), (20, 25)],
    },
    "naphthylEthanol": {
        "C=C": [(25, 31), (29, 30), (27, 28), (34, 37), (32, 38)],
        "C-C": [(30, 31), (28, 29), (25, 27), (27, 34), (37, 38), (28, 32)],
    },
    "nmethylaniline": {
        "C-C": [(20, 25), (26, 27), (28, 29)],
        "C=C": [(25, 26), (27, 28), (20, 29)],
        "C-N*": [(19, 20), (8, 19)],
    },
    "phenol": {
        "C=C": [(20, 21), (22, 23), (24, 25)],
        "C-C": [(21, 22), (23, 24), (20, 25)],
        "C-O": [(19, 20), (8, 19)],
    },
    "phenylEthanol": {
        "C-C": [(20, 21), (22, 23), (24, 25)],
        "C=C": [(21, 22), (23, 24), (20, 25)],
    },
    "pyrazole": {
        "C-C": [(21, 22)],
        "C=C": [(20, 21)],
        "C-N*": [(19, 20), (8, 19)],
        "C=N": [(22, 23)],
        "N-N": [(19, 23)],
    },
    "butanoneoxime": {"N-O": [(19, 20)], "C=N": [(20, 21)]},
    "cyclohexanoneoxime": {"N-O": [(19, 20)], "C=N": [(20, 21)]},
    "octanoneoxime": {"N-O": [(19, 20)], "C=N": [(20, 21)]},

    # === Capping Agents ===
    "124triazole_cap": {
        "C-N*": [(2, 3), (3, 4), (5, 6)],
        "C=N": [(4, 5)],
        "N-N": [(2, 6)],
    },
    "3methylpyrazole_cap": {
        "C=N": [(5, 6)],
        "C=C": [(3, 4)],
        "C-C": [(4, 5)],
        "C-N*": [(2, 3)],
        "N-N": [(2, 6)],
    },
    "acetoneoxime_cap": {"C=N": [(3, 4)], "N-O": [(2, 3)]},
    "acetophenoneoxime_cap": {
        "C-C": [(4, 9), (9, 10), (13, 14), (11, 12)],
        "C=C": [(10, 11), (9, 14), (12, 13)],
        "C=N": [(3, 4)],
        "N-O": [(2, 3)],
    },
    "benzotriazole_cap": {
        "C=C": [(3, 4), (6, 7), (8, 9)],
        "C-C": [(3, 6), (7, 8), (4, 9)],
        "C-N*": [(2, 3), (4, 10)],
        "N-N": [(2, 5)],
        "N=N": [(5, 10)],
    },
    "benzylalcohol_cap": {
        "C-C": [(4, 5), (6, 7), (3, 8)],
        "C=C": [(3, 4), (5, 6), (7, 8)],
    },
    "naphthylEthanol_cap": {
        "C=C": [(8, 10), (11, 12), (13, 14), (15, 21), (17, 20)],
        "C-C": [(20, 21), (11, 15), (10, 17), (10, 11), (12, 13), (8, 14)],
    },
    "nmethylaniline_cap": {
        "C-C": [(11, 12), (3, 8), (9, 10)],
        "C=C": [(10, 11), (3, 12), (8, 9)],
        "C-N*": [(2, 3)],
    },
    "phenol_cap": {
        "C=C": [(5, 6), (7, 8), (3, 4)],
        "C-C": [(4, 5), (6, 7), (3, 8)],
        "C-O": [(2, 3)],
    },
    "phenylEthanol_cap": {
        "C-C": [(7, 8), (5, 6), (3, 4)],
        "C=C": [(3, 8), (6, 7), (4, 5)],
    },
    "pyrazole_cap": {
        "C-C": [(4, 5)],
        "C=C": [(3, 4)],
        "C-N*": [(2, 3)],
        "C=N": [(5, 6)],
        "N-N": [(2, 6)],
    },
    "butanoneoxime_cap": {"N-O": [(2, 3)], "C=N": [(3, 4)]},
    "cyclohexanoneoxime_cap": {"N-O": [(2, 3)], "C=N": [(3, 4)]},
    "octanoneoxime_cap": {"N-O": [(2, 3)], "C=N": [(3, 4)]},
}

RESONANCE_STRUCTURES = set(RESONANCE_BONDS.keys())
print(f"Structures with resonance definitions: {len(RESONANCE_STRUCTURES)}")
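
Because `RESONANCE_BONDS` is entered by hand, a quick sanity check can catch typos before BEBOP is run. The sketch below flags unknown bond types, self-bonds, and atom pairs assigned more than once. It is a hypothetical helper, not part of BEBOP. The accepted bond types are an assumption: the legacy defaults listed in the reference-bond-order cell plus the keys of `REFERENCE_BOND_ORDERS`.

```python
from typing import Dict, Iterable, List, Tuple

BondMap = Dict[str, List[Tuple[int, int]]]

# Assumed set of recognized bond types: BEBOP legacy defaults + REFERENCE_BOND_ORDERS keys
KNOWN_BOND_TYPES = {"C-C", "C=C", "C-N*", "C=N", "=C-N:", "N-N", "N=N", "C-O", "N-O"}


def validate_resonance_bonds(
    definitions: Dict[str, BondMap],
    known_types: Iterable[str] = KNOWN_BOND_TYPES,
) -> List[str]:
    """Return human-readable problems found in resonance bond definitions (hypothetical helper)."""
    known = set(known_types)
    problems: List[str] = []
    for molecule, bond_map in definitions.items():
        seen: Dict[Tuple[int, int], str] = {}
        for bond_type, pairs in bond_map.items():
            if bond_type not in known:
                problems.append(f"{molecule}: unknown bond type {bond_type!r}")
            for i, j in pairs:
                if i == j:
                    problems.append(f"{molecule}: self-bond ({i}, {j})")
                    continue
                key = (min(i, j), max(i, j))  # treat (i, j) and (j, i) as the same bond
                if key in seen:
                    problems.append(
                        f"{molecule}: bond {key} listed as both {seen[key]} and {bond_type}"
                    )
                else:
                    seen[key] = bond_type
    return problems
```

An empty list from `validate_resonance_bonds(RESONANCE_BONDS)` means none of these checks fired. It does not prove the chemistry of each assignment is correct.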

<a id='bebop'></a>
## 3. BEBOP Bond Energy Calculations

The BEBOP method partitions molecular energy into atom and bond contributions:

- **Gross bond energy**: Direct bonding interaction (EHT + short-range repulsion)
- **Net bond energy**: Gross bond energy + environmental effects (hybridization)
- **Hybridization energy**: sp-character of atoms in reactive bonds
- **Resonance energy**: Deviation from localized reference bond orders
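
The bond-energy tables below are read with the larger atom index first (e.g. `be["GrossBond"][18][7]`). This suggests that each matrix is filled only in its lower triangle. That layout is inferred from the indexing in this notebook, not from BEBOP's documentation. Assuming it holds, a small helper can order the pair so that a bond can be requested as `(7, 18)` or `(18, 7)` interchangeably. `lower_triangle_value` is hypothetical, and the table values below are toy numbers.

```python
from typing import Sequence


def lower_triangle_value(table: Sequence[Sequence[float]], i: int, j: int) -> float:
    """Read entry (i, j) of a lower-triangular table regardless of argument order.

    Assumes the value for the pair is stored at table[max(i, j)][min(i, j)]
    (hypothetical helper; inferred layout, not a documented BEBOP guarantee).
    """
    row, col = max(i, j), min(i, j)
    return table[row][col]


# Toy 3x3 lower-triangular table: only entries with row >= col are meaningful
toy = [
    [0.0, 0.0, 0.0],
    [-85.0, 0.0, 0.0],
    [-12.5, -98.3, 0.0],
]
print(lower_triangle_value(toy, 0, 1), lower_triangle_value(toy, 1, 0))  # same entry twice
```

Under this assumption, `lower_triangle_value(be["GrossBond"], 7, 18)` reads the same entry as `be["GrossBond"][18][7]`.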

In [None]:
"""Data containers for calculation results."""

@dataclass
class BEBOPResult:
    """BEBOP calculation results for a single molecule."""
    name: str
    total_energy: float
    bond_co: str           # Bond type for C-O or C-N bond
    bond_nh: str           # Bond type for N-H bond ("None" for capping agents)
    gross_be_co: float     # Gross bond energy for C-O/C-N (kcal/mol)
    net_be_co: float       # Net bond energy for C-O/C-N (kcal/mol)
    gross_be_nh: float     # Gross bond energy for N-H (kcal/mol)
    net_be_nh: float       # Net bond energy for N-H (kcal/mol)
    hybridization: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    resonance_energy: float = 0.0


@dataclass
class NucleophilicityResult:
    """Nucleophilicity and frontier orbital data."""
    name: str
    homo: float            # HOMO energy (eV)
    lumo: float            # LUMO energy (eV)
    homo_lumo_gap: float   # HOMO-LUMO gap (eV)
    nucleophilicity: float # N index relative to TCE (eV)


@dataclass
class DeprotonationResult:
    """Deprotonation thermodynamics."""
    name: str
    delta_h: float  # Deprotonation enthalpy (kcal/mol)
    delta_g: float  # Deprotonation free energy (kcal/mol)


@dataclass
class G4MP2Energies:
    """G4MP2 thermodynamic quantities."""
    enthalpy: float     # Hartrees
    free_energy: float  # Hartrees


@dataclass
class OrbitalEnergies:
    """Molecular orbital energies."""
    homo: float  # eV
    lumo: float  # eV
    
    @property
    def homo_lumo_gap(self) -> float:
        """E_HOMO - E_LUMO in eV; negative by construction, its magnitude is the usual gap."""
        return self.homo - self.lumo

In [None]:
"""BEBOP calculation functions."""

def run_bebop_calculation(structure: str, gaussian_path: Path) -> BEBOPResult:
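    """Run BEBOP analysis on a Gaussian output file.
    
    Bond indices (0-based atom positions):
    - Capped adducts: (7, 18) for the C-O/C-N bond, (6, 9) for the N-H bond
    - Capping agents: (0, 1) labels the X-H bond; only the hybridization of
      atom 1 is recorded, and all bond energies are set to zero
    """
    output_file = gaussian_path / f"{structure}.out"
    data = BEBOP(str(output_file))
    be = data.bond_E(GrossBond=True, Composite=True)
    # Per the naming convention, the '_cap' suffix marks an isolated capping agent
    is_capping_agent = structure.endswith("_cap")
    
    if is_capping_agent: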
        bond_co = f"{data.mol[1]}-{data.mol[0]}"
        bond_nh = "None"
        gross_be_co, net_be_co = 0.0, 0.0
        gross_be_nh, net_be_nh = 0.0, 0.0
        hybridization = [0.0, be["CompositeTable"][1][1], 0.0]
    else:
        bond_co = f"{data.mol[7]}-{data.mol[18]}"
        bond_nh = f"{data.mol[6]}-{data.mol[9]}"
        gross_be_co = be["GrossBond"][18][7]
        net_be_co = be["NetBond"][18][7]
        gross_be_nh = be["GrossBond"][9][6]
        net_be_nh = be["NetBond"][9][6]
        hybridization = [
            be["CompositeTable"][7][7],
            be["CompositeTable"][18][18],
            be["CompositeTable"][6][6],
        ]
    
    resonance_energy = 0.0
    if structure in RESONANCE_STRUCTURES:
        resonance_energy = np.abs(
            data.resonance_E(RESONANCE_BONDS[structure], reference_bond_orders=REFERENCE_BOND_ORDERS)
        )
    
    return BEBOPResult(
        name=structure,
        total_energy=data.total_E(),
        bond_co=bond_co,
        bond_nh=bond_nh,
        gross_be_co=gross_be_co,
        net_be_co=net_be_co,
        gross_be_nh=gross_be_nh,
        net_be_nh=net_be_nh,
        hybridization=hybridization,
        resonance_energy=resonance_energy,
    )


def run_all_bebop_calculations(structures: List[str], gaussian_path: Path) -> List[BEBOPResult]:
    """Run BEBOP calculations for all structures."""
    results = []
    for i, structure in enumerate(structures, 1):
        print(f"\rProcessing {i}/{len(structures)}: {structure}...", end="")
        try:
            results.append(run_bebop_calculation(structure, gaussian_path))
        except Exception as e:
            print(f"\nError processing {structure}: {e}")
    print(f"\nCompleted {len(results)}/{len(structures)} BEBOP calculations.")
    return results

In [None]:
# Run BEBOP calculations
bebop_results = run_all_bebop_calculations(STRUCTURES, GAUSSIAN_B3LYP_CBSB7)

<a id='nucleophilicity'></a>
## 4. Nucleophilicity and HOMO-LUMO Gap

Nucleophilicity index (*N*) follows Domingo's scale, referenced to tetracyanoethylene (TCE):

$$N = E_{\text{HOMO}}^{\text{molecule}} - E_{\text{HOMO}}^{\text{TCE}}$$

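The HOMO-LUMO gap characterizes the frontier orbital energy separation. This notebook computes the gap as $E_{\text{HOMO}} - E_{\text{LUMO}}$, so the exported values are negative and their magnitude is the conventional (positive) gap: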

$$\Delta E = E_{\text{HOMO}} - E_{\text{LUMO}}$$
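
As a worked example of both formulas, the sketch below converts HOMO/LUMO eigenvalues from Hartree to eV with the notebook's `HARTREE_TO_EV` factor and applies the definitions above. The orbital energies are made-up illustrative values, not results from the dataset.

```python
HARTREE_TO_EV = 27.211  # same conversion factor as the configuration cell

# Illustrative (hypothetical) eigenvalues in Hartree
homo_mol, lumo_mol = -0.22, -0.01  # nucleophile
homo_tce = -0.33                   # tetracyanoethylene reference


def nucleophilicity(homo_hartree: float, homo_tce_hartree: float) -> float:
    """Domingo N index in eV: E_HOMO(molecule) - E_HOMO(TCE)."""
    return (homo_hartree - homo_tce_hartree) * HARTREE_TO_EV


def homo_minus_lumo(homo_hartree: float, lumo_hartree: float) -> float:
    """Gap with the sign convention above, E_HOMO - E_LUMO (negative)."""
    return (homo_hartree - lumo_hartree) * HARTREE_TO_EV


n_index = nucleophilicity(homo_mol, homo_tce)  # 0.11 Ha -> about 2.99 eV
gap = homo_minus_lumo(homo_mol, lumo_mol)      # -0.21 Ha -> about -5.71 eV
print(f"N = {n_index:.2f} eV, gap = {gap:.2f} eV")
```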

In [None]:
"""Nucleophilicity calculation functions."""

def parse_homo_lumo(filepath: Path) -> OrbitalEnergies:
    """Extract HOMO/LUMO energies from Gaussian output.
    
    Parses orbital energies after geometry optimization converges
    (i.e., after 'Stationary point found').
    """
    occupied, virtual = [], []
    found_stationary, found_electronic = False, False
    
    with open(filepath, "r") as f:
        for line in f:
            if "-- Stationary point found." in line:
                found_stationary = True
            if not found_stationary:
                continue
            if "The electronic state is" in line:
                found_electronic = True
            if not found_electronic:
                continue
            if "Condensed to atoms (all electrons):" in line:
                break
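            # Eigenvalues are fixed-width fields that can run together for deep
            # core levels (e.g. "-482.88234-62.51320"), so match numbers after "--"
            # instead of splitting on whitespace or slicing at a fixed column.
            if line.strip().startswith("Alpha  occ."):
                occupied.extend(float(v) for v in re.findall(r"-?\d+\.\d+", line.split("--", 1)[1]))
            elif line.strip().startswith("Alpha virt."):
                virtual.extend(float(v) for v in re.findall(r"-?\d+\.\d+", line.split("--", 1)[1]))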
    
    if not occupied or not virtual:
        raise ValueError(f"Could not parse orbital energies from {filepath}")
    
    return OrbitalEnergies(
        homo=occupied[-1] * HARTREE_TO_EV,
        lumo=virtual[0] * HARTREE_TO_EV
    )


def calculate_all_nucleophilicities(
    structures: List[str], 
    gaussian_path: Path
) -> List[NucleophilicityResult]:
    """Calculate nucleophilicity for all structures relative to TCE."""
    tce = parse_homo_lumo(gaussian_path / "TCE.out")
    
    results = []
    for structure in structures:
        try:
            orbitals = parse_homo_lumo(gaussian_path / f"{structure}.out")
            results.append(NucleophilicityResult(
                name=structure,
                homo=orbitals.homo,
                lumo=orbitals.lumo,
                homo_lumo_gap=orbitals.homo_lumo_gap,
                nucleophilicity=orbitals.homo - tce.homo,
            ))
        except Exception as e:
            print(f"Error processing {structure}: {e}")
    
    print(f"Calculated nucleophilicity for {len(results)} structures.")
    return results

In [None]:
# Calculate nucleophilicity at both levels of theory
nuc_b3lyp_631g = calculate_all_nucleophilicities(STRUCTURES, GAUSSIAN_B3LYP_631G)
nuc_b3lyp_cbsb7 = calculate_all_nucleophilicities(STRUCTURES, GAUSSIAN_B3LYP_CBSB7)

In [None]:
"""Basis set comparison plot."""

def plot_basis_set_comparison(
    nuc_631g: List[NucleophilicityResult],
    nuc_cbsb7: List[NucleophilicityResult],
    save_path: Optional[Path] = None
) -> None:
    """Plot nucleophilicity comparison between 6-31G* and CBSB7 basis sets."""
    names_631g = {r.name: r.nucleophilicity for r in nuc_631g}
    
    x_vals, y_vals = [], []
    for r in nuc_cbsb7:
        if r.name in names_631g:
            x_vals.append(names_631g[r.name])
            y_vals.append(r.nucleophilicity)
    
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(x_vals, y_vals, alpha=0.7, edgecolors="k", linewidths=0.5)
    
    # Correlation line
    z = np.polyfit(x_vals, y_vals, 1)
    p = np.poly1d(z)
    x_line = np.linspace(min(x_vals), max(x_vals), 100)
    ax.plot(x_line, p(x_line), "--", color="gray", alpha=0.7)
    
    # R-squared
    r_squared = np.corrcoef(x_vals, y_vals)[0, 1] ** 2
    
    ax.set_xlabel("Nucleophilicity (B3LYP/6-31G*) [eV]", fontsize=11)
    ax.set_ylabel("Nucleophilicity (B3LYP/CBSB7) [eV]", fontsize=11)
    ax.set_title(f"Basis Set Comparison (R² = {r_squared:.3f})", fontsize=12)
    plt.tight_layout()
    
    if save_path:
        plt.savefig(save_path, dpi=300, bbox_inches="tight")
    plt.show()


plot_basis_set_comparison(
    nuc_b3lyp_631g, 
    nuc_b3lyp_cbsb7,
    save_path=OUTPUT_DIR / "nucleophilicity_basis_comparison.png"
)
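
To check the numbers without opening the figure, you can compute the same statistics with the standard library alone. The plot takes its slope and intercept from `np.polyfit(x, y, 1)` and its R² from the squared Pearson correlation, which is what `np.corrcoef(...)[0, 1] ** 2` returns. `fit_stats` is a hypothetical helper. Its inputs would be the paired 6-31G* and CBSB7 nucleophilicities assembled inside `plot_basis_set_comparison`.

```python
from typing import Sequence, Tuple


def fit_stats(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, r_squared) of an ordinary least-squares line y = a*x + b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # equals the squared Pearson correlation
    return slope, intercept, r_squared
```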

<a id='deprotonation'></a>
## 5. Deprotonation Energy Calculations

Gas-phase deprotonation energies from the G4MP2 composite method:

$$\Delta H_{\text{deprot}} = \left( H_{\text{anion}} + H_{\text{H}^+} \right) - H_{\text{neutral}}$$

$$\Delta G_{\text{deprot}} = \left( G_{\text{anion}} + G_{\text{H}^+} \right) - G_{\text{neutral}}$$
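
The arithmetic is a difference of Hartree values followed by the `HARTREE_TO_KCAL` conversion. The sketch below uses illustrative enthalpies, not dataset values. It also computes the ideal-gas enthalpy of a bare proton, as an optional cross-check on the `H+.out` entry. A proton has only translational contributions, so H = (3/2)RT + RT = (5/2)RT, about 1.48 kcal/mol at 298.15 K. Using this value as a reference for the G4MP2 `H+` enthalpy is an assumption of this sketch.

```python
HARTREE_TO_KCAL = 627.5096          # kcal/mol per Hartree (as in the configuration cell)
R_KCAL = 8.314462618 / 4184.0       # gas constant in kcal/(mol K)
T = 298.15                          # K


def deprotonation_enthalpy(h_neutral: float, h_anion: float, h_proton: float) -> float:
    """Delta H_deprot in kcal/mol from enthalpies in Hartree: (H_anion + H_H+) - H_neutral."""
    return ((h_anion + h_proton) - h_neutral) * HARTREE_TO_KCAL


# Ideal-gas proton: translational energy (3/2)RT plus PV = RT, no electronic or vibrational terms
h_proton_kcal = 2.5 * R_KCAL * T                  # about 1.481 kcal/mol
h_proton_hartree = h_proton_kcal / HARTREE_TO_KCAL

# Illustrative (hypothetical) enthalpies in Hartree
h_neutral, h_anion = -100.000000, -99.420000
dh = deprotonation_enthalpy(h_neutral, h_anion, h_proton_hartree)
print(f"H(H+) = {h_proton_kcal:.3f} kcal/mol, Delta H = {dh:.1f} kcal/mol")
```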

In [None]:
"""Deprotonation energy calculation functions."""

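def parse_g4mp2_energies(filepath: Path) -> G4MP2Energies:
    """Extract G4MP2 enthalpy and free energy from Gaussian output.
    
    Reads the summary line that starts with 'G4MP2 Enthalpy=' and also carries
    the G4MP2 free energy, matching numbers instead of fixed column offsets.
    """
    # Enthalpy follows 'G4MP2 Enthalpy='; the free energy is the value after the last '='
    pattern = re.compile(r"G4MP2 Enthalpy=\s*(-?\d+\.\d+).*=\s*(-?\d+\.\d+)")
    with open(filepath, "r") as f:
        for line in f:
            match = pattern.search(line)
            if match:
                return G4MP2Energies(
                    enthalpy=float(match.group(1)),
                    free_energy=float(match.group(2)),
                )
    raise ValueError(f"G4MP2 energies not found in {filepath}")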


def calculate_deprotonation_energies(
    structures: List[str], 
    g4mp2_path: Path
) -> List[DeprotonationResult]:
    """Calculate deprotonation energies for capping agents.
    
    Uses capping agent structures (with _cap suffix) for calculations.
    Requires neutral and anion Gaussian output files.
    """
    h_plus = parse_g4mp2_energies(g4mp2_path / "H+.out")
    base_structures = [s for s in structures if not s.endswith("_cap")]
    
    results = []
    for structure in base_structures:
        try:
            parent = parse_g4mp2_energies(g4mp2_path / f"{structure}_cap.out")
            anion = parse_g4mp2_energies(g4mp2_path / f"{structure}_cap_anion.out")
            
            delta_h = ((anion.enthalpy + h_plus.enthalpy) - parent.enthalpy) * HARTREE_TO_KCAL
            delta_g = ((anion.free_energy + h_plus.free_energy) - parent.free_energy) * HARTREE_TO_KCAL
            
            results.append(DeprotonationResult(name=structure, delta_h=delta_h, delta_g=delta_g))
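        except FileNotFoundError:
            pass  # Skip structures without G4MP2 data
        except ValueError as e:
            print(f"Skipping {structure}: {e}")  # output present but G4MP2 summary line missing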
    
    print(f"Calculated deprotonation energies for {len(results)} structures.")
    return results

In [None]:
# Calculate deprotonation energies
deprot_results = calculate_deprotonation_energies(STRUCTURES, GAUSSIAN_G4MP2)

<a id='export'></a>
## 6. Export Results

Export all descriptors to a single CSV file for machine learning analysis.
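
The export cell below builds rows with f-strings. That is simple, but it leaves quoting and missing values to the caller. The sketch below shows an alternative with the standard-library `csv` module. The column subset and the example rows are hypothetical. Missing descriptors, such as a structure with no G4MP2 data, are written as empty fields, as the export cell does for `delta_h`.

```python
import csv
import io
from typing import List, Mapping, TextIO

# Hypothetical subset of the descriptor columns
COLUMNS = ["structure", "bond1", "nucleophilicity", "delta_h"]


def write_descriptor_rows(rows: List[Mapping[str, object]], handle: TextIO) -> None:
    """Write rows with csv.DictWriter; missing or None values become empty fields."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: ("" if row.get(key) is None else row[key]) for key in COLUMNS})


buffer = io.StringIO()
write_descriptor_rows(
    [
        {"structure": "phenol", "bond1": "C-O", "nucleophilicity": 2.5, "delta_h": 350.0},
        {"structure": "ethanol", "bond1": "C-O", "nucleophilicity": 1.9, "delta_h": None},
    ],
    buffer,
)
print(buffer.getvalue())
```

When writing to disk, open the file with `newline=""`, as the `csv` documentation recommends. For example: `open(OUTPUT_DIR / "descriptors.csv", "w", newline="")`.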

In [None]:
"""Export functions."""

def export_descriptors(
    bebop_results: List[BEBOPResult],
    nuc_results: List[NucleophilicityResult],
    deprot_results: List[DeprotonationResult],
    output_path: Path
) -> None:
    """Export all descriptors to CSV.
    
    Columns:
    - structure: Molecule name
    - bond1, bond2: Bond types for reactive bonds
    - bond_gross1, bond_net1: Bond energies for first reactive bond
    - bond_gross2, bond_net2: Bond energies for second reactive bond
    - C_hyb_bond1, NorO_hyb_bond1, N_hyb_bond2: Hybridization indices
    - resonance: Resonance stabilization energy
    - homo_b3lyp, lumo_b3lyp: Frontier orbital energies
    - homo_lumo_gap: HOMO-LUMO gap
    - nucleophilicity: Domingo N index relative to TCE
    - delta_h: Deprotonation enthalpy (G4MP2)
    """
    nuc_lookup = {r.name: r for r in nuc_results}
    deprot_lookup = {r.name: r for r in deprot_results}
    
    header = (
        "structure,bond1,bond2,bond_gross1,bond_net1,bond_gross2,bond_net2,"
        "C_hyb_bond1,NorO_hyb_bond1,N_hyb_bond2,resonance,"
        "homo_b3lyp,lumo_b3lyp,homo_lumo_gap,nucleophilicity,delta_h\n"
    )
    
    with open(output_path, "w") as f:
        f.write(header)
        for bebop in bebop_results:
            nuc = nuc_lookup.get(bebop.name)
            if nuc is None:
                continue
            
            # Capping agents ('*_cap') and their adducts share one deprotonation ΔH, keyed by base name
            base_name = bebop.name[:-4] if bebop.name.endswith("_cap") else bebop.name
            deprot = deprot_lookup.get(base_name)
            delta_h = f"{deprot.delta_h:.2f}" if deprot else ""
            
            row = (
                f"{bebop.name},{bebop.bond_co},{bebop.bond_nh},"
                f"{bebop.gross_be_co},{bebop.net_be_co},"
                f"{bebop.gross_be_nh},{bebop.net_be_nh},"
                f"{bebop.hybridization[0]},{bebop.hybridization[1]},{bebop.hybridization[2]},"
                f"{bebop.resonance_energy},"
                f"{nuc.homo},{nuc.lumo},{nuc.homo_lumo_gap},{nuc.nucleophilicity:.2f},{delta_h}\n"
            )
            f.write(row)
    
    print(f"Exported descriptors to {output_path}")

In [None]:
# Export all results
export_descriptors(bebop_results, nuc_b3lyp_cbsb7, deprot_results, OUTPUT_DIR / "descriptors.csv")

<a id='xyz'></a>
## 7. Extract and Export XYZ Coordinates

Extract optimized geometries from Gaussian outputs for visualization and further analysis.
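
A combined XYZ file is a sequence of frames. Each frame has an atom-count line, a comment line (here the structure name), and one line per atom. The reader sketch below splits such a file back into frames. It skips blank lines where an atom-count line is expected, so it also tolerates files with blank separators between frames. `read_multi_xyz` is a hypothetical helper.

```python
from typing import List, Tuple

Frame = Tuple[str, List[str], List[Tuple[float, float, float]]]


def read_multi_xyz(text: str) -> List[Frame]:
    """Parse concatenated XYZ frames into (comment, elements, coords) tuples."""
    lines = text.splitlines()
    frames: List[Frame] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank separator lines between frames
            i += 1
            continue
        n_atoms = int(lines[i])
        comment = lines[i + 1] if i + 1 < len(lines) else ""
        elements: List[str] = []
        coords: List[Tuple[float, float, float]] = []
        for atom_line in lines[i + 2 : i + 2 + n_atoms]:
            symbol, x, y, z = atom_line.split()[:4]
            elements.append(symbol)
            coords.append((float(x), float(y), float(z)))
        if len(elements) != n_atoms:
            raise ValueError(f"frame '{comment}' is truncated")
        frames.append((comment, elements, coords))
        i += 2 + n_atoms
    return frames
```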

In [None]:
"""XYZ coordinate extraction functions."""

ATOMIC_SYMBOLS = {
    1: 'H', 6: 'C', 7: 'N', 8: 'O', 9: 'F', 15: 'P', 16: 'S', 17: 'Cl', 35: 'Br'
}

@dataclass
class XYZGeometry:
    """Container for molecular coordinates."""
    name: str
    n_atoms: int
    elements: List[str]
    coords: List[Tuple[float, float, float]]
    comment: str = ""
    
    def to_xyz_string(self) -> str:
        """Convert to standard XYZ format string."""
        lines = [str(self.n_atoms), self.comment]
        for elem, (x, y, z) in zip(self.elements, self.coords):
            lines.append(f"{elem:2s} {x:15.8f} {y:15.8f} {z:15.8f}")
        return "\n".join(lines)


def parse_gaussian_geometry(filepath: Path) -> Optional[XYZGeometry]:
    """Extract final optimized geometry from Gaussian output.
    
    Parses the last 'Standard orientation' block in the output file.
    """
    with open(filepath, 'r') as f:
        content = f.read()
    
    pattern = r"Standard orientation:.*?-{5,}\n\s+Center\s+Atomic\s+Atomic\s+Coordinates.*?-{5,}\n(.*?)-{5,}"
    matches = list(re.finditer(pattern, content, re.DOTALL))
    
    if not matches:
        return None
    
    coord_block = matches[-1].group(1)
    elements, coords = [], []
    
    for line in coord_block.strip().split('\n'):
        parts = line.split()
        if len(parts) >= 6:
            atomic_num = int(parts[1])
            elements.append(ATOMIC_SYMBOLS.get(atomic_num, f"X{atomic_num}"))
            coords.append((float(parts[3]), float(parts[4]), float(parts[5])))
    
    return XYZGeometry(
        name=filepath.stem,
        n_atoms=len(elements),
        elements=elements,
        coords=coords,
        comment=filepath.stem
    )


def extract_all_geometries(structures: List[str], gaussian_path: Path) -> Dict[str, XYZGeometry]:
    """Extract geometries for all structures."""
    geometries = {}
    for structure in structures:
        filepath = gaussian_path / f"{structure}.out"
        if filepath.exists():
            geom = parse_gaussian_geometry(filepath)
            if geom:
                geometries[structure] = geom
                print(f"  ✓ {structure}: {geom.n_atoms} atoms")
    return geometries


def export_combined_xyz(geometries: Dict[str, XYZGeometry], output_path: Path) -> None:
    """Export all geometries to a combined XYZ file."""
    with open(output_path, 'w') as f:
        for geom in geometries.values():
            f.write(geom.to_xyz_string())
            f.write("\n")  # no blank line between frames, as multi-frame XYZ readers expect
    print(f"Exported {len(geometries)} geometries to {output_path}")

In [None]:
# Extract B3LYP/CBSB7 geometries (includes TCE reference)
print("Extracting B3LYP/CBSB7 geometries...")
geometries_b3lyp = extract_all_geometries(STRUCTURES + ["TCE"], GAUSSIAN_B3LYP_CBSB7)

In [None]:
# Extract G4MP2 geometries (capping agents + anions)
print("Extracting G4MP2 geometries...")
cap_structures = [s for s in STRUCTURES if s.endswith("_cap")]
g4mp2_structures = cap_structures + [f"{s}_anion" for s in cap_structures]
geometries_g4mp2 = extract_all_geometries(g4mp2_structures, GAUSSIAN_G4MP2)

In [None]:
# Export combined XYZ files
export_combined_xyz(geometries_b3lyp, OUTPUT_DIR / "all_geometries_b3lyp_cbsb7.xyz")
export_combined_xyz(geometries_g4mp2, OUTPUT_DIR / "all_geometries_g4mp2.xyz")

---

## Summary

This notebook computes descriptors for predicting TPU deblocking temperatures and writes the following files to `OUTPUT_DIR`:

| Output File | Contents |
|-------------|----------|
| `descriptors.csv` | All electronic descriptors (bond energies, hybridization, resonance, nucleophilicity, HOMO-LUMO gap, deprotonation ΔH) |
| `nucleophilicity_basis_comparison.png` | Correlation plot comparing B3LYP/6-31G* vs B3LYP/CBSB7 nucleophilicity |
| `all_geometries_b3lyp_cbsb7.xyz` | Optimized structures from B3LYP/CBSB7 |
| `all_geometries_g4mp2.xyz` | Final geometries from the G4MP2 outputs of capping agents and their anions (G4MP2 uses B3LYP/6-31G(2df,p) optimized geometries) |

---

## License

This code is released under the MIT License. See LICENSE file for details.