# Notebook 02: Structure Validation and Preparation

## The Most Critical Step in DFT Calculations

**"If you put a garbage structure into DFT, you will get garbage results out."**

This notebook teaches the validation steps that MUST be performed before running ANY DFT calculation.

---

## Learning Objectives

1. Verify charge neutrality of proposed compounds
2. Use Shannon ionic radii for bond length estimation
3. Estimate lattice parameters before calculations
4. Visualize and validate structures
5. Identify space groups and crystal systems
6. Convert structures to QE format

---

## 1. Charge Neutrality Check

### Why This Matters

A stable ionic compound must be charge-neutral. If the formal oxidation states, weighted by their stoichiometric counts, do not sum to zero, the formula is one of the following:
- Incorrectly formulated
- A non-stoichiometric defect compound (requires special treatment)
- Simply impossible

### The Rule

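For a compound $A_x B_y C_z$ with formal oxidation states $q_A$, $q_B$, $q_C$:

$$x\,q_A + y\,q_B + z\,q_C = \sum_i n_i\, q_i = 0$$

where $n_i$ is the number of atoms of element $i$ per formula unit and $q_i$ is its oxidation state.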

In [None]:
import numpy as np
from typing import Dict, List, Tuple, Optional

# Common oxidation states for elements
COMMON_OXIDATION_STATES = {
    # Alkali metals
    'Li': [1], 'Na': [1], 'K': [1], 'Rb': [1], 'Cs': [1],
    # Alkaline earth metals
    'Be': [2], 'Mg': [2], 'Ca': [2], 'Sr': [2], 'Ba': [2],
    # Transition metals (common states)
    'Sc': [3], 'Ti': [2, 3, 4], 'V': [2, 3, 4, 5], 'Cr': [2, 3, 6],
    'Mn': [2, 3, 4, 7], 'Fe': [2, 3], 'Co': [2, 3], 'Ni': [2, 3],
    'Cu': [1, 2], 'Zn': [2], 'Zr': [4], 'Nb': [3, 5], 'Mo': [4, 6],
    'Ru': [3, 4], 'Rh': [3], 'Pd': [2, 4], 'Ag': [1], 'Cd': [2],
    'Hf': [4], 'Ta': [5], 'W': [4, 6], 'Re': [4, 7], 'Os': [4],
    'Ir': [3, 4], 'Pt': [2, 4], 'Au': [1, 3],
    # Main group elements
    'Al': [3], 'Ga': [3], 'In': [3], 'Tl': [1, 3],
    'Si': [4], 'Ge': [2, 4], 'Sn': [2, 4], 'Pb': [2, 4],
    'P': [-3, 3, 5], 'As': [-3, 3, 5], 'Sb': [-3, 3, 5], 'Bi': [3, 5],
    'S': [-2, 4, 6], 'Se': [-2, 4, 6], 'Te': [-2, 4, 6],
    # Oxygen, halogens, and nitrogen
    'O': [-2], 'F': [-1], 'Cl': [-1], 'Br': [-1], 'I': [-1],
    'N': [-3, 3, 5],
    # Lanthanides (typical +3)
    'La': [3], 'Ce': [3, 4], 'Pr': [3], 'Nd': [3], 'Sm': [2, 3],
    'Eu': [2, 3], 'Gd': [3], 'Tb': [3, 4], 'Dy': [3], 'Ho': [3],
    'Er': [3], 'Tm': [3], 'Yb': [2, 3], 'Lu': [3]
}

def check_charge_neutrality(composition: Dict[str, int], 
                            oxidation_states: Dict[str, int]) -> Tuple[bool, float]:
    """
    Check if a compound is charge-neutral.
    
    Parameters
    ----------
    composition : dict
        Element symbols mapped to their counts, e.g., {'Ba': 1, 'Ti': 1, 'O': 3}
    oxidation_states : dict
        Element symbols mapped to their oxidation states, e.g., {'Ba': 2, 'Ti': 4, 'O': -2}
    
    Returns
    -------
    is_neutral : bool
        True if sum of charges equals zero
    total_charge : float
        The total charge (should be 0.0 for neutral compounds)
    
    Example
    -------
    >>> check_charge_neutrality({'Ba': 1, 'Ti': 1, 'O': 3}, {'Ba': 2, 'Ti': 4, 'O': -2})
    (True, 0.0)
    """
    total_charge = 0.0
    
    for element, count in composition.items():
        if element not in oxidation_states:
            print(f"Warning: No oxidation state provided for {element}")
            return False, float('nan')
        total_charge += count * oxidation_states[element]
    
    is_neutral = abs(total_charge) < 1e-6
    return is_neutral, total_charge

# Example: BaTiO3 (Barium Titanate)
print("=" * 60)
print("Charge Neutrality Check Examples")
print("=" * 60)

# BaTiO3
comp_batio3 = {'Ba': 1, 'Ti': 1, 'O': 3}
ox_batio3 = {'Ba': 2, 'Ti': 4, 'O': -2}
neutral, charge = check_charge_neutrality(comp_batio3, ox_batio3)
print(f"\nBaTiO3: Ba²⁺ + Ti⁴⁺ + 3×O²⁻ = {charge}")
print(f"  Charge neutral: {neutral} ✓" if neutral else f"  NOT neutral! Charge = {charge}")

# SrTiO3
comp_srtio3 = {'Sr': 1, 'Ti': 1, 'O': 3}
ox_srtio3 = {'Sr': 2, 'Ti': 4, 'O': -2}
neutral, charge = check_charge_neutrality(comp_srtio3, ox_srtio3)
print(f"\nSrTiO3: Sr²⁺ + Ti⁴⁺ + 3×O²⁻ = {charge}")
print(f"  Charge neutral: {neutral} ✓" if neutral else f"  NOT neutral! Charge = {charge}")

# Fe2O3
comp_fe2o3 = {'Fe': 2, 'O': 3}
ox_fe2o3 = {'Fe': 3, 'O': -2}
neutral, charge = check_charge_neutrality(comp_fe2o3, ox_fe2o3)
print(f"\nFe2O3: 2×Fe³⁺ + 3×O²⁻ = {charge}")
print(f"  Charge neutral: {neutral} ✓" if neutral else f"  NOT neutral! Charge = {charge}")

# Invalid compound example
comp_invalid = {'Ba': 1, 'Ti': 1, 'O': 2}  # Wrong stoichiometry
ox_invalid = {'Ba': 2, 'Ti': 4, 'O': -2}
neutral, charge = check_charge_neutrality(comp_invalid, ox_invalid)
print(f"\nBaTiO2 (wrong): Ba²⁺ + Ti⁴⁺ + 2×O²⁻ = {charge}")
print(f"  Charge neutral: {neutral}" if neutral else f"  NOT neutral! Total charge = {charge} ✗")
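
The check above requires you to choose the oxidation states yourself. When an element has several common states, a small brute-force search can list every assignment that makes the formula neutral. The sketch below is an illustration rather than part of the original workflow: `neutral_assignments` and the small `OX_STATES` table are hypothetical names, and the search assigns a state to every atom, so mixed-valence compounds are found as well.

```python
from itertools import combinations_with_replacement, product

# Hypothetical subset of common oxidation states (same idea as COMMON_OXIDATION_STATES)
OX_STATES = {'Sr': [2], 'Ti': [2, 3, 4], 'Fe': [2, 3], 'O': [-2]}

def neutral_assignments(composition, ox_table):
    """List every per-atom oxidation-state assignment with zero total charge.

    For an element with n atoms, all multisets of n states are tried, so
    Fe3 can be (2, 2, 2), (2, 2, 3), (2, 3, 3) or (3, 3, 3).
    The search grows combinatorially; keep it to small formula units.
    """
    per_element = [
        [(el, combo) for combo in combinations_with_replacement(ox_table[el], n)]
        for el, n in composition.items()
    ]
    hits = []
    for choice in product(*per_element):
        # choice is a tuple of (element, tuple_of_states) pairs
        if sum(sum(combo) for _, combo in choice) == 0:
            hits.append(dict(choice))
    return hits

print(neutral_assignments({'Sr': 1, 'Ti': 1, 'O': 3}, OX_STATES))
print(neutral_assignments({'Fe': 3, 'O': 4}, OX_STATES))
```

For Fe3O4 no single Fe state works (3 × 2 = 6 and 3 × 3 = 9, against 4 × 2 = 8 from oxygen); the only neutral assignment is one Fe²⁺ plus two Fe³⁺.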

---

## 2. Shannon Ionic Radii

### Why Use Ionic Radii?

Shannon ionic radii allow us to:
1. **Estimate bond lengths**: $d_{M-X} \approx r_M + r_X$ for a cation $M$ bonded to an anion $X$
2. **Estimate lattice parameters**: Using packing considerations
3. **Predict coordination numbers**: Radius ratio rules
4. **Check structure reasonableness**: cation-anion distances far from the radii sum usually mean a wrongly built structure

### Reference
R. D. Shannon, "Revised Effective Ionic Radii and Systematic Studies of Interatomic Distances in Halides and Chalcogenides", Acta Cryst. A32, 751-767 (1976)
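
The radius-ratio rules mentioned in item 3 compare $r_{cation}/r_{anion}$ with the ratio at which the surrounding anions just touch each other. A minimal sketch with the classical textbook thresholds follows; `predict_cn` is a hypothetical helper, and these rules are rough guides that often fail for strongly covalent compounds.

```python
# Classical radius-ratio thresholds: (lower bound of r_c/r_a, predicted CN, geometry)
RADIUS_RATIO_RULES = [
    (1.000, 12, 'cuboctahedral'),
    (0.732, 8, 'cubic'),
    (0.414, 6, 'octahedral'),
    (0.225, 4, 'tetrahedral'),
    (0.155, 3, 'triangular'),
    (0.0, 2, 'linear'),
]

def predict_cn(r_cation, r_anion):
    """Return (ratio, coordination number, geometry) from the radius-ratio rules."""
    ratio = r_cation / r_anion
    for lower, cn, geometry in RADIUS_RATIO_RULES:
        if ratio >= lower:
            return ratio, cn, geometry

# Shannon radii (Å) as listed in the table of this section
for label, rc, ra in [('Ti4+/O2-', 0.605, 1.40),
                      ('Na+/Cl-', 1.02, 1.81),
                      ('Cs+/Cl-', 1.74, 1.81)]:
    ratio, cn, geometry = predict_cn(rc, ra)
    print(f"{label}: r_c/r_a = {ratio:.3f} -> CN {cn} ({geometry})")
```

The radius itself depends on the coordination number you assume when looking it up, so if the prediction disagrees with that CN, repeat the lookup with the predicted one.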

In [None]:
# Selected Shannon ionic radii (Å); HS/LS = high-spin/low-spin
# Format: {element: {oxidation: {coordination: radius}}}
SHANNON_RADII = {
    # Alkali metals
    'Li': {1: {4: 0.59, 6: 0.76, 8: 0.92}},
    'Na': {1: {4: 0.99, 6: 1.02, 8: 1.18, 12: 1.39}},
    'K':  {1: {6: 1.38, 8: 1.51, 12: 1.64}},
    'Rb': {1: {6: 1.52, 8: 1.61, 12: 1.72}},
    'Cs': {1: {6: 1.67, 8: 1.74, 12: 1.88}},
    
    # Alkaline earth metals
    'Be': {2: {4: 0.27, 6: 0.45}},
    'Mg': {2: {4: 0.57, 6: 0.72, 8: 0.89}},
    'Ca': {2: {6: 1.00, 8: 1.12, 12: 1.34}},
    'Sr': {2: {6: 1.18, 8: 1.26, 12: 1.44}},
    'Ba': {2: {6: 1.35, 8: 1.42, 12: 1.61}},
    
    # Transition metals (selected)
    'Ti': {
        2: {6: 0.86},
        3: {6: 0.67},
        4: {4: 0.42, 6: 0.605, 8: 0.74}
    },
    'V': {
        2: {6: 0.79},
        3: {6: 0.64},
        4: {6: 0.58},
        5: {4: 0.355, 6: 0.54}
    },
    'Cr': {
        2: {6: 0.80},  # HS
        3: {6: 0.615},
        6: {4: 0.26, 6: 0.44}
    },
    'Mn': {
        2: {4: 0.66, 6: 0.83},  # HS
        3: {6: 0.645},  # HS
        4: {4: 0.39, 6: 0.53},
        7: {4: 0.25}
    },
    'Fe': {
        2: {4: 0.63, 6: 0.78},  # HS
        3: {4: 0.49, 6: 0.645}  # HS
    },
    'Co': {
        2: {4: 0.58, 6: 0.745},  # HS
        3: {6: 0.61}  # HS
    },
    'Ni': {
        2: {4: 0.55, 6: 0.69},
        3: {6: 0.56}  # LS
    },
    'Cu': {
        1: {2: 0.46, 4: 0.60, 6: 0.77},
        2: {4: 0.57, 6: 0.73}
    },
    'Zn': {2: {4: 0.60, 6: 0.74, 8: 0.90}},
    'Zr': {4: {4: 0.59, 6: 0.72, 8: 0.84}},
    
    # Main group elements
    'Al': {3: {4: 0.39, 6: 0.535}},
    'Ga': {3: {4: 0.47, 6: 0.62}},
    'In': {3: {6: 0.80, 8: 0.92}},
    'Si': {4: {4: 0.26, 6: 0.40}},
    'Ge': {4: {4: 0.39, 6: 0.53}},
    'Sn': {
        2: {6: 0.93},
        4: {4: 0.55, 6: 0.69}
    },
    'Pb': {
        2: {6: 1.19, 8: 1.29},
        4: {4: 0.65, 6: 0.775}
    },
    
    # Anions
    'O':  {-2: {2: 1.35, 3: 1.36, 4: 1.38, 6: 1.40, 8: 1.42}},
    'S':  {-2: {6: 1.84}},
    'Se': {-2: {6: 1.98}},
    'Te': {-2: {6: 2.21}},
    'F':  {-1: {2: 1.285, 4: 1.31, 6: 1.33}},
    'Cl': {-1: {6: 1.81}},
    'Br': {-1: {6: 1.96}},
    'I':  {-1: {6: 2.20}},
    'N':  {-3: {4: 1.46}},
    
    # Lanthanides (mostly +3 at CN=6; a few extra states and CNs)
    'La': {3: {6: 1.032, 8: 1.16, 12: 1.36}},
    'Ce': {3: {6: 1.01}, 4: {6: 0.87}},
    'Pr': {3: {6: 0.99}},
    'Nd': {3: {6: 0.983}},
    'Gd': {3: {6: 0.938}},
    'Dy': {3: {6: 0.912}},
    'Er': {3: {6: 0.89}},
    'Yb': {2: {6: 1.02}, 3: {6: 0.868}},
    'Lu': {3: {6: 0.861}}
}

def get_shannon_radius(element: str, oxidation: int, coordination: int) -> Optional[float]:
    """
    Get Shannon ionic radius for an element.
    
    Parameters
    ----------
    element : str
        Element symbol
    oxidation : int
        Oxidation state
    coordination : int
        Coordination number
    
    Returns
    -------
    radius : float or None
        Ionic radius in Angstrom, or None if not found
    """
    if element not in SHANNON_RADII:
        print(f"Element {element} not in database")
        return None
    
    if oxidation not in SHANNON_RADII[element]:
        print(f"Oxidation state {oxidation} not available for {element}")
        return None
    
    if coordination not in SHANNON_RADII[element][oxidation]:
        available = list(SHANNON_RADII[element][oxidation].keys())
        print(f"CN={coordination} not available for {element}{oxidation:+d}. Available: {available}")
        return None
    
    return SHANNON_RADII[element][oxidation][coordination]

# Examples
print("Shannon Ionic Radii Examples (Å)")
print("=" * 50)

examples = [
    ('Ba', 2, 12),
    ('Ti', 4, 6),
    ('O', -2, 6),
    ('Fe', 3, 6),
    ('Sr', 2, 12),
]

for elem, ox, cn in examples:
    r = get_shannon_radius(elem, ox, cn)
    if r is not None:
        print(f"{elem}{ox:+d} (CN={cn}): {r:.3f} Å")
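
These radii turn directly into the bond-length test from the pre-DFT checklist (radii sum ± 20%). As a sketch, `bond_length_check` (a hypothetical helper) compares the Ti-O and Sr-O distances of cubic SrTiO3, which follow from the geometry as $a/2$ and $a/\sqrt{2}$, with the sums of the CN-appropriate radii.

```python
import math

def bond_length_check(r_cation, r_anion, d_observed, tolerance=0.20):
    """Compare an observed cation-anion distance with r_cation + r_anion.

    Returns (ideal distance, fractional deviation, within tolerance?).
    """
    d_ideal = r_cation + r_anion
    deviation = (d_observed - d_ideal) / d_ideal
    return d_ideal, deviation, abs(deviation) <= tolerance

a_sto = 3.905  # experimental cubic SrTiO3 lattice parameter, Å
checks = {
    'Ti-O (Ti4+ CN6, O2- CN6)': (0.605, 1.40, a_sto / 2),
    'Sr-O (Sr2+ CN12, O2- CN6)': (1.44, 1.40, a_sto / math.sqrt(2)),
}
for bond, (rc, ra, d) in checks.items():
    d_ideal, dev, ok = bond_length_check(rc, ra, d)
    print(f"{bond}: d = {d:.3f} Å, radii sum = {d_ideal:.3f} Å, "
          f"deviation {dev:+.1%}, ok = {ok}")
```

Both bonds come out within a few percent of the radii sums, well inside the ±20% window.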

---

## 3. Lattice Parameter Estimation

Before running expensive DFT calculations, we can estimate lattice parameters using:

### Method 1: Ionic Radii Sum

For simple structures like rock salt (NaCl):
$$a \approx 2(r_{cation} + r_{anion})$$

For perovskites (ABO₃):
$$a \approx \sqrt{2}(r_A + r_O) \approx 2(r_B + r_O)$$
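
As a quick check of the rock-salt formula, the sketch below uses the CN 6 Shannon radii from Section 2 for NaCl and KCl. The experimental lattice parameters in the list (about 5.64 Å and 6.29 Å) are standard room-temperature values added here for comparison; `rocksalt_lattice` is a hypothetical helper.

```python
def rocksalt_lattice(r_cation, r_anion):
    """Rock-salt estimate: the cubic edge spans two cation-anion contacts."""
    return 2.0 * (r_cation + r_anion)

# (name, r_cation CN6, r_anion CN6, experimental a), all in Å
for name, rc, ra, a_exp in [('NaCl', 1.02, 1.81, 5.64), ('KCl', 1.38, 1.81, 6.29)]:
    a_est = rocksalt_lattice(rc, ra)
    print(f"{name}: estimated a = {a_est:.2f} Å, experimental {a_exp:.2f} Å, "
          f"error {100 * (a_est - a_exp) / a_exp:+.1f}%")
```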

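### Method 2: Isostructural Scaling

If you know the lattice parameter of an isostructural reference compound, scale it by the ratio of the radii sums that set the cell edge (for example $r_B + r_O$ in a perovskite). The related Vegard's law instead interpolates linearly between the end members of a solid solution. For the simple scaling: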
$$a_{new} \approx a_{ref} \times \frac{\sum r_{new}}{\sum r_{ref}}$$
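
A sketch of the scaling rule, assuming the relevant radii sum is the cation-anion contact that sets the rock-salt edge. Here KCl is estimated from NaCl; `scale_lattice` is a hypothetical helper, and 5.64 Å is the experimental NaCl value used as the reference.

```python
def scale_lattice(a_ref, radii_ref, radii_new):
    """Scale a reference lattice parameter by the ratio of radii sums."""
    return a_ref * sum(radii_new) / sum(radii_ref)

a_nacl = 5.64  # Å, experimental reference
# Na+ and K+ (CN6) with Cl- (CN6), Shannon radii in Å
a_kcl_est = scale_lattice(a_nacl, radii_ref=[1.02, 1.81], radii_new=[1.38, 1.81])
print(f"KCl estimated from NaCl: a = {a_kcl_est:.2f} Å")
```

The result (about 6.36 Å, against roughly 6.29 Å measured) lands slightly closer than the raw radii sum of 6.38 Å, since part of the systematic error in the radii can cancel in the ratio.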

In [None]:
def estimate_perovskite_lattice(A: str, B: str, X: str = 'O',
                                 A_ox: int = 2, B_ox: int = 4, X_ox: int = -2,
                                 A_cn: int = 12, B_cn: int = 6, X_cn: int = 6) -> Optional[float]:
    """
    Estimate cubic perovskite lattice parameter from ionic radii.
    
    For perovskite ABX3:
    a ≈ sqrt(2) * (r_A + r_X) from A-site perspective
    a ≈ 2 * (r_B + r_X) from B-site perspective
    
    Returns a weighted average of both estimates (0.4 A-site, 0.6 B-site;
    heuristic weights), or None if a radius is missing.
    """
    r_A = get_shannon_radius(A, A_ox, A_cn)
    r_B = get_shannon_radius(B, B_ox, B_cn)
    r_X = get_shannon_radius(X, X_ox, X_cn)
    
    if r_A is None or r_B is None or r_X is None:
        return None
    
    # Two estimates
    a_from_A = np.sqrt(2) * (r_A + r_X)
    a_from_B = 2 * (r_B + r_X)
    
    # Average (with slight preference for B-site estimate)
    a_estimate = 0.4 * a_from_A + 0.6 * a_from_B
    
    return a_estimate

def goldschmidt_tolerance_factor(A: str, B: str, X: str = 'O',
                                  A_ox: int = 2, B_ox: int = 4, X_ox: int = -2,
                                  A_cn: int = 12, B_cn: int = 6, X_cn: int = 6) -> Optional[float]:
    """
    Calculate Goldschmidt tolerance factor for perovskites.
    
    t = (r_A + r_X) / [sqrt(2) * (r_B + r_X)]
    
    t ≈ 1.0: Ideal cubic perovskite
    t < 1.0: B-site too large, octahedral tilting
    t > 1.0: A-site too large, hexagonal perovskite possible
    
    Stability range: ~0.8 < t < 1.1
    """
    r_A = get_shannon_radius(A, A_ox, A_cn)
    r_B = get_shannon_radius(B, B_ox, B_cn)
    r_X = get_shannon_radius(X, X_ox, X_cn)
    
    if r_A is None or r_B is None or r_X is None:
        return None
    
    t = (r_A + r_X) / (np.sqrt(2) * (r_B + r_X))
    return t

# Example calculations
print("Perovskite Lattice Parameter Estimation")
print("=" * 60)

perovskites = [
    ('SrTiO3', 'Sr', 'Ti', 3.905),  # Experimental a = 3.905 Å
    ('BaTiO3', 'Ba', 'Ti', 4.01),   # Experimental a ≈ 4.01 Å (cubic)
    ('CaTiO3', 'Ca', 'Ti', 3.84),   # Experimental (pseudocubic)
]

for name, A, B, a_exp in perovskites:
    a_est = estimate_perovskite_lattice(A, B)
    t = goldschmidt_tolerance_factor(A, B)
    
    print(f"\n{name}:")
    if a_est is None or t is None:
        print("  Could not estimate (missing Shannon radius)")
        continue
    error = 100 * (a_est - a_exp) / a_exp
    
    print(f"  Tolerance factor: t = {t:.3f}")
    print(f"  Estimated a = {a_est:.3f} Å")
    print(f"  Experimental a = {a_exp:.3f} Å")
    print(f"  Error: {error:+.1f}%")
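
Rearranging the Goldschmidt expression for $t = 1$ gives the A-site radius that fits a given $BX_3$ framework exactly. With the CN 6 radii of Ti⁴⁺ (0.605 Å) and O²⁻ (1.40 Å) from the Shannon table:

$$
t = \frac{r_A + r_X}{\sqrt{2}\,(r_B + r_X)} = 1
\quad\Longrightarrow\quad
r_A^{\text{ideal}} = \sqrt{2}\,(r_B + r_X) - r_X
$$

$$
r_A^{\text{ideal}}(\text{Ti}^{4+}, \text{O}^{2-}) = \sqrt{2}\,(0.605 + 1.40) - 1.40 \approx 1.44\ \text{Å}
$$

This matches Sr²⁺ (1.44 Å at CN 12) and lies between Ca²⁺ (1.34 Å, $t \approx 0.97$) and Ba²⁺ (1.61 Å, $t \approx 1.06$), consistent with SrTiO3 being cubic at room temperature while CaTiO3 has tilted octahedra.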

---

## 4. Space Group and Symmetry

Using `spglib` to identify the space group of a structure.

In [None]:
# Check if spglib is available (without try/except)
import importlib
import importlib.util

spec = importlib.util.find_spec("spglib")
SPGLIB_AVAILABLE = spec is not None

if SPGLIB_AVAILABLE:
    spglib = importlib.import_module("spglib")
    print("spglib available: version", getattr(spglib, "__version__", "unknown"))
else:
    print("spglib not available. Install with: pip install spglib")
    print("Space group detection will be skipped.")

def get_space_group(lattice, positions, numbers, symprec=1e-5):
    """
    Get space group information using spglib.
    
    Parameters
    ----------
    lattice : array_like
        3x3 array of lattice vectors (rows)
    positions : array_like
        Fractional coordinates of atoms
    numbers : array_like
        Atomic numbers
    symprec : float
        Symmetry precision
    
    Returns
    -------
    dict with space group info, or None if spglib unavailable
    """
    if not SPGLIB_AVAILABLE:
        return None
    
    cell = (lattice, positions, numbers)
    spg_info = spglib.get_spacegroup(cell, symprec=symprec)
    symmetry = spglib.get_symmetry(cell, symprec=symprec)
    
    return {
        'spacegroup': spg_info,
        'n_operations': len(symmetry['rotations']) if symmetry else 0
    }

# Example: Silicon diamond structure
a_si = 5.43  # Angstrom
lattice_si = (a_si / 2.0) * np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # FCC primitive cell vectors (not used below; spglib is run on the conventional cell)

# For conventional cell:
lattice_si_conv = a_si * np.eye(3)
positions_si_conv = np.array([
    [0.00, 0.00, 0.00],
    [0.50, 0.50, 0.00],
    [0.50, 0.00, 0.50],
    [0.00, 0.50, 0.50],
    [0.25, 0.25, 0.25],
    [0.75, 0.75, 0.25],
    [0.75, 0.25, 0.75],
    [0.25, 0.75, 0.75]
])
numbers_si_conv = [14] * 8  # Si atomic number

if SPGLIB_AVAILABLE:
    info = get_space_group(lattice_si_conv, positions_si_conv, numbers_si_conv)
    print(f"\nSilicon space group: {info['spacegroup']}")
    print(f"Number of symmetry operations: {info['n_operations']}")
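
spglib reports the space group, but learning objective 5 also asks for the crystal system, which follows directly from the space-group number (the seven number ranges of the International Tables). The sketch below also computes cell lengths and angles from lattice vectors, using the FCC primitive vectors of Si: their 60° angles look rhombohedral, yet the crystal system is cubic, because it is set by symmetry rather than by the shape of the chosen cell. `lattice_parameters` and `crystal_system` are hypothetical plain-Python helpers; 227 is the number spglib reports for diamond Si as `Fd-3m (227)`.

```python
import math

def lattice_parameters(lattice):
    """Return (a, b, c, alpha, beta, gamma) from three row vectors (Å, degrees)."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(u, v):
        cos = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding

    a1, a2, a3 = lattice
    # alpha is the angle between b and c, beta between a and c, gamma between a and b
    return (norm(a1), norm(a2), norm(a3),
            angle(a2, a3), angle(a1, a3), angle(a1, a2))

def crystal_system(spacegroup_number):
    """Map an International Tables space-group number (1-230) to its crystal system."""
    if not 1 <= spacegroup_number <= 230:
        raise ValueError("space-group number must be between 1 and 230")
    upper_bounds = [(2, 'triclinic'), (15, 'monoclinic'), (74, 'orthorhombic'),
                    (142, 'tetragonal'), (167, 'trigonal'), (194, 'hexagonal'),
                    (230, 'cubic')]
    for upper, name in upper_bounds:
        if spacegroup_number <= upper:
            return name

half = 5.43 / 2.0  # a_Si / 2, Å
primitive_si = [[0, half, half], [half, 0, half], [half, half, 0]]
print(lattice_parameters(primitive_si))
print(crystal_system(227))
```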

---

## 5. Converting to QE Format

Once your structure is validated, write it out as Quantum ESPRESSO (pw.x) input cards. `CELL_PARAMETERS` is read only when `ibrav = 0` is set in the `&SYSTEM` namelist.

In [None]:
def write_qe_structure(lattice, positions, symbols, coord_type='crystal'):
    """
    Generate the CELL_PARAMETERS and ATOMIC_POSITIONS cards for pw.x.
    
    Parameters
    ----------
    lattice : array
        3x3 lattice vectors (rows) in Angstrom
    positions : array
        Atomic positions, already in the units named by `coord_type`
        (no conversion is done here)
    symbols : list
        Element symbols, one per position
    coord_type : str
        'crystal' for fractional, 'angstrom' for Cartesian in Angstrom
    
    Returns
    -------
    str : QE-formatted structure cards
    """
    lines = []
    
    # CELL_PARAMETERS
    lines.append("CELL_PARAMETERS {angstrom}")
    for vec in lattice:
        lines.append(f"  {vec[0]:16.10f}  {vec[1]:16.10f}  {vec[2]:16.10f}")
    lines.append("")
    
    # ATOMIC_POSITIONS
    lines.append(f"ATOMIC_POSITIONS {{{coord_type}}}")
    for sym, pos in zip(symbols, positions):
        lines.append(f"  {sym:4s}  {pos[0]:16.10f}  {pos[1]:16.10f}  {pos[2]:16.10f}")
    
    return '\n'.join(lines)

# Example: SrTiO3 perovskite
a_sto = 3.905  # Angstrom
lattice_sto = a_sto * np.eye(3)
positions_sto = np.array([
    [0.0, 0.0, 0.0],   # Sr at corner
    [0.5, 0.5, 0.5],   # Ti at body center
    [0.5, 0.5, 0.0],   # O at face centers
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5]
])
symbols_sto = ['Sr', 'Ti', 'O', 'O', 'O']

print("SrTiO3 Structure for Quantum ESPRESSO:")
print("=" * 50)
print(write_qe_structure(lattice_sto, positions_sto, symbols_sto))
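
Besides these two cards, a pw.x input needs `nat`, `ntyp`, and an `ATOMIC_SPECIES` card that agree with the atom list. A minimal sketch that derives them from the same symbol list; `species_cards` is a hypothetical helper, the masses are standard atomic weights, and the pseudopotential file names are placeholders to replace with the files you actually use.

```python
def species_cards(symbols, masses, pseudo_files):
    """Build the &SYSTEM counts and the ATOMIC_SPECIES card for a list of atoms."""
    species = list(dict.fromkeys(symbols))  # unique symbols, first-appearance order
    system_line = f"  ibrav = 0, nat = {len(symbols)}, ntyp = {len(species)}"
    lines = ["ATOMIC_SPECIES"]
    for sp in species:
        # Format per line: label, atomic mass, pseudopotential file
        lines.append(f"  {sp:4s}  {masses[sp]:10.4f}  {pseudo_files[sp]}")
    return system_line, "\n".join(lines)

symbols = ['Sr', 'Ti', 'O', 'O', 'O']
masses = {'Sr': 87.62, 'Ti': 47.867, 'O': 15.999}  # standard atomic weights
pseudos = {s: f"{s}.placeholder.UPF" for s in masses}  # hypothetical file names

system_line, species_card = species_cards(symbols, masses, pseudos)
print(system_line)
print(species_card)
```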

---

## 6. Pre-DFT Checklist

Before running any DFT calculation, verify:

- [ ] **Source**: Structure from reliable database (Materials Project, ICSD, etc.)
- [ ] **Charge neutrality**: Sum of oxidation states × stoichiometry = 0
- [ ] **Bond lengths**: All bonds within reasonable range (ionic radii sum ± 20%)
- [ ] **No overlaps**: Minimum interatomic distance > 1.0 Å, periodic images included (bonds to H, such as O-H at about 0.96 Å, are legitimately shorter)
- [ ] **Space group**: Matches expected symmetry
- [ ] **Visualization**: Checked in VESTA or similar
- [ ] **Lattice parameters**: Within ~10% of literature values (if known)

In [None]:
def validate_structure(lattice, positions, symbols, 
                       composition, oxidation_states,
                       min_distance=1.0):
    """
    Perform basic structure validation checks.
    
    `positions` are fractional coordinates and the rows of `lattice` are the
    lattice vectors in Angstrom. `symbols` is not used by the checks below.
    
    Returns dict with validation results.
    """
    lattice = np.asarray(lattice, dtype=float)
    positions = np.asarray(positions, dtype=float)
    results = {'passed': True, 'checks': {}}
    
    # 1. Charge neutrality
    neutral, total_charge = check_charge_neutrality(composition, oxidation_states)
    results['checks']['charge_neutrality'] = {
        'passed': neutral,
        'total_charge': total_charge
    }
    if not neutral:
        results['passed'] = False
    
    # 2. Minimum interatomic distance
    n_atoms = len(positions)
    min_dist = float('inf')
    
    # Fractional -> Cartesian (row vector times lattice matrix)
    cart_positions = positions @ lattice
    
    for i in range(n_atoms):
        for j in range(i, n_atoms):  # j == i checks an atom against its own images
            # Search the 26 neighbouring cells (adequate unless the cell is very skewed)
            for dx in [-1, 0, 1]:
                for dy in [-1, 0, 1]:
                    for dz in [-1, 0, 1]:
                        if i == j and dx == dy == dz == 0:
                            continue  # skip the zero distance of an atom to itself
                        shift = dx * lattice[0] + dy * lattice[1] + dz * lattice[2]
                        dist = np.linalg.norm(cart_positions[i] - cart_positions[j] - shift)
                        min_dist = min(min_dist, dist)
    
    results['checks']['min_distance'] = {
        'passed': min_dist > min_distance,
        'value': min_dist,
        'threshold': min_distance
    }
    if min_dist <= min_distance:
        results['passed'] = False
    
    # 3. Cell volume sanity check: volume per atom in a broad 5-100 Å^3 window
    volume = abs(np.linalg.det(lattice))
    vol_per_atom = volume / n_atoms
    volume_ok = 5 < vol_per_atom < 100
    results['checks']['volume'] = {
        'total': volume,
        'per_atom': vol_per_atom,
        'passed': volume_ok
    }
    if not volume_ok:
        results['passed'] = False
    
    return results

# Validate SrTiO3
print("\n" + "=" * 60)
print("Structure Validation: SrTiO3")
print("=" * 60)

validation = validate_structure(
    lattice=lattice_sto,
    positions=positions_sto,
    symbols=symbols_sto,
    composition={'Sr': 1, 'Ti': 1, 'O': 3},
    oxidation_states={'Sr': 2, 'Ti': 4, 'O': -2}
)

print(f"\nOverall: {'✓ PASSED' if validation['passed'] else '✗ FAILED'}")
print("\nDetailed checks:")
for check, result in validation['checks'].items():
    status = '✓' if result['passed'] else '✗'
    print(f"  {status} {check}: {result}")
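
To confirm that an overlap check actually fires, it helps to feed it a deliberately broken cell. The standalone sketch below repeats the periodic minimum-distance search in plain Python (`min_periodic_distance` is a hypothetical name) and compares the SrTiO3 cell with a copy that has an extra O atom displaced by only 0.02 (fractional) from an existing one.

```python
import itertools
import math

def min_periodic_distance(lattice, frac_positions):
    """Smallest interatomic distance (Å), including each atom's own images.

    Only the 26 neighbouring cells are searched, which is enough for
    cells that are not strongly skewed.
    """
    def to_cart(f):
        # Cartesian = sum_k f_k * a_k, with lattice rows as the vectors a_k
        return [sum(f[k] * lattice[k][c] for k in range(3)) for c in range(3)]

    cart = [to_cart(f) for f in frac_positions]
    best = math.inf
    for i in range(len(cart)):
        for j in range(i, len(cart)):
            for shift in itertools.product((-1, 0, 1), repeat=3):
                if i == j and shift == (0, 0, 0):
                    continue  # an atom's distance to itself
                s = to_cart(shift)
                image = [cart[j][c] + s[c] for c in range(3)]
                best = min(best, math.dist(cart[i], image))
    return best

a = 3.905
cubic = [[a, 0, 0], [0, a, 0], [0, 0, a]]
good = [[0, 0, 0], [0.5, 0.5, 0.5], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]]
broken = good + [[0.5, 0.5, 0.02]]  # extra O almost on top of an existing one

for label, pos in [('SrTiO3', good), ('SrTiO3 + overlapping O', broken)]:
    d = min_periodic_distance(cubic, pos)
    print(f"{label}: d_min = {d:.3f} Å, passes 1.0 Å check: {d > 1.0}")
```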

---

## Summary

In this notebook, we learned:

1. ✓ How to check charge neutrality of compounds
2. ✓ Shannon ionic radii for bond length estimation
3. ✓ Lattice parameter estimation methods
4. ✓ Space group identification with spglib
5. ✓ Converting structures to QE format
6. ✓ Comprehensive validation checklist

### Key Takeaway

**Never skip structure validation!** A few minutes of checking can save hours of meaningless calculations.

### Next Notebook
→ **03_DFT_Setup_Fundamentals.ipynb**: Choosing functionals, pseudopotentials, and calculation parameters