# VSEPR Geometry Predictor 🧮🔺

*General Chemistry & Cyberinfrastructure Skills Module*

### Warm‑Up Questions

**WQ‑1.** What does **VSEPR** stand for, and what is the fundamental principle behind this theory? How does it differ from hybridisation theory?

<span style="color:cyan"><strong>Free response:</strong> YOUR RESPONSE TEXT HERE </span>

**WQ‑2.** Why do lone pairs take up more space around an atom than bonding pairs? How does this affect molecular geometry?

<span style="color:cyan"><strong>Free response:</strong> YOUR RESPONSE TEXT HERE </span>


## Learning Objective
Predict the **molecular geometry** of simple molecules using **VSEPR (Valence‑Shell Electron‑Pair Repulsion) theory**.

## Prerequisites
- Python ≥ 3.8
- **RDKit** for molecular parsing
- *(Optional)* **py3Dmol** for 3‑D visualisation

If you’re on Google Colab, uncomment the `pip install` line in the cell below and run it first.

In [None]:
# !pip install rdkit py3Dmol -q  # ← Uncomment on first run (the old `rdkit-pypi` package name is deprecated)

from rdkit import Chem
from rdkit.Chem import AllChem

try:
    import py3Dmol
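    def show_3d(mol):
        """Embed a 3-D conformer, relax it with UFF, and display it with py3Dmol."""
        m2 = Chem.AddHs(mol)  # adds nothing if hydrogens are already explicit
        # EmbedMolecule returns -1 when no conformer could be generated
        if AllChem.EmbedMolecule(m2, randomSeed=0xC0FFEE) == -1:
            print('3-D embedding failed; nothing to display.')
            return None
        AllChem.UFFOptimizeMolecule(m2)
        mb = Chem.MolToMolBlock(m2)
        v = py3Dmol.view()
        v.addModel(mb, 'mol')
        v.setStyle({'stick': {}})
        v.zoomTo()
        return v.show()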
except ModuleNotFoundError:
    def show_3d(_):
        print('Install py3Dmol for 3‑D visualisation.')


## Quick Concept Check 🔍
VSEPR predicts geometry from two key numbers for the **central atom**:
1. **Steric number** = σ‑bonds + lone pairs (each bonded neighbour counts once, whatever the bond order).
2. Number of **lone pairs**.

| Steric # | Lone pairs | Electron‑pair geometry | **Molecular geometry** |
|:--------:|:----------:|------------------------|------------------------|
| 2 | 0 | Linear | Linear |
| 3 | 0 | Trigonal planar | Trigonal planar |
| 3 | 1 | Trigonal planar | Bent (∼120°) |
| 4 | 0 | Tetrahedral | Tetrahedral |
| 4 | 1 | Tetrahedral | Trigonal pyramidal |
| 4 | 2 | Tetrahedral | Bent (∼109°) |
| 5 | 0 | Trigonal bipyramidal | Trigonal bipyramidal |
| 5 | 1 | Trigonal bipyramidal | Seesaw |
| 5 | 2 | Trigonal bipyramidal | T‑shaped |
| 5 | 3 | Trigonal bipyramidal | Linear |
| 6 | 0 | Octahedral | Octahedral |
| 6 | 1 | Octahedral | Square pyramidal |
| 6 | 2 | Octahedral | Square planar |
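
The table can be read in two steps: the steric number alone fixes the electron‑pair geometry, and the lone‑pair count then tells you how many of those positions are occupied by atoms. A minimal sketch of that reading follows; the names `ELECTRON_GEOMETRY` and `describe_pairs` are illustrative and not part of the notebook's own code.

```python
# Electron-pair geometry depends only on the steric number (see the table above).
ELECTRON_GEOMETRY = {
    2: 'linear',
    3: 'trigonal planar',
    4: 'tetrahedral',
    5: 'trigonal bipyramidal',
    6: 'octahedral',
}

def describe_pairs(steric, lone_pairs):
    """Return (electron-pair geometry, number of bonded atoms)."""
    if steric not in ELECTRON_GEOMETRY:
        raise ValueError(f'No electron-pair geometry tabulated for steric number {steric}')
    if not 0 <= lone_pairs < steric:
        raise ValueError('Lone pairs must be non-negative and fewer than the steric number')
    bonded_atoms = steric - lone_pairs  # positions occupied by atoms, not lone pairs
    return ELECTRON_GEOMETRY[steric], bonded_atoms

# Water: steric number 4 with 2 lone pairs leaves 2 bonded atoms on a tetrahedral frame.
print(describe_pairs(4, 2))
```

The molecular geometry in the last column of the table is then the shape traced out by the bonded atoms only.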

In [None]:
GEOM_MAP = {
    (2, 0): 'linear',
    (3, 0): 'trigonal planar',
    (3, 1): 'bent (120°)',
    (4, 0): 'tetrahedral',
    (4, 1): 'trigonal pyramidal',
    (4, 2): 'bent (109°)',
    (5, 0): 'trigonal bipyramidal',
    (5, 1): 'seesaw',
    (5, 2): 'T-shaped',
    (5, 3): 'linear',
    (6, 0): 'octahedral',
    (6, 1): 'square pyramidal',
    (6, 2): 'square planar'
}

def geometry_from_counts(steric, lps):
    """Return geometry string or 'unknown'."""
    return GEOM_MAP.get((steric, lps), 'unknown')
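
Each electron‑pair geometry also carries a set of ideal bond angles, which is where the 120° and 109° labels in `GEOM_MAP` come from. The sketch below tabulates those ideal values by steric number. `IDEAL_ANGLES` and `ideal_angles` are hypothetical names added for illustration, and molecules with lone pairs deviate from these values (the H–O–H angle in water is about 104.5°).

```python
# Ideal angles (degrees) between adjacent electron pairs, keyed by steric number.
IDEAL_ANGLES = {
    2: (180.0,),
    3: (120.0,),
    4: (109.5,),
    5: (90.0, 120.0),  # axial-equatorial, equatorial-equatorial
    6: (90.0,),
}

def ideal_angles(steric):
    """Return the tuple of ideal angles, or an empty tuple if none is tabulated."""
    return IDEAL_ANGLES.get(steric, ())

for sn in range(2, 7):
    print(sn, ideal_angles(sn))
```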


### Helper: Count σ‑bonds & Lone Pairs
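RDKit tracks hydrogens either as explicit atoms in the molecular graph (after `Chem.AddHs`) or as an implicit count on each heavy atom. The helper below relies on that bookkeeping to estimate σ‑bonds and lone pairs.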

In [None]:
def steric_and_lps(atom):
    """Return (steric_number, lone_pairs) for an RDKit atom, accounting for formal charge."""
    # Valence-electron counts for the main-group elements this module supports
    VE = {'H':1,'B':3,'C':4,'N':5,'O':6,'F':7,'P':5,'S':6,'Cl':7,'Br':7,'I':7,'Xe':8}
    sym = atom.GetSymbol()
    ve = VE.get(sym)
    if ve is None:
        raise ValueError(f'Valence electrons unknown for {sym}')

    # σ bonds: one per bonded neighbour, including implicit hydrogens
    sigma = atom.GetTotalDegree()

    # A positive formal charge removes electrons; a negative one adds them
    total_electrons = ve - atom.GetFormalCharge()

    # The atom contributes one electron per unit of bond order.
    # GetBonds() only sees explicit neighbours, so add any hydrogens that are
    # still implicit (GetTotalNumHs() is 0 once Chem.AddHs has been applied).
    bond_order_sum = sum(b.GetBondTypeAsDouble() for b in atom.GetBonds())
    bond_order_sum += atom.GetTotalNumHs()
    electrons_used_in_bonds = int(bond_order_sum)

    # Lone pairs = (total_electrons - electrons_used_in_bonds) / 2
    lps = max(0, (total_electrons - electrons_used_in_bonds) // 2)

    steric = sigma + lps
    return steric, lps
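
The lone‑pair arithmetic in the helper can be checked by hand without RDKit. The sketch below applies the same formula to plain integers read off a Lewis structure; `lone_pairs_from_counts` is an illustrative name, not a function used elsewhere in this notebook.

```python
def lone_pairs_from_counts(valence_electrons, formal_charge, bond_order_sum):
    """Lone pairs = (valence electrons - formal charge - bond-order sum) // 2."""
    remaining = valence_electrons - formal_charge - bond_order_sum
    return max(0, remaining // 2)

# (label, valence electrons, formal charge, bond orders on the central atom)
cases = [
    ('H2O',  6,  0, [1, 1]),
    ('NH4+', 5, +1, [1, 1, 1, 1]),
    ('H3O+', 6, +1, [1, 1, 1]),
    ('CO2',  4,  0, [2, 2]),
]

for label, ve, charge, bonds in cases:
    lps = lone_pairs_from_counts(ve, charge, sum(bonds))
    steric = len(bonds) + lps  # one σ bond per neighbour, whatever the bond order
    print(f'{label:<5s} steric {steric}, lone pairs {lps}')
```

Note how the double bonds in CO₂ use up all four of carbon's valence electrons but still count as only two σ bonds, giving a steric number of 2.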


## Worked Examples
We’ll predict geometries for three classic molecules.

In [None]:
examples = {
    'Water'   : 'O',        # H2O after AddHs
    'Ammonia' : 'N',        # NH3
    'Carbon dioxide': 'O=C=O'
}

for name, smi in examples.items():
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    # Heuristic: treat the heavy atom with the most neighbours as the central atom
    central = max(
        (a for a in mol.GetAtoms() if a.GetSymbol() != "H"),
        key=lambda a: a.GetTotalDegree()
    )
    sn, lps = steric_and_lps(central)
    geom = geometry_from_counts(sn, lps)
    print(f'{name:<15s} → steric {sn}, lone pairs {lps} ⇒ {geom}')
    show_3d(mol)


## Your Turn 📝
1. Choose **three** molecules with a single obvious central atom (e.g. *BF₃*, *XeF₂*, *IF₅*).
2. Use `steric_and_lps` and `geometry_from_counts` to predict their geometries.
3. Verify visually with the 3‑D model.

*(Tip: Add explicit hydrogens with `Chem.AddHs`.)*

In [None]:
# TODO 1: Replace with your own SMILES strings
my_smiles = ['B(F)(F)F']  # ← EDIT ME

for smi in my_smiles:
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    # TODO 2: Select the central atom wisely (maybe the least electronegative?)
    central = mol.GetAtomWithIdx(0)  # ← You might need smarter logic!
    sn, lps = steric_and_lps(central)
    geom = geometry_from_counts(sn, lps)
    print(f'{smi:>15s} → steric {sn}, LP {lps} ⇒ {geom}')
    # Visualise the structure
    show_3d(mol)


### Critical‑Thinking Questions

**CTQ‑1.** Why might VSEPR theory **fail** for molecules with very large atoms or unusual bonding situations? Give an example where VSEPR predictions might not match experimental observations.

<span style="color:cyan"><strong>Free response:</strong> YOUR RESPONSE TEXT HERE </span>

**CTQ‑2.** How does the presence of **lone pairs** affect the bond angles in a molecule? Why do lone pairs cause bond angles to be smaller than the ideal electron-pair geometry would suggest?

<span style="color:cyan"><strong>Free response:</strong> YOUR RESPONSE TEXT HERE </span>


### Challenge: Automate Central‑Atom Detection
Write a function that *automatically* finds the most likely central atom (hint: often the atom that can form the most bonds or is least electronegative among non‑hydrogen atoms).

In [None]:
def find_central_atom(mol):
    """Return the atom idx most likely to be central.
    Currently a stub – improve me!"""
    # TODO 3: Implement heuristic
    return 0
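
One way to prototype the heuristic before wiring it into RDKit is to work on plain `(symbol, neighbour_count)` pairs. The sketch below ranks non‑hydrogen atoms by neighbour count and breaks ties with Pauling electronegativity. The function name `pick_central`, the input format, and the tie‑break rule are assumptions made for illustration, not the intended solution.

```python
# Pauling electronegativities for the elements used in this module.
ELECTRONEGATIVITY = {
    'H': 2.20, 'B': 2.04, 'C': 2.55, 'N': 3.04, 'O': 3.44, 'F': 3.98,
    'P': 2.19, 'S': 2.58, 'Cl': 3.16, 'Br': 2.96, 'I': 2.66, 'Xe': 2.60,
}

def pick_central(atoms):
    """Return the index of the likely central atom.

    `atoms` is a list of (symbol, neighbour_count) tuples in atom-index order.
    """
    candidates = [(i, sym, deg) for i, (sym, deg) in enumerate(atoms) if sym != 'H']
    if not candidates:
        raise ValueError('No non-hydrogen atoms to choose from')
    # Most neighbours first; among equals, the least electronegative atom wins.
    best = max(candidates, key=lambda c: (c[2], -ELECTRONEGATIVITY[c[1]]))
    return best[0]

# IF5 listed as F, I, F, F, F, F: iodine (index 1) has five neighbours.
print(pick_central([('F', 1), ('I', 5), ('F', 1), ('F', 1), ('F', 1), ('F', 1)]))
```

In the notebook, the tuples could be built from `atom.GetSymbol()` and `atom.GetTotalDegree()`, both of which are already used in the cells above.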


## Summary & Next Steps
- **VSEPR** uses steric number + lone pairs to predict molecular geometry.
- Programmatic prediction lets you screen many molecules quickly.
- Enhance accuracy by refining lone‑pair counting and central‑atom detection.

> **Extension:** Combine this approach with web APIs (PubChem, Materials Project) to build a database of predicted geometries.