# VSEPR Geometry Predictor 🧮🔺

*General Chemistry & Cyberinfrastructure Skills Module*

## Learning Objective
Predict the **molecular geometry** of simple molecules using **VSEPR (Valence-Shell Electron-Pair Repulsion) theory**.

## Prerequisites
- Python ≥ 3.8
- **RDKit** for molecular parsing
- *(Optional)* **py3Dmol** for 3-D visualisation

If you're on Google Colab, run the install cell below first.

In [1]:
# !pip install rdkit py3Dmol -q  # ← Uncomment on first run (the PyPI package is now `rdkit`; `rdkit-pypi` is the legacy name)

from rdkit import Chem
from rdkit.Chem import AllChem

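try:
    import py3Dmol

    def show_3d(mol):
        """Embed *mol* in 3-D, relax it with UFF, and display it with py3Dmol."""
        m2 = Chem.AddHs(mol)  # no change if hydrogens are already explicit
        # EmbedMolecule returns -1 when no conformer could be generated
        if AllChem.EmbedMolecule(m2, randomSeed=0xC0FFEE) == -1:
            print('3-D embedding failed; skipping visualisation.')
            return None
        AllChem.UFFOptimizeMolecule(m2)
        v = py3Dmol.view()
        v.addModel(Chem.MolToMolBlock(m2), 'mol')
        v.setStyle({'stick': {}})
        v.zoomTo()
        return v.show()
except ModuleNotFoundError:
    def show_3d(_):
        print('Install py3Dmol for 3-D visualisation.')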


## Quick Concept Check 🔍
VSEPR predicts geometry from two key numbers for the **central atom**:
1. **Steric number** = σ-bonds + lone pairs.
2. Number of **lone pairs**.

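One point worth making concrete: the steric number counts σ-bonds, so a double or triple bond still contributes only one. The sketch below is a minimal illustration in plain Python (no RDKit); the function name `steric_number` and the hand-written bond-order lists are assumptions made for this example, not part of the notebook's code.

```python
def steric_number(bond_orders, lone_pairs):
    """Steric number = number of σ-bonds + number of lone pairs.

    bond_orders: one entry per bonded neighbour of the central atom
                 (1 = single, 2 = double, 3 = triple).
    Every bond contains exactly one σ-bond, so only the number of
    entries matters, not their values.
    """
    return len(bond_orders) + lone_pairs

# Hand-counted inputs for the central atom of each molecule
print(steric_number([2, 2], 0))     # CO2: two double bonds, no lone pairs
print(steric_number([1, 1], 2))     # H2O: two single bonds, two lone pairs
print(steric_number([1, 1, 1], 1))  # NH3: three single bonds, one lone pair
```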
| Steric # | Lone pairs | Electron-pair geometry | **Molecular geometry** |
|:--------:|:----------:|------------------------|------------------------|
| 2 | 0 | Linear | Linear |
| 3 | 0 | Trigonal planar | Trigonal planar |
| 3 | 1 | Trigonal planar | Bent (∼120°) |
| 4 | 0 | Tetrahedral | Tetrahedral |
| 4 | 1 | Tetrahedral | Trigonal pyramidal |
| 4 | 2 | Tetrahedral | Bent (∼109°) |
| 5 | 0 | Trigonal bipyramidal | Trigonal bipyramidal |
| 5 | 1 | Trigonal bipyramidal | Seesaw |
| 5 | 2 | Trigonal bipyramidal | T-shaped |
| 5 | 3 | Trigonal bipyramidal | Linear |
| 6 | 0 | Octahedral | Octahedral |
| 6 | 1 | Octahedral | Square pyramidal |
| 6 | 2 | Octahedral | Square planar |

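The electron-pair geometry column depends on the steric number alone, while the molecular geometry needs both counts. A small sketch of that first lookup follows, with the ideal angles between neighbouring electron pairs added for reference; `ELECTRON_GEOMETRY` and `electron_pair_geometry` are hypothetical names introduced here for illustration.

```python
# Electron-pair geometry and ideal angles (degrees) between neighbouring
# electron pairs, keyed by steric number.
ELECTRON_GEOMETRY = {
    2: ('linear', (180.0,)),
    3: ('trigonal planar', (120.0,)),
    4: ('tetrahedral', (109.5,)),
    5: ('trigonal bipyramidal', (90.0, 120.0)),  # axial-equatorial, equatorial-equatorial
    6: ('octahedral', (90.0,)),
}

def electron_pair_geometry(steric):
    """Return (name, ideal_angles) for a steric number, or ('unknown', ())."""
    return ELECTRON_GEOMETRY.get(steric, ('unknown', ()))

for sn in range(2, 7):
    name, angles = electron_pair_geometry(sn)
    print(sn, name, angles)
```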
In [2]:
GEOM_MAP = {
    (2, 0): 'linear',
    (3, 0): 'trigonal planar',
    (3, 1): 'bent (120°)',
    (4, 0): 'tetrahedral',
    (4, 1): 'trigonal pyramidal',
    (4, 2): 'bent (109°)',
    (5, 0): 'trigonal bipyramidal',
    (5, 1): 'seesaw',
    (5, 2): 'T-shaped',
    (5, 3): 'linear',
    (6, 0): 'octahedral',
    (6, 1): 'square pyramidal',
    (6, 2): 'square planar'
}

def geometry_from_counts(steric, lps):
    """Return geometry string or 'unknown'."""
    return GEOM_MAP.get((steric, lps), 'unknown')


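### Helper: Count σ-bonds & Lone Pairs
RDKit can keep hydrogens implicit, so the cells below call `Chem.AddHs` first. Once every hydrogen is an explicit atom, σ-bonds can be read from the atom's degree, and lone pairs can be estimated from its valence electrons, formal charge, and total bond order.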

In [3]:
def steric_and_lps(atom):
    """Return (steric_number, lone_pairs) for *atom*.

    Call this on a molecule that has been through Chem.AddHs.
    σ-bonds    = atom.GetTotalDegree() (includes attached H after AddHs)
    lone_pairs = (valence_electrons - formal_charge - bond_order_sum) // 2
    """
    # Valence-electron counts by group for the main-group elements handled here
    VE = {'H':1,'B':3,'C':4,'N':5,'O':6,'F':7,'P':5,'S':6,'Cl':7,'Br':7,'I':7,'Xe':8}
    sym = atom.GetSymbol()
    ve = VE.get(sym)
    if ve is None:
        raise ValueError(f'Valence electrons unknown for {sym}')
    # Each bonded neighbour contributes one σ bond, whatever the bond order
    sigma = atom.GetTotalDegree()
    # Total bond order = electrons this atom contributes to bonding
    bond_order_sum = sum(b.GetBondTypeAsDouble() for b in atom.GetBonds())
    # Remaining valence electrons pair up as lone pairs (approximation)
    lps = (ve - atom.GetFormalCharge() - int(bond_order_sum)) // 2
    steric = sigma + lps
    return steric, lps

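To see how the formal charge enters the lone-pair estimate, the same arithmetic can be run by hand on a few species. This is a sketch in plain Python, assuming the valence-electron count, formal charge, and bond orders are supplied directly rather than read from an RDKit atom; `lone_pairs_from_counts` is a hypothetical helper, not part of the notebook.

```python
def lone_pairs_from_counts(valence_electrons, formal_charge, bond_orders):
    """Estimate lone pairs with the same formula as steric_and_lps, from raw numbers."""
    # A positive charge removes an electron; a negative charge adds one
    available = valence_electrons - formal_charge
    # Each unit of bond order uses one of the atom's own electrons
    nonbonding = available - sum(bond_orders)
    return nonbonding // 2

# (valence electrons, formal charge, bond orders) for each central atom
cases = {
    'NH4+': (5, +1, [1, 1, 1, 1]),  # nitrogen centre
    'H3O+': (6, +1, [1, 1, 1]),     # oxygen centre
    'SF4':  (6, 0, [1, 1, 1, 1]),   # sulfur centre
}
for name, (ve, charge, bonds) in cases.items():
    lps = lone_pairs_from_counts(ve, charge, bonds)
    print(f'{name}: lone pairs {lps}, steric number {len(bonds) + lps}')
```

Looked up in the VSEPR table, these counts correspond to tetrahedral, trigonal pyramidal, and seesaw geometries respectively.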

## Worked Examples
We'll predict geometries for three classic molecules.

In [4]:
examples = {
    'Water'   : 'O',        # H2O after AddHs
    'Ammonia' : 'N',        # NH3
    'Carbon dioxide': 'O=C=O'
}

for name, smi in examples.items():
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    # Central atom: the non-hydrogen atom with the most neighbours
    central = max(
        (a for a in mol.GetAtoms() if a.GetSymbol() != "H"),
        key=lambda a: a.GetTotalDegree()
    )
    sn, lps = steric_and_lps(central)
    geom = geometry_from_counts(sn, lps)
    print(f'{name:<15s} → steric {sn}, lone pairs {lps} ⇒ {geom}')
    show_3d(mol)


Water           → steric 4, lone pairs 2 ⇒ bent (109°)


Ammonia         → steric 4, lone pairs 1 ⇒ trigonal pyramidal


Carbon dioxide  → steric 2, lone pairs 0 ⇒ linear


## Your Turn 📝
1. Choose **three** molecules with a single obvious central atom (e.g. *BF₃*, *XeF₂*, *IF₅*).
2. Use `steric_and_lps` and `geometry_from_counts` to predict their geometries.
3. Verify visually with the 3-D model.

*(Tip: Add explicit hydrogens with `Chem.AddHs`.)*

In [5]:
# TODO 1: Replace with your own SMILES strings
my_smiles = ['F[B-](F)(F)F', 'ClI(F)(F)F']  # ← EDIT ME

for smi in my_smiles:
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    # TODO 2: Select the central atom wisely (maybe the least electronegative?)
    central = mol.GetAtomWithIdx(0)  # ← You might need smarter logic!
    sn, lps = steric_and_lps(central)
    geom = geometry_from_counts(sn, lps)
    print(f'{smi:>15s} → steric {sn}, LP {lps} ⇒ {geom}')
    # TODO 3: Visualise the structure
    pass


   F[B-](F)(F)F → steric 4, LP 3 ⇒ unknown
     ClI(F)(F)F → steric 4, LP 3 ⇒ unknown


### Challenge: Automate Central-Atom Detection
Write a function that *automatically* finds the most likely central atom (hint: often the atom that can form the most bonds or is least electronegative among non-hydrogen atoms).

In [6]:
def find_central_atom(mol):
    """Return the atom idx most likely to be central.
    Currently a stub; improve me!"""
    # TODO 4: Implement heuristic
    return 0

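One possible starting point for the heuristic in the hint, sketched on plain `(symbol, degree)` tuples rather than RDKit atoms so it stays independent of the notebook. The name `pick_central`, the tuple input, and the tie-breaking order (most neighbours first, then lowest Pauling electronegativity) are assumptions made for illustration; adapting the idea to `find_central_atom` remains the exercise.

```python
# Pauling electronegativities for the elements used in this module
EN = {'H': 2.20, 'B': 2.04, 'C': 2.55, 'N': 3.04, 'O': 3.44, 'F': 3.98,
      'P': 2.19, 'S': 2.58, 'Cl': 3.16, 'Br': 2.96, 'I': 2.66, 'Xe': 2.60}

def pick_central(atoms):
    """Return the index of the likely central atom.

    atoms: list of (symbol, degree) tuples, where degree is the number
           of bonded neighbours.
    Hydrogens are never central; prefer the most neighbours, then the
    lowest electronegativity.
    """
    candidates = [(i, sym, deg) for i, (sym, deg) in enumerate(atoms) if sym != 'H']
    if not candidates:
        raise ValueError('no non-hydrogen atoms')
    # Sort key: higher degree wins; among equals, lower electronegativity wins
    best = max(candidates, key=lambda c: (c[2], -EN[c[1]]))
    return best[0]

# Atom order follows the SMILES strings O=C=O and F[B-](F)(F)F
print(pick_central([('O', 1), ('C', 2), ('O', 1)]))
print(pick_central([('F', 1), ('B', 4), ('F', 1), ('F', 1), ('F', 1)]))
```

Both calls select index 1, the carbon and the boron, whereas the hard-coded index 0 in the exercise cell picks a terminal atom.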

## Summary & Next Steps
- **VSEPR** uses steric number + lone pairs to predict molecular geometry.
- Programmatic prediction lets you screen many molecules quickly.
- Enhance accuracy by refining lone-pair counting and central-atom detection.

> **Extension:** Combine this approach with web APIs (PubChem, Materials Project) to build a database of predicted geometries.