# Intermediate Representation (IR): Complete Guide

IR is the bridge between parsers and data structures. This guide covers all IR types, design principles, and conversion patterns.

## Table of Contents

1. [Why IR Exists](#why)
2. [IR Families](#families)
3. [SMILES IR](#smiles-ir)
4. [BigSMILES IR](#bigsmiles-ir)
5. [GBigSMILES IR](#gbigsmiles-ir)
6. [Conversion Patterns](#conversion)
7. [Design Principles](#design)
8. [Advanced Topics](#advanced)

## Why IR Exists

### The Problem

Parsers convert strings to structures, but:
- Different users want different output formats
- One parser shouldn't be tied to one data structure
- Testing parsers shouldn't require full data structures

### The Solution

**Two-stage architecture:**

1. **Parser → IR**: String → Simple dataclass
2. **Converter → Target**: IR → Your format

**Benefits:**
- One parser, many outputs
- Easy testing (just check IR)
- No framework lock-in
- Lossless (IR preserves everything)
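
The two-stage split can be illustrated with a deliberately tiny, self-contained sketch. Everything here is hypothetical (`ChainIR`, `parse_chain`, and both converters are made up for illustration and are not molpy APIs); the toy "parser" only understands linear chains of single-letter atoms.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ChainIR:
    """Hypothetical toy IR: atoms in input order plus index pairs for bonds."""
    atoms: list[str] = field(default_factory=list)
    bonds: list[tuple[int, int]] = field(default_factory=list)


def parse_chain(text: str) -> ChainIR:
    """Stage 1 (parser -> IR): each character is an atom bonded to the previous one."""
    ir = ChainIR()
    for i, symbol in enumerate(text):
        ir.atoms.append(symbol)
        if i > 0:
            ir.bonds.append((i - 1, i))
    return ir


def to_adjacency(ir: ChainIR) -> dict[int, list[int]]:
    """Stage 2, target A: adjacency list keyed by atom index."""
    adjacency: dict[int, list[int]] = {i: [] for i in range(len(ir.atoms))}
    for a, b in ir.bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def to_element_counts(ir: ChainIR) -> Counter:
    """Stage 2, target B: element counts, ignoring connectivity."""
    return Counter(ir.atoms)


ir = parse_chain("CCO")
print(ir.bonds)               # [(0, 1), (1, 2)]
print(to_adjacency(ir))       # {0: [1], 1: [0, 2], 2: [1]}
print(to_element_counts(ir))  # Counter({'C': 2, 'O': 1})
```

The parser is tested by inspecting `ir` alone, and each converter can be added or replaced without touching the parser.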

## IR Families

### Three Main Types

```
SmilesGraphIR          # Small molecules
├── atoms: list[SmilesAtomIR]
├── bonds: list[SmilesBondIR]
└── extras: dict

BigSmilesMoleculeIR    # Polymers
├── backbone: BigSmilesSubgraphIR
├── stochastic_objects: list[StochasticObjectIR]
└── extras: dict

GBigSmilesSystemIR     # Polydisperse systems
├── molecules: list[GBigSmilesComponentIR]
└── total_mass: float | None
```
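
As a rough sketch, the tree above maps onto plain dataclasses as follows. The field names come from the tree; the defaults and the use of `Any` for the nested IR types are assumptions, so this is not the library's actual definition.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SmilesGraphIR:
    atoms: list[Any] = field(default_factory=list)   # SmilesAtomIR entries
    bonds: list[Any] = field(default_factory=list)   # SmilesBondIR entries
    extras: dict[str, Any] = field(default_factory=dict)


@dataclass
class BigSmilesMoleculeIR:
    backbone: Any = None                              # a BigSmilesSubgraphIR
    stochastic_objects: list[Any] = field(default_factory=list)
    extras: dict[str, Any] = field(default_factory=dict)


@dataclass
class GBigSmilesSystemIR:
    molecules: list[Any] = field(default_factory=list)  # GBigSmilesComponentIR entries
    total_mass: float | None = None                      # None when no mass is given


# default_factory gives every instance its own containers
first, second = SmilesGraphIR(), SmilesGraphIR()
first.extras["note"] = "only on first"
print(second.extras)                    # {}
print(GBigSmilesSystemIR().total_mass)  # None
```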

## SMILES IR

### SmilesAtomIR

Captures all SMILES atom features:

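- `symbol`: Element symbol (C, N, O, etc.)
- `isotope`: Isotope mass number (e.g. 13 for ¹³C)
- `charge`: Formal charge (e.g. -1 for `[O-]`)
- `hydrogen_count`: Explicit hydrogen count given inside the brackets
- `aromatic`: Aromatic flag (set for lowercase symbols such as `c`)
- `atom_class`: Atom class number (the `:n` suffix used for reaction mapping)
- `chirality`: Stereochemistry tag (e.g. `@` or `@@`)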
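
To make the field list concrete, here is a minimal stand-in dataclass. `AtomSketch` is hypothetical: the field names follow the list above, but the types and defaults are assumptions, and the library's real attribute names may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AtomSketch:
    """Hypothetical stand-in for SmilesAtomIR; types and defaults are assumed."""
    symbol: str
    isotope: int | None = None
    charge: int = 0
    hydrogen_count: int | None = None
    aromatic: bool = False
    atom_class: int | None = None
    chirality: str | None = None


# How the bracket atom [13C@@H] could be described with these fields
atom = AtomSketch(symbol="C", isotope=13, hydrogen_count=1, chirality="@@")
print(atom.isotope, atom.chirality, atom.charge)  # 13 @@ 0
```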

```python
from molpy.parser.smiles import parse_smiles

# Parse a SMILES string whose first atom carries an isotope, a chirality tag, and an explicit H
ir = parse_smiles("[13C@@H](O)(N)C")
atom = ir.atoms[0]

# getattr with a default guards against attributes the atom IR may not define
print(f"Symbol: {atom.element}")
print(f"Isotope: {getattr(atom, 'isotope', None)}")
print(f"Chirality: {getattr(atom, 'chirality', None)}")
print(f"H count: {getattr(atom, 'hydrogens', 0)}")
```

Output:

```
Symbol: C
Isotope: None
Chirality: None
H count: 1
```


## Design Principles

### 1. Immutability

IR objects are plain dataclasses: create them once and do not mutate them afterwards.

**Why?** Safe to pass around, easy to reason about.
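
One way to enforce this convention in Python is `@dataclass(frozen=True)`. The sketch below uses a hypothetical `FrozenAtom` class; whether the library's own IR classes are frozen is not stated here, so treat this as an illustration of the principle rather than of the implementation.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenAtom:
    """Hypothetical immutable atom record."""
    symbol: str
    charge: int = 0


atom = FrozenAtom("N")

try:
    atom.charge = 1  # assignment on a frozen dataclass raises
except FrozenInstanceError as err:
    print(f"mutation rejected: {type(err).__name__}")

# To "change" a field, build a new object and leave the original untouched
charged = replace(atom, charge=1)
print(atom.charge, charged.charge)  # 0 1
```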

### 2. No Logic

IR classes carry data only and define no methods beyond `__init__`. All logic lives in converters.

**Why?** Keeps IR simple and testable.
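
In practice this means a derived quantity is computed by a free function that takes the IR as input, not by a method on the IR class. A minimal sketch, with a hypothetical `AtomIR` and `net_charge` helper:

```python
from dataclasses import dataclass


@dataclass
class AtomIR:
    """Hypothetical data-only atom record: fields, no behavior."""
    symbol: str
    charge: int = 0


def net_charge(atoms: list[AtomIR]) -> int:
    """Logic lives outside the IR: sum the formal charges."""
    return sum(atom.charge for atom in atoms)


atoms = [AtomIR("N", charge=1), AtomIR("O", charge=-1), AtomIR("C")]
print(net_charge(atoms))  # 0
```

Because the record holds only data, a test can build it by hand and compare it field by field without running any other code.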

### 3. Lossless

The IR preserves everything from the input string.

**Why?** Converters, not the parser, decide what to keep and what to discard.
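
As a small illustration, the same IR can feed one converter that keeps isotope labels and another that drops them. `AtomIR` and `to_labels` are hypothetical names used only for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AtomIR:
    """Hypothetical atom record that keeps the isotope exactly as parsed."""
    symbol: str
    isotope: int | None = None


def to_labels(atoms: list[AtomIR], keep_isotopes: bool) -> list[str]:
    """Converter-side choice: include or drop the isotope prefix."""
    labels = []
    for atom in atoms:
        if keep_isotopes and atom.isotope is not None:
            labels.append(f"{atom.isotope}{atom.symbol}")
        else:
            labels.append(atom.symbol)
    return labels


atoms = [AtomIR("C", isotope=13), AtomIR("O")]
print(to_labels(atoms, keep_isotopes=True))   # ['13C', 'O']
print(to_labels(atoms, keep_isotopes=False))  # ['C', 'O']
```

Dropping the isotope is a decision made in the converter; the IR itself still holds the value for any consumer that needs it.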

## See Also

- [Parser Overview](parser.ipynb): How to parse
- [GBigSMILES Parser](gbigsmiles_parser.ipynb): Generative IR
- [Python IR System](../developer/python_ir_system.ipynb): IR patterns