# GBigSMILES Parser: Complete Guide

GBigSMILES (Generative BigSMILES) extends BigSMILES with molecular weight distributions and system-level specifications. This guide covers all features, APIs, and advanced usage.

## Table of Contents

1. [Overview](#overview)
2. [Basic Parsing](#basic-parsing)
3. [IR Structure](#ir-structure)
4. [Distribution Annotations](#distribution-annotations)
5. [System Size Specifications](#system-size-specifications)
6. [Descriptor Weights](#descriptor-weights)
7. [Multi-Component Systems](#multi-component)
8. [Conversion to PolymerSpec](#conversion)
9. [Advanced Topics](#advanced)
10. [API Reference](#api)

> [!WARNING]
> The examples in this notebook use an outdated API.
> **Reason**: The GBigSMILES parser API has changed significantly.
> **Status**: Needs a complete rewrite against the current API.
> **TODO**: Update the examples to match the current molpy API.


## Overview

### What is GBigSMILES?

GBigSMILES = BigSMILES + generative annotations:

- **BigSMILES**: Topology (structure, connectivity)
- **Generative annotations**: Statistics (distributions, probabilities)

**Key additions:**
- Molecular weight distributions (Schulz-Zimm, etc.)
- System size constraints (total mass)
- Bonding descriptor weights (connection probabilities)

### Use Cases

1. **Polydisperse systems**: Realistic polymer melts with PDI > 1
2. **Multi-component blends**: Different polymers with mass fractions
3. **Controlled distributions**: Specify Mn, Mw, PDI
4. **Stochastic connectivity**: Weight bond formation probabilities

## Basic Parsing

The `parse_gbigsmiles()` function parses a GBigSMILES string into a `GBigSmilesSystemIR` object.

```python
# NOTE: This example is temporarily disabled due to API changes
# TODO: Update to current API
#
# from molpy.parser import parse_gbigsmiles
#
# # Simple polymer
# gbig_str = "{[<]CC[>]}"
# ir = parse_gbigsmiles(gbig_str)
#
# print(f"Type: {type(ir).__name__}")
# print(f"Molecules: {len(ir.molecules)}")
# print(f"Total mass: {ir.total_mass}")
```

## IR Structure

### Hierarchy

```
GBigSmilesSystemIR
├── molecules: list[GBigSmilesComponentIR]
│   ├── molecule: GBigSmilesMoleculeIR
│   │   ├── structure: BigSmilesMoleculeIR
│   │   ├── descriptor_weights: list[GBBondingDescriptorIR]
│   │   └── stochastic_metadata: list[GBStochasticObjectIR]
│   └── target_mass: float | None
└── total_mass: float | None
```

### Design Principle

**Separation of concerns:**
- `structure`: What (topology)
- `stochastic_metadata`: How much (distributions)
- `descriptor_weights`: How likely (probabilities)
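
To make the nesting concrete, here is a minimal sketch of the same hierarchy using plain dataclasses. The class names ending in `Sketch` are hypothetical stand-ins that only mirror the field names in the tree above; they are not molpy's IR classes, and the placeholder field types (a string for the structure, floats and dicts for the lists) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative stand-ins only: field names follow the tree above,
# but these are NOT molpy's classes and the field types are assumed.

@dataclass
class MoleculeSketch:
    structure: str  # topology placeholder (a BigSMILES string here)
    descriptor_weights: list = field(default_factory=list)   # "how likely"
    stochastic_metadata: list = field(default_factory=list)  # "how much"

@dataclass
class ComponentSketch:
    molecule: MoleculeSketch
    target_mass: Optional[float] = None

@dataclass
class SystemSketch:
    molecules: list = field(default_factory=list)
    total_mass: Optional[float] = None

# Build a one-component system and walk down the hierarchy.
system = SystemSketch(
    molecules=[ComponentSketch(MoleculeSketch("{[<]CC[>]}"), target_mass=5e5)],
    total_mass=5e5,
)
component = system.molecules[0]
print(component.molecule.structure, component.target_mass)
```

Keeping the three concerns in separate fields means the topology can be inspected or reused without touching any of the statistical annotations.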

```python
# Access IR components
# component = ir.molecules[0]
# molecule = component.molecule
#
# print("Component:")
# print(f"  Target mass: {component.target_mass}")
# print(f"  Is fraction: {component.mass_is_fraction}")
#
# print("\nMolecule:")
# print(f"  Structure type: {type(molecule.structure).__name__}")
# print(f"  Descriptor weights: {len(molecule.descriptor_weights)}")
# print(f"  Stochastic metadata: {len(molecule.stochastic_metadata)}")
```

## Distribution Annotations

### Schulz-Zimm Distribution

The Schulz-Zimm distribution is a common model for the molecular weight distribution of polymers made by step-growth or controlled radical polymerization.

**Parameters:**
- `Mn`: Number-average molecular weight (g/mol)
- `Mw`: Weight-average molecular weight (g/mol)
- `PDI = Mw/Mn`: Polydispersity index
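
The relationship between these parameters can be checked numerically. The sketch below assumes the usual continuous form of the Schulz-Zimm distribution, in which the number distribution of chain masses is a gamma distribution with mean `Mn` and shape `k`, so that `PDI = 1 + 1/k`. Whether molpy or the GBigSMILES specification uses this continuous form or a discrete one over chain length is not stated here, so treat the function names and the mapping as assumptions.

```python
import random

def schulz_zimm_shape(Mn: float, Mw: float) -> float:
    """Shape parameter k of the gamma-form number distribution (PDI = 1 + 1/k)."""
    if Mw <= Mn:
        raise ValueError("Schulz-Zimm requires Mw > Mn (PDI > 1)")
    return Mn / (Mw - Mn)

def sample_chain_masses(Mn: float, Mw: float, n: int, seed: int = 0) -> list:
    """Draw n chain masses from a gamma distribution with mean Mn and shape k."""
    k = schulz_zimm_shape(Mn, Mw)
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) has mean alpha * beta, so scale = Mn / k.
    return [rng.gammavariate(k, Mn / k) for _ in range(n)]

masses = sample_chain_masses(Mn=10_000, Mw=20_000, n=100_000)

# Recover the averages from the sample: Mn is the plain mean,
# Mw weights each chain by its own mass.
mn_est = sum(masses) / len(masses)
mw_est = sum(m * m for m in masses) / sum(masses)

print(f"k = {schulz_zimm_shape(10_000, 20_000):.2f}")
print(f"sampled Mn ~ {mn_est:.0f}, Mw ~ {mw_est:.0f}, PDI ~ {mw_est / mn_est:.2f}")
```

With `Mn=10000` and `Mw=20000` the shape parameter is 1, which reduces the gamma distribution to an exponential one.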

**Syntax:**
```
{[<]CC[>]}|schulz_zimm(Mn=10000,Mw=20000)|
```
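
To illustrate how the trailing `|...|` annotation is laid out, the following toy splitter separates the structure from the annotation and reads the distribution name and parameters. It is a sketch written with regular expressions for this guide only; the helper names are hypothetical, and it is not how molpy's parser works, nor does it cover the full GBigSMILES grammar.

```python
import re

def split_trailing_annotation(text: str):
    """Split 'BODY|ANNOTATION|' into (BODY, ANNOTATION); ANNOTATION is None if absent."""
    match = re.fullmatch(r"(.*\})\|([^|]*)\|", text)
    if match is None:
        return text, None
    return match.group(1), match.group(2)

def parse_annotation(annotation: str) -> dict:
    """Interpret an annotation as a named distribution or as a bare mass."""
    call = re.fullmatch(r"(\w+)\((.*)\)", annotation)
    if call is None:
        # No 'name(...)' shape: treat the annotation as a plain number.
        return {"kind": "mass", "value": float(annotation)}
    params = {}
    for item in call.group(2).split(","):
        if not item.strip():
            continue  # tolerate an empty parameter list
        key, value = item.split("=")
        params[key.strip()] = float(value)
    return {"kind": "distribution", "name": call.group(1), "params": params}

body, annotation = split_trailing_annotation(
    "{[<]CC[>]}|schulz_zimm(Mn=10000,Mw=20000)|"
)
print(body)
print(parse_annotation(annotation))
```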

```python
# NOTE: This example is temporarily disabled due to API changes
# TODO: Update to current API
#
# # Parse with distribution
# gbig_dist = "{[<]CC[>]}|schulz_zimm(Mn=10000,Mw=20000)|"
# ir_dist = parse_gbigsmiles(gbig_dist)
#
# # Access distribution
# molecule = ir_dist.molecules[0].molecule
# meta = molecule.stochastic_metadata
#
# if meta and meta[0].distribution:
#     dist = meta[0].distribution
#     print(f"Distribution: {dist.name}")
#     print(f"Mn: {dist.params['Mn']} g/mol")
#     print(f"Mw: {dist.params['Mw']} g/mol")
#     print(f"PDI: {dist.params['Mw'] / dist.params['Mn']:.2f}")
```

### PDI Interpretation

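| PDI | Type | Example |
|-----|------|---------|
| 1.0 | Monodisperse | Ideal limit; living anionic polymerization comes close |
| 1.1-1.5 | Controlled | ATRP, RAFT |
| 1.5-2.0 | Ideal chain-growth or step-growth | Ideal free radical kinetics; step-growth at high conversion (approaches 2.0) |
| >2.0 | Broad | Conventional free radical polymerization in practice, branched polymers |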
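
For a concrete set of chains, `Mn`, `Mw`, and PDI follow directly from their definitions. The helper below is a small standalone sketch (its name is not part of any library) that computes all three from a list of chain masses.

```python
def molecular_weight_averages(masses):
    """Return (Mn, Mw, PDI) for a list of individual chain masses in g/mol."""
    total = sum(masses)
    mn = total / len(masses)                  # number average: plain mean
    mw = sum(m * m for m in masses) / total   # weight average: mass-weighted mean
    return mn, mw, mw / mn

mn, mw, pdi = molecular_weight_averages([10_000, 20_000, 30_000])
print(f"Mn = {mn:.0f}, Mw = {mw:.0f}, PDI = {pdi:.3f}")
# prints: Mn = 20000, Mw = 23333, PDI = 1.167
```

A sample in which every chain has the same mass gives `Mw == Mn` and therefore a PDI of exactly 1.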

## System Size Specifications

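A bare number in the trailing annotation specifies the total mass of the system (in g/mol), which fixes how much material a simulation setup should contain.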

**Syntax:**
```
{[<]CC[>]}|5e5|  # 500 kDa total
```
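
One way a generator could use such a total mass is to keep drawing chains until the accumulated mass reaches the target. The sketch below does this with gamma-distributed chain masses; the function name, the choice of distribution, and the policy of stopping at the first chain that crosses the target are all assumptions for illustration, not a description of molpy's system planner.

```python
import random

def fill_to_total_mass(total_mass: float, Mn: float, Mw: float, seed: int = 0) -> list:
    """Draw chain masses until their sum reaches total_mass (may overshoot slightly)."""
    k = Mn / (Mw - Mn)           # gamma shape parameter, PDI = 1 + 1/k
    rng = random.Random(seed)
    chains = []
    accumulated = 0.0
    while accumulated < total_mass:
        mass = rng.gammavariate(k, Mn / k)  # mean of each draw is Mn
        chains.append(mass)
        accumulated += mass
    return chains

chains = fill_to_total_mass(total_mass=5e5, Mn=10_000, Mw=20_000)
print(f"{len(chains)} chains, total mass {sum(chains):.0f} g/mol")
```

With a 500 kDa target and `Mn` of 10 kDa, the expected chain count is on the order of 50, though any single run will vary because the chain masses are random.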

```python
# NOTE: This example is temporarily disabled due to API changes
# TODO: Update to current API
#
# # System with size
# gbig_sized = "{[<]CC[>]}|5e5|"
# ir_sized = parse_gbigsmiles(gbig_sized)
#
# print(f"System total mass: {ir_sized.total_mass} g/mol")
# print(f"Component target: {ir_sized.molecules[0].target_mass} g/mol")
```

## See Also

- [Parser Overview](parser.ipynb): All parsers
- [Monomer IR](monomer_ir.ipynb): IR design
- [System Planner](system_planner.ipynb): Using distributions