# CGSmiles Tutorial: Building Coarse-Grained Polymer Structures

This notebook demonstrates how to use CGSmiles (Coarse-Grained SMILES) notation to define polymer structures in MolPy.

**CGSmiles Features:**
- Coarse-grained representation with labeled nodes (e.g., `[#PEO]`, `[#PMA]`)
- Support for linear chains, branches, and rings
- Repeat operators for efficient notation (`|N`)
- Fragment definitions for reusable building blocks
- Annotations for node properties (e.g., `[#PEO;q=1]`)

## Import Libraries

In [1]:
from molpy.parser.smiles import parse_cgsmiles
import molpy as mp

## Example 1: Linear Chain

A simple linear polymer chain with alternating monomers.

**CGSmiles notation:**
- `{[#PEO][#PMA][#PEO]}` - Three nodes in sequence
- Default bond order is single bond
- Nodes are connected sequentially
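
To make "connected sequentially" concrete, the following pure-Python sketch pairs each node index with its successor. It does not use MolPy; the helper name `sequential_bonds` is hypothetical and only illustrates the rule.

```python
def sequential_bonds(labels):
    """Return (i, j) index pairs joining each node to the next one."""
    return list(zip(range(len(labels) - 1), range(1, len(labels))))


labels = ["PEO", "PMA", "PEO"]
for i, j in sequential_bonds(labels):
    # Mirrors the "i(label) -> j(label)" layout used in this notebook's output
    print(f"{i}({labels[i]}) -> {j}({labels[j]})")
```

A chain of N nodes therefore has N - 1 bonds, which is why the three-node example reports two bonds.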

In [2]:
# Parse a simple linear chain
cgsmiles = "{[#PEO][#PMA][#PEO]}"
result = parse_cgsmiles(cgsmiles)

print(f"Linear Chain: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print(f"Labels: {[n.label for n in result.base_graph.nodes]}")
print(f"Bonds: {len(result.base_graph.bonds)}")

for i, bond in enumerate(result.base_graph.bonds):
    idx_i = result.base_graph.nodes.index(bond.node_i)
    idx_j = result.base_graph.nodes.index(bond.node_j)
    print(f"  Bond {i}: {idx_i}({bond.node_i.label}) -> {idx_j}({bond.node_j.label}) (order={bond.order})")

Linear Chain: {[#PEO][#PMA][#PEO]}
Nodes: 3
Labels: ['PEO', 'PMA', 'PEO']
Bonds: 2
  Bond 0: 0(PEO) -> 1(PMA) (order=1)
  Bond 1: 1(PMA) -> 2(PEO) (order=1)


## Example 2: Using Repeat Operators

Repeat operators (`|N`) allow efficient representation of repeated units.

**CGSmiles notation:**
- `{[#PMA]|10}` - 10 PMA nodes connected in sequence
- `{[#PEO]|5=[#PMA]|5}` - 5 PEO nodes, then double bond, then 5 PMA nodes
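
As a rough illustration of what the repeat operator expands to, the sketch below flattens a linear CGSmiles string into a label list using only the standard library. It is not MolPy's parser: the function name `expand_linear` is hypothetical, and it assumes plain `[#Label]` nodes without annotations, branches, or rings, and it ignores bond symbols.

```python
import re


def expand_linear(cgsmiles):
    """Expand [#Label]|N repeats in a linear CGSmiles string into a flat label list."""
    labels = []
    # Each match is a (label, count) pair; count is '' when no |N follows the node.
    for label, count in re.findall(r"\[#(\w+)\](?:\|(\d+))?", cgsmiles):
        labels.extend([label] * int(count or 1))
    return labels


labels = expand_linear("{[#PEO]|5=[#PMA]|5}")
print(len(labels), "nodes,", len(labels) - 1, "bonds")
print(labels)
```

For `{[#PMA]|10}` the same helper yields 10 labels, consistent with the 10 nodes and 9 bonds reported in the cell output.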

In [3]:
# Repeat operator example
cgsmiles = "{[#PMA]|10}"
result = parse_cgsmiles(cgsmiles)

print(f"Repeated Chain: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print(f"Bonds: {len(result.base_graph.bonds)}")

# With explicit bond after repeat
cgsmiles = "{[#PEO]|5=[#PMA]|5}"
result = parse_cgsmiles(cgsmiles)

print(f"\nBlock Copolymer: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print(f"Labels: {[n.label for n in result.base_graph.nodes]}")

Repeated Chain: {[#PMA]|10}
Nodes: 10
Bonds: 9

Block Copolymer: {[#PEO]|5=[#PMA]|5}
Nodes: 10
Labels: ['PEO', 'PEO', 'PEO', 'PEO', 'PEO', 'PMA', 'PMA', 'PMA', 'PMA', 'PMA']


## Example 3: Ring Structures

Ring closures use numeric labels to connect non-adjacent nodes.

**CGSmiles notation:**
- `{[#PMA]1[#PEO][#PMA]1}` - Triangle: PMA-PEO-PMA with ring closure
- `{[#A]1[#B][#C][#D]1}` - Square ring
- `{[#PMA]1=[#PEO][#PMA]1}` - Ring in which the first chain bond (PMA to PEO) is a double bond

**Ring notation:**
- First occurrence of `1` opens the ring
- Second occurrence of `1` closes the ring
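- Bond order can be specified: `=1` for a double bond on the ring closure. In `{[#PMA]1=[#PEO][#PMA]1}` the `=` comes after the ring label, so it applies to the PMA-PEO chain bond and the ring-closure bond stays single (see the output below)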
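
The open/close rule can be sketched with a small dictionary that remembers which node opened each ring label. This is a simplified stand-in, not MolPy's implementation: `ring_bonds` is a hypothetical helper that assumes single-digit ring labels, plain `[#Label]` nodes, no branches or repeat operators, and it does not track bond orders.

```python
import re


def ring_bonds(cgsmiles):
    """Return (i, j) bonds for a chain of [#Label] nodes with single-digit ring labels."""
    bonds = []
    open_rings = {}  # ring label -> index of the node that opened it
    n_nodes = 0
    for match in re.finditer(r"\[#\w+\]|\d", cgsmiles):
        token = match.group()
        if token.startswith("["):
            # A new node bonds to the previous node in the chain
            if n_nodes > 0:
                bonds.append((n_nodes - 1, n_nodes))
            n_nodes += 1
        elif token in open_rings:
            # Second occurrence: close the ring back to the opening node
            bonds.append((open_rings.pop(token), n_nodes - 1))
        else:
            # First occurrence: remember the current node
            open_rings[token] = n_nodes - 1
    if open_rings:
        raise ValueError(f"unclosed ring labels: {sorted(open_rings)}")
    return bonds


print(ring_bonds("{[#PMA]1[#PEO][#PMA]1}"))  # [(0, 1), (1, 2), (0, 2)]
print(ring_bonds("{[#A]1[#B][#C][#D]1}"))    # [(0, 1), (1, 2), (2, 3), (0, 3)]
```

The ring closure contributes exactly one extra bond, so a ring of N nodes has N bonds rather than N - 1.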

In [4]:
# Simple ring (triangle)
cgsmiles = "{[#PMA]1[#PEO][#PMA]1}"
result = parse_cgsmiles(cgsmiles)

print(f"Triangle Ring: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print(f"Bonds: {len(result.base_graph.bonds)}")

for i, bond in enumerate(result.base_graph.bonds):
    idx_i = result.base_graph.nodes.index(bond.node_i)
    idx_j = result.base_graph.nodes.index(bond.node_j)
    print(f"  Bond {i}: {idx_i} -> {idx_j}")

# Square ring
cgsmiles = "{[#A]1[#B][#C][#D]1}"
result = parse_cgsmiles(cgsmiles)

print(f"\nSquare Ring: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print(f"Bonds: {len(result.base_graph.bonds)}")

# Ring with double bond
cgsmiles = "{[#PMA]1=[#PEO][#PMA]1}"
result = parse_cgsmiles(cgsmiles)

print(f"\nRing with Double Bond: {cgsmiles}")
for i, bond in enumerate(result.base_graph.bonds):
    idx_i = result.base_graph.nodes.index(bond.node_i)
    idx_j = result.base_graph.nodes.index(bond.node_j)
    print(f"  Bond {i}: {idx_i} -> {idx_j} (order={bond.order})")

Triangle Ring: {[#PMA]1[#PEO][#PMA]1}
Nodes: 3
Bonds: 3
  Bond 0: 0 -> 1
  Bond 1: 1 -> 2
  Bond 2: 0 -> 2

Square Ring: {[#A]1[#B][#C][#D]1}
Nodes: 4
Bonds: 4

Ring with Double Bond: {[#PMA]1=[#PEO][#PMA]1}
  Bond 0: 0 -> 1 (order=2)
  Bond 1: 1 -> 2 (order=1)
  Bond 2: 0 -> 2 (order=1)


## Example 4: Branched Structures

Branches are denoted with parentheses `()`.

**CGSmiles notation:**
- `{[#PMA]([#PEO])[#PMA]}` - PMA with PEO side chain
- `{[#PMA](=[#PEO])[#PMA]}` - Branch with double bond
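- `{[#PMA]([#PEO][#PEO])[#PMA]}` - Branch with multiple nodes
- `{[#PMA]([#PEO][#PEO]=[#OH])|3}` - Repeat applied to a node together with its branch (3 units of 4 nodes each)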

**Branch notation:**
- `(chain)` - Branch with default single bond
- `(=chain)` - Branch with double bond
- Branches can contain multiple nodes and even nested branches
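
One common way to handle parentheses in SMILES-like strings is a stack of attachment points: `(` saves the current node and `)` returns to it. The sketch below applies that idea to CGSmiles base graphs. It is an illustration rather than MolPy's parser; `branch_bonds` is a hypothetical helper that ignores bond symbols, annotations, rings, and repeat operators.

```python
import re


def branch_bonds(cgsmiles):
    """Return (labels, bonds) for [#Label] nodes with (...) branches."""
    labels, bonds = [], []
    prev = None  # index of the node the next node attaches to
    stack = []   # attachment points saved when a branch opens
    for token in re.findall(r"\[#\w+\]|[()]", cgsmiles):
        if token == "(":
            stack.append(prev)
        elif token == ")":
            prev = stack.pop()
        else:
            labels.append(token[2:-1])  # strip the surrounding "[#" and "]"
            idx = len(labels) - 1
            if prev is not None:
                bonds.append((prev, idx))
            prev = idx
    return labels, bonds


labels, bonds = branch_bonds("{[#PMA]([#PEO])[#PMA]}")
print(labels)  # ['PMA', 'PEO', 'PMA']
print(bonds)   # [(0, 1), (0, 2)]
```

Both the side chain and the next backbone node attach to node 0, which matches the `0 -> 1` and `0 -> 2` bonds printed for the double-bond branch example.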

In [5]:
# Simple branch
cgsmiles = "{[#PMA]([#PEO])[#PMA]}"
result = parse_cgsmiles(cgsmiles)

print(f"Simple Branch: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print(f"Labels: {[n.label for n in result.base_graph.nodes]}")
print(f"Bonds: {len(result.base_graph.bonds)}")

# Branch with explicit bond
cgsmiles = "{[#PMA](=[#PEO])[#PMA]}"
result = parse_cgsmiles(cgsmiles)

print(f"\nBranch with Double Bond: {cgsmiles}")
for i, bond in enumerate(result.base_graph.bonds):
    idx_i = result.base_graph.nodes.index(bond.node_i)
    idx_j = result.base_graph.nodes.index(bond.node_j)
    print(f"  Bond {i}: {idx_i}({bond.node_i.label}) -> {idx_j}({bond.node_j.label}) (order={bond.order})")

# Complex branched structure
cgsmiles = "{[#PMA]([#PEO][#PEO]=[#OH])|3}"
result = parse_cgsmiles(cgsmiles)

print(f"\nBranched with Repeat: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print("Structure: 3 PMA units, each with a PEO-PEO=OH branch")

Simple Branch: {[#PMA]([#PEO])[#PMA]}
Nodes: 3
Labels: ['PMA', 'PEO', 'PMA']
Bonds: 2

Branch with Double Bond: {[#PMA](=[#PEO])[#PMA]}
  Bond 0: 0(PMA) -> 1(PEO) (order=2)
  Bond 1: 0(PMA) -> 2(PMA) (order=1)

Branched with Repeat: {[#PMA]([#PEO][#PEO]=[#OH])|3}
Nodes: 12
Structure: 3 PMA units, each with a PEO-PEO=OH branch


## Example 5: Nested Branches

Branches can be nested to create complex dendritic structures.

**CGSmiles notation:**
- `{[#PMA]([#PEO]([#OH]))}` - Branch with sub-branch
- Inner parentheses create branches on the branch
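
Before parsing a deeply nested string, it can help to check that the parentheses balance and to see how deep the nesting goes. The helper below is a hypothetical, standard-library-only sketch; it is meant for the base graph alone, since parentheses inside atomistic fragment SMILES would also be counted.

```python
def branch_depth(cgsmiles):
    """Return the maximum nesting depth of (...) branches; raise if unbalanced."""
    depth = max_depth = 0
    for ch in cgsmiles:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without a matching opener")
    if depth != 0:
        raise ValueError("unclosed branch")
    return max_depth


print(branch_depth("{[#PMA]([#PEO]([#OH]))}"))  # 2
print(branch_depth("{[#PMA]([#PEO])[#PMA]}"))    # 1
```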

In [6]:
# Nested branches (dendritic structure)
cgsmiles = "{[#PMA][#PMA]([#PEO][#PEO]([#OH])[#PEO])[#PMA]}"
result = parse_cgsmiles(cgsmiles)

print(f"Dendritic Structure: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")
print(f"Labels: {[n.label for n in result.base_graph.nodes]}")
print(f"Bonds: {len(result.base_graph.bonds)}")

print("\nStructure:")
print("  Main chain: PMA-PMA-PMA")
print("  Branch on 2nd PMA: PEO-PEO-PEO")
print("  Sub-branch on 2nd PEO: OH")

Dendritic Structure: {[#PMA][#PMA]([#PEO][#PEO]([#OH])[#PEO])[#PMA]}
Nodes: 7
Labels: ['PMA', 'PMA', 'PEO', 'PEO', 'OH', 'PEO', 'PMA']
Bonds: 6

Structure:
  Main chain: PMA-PMA-PMA
  Branch on 2nd PMA: PEO-PEO-PEO
  Sub-branch on 2nd PEO: OH


## Example 6: Annotations

Nodes can carry annotations for properties such as charge or mass.

**CGSmiles notation:**
- `{[#PEO;q=1]}` - PEO with charge annotation
- `{[#PEO;q=1;mass=44]}` - Multiple annotations
- Annotations are key=value pairs separated by semicolons

In [7]:
# Annotations example
cgsmiles = "{[#PEO;q=1][#PMA;q=-1][#PEO;q=1]}"
result = parse_cgsmiles(cgsmiles)

print(f"Annotated Chain: {cgsmiles}")
print(f"Nodes: {len(result.base_graph.nodes)}")

for i, node in enumerate(result.base_graph.nodes):
    print(f"  Node {i}: {node.label}, annotations={node.annotations}")

Annotated Chain: {[#PEO;q=1][#PMA;q=-1][#PEO;q=1]}
Nodes: 3
  Node 0: PEO, annotations={'q': '1'}
  Node 1: PMA, annotations={'q': '-1'}
  Node 2: PEO, annotations={'q': '1'}
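
The output above shows annotation values as strings (`'1'`, `'-1'`). If numeric values are needed, for example to sum charges, they have to be converted first. The sketch below is one possible approach using plain dictionaries shaped like the printed `annotations`; `coerce_annotations` and the `tag` key are hypothetical and not part of MolPy.

```python
def coerce_annotations(annotations):
    """Convert string annotation values to int or float where possible."""
    converted = {}
    for key, value in annotations.items():
        for cast in (int, float):
            try:
                converted[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            converted[key] = value  # leave non-numeric values as strings
    return converted


print(coerce_annotations({"q": "-1", "mass": "44.05", "tag": "end"}))

# Net charge of the annotated chain, using the values printed above
chain = [{"q": "1"}, {"q": "-1"}, {"q": "1"}]
print(sum(coerce_annotations(a)["q"] for a in chain))  # 1
```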


## Example 7: Fragment Definitions

Fragments allow defining reusable building blocks.

**CGSmiles notation:**
- Base graph: `{[#PEO][#PMA]}`
- Fragment definition: `.{#PEO=[$]COC[$]}`
- `[$]` marks connection points in the fragment
- Multiple definitions are separated by commas inside the same `.{...}` block

The fragment definition maps the coarse-grained label to its atomistic SMILES representation.
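
To show how the two parts of such a string relate, here is a standard-library sketch that separates the base graph from the fragment block and counts the `[$]` connection points in each fragment. It is not how MolPy parses fragments; `split_fragments` is a hypothetical helper that assumes a single fragment block and no commas inside the fragment SMILES.

```python
def split_fragments(cgsmiles):
    """Split 'base.{#A=...,#B=...}' into the base graph and a name -> SMILES dict."""
    base, sep, frag_block = cgsmiles.partition(".{")
    fragments = {}
    if sep:
        for entry in frag_block.rstrip("}").split(","):
            # Split on the first '=' only, since SMILES bodies may contain '='
            name, _, body = entry.partition("=")
            fragments[name.lstrip("#")] = body
    return base, fragments


base, fragments = split_fragments("{[#PEO][#PMA]}.{#PEO=[$]COC[$],#PMA=[$]CC(C)C[$]}")
print(base)
for name, body in fragments.items():
    print(f"{name}: {body} ({body.count('[$]')} connection points)")
```

Each fragment here has two connection points, one for each neighbor it can bond to in a linear chain.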

In [8]:
# Fragment definitions
cgsmiles = "{[#PEO][#PMA]}.{#PEO=[$]COC[$],#PMA=[$]CC(C)C[$]}"
result = parse_cgsmiles(cgsmiles)

print(f"CGSmiles with Fragments: {cgsmiles}")
print("\nBase Graph:")
print(f"  Nodes: {[n.label for n in result.base_graph.nodes]}")

print(f"\nFragment Definitions: {len(result.fragments)}")
for frag in result.fragments:
    print(f"  {frag.name} = {frag.body}")

CGSmiles with Fragments: {[#PEO][#PMA]}.{#PEO=[$]COC[$],#PMA=[$]CC(C)C[$]}

Base Graph:
  Nodes: ['PEO', 'PMA']

Fragment Definitions: 2
  PEO = [$]COC[$]
  PMA = [$]CC(C)C[$]


## Example 8: Complex Structure - Star Polymer

This example combines a ring closure with branches to build a star polymer: four `Core` nodes form a ring, and each core node carries one five-node arm.

In [9]:
# Star polymer: cyclic core with 4 arms
cgsmiles = "{[#Core]1([#PEO]|5)[#Core]([#PMA]|5)[#Core]([#PEO]|5)[#Core]1([#PMA]|5)}"
result = parse_cgsmiles(cgsmiles)

print(f"Star Polymer: {cgsmiles}")
print("\nStructure:")
print(f"  Total nodes: {len(result.base_graph.nodes)}")
print("  Core nodes: 4 (forming a ring)")
print("  Arms: 4 branches, each with 5 nodes")
print(f"  Total bonds: {len(result.base_graph.bonds)}")

Star Polymer: {[#Core]1([#PEO]|5)[#Core]([#PMA]|5)[#Core]([#PEO]|5)[#Core]1([#PMA]|5)}

Structure:
  Total nodes: 24
  Core nodes: 4 (forming a ring)
  Arms: 4 branches, each with 5 nodes
  Total bonds: 24
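
The reported totals can be checked by hand. The short sketch below reproduces the arithmetic, assuming each arm is a linear chain attached to its core node by one bond; `star_counts` is a hypothetical helper for this check only.

```python
def star_counts(core_nodes, arms, nodes_per_arm):
    """Return (total_nodes, total_bonds) for a cyclic core with linear arms."""
    total_nodes = core_nodes + arms * nodes_per_arm
    ring_bonds = core_nodes           # chain bonds plus one ring closure
    arm_bonds = arms * nodes_per_arm  # per arm: (n - 1) internal bonds + 1 bond to the core
    return total_nodes, ring_bonds + arm_bonds


print(star_counts(core_nodes=4, arms=4, nodes_per_arm=5))  # (24, 24)
```

Nodes and bonds are equal because the structure is a connected graph containing exactly one cycle.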


## Summary

This notebook demonstrated CGSmiles notation for coarse-grained polymer structures:

1. ✅ **Linear chains** - Sequential node connections
2. ✅ **Repeat operators** - Efficient notation with `|N`
3. ✅ **Ring structures** - Numeric labels for ring closures
4. ✅ **Branched structures** - Parentheses for side chains
5. ✅ **Nested branches** - Complex dendritic architectures
6. ✅ **Annotations** - Node properties with key=value pairs
7. ✅ **Fragment definitions** - Mapping CG labels to atomistic SMILES
8. ✅ **Complex structures** - Star polymers combining rings and branches

**Key CGSmiles Syntax:**
- `{...}` - Base graph
- `[#Label]` - Node with label
- `[#Label;key=value]` - Node with annotations
- `|N` - Repeat operator
- `1`, `2`, etc. - Ring closure labels
- `(...)` - Branch
- `-`, `=`, `#`, `$` - Bond orders (single, double, triple, quadruple); in standard SMILES, aromatic bonds are written `:`
- `.{#Label=SMILES}` - Fragment definition