# Tutorial

In [1]:
import selfies as sf
from rdkit import Chem

## Using SELFIES
First, let's try translating from SMILES to SELFIES, and then from SELFIES to SMILES. We will use a non-fullerene acceptor for organic solar cells as an example.

In [2]:
smiles = "CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)" \
         "C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1"
encoded_selfies = sf.encoder(smiles)          # SMILES  -> SELFIES
decoded_smiles = sf.decoder(encoded_selfies)  # SELFIES -> SMILES

print(f"Original SMILES:    {smiles}")
print(f"Translated SELFIES: {encoded_selfies}")
print(f"Translated SMILES:  {decoded_smiles}")

Original SMILES:    CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1
Translated SELFIES: [C][N][C][=Branch1][C][=O][C][=C][Branch2][Ring2][#Branch1][C][=C][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][S][C][Branch1][N][C][=N][C][=C][Branch1][Ring1][C][#N][S][Ring1][#Branch1][=C][C][=Ring1][N][C][Ring1][S][O][C][C][O][Ring1][Branch1][N][Branch1][C][C][C][=Branch1][C][=O][C][Ring2][Ring1][=N][=C][Ring2][Ring1][P][C][=C][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][S][C][Branch1][N][C][=N][C][=C][Branch1][Ring1][C][#N][S][Ring1][#Branch1][=C][C][=Ring1][N][C][Ring1][S][O][C][C][O][Ring1][Branch1]
Translated SMILES:  CN1C(=O)C2=C(C3=CC4=C(S3)C=5SC(C6=NC=C(C#N)S6)=CC=5C47OCCO7)N(C)C(=O)C2=C1C8=CC9=C(S8)C=%10SC(C%11=NC=C(C#N)S%11)=CC=%10C9%12OCCO%12


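When comparing the original and decoded SMILES, do not use `==`. The same molecule can be written as many different valid SMILES strings, and the decoder is not guaranteed to reproduce the input's atom order, ring-closure numbers, or aromatic notation (compare the two SMILES strings above). Instead, use RDKit to canonicalize both strings and check whether they represent the same molecule.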

In [3]:
print(f"String Equals: {smiles == decoded_smiles}")

# Recommended: compare RDKit canonical SMILES
can_smiles = Chem.CanonSmiles(smiles)
can_decoded_smiles = Chem.CanonSmiles(decoded_smiles)
print(f"RDKit Equals:  {can_smiles == can_decoded_smiles}")

String Equals: False
RDKit Equals:  True
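If you compare molecules often, the canonicalization step can be wrapped in a small helper. The sketch below is our own convenience function (`same_molecule` is a hypothetical name, not part of RDKit or selfies); it uses `Chem.MolFromSmiles`, which returns `None` for input it cannot parse, so invalid strings compare as unequal instead of failing inside canonicalization:

```python
from rdkit import Chem


def same_molecule(smiles_a: str, smiles_b: str) -> bool:
    """True if both SMILES parse and have identical RDKit canonical SMILES."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    # MolFromSmiles returns None for input it cannot parse.
    if mol_a is None or mol_b is None:
        return False
    # MolToSmiles writes canonical SMILES by default.
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)


print("CCO" == "OCC")                            # False: different strings
print(same_molecule("CCO", "OCC"))               # True: both are ethanol
print(same_molecule("c1ccccc1", "C1=CC=CC=C1"))  # True: aromatic vs Kekulé benzene
```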


## Customizing SELFIES
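Now let's customize the SELFIES constraints. The semantic constraints map each atom type (optionally with a charge, e.g. `N+1`) to the maximum number of bonds it may form, and the decoder respects these limits when it builds a molecule. We will first look at the current (default) constraints.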

In [4]:
default_constraints = sf.get_semantic_constraints()

print(f"Default Constraints:\n {default_constraints}")

Default Constraints:
 {'H': 1, 'F': 1, 'Cl': 1, 'Br': 1, 'I': 1, 'O': 2, 'O+1': 3, 'O-1': 1, 'N': 3, 'N+1': 4, 'N-1': 2, 'C': 4, 'C+1': 5, 'C-1': 3, 'P': 5, 'P+1': 6, 'P-1': 4, 'S': 6, 'S+1': 7, 'S-1': 5, '?': 8}


We have two compounds here, `CS=CC#S` and `[Li]=CC`, which we encode as SELFIES. Under the default constraints, both decode back to the original SMILES. Note that Li has no entry in the default constraints, so it falls under the catch-all `'?'` key and may form up to 8 bonds.

In [5]:
compound1 = sf.encoder("CS=CC#S")
compound2 = sf.encoder("[Li]=CC")

print(f"CS=CC#S -> {sf.decoder(compound1)}")
print(f"[Li]=CC -> {sf.decoder(compound2)}")

CS=CC#S -> CS=CC#S
[Li]=CC -> [Li]=CC
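Li has no entry of its own in the dictionary printed earlier, so its limit of 8 comes from the `'?'` key. The lookup can be sketched as follows. This is our reading of the printed dictionary, not code taken from selfies: `bond_limit` is a hypothetical helper, and `DEFAULTS_SUBSET` is a hand-copied subset of the default constraints shown above.

```python
# Hand-copied subset of the default constraints printed above.
DEFAULTS_SUBSET = {"C": 4, "N": 3, "N+1": 4, "N-1": 2, "O": 2, "O-1": 1, "S": 6, "?": 8}


def bond_limit(constraints: dict, symbol: str, charge: int = 0) -> int:
    """Hypothetical helper: look up the maximum number of bonds for an atom."""
    # Charged keys look like 'N+1' or 'O-1'; neutral atoms use the bare symbol.
    key = symbol if charge == 0 else f"{symbol}{charge:+d}"
    # Anything without its own entry falls back to the catch-all '?' limit.
    return constraints.get(key, constraints["?"])


print(bond_limit(DEFAULTS_SUBSET, "Li"))            # 8, via '?'
print(bond_limit(DEFAULTS_SUBSET, "N", charge=1))   # 4, via 'N+1'
print(bond_limit(DEFAULTS_SUBSET, "O", charge=-1))  # 1, via 'O-1'
```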


We can add Li to the SELFIES constraints and restrict it to a maximum of 1 bond. We can also restrict S to a maximum of 2 bonds (instead of its default 6). After setting the new constraints, we can check that they were applied.

In [6]:
new_constraints = dict(default_constraints)  # copy, so default_constraints itself is not modified
new_constraints['Li'] = 1
new_constraints['S'] = 2

sf.set_semantic_constraints(new_constraints)  # update constraints 

print(f"Updated Constraints:\n {sf.get_semantic_constraints()}")

Updated Constraints:
 {'H': 1, 'F': 1, 'Cl': 1, 'Br': 1, 'I': 1, 'O': 2, 'O+1': 3, 'O-1': 1, 'N': 3, 'N+1': 4, 'N-1': 2, 'C': 4, 'C+1': 5, 'C-1': 3, 'P': 5, 'P+1': 6, 'P-1': 4, 'S': 2, 'S+1': 7, 'S-1': 5, '?': 8, 'Li': 1}


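Decoding the same two SELFIES strings under the new constraints gives different molecules. Wherever a requested bond would exceed an atom's capacity, the decoder lowers the bond order: the first S already has one bond to the methyl carbon, so `S=C` becomes a single bond; the terminal S can form only 2 bonds, so `C#S` becomes `C=S`; and Li now forms only a single bond.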

In [7]:
print(f"CS=CC#S -> {sf.decoder(compound1)}")
print(f"[Li]=CC -> {sf.decoder(compound2)}")

CS=CC#S -> CSCC=S
[Li]=CC -> [Li]CC
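To see why exactly these bonds changed, here is a simplified, hypothetical model of bond-order capping along a linear chain (no branches or rings). It is not the selfies implementation; it only assumes that each requested bond order is reduced to the free capacity of both atoms it joins, with atoms processed left to right. Under that assumption it reproduces the outputs above.

```python
# Simplified, hypothetical model of bond-order capping along a linear chain.
# Not the selfies implementation; branches, rings and charges are ignored.
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}
BOND_SYMBOL = {0: ".", 1: "", 2: "=", 3: "#"}


def capacity(constraints: dict, symbol: str) -> int:
    # Unlisted atoms fall back to the catch-all '?' entry.
    return constraints.get(symbol, constraints["?"])


def cap_chain(atoms: list, requested: list, constraints: dict) -> list:
    """Cap each requested bond order by the free capacity of both endpoints."""
    used = [0] * len(atoms)
    orders = []
    for i, want in enumerate(requested):
        free_left = capacity(constraints, atoms[i]) - used[i]
        free_right = capacity(constraints, atoms[i + 1]) - used[i + 1]
        order = max(0, min(want, free_left, free_right))
        used[i] += order
        used[i + 1] += order
        orders.append(order)
    return orders


def to_smiles(atoms: list, orders: list) -> str:
    """Write the chain back out; an order of 0 becomes a '.' disconnection."""
    def fmt(symbol: str) -> str:
        return symbol if symbol in ORGANIC_SUBSET else f"[{symbol}]"

    out = fmt(atoms[0])
    for symbol, order in zip(atoms[1:], orders):
        out += BOND_SYMBOL[order] + fmt(symbol)
    return out


custom = {"C": 4, "S": 2, "Li": 1, "?": 8}
chain = ["C", "S", "C", "C", "S"]  # CS=CC#S
li_chain = ["Li", "C", "C"]        # [Li]=CC
print(to_smiles(chain, cap_chain(chain, [1, 2, 1, 3], custom)))  # CSCC=S
print(to_smiles(li_chain, cap_chain(li_chain, [2, 1], custom)))  # [Li]CC
```

The requested orders `[1, 2, 1, 3]` and `[2, 1]` are read off the original SMILES. With S capped at 2, the first S has only one bond left after bonding to the methyl carbon, and the terminal S can accept at most a double bond; with the default limits (S at 6, Li at 8 via `'?'`) nothing is capped.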


To restore the default constraints, call `sf.set_semantic_constraints()` with no arguments:

In [8]:
sf.set_semantic_constraints()