In [None]:
!pip install selfies

import selfies as sf



# Standard Usage
First let's try translating from SMILES to SELFIES, and then from SELFIES to SMILES. We will use a non-fullerene acceptor for organic solar cells as an example.

In [None]:
smiles = "CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)" \
         "C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1"
encoded_selfies = sf.encoder(smiles)  # SMILES --> SELFIES
decoded_smiles = sf.decoder(encoded_selfies)  # SELFIES --> SMILES

print(f"Original SMILES: {smiles}")
print(f"Translated SELFIES: {encoded_selfies}")
print(f"Translated SMILES: {decoded_smiles}")

Original SMILES: CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1
Translated SELFIES: [C][N][C][Branch1_3][epsilon][=O][C][=C][Branch2_3][epsilon][S][c][c][c][c][Branch1_3][Ring2][s][Ring1][Ring2][-c][s][c][Branch1_3][O][-c][n][c][c][Branch1_3][Ring1][C][#N][s][Ring1][Branch1_2][c][c][Ring1][F][C][Ring1][=N][O][C][C][O][Ring1][Ring2][N][Branch1_3][epsilon][C][C][Branch1_3][epsilon][=O][C][Ring2][epsilon][Branch2_3][=C][Ring2][epsilon][N][c][c][c][c][Branch1_3][Ring2][s][Ring1][Ring2][-c][s][c][Branch1_3][O][-c][n][c][c][Branch1_3][Ring1][C][#N][s][Ring1][Branch1_2][c][c][Ring1][F][C][Ring1][=N][O][C][C][O][Ring1][Ring2]
Translated SMILES: CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1


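When comparing the original and decoded SMILES, do not rely on `==` string equality: the same molecule can be written as many different SMILES strings, so two valid encodings may differ character by character. Use RDKit to canonicalize both strings and check whether they correspond to the same molecule.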

In [None]:
print(f"== Equals: {smiles == decoded_smiles}")

# Recommended (requires RDKit):
# from rdkit import Chem
# can_smiles = Chem.CanonSmiles(smiles)
# can_decoded_smiles = Chem.CanonSmiles(decoded_smiles)
# print(f"RDKit Equals: {can_smiles == can_decoded_smiles}")

== Equals: True
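
The round trip above happens to reproduce the input string exactly, but that is not guaranteed in general. As a minimal sketch, assuming RDKit is installed, the comparison can be wrapped in a small helper. The name `same_molecule` is hypothetical and is not part of SELFIES or RDKit.

```python
def same_molecule(smiles_a, smiles_b):
    """Return True if both SMILES canonicalize to the same string (needs RDKit)."""
    from rdkit import Chem  # imported lazily; RDKit is an assumed extra dependency

    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("could not parse one of the SMILES strings")
    # MolToSmiles writes canonical SMILES by default.
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)


# Ethanol written two ways: the strings differ, the molecule does not.
ethanol_a, ethanol_b = "OCC", "CCO"
print(f"== Equals: {ethanol_a == ethanol_b}")

try:
    print(f"RDKit Equals: {same_molecule(ethanol_a, ethanol_b)}")
except ImportError:
    print("RDKit is not installed; skipping the canonical comparison")
```

The plain string comparison reports a mismatch for the two ethanol strings, which is exactly the false negative that canonicalization avoids.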


# Advanced Usage
Now let's try to customize the SELFIES constraints. We will first look at the default SELFIES semantic constraints. 

In [None]:
default_constraints = sf.get_semantic_constraints()
print(f"Default Constraints:\n {default_constraints}")
print()

AttributeError: ignored

Here are two compounds, CS=CC#S and \[Li\]=CC, encoded as SELFIES. Under the default SELFIES settings, they decode as shown below. Because Li is not listed in the default constraints, it falls back to the default limit of 8 bonds.

In [None]:
c_s_compound = sf.encoder("CS=CC#S")
li_compound = sf.encoder("[Li]=CC")

print(f"\t CS=CC#S --> {sf.decoder(c_s_compound)}")
print(f"\t [Li]=CC --> {sf.decoder(li_compound)}")

We can add Li to the SELFIES constraints, and restrict it to 1 bond only. We can also restrict S to 2 bonds (instead of its default 6). After setting the new constraints, we can check to see if they were updated.

In [None]:
new_constraints = dict(default_constraints)  # copy, so default_constraints is not modified
new_constraints['Li'] = 1
new_constraints['S'] = 2

sf.set_semantic_constraints(new_constraints)  # update constraints 

print(f"Updated Constraints:\n {sf.get_semantic_constraints()}")
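
When editing a constraints dictionary, it is safer to work on a copy so that the saved defaults stay untouched. The sketch below uses plain Python only. The dictionary is a hypothetical stand-in with illustrative values, not the real output of `sf.get_semantic_constraints()`.

```python
# Hypothetical stand-in for a constraints dictionary (illustrative values only).
saved_defaults = {"C": 4, "N": 3, "S": 6}

alias = saved_defaults          # same object: edits also change saved_defaults
edited = dict(saved_defaults)   # shallow copy: edits stay local to `edited`

edited["S"] = 2
print(saved_defaults["S"])  # still 6, the copy is independent

alias["Li"] = 1
print("Li" in saved_defaults)  # True, the alias modified the original
```

Plain assignment only creates a second name for the same dictionary, so any change made through it is also visible through the original name.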

Under our new settings, our previous molecules are translated like so. Notice that our new semantic constraints are met.

In [None]:
print(f"\t CS=CC#S --> {sf.decoder(c_s_compound)}")
print(f"\t [Li]=CC --> {sf.decoder(li_compound)}")

To revert to the default constraints, call `set_semantic_constraints` with no arguments:

In [None]:
sf.set_semantic_constraints()
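
If the custom constraints are only needed for a few lines, one option is to restore the previous settings automatically. The following is a sketch, not part of the SELFIES API: `temporary_constraints` is a hypothetical helper, and it assumes the installed selfies version exposes `get_semantic_constraints` and `set_semantic_constraints` as used above. Note that it restores whatever constraints were active before the block, which may differ from the library defaults.

```python
from contextlib import contextmanager


@contextmanager
def temporary_constraints(sf_module, **overrides):
    """Apply constraint overrides inside a `with` block, then restore the old ones.

    `sf_module` is assumed to expose get_semantic_constraints() and
    set_semantic_constraints(), like the selfies calls used in this notebook.
    """
    saved = dict(sf_module.get_semantic_constraints())
    sf_module.set_semantic_constraints({**saved, **overrides})
    try:
        yield
    finally:
        sf_module.set_semantic_constraints(saved)  # runs even if the body raises


try:
    import selfies as sf
except ImportError:
    sf = None  # selfies is not installed; the helper is still defined

if sf is not None and hasattr(sf, "get_semantic_constraints"):
    with temporary_constraints(sf, Li=1, S=2):
        print(f"Inside:  S -> {sf.get_semantic_constraints()['S']}")
    print(f"Outside: S -> {sf.get_semantic_constraints()['S']}")
```

Passing the module in as an argument keeps the helper independent of how selfies was imported and makes it easy to exercise with a stand-in object.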