In [1]:
import pandas as pd

In [2]:
# Prevents truncation of long strings
pd.set_option('display.max_colwidth', None)

## Reading in Essential Medicines List

Pandas is used to read in the data from the Essential Medicines List (EML). The DataFrame is then checked to ensure all the data was read in.
The DataFrame contains 3 columns of data:
* `drugs` (the name of the drug)
* `smiles` (the SMILES label for the drug)
* `can_smiles` (the canonical SMILES label for the drug)

In [3]:
# Read in the CSV of the Essential Medicines List
eml_df = pd.read_csv('eml.csv')

### Ensuring the data was read in properly

The `info()` method is used to ensure all rows are present. Its output shows 442 non-null entries in every column, so no null values are present.

The `head()` method is used to view the first 5 rows of data.

In [4]:
# Info for the data
eml_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 442 entries, 0 to 441
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   drugs       442 non-null    object
 1   smiles      442 non-null    object
 2   can_smiles  442 non-null    object
dtypes: object(3)
memory usage: 10.5+ KB
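
The same check can also be made programmatically instead of by reading the `info()` output. The following is a minimal sketch, not part of the original notebook: `check_loaded` is a hypothetical helper, and `sample_df` is a two-row stand-in for `eml_df` so the example runs without `eml.csv`.

```python
import pandas as pd


def check_loaded(df, expected_columns, expected_rows=None):
    """Return a list of problems found in a freshly loaded DataFrame."""
    problems = []
    # Every expected column must be present
    missing = [c for c in expected_columns if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Optionally check the row count
    if expected_rows is not None and len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    # Report each column that contains null values
    null_counts = df.isna().sum()
    for column, count in null_counts[null_counts > 0].items():
        problems.append(f"{count} null value(s) in column {column!r}")
    return problems


# Two-row stand-in for eml_df (assumption: same column names as the EML CSV)
sample_df = pd.DataFrame({
    "drugs": ["acetic acid", "acetylcysteine"],
    "smiles": ["CC(O)=O", "CC(=O)N[C@@H](CS)C(O)=O"],
    "can_smiles": ["CC(=O)O", "CC(=O)N[C@@H](CS)C(=O)O"],
})
problems = check_loaded(sample_df, ["drugs", "smiles", "can_smiles"], expected_rows=2)
print(problems)
```

For the real data, the call would presumably be `check_loaded(eml_df, ["drugs", "smiles", "can_smiles"], expected_rows=442)`, with an empty list meaning the load looks complete.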


In [5]:
# Ensure the CSV was read into the DataFrame correctly
eml_df.head(5)

Unnamed: 0,drugs,smiles,can_smiles
0,abacavir,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1
1,abiraterone,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,C[C@]12CC[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]1CC=C2c1cccnc1
2,acetazolamide,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,CC(=O)Nc1nnc(S(N)(=O)=O)s1
3,acetic acid,CC(O)=O,CC(=O)O
4,acetylcysteine,CC(=O)N[C@@H](CS)C(O)=O,CC(=O)N[C@@H](CS)C(=O)O


## Translating SMILES to IUPAC
The Ersilia Model Hub implementation of "STOUT: SMILES to IUPAC name translator" is used to translate SMILES labels into IUPAC names.

### How the Ersilia Model Works
The Ersilia implementation of the model is used to:
* Read in a SMILES/Canonical SMILES label and translate it into its IUPAC name

To translate the SMILES labels into IUPAC names, the Ersilia model is used via shell commands.
The model is fetched from its remote repo, using its slug `smiles2iupac`, and downloaded locally.
It is then served via Docker and is ready to use.

### Applying the Ersilia model to the data
The `run` command runs predictions with the model. It takes an `input` and an `output` argument.

For this model:
- Input: The SMILES names to be translated.
- Output: The file name where the output will be stored.

A batch of multiple predictions can be made by providing a list or input file as the `input` argument. In this case, to make the predictions on the EML data:

- Input is a CSV of SMILES labels, obtained from the DataFrame
- Output is prediction results stored in a CSV file
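
The shell workflow can also be assembled from Python, for example when looping over many input files. The sketch below only mirrors the `ersilia` commands that appear in this notebook (`fetch`, `serve`, `api run -i ... -o ...`, `close`). The helper names `ersilia_commands` and `run_all` are hypothetical, and by default nothing is executed; the commands are only printed.

```python
import subprocess

MODEL_SLUG = "smiles2iupac"


def ersilia_commands(input_csv, output_csv, slug=MODEL_SLUG):
    """Build the fetch / serve / run / close command lists used in this notebook."""
    return [
        ["ersilia", "fetch", slug],
        ["ersilia", "serve", slug],
        ["ersilia", "api", "run", "-i", input_csv, "-o", output_csv],
        ["ersilia", "close"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a command fails
            subprocess.run(command, check=True)


commands = ersilia_commands("smiles_a.csv", "iupac_a.csv")
run_all(commands)  # dry run: prints the four commands without executing them
```

Passing `dry_run=False` would run the commands for real, which assumes the `ersilia` CLI and Docker are installed and that the installed Ersilia version accepts these exact subcommands.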


The model will be run on 10 SMILES labels.

#### Fetch the Ersilia model (smiles2iupac)

In [6]:
# fetch the Ersilia model
!ersilia fetch smiles2iupac

#### Serve the Ersilia model

In [7]:
# Serves the smiles2iupac model
!ersilia serve smiles2iupac

🚀 Serving model eos4se9: smiles2iupac

   URL: http://0.0.0.0:57297
   PID: -1
   SRV: pulled_docker

👉 To run model:
   - run

💁 Information:
   - info


#### Obtain the SMILES labels for processing
The data in the EML is broken into chunks (sets) of 5 rows, and each chunk is stored in its own CSV file.
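
For more than a couple of sets, the chunking can be automated. This is a minimal sketch under stated assumptions: `write_chunks` is a hypothetical helper, the files are numbered (`smiles_0.csv`, `smiles_1.csv`, ...) rather than lettered as in this notebook, and a three-row demo frame with a temporary directory stands in for the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_chunks(df, chunk_size, out_dir, prefix="smiles_"):
    """Write consecutive chunk_size-row slices of df to numbered CSV files."""
    paths = []
    for n, start in enumerate(range(0, len(df), chunk_size)):
        # iloc slices by position, so the last chunk may be shorter
        chunk = df.iloc[start:start + chunk_size]
        path = Path(out_dir) / f"{prefix}{n}.csv"
        chunk.to_csv(path, index=False)
        paths.append(path)
    return paths


# Three-row demo frame (stand-in for eml_df)
demo_df = pd.DataFrame({
    "smiles": [
        "CC(O)=O",
        "CC(=O)Oc1ccccc1C(O)=O",
        "CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1",
    ]
})

with tempfile.TemporaryDirectory() as tmp:
    written = write_chunks(demo_df, chunk_size=2, out_dir=tmp)
    # Read the files back to confirm how many rows each one holds
    chunk_lengths = [len(pd.read_csv(p)) for p in written]

print([p.name for p in written], chunk_lengths)
```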

##### Set A of SMILES labels

In [8]:
# Ensure the rows of data are correct
eml_df.head(5)

Unnamed: 0,drugs,smiles,can_smiles
0,abacavir,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1
1,abiraterone,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,C[C@]12CC[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]1CC=C2c1cccnc1
2,acetazolamide,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,CC(=O)Nc1nnc(S(N)(=O)=O)s1
3,acetic acid,CC(O)=O,CC(=O)O
4,acetylcysteine,CC(=O)N[C@@H](CS)C(O)=O,CC(=O)N[C@@H](CS)C(=O)O


In [9]:
# Obtains the set of SMILES labels from the DataFrame and writes it to a CSV file
set_a = eml_df.head(5)
set_a.to_csv("smiles_a.csv", index=False)

##### Set B of SMILES labels

In [10]:
# Ensure the rows of data are correct
df = eml_df.loc[5:9]
df

Unnamed: 0,drugs,smiles,can_smiles
5,acetylsalicylic acid,CC(=O)Oc1ccccc1C(O)=O,CC(=O)Oc1ccccc1C(=O)O
6,aciclovir,NC1=NC(=O)c2ncn(COCCO)c2N1,Nc1nc(=O)c2ncn(COCCO)c2[nH]1
7,aclidinium,OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,O=C(O[C@H]1C[N+]2(CCCOc3ccccc3)CCC1CC2)C(O)(c1cccs1)c1cccs1
8,afatinib,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,CN(C)C/C=C/C(=O)Nc1cc2c(Nc3ccc(F)c(Cl)c3)ncnc2cc1O[C@H]1CCOC1
9,albendazole,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1


In [11]:
# Obtains the set of SMILES labels from the DataFrame and writes it to a CSV file
set_b = eml_df.loc[5:9]
set_b.to_csv("smiles_b.csv", index=False)

#### Run the model
The `%%time` cell magic is used to keep track of the runtime.

##### Running the model on a single SMILES label

In [12]:
%%time
# Runs the smiles2iupac model on a single SMILES label
smiles_to_iupac_output = !ersilia run -i 'Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1'
smiles_to_iupac_output

CPU times: user 12.5 ms, sys: 0 ns, total: 12.5 ms
Wall time: 42.7 s


['{',
 '    "input": {',
 '        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",',
 '        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",',
 '        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"',
 '    },',
 '    "output": {',
 '        "outcome": [',
 '            "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"',
 '        ]',
 '    }',
 '}']
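
The captured output is a list of text lines that together form one JSON document, so the IUPAC name can be pulled out with the standard `json` module. The sketch below copies the lines shown above into `captured_lines` (a stand-in name for `smiles_to_iupac_output`) so that it runs on its own.

```python
import json

# Lines as captured from the single-SMILES run above
captured_lines = [
    '{',
    '    "input": {',
    '        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",',
    '        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",',
    '        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"',
    '    },',
    '    "output": {',
    '        "outcome": [',
    '            "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"',
    '        ]',
    '    }',
    '}',
]

# Join the lines back into one string and parse it as JSON
result = json.loads("\n".join(captured_lines))
iupac_name = result["output"]["outcome"][0]
print(iupac_name)
```

This assumes the command prints nothing but the JSON document; any extra log lines in the captured output would need to be filtered out before parsing.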

##### Running the model on the created chunks of SMILES data

In [13]:
%%time
!ersilia api run -i 'smiles_a.csv' -o "iupac_a.csv"

CPU times: user 0 ns, sys: 44.9 ms, total: 44.9 ms
Wall time: 4min 1s


In [14]:
%%time
!ersilia api run -i 'smiles_b.csv' -o "iupac_b.csv"

CPU times: user 0 ns, sys: 32 ms, total: 32 ms
Wall time: 4min 21s
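
As a rough back-of-the-envelope sketch, the two batch timings above can be extrapolated to the full list of 442 entries. This assumes that runtime scales linearly with the number of 5-row chunks, which this notebook does not verify, so the figure is only an order-of-magnitude guide.

```python
import math

total_rows = 442   # row count reported by eml_df.info()
chunk_size = 5     # rows per chunk used in this notebook
# Wall times of the two batch runs, in seconds (4min 1s and 4min 21s)
chunk_seconds = [4 * 60 + 1, 4 * 60 + 21]

n_chunks = math.ceil(total_rows / chunk_size)
mean_seconds = sum(chunk_seconds) / len(chunk_seconds)
estimated_hours = n_chunks * mean_seconds / 3600

print(f"{n_chunks} chunks, roughly {estimated_hours:.1f} hours")
```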


#### Close the model
The model server is stopped after all predictions are complete.

In [15]:
!ersilia close

⛔ Model eos4se9 closed


## Using the Translated Data

### STOUT vs. Ersilia Model
A comparison between the predictions of the STOUT model and the Ersilia SMILES to IUPAC model.

#### Reading in the translated data

The DataFrames are prepared for the comparison.

##### Set A of Ersilia data

In [16]:
ers_df1 = pd.read_csv('iupac_a.csv')
ers_df1

Unnamed: 0,key,input,iupacs_names
0,MCGSCOLBFJQGHM-SCZZXKLOSA-N,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,"[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
1,GZOSMCIZMLWJML-VJLLXTKPSA-N,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,"(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol"
2,BZKPWHYZMXOIDC-UHFFFAOYSA-N,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,"N-[5-[amino(dioxo)-λ6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide"
3,QTBSBXVTEAMEQO-UHFFFAOYSA-N,CC(O)=O,aceticacid
4,PWKSKIMOESPYIA-BYPYZUCNSA-N,CC(=O)N[C@@H](CS)C(O)=O,(2R)-2-acetamido-3-sulfanylpropanoicacid


##### Set B of Ersilia data

In [17]:
ers_df2 = pd.read_csv('iupac_b.csv')
ers_df2

Unnamed: 0,key,input,iupacs_names
0,BSYNRYMUTXBXSQ-UHFFFAOYSA-N,CC(=O)Oc1ccccc1C(O)=O,2-acetyloxybenzoicacid
1,MKUXAQIIEYXACX-UHFFFAOYSA-N,NC1=NC(=O)c2ncn(COCCO)c2N1,2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one
2,ASMXXROZKSBQIH-VITNCHFBSA-N,OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,"2-[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]oxy-1,1-dithiophen-2-ylethanol"
3,ULXXDDBFHOBEHA-CWDCEQMOSA-N,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,"(E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide"
4,HXHWSAZORRCQMX-UHFFFAOYSA-N,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate


##### Combining the Ersilia data

In [18]:
ers_combined_df = pd.concat([ers_df1, ers_df2])
ers_combined_df.reset_index(drop=True, inplace=True)
ers_combined_df

Unnamed: 0,key,input,iupacs_names
0,MCGSCOLBFJQGHM-SCZZXKLOSA-N,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,"[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
1,GZOSMCIZMLWJML-VJLLXTKPSA-N,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,"(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol"
2,BZKPWHYZMXOIDC-UHFFFAOYSA-N,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,"N-[5-[amino(dioxo)-λ6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide"
3,QTBSBXVTEAMEQO-UHFFFAOYSA-N,CC(O)=O,aceticacid
4,PWKSKIMOESPYIA-BYPYZUCNSA-N,CC(=O)N[C@@H](CS)C(O)=O,(2R)-2-acetamido-3-sulfanylpropanoicacid
5,BSYNRYMUTXBXSQ-UHFFFAOYSA-N,CC(=O)Oc1ccccc1C(O)=O,2-acetyloxybenzoicacid
6,MKUXAQIIEYXACX-UHFFFAOYSA-N,NC1=NC(=O)c2ncn(COCCO)c2N1,2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one
7,ASMXXROZKSBQIH-VITNCHFBSA-N,OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,"2-[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]oxy-1,1-dithiophen-2-ylethanol"
8,ULXXDDBFHOBEHA-CWDCEQMOSA-N,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,"(E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide"
9,HXHWSAZORRCQMX-UHFFFAOYSA-N,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate


##### STOUT data

In [19]:
st_df1 = pd.read_csv('stout_translated_smiles.csv')
st_df1 = st_df1.head(10)
st_df1

Unnamed: 0,drugs,smiles,can_smiles,iupac
0,abacavir,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1,"[(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol"
1,abiraterone,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,C[C@]12CC[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]1CC=C2c1cccnc1,"(3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol"
2,acetazolamide,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,CC(=O)Nc1nnc(S(N)(=O)=O)s1,"N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide"
3,acetic acid,CC(O)=O,CC(=O)O,aceticacid
4,acetylcysteine,CC(=O)N[C@@H](CS)C(O)=O,CC(=O)N[C@@H](CS)C(=O)O,(2R)-2-acetamido-3-sulfanylpropanoicacid
5,acetylsalicylic acid,CC(=O)Oc1ccccc1C(O)=O,CC(=O)Oc1ccccc1C(=O)O,2-acetyloxybenzoicacid
6,aciclovir,NC1=NC(=O)c2ncn(COCCO)c2N1,Nc1nc(=O)c2ncn(COCCO)c2[nH]1,2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one
7,aclidinium,OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,O=C(O[C@H]1C[N+]2(CCCOc3ccccc3)CCC1CC2)C(O)(c1cccs1)c1cccs1,"[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate"
8,afatinib,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,CN(C)C/C=C/C(=O)Nc1cc2c(Nc3ccc(F)c(Cl)c3)ncnc2cc1O[C@H]1CCOC1,(E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide
9,albendazole,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate


#### Comparing the SMILES to IUPAC results

##### Creating a DataFrame for the comparisons
The columns are:
- `smiles`: The STOUT model SMILES label
- `iupac`: The STOUT model IUPAC translation
- `ers_smiles`: The Ersilia model SMILES label
- `ers_iupac`: The Ersilia model IUPAC translation
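
Combining the columns by position only works while both DataFrames list the drugs in the same order. A more defensive alternative, sketched here as an assumption rather than the notebook's method, is to join on the SMILES label with `DataFrame.merge`. The two small frames are stand-ins for `st_df1` and `ers_combined_df`, with the Ersilia rows deliberately reversed.

```python
import pandas as pd

# Stand-in for the STOUT data
stout = pd.DataFrame({
    "drugs": ["acetic acid", "albendazole"],
    "smiles": ["CC(O)=O", "CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1"],
    "iupac": [
        "aceticacid",
        "methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate",
    ],
})

# Stand-in for the Ersilia data, in a different row order
ersilia = pd.DataFrame({
    "input": ["CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1", "CC(O)=O"],
    "iupacs_names": [
        "methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate",
        "aceticacid",
    ],
})

# Join on the SMILES label; validate raises if a label is duplicated on either side
merged = stout.merge(
    ersilia.rename(columns={"input": "ers_smiles", "iupacs_names": "ers_iupac"}),
    left_on="smiles",
    right_on="ers_smiles",
    how="left",
    validate="one_to_one",
)
print(merged)
```

With `how="left"`, a STOUT row that has no Ersilia counterpart is kept with missing values in the `ers_` columns instead of being silently paired with the wrong row.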

In [20]:
# Combining the needed columns from the DataFrames
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added
ers_st_df = st_df1[['drugs', 'smiles', 'iupac']].copy()
ers_st_df['ers_smiles'] = ers_combined_df['input']
ers_st_df['ers_iupac'] = ers_combined_df['iupacs_names']
ers_st_df

Unnamed: 0,drugs,smiles,iupac,ers_smiles,ers_iupac
0,abacavir,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,"[(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol",Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,"[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
1,abiraterone,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,"(3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol",C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,"(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol"
2,acetazolamide,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,"N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide",CC(=O)Nc1sc(nn1)[S](N)(=O)=O,"N-[5-[amino(dioxo)-λ6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide"
3,acetic acid,CC(O)=O,aceticacid,CC(O)=O,aceticacid
4,acetylcysteine,CC(=O)N[C@@H](CS)C(O)=O,(2R)-2-acetamido-3-sulfanylpropanoicacid,CC(=O)N[C@@H](CS)C(O)=O,(2R)-2-acetamido-3-sulfanylpropanoicacid
5,acetylsalicylic acid,CC(=O)Oc1ccccc1C(O)=O,2-acetyloxybenzoicacid,CC(=O)Oc1ccccc1C(O)=O,2-acetyloxybenzoicacid
6,aciclovir,NC1=NC(=O)c2ncn(COCCO)c2N1,2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one,NC1=NC(=O)c2ncn(COCCO)c2N1,2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one
7,aclidinium,OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,"[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate",OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,"2-[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]oxy-1,1-dithiophen-2-ylethanol"
8,afatinib,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,(E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,"(E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide"
9,albendazole,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate


##### Creating the comparison column
The `same_iupac` column is a boolean column that compares the results of SMILES to IUPAC translations between the STOUT and Ersilia models.

The cells hold a `True` or `False` value:
- **True**: The predictions of the STOUT and Ersilia models are the same.
- **False**: The predictions of the STOUT and Ersilia models produced different results.
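
Exact string equality is sensitive to spacing, and the names produced in this notebook contain no spaces (for example `aceticacid`). If names from another source were compared, a whitespace-insensitive check might be useful. The sketch below is an illustration only: `normalize_name` is a hypothetical helper, and the first sample row is a made-up spacing variant, not model output.

```python
import pandas as pd


def normalize_name(name):
    """Remove all whitespace from a name; letter case is left untouched."""
    return "".join(str(name).split())


names = pd.DataFrame({
    # First row: made-up spacing variant; second row: identical strings
    "iupac": ["acetic acid", "2-acetyloxybenzoicacid"],
    "ers_iupac": ["aceticacid", "2-acetyloxybenzoicacid"],
})

# Strict comparison, as used for the same_iupac column
strict = names["iupac"] == names["ers_iupac"]
# Comparison after removing whitespace from both sides
normalized = names["iupac"].map(normalize_name) == names["ers_iupac"].map(normalize_name)

print(strict.tolist(), normalized.tolist())
```

Case is deliberately preserved because capitalisation can carry meaning in IUPAC names, such as the locant `N` or the stereodescriptors `R` and `S`.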

In [21]:
ers_st_df["same_iupac"] = ers_st_df['iupac']==ers_st_df['ers_iupac']
ers_st_df

Unnamed: 0,drugs,smiles,iupac,ers_smiles,ers_iupac,same_iupac
0,abacavir,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,"[(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol",Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,"[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol",False
1,abiraterone,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,"(3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol",C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,"(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol",False
2,acetazolamide,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,"N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide",CC(=O)Nc1sc(nn1)[S](N)(=O)=O,"N-[5-[amino(dioxo)-λ6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide",False
3,acetic acid,CC(O)=O,aceticacid,CC(O)=O,aceticacid,True
4,acetylcysteine,CC(=O)N[C@@H](CS)C(O)=O,(2R)-2-acetamido-3-sulfanylpropanoicacid,CC(=O)N[C@@H](CS)C(O)=O,(2R)-2-acetamido-3-sulfanylpropanoicacid,True
5,acetylsalicylic acid,CC(=O)Oc1ccccc1C(O)=O,2-acetyloxybenzoicacid,CC(=O)Oc1ccccc1C(O)=O,2-acetyloxybenzoicacid,True
6,aciclovir,NC1=NC(=O)c2ncn(COCCO)c2N1,2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one,NC1=NC(=O)c2ncn(COCCO)c2N1,2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one,True
7,aclidinium,OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,"[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate",OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1,"2-[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]oxy-1,1-dithiophen-2-ylethanol",False
8,afatinib,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,(E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide,CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1,"(E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide",False
9,albendazole,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate,CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate,True
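
The boolean column can be summarised to see how often the two models agree. The sketch below re-enters the `drugs` and `same_iupac` values from the table above into a stand-in frame named `results`, so it runs without the rest of the notebook.

```python
import pandas as pd

# drugs and same_iupac values copied from the comparison table above
results = pd.DataFrame({
    "drugs": [
        "abacavir", "abiraterone", "acetazolamide", "acetic acid",
        "acetylcysteine", "acetylsalicylic acid", "aciclovir",
        "aclidinium", "afatinib", "albendazole",
    ],
    "same_iupac": [False, False, False, True, True, True, True, False, False, True],
})

n_match = int(results["same_iupac"].sum())    # True counts as 1
match_rate = results["same_iupac"].mean()     # fraction of matching rows
mismatched = results.loc[~results["same_iupac"], "drugs"].tolist()

print(f"{n_match} of {len(results)} names match ({match_rate:.0%})")
print("Mismatched:", mismatched)
```

On the real data the same three expressions would presumably be applied to `ers_st_df` directly.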


#### Retrieving the comparison results of the translations

##### Function for retrieving the comparison result

In [22]:
# Filters the DataFrame based on the input SMILES label
def compare_translations(SMILES, df):
    return df['same_iupac'][df['smiles']==SMILES].values[0]
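
The function above raises an `IndexError` when the SMILES label is not in the DataFrame, because `.values[0]` is applied to an empty array. A minimal sketch of a more forgiving variant follows; `compare_translations_safe` is a hypothetical name, and `demo` is a one-row stand-in for `ers_st_df`.

```python
import pandas as pd


def compare_translations_safe(smiles, df):
    """Return True/False for a known SMILES label, or None if it is absent."""
    # Select the same_iupac values of every row whose smiles matches
    matches = df.loc[df["smiles"] == smiles, "same_iupac"]
    if matches.empty:
        return None
    # Convert numpy.bool_ to a plain Python bool
    return bool(matches.iloc[0])


# One-row stand-in for ers_st_df
demo = pd.DataFrame({"smiles": ["CC(O)=O"], "same_iupac": [True]})

print(compare_translations_safe("CC(O)=O", demo))
print(compare_translations_safe("not-in-the-table", demo))
```

Returning `None` for an unknown label is one possible design choice; raising a `KeyError` with a clear message would be an equally reasonable alternative.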

##### Comparison result retrieval examples

In [23]:
compare_translations('CC(O)=O', ers_st_df)

True

In [24]:
compare_translations('C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5', ers_st_df)

False

In [25]:
comparison = compare_translations('C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5', ers_st_df)
if comparison:
    print("The IUPAC names are the same!")
else:
    print("The IUPAC names are not the same!")

The IUPAC names are not the same!
