# <a id='toc1_'></a>[Table of contents](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Table of contents](#toc1_)    
- [Pre-processing and cleaning approved drugs (ChEMBL Web Resource)](#toc2_)    
  - [Importing packages and data](#toc2_1_)    
  - [Pre-process](#toc2_2_)    
    - [Only essential variables](#toc2_2_1_)    
    - [Basic filter (only small molecules and therapeutic drugs)](#toc2_2_2_)    
    - [Separating for manual curation](#toc2_2_3_)    
    - [Calculating Molecular Weight (MW) and using it as a filter](#toc2_2_4_)    
    - [Removing salts, neutralizing atoms and keeping only the largest fragment](#toc2_2_5_)    
    - [Final tweaking](#toc2_2_6_)    


# <a id='toc2_'></a>[Pre-processing and cleaning approved drugs (ChEMBL Web Resource)](#toc0_)

This notebook pre-cleans and cleans the approved-drugs dataset gathered from the ChEMBL Web Resource: we drop missing and irrelevant entries, keep only the variables we need, and standardize the structures. More information about the ChEMBL Web Resource is given below.

The ChEMBL Web Resource client is described by its authors and maintainers on [GitHub](https://github.com/chembl/chembl_webresource_client) as follows: *'The library helps accessing ChEMBL data and cheminformatics tools from Python. You don't need to know how to write SQL. You don't need to know how to interact with REST APIs. You don't need to compile or install any cheminformatics frameworks. Results are cached.'*

## <a id='toc2_1_'></a>[Importing packages and data](#toc0_)

This section **imports the packages and libraries** used in this notebook, as well as **the data** retrieved from the *ChEMBL Web Resource* (more info above).

In [1]:
# Importing libraries
from rdkit import Chem, rdBase
from rdkit.Chem import Draw, Descriptors, PandasTools, AllChem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.SaltRemover import SaltRemover, InputFormat
from rdkit.Chem import rdmolops # To clean the structures
from IPython.display import HTML
import pandas as pd
import re

# Helper to render a DataFrame as HTML, so that the molecule images can be displayed more than once
# (HTML is already imported from IPython.display above)
def show_df(df):
    return HTML(df.to_html(notebook=True))

In [2]:
# Importing the dataset:
approved_drugs_df = pd.read_csv('../data/RAW_approved_drugs.csv')

## <a id='toc2_2_'></a>[Pre-process](#toc0_)

### <a id='toc2_2_1_'></a>[Only essential variables](#toc0_)

A quick look at the data shows many variables that we don't need, as well as some rows containing `NaN`. The pre-processing step first filters the bulk of the information and then refines it.
As shown below, we start with 4121 structures and 39 variables. Now it is time to clean them!

In [3]:
# List of variables
print(approved_drugs_df.columns, "\n\n",approved_drugs_df.shape )
# Taking a peek into the data
approved_drugs_df.head(5)

Index(['atc_classifications', 'availability_type', 'biotherapeutic',
       'dosed_ingredient', 'first_approval', 'first_in_class', 'helm_notation',
       'indication_class', 'inorganic_flag', 'max_phase', 'molecule_chembl_id',
       'molecule_hierarchy', 'molecule_properties', 'molecule_structures',
       'molecule_synonyms', 'molecule_type', 'natural_product', 'oral',
       'parenteral', 'polymer_flag', 'pref_name', 'prodrug', 'structure_type',
       'therapeutic_flag', 'topical', 'usan_stem', 'usan_stem_definition',
       'usan_substem', 'usan_year', 'withdrawn_class', 'withdrawn_country',
       'withdrawn_flag', 'withdrawn_reason', 'withdrawn_year', 'SMILES'],
      dtype='object') 

 (4121, 39)


Unnamed: 0,atc_classifications,availability_type,biotherapeutic,black_box_warning,chebi_par_id,chirality,cross_references,dosed_ingredient,first_approval,first_in_class,...,usan_stem,usan_stem_definition,usan_substem,usan_year,withdrawn_class,withdrawn_country,withdrawn_flag,withdrawn_reason,withdrawn_year,SMILES
0,['V03AN03'],1,,0,30217.0,2,[],True,2015.0,0,...,-ium,quaternary ammonium derivatives,-ium,,,,False,,,[He]
1,[],1,,0,16134.0,2,"[{'xref_id': 'ammonia%20n-13', 'xref_name': 'a...",False,2007.0,0,...,,,,1990.0,,,False,,,N
2,[],1,,0,,2,"[{'xref_id': 'ammonia%20n-13', 'xref_name': 'a...",True,2007.0,0,...,,,,1990.0,,,False,,,[13NH3]
3,[],2,,0,15377.0,2,"[{'xref_id': 'purified%20water', 'xref_name': ...",True,2011.0,0,...,deu-,deuterated compounds,deu-,1963.0,,,False,,,O
4,['V03AN04'],1,,0,17997.0,2,[],True,2015.0,0,...,,,,,,,False,,,N#N


In [4]:
# Verifying the 'not found' SMILES
print("Number of 'not found' SMILES, before filtering: ", approved_drugs_df.query("SMILES == 'not found'").shape[0])

Number of 'not found' SMILES, before filtering:  674
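
As an alternative to matching the string `'not found'` later on, the placeholder could be turned into a proper missing value while the file is read. The sketch below is only an illustration: it uses a small in-memory CSV with a few rows as stand-ins for the real file, and the `na_values` argument of `pd.read_csv`. The notebook itself keeps the string placeholder.

```python
import io
import pandas as pd

# Illustrative stand-in for RAW_approved_drugs.csv (only two columns, three rows)
toy_csv = io.StringIO(
    "pref_name,SMILES\n"
    "GUANIDINE,N=C(N)N\n"
    "ALKAVERVIR,not found\n"
    "LACTIC ACID,CC(O)C(=O)O\n"
)

# Treat the 'not found' placeholder as NaN while parsing
toy_df = pd.read_csv(toy_csv, na_values=["not found"])

# Missing structures can now be counted with isna() instead of a string comparison
n_missing = int(toy_df["SMILES"].isna().sum())
print("Missing SMILES:", n_missing)
```

With this approach, the later checks on `SMILES` would use `.isna()` rather than `== 'not found'`.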


We can filter the information and select only these variables:
* pref_name
* SMILES
* molecule_chembl_id
* first_approval
* molecule_type
* indication_class
* polymer_flag
* withdrawn_flag
* withdrawn_year
* inorganic_flag
* therapeutic_flag
* natural_product
* oral
* parenteral
* topical

In [5]:
approved_drugs_df = approved_drugs_df[[
    'pref_name',
    'SMILES',
    'molecule_chembl_id',
    'first_approval',
    'molecule_type',
    'indication_class',
    'polymer_flag',
    'withdrawn_flag',
    'inorganic_flag',
    'therapeutic_flag',
    'withdrawn_year',
    'natural_product',
    'oral', 
    'parenteral', 
    'topical',
]].copy()

# Important note: .copy() makes the selection an independent DataFrame, so later column
# assignments do not act on a slice of the original (and do not raise SettingWithCopyWarning)
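
The effect of `.copy()` can be checked on a toy frame. This is a minimal sketch with made-up column values, not part of the original pipeline: after the copy, changes to the selection leave the source DataFrame untouched.

```python
import pandas as pd

# Toy data; the values are made up for illustration
toy_df = pd.DataFrame({
    "pref_name": ["A", "B"],
    "oral": [True, False],
    "extra": [1, 2],
})

# Column selection followed by .copy() gives an independent DataFrame
subset = toy_df[["pref_name", "oral"]].copy()

# Assigning to the copy does not modify toy_df
subset["oral"] = False

print(toy_df["oral"].tolist())
print(subset["oral"].tolist())
```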

### <a id='toc2_2_2_'></a>[Basic filter (only small molecules and therapeutic drugs)](#toc0_)

Next, we apply the following pre-cleaning steps:
* Keep only *Small molecule* entries in the `molecule_type` variable;
* Remove entries flagged as *inorganic* or *polymer*;
* Remove structures whose `indication_class` mentions radioactive, gases, diluent, excipient, plutonium, aid, disinfectant, diagnostic, preservative or flavor;
* Keep only entries with `therapeutic_flag` == True.

In [6]:
# Filtering the structures:
# keep only 'Small molecule' entries that are neither inorganic nor polymers
approved_drugs_df = approved_drugs_df.loc[approved_drugs_df['molecule_type'] == 'Small molecule']
approved_drugs_df = approved_drugs_df.loc[approved_drugs_df['inorganic_flag'] == 0]
approved_drugs_df = approved_drugs_df.loc[approved_drugs_df['polymer_flag'] == 0]

# Drop rows whose indication_class contains any of these terms (case-insensitive);
# na=False means that rows with a missing indication_class are kept
excluded_terms = "radioactive|gases|diluent|disinfectant|flavor|preservative|diagnostic|excipient|plutonium|aid"
approved_drugs_df = approved_drugs_df[~approved_drugs_df['indication_class'].str.contains(excluded_terms, flags=re.IGNORECASE, regex=True, na=False)]

# Keep only rows with therapeutic_flag == True
approved_drugs_df = approved_drugs_df[approved_drugs_df['therapeutic_flag'] == True]
approved_drugs_df

Unnamed: 0,pref_name,SMILES,molecule_chembl_id,first_approval,molecule_type,indication_class,polymer_flag,withdrawn_flag,inorganic_flag,therapeutic_flag,withdrawn_year,natural_product,oral,parenteral,topical
5,NITRIC OXIDE,[N]=O,CHEMBL1200689,1999.0,Small molecule,,0,False,0,True,,0,False,False,True
9,HYDROGEN PEROXIDE,OO,CHEMBL71595,2017.0,Small molecule,"Anti-Infective, Topical",0,False,0,True,,0,False,False,True
14,NITROUS OXIDE,N#[N+][O-],CHEMBL1234579,,Small molecule,Anesthetic (inhalation),0,False,0,True,,0,False,False,False
23,GUANIDINE,N=C(N)N,CHEMBL821,1939.0,Small molecule,,0,False,0,True,,0,True,False,False
24,GUANIDINE HYDROCHLORIDE,Cl.N=C(N)N,CHEMBL1200728,1939.0,Small molecule,,0,False,0,True,,0,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3946,"SYNTHETIC CONJUGATED ESTROGENS, B",not found,CHEMBL1201467,2004.0,Small molecule,,0,False,0,True,,0,True,False,False
3954,CRYPTENAMINE TANNATES,not found,CHEMBL1201603,1982.0,Small molecule,,0,False,0,True,,1,True,False,False
3995,"ESTROGENS, ESTERIFIED",not found,CHEMBL1201468,1977.0,Small molecule,Estrogen,0,False,0,True,,1,True,False,False
3999,ALKAVERVIR,not found,CHEMBL1201658,1982.0,Small molecule,,0,False,0,True,,1,True,False,False


The filters above leave **2914** of the initial **4121** entries. Entries with a SMILES that was not found or with a missing first approval date need to be saved for manual curation, so that the dataset can be completed later on. Note also that the index is no longer contiguous after filtering; it is reset in the final step.
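
One detail of the `indication_class` filter is worth illustrating. Entries such as NITRIC OXIDE, whose `indication_class` is empty, survive the filter because `na=False` treats a missing value as "no match". The sketch below reproduces this on a toy frame; the drug names and the shortened pattern are assumptions for illustration only.

```python
import re
import pandas as pd

# Hypothetical entries; the third one has no indication class
toy_df = pd.DataFrame({
    "pref_name": ["DRUG A", "DRUG B", "DRUG C"],
    "indication_class": ["Antineoplastic", "Diagnostic Aid", None],
})

# Shortened version of the pattern used in the notebook
pattern = "radioactive|diagnostic|aid"

# na=False: a missing indication_class counts as "does not contain the pattern"
is_excluded = toy_df["indication_class"].str.contains(
    pattern, flags=re.IGNORECASE, regex=True, na=False
)

# After negation, rows with a missing indication_class are therefore kept
kept = toy_df[~is_excluded]
print(kept["pref_name"].tolist())
```

If entries without an indication class should be reviewed instead of kept, they would need a separate `isna()` check.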

### <a id='toc2_2_3_'></a>[Separating for manual curation](#toc0_)

We save a separate `.csv` for manual curation (@Artur) containing every entry where `SMILES` == 'not found' **OR** `first_approval` is `NaN`:

In [7]:
manual_curation_df = approved_drugs_df[(approved_drugs_df['SMILES'] == 'not found') | (approved_drugs_df['first_approval'].isna())]
# Taking a peek
manual_curation_df.head(5)

Unnamed: 0,pref_name,SMILES,molecule_chembl_id,first_approval,molecule_type,indication_class,polymer_flag,withdrawn_flag,inorganic_flag,therapeutic_flag,withdrawn_year,natural_product,oral,parenteral,topical
14,NITROUS OXIDE,N#[N+][O-],CHEMBL1234579,,Small molecule,Anesthetic (inhalation),0,False,0,True,,0,False,False,False
34,ETHYL CHLORIDE,CCCl,CHEMBL46058,,Small molecule,Anesthetic (topical),0,False,0,True,,0,False,False,False
59,PIPERAZINE,C1CNCCN1,CHEMBL1412,,Small molecule,Anthelmintic,0,False,0,True,,0,False,False,False
63,DIHYDROXYACETONE,O=C(CO)CO,CHEMBL1229937,,Small molecule,,0,False,0,True,,0,False,False,False
70,MEPARFYNOL,C#CC(C)(O)CC,CHEMBL501613,,Small molecule,,0,False,0,True,,0,False,False,False


In [8]:
print(f"Size of the manual curation dataset: {manual_curation_df.shape[0]}")
manual_curation_df.to_csv('../data/RAW_manual_curation_dataset.csv', index=False)

# Dropping from the main dataset the rows with SMILES 'not found' or a missing first_approval
approved_drugs_df = approved_drugs_df[~(approved_drugs_df['SMILES'] == 'not found') & ~(approved_drugs_df['first_approval'].isna())].copy()
# The main dataset has now:
print(f"Size of the main dataset, after dropping not found SMILES and NaN first_approval date: {approved_drugs_df.shape[0]}")

Size of the manual curation dataset: 534
Size of the main dataset, after dropping not found SMILES and NaN first_approval date: 2380
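
The two sizes printed above add up to the 2914 entries left after the basic filter (534 + 2380), because the curation mask (`A | B`) and the mask used for the main dataset (`~A & ~B`) are complements of each other. A minimal sketch of this check on toy data (the values are made up):

```python
import pandas as pd

# Toy data covering all four combinations of the two conditions
toy_df = pd.DataFrame({
    "SMILES": ["CCO", "not found", "CCN", "not found"],
    "first_approval": [1990.0, 2001.0, None, None],
})

# A | B: rows that need manual curation
needs_curation = (toy_df["SMILES"] == "not found") | toy_df["first_approval"].isna()

curation_part = toy_df[needs_curation]
# ~(A | B) is the same as ~A & ~B (De Morgan), which is the mask used for the main dataset
main_part = toy_df[~needs_curation]

# Every row lands in exactly one of the two parts
assert len(curation_part) + len(main_part) == len(toy_df)
print(len(curation_part), len(main_part))
```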


### <a id='toc2_2_4_'></a>[Calculating Molecular Weight (MW) and using it as a filter](#toc0_)

Inspecting the dataset, we noted that it makes little sense to keep structures lighter than 60 Da, since these are entries such as water, nitric oxide and hydrogen peroxide.

Before calculating the molecular weight, we need to convert the SMILES into RDKit `Mol` objects, because the `RDKit` descriptor functions operate on `Mol` objects and not on SMILES strings.

In [9]:
# Creating a ROMol column from SMILES format
PandasTools.AddMoleculeColumnToFrame(approved_drugs_df, smilesCol='SMILES')
print("The size of the main dataset: ", approved_drugs_df.shape[0])

# Checking to see if it worked properly
approved_drugs_df.head(5)

The size of the main dataset:  2380


Unnamed: 0,pref_name,SMILES,molecule_chembl_id,first_approval,molecule_type,indication_class,polymer_flag,withdrawn_flag,inorganic_flag,therapeutic_flag,withdrawn_year,natural_product,oral,parenteral,topical,ROMol
5,NITRIC OXIDE,[N]=O,CHEMBL1200689,1999.0,Small molecule,,0,False,0,True,,0,False,False,True,
9,HYDROGEN PEROXIDE,OO,CHEMBL71595,2017.0,Small molecule,"Anti-Infective, Topical",0,False,0,True,,0,False,False,True,
23,GUANIDINE,N=C(N)N,CHEMBL821,1939.0,Small molecule,,0,False,0,True,,0,True,False,False,
24,GUANIDINE HYDROCHLORIDE,Cl.N=C(N)N,CHEMBL1200728,1939.0,Small molecule,,0,False,0,True,,0,True,False,False,
36,LITHIUM CARBONATE,O=C([O-])[O-].[Li+].[Li+],CHEMBL1200826,1970.0,Small molecule,Antimanic,0,False,0,True,,0,True,False,False,


In [10]:
# Calculating the Molecular Weight
approved_drugs_df['mw'] = approved_drugs_df['ROMol'].apply(Descriptors.ExactMolWt)

# Dropping molecules lower than 60 Da
approved_drugs_df = approved_drugs_df.query('mw >= 60').copy()
show_df(approved_drugs_df.head(10))

Unnamed: 0,pref_name,SMILES,molecule_chembl_id,first_approval,molecule_type,indication_class,polymer_flag,withdrawn_flag,inorganic_flag,therapeutic_flag,withdrawn_year,natural_product,oral,parenteral,topical,ROMol,mw
24,GUANIDINE HYDROCHLORIDE,Cl.N=C(N)N,CHEMBL1200728,1939.0,Small molecule,,0,False,0,True,,0,True,False,False,,95.025025
36,LITHIUM CARBONATE,O=C([O-])[O-].[Li+].[Li+],CHEMBL1200826,1970.0,Small molecule,Antimanic,0,False,0,True,,0,True,False,False,,74.016753
41,ACETOHYDROXAMIC ACID,CC(=O)NO,CHEMBL734,1983.0,Small molecule,Enzyme Inhibitor (urease),0,False,0,True,,0,True,False,False,,75.032028
42,HYDROXYUREA,NC(=O)NO,CHEMBL467,1967.0,Small molecule,Antineoplastic,0,False,0,True,,0,True,False,False,,76.027277
43,CYSTEAMINE BITARTRATE,NCCS.O=C(O)C(O)C(O)C(=O)O,CHEMBL2062263,1994.0,Small molecule,,0,False,0,True,,0,True,False,False,,227.046358
44,CYSTEAMINE HYDROCHLORIDE,Cl.NCCS,CHEMBL1256137,2012.0,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,False,False,True,,113.006598
45,CYSTEAMINE,NCCS,CHEMBL602,1994.0,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,True,False,True,,77.02992
50,DIMETHYL SULFOXIDE,C[S+](C)[O-],CHEMBL504,1978.0,Small molecule,Anti-Inflammatory (topical),0,False,0,True,,0,False,True,False,,78.013936
54,FOMEPIZOLE,Cc1cn[nH]c1,CHEMBL1308,1997.0,Small molecule,Antidote (alcohol dehydrogenase inhibitor),0,False,0,True,,0,False,True,False,,82.053098
62,LACTIC ACID,CC(O)C(=O)O,CHEMBL1200559,2006.0,Small molecule,,0,False,0,True,,0,False,True,False,,90.031694
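
Note that `Descriptors.ExactMolWt` returns the monoisotopic mass, whereas `Descriptors.MolWt` returns the average molecular weight. For a 60 Da cut-off the difference is small, but it is worth knowing which value the `mw` column holds. The sketch below compares both for guanidine, which was present before the filter and is absent afterwards:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Guanidine, taken from the dataset shown earlier
mol = Chem.MolFromSmiles("N=C(N)N")

# Monoisotopic mass (this is what the notebook stores in the 'mw' column)
exact_mass = Descriptors.ExactMolWt(mol)

# Average molecular weight based on natural isotope abundances
average_mw = Descriptors.MolWt(mol)

print(f"ExactMolWt: {exact_mass:.3f}  MolWt: {average_mw:.3f}")
print("Passes the 60 Da filter:", exact_mass >= 60)
```

Both values are below 60 for guanidine, so it is dropped with either descriptor, while its hydrochloride salt (95.03 in the table above) is kept because the mass is computed on the full salt form.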


### <a id='toc2_2_5_'></a>[Removing salts, neutralizing atoms and keeping only the largest fragment](#toc0_)

In [11]:
# Defining the salts we want to remove from the database
remover = SaltRemover(defnData="[Cl,Br,Na,K,Gd]")

# Defining the function to neutralize the atoms in organic molecules
def neutralize_atoms(mol):
    pattern = Chem.MolFromSmarts("[+1!h0!$([*]~[-1,-2,-3,-4]),-1!$([*]~[+1,+2,+3,+4])]")
    at_matches = mol.GetSubstructMatches(pattern)
    at_matches_list = [y[0] for y in at_matches]
    if len(at_matches_list) > 0:
        for at_idx in at_matches_list:
            atom = mol.GetAtomWithIdx(at_idx)
            chg = atom.GetFormalCharge()
            hcount = atom.GetTotalNumHs()
            atom.SetFormalCharge(0)
            atom.SetNumExplicitHs(hcount - chg)
            atom.UpdatePropertyCache()
    return mol

def keep_largest_fragment(mol):
    frags = rdmolops.GetMolFrags(mol, asMols=True)
    largest_mol = max(frags, key=lambda x: x.GetNumAtoms())
    return largest_mol
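
To see what the neutralization does, it can be applied to a few small structures. This is a sketch for illustration: the function body repeats the logic defined above so that the cell runs on its own, and acetate and methylammonium are example inputs chosen here, not entries from the dataset. Dimethyl sulfoxide is included because its charge-separated form should stay untouched.

```python
from rdkit import Chem

def neutralize_atoms(mol):
    # Same logic as the notebook's function, repeated so this cell is self-contained
    pattern = Chem.MolFromSmarts("[+1!h0!$([*]~[-1,-2,-3,-4]),-1!$([*]~[+1,+2,+3,+4])]")
    for (at_idx,) in mol.GetSubstructMatches(pattern):
        atom = mol.GetAtomWithIdx(at_idx)
        chg = atom.GetFormalCharge()
        hcount = atom.GetTotalNumHs()
        atom.SetFormalCharge(0)
        atom.SetNumExplicitHs(hcount - chg)
        atom.UpdatePropertyCache()
    return mol

# Acetate and methylammonium are illustrative inputs; DMSO appears in the dataset
examples = ["CC(=O)[O-]", "C[NH3+]", "C[S+](C)[O-]"]

results = {
    smi: Chem.MolToSmiles(neutralize_atoms(Chem.MolFromSmiles(smi)))
    for smi in examples
}
for smi, out in results.items():
    print(f"{smi:>14} -> {out}")
```

The pattern only matches charged atoms that are not next to an atom of opposite charge, which is why the S-O bond of dimethyl sulfoxide is left as it is.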

In [12]:
# Dry run: parse, neutralize and strip every SMILES to check that the cleaning
# functions work on the whole dataset before applying them to the dataframe
mols = []
for i, smi in enumerate(approved_drugs_df.SMILES):
    try:
        mol = Chem.MolFromSmiles(smi)
        mol = remover.StripMol(neutralize_atoms(mol))
        mols.append(mol)
    except Exception as exc:
        # Report the offending SMILES, its position and the error
        print(smi, i, exc)

# If anything is printed, parsing or neutralization failed for that structure

In [13]:
# Keep largest fragment
approved_drugs_df["Mol_Clean"] = approved_drugs_df.ROMol.apply(keep_largest_fragment)

# Strip mol
approved_drugs_df["Mol_Clean"] = approved_drugs_df.Mol_Clean.apply(remover.StripMol)

# Neutralize atoms
approved_drugs_df["Mol_Clean"] = approved_drugs_df.Mol_Clean.apply(neutralize_atoms)
show_df(approved_drugs_df.head(10))

Unnamed: 0,pref_name,SMILES,molecule_chembl_id,first_approval,molecule_type,indication_class,polymer_flag,withdrawn_flag,inorganic_flag,therapeutic_flag,withdrawn_year,natural_product,oral,parenteral,topical,ROMol,mw,Mol_Clean
24,GUANIDINE HYDROCHLORIDE,Cl.N=C(N)N,CHEMBL1200728,1939.0,Small molecule,,0,False,0,True,,0,True,False,False,,95.025025,
36,LITHIUM CARBONATE,O=C([O-])[O-].[Li+].[Li+],CHEMBL1200826,1970.0,Small molecule,Antimanic,0,False,0,True,,0,True,False,False,,74.016753,
41,ACETOHYDROXAMIC ACID,CC(=O)NO,CHEMBL734,1983.0,Small molecule,Enzyme Inhibitor (urease),0,False,0,True,,0,True,False,False,,75.032028,
42,HYDROXYUREA,NC(=O)NO,CHEMBL467,1967.0,Small molecule,Antineoplastic,0,False,0,True,,0,True,False,False,,76.027277,
43,CYSTEAMINE BITARTRATE,NCCS.O=C(O)C(O)C(O)C(=O)O,CHEMBL2062263,1994.0,Small molecule,,0,False,0,True,,0,True,False,False,,227.046358,
44,CYSTEAMINE HYDROCHLORIDE,Cl.NCCS,CHEMBL1256137,2012.0,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,False,False,True,,113.006598,
45,CYSTEAMINE,NCCS,CHEMBL602,1994.0,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,True,False,True,,77.02992,
50,DIMETHYL SULFOXIDE,C[S+](C)[O-],CHEMBL504,1978.0,Small molecule,Anti-Inflammatory (topical),0,False,0,True,,0,False,True,False,,78.013936,
54,FOMEPIZOLE,Cc1cn[nH]c1,CHEMBL1308,1997.0,Small molecule,Antidote (alcohol dehydrogenase inhibitor),0,False,0,True,,0,False,True,False,,82.053098,
62,LACTIC ACID,CC(O)C(=O)O,CHEMBL1200559,2006.0,Small molecule,,0,False,0,True,,0,False,True,False,,90.031694,


We then convert the `Mol_Clean` column into a `smiles_clean` column.

In [14]:
approved_drugs_df['smiles_clean'] = approved_drugs_df.Mol_Clean.apply(lambda x: Chem.MolToSmiles(x))
show_df(approved_drugs_df.head(10))

Unnamed: 0,pref_name,SMILES,molecule_chembl_id,first_approval,molecule_type,indication_class,polymer_flag,withdrawn_flag,inorganic_flag,therapeutic_flag,withdrawn_year,natural_product,oral,parenteral,topical,ROMol,mw,Mol_Clean,smiles_clean
24,GUANIDINE HYDROCHLORIDE,Cl.N=C(N)N,CHEMBL1200728,1939.0,Small molecule,,0,False,0,True,,0,True,False,False,,95.025025,,N=C(N)N
36,LITHIUM CARBONATE,O=C([O-])[O-].[Li+].[Li+],CHEMBL1200826,1970.0,Small molecule,Antimanic,0,False,0,True,,0,True,False,False,,74.016753,,O=C(O)O
41,ACETOHYDROXAMIC ACID,CC(=O)NO,CHEMBL734,1983.0,Small molecule,Enzyme Inhibitor (urease),0,False,0,True,,0,True,False,False,,75.032028,,CC(=O)NO
42,HYDROXYUREA,NC(=O)NO,CHEMBL467,1967.0,Small molecule,Antineoplastic,0,False,0,True,,0,True,False,False,,76.027277,,NC(=O)NO
43,CYSTEAMINE BITARTRATE,NCCS.O=C(O)C(O)C(O)C(=O)O,CHEMBL2062263,1994.0,Small molecule,,0,False,0,True,,0,True,False,False,,227.046358,,O=C(O)C(O)C(O)C(=O)O
44,CYSTEAMINE HYDROCHLORIDE,Cl.NCCS,CHEMBL1256137,2012.0,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,False,False,True,,113.006598,,NCCS
45,CYSTEAMINE,NCCS,CHEMBL602,1994.0,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,True,False,True,,77.02992,,NCCS
50,DIMETHYL SULFOXIDE,C[S+](C)[O-],CHEMBL504,1978.0,Small molecule,Anti-Inflammatory (topical),0,False,0,True,,0,False,True,False,,78.013936,,C[S+](C)[O-]
54,FOMEPIZOLE,Cc1cn[nH]c1,CHEMBL1308,1997.0,Small molecule,Antidote (alcohol dehydrogenase inhibitor),0,False,0,True,,0,False,True,False,,82.053098,,Cc1cn[nH]c1
62,LACTIC ACID,CC(O)C(=O)O,CHEMBL1200559,2006.0,Small molecule,,0,False,0,True,,0,False,True,False,,90.031694,,CC(O)C(=O)O
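
The output above also shows a limitation of the largest-fragment rule: for CYSTEAMINE BITARTRATE the counter-ion is larger than the drug, so `smiles_clean` contains tartaric acid instead of cysteamine, and LITHIUM CARBONATE ends up as carbonic acid. The sketch below reproduces the first case with the same rule (the function is repeated so that the cell runs on its own); such entries may deserve a manual check.

```python
from rdkit import Chem
from rdkit.Chem import rdmolops

def keep_largest_fragment(mol):
    # Same rule as in the notebook: the fragment with the most atoms wins
    frags = rdmolops.GetMolFrags(mol, asMols=True)
    return max(frags, key=lambda frag: frag.GetNumAtoms())

# Two salts of the same drug, taken from the table above
salts = {
    "CYSTEAMINE HYDROCHLORIDE": "Cl.NCCS",
    "CYSTEAMINE BITARTRATE": "NCCS.O=C(O)C(O)C(O)C(=O)O",
}

kept = {
    name: Chem.MolToSmiles(keep_largest_fragment(Chem.MolFromSmiles(smi)))
    for name, smi in salts.items()
}
for name, smi in kept.items():
    print(f"{name}: {smi}")
```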


### <a id='toc2_2_6_'></a>[Final tweaking](#toc0_)

Now we can make some final adjustments to our dataframe:
* **Converting** the first approval year from float to integer (purely aesthetic);
* Resetting the index.

In [15]:
# Converting first_approval from float to int and resetting the index
approved_drugs_df['first_approval'] = approved_drugs_df['first_approval'].apply(int)
approved_drugs_df.reset_index(drop=True, inplace=True)

# The main dataset:
print("\n", approved_drugs_df.columns, "\n")
print(f"The dataset has the shape {approved_drugs_df.shape}")
show_df(approved_drugs_df.head(10))


 Index(['pref_name', 'SMILES', 'molecule_chembl_id', 'first_approval',
       'molecule_type', 'indication_class', 'polymer_flag', 'withdrawn_flag',
       'inorganic_flag', 'therapeutic_flag', 'withdrawn_year',
       'natural_product', 'oral', 'parenteral', 'topical', 'ROMol', 'mw',
       'Mol_Clean', 'smiles_clean'],
      dtype='object') 

The dataset has the shape (2377, 19)


Unnamed: 0,pref_name,SMILES,molecule_chembl_id,first_approval,molecule_type,indication_class,polymer_flag,withdrawn_flag,inorganic_flag,therapeutic_flag,withdrawn_year,natural_product,oral,parenteral,topical,ROMol,mw,Mol_Clean,smiles_clean
0,GUANIDINE HYDROCHLORIDE,Cl.N=C(N)N,CHEMBL1200728,1939,Small molecule,,0,False,0,True,,0,True,False,False,,95.025025,,N=C(N)N
1,LITHIUM CARBONATE,O=C([O-])[O-].[Li+].[Li+],CHEMBL1200826,1970,Small molecule,Antimanic,0,False,0,True,,0,True,False,False,,74.016753,,O=C(O)O
2,ACETOHYDROXAMIC ACID,CC(=O)NO,CHEMBL734,1983,Small molecule,Enzyme Inhibitor (urease),0,False,0,True,,0,True,False,False,,75.032028,,CC(=O)NO
3,HYDROXYUREA,NC(=O)NO,CHEMBL467,1967,Small molecule,Antineoplastic,0,False,0,True,,0,True,False,False,,76.027277,,NC(=O)NO
4,CYSTEAMINE BITARTRATE,NCCS.O=C(O)C(O)C(O)C(=O)O,CHEMBL2062263,1994,Small molecule,,0,False,0,True,,0,True,False,False,,227.046358,,O=C(O)C(O)C(O)C(=O)O
5,CYSTEAMINE HYDROCHLORIDE,Cl.NCCS,CHEMBL1256137,2012,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,False,False,True,,113.006598,,NCCS
6,CYSTEAMINE,NCCS,CHEMBL602,1994,Small molecule,Anti-Urolithic (cystine calculi),0,False,0,True,,0,True,False,True,,77.02992,,NCCS
7,DIMETHYL SULFOXIDE,C[S+](C)[O-],CHEMBL504,1978,Small molecule,Anti-Inflammatory (topical),0,False,0,True,,0,False,True,False,,78.013936,,C[S+](C)[O-]
8,FOMEPIZOLE,Cc1cn[nH]c1,CHEMBL1308,1997,Small molecule,Antidote (alcohol dehydrogenase inhibitor),0,False,0,True,,0,False,True,False,,82.053098,,Cc1cn[nH]c1
9,LACTIC ACID,CC(O)C(=O)O,CHEMBL1200559,2006,Small molecule,,0,False,0,True,,0,False,True,False,,90.031694,,CC(O)C(=O)O
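
The integer conversion of `first_approval` works here because rows with a missing approval year were moved to the manual curation file earlier. If the same conversion were needed on a column that still contains missing values (for instance in the curation dataset), pandas' nullable `Int64` dtype is one option. A minimal sketch with toy values:

```python
import pandas as pd

# Toy values; the last year is missing
years = pd.Series([1939.0, 1970.0, None], name="first_approval")

# A plain int conversion would raise because of the missing value;
# the nullable 'Int64' dtype keeps it as <NA> instead
years_nullable = years.astype("Int64")

print(years_nullable.tolist())
print(years_nullable.dtype)
```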
