## Installing mmltoolkit

To install, go to the mmltoolkit directory and run the command: <br>
*python setup.py install*

You may need to install several required packages (RDKit, etc.). With Anaconda, you can install RDKit in your default environment using the following commands:

*conda config --add channels rdkit*

*conda install rdkit*


## Import a list of SMILES

In [6]:
with open('../../datasets/energetics_list_cleaned.csv') as file:
    file.readline()  # skip the header line
    # strip trailing newlines and drop blank lines
    smiles = [line.strip() for line in file if line.strip()]
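
The layout of `energetics_list_cleaned.csv` is not shown here. If the file holds more than one column, reading whole lines would pass the extra fields to RDKit along with the SMILES. The sketch below uses the standard `csv` module to pull a single column; it assumes the SMILES strings sit in the first column, and `read_smiles_column` is a hypothetical helper, not part of mmltoolkit. An in-memory file stands in for the real dataset.

```python
import csv
import io

def read_smiles_column(handle, column=0):
    """Return the non-empty entries of one CSV column, skipping the header row.

    `column=0` is an assumption about where the SMILES strings live.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return [
        row[column].strip()
        for row in reader
        if len(row) > column and row[column].strip()
    ]

# In-memory stand-in for the real CSV file (includes one blank line)
example_csv = io.StringIO("smiles\nC[N+](=O)[O-]\n\nc1ccccc1\n")
smiles_example = read_smiles_column(example_csv)
print(smiles_example)
```

To use it on the real file, pass the open file handle in place of `example_csv`.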

## Create RDKit mol objects and add hydrogens

In [8]:
from rdkit import Chem

mol_list = [Chem.AddHs(Chem.MolFromSmiles(smile)) for smile in smiles]

## Create sum over bonds featurization

In [10]:
from mmltoolkit.featurizations import sum_over_bonds

bond_types, X_LBoB = sum_over_bonds(mol_list)

## Print out bond types

In [11]:
for bond_type in bond_types:
    print(bond_type+',', end='')

C:N,N:N,C-N,C-C,C=O,C-O,C-H,H-O,C:C,N=O,N-O,H-N,N-N,C=N,N=N,C-Cl,C#N,N#N,F-N,C=C,C#C,C-F,N:O,C:O,O-O,Cl-N,Cl-O,F-O,N-Pb,I-N,N-Si,N=S,N-S,C-S,O=S,O-S,
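
To make the idea behind the featurization concrete, here is a minimal pure-Python sketch of a sum-over-bonds count. It is an illustration only and not the mmltoolkit implementation: molecules are hand-written lists of `(atom, bond symbol, atom)` tuples rather than RDKit objects, and `bond_label` and `toy_sum_over_bonds` are hypothetical names. The two element symbols are sorted alphabetically within each label, which appears to match the labels printed above (for example `H-O` and `Cl-N`).

```python
from collections import Counter

def bond_label(atom_a, atom_b, bond_symbol):
    """Build an order-independent label such as 'H-O' or 'C=O'."""
    first, second = sorted((atom_a, atom_b))
    return f"{first}{bond_symbol}{second}"

def toy_sum_over_bonds(molecules):
    """Count bond types per molecule and return (bond_types, count_matrix)."""
    counts_per_mol = [
        Counter(bond_label(a, b, symbol) for a, symbol, b in mol)
        for mol in molecules
    ]
    # Collect bond types in order of first appearance across the dataset
    bond_types = []
    for counts in counts_per_mol:
        for label in counts:
            if label not in bond_types:
                bond_types.append(label)
    # One row per molecule, one column per bond type
    matrix = [[counts.get(label, 0) for label in bond_types]
              for counts in counts_per_mol]
    return bond_types, matrix

# Hand-written toy molecules with explicit hydrogens
water = [("O", "-", "H"), ("O", "-", "H")]
methanol = [("C", "-", "O"), ("O", "-", "H"),
            ("C", "-", "H"), ("C", "-", "H"), ("C", "-", "H")]

toy_bond_types, toy_X = toy_sum_over_bonds([water, methanol])
print(toy_bond_types)
print(toy_X)
```

Each row of `toy_X` is the feature vector for one molecule, and each column counts one bond type, which is the shape of data that `X_LBoB` and `bond_types` hold for the real dataset.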

## Save the feature vector array to a .csv file

In [13]:
import numpy as np

np.savetxt('sum_over_bonds.csv', X_LBoB.astype('int'), fmt='%i', delimiter=',')
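
The file written above holds only the counts, so the column meanings are lost unless `bond_types` is stored separately. One option, sketched below, is to write the bond types as a header row with the `header` and `comments` arguments of `np.savetxt` and to skip that row when reading the file back. The toy arrays and the output file name are placeholders for illustration, not values from the dataset.

```python
import os
import tempfile

import numpy as np

# Toy stand-ins for bond_types and X_LBoB
toy_bond_types = ['C-H', 'C-O', 'H-O']
toy_X = np.array([[0, 0, 2], [3, 1, 1]])

# Write to a temporary directory so the sketch leaves no files behind
out_path = os.path.join(tempfile.mkdtemp(), 'sum_over_bonds_with_header.csv')

# comments='' stops NumPy from prefixing the header line with '# '
np.savetxt(out_path, toy_X.astype(int), fmt='%i', delimiter=',',
           header=','.join(toy_bond_types), comments='')

# Read the header and the counts back
with open(out_path) as handle:
    header = handle.readline().strip().split(',')
reloaded = np.loadtxt(out_path, delimiter=',', skiprows=1, dtype=int)

print(header)
print(reloaded)
```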

## Generate combined Estate+CDS+SoB feature vector

In [16]:
from mmltoolkit.featurizations import Estate_CDS_SoB_featurizer

names_Estate_CDS_SoB, X_Estate_CDS_SoB = Estate_CDS_SoB_featurizer(mol_list)