# Obtaining BO results from different molecule representations

## Step 1: Load Model Dataset from DeepChem
In this demonstration, we use the *LIPO* benchmark dataset, which relates molecular structure to lipophilicity, a typical quantitative structure-property relationship (QSPR) modelling task.

Hersey, A. ChEMBL Deposited Data Set - AZ dataset; 2015. https://doi.org/10.6019/chembl3301361

In [20]:
import deepchem as dc
import numpy as np
import pandas as pd
from deepchem.feat import CircularFingerprint, RDKitDescriptors, Mol2VecFingerprint

In [24]:
# Instantiate featurizers
rdkit_featurizer = dc.feat.RDKitDescriptors()
ecfp_featurizer = dc.feat.CircularFingerprint()
mol2vec_featurizer = dc.feat.Mol2VecFingerprint()
mordred_featurizer = dc.feat.MordredDescriptors()

In [23]:
# Load the Lipophilicity dataset, featurized with ECFP circular fingerprints
tasks, datasets, transformers = dc.molnet.load_lipo(featurizer=ecfp_featurizer)

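# Unpack the train/validation/test splits returned by the loader
train_dataset, valid_dataset, test_dataset = datasets
# Feature matrix and labels; flatten y from (n_samples, 1) to (n_samples,)
X, y = train_dataset.X, np.ravel(train_dataset.y)
# Check the shape of the feature matrix
X.shape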

(3360, 2048)
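
Because the goal is to compare several representations, the load-and-unpack step above can be wrapped in a small helper that repeats it once per featurizer. The following is a minimal sketch, not part of the original notebook: `load_training_arrays` and `fake_loader` are hypothetical names, and the stand-in loader only mimics the `(tasks, (train, valid, test), transformers)` return convention so that the example runs without DeepChem or a dataset download.

```python
import numpy as np
from types import SimpleNamespace


def load_training_arrays(featurizers, loader):
    """Featurize the same dataset once per representation (hypothetical helper).

    featurizers: dict mapping a label to a featurizer object.
    loader: callable that accepts `featurizer=` and returns
        (tasks, (train, valid, test), transformers).
    Returns a dict: label -> (X_train, y_train), with y flattened to 1-D.
    """
    arrays = {}
    for name, featurizer in featurizers.items():
        _tasks, (train, _valid, _test), _transformers = loader(featurizer=featurizer)
        # Same unpacking as in the cell above; assumes a single-task dataset.
        X, y = train.X, np.ravel(train.y)
        if X.shape[0] != y.shape[0]:
            raise ValueError(f"{name}: {X.shape[0]} rows in X but {y.shape[0]} labels")
        arrays[name] = (X, y)
    return arrays


def fake_loader(featurizer):
    """Stand-in loader (assumption): the 'featurizer' is just a feature count."""
    def split(n_rows):
        return SimpleNamespace(X=np.zeros((n_rows, featurizer)), y=np.zeros((n_rows, 1)))
    return ["demo_task"], (split(8), split(1), split(1)), []


demo = load_training_arrays({"rep_a": 16, "rep_b": 4}, fake_loader)
shapes = {name: (X.shape, y.shape) for name, (X, y) in demo.items()}
print(shapes)
```

With DeepChem installed, the same helper could presumably be called as `load_training_arrays({"ecfp": ecfp_featurizer, "rdkit": rdkit_featurizer}, dc.molnet.load_lipo)`. Each entry then holds a feature matrix with the same number of rows but a representation-specific number of columns.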