# Descriptor calculation
This tutorial provides an overview of the descriptors that can be calculated with QSPRpred.

First, we will import the necessary modules and load the dataset that we will use for this tutorial.

In [1]:
import os

from qsprpred.data import MoleculeTable

os.makedirs("../../tutorial_output/data", exist_ok=True)

dataset = MoleculeTable.fromTableFile(
    filename="../../tutorial_data/A2A_LIGANDS.tsv",
    path="../../tutorial_output/data",
    name="DescriptorsTutorialDataset",
)
dataset.randomState = 42

dataset.getDF()

ID,SMILES,pchembl_value_Mean,Year,original_smiles,ID,ID_before_change
DescriptorsTutorialDataset_storage_library_0000,Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(...,8.68,2008.0,Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(...,DescriptorsTutorialDataset_storage_library_0000,DescriptorsTutorialDataset_storage_library_0000
DescriptorsTutorialDataset_storage_library_0001,Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC...,4.82,2010.0,Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC...,DescriptorsTutorialDataset_storage_library_0001,DescriptorsTutorialDataset_storage_library_0001
DescriptorsTutorialDataset_storage_library_0002,O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1,5.65,2009.0,O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1,DescriptorsTutorialDataset_storage_library_0002,DescriptorsTutorialDataset_storage_library_0002
DescriptorsTutorialDataset_storage_library_0003,CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC...,5.45,2009.0,CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC...,DescriptorsTutorialDataset_storage_library_0003,DescriptorsTutorialDataset_storage_library_0003
DescriptorsTutorialDataset_storage_library_0004,CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(...,5.20,2019.0,CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(...,DescriptorsTutorialDataset_storage_library_0004,DescriptorsTutorialDataset_storage_library_0004
...,...,...,...,...,...,...
DescriptorsTutorialDataset_storage_library_4077,CNc1ncc(C(=O)NCc2ccc(OC)cc2)c2nc(-c3ccco3)nn12,7.09,2018.0,CNc1ncc(C(=O)NCc2ccc(OC)cc2)c2nc(-c3ccco3)nn12,DescriptorsTutorialDataset_storage_library_4077,DescriptorsTutorialDataset_storage_library_4077
DescriptorsTutorialDataset_storage_library_4078,Nc1nc(-c2ccco2)c2ncn(C(=O)NCCc3ccccc3)c2n1,8.22,2008.0,Nc1nc(-c2ccco2)c2ncn(C(=O)NCCc3ccccc3)c2n1,DescriptorsTutorialDataset_storage_library_4078,DescriptorsTutorialDataset_storage_library_4078
DescriptorsTutorialDataset_storage_library_4079,Nc1nc(Nc2ccc(F)cc2)nc(CSc2nnc(N)s2)n1,4.89,2010.0,Nc1nc(Nc2ccc(F)cc2)nc(CSc2nnc(N)s2)n1,DescriptorsTutorialDataset_storage_library_4079,DescriptorsTutorialDataset_storage_library_4079
DescriptorsTutorialDataset_storage_library_4080,CCCOc1ccc(C=Cc2cc3c(c(=O)n(C)c(=O)n3C)n2C)cc1,6.51,2013.0,CCCOc1ccc(C=Cc2cc3c(c(=O)n(C)c(=O)n3C)n2C)cc1,DescriptorsTutorialDataset_storage_library_4080,DescriptorsTutorialDataset_storage_library_4080


## Descriptor sets

In QSPRpred, descriptors are organized into sets. Each `DescriptorSet` can contain one or more types of descriptors. For example, `RDKitDescs` contains all physicochemical properties calculated with RDKit, while `Mordred` contains descriptors calculated with Mordred.

Descriptors can be added to a dataset using the `addDescriptors` method. This method will calculate the descriptors and add them to the dataset.

In [2]:
from qsprpred.data.descriptors.sets import RDKitDescs

rdkit_descs = RDKitDescs()

dataset.addDescriptors([rdkit_descs])

dataset.descriptorSets

[<qsprpred.data.descriptors.sets.RDKitDescs at 0x7fec796cd460>]

With the `getDescriptorNames` method, we can retrieve the names of the calculated descriptors from the dataset, and with the `getDescriptors` method, we can retrieve the calculated descriptors themselves.

In [3]:
display(dataset.getDescriptorNames()[0:10])
display(dataset.getDescriptors().head())

['RDkit_AvgIpc',
 'RDkit_BCUT2D_CHGHI',
 'RDkit_BCUT2D_CHGLO',
 'RDkit_BCUT2D_LOGPHI',
 'RDkit_BCUT2D_LOGPLOW',
 'RDkit_BCUT2D_MRHI',
 'RDkit_BCUT2D_MRLOW',
 'RDkit_BCUT2D_MWHI',
 'RDkit_BCUT2D_MWLOW',
 'RDkit_BalabanJ']

ID,RDkit_AvgIpc,RDkit_BCUT2D_CHGHI,RDkit_BCUT2D_CHGLO,RDkit_BCUT2D_LOGPHI,RDkit_BCUT2D_LOGPLOW,RDkit_BCUT2D_MRHI,RDkit_BCUT2D_MRLOW,RDkit_BCUT2D_MWHI,RDkit_BCUT2D_MWLOW,RDkit_BalabanJ,...,RDkit_fr_sulfonamd,RDkit_fr_sulfone,RDkit_fr_term_acetylene,RDkit_fr_tetrazole,RDkit_fr_thiazole,RDkit_fr_thiocyan,RDkit_fr_thiophene,RDkit_fr_unbrch_alkane,RDkit_fr_urea,RDkit_qed
DescriptorsTutorialDataset_storage_library_0000,3.175462,2.146917,-2.111585,2.224211,-2.211801,5.89803,-0.115978,16.34251,10.325111,1.97945,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.719563
DescriptorsTutorialDataset_storage_library_0001,2.962034,2.203517,-2.136048,2.354533,-2.114561,7.208763,-0.384429,32.133541,9.952695,1.63729,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.537523
DescriptorsTutorialDataset_storage_library_0002,3.127957,2.1882,-2.067268,2.189363,-2.20295,6.053281,0.102176,16.153774,10.19078,1.748699,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.517728
DescriptorsTutorialDataset_storage_library_0003,3.491801,2.749014,-2.231626,2.674016,-2.410868,6.301607,-0.140325,35.495693,9.981499,1.448422,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.173447
DescriptorsTutorialDataset_storage_library_0004,3.236332,2.185263,-2.113838,2.178416,-2.404113,7.903051,0.095234,32.227749,10.184907,1.581267,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.334943
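
Because every descriptor name carries the prefix of its descriptor set and family, a subset of descriptors can be picked out by name. The following is a minimal sketch in plain Python, not part of the QSPRpred API: `select_by_prefix` is a hypothetical helper, and the list of names is copied by hand from the output above (in practice it would come from `dataset.getDescriptorNames()`).

```python
# Names copied from the output above; in practice this list would come
# from dataset.getDescriptorNames().
descriptor_names = [
    "RDkit_AvgIpc",
    "RDkit_BCUT2D_CHGHI",
    "RDkit_BCUT2D_MWHI",
    "RDkit_BalabanJ",
    "RDkit_fr_sulfonamd",
    "RDkit_fr_sulfone",
    "RDkit_fr_urea",
    "RDkit_qed",
]


def select_by_prefix(names, prefix):
    """Hypothetical helper: keep the names starting with `prefix`, in order."""
    return [name for name in names if name.startswith(prefix)]


# RDKit fragment-count descriptors all start with "fr_" after the set prefix.
fragment_counts = select_by_prefix(descriptor_names, "RDkit_fr_")
bcut = select_by_prefix(descriptor_names, "RDkit_BCUT2D_")

print(fragment_counts)
print(bcut)
```

The resulting list of names could then be used to index the columns of the dataframe returned by `getDescriptors`.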


Descriptors can also be calculated through the `prepareDataset` method of the `QSPRDataset` class.

In [4]:
from qsprpred.data import QSPRDataset
from qsprpred.data.descriptors.fingerprints import MorganFP

qspr_dataset = QSPRDataset.fromMolTable(
    dataset,
    target_props=[{"name": "pchembl_value_Mean", "task": "REGRESSION"}],
    name="DescriptorsTutorialQSPRDataset"
)

qspr_dataset.prepareDataset(feature_calculators=[MorganFP(radius=2, nBits=128)])

print(qspr_dataset.descriptorSets)
qspr_dataset.getDescriptors().head()

[<qsprpred.data.descriptors.sets.RDKitDescs object at 0x7fec796cd460>, <qsprpred.data.descriptors.fingerprints.MorganFP object at 0x7fec7b5f3d70>]


ID,RDkit_AvgIpc,RDkit_BCUT2D_CHGHI,RDkit_BCUT2D_CHGLO,RDkit_BCUT2D_LOGPHI,RDkit_BCUT2D_LOGPLOW,RDkit_BCUT2D_MRHI,RDkit_BCUT2D_MRLOW,RDkit_BCUT2D_MWHI,RDkit_BCUT2D_MWLOW,RDkit_BalabanJ,...,MorganFP_MorganFP_118,MorganFP_MorganFP_119,MorganFP_MorganFP_120,MorganFP_MorganFP_121,MorganFP_MorganFP_122,MorganFP_MorganFP_123,MorganFP_MorganFP_124,MorganFP_MorganFP_125,MorganFP_MorganFP_126,MorganFP_MorganFP_127
DescriptorsTutorialDataset_storage_library_0000,3.175462,2.146917,-2.111585,2.224211,-2.211801,5.89803,-0.115978,16.34251,10.325111,1.97945,...,False,False,False,True,True,True,True,True,False,False
DescriptorsTutorialDataset_storage_library_0001,2.962034,2.203517,-2.136048,2.354533,-2.114561,7.208763,-0.384429,32.133541,9.952695,1.63729,...,False,False,False,False,True,False,True,True,True,True
DescriptorsTutorialDataset_storage_library_0002,3.127957,2.1882,-2.067268,2.189363,-2.20295,6.053281,0.102176,16.153774,10.19078,1.748699,...,False,False,True,False,True,False,False,True,False,False
DescriptorsTutorialDataset_storage_library_0003,3.491801,2.749014,-2.231626,2.674016,-2.410868,6.301607,-0.140325,35.495693,9.981499,1.448422,...,True,False,False,False,True,True,True,True,False,False
DescriptorsTutorialDataset_storage_library_0004,3.236332,2.185263,-2.113838,2.178416,-2.404113,7.903051,0.095234,32.227749,10.184907,1.581267,...,False,False,True,False,True,False,True,True,False,False


Note: if any feature standardization or feature filtering is applied, it will not be reflected in the descriptor dataframe; the `getDescriptors` method always returns the original descriptors. To retrieve the standardized or filtered descriptors, use the `getFeatures` method.
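
The distinction between raw descriptors and transformed features can be illustrated without QSPRpred. The sketch below is only an analogy, not the library's implementation: it z-scores the first five `RDkit_BalabanJ` values shown above with a hypothetical `standardize` helper and shows that the raw values stay untouched while the transformed copy differs. Z-scoring is just one possible standardization, chosen here for illustration.

```python
from statistics import mean, pstdev

# Raw values of one descriptor (RDkit_BalabanJ, first five rows shown above).
raw = [1.97945, 1.63729, 1.748699, 1.448422, 1.581267]


def standardize(values):
    """Hypothetical helper: return a z-scored copy, leaving the input as is."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(value - mu) / sigma for value in values]


scaled = standardize(raw)

print(raw)     # analogous to getDescriptors: original values
print(scaled)  # analogous to getFeatures: transformed values
```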

Descriptor sets can also be used to directly calculate descriptors for a list of molecules using the `getDescriptors` method.

In [5]:
from qsprpred.data.descriptors.fingerprints import MorganFP
from rdkit import Chem

smiles = ["CC(=O)NC1=CC=C(C=C1)O", "CN1CCC23C4C1CC5=C2C(=C(C=C5)O)OC3C(C=C4)O"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

morgan_fp = MorganFP(radius=2, nBits=128)

morgan_fp.getDescriptors(mols, props=None)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
        0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
        0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
        0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1.,
        1., 0., 0., 1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 1., 1., 0.,
        0., 1., 0., 0., 1., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0.,
        0., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0.,
        0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0.,

## Examples

Here, we will go over a few examples of the available descriptor sets. The full list of available descriptor sets can be found in the [documentation](https://cddleiden.github.io/QSPRpred/docs/features.html). Note that some descriptor sets require additional dependencies to be installed before they can be used; see the [installation instructions](https://cddleiden.github.io/QSPRpred/docs-dev/install.html) for more information.

### Fingerprints

Fingerprints are a type of molecular descriptor that encode the presence or absence of substructures in a molecule. They are often used in cheminformatics for tasks such as similarity searching and clustering. In QSPRpred, fingerprints are calculated with the descriptor sets in the `qsprpred.data.descriptors.fingerprints` module. Each bit of the fingerprint is a separate descriptor. In the previous examples, we calculated Morgan fingerprints with the `MorganFP` descriptor set. Another type of fingerprint that can be calculated with QSPRpred is the MACCS keys.

In [6]:
from qsprpred.data.descriptors.fingerprints import MaccsFP

dataset.addDescriptors([MaccsFP()])

qspr_dataset.getDescriptors().head()

ID,RDkit_AvgIpc,RDkit_BCUT2D_CHGHI,RDkit_BCUT2D_CHGLO,RDkit_BCUT2D_LOGPHI,RDkit_BCUT2D_LOGPLOW,RDkit_BCUT2D_MRHI,RDkit_BCUT2D_MRLOW,RDkit_BCUT2D_MWHI,RDkit_BCUT2D_MWLOW,RDkit_BalabanJ,...,MACCSFP_MACCSFP_157,MACCSFP_MACCSFP_158,MACCSFP_MACCSFP_159,MACCSFP_MACCSFP_160,MACCSFP_MACCSFP_161,MACCSFP_MACCSFP_162,MACCSFP_MACCSFP_163,MACCSFP_MACCSFP_164,MACCSFP_MACCSFP_165,MACCSFP_MACCSFP_166
DescriptorsTutorialDataset_storage_library_0000,3.175462,2.146917,-2.111585,2.224211,-2.211801,5.89803,-0.115978,16.34251,10.325111,1.97945,...,False,True,True,True,True,True,True,True,True,False
DescriptorsTutorialDataset_storage_library_0001,2.962034,2.203517,-2.136048,2.354533,-2.114561,7.208763,-0.384429,32.133541,9.952695,1.63729,...,False,True,True,False,True,True,True,True,True,False
DescriptorsTutorialDataset_storage_library_0002,3.127957,2.1882,-2.067268,2.189363,-2.20295,6.053281,0.102176,16.153774,10.19078,1.748699,...,False,True,True,False,True,True,True,True,True,False
DescriptorsTutorialDataset_storage_library_0003,3.491801,2.749014,-2.231626,2.674016,-2.410868,6.301607,-0.140325,35.495693,9.981499,1.448422,...,True,True,True,True,True,True,True,True,True,False
DescriptorsTutorialDataset_storage_library_0004,3.236332,2.185263,-2.113838,2.178416,-2.404113,7.903051,0.095234,32.227749,10.184907,1.581267,...,True,True,True,True,True,True,True,True,True,False
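
Since each fingerprint is a vector of bits, the similarity searching mentioned above reduces to comparing two such vectors. A common measure is the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below is plain Python rather than a QSPRpred function; `tanimoto` is a hypothetical helper and the two 8-bit fingerprints are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length bit vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints share no set bits; report 0.0 by convention.
    return both / either if either else 0.0


# Toy 8-bit fingerprints (made up for illustration, not real molecules).
fp_1 = [1, 0, 1, 1, 0, 0, 1, 0]
fp_2 = [1, 0, 0, 1, 0, 1, 1, 0]

similarity = tanimoto(fp_1, fp_2)
print(similarity)
```

The same function works on rows of boolean fingerprint columns such as the `MACCSFP_*` values above, because `True` and `False` behave like 1 and 0 here.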


### Pre-calculated descriptors

In addition to calculating descriptors with QSPRpred, it is also possible to add pre-calculated descriptors to a dataset. This can be done with the `DataFrameDescriptorSet` class. This class takes a pandas DataFrame as input, where the rows correspond to the molecules and the columns correspond to the descriptors. The `DataFrameDescriptorSet` can then be added to a dataset with the `addDescriptors` method.

Note that with pre-calculated descriptors, predictions with trained models are limited to molecules that are present in the pre-calculated descriptor dataframe.

In [7]:
from qsprpred.data.descriptors.sets import DataFrameDescriptorSet
import numpy as np
import pandas as pd

# Create a dataframe with 10 columns of random values with the same index as the dataset
index = dataset.getDF().index
random_descriptors = pd.DataFrame(np.random.rand(len(index), 10),
                                  index=index,
                                  columns=[f"random_{i}" for i in range(10)])
display(random_descriptors.head())

# Create a DataFrameDescriptorSet from the random descriptors
random_descriptor_set = DataFrameDescriptorSet(random_descriptors)
dataset.addDescriptors([random_descriptor_set])
dataset.getDescriptors().head()

ID,random_0,random_1,random_2,random_3,random_4,random_5,random_6,random_7,random_8,random_9
DescriptorsTutorialDataset_storage_library_0000,0.080991,0.597768,0.043864,0.202611,0.372926,0.369336,0.472443,0.38283,0.945565,0.771105
DescriptorsTutorialDataset_storage_library_0001,0.044646,0.4733,0.828493,0.639553,0.649485,0.164579,0.58083,0.629916,0.064137,0.314737
DescriptorsTutorialDataset_storage_library_0002,0.731378,0.759163,0.445363,0.32564,0.537908,0.581036,0.063058,0.910307,0.273743,0.106696
DescriptorsTutorialDataset_storage_library_0003,0.63819,0.410162,0.834029,0.406996,0.592557,0.00625,0.156885,0.921326,0.05625,0.415941
DescriptorsTutorialDataset_storage_library_0004,0.181689,0.24491,0.184006,0.584334,0.865965,0.517236,0.514595,0.401747,0.165611,0.607979


ID,RDkit_AvgIpc,RDkit_BCUT2D_CHGHI,RDkit_BCUT2D_CHGLO,RDkit_BCUT2D_LOGPHI,RDkit_BCUT2D_LOGPLOW,RDkit_BCUT2D_MRHI,RDkit_BCUT2D_MRLOW,RDkit_BCUT2D_MWHI,RDkit_BCUT2D_MWLOW,RDkit_BalabanJ,...,DataFrame_random_0,DataFrame_random_1,DataFrame_random_2,DataFrame_random_3,DataFrame_random_4,DataFrame_random_5,DataFrame_random_6,DataFrame_random_7,DataFrame_random_8,DataFrame_random_9
DescriptorsTutorialDataset_storage_library_0000,3.175462,2.146917,-2.111585,2.224211,-2.211801,5.89803,-0.115978,16.34251,10.325111,1.97945,...,0.080991,0.597768,0.043864,0.202611,0.372926,0.369336,0.472443,0.38283,0.945565,0.771105
DescriptorsTutorialDataset_storage_library_0001,2.962034,2.203517,-2.136048,2.354533,-2.114561,7.208763,-0.384429,32.133541,9.952695,1.63729,...,0.044646,0.4733,0.828493,0.639553,0.649485,0.164579,0.58083,0.629916,0.064137,0.314737
DescriptorsTutorialDataset_storage_library_0002,3.127957,2.1882,-2.067268,2.189363,-2.20295,6.053281,0.102176,16.153774,10.19078,1.748699,...,0.731378,0.759163,0.445363,0.32564,0.537908,0.581036,0.063058,0.910307,0.273743,0.106696
DescriptorsTutorialDataset_storage_library_0003,3.491801,2.749014,-2.231626,2.674016,-2.410868,6.301607,-0.140325,35.495693,9.981499,1.448422,...,0.63819,0.410162,0.834029,0.406996,0.592557,0.00625,0.156885,0.921326,0.05625,0.415941
DescriptorsTutorialDataset_storage_library_0004,3.236332,2.185263,-2.113838,2.178416,-2.404113,7.903051,0.095234,32.227749,10.184907,1.581267,...,0.181689,0.24491,0.184006,0.584334,0.865965,0.517236,0.514594,0.401747,0.165611,0.607979
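
As noted above, a model trained on pre-calculated descriptors can only make predictions for molecules that have a row in the descriptor dataframe. Before predicting, it can therefore help to check which molecule IDs are not covered. The following is a minimal sketch in plain Python; `missing_ids` is a hypothetical helper, not a QSPRpred function, and the ID lists are shortened stand-ins for a real dataframe index.

```python
def missing_ids(molecule_ids, descriptor_ids):
    """Hypothetical helper: IDs that have no row in the descriptor table."""
    available = set(descriptor_ids)
    return [mol_id for mol_id in molecule_ids if mol_id not in available]


# Stand-in for the index of a pre-calculated descriptor dataframe (5 rows).
descriptor_ids = [
    f"DescriptorsTutorialDataset_storage_library_{i:04d}" for i in range(5)
]

# Molecules we would like to predict for.
to_predict = [
    "DescriptorsTutorialDataset_storage_library_0002",
    "DescriptorsTutorialDataset_storage_library_4080",
]

not_covered = missing_ids(to_predict, descriptor_ids)
print(not_covered)
```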


### Using a trained model to calculate descriptors

In some cases, it may be useful to use a trained model to calculate properties of a molecule that can then be used as descriptors. This can be done with the `PredictorDesc` class. This class takes a trained model as input and uses it to make predictions for a set of molecules. The predictions are then added to the dataset as descriptors with the `addDescriptors` method.

In [8]:
# First we need to create a model to use as descriptor set
from qsprpred.models.scikit_learn import SklearnModel
from sklearn.ensemble import RandomForestRegressor

dataset_for_predictor = QSPRDataset.fromTableFile(
    filename="../../tutorial_data/A2A_LIGANDS.tsv",
    path="../../tutorial_output/data",
    name="DescriptorsTutorialPredictorDataset",
    target_props=[{"name": "pchembl_value_Mean", "task": "REGRESSION"}]
)
dataset_for_predictor.randomState = 42

dataset_for_predictor.prepareDataset(
    feature_calculators=[MorganFP(radius=2, nBits=128)],
    data_filters=None,
)

model = SklearnModel(base_dir="../../tutorial_output/models",
                     name="DescriptorsTutorialModel",
                     alg=RandomForestRegressor)

_ = model.fitDataset(dataset_for_predictor)

In [9]:
# Now we can use the model as a descriptor set
from qsprpred.data.descriptors.sets import PredictorDesc

predictor_desc = PredictorDesc(model)

dataset.addDescriptors([predictor_desc])

dataset.getDescriptors().head()

ID,RDkit_AvgIpc,RDkit_BCUT2D_CHGHI,RDkit_BCUT2D_CHGLO,RDkit_BCUT2D_LOGPHI,RDkit_BCUT2D_LOGPLOW,RDkit_BCUT2D_MRHI,RDkit_BCUT2D_MRLOW,RDkit_BCUT2D_MWHI,RDkit_BCUT2D_MWLOW,RDkit_BalabanJ,...,DataFrame_random_1,DataFrame_random_2,DataFrame_random_3,DataFrame_random_4,DataFrame_random_5,DataFrame_random_6,DataFrame_random_7,DataFrame_random_8,DataFrame_random_9,PredictorDesc_DescriptorsTutorialModel
DescriptorsTutorialDataset_storage_library_0000,3.175462,2.146917,-2.111585,2.224211,-2.211801,5.89803,-0.115978,16.34251,10.325111,1.97945,...,0.597768,0.043864,0.202611,0.372926,0.369336,0.472443,0.38283,0.945565,0.771105,8.67295
DescriptorsTutorialDataset_storage_library_0001,2.962034,2.203517,-2.136048,2.354533,-2.114561,7.208763,-0.384429,32.133541,9.952695,1.63729,...,0.4733,0.828493,0.639553,0.649485,0.164579,0.58083,0.629916,0.064137,0.314737,5.4155
DescriptorsTutorialDataset_storage_library_0002,3.127957,2.1882,-2.067268,2.189363,-2.20295,6.053281,0.102176,16.153774,10.19078,1.748699,...,0.759163,0.445363,0.32564,0.537908,0.581036,0.063058,0.910307,0.273743,0.106696,6.1395
DescriptorsTutorialDataset_storage_library_0003,3.491801,2.749014,-2.231626,2.674016,-2.410868,6.301607,-0.140325,35.495693,9.981499,1.448422,...,0.410162,0.834029,0.406996,0.592557,0.00625,0.156885,0.921326,0.05625,0.415941,5.54285
DescriptorsTutorialDataset_storage_library_0004,3.236332,2.185263,-2.113838,2.178416,-2.404113,7.903051,0.095234,32.227749,10.184907,1.581267,...,0.24491,0.184006,0.584334,0.865965,0.517236,0.514594,0.401747,0.165611,0.607979,5.536767
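
Conceptually, a predictor-based descriptor is one extra column whose values are the model's predictions for each molecule. The sketch below illustrates that idea in plain Python under stated assumptions: `toy_model` is a made-up linear rule standing in for a trained model, and the feature names and column name are hypothetical. It does not reproduce how `PredictorDesc` works internally.

```python
def toy_model(row):
    """Stand-in for a trained model: a fixed linear rule, not a real QSPR model."""
    return 2.0 * row["feat_a"] + 0.5 * row["feat_b"]


# Two molecules described by two made-up features.
rows = [
    {"feat_a": 1.0, "feat_b": 2.0},
    {"feat_a": 0.5, "feat_b": 6.0},
]

# Append the prediction as a new descriptor column; the input rows are not modified.
augmented = [{**row, "PredictorDesc_toy_model": toy_model(row)} for row in rows]

for row in augmented:
    print(row)
```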


### Custom descriptors

To implement your own descriptors, see the [custom descriptors](../../advanced/data/custom_descriptors.ipynb) tutorial.