# Multi-task scorers
The scorers in the reinforcement learning framework described in the previous notebook are designed to work with a single task.
Multiple scorers can be added to the environment to handle multiple tasks.
However, we can also design a single scorer that handles multiple tasks. This is useful, for example, when the
tasks share preprocessing steps or when we have a trained multi-task model that takes a molecule as input and returns multiple outputs.
In this notebook, we will show how to implement such a multi-task scorer. We will also show how to use the `QSPRPredScorer` with multi-task models and attached applicability domain predictions.

## Implementing a multi-task scorer

### Recap: Implementing a scorer
First, let's briefly recap how to implement your own scorer as described in the [sequence RNN tutorial](../Sequence-RNN.ipynb).
The scorer should be a class that inherits from `Scorer` and implements the `getScores` and `getKey` methods shown below.

In [6]:
from drugex.training.scorers.interfaces import Scorer

class ModelScorer(Scorer):
    
    def __init__(self, *args, **kwargs):
        super().__init__()
        pass
    
    def getScores(self, mols, frags=None):
        """
        Processes molecules and returns a score for each (e.g. a QSAR model prediction).
        """
        
        return [0] * len(mols) # just return zero for all molecules for the sake of example
    
    def getKey(self):
        """
        Unique Identifier among all the scoring functions used in a single environment.
        """
        
        return "DummyName"
    
dummy_scorer = ModelScorer()
print(dummy_scorer.getScores(["CCO", "CCN"])) # [0, 0]
print(dummy_scorer.getKey()) # DummyName

[0, 0]
DummyName


One or more `Scorer` instances can then be used in a DrugEx reinforcement learning environment to evaluate the performance of a model:

In [7]:
from drugex.training.environment import DrugExEnvironment
from drugex.training.rewards import ParetoCrowdingDistance

scorers = [
    dummy_scorer
]
thresholds = [
    0.5,
]

environment = DrugExEnvironment(scorers, thresholds, reward_scheme=ParetoCrowdingDistance())

environment.getScores(["CCO", "CCN"])

Unnamed: 0,Valid,DummyName,Desired
0,1.0,0,0
1,1.0,0,0


### Implementing a multi-task scorer
To implement a multi-task scorer, we can simply implement a scorer that returns multiple scores.
This scorer should also inherit from `Scorer` and implement the `getScores` and `getKey` methods.
The `getScores` method should return a numpy array with the scores for each task. The `getKey` method should return a list of keys that correspond to the tasks.

In [13]:
import numpy as np

class MultitaskScorer(Scorer):
    
    def __init__(self, *args, **kwargs):
        super().__init__()
        pass
    
    def getScores(self, mols, frags=None):
        """
        Processes molecules and returns one score per task for each (e.g. the predictions of a multi-task QSAR model).
        """
        
        return np.array([[0, 1]] * len(mols)) # return a 2D array with two scores (0 and 1) for each molecule
    
    def getKey(self):
        """
        Unique identifiers, one per task, among all the scoring functions used in a single environment.
        """
        
        return ["DummyName1", "DummyName2"]
    
dummy_multitask_scorer = MultitaskScorer()
print(dummy_multitask_scorer.getScores(["CCO", "CCN"])) # [[0 1] [0 1]]
print(dummy_multitask_scorer.getKey()) # ['DummyName1', 'DummyName2']

[[0 1]
 [0 1]]
['DummyName1', 'DummyName2']
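
The shape contract described above (one row per molecule, one column per key) can be checked with a small helper. This is only a minimal sketch and not part of DrugEx: `check_multitask_output` is a hypothetical name, and plain nested lists stand in for the numpy array.

```python
def check_multitask_output(scores, keys, n_mols):
    """Hypothetical helper (not a DrugEx function): verify that a multi-task
    scorer returns one row per molecule and one column per key."""
    if len(scores) != n_mols:
        raise ValueError(f"expected {n_mols} rows, got {len(scores)}")
    for row in scores:
        if len(row) != len(keys):
            raise ValueError(
                f"expected {len(keys)} scores per molecule, got {len(row)}"
            )
    return True


# Same values as the dummy multi-task scorer, written as nested lists.
example_mols = ["CCO", "CCN"]
example_keys = ["DummyName1", "DummyName2"]
example_scores = [[0, 1] for _ in example_mols]

shape_ok = check_multitask_output(example_scores, example_keys, len(example_mols))
print(shape_ok)
```

A check like this can be handy while developing a custom scorer, because a mismatch between the number of keys and the number of score columns is easy to introduce.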


Modifiers can also be added in the same way as for single-task scorers, but the `setModifier` method should be passed a list of modifiers, one for each task.

In [14]:
from drugex.training.scorers.modifiers import ClippedScore

dummy_multitask_scorer.setModifier([ClippedScore(lower_x=0.2, upper_x=0.8), ClippedScore(lower_x=0.2, upper_x=0.8)])
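
To make the idea of one modifier per task concrete, the sketch below applies a separate clipping function to each score column. It assumes that `ClippedScore` maps scores linearly from `lower_x` to `upper_x` onto the range 0 to 1 and clips everything outside that interval. `clipped_linear` and `apply_per_task` are hypothetical stand-ins, not DrugEx functions, and the raw scores are made up.

```python
def clipped_linear(lower_x, upper_x):
    """Return a function mapping x linearly from [lower_x, upper_x] onto [0, 1],
    clipping values outside the interval (assumed ClippedScore behaviour)."""
    def modifier(x):
        y = (x - lower_x) / (upper_x - lower_x)
        return min(1.0, max(0.0, y))
    return modifier


def apply_per_task(scores, modifiers):
    """Apply the i-th modifier to the i-th score of every molecule."""
    return [[m(s) for m, s in zip(modifiers, row)] for row in scores]


task_modifiers = [clipped_linear(0.2, 0.8), clipped_linear(0.2, 0.8)]
raw_scores = [[0, 1], [0.5, 0.2]]  # two molecules, two tasks (made-up values)
modified_scores = apply_per_task(raw_scores, task_modifiers)
print(modified_scores)
```

Because each task gets its own modifier, tasks on different scales (for example a pChEMBL value and a probability) can each be mapped onto a comparable range.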

Finally, creating the environment with the multi-task scorer is the same as for single-task scorers.
Make sure to add a threshold for each task.

In [15]:
from drugex.training.environment import DrugExEnvironment
from drugex.training.rewards import ParetoCrowdingDistance

scorers = [
    dummy_scorer,
    dummy_multitask_scorer
]
thresholds = [
    0.5,
    0.3,
    0.4
]

environment = DrugExEnvironment(scorers, thresholds, reward_scheme=ParetoCrowdingDistance())

environment.getScores(["CCO", "CCN"])

Unnamed: 0,Valid,DummyName,DummyName1,DummyName2,Desired
0,1.0,0,0,1,0
1,1.0,0,0,1,0
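
Since a multi-task scorer contributes several keys, the number of thresholds has to match the total number of tasks rather than the number of scorers. The sketch below illustrates this bookkeeping with stand-in objects. `StubScorer` and `collect_keys` are hypothetical names, and how `DrugExEnvironment` validates its inputs internally is not shown here.

```python
class StubScorer:
    """Hypothetical stand-in that only mimics the getKey method of a scorer."""

    def __init__(self, key):
        self._key = key

    def getKey(self):
        return self._key


def collect_keys(scorers):
    """Flatten single keys (str) and multi-task keys (list of str) into one list."""
    keys = []
    for scorer in scorers:
        key = scorer.getKey()
        keys.extend(key if isinstance(key, (list, tuple)) else [key])
    return keys


stub_scorers = [StubScorer("DummyName"), StubScorer(["DummyName1", "DummyName2"])]
task_thresholds = [0.5, 0.3, 0.4]

task_keys = collect_keys(stub_scorers)
if len(task_keys) != len(task_thresholds):
    raise ValueError("one threshold is needed per task")

threshold_by_task = dict(zip(task_keys, task_thresholds))
print(threshold_by_task)
```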


## Special case: QSPRPredScorer with multi-task/multi-class models and applicability domain predictions

### Multi-task modelling
We can also use a `QSPRPredScorer` with a multi-task model within DrugEx.
Here we first load a multi-task model that predicts pChEMBL values for the A1, A2A, A2B, and A3 adenosine receptor subtypes.

In [7]:
from qsprpred.models.scikit_learn import SklearnModel

predictor = SklearnModel(
    name='AR_RandomForestMultiTaskRegressor',
    base_dir='../data/models/qsar'
)

predictor.predictMols(["CCO", "CCN"])

array([[6.94141125, 6.81195176, 7.010821  , 6.57044983],
       [6.93735704, 6.81614159, 7.01081618, 6.57156469]])

The `QSPRPredScorer` can be initialized with the multi-task model in the same way as with a single-task model.

In [9]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor)

print(qsprpred_scorer.getScores(["CCO", "CCN"]))
print(qsprpred_scorer.getKey())

[[6.94141125 6.81195176 7.010821   6.57044983]
 [6.93735704 6.81614159 7.01081618 6.57156469]]
['QSPRpred_AR_RF_reg_P0DMS8', 'QSPRpred_AR_RF_reg_P29274', 'QSPRpred_AR_RF_reg_P29275', 'QSPRpred_AR_RF_reg_P30542']


Alternatively, we can select any number of tasks from the multi-task model that we want to use for scoring.

In [11]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor, multi_task=["P29274", "P29275"])

print(qsprpred_scorer.getScores(["CCO", "CCN"]))
print(qsprpred_scorer.getKey())

[[6.81195176 7.010821  ]
 [6.81614159 7.01081618]]
['QSPRpred_AR_RF_reg_P29274', 'QSPRpred_AR_RF_reg_P29275']
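
Judging from the two outputs above, selecting tasks corresponds to picking the matching columns from the full prediction. The sketch below reproduces this with plain lists, using the values and task order printed above. `select_tasks` is a hypothetical helper and not how `QSPRPredScorer` is implemented.

```python
# Task order and values are copied from the outputs printed above.
all_tasks = ["P0DMS8", "P29274", "P29275", "P30542"]
full_scores = [
    [6.94141125, 6.81195176, 7.010821, 6.57044983],
    [6.93735704, 6.81614159, 7.01081618, 6.57156469],
]


def select_tasks(scores, task_names, wanted):
    """Keep only the columns whose task name is listed in `wanted`."""
    idx = [task_names.index(name) for name in wanted]
    return [[row[i] for i in idx] for row in scores]


selected_scores = select_tasks(full_scores, all_tasks, ["P29274", "P29275"])
print(selected_scores)
```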


By default, a score of 0 is returned for invalid molecules, but the `invalids_score` parameter can be used to set a different value per task.


In [12]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor, invalids_score=[0.0, 3.5, 6.7, 0.0])

print(qsprpred_scorer.getScores(["CCO", "CCN", "XXX"]))

[[6.94141125 6.81195176 7.010821   6.57044983]
 [6.93735704 6.81614159 7.01081618 6.57156469]
 [0.         3.5        6.7        0.        ]]
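
The per-task fallback can be pictured as follows. This is a minimal sketch in which the validity flags are given explicitly (a real scorer has to determine validity from the SMILES strings), and `fill_invalid` is a hypothetical helper rather than a DrugEx function.

```python
def fill_invalid(predictions, is_valid, invalids_score):
    """Hypothetical helper: return one row per molecule, using the model
    prediction for valid molecules and the per-task fallback otherwise."""
    rows = iter(predictions)
    return [
        list(next(rows)) if ok else list(invalids_score)
        for ok in is_valid
    ]


# Predictions for the two valid molecules, copied from the output above.
valid_predictions = [
    [6.94141125, 6.81195176, 7.010821, 6.57044983],
    [6.93735704, 6.81614159, 7.01081618, 6.57156469],
]
validity = [True, True, False]  # "CCO", "CCN", "XXX"

filled_scores = fill_invalid(valid_predictions, validity, [0.0, 3.5, 6.7, 0.0])
print(filled_scores[-1])
```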


### Multi-class modelling
In addition to multi-task regression tasks, we can also return multiple classes as separate tasks. 

In [1]:
from qsprpred.models.scikit_learn import SklearnModel

predictor = SklearnModel(
    name='A2AR_RandomForestMultiClassClassifier',
    base_dir='../data/models/qsar'
)

print(predictor.predictMols(["CCO", "CCN"]))
print(predictor.predictMols(["CCO", "CCN"], use_probas=True))

[[1]
 [1]]
[array([[0.15852085, 0.49896536, 0.34251379],
       [0.13512813, 0.47490139, 0.38997048]])]


By default, the `QSPRPredScorer` returns the probability of each class as a separate score, except in the single-class (binary) case, where only the probability of the positive class is returned.

In [17]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor)

print(qsprpred_scorer.getScores(["CCO", "CCN"]))
print(qsprpred_scorer.getKey())

[[0.15852085 0.49896536 0.34251379]
 [0.13512813 0.47490139 0.38997048]]
['QSPRpred_A2AR_RF_multicls_0', 'QSPRpred_A2AR_RF_multicls_1', 'QSPRpred_A2AR_RF_multicls_2']


It is also possible to return the class predictions by setting the `use_probas` parameter to `False`.

In [18]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor, use_probas=False)

print(qsprpred_scorer.getScores(["CCO", "CCN"]))
print(qsprpred_scorer.getKey())

[1. 1.]
QSPRpred_A2AR_RF_multicls
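
In this example, the class labels are consistent with taking the most probable class for each molecule. The sketch below recomputes them from the probabilities printed earlier. It assumes that the class labels equal the column indices 0, 1 and 2, as the keys above suggest.

```python
# Class probabilities copied from the output printed earlier.
class_probas = [
    [0.15852085, 0.49896536, 0.34251379],
    [0.13512813, 0.47490139, 0.38997048],
]

# Index of the largest probability in each row (assumed to equal the class label).
predicted_classes = [
    max(range(len(row)), key=row.__getitem__) for row in class_probas
]
print(predicted_classes)
```

Class labels discard how confident the model is, so probabilities usually give a smoother signal when combined with a modifier or threshold.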


Finally, as in the multi-task regression case, we can select any number of classes from the multi-class model that we want to use for scoring.

In [19]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor, multi_class=[1,2])

print(qsprpred_scorer.getScores(["CCO", "CCN"]))
print(qsprpred_scorer.getKey())

[[0.49896536 0.34251379]
 [0.47490139 0.38997048]]
['QSPRpred_A2AR_RF_multicls_1', 'QSPRpred_A2AR_RF_multicls_2']


### Applicability domain predictions

If the QSPRpred model has an applicability domain attached, the applicability domain predictions can also be used within DrugEx.

Our example predictor does not have an attached applicability domain, so we will quickly add a dummy applicability domain to demonstrate how this can be used.

In [3]:
import pandas as pd
import numpy as np

class DummyAD:
    # flags every second molecule (odd indices) as outside the applicability domain
    def contains(self, mols):
        return pd.DataFrame(np.array([i % 2 == 0 for i in range(len(mols))]).reshape(-1, 1))

dummy_ad = DummyAD()

predictor.applicabilityDomain = dummy_ad

predictor.predictMols(["CCO", "CCN"], use_applicability_domain=True)

(array([[1],
        [1]]),
 array([[ True],
        [False]]))

We can then set the `app_domain` parameter to `True` to return the applicability domain predictions as a separate task.

In [4]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor, app_domain=True)

print(qsprpred_scorer.getScores(["CCO", "CCN"]))
print(qsprpred_scorer.getKey())

[[0.15852085 0.49896536 0.34251379 1.        ]
 [0.13512813 0.47490139 0.38997048 0.        ]]
['QSPRpred_A2AR_RF_multicls_0', 'QSPRpred_A2AR_RF_multicls_1', 'QSPRpred_A2AR_RF_multicls_2', 'QSPRpred_A2AR_RF_multicls_app_domain']


Alternatively, we can set the `app_domain` parameter to `'invalid'`, so that molecules outside the applicability domain are assigned the specified `invalids_score`.

In [5]:
from drugex.training.scorers.qsprpred import QSPRPredScorer

qsprpred_scorer = QSPRPredScorer(predictor, app_domain='invalid')

print(qsprpred_scorer.getScores(["CCO", "CCN"]))
print(qsprpred_scorer.getKey())

[[0.15852085 0.49896536 0.34251379]
 [0.         0.         0.        ]]
['QSPRpred_A2AR_RF_multicls_0', 'QSPRpred_A2AR_RF_multicls_1', 'QSPRpred_A2AR_RF_multicls_2']
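
The effect of `app_domain='invalid'` can be pictured as masking the rows of molecules that fall outside the applicability domain. The sketch below reproduces the output above from the class probabilities and the domain flags of the dummy applicability domain. `mask_outside_domain` is a hypothetical helper, and the default fallback of 0 follows the default described for invalid molecules.

```python
def mask_outside_domain(scores, in_domain, invalids_score=0.0):
    """Hypothetical helper: replace all task scores of molecules outside the
    applicability domain with the fallback score."""
    n_tasks = len(scores[0])
    return [
        list(row) if inside else [invalids_score] * n_tasks
        for row, inside in zip(scores, in_domain)
    ]


# Values copied from the outputs above.
ad_probas = [
    [0.15852085, 0.49896536, 0.34251379],
    [0.13512813, 0.47490139, 0.38997048],
]
ad_flags = [True, False]  # "CCO" inside, "CCN" outside the dummy domain

masked_scores = mask_outside_domain(ad_probas, ad_flags)
print(masked_scores)
```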
