# Working with different metrics in OligoGym

The OligoGym metrics module provides several common metrics (mostly inherited from sklearn and scipy) for evaluating regression and classification models. It also provides custom functions that calculate several metrics simultaneously. `selection_metrics`, in particular, calculates a set of metrics that reflect how compounds are selected, either by ranking the regression scores or by digitizing the scores into discrete classes, as is common with toxicity readouts. This notebook provides examples of the types of metrics available in the metrics module.
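
To make the two selection views concrete, here is a minimal sketch in plain Python. It is not the OligoGym implementation of `selection_metrics`: the helper names (`top_k_recall`, `digitize`, `class_agreement`), the thresholds, and the toy scores are all hypothetical, and the sketch assumes that a higher score means a more desirable compound.

```python
from bisect import bisect_right


def top_k_recall(y_true, y_pred, k):
    """Fraction of the k truly best compounds that also rank in the predicted top k."""
    # Assumption: higher score = better compound, so sort in descending order.
    true_top = set(sorted(range(len(y_true)), key=lambda i: y_true[i], reverse=True)[:k])
    pred_top = set(sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)[:k])
    return len(true_top & pred_top) / k


def digitize(scores, thresholds):
    """Map each continuous score to a class index, given ascending thresholds."""
    return [bisect_right(thresholds, s) for s in scores]


def class_agreement(y_true, y_pred, thresholds):
    """Share of compounds whose measured and predicted scores fall in the same class."""
    true_cls = digitize(y_true, thresholds)
    pred_cls = digitize(y_pred, thresholds)
    return sum(t == p for t, p in zip(true_cls, pred_cls)) / len(true_cls)


# Toy values for illustration only (not taken from a fitted model).
y_true = [46.2, 38.4, 51.4, 36.4, 52.2]
y_pred = [44.0, 50.0, 49.5, 30.1, 55.3]

print(top_k_recall(y_true, y_pred, k=2))
print(class_agreement(y_true, y_pred, thresholds=[40, 50]))
```

With these toy values, only one of the two best measured compounds is also in the predicted top two (recall 0.5), and three of the five compounds land in the same class (agreement 0.6). Ranking-based metrics reward getting the order right, while digitized metrics only care about which side of each threshold a compound falls on.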

In [1]:
from oligogym.data import DatasetDownloader
from oligogym.models import XGBoostModel
from oligogym.features import KMersCounts
from oligogym import metrics

import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
downloader = DatasetDownloader()

## Evaluate a regression model with siRNA potency data

In [3]:
data = downloader.download('siRNA1',verbose=1)
print(data.desc)
data.data.head()

Dataset 'Ichihara_2007_1' has been successfully downloaded.
A precurated collection of unmodified siRNA potency data. The dataset can be split into two. Ichihara_2007_1 is a dataset from Huesken_2005 of siRNA screen using GFP reporter assay.


| | x | y | y_raw | targets | smiles | fasta |
|---|---|---|---|---|---|---|
| 0 | `RNA1{r(A)p.r(A)p.r(A)p.r(U)p.r(C)p.r(A)p.r(A)p...` | 46.2 | 46.2 | RHOQ | `Cc1cn([C@H]2C[C@H](O)[C@@H](COP(=O)(O)O[C@H]3C...` | AAAUCAAUUAACAUAUUAG.CUAAUAUGUUAAUUGAUUUAU |
| 1 | `RNA1{r(A)p.r(U)p.r(A)p.r(A)p.r(A)p.r(U)p.r(C)p...` | 38.4 | 38.4 | RHOQ | `Cc1cn([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C@@H](n4...` | AUAAAUCAAUUAACAUAUU.AAUAUGUUAAUUGAUUUAUAC |
| 2 | `RNA1{r(G)p.r(A)p.r(A)p.r(A)p.r(G)p.r(G)p.r(A)p...` | 51.4 | 51.4 | RHOQ | `Cc1cn([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C@@H](n4...` | GAAAGGAAUUGUAUAAAUC.GAUUUAUACAAUUCCUUUCAA |
| 3 | `RNA1{r(A)p.r(U)p.r(A)p.r(A)p.r(A)p.r(A)p.r(U)p...` | 36.4 | 36.4 | RHOQ | `Cc1cn([C@H]2C[C@H](O)[C@@H](COP(=O)(O)O[C@H]3C...` | AUAAAAUUGAAAGGAAUUG.CAAUUCCUUUCAAUUUUAUCU |
| 4 | `RNA1{r(C)p.r(U)p.r(U)p.r(A)p.r(U)p.r(U)p.r(U)p...` | 52.2 | 52.2 | RHOQ | `Cc1cn([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C@@H](n4...` | CUUAUUUAAUUUUGGUCUG.CAGACCAAAAUUAAAUAAGAA |


In [4]:
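# Randomly split the dataset into training and test sets
x_train, x_test, y_train, y_test = data.split(split_strategy='random')
# Featurize each sequence as counts of k-mers of length 1, 2, and 3
feat = KMersCounts(k=[1, 2, 3], modification_abundance=True)
# Fit the featurizer on the training set only, then reuse it on the test set
feat_x_train = feat.fit_transform(x_train)
feat_x_test = feat.transform(x_test)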
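
As a rough illustration of what k-mer counting with `k=[1,2,3]` means, the sketch below counts every overlapping substring of length 1, 2, and 3 in a plain nucleotide string. The `count_kmers` helper is hypothetical and is not the `KMersCounts` implementation: it ignores the `modification_abundance` option and does not reproduce how OligoGym parses its input or lays out the resulting feature vector.

```python
from collections import Counter


def count_kmers(sequence, ks=(1, 2, 3)):
    """Count all overlapping k-mers of each length in `ks` (hypothetical helper)."""
    counts = Counter()
    for k in ks:
        # Slide a window of width k over the sequence, one position at a time.
        for start in range(len(sequence) - k + 1):
            counts[sequence[start:start + k]] += 1
    return counts


# First nine bases of the first sense strand shown in the table above.
counts = count_kmers("AAAUCAAUU")
print(counts["A"], counts["AA"], counts["AAU"])
```

For this nine-base fragment the sketch finds five `A`, three `AA`, and two `AAU`. A model needs a fixed-length vector, so a featurizer built on this idea would typically fix a k-mer vocabulary during fitting and emit counts in that order for every sequence, which is one reason to call `fit_transform` on the training set and `transform` on the test set.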

In [5]:
model = XGBoostModel()
model.fit(feat_x_train, y_train)
y_pred = model.predict(feat_x_test)

In [6]:
metrics.regression_metrics(y_test,y_pred)

{'r2_score': 0.38794478251680786,
 'root_mean_squared_error': 16.079482330511706,
 'mean_absolute_error': 12.812271296757691,
 'pearson_correlation': 0.6235600734103589,
 'spearman_correlation': 0.6292656680524044}