# Putting it all together

We now demonstrate all of Oloren ChemEngine's uncertainty features by training a production-level model and an error model on the BACE dataset, one of the DeepChem datasets.

In [1]:
import olorenchemengine as oce
import pandas as pd
import numpy as np

## Creating the dataset

We will train the model on 90% of the data and leave 10% for testing.

In [2]:
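# Download the BACE regression CSV and load it into a DataFrame
data = pd.read_csv(oce.download_public_file("MoleculeNet/load_bace_regression.csv"))
# Wrap it in a BaseDataset: structures are in "mol", the target property is "pIC50"
bace_dataset = oce.BaseDataset(data=data.to_csv(), structure_col="mol", property_col="pIC50")
# Random split: 90% train, 0% validation, 10% test
splitter = oce.RandomSplit(split_proportions=[0.9, 0, 0.1])
bace_dataset = splitter.transform(bace_dataset)
# Save the split dataset so later cells can reload it
oce.save(bace_dataset, "bace_dataset.oce")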
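
To make the 90/0/10 proportions concrete, here is a minimal sketch of a random train/validation/test split written with NumPy alone. It is an illustration of the idea only, not how `oce.RandomSplit` is implemented; `random_split` is a hypothetical helper and the row count of 1000 is arbitrary.

```python
import numpy as np

def random_split(n_rows, proportions=(0.9, 0.0, 0.1), seed=0):
    """Hypothetical helper: shuffle row indices and cut them into train/valid/test."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_rows)
    n_train = int(round(proportions[0] * n_rows))
    n_valid = int(round(proportions[1] * n_rows))
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test

# Arbitrary row count, chosen only for illustration
train_idx, valid_idx, test_idx = random_split(1000)
```

With a validation proportion of 0, the validation index array is simply empty and every row lands in either the training or the test set.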

## Training the production-level model

Production-level models can be produced by running `fit_cv`, which fits the model on the whole training set and also fits the error model via cross-validation.

In [3]:
bace_dataset = oce.load("bace_dataset.oce")

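# XGBoost-based model built on the default OlorenCheckpoint
model = oce.ZWK_XGBoostModel(oce.OlorenCheckpoint("default"))
# fit_cv fits the model and, via cross-validation, the SDCwRMSD1 error model
model.fit_cv(bace_dataset.train_dataset[0], bace_dataset.train_dataset[1], error_model=oce.SDCwRMSD1())
# Save the fitted model so later cells can reload it
oce.save(model, "bace_model.oce")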
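
The general idea behind fitting an error model via cross-validation can be sketched as follows: each training point is predicted by a model that never saw it, and the resulting out-of-fold residuals give an honest picture of the model's error. This is a generic sketch under stated assumptions, not the internals of `fit_cv` or `SDCwRMSD1`. The data is synthetic, ordinary least squares stands in for the real model, and `out_of_fold_residuals` is a hypothetical helper.

```python
import numpy as np

def out_of_fold_residuals(X, y, n_folds=5, seed=0):
    """Hypothetical helper: residuals from models that did not see each point."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    residuals = np.empty(len(y))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        # Stand-in model (assumption): ordinary least squares with an intercept
        A_train = np.column_stack([X[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_held = np.column_stack([X[held_out], np.ones(len(held_out))])
        residuals[held_out] = y[held_out] - A_held @ coef
    return residuals

# Synthetic regression data, for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
residuals = out_of_fold_residuals(X, y)
```

An error model can then be fitted on these residuals, while the final predictive model is still trained on all of the training data.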

KeyError: "None of [Index(['mol'], dtype='object')] are in the [columns]"

## Visualizing results

We visualize the probable output range (80% confidence interval) and the true output for each test molecule. For the molecules plotted, each predicted value is within the error margin.

In [None]:
model = oce.load("bace_model.oce")
bace_dataset = oce.load("bace_dataset.oce")

results_df = model.predict(bace_dataset.test_dataset[0], return_ci=True, return_vis=True)
results_df
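
As a rough illustration of what an 80% interval means, the sketch below builds a central interval around a point prediction from a set of residuals by taking their 10th and 90th percentiles. This is a generic empirical-quantile approach, assumed here for illustration, and not necessarily how the model computes the interval returned with `return_ci=True`. `interval_from_residuals` is a hypothetical helper, and the prediction and residuals are made-up values.

```python
import numpy as np

def interval_from_residuals(prediction, residuals, confidence=0.8):
    """Hypothetical helper: central interval from empirical residual quantiles."""
    tail = (1.0 - confidence) / 2.0
    lower_offset, upper_offset = np.quantile(residuals, [tail, 1.0 - tail])
    return prediction + lower_offset, prediction + upper_offset

# Made-up residuals evenly spread over [-1, 1], and a made-up prediction of 6.5
residuals = np.linspace(-1.0, 1.0, 101)
lower, upper = interval_from_residuals(6.5, residuals)
```

For these evenly spaced residuals the interval comes out at roughly 5.7 to 7.3, that is, the prediction plus or minus 0.8.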


### Ground truth output

In [None]:
list(bace_dataset.test_dataset[1])[150]


### Predicted output and error margin

In [None]:
results_df["vis"][150].render_ipynb()
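
Beyond inspecting a single molecule, one way to check the intervals as a whole is to measure how often the ground truth falls inside them. The sketch below computes that empirical coverage; for well-calibrated 80% intervals it should be close to 0.8. `empirical_coverage` is a hypothetical helper, and the numbers are invented for illustration; they are not BACE results.

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Hypothetical helper: fraction of true values inside [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return inside.mean()

# Invented values, for illustration only
y_true = [6.1, 7.4, 5.0, 8.2, 6.9]
lower = [5.5, 6.8, 5.2, 7.5, 6.0]
upper = [6.7, 8.0, 6.4, 8.9, 7.2]
coverage = empirical_coverage(y_true, lower, upper)
```

Here four of the five invented true values fall inside their intervals, so the coverage is 0.8.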
