KinoML notebooks

This repository contains detailed execution instructions for the notebooks used in our structure-informed machine learning experiments. We put special emphasis on the reproducibility of results by carefully specifying every aspect of the running environment.

Some history hygiene details

  • This repository should only contain the input code. This extends to the notebooks themselves: clear all outputs before committing!
  • Outputs are synced to Weights & Biases. When running locally, outputs should be saved in a folder named _output/ relative to the running notebook. This includes the executed notebook itself, with its (updated) output cells; see the sketch after this list.
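The snippet below is a minimal sketch of such a cell, assuming the wandb Python client and a placeholder project/run name; it is not the repository's actual logging code.

```python
# Minimal sketch (placeholder names): write artifacts under a local _output/
# folder next to the notebook and sync that folder to Weights & Biases.
from pathlib import Path

import wandb  # pip install wandb

OUT = Path("_output")
OUT.mkdir(exist_ok=True)

run = wandb.init(project="experiments-binding-affinity", name="example-run")

# ... produce results (arrays, figures, the executed notebook) and write them under _output/ ...
(OUT / "notes.txt").write_text("example artifact\n")

# Upload everything under _output/ to the run; W&B keeps the relative paths.
wandb.save(str(OUT / "*"), base_path=".")
run.finish()
```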

Included studies

Note: Each directory contains its own README.md with additional details. The details of each run are encoded in a YAML file (see the sketch after the list below).

  • 001-ligand-based: Baseline models for the rest of the studies. Models are trained exclusively on ligand information.
  • 002-kinase-informed: Includes some kinase information, without structural details.
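As a purely hypothetical illustration of those YAML-driven runs (the file name and keys below are placeholders; the actual ones are documented in each study's README), a run file could be loaded like this:

```python
# Hypothetical example of reading a per-run YAML file.
import yaml  # pip install pyyaml

with open("001-ligand-based/run.yaml") as fh:  # placeholder path
    run_config = yaml.safe_load(fh)

print(run_config.keys())  # e.g. featurization, model, training hyperparameters
```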

Philosophy

We split an experiment into four stages:

  1. Data intake. This involves taking the raw data (e.g. as provided in a publication, dataset or through a collaborator) and creating a DatasetProvider adapter. This stage should not happen in a notebook, but as part of the KinoML library.
  2. Featurization. One or more DatasetProvider objects are converted into tensorial representations exported as NumPy arrays. This process should also export the measurement-type metadata: observation model mathematical expression, dynamic range, loss adapters.
  3. Training. Takes a collection of tensors and measurement types, one model and a set of hyperparameters, and produces a collection of k models (one per fold). Contextual metadata includes loss data, validation scores and data splits. Only the training and validation sets are used here, but indices into the test set are kept available for later stages if needed.
  4. Evaluation. Using the models from step 3, produce performance reports on a test set. The test set does not have to be part of the same collection used in training; it can be a different collection entirely, as long as the featurized tensors are compatible (e.g. test ChEMBL data on PKIS2). Outputs include test scores. A minimal sketch of stages 2-4 follows this list.
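The following is a minimal, self-contained sketch of stages 2-4 using scikit-learn stand-ins rather than the KinoML API: the features are random placeholders and the model is ridge regression, just to show how arrays, per-fold models and scores flow between stages.

```python
# Sketch of featurization -> k-fold training -> evaluation (placeholder data/model).
import json

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)

# Stage 2 (featurization): placeholder tensors plus measurement-type metadata.
X = rng.normal(size=(500, 128))   # featurized ligands (random stand-in)
y = rng.normal(size=500)          # e.g. pIC50-like measurements (random stand-in)
metadata = {"featurizer": "placeholder", "measurement_type": "pIC50", "n_features": X.shape[1]}

# Stage 3 (training): one model per fold; keep validation scores alongside the splits.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
models, val_scores = [], []
for train_idx, val_idx in kfold.split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    models.append(model)
    val_scores.append(float(mean_absolute_error(y[val_idx], model.predict(X[val_idx]))))

# Stage 4 (evaluation): any test set with compatible features will do,
# even one from a different collection than the training data.
X_test = rng.normal(size=(100, metadata["n_features"]))
y_test = rng.normal(size=100)
test_scores = [float(mean_absolute_error(y_test, m.predict(X_test))) for m in models]

print(json.dumps({"val_mae": val_scores, "test_mae": test_scores}, indent=2))
```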

Since featurized vectors can be reused across models, we do not implement a linear hierarchy that implies such a dependency. Instead, we use metadata to annotate each artifact and identify whether a certain stage is compatible with another, across experiments.
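As an illustration of this idea (the annotation scheme below is an assumption, not the repository's actual format), one could store a small metadata record next to every featurized array and check it before mixing artifacts from different experiments:

```python
# Hypothetical metadata side-car and compatibility check for featurized artifacts.
import json
from pathlib import Path

import numpy as np


def save_featurized(path: Path, X: np.ndarray, featurizer: str, measurement_type: str) -> None:
    """Write the tensor plus a metadata record describing how it was produced."""
    np.save(path.with_suffix(".npy"), X)
    meta = {"featurizer": featurizer, "measurement_type": measurement_type, "shape": list(X.shape)}
    path.with_suffix(".json").write_text(json.dumps(meta))


def compatible(meta_a: dict, meta_b: dict) -> bool:
    """Two artifacts can be mixed if the featurizer and the feature width match."""
    return meta_a["featurizer"] == meta_b["featurizer"] and meta_a["shape"][1:] == meta_b["shape"][1:]
```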
