KinoML notebooks

This repository contains detailed execution instructions for the notebooks used in our structure-informed machine learning experiments. We put special emphasis on the reproducibility of results by carefully specifying every aspect of the running environment.

Some history hygiene details

  • This repository should only contain the input code. This extends to the notebooks themselves: clear all outputs before committing!
  • Outputs are synced to Weights & Biases. When running locally, outputs should be saved in a folder named _output/ relative to the running notebook. This includes the executed notebook itself, with its (updated) output cells; see the sketch after this list.
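The snippet below is a minimal sketch of such a cell, assuming the wandb Python client and a placeholder project/run name; it is not the repository's actual logging code.

```python
# Minimal sketch (placeholder names): write artifacts under a local _output/
# folder next to the notebook and sync that folder to Weights & Biases.
from pathlib import Path

import wandb  # pip install wandb

OUT = Path("_output")
OUT.mkdir(exist_ok=True)

run = wandb.init(project="experiments-binding-affinity", name="example-run")

# ... produce results (arrays, figures, the executed notebook) and write them under _output/ ...
(OUT / "notes.txt").write_text("example artifact\n")

# Upload everything under _output/ to the run; W&B keeps the relative paths.
wandb.save(str(OUT / "*"), base_path=".")
run.finish()
```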

Included studies

Note: Each directory contains its own README.md with additional details. The details of each run are encoded in a YAML file (see the sketch after the list below).

  • 001-ligand-based: Baseline models for the rest of the studies. Models are trained exclusively on ligand information.
  • 002-kinase-informed: Includes some kinase information, without structural details.
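As a purely hypothetical illustration of those YAML-driven runs (the file name and keys below are placeholders; the actual ones are documented in each study's README), a run file could be loaded like this:

```python
# Hypothetical example of reading a per-run YAML file.
import yaml  # pip install pyyaml

with open("001-ligand-based/run.yaml") as fh:  # placeholder path
    run_config = yaml.safe_load(fh)

print(run_config.keys())  # e.g. featurization, model, training hyperparameters
```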

Philosophy

We split an experiment into four stages:

  1. Data intake. This involves taking the raw data (e.g. as provided in a publication, dataset or through a collaborator) and creating a DatasetProvider adapter. This stage should not happen in a notebook, but as part of the KinoML library.
  2. Featurization. One or more DatasetProvider objects are converted into tensorial representations exported as NumPy arrays. This process should also export the measurement-type metadata: observation model mathematical expression, dynamic range, loss adapters.
  3. Training. Takes a collection of tensors and measurement types, one model and a set of hyperparameters, and produces a collection of k models (one per fold). Contextual metadata includes loss data, validation scores and data splits. Only the training and validation sets are used here, but indices into the test set are kept available for later stages if needed.
  4. Evaluation. Using the models from step 3, produce performance reports on a test set. The test set does not have to be part of the same collection used in training; it can be a different collection entirely, as long as the featurized tensors are compatible (e.g. test ChEMBL data on PKIS2). Outputs include test scores. A minimal sketch of stages 2-4 follows this list.
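The following is a minimal, self-contained sketch of stages 2-4 using scikit-learn stand-ins rather than the KinoML API: the features are random placeholders and the model is ridge regression, just to show how arrays, per-fold models and scores flow between stages.

```python
# Sketch of featurization -> k-fold training -> evaluation (placeholder data/model).
import json

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)

# Stage 2 (featurization): placeholder tensors plus measurement-type metadata.
X = rng.normal(size=(500, 128))   # featurized ligands (random stand-in)
y = rng.normal(size=500)          # e.g. pIC50-like measurements (random stand-in)
metadata = {"featurizer": "placeholder", "measurement_type": "pIC50", "n_features": X.shape[1]}

# Stage 3 (training): one model per fold; keep validation scores alongside the splits.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
models, val_scores = [], []
for train_idx, val_idx in kfold.split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    models.append(model)
    val_scores.append(float(mean_absolute_error(y[val_idx], model.predict(X[val_idx]))))

# Stage 4 (evaluation): any test set with compatible features will do,
# even one from a different collection than the training data.
X_test = rng.normal(size=(100, metadata["n_features"]))
y_test = rng.normal(size=100)
test_scores = [float(mean_absolute_error(y_test, m.predict(X_test))) for m in models]

print(json.dumps({"val_mae": val_scores, "test_mae": test_scores}, indent=2))
```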

Since featurized vectors can be reused across models, we do not implement a linear hierarchy that implies such a dependency. Instead, we use metadata to annotate each artifact and identify whether a certain stage is compatible with another, across experiments.
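As an illustration of this idea (the annotation scheme below is an assumption, not the repository's actual format), one could store a small metadata record next to every featurized array and check it before mixing artifacts from different experiments:

```python
# Hypothetical metadata side-car and compatibility check for featurized artifacts.
import json
from pathlib import Path

import numpy as np


def save_featurized(path: Path, X: np.ndarray, featurizer: str, measurement_type: str) -> None:
    """Write the tensor plus a metadata record describing how it was produced."""
    np.save(path.with_suffix(".npy"), X)
    meta = {"featurizer": featurizer, "measurement_type": measurement_type, "shape": list(X.shape)}
    path.with_suffix(".json").write_text(json.dumps(meta))


def compatible(meta_a: dict, meta_b: dict) -> bool:
    """Two artifacts can be mixed if the featurizer and the feature width match."""
    return meta_a["featurizer"] == meta_b["featurizer"] and meta_a["shape"][1:] == meta_b["shape"][1:]
```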
