
Data and analysis notebooks for Predicting Peptide-MHC Binding Affinities With Imputed Training Data


alexanderwhatley/mhcflurry-icml-compbio-2016

 
 


Predicting Peptide-MHC Binding Affinities With Imputed Training Data

This repository contains the data, analysis notebooks, and Authorea-generated LaTeX files for "Predicting Peptide-MHC Binding Affinities With Imputed Training Data", submitted to the ICML 2016 Workshop on Computational Biology.

Data and notebooks

Predictions on the blind test data from the MHCflurry predictors, netMHC, netMHCpan, and SMM are in data/validation_predictions_full.csv. This file contains predictions from 64 MHCflurry models: 32 trained with imputation and 32 without. The models are described in data/validation_models.csv.
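To check the 32/32 split before starting an analysis, a small sketch like the one below counts the rows of data/validation_models.csv grouped by the value in one column. The file's header is not reproduced in this README, so the column that records imputation is passed in as a parameter; the name `impute` in the commented usage is a hypothetical placeholder to replace after inspecting the header.

```python
import csv
from collections import Counter


def count_models_by(lines, column):
    """Count models grouped by the value in `column`.

    `lines` is any iterable of CSV lines (for example an open file object);
    the first line must be the header.
    """
    return Counter(row[column] for row in csv.DictReader(lines))


# Hypothetical usage; "impute" is a placeholder, not a known column name:
# with open("data/validation_models.csv", newline="") as f:
#     print(count_models_by(f, "impute"))
```

If the split matches the description, the counts should sum to 64 with 32 in each group.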

The notebook that trains the predictors and generates these results is notebooks/validation.ipynb. It took about 20 hours to run on a single TITAN X GPU.

The analysis of these results, including building ensemble predictions from the individual predictors and calculating AUC, F1, and tau scores, is in notebooks/validation results analysis.ipynb.
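The analysis notebook is the reference for how the ensembles and scores were actually computed. For a dependency-free picture of this kind of evaluation, here is a minimal sketch in plain Python. Its choices are assumptions, not taken from the notebook: predictions and measurements are IC50 values in nM, a peptide counts as a binder at or below 500 nM (a common convention in the field), an ensemble of several models is the geometric mean of their IC50 predictions, and tau is Kendall's tau-a with no tie correction (SciPy's `kendalltau`, for comparison, defaults to tau-b).

```python
import math
from itertools import combinations

# Assumption: a peptide is a "binder" if its IC50 is at or below 500 nM.
BINDER_THRESHOLD_NM = 500.0


def ensemble_ic50(per_model_ic50s):
    """Combine one peptide's IC50 predictions (nM) from several models.

    Assumption: geometric mean, i.e. the arithmetic mean in log space.
    """
    logs = [math.log(v) for v in per_model_ic50s]
    return math.exp(sum(logs) / len(logs))


def roc_auc(labels, scores):
    """ROC AUC by pairwise comparison (Mann-Whitney U); ties count as half."""
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, predicted):
    """F1 = 2TP / (2TP + FP + FN) for boolean labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / all pairs, no tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def evaluate(measured_ic50, predicted_ic50, threshold=BINDER_THRESHOLD_NM):
    """Score predicted IC50s (nM) against measured IC50s (nM)."""
    labels = [m <= threshold for m in measured_ic50]
    # Lower IC50 means tighter binding, so negate it to get a higher-is-binder score.
    scores = [-p for p in predicted_ic50]
    predicted_binders = [p <= threshold for p in predicted_ic50]
    return {
        "auc": roc_auc(labels, scores),
        "f1": f1_score(labels, predicted_binders),
        "tau": kendall_tau_a(measured_ic50, predicted_ic50),
    }
```

For example, `evaluate(measured, [ensemble_ic50(p) for p in per_peptide_predictions])` scores an ensemble, where each entry of `per_peptide_predictions` holds one peptide's predictions from every model. The IC50s are negated before computing AUC because without that step a perfect predictor would score an AUC of 0.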

The command to generate the data for Figure 1 was:

mhcflurry-dataset-size-sensitivity.py \
	--allele HLA-A0201  \
	--training-csv data/bdata.2009.mhci.public.1.txt \
	--imputation-method mice \
	--number-dataset-sizes 15 \
	--random-negative-samples 0 \
	--min-observations-per-peptide 3 \
	--training-epochs 250 \
	--repeat 3 \
	--max-training-samples 500 \
	--min-training-samples 10 \
	--dropout 0.5 \
	--hidden-layer-size 64 \
	--embedding-size 32
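To drive the Figure 1 run from Python (for example from a notebook), one option is to build the same argument list and pass it to `subprocess.run`. This sketch uses only the flags shown in the command above and assumes `mhcflurry-dataset-size-sensitivity.py` is on your `PATH` from the MHCflurry revision listed under Versions; `figure1_command` and `run_figure1` are illustrative names, not part of MHCflurry.

```python
import subprocess

# Flags and values copied from the Figure 1 command.
FIGURE1_ARGS = {
    "--allele": "HLA-A0201",
    "--training-csv": "data/bdata.2009.mhci.public.1.txt",
    "--imputation-method": "mice",
    "--number-dataset-sizes": "15",
    "--random-negative-samples": "0",
    "--min-observations-per-peptide": "3",
    "--training-epochs": "250",
    "--repeat": "3",
    "--max-training-samples": "500",
    "--min-training-samples": "10",
    "--dropout": "0.5",
    "--hidden-layer-size": "64",
    "--embedding-size": "32",
}


def figure1_command(overrides=None, script="mhcflurry-dataset-size-sensitivity.py"):
    """Return the Figure 1 command as an argv list, optionally overriding flag values."""
    args = dict(FIGURE1_ARGS)
    args.update(overrides or {})
    argv = [script]
    for flag, value in args.items():
        argv.extend([flag, str(value)])
    return argv


def run_figure1(overrides=None):
    """Run the command; raises CalledProcessError if it exits non-zero."""
    subprocess.run(figure1_command(overrides), check=True)
```

Passing an argv list rather than a shell string avoids quoting problems with paths. `subprocess.run` needs Python 3.5 or later; under a Python 2 interpreter, `subprocess.check_call(figure1_command())` is the equivalent.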

Versions

We used MHCflurry revision 52a88ace.

Other libraries:

appdirs==1.4.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
biopython==1.66
bottle==0.12.9
certifi==2016.2.28
CherryPy==5.1.0
climate==0.4.6
configparser==3.3.0.post2
CVXcanon==0.0.23.4
cvxopt==1.1.8
cvxpy==0.4.0
cycler==0.10.0
datacache==0.4.17
decorator==4.0.9
dill==0.2.5
downhill==0.3.2
ecos==2.0.4
entrypoints==0.2.1
-e git+git@github.com:hammerlab/fancyimpute.git@c4510c5a77fcf27af65149610f260f18826129a4#egg=fancyimpute
functools32==3.2.3.post2
h5py==2.6.0
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.1.2
Jinja2==2.8
jsonschema==2.5.1
jupyter-client==4.2.2
jupyter-core==4.1.0
Keras==1.0.2
lxml==3.6.0
MarkupSafe==0.23
matplotlib==1.5.1
mistune==0.7.2
multiprocess==0.70.4
nbconvert==4.2.0
nbformat==4.0.1
notebook==4.2.0
numpy==1.10.4
pandas==0.18.0
pathlib2==2.1.0
-e git+git@github.com:hammerlab/pepdata.git@a76e9606a24ff0d1b4c817182cdd06d5c75ba169#egg=pepdata
pexpect==4.0.1
pickleshare==0.7.2
plac==0.9.1
progressbar33==2.4
ptyprocess==0.5.1
pycairo==1.10.0
Pygments==2.1.3
pyparsing==2.1.1
python-dateutil==2.5.2
pytz==2016.3
PyYAML==3.11
pyzmq==15.2.0
requests==2.10.0
scikit-learn==0.17.1
scipy==0.17.0
scs==1.2.6
seaborn==0.7.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
terminado==0.6
Theano==0.9.0.dev0
toolz==0.7.4
tornado==4.3
traitlets==4.2.1
typechecks==0.0.2
widgetsnbextension==1.2.1
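The list above is in pip requirements format: `name==version` pins plus two editable `-e git+...` installs fixed to specific commits. It includes functools32, a backport that only installs on Python 2, so the original environment was presumably Python 2. One way to compare a reproduction environment against these pins, sketched under the assumption that you save this list and the output of `pip freeze` from your environment as text files, is to parse both and report differences. Names are normalized as in PEP 503 so that, for example, `Jinja2` matches `jinja2`. The helper names and file names are illustrative.

```python
import re


def normalize(name):
    """PEP 503 name normalization, so `Jinja2` matches `jinja2`."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines):
    """Map normalized package name -> version for `name==version` lines.

    Blank lines, comments, and editable `-e git+...` installs are skipped;
    the fancyimpute and pepdata commits have to be checked by hand.
    """
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "-e")) or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[normalize(name.strip())] = pinned.strip()
    return pins


def mismatches(pinned, installed):
    """Return name -> (pinned version, installed version or None) for each difference."""
    return {
        name: (version, installed.get(name))
        for name, version in pinned.items()
        if installed.get(name) != version
    }


# Hypothetical usage; pinned.txt holds the list above and current.txt is the
# output of `pip freeze > current.txt` in your environment:
# with open("pinned.txt") as p, open("current.txt") as c:
#     print(mismatches(parse_pins(p), parse_pins(c)))
```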
