The DeepChem library is packaged alongside the MoleculeNet suite of datasets. Finding a suitable dataset is one of the most important parts of any machine learning application. The MoleculeNet suite curates a wide range of datasets and loads them into DeepChem dc.data.Dataset objects for convenience.
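As a rough illustration of how the loaders listed on this page are typically used, the sketch below wraps a loader call and summarizes its result. It assumes the loader returns a 3-tuple of task names, dataset splits, and transformers, and that a splitter producing train/valid/test splits is in use. The helper name describe_molnet_splits is hypothetical and not part of DeepChem.

```python
from typing import Any, Callable, Dict


def describe_molnet_splits(loader: Callable[..., Any],
                           **loader_kwargs: Any) -> Dict[str, Any]:
    """Call a MolNet-style loader and summarize what it returned.

    Hypothetical helper for illustration only; it is not part of DeepChem.
    """
    # Assumption: the loader returns (tasks, datasets, transformers), where
    # `datasets` holds the train, validation, and test splits in that order.
    tasks, datasets, transformers = loader(**loader_kwargs)
    train, valid, test = datasets
    return {
        "tasks": list(tasks),
        "n_train": len(train),
        "n_valid": len(valid),
        "n_test": len(test),
        "n_transformers": len(transformers),
    }


# Example usage (requires deepchem; the data is downloaded on first call):
#
#   import deepchem as dc
#   summary = describe_molnet_splits(
#       dc.molnet.load_delaney, featurizer="ECFP", splitter="random")
#   print(summary)
```

Passing the loader in as an argument keeps the helper independent of any one dataset, so the same call pattern can be tried with other loaders that follow the same return convention.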
If you are proposing a new dataset to be included in the MoleculeNet benchmarking suite, please follow the instructions below. Please review the datasets already available in MolNet before contributing.
- Read the Contribution guidelines.
- Open an issue to discuss the dataset you want to add to MolNet.
- Implement a function in the deepchem.molnet.load_function module following the template function deepchem.molnet.load_function.load_dataset_template. Specify which featurizers, transformers, and splitters (available from deepchem.molnet.defaults) are supported for your dataset.
- Add your load function to deepchem.molnet.__init__.py for easy importing.
- Prepare your dataset as a .tar.gz or .zip file. Accepted filetypes include CSV, JSON, and SDF.
- Ask a member of the technical steering committee to add your .tar.gz or .zip file to the DeepChem AWS bucket. Modify your load function to pull down the dataset from AWS.
- Submit a [WIP] PR (work-in-progress pull request) following the PR template.
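The packaging step in the list above can be sketched with the Python standard library alone. This is a minimal example under stated assumptions: the file name my_dataset.csv and the helper names package_dataset and list_archive are hypothetical, and the archive layout the maintainers expect may differ.

```python
import tarfile
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def package_dataset(data_file: PathLike, archive_path: PathLike) -> Path:
    """Pack a single data file (e.g. CSV, JSON, or SDF) into a .tar.gz archive."""
    data_file = Path(data_file)
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store only the file name so the archive does not embed local
        # directory structure.
        tar.add(data_file, arcname=data_file.name)
    return archive_path


def list_archive(archive_path: PathLike) -> List[str]:
    """Return the member names of a .tar.gz archive as a sanity check."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Example usage with a hypothetical file name:
#
#   package_dataset("my_dataset.csv", "my_dataset.tar.gz")
#   print(list_archive("my_dataset.tar.gz"))
```

Listing the archive contents before asking for it to be uploaded is a cheap way to confirm that the file unpacks to the name the load function will look for.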
deepchem.molnet.load_bace_classification
deepchem.molnet.load_bace_regression
deepchem.molnet.load_bbbc001
deepchem.molnet.load_bbbc002
BBBP stands for Blood-Brain-Barrier Penetration.
deepchem.molnet.load_bbbp
deepchem.molnet.load_cell_counting
deepchem.molnet.load_chembl
deepchem.molnet.load_chembl25
deepchem.molnet.load_clearance
deepchem.molnet.load_clintox
deepchem.molnet.load_delaney
deepchem.molnet.load_factors
deepchem.molnet.load_hiv
HOPV stands for the Harvard Organic Photovoltaic Dataset.
deepchem.molnet.load_hopv
deepchem.molnet.load_hppb
deepchem.molnet.load_kaggle
deepchem.molnet.load_kinase
deepchem.molnet.load_lipo
Materials datasets include inorganic crystal structures, chemical compositions, and target properties like formation energies and band gaps. Machine learning problems in materials science commonly involve predicting the value of a continuous (regression) or categorical (classification) property of a material based on its chemical composition or crystal structure. "Inverse design", in which ML methods generate crystal structures that have a desired property, is also of great interest. Another area where ML is applicable in materials is discovering new or modified phenomenological models that describe material behavior.
deepchem.molnet.load_bandgap
deepchem.molnet.load_perovskite
deepchem.molnet.load_mp_formation_energy
deepchem.molnet.load_mp_metallicity
deepchem.molnet.load_muv
deepchem.molnet.load_nci
deepchem.molnet.load_pcba
deepchem.molnet.load_pdbbind
deepchem.molnet.load_ppb
deepchem.molnet.load_qm7
deepchem.molnet.load_qm7_from_mat
deepchem.molnet.load_qm7b_from_mat
deepchem.molnet.load_qm8
deepchem.molnet.load_qm9
deepchem.molnet.load_sampl
deepchem.molnet.load_sider
deepchem.molnet.load_thermosol
deepchem.molnet.load_tox21
deepchem.molnet.load_toxcast
deepchem.molnet.load_uspto
deepchem.molnet.load_uv
deepchem.molnet.load_zinc15