
A Python toolkit to compute molecular features and predict activities and properties of small molecules


BeckResearchLab/PyMolSAR


PyMolSAR

PyMolSAR aims to provide a generalizable, open-source tool for calculating 759 molecular descriptors and for testing several supervised learning algorithms, so that users can build the most appropriate Quantitative Structure-Activity Relationship (QSAR) classification or regression model for accurately predicting the chemical properties or activities of small molecules.


Requirements

Installation

Using a conda environment

git clone https://github.com/BeckResearchLab/small-molecule-design-toolkit.git
cd small-molecule-design-toolkit
python setup.py install                                 

Getting Started

Two good tutorials to start with are Melting Point Prediction and Blood-Brain Barrier Permeability. Follow along with them to see how to predict molecular properties using machine learning.

Input Formats

  • A column containing SMILES strings.
  • A column containing an experimental measurement.
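As an illustration of this input layout, the sketch below reads a CSV with one SMILES column and one measurement column into two parallel lists, skipping rows with a missing SMILES or a non-numeric measurement. It uses only the standard library and is not part of PyMolSAR; the column names `smiles` and `activity` are hypothetical, since the README does not say what the columns must be called.

```python
import csv
import io

# Hypothetical column names: the README only specifies "a column containing
# SMILES strings" and "a column containing an experimental measurement".
SMILES_COL = "smiles"
TARGET_COL = "activity"


def load_dataset(handle, smiles_col=SMILES_COL, target_col=TARGET_COL):
    """Return parallel lists of SMILES strings and float measurements.

    Rows with an empty SMILES or a non-numeric measurement are skipped.
    """
    smiles, targets = [], []
    for row in csv.DictReader(handle):
        s = (row.get(smiles_col) or "").strip()
        raw = (row.get(target_col) or "").strip()
        if not s:
            continue  # no structure to featurize
        try:
            y = float(raw)
        except ValueError:
            continue  # measurement missing or not a number
        smiles.append(s)
        targets.append(y)
    return smiles, targets


example = io.StringIO(
    "smiles,activity\n"
    "CCO,0.5\n"
    "c1ccccc1,1.2\n"
    ",3.0\n"
    "CC(=O)O,n/a\n"
)
smiles, targets = load_dataset(example)
print(smiles, targets)  # ['CCO', 'c1ccccc1'] [0.5, 1.2]
```

Dropping incomplete rows up front keeps the SMILES list and the measurement list aligned, which later featurization and modelling steps rely on.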

Data Featurization

Most machine learning algorithms require input in the form of numeric vectors. However, cheminformatics and drug discovery datasets usually arrive as lists of molecules with associated experimental readouts. To turn a list of molecules into vectors, we calculate a set of molecular descriptors for each molecule using smdt.molecular_descriptors.getAllDescriptors().
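Once descriptors have been calculated, they still have to be arranged so that every molecule becomes a vector of the same length, in the same column order. The sketch below shows one way to do that. It assumes, without confirmation from this README, that each molecule's descriptors are available as a mapping of descriptor name to value; the descriptor names and values are illustrative only, and `to_feature_matrix` is a hypothetical helper, not part of smdt.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: per-molecule descriptors are a mapping of name -> value.
# This is NOT a documented return type of smdt; it is used for illustration.
DescriptorMap = Dict[str, float]


def to_feature_matrix(
    rows: Sequence[DescriptorMap],
) -> Tuple[List[str], List[List[float]]]:
    """Align descriptor mappings into a fixed-column feature matrix.

    Only descriptors present for every molecule are kept, in sorted name
    order, so each molecule becomes a float vector of equal length.
    """
    if not rows:
        return [], []
    shared = set(rows[0])
    for r in rows[1:]:
        shared &= set(r)  # drop descriptors missing for any molecule
    columns = sorted(shared)
    matrix = [[float(r[c]) for c in columns] for r in rows]
    return columns, matrix


# Illustrative descriptor values for two molecules (hypothetical names).
descriptors = [
    {"MolWt": 46.07, "nHBDon": 1, "nRing": 0},
    {"MolWt": 78.11, "nHBDon": 0, "nRing": 1, "nAromRing": 1},
]
columns, X = to_feature_matrix(descriptors)
print(columns)  # ['MolWt', 'nHBDon', 'nRing']
print(X)        # [[46.07, 1.0, 0.0], [78.11, 0.0, 1.0]]
```

Discarding descriptors that are missing for some molecules is the simplest alignment rule; imputing missing values instead would keep more columns at the cost of an extra modelling assumption.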

Models

smdt can build and evaluate several classification and regression models built on top of scikit-learn. It generates a model report to help the user choose the most appropriate Quantitative Structure-Activity Relationship (QSAR) or Quantitative Structure-Property Relationship (QSPR) model.
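The idea behind such a report is to score each candidate model on held-out data and rank them. The sketch below illustrates this with k-fold cross-validation and RMSE, using only the standard library and two deliberately simple regressors (a mean baseline and 1-nearest-neighbour). It is a hand-rolled stand-in, not smdt's implementation or report format; scikit-learn itself provides `KFold` and `cross_val_score` for the same purpose.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# A "model" here is a function: (train_X, train_y, test_X) -> predictions.
Model = Callable[[List[Vector], List[float], List[Vector]], List[float]]


def mean_baseline(train_X, train_y, test_X):
    """Predict the training-set mean for every test molecule."""
    mu = statistics.fmean(train_y)
    return [mu for _ in test_X]


def nearest_neighbour(train_X, train_y, test_X):
    """1-nearest-neighbour regression using Euclidean distance."""
    preds = []
    for x in test_X:
        best = min(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
        preds.append(train_y[best])
    return preds


def cv_rmse(model: Model, X, y, k: int = 3) -> float:
    """Mean RMSE over k contiguous folds (no shuffling, for simplicity)."""
    n = len(X)
    scores = []
    for f in range(k):
        test_idx = sorted(range(f * n // k, (f + 1) * n // k))
        held_out = set(test_idx)
        tr_X = [X[i] for i in range(n) if i not in held_out]
        tr_y = [y[i] for i in range(n) if i not in held_out]
        preds = model(tr_X, tr_y, [X[i] for i in test_idx])
        mse = statistics.fmean((p - y[i]) ** 2 for p, i in zip(preds, test_idx))
        scores.append(math.sqrt(mse))
    return statistics.fmean(scores)


def model_report(models: Dict[str, Model], X, y, k: int = 3) -> Dict[str, float]:
    """Return {model name: mean CV RMSE}, ordered best (lowest) first."""
    results = {name: cv_rmse(m, X, y, k) for name, m in models.items()}
    return dict(sorted(results.items(), key=lambda kv: kv[1]))


# Toy one-descriptor dataset where the property grows with the descriptor.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
report = model_report({"mean baseline": mean_baseline, "1-NN": nearest_neighbour}, X, y)
for name, rmse in report.items():
    print(f"{name}: RMSE {rmse:.3f}")
```

On real data the folds should be shuffled (or stratified for classification), and a classification report would rank models by a metric such as accuracy or ROC AUC rather than RMSE.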
