
NiklasZ/Metabolite-Substructures


Characteristic Substructure of Metabolites Application

This project is an implementation of a method (PDF) that should help identify metabolites. To do this it requires:

  • A fragmentation pattern of a molecule.
  • Potential candidate molecules obtained by performing a lookup with the spectral data on a major database/search engine of chemicals such as HMDB or PubChem.
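The candidate lookup above is done against a real database such as HMDB or PubChem. As a rough illustration of the idea only, the sketch below filters a small in-memory list of candidates by how close their mass is to an observed precursor mass, in parts per million (ppm). The function names, the 10 ppm default, and the candidate entries and masses are hypothetical placeholders, not real database records and not part of this application.

```python
# Hypothetical sketch: narrow a candidate list by precursor mass.
# Candidate names and masses are illustrative placeholders, not real
# HMDB/PubChem records; a real lookup would query those services instead.


def ppm_error(observed, theoretical):
    """Mass error in parts per million relative to the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6


def filter_candidates(precursor_mass, candidates, tolerance_ppm=10.0):
    """Return the names of candidates whose mass lies within tolerance_ppm
    of the observed precursor mass, closest match first."""
    hits = []
    for name, mass in candidates:
        err = ppm_error(precursor_mass, mass)
        if err <= tolerance_ppm:
            hits.append((err, name))
    hits.sort()
    return [name for _, name in hits]


if __name__ == "__main__":
    # Placeholder entries: (name, monoisotopic mass in Da)
    candidates = [("cand_A", 223.0633), ("cand_B", 223.0650), ("cand_C", 250.1000)]
    print(filter_candidates(223.0640, candidates))
```

With these placeholder values, cand_A (about 3 ppm away) and cand_B (about 4.5 ppm away) are kept and cand_C is discarded.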

It then produces a Characteristic Substructure (CS), a representative molecule of all the input molecules. The CS can optionally be fed into a tool such as CFM-ID, which breaks the molecule back down into a predicted fragmentation pattern using machine learning and heuristics. Ideally, if the original pattern matches the predicted one, the algorithm has produced a good representation.
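One simple way to judge whether "the original pattern matches" a predicted one is to count how many original peaks have a predicted peak nearby. The sketch below assumes both patterns are plain lists of fragment m/z values, ignores intensities, and uses a hypothetical absolute tolerance of 0.01; it is an illustration of tolerance-based peak matching, not the comparison this application or CFM-ID actually performs.

```python
# Hypothetical sketch: score how well a predicted fragmentation pattern
# (e.g. one produced by CFM-ID) reproduces the original one. Patterns are
# lists of fragment m/z values; intensities are ignored for simplicity.


def matched_fraction(original, predicted, tolerance=0.01):
    """Fraction of original peaks that have a predicted peak within
    `tolerance` (absolute m/z units). 1.0 means every peak is explained."""
    if not original:
        return 0.0
    matched = 0
    for mz in original:
        if any(abs(mz - p) <= tolerance for p in predicted):
            matched += 1
    return matched / float(len(original))


if __name__ == "__main__":
    original = [91.054, 119.049, 181.065]
    predicted = [91.055, 181.064, 200.0]
    print(matched_fraction(original, predicted))
```

Here two of the three original peaks are matched, so the score is 2/3. A score close to 1.0 would suggest the CS reproduces the observed fragmentation well.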

As this is a lot of information, here is a visualisation of how the application is intended to be used.


More details on how the algorithms work and on the general flow of the application can be found here.

Getting Started

This section explains the requirements of the application and how to get it running.

Prerequisites

The application has the following dependencies:

  • Python 2 - the language the application is implemented in.
  • Python Enum - needed for compatibility purposes, since Python 2 has no built-in enum module.
  • NetworkX - a graph library that is used to create the Characteristic Substructure.
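Because the CS is built on a graph library, it helps to picture each molecule as a labelled graph: atoms are nodes labelled with their element, bonds are edges labelled with their order. The sketch below uses plain dictionaries and lists instead of NetworkX so that it runs without extra dependencies, and computes a deliberately crude "what do all candidates share" measure: the bond types (element pair plus bond order) common to every molecule. This is a hypothetical illustration of the graph view, not the project's CS algorithm, which builds an actual substructure graph.

```python
from collections import Counter
from functools import reduce

# Hypothetical sketch: molecules as labelled graphs, given as
# (atoms, bonds) where atoms maps index -> element and bonds is a list of
# (i, j, order). NOT the project's CS algorithm, only a crude proxy.


def bond_types(atoms, bonds):
    """Count bond types as (element, element, order); the element pair is
    sorted so that a C-O bond and an O-C bond count as the same type."""
    counts = Counter()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        counts[(a, b, order)] += 1
    return counts


def shared_bond_types(molecules):
    """Multiset intersection of bond types over at least one molecule."""
    return reduce(lambda x, y: x & y, (bond_types(a, b) for a, b in molecules))


if __name__ == "__main__":
    # Heavy-atom graphs only (hydrogens omitted).
    ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)])
    acetic_acid = ({0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
    print(shared_bond_types([ethanol, acetic_acid]))
```

For ethanol and acetic acid, only one C-C single bond and one C-O single bond are shared; the C=O double bond of acetic acid drops out.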

Optional dependencies

  • Matplotlib, NumPy and RDKit - required to draw molecules.
  • PubChemPy - required if you want to do lookups on the PubChem database.
  • CFM-ID - used to create a fragmentation pattern from a molecule. An older version is already included by default for testing purposes (it is a Windows binary, so it will probably not work on Linux distributions).

Installing

Installing the application requires the dependencies listed above.

From there, all that is needed is to download the src folder and run src/main.py.
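Before running src/main.py, it can be useful to check which of the dependencies above are importable. The sketch below is a standalone helper, not part of the project; the module names are the usual import names of the listed packages (for example, the Python 2 enum backport is imported as `enum`), which is an assumption about your installation.

```python
import importlib

# Hypothetical helper, not part of the project. Module names are the usual
# import names of the packages listed above (an assumption).
REQUIRED = ["enum", "networkx"]
OPTIONAL = ["matplotlib", "numpy", "rdkit", "pubchempy"]


def available(modules):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(available(REQUIRED + OPTIONAL).items()):
        print("%-10s %s" % (name, "ok" if ok else "missing"))
```

A missing required module means main.py will not start; a missing optional one only disables drawing, PubChem lookups, or similar features.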

Running

The application takes its inputs from the command line and has two modes:

main.py [-h] {cs,rm}

{cs,rm}
  cs        Find characteristic substructure and optionally use
            fragmentation comparison.
  rm        Find the best-matching molecule from a list, when building 2
            CS.

The cs mode creates a characteristic substructure from a list of molecules and draws it if the optional drawing libraries are installed. The rm mode is experimental: it uses CSs to find the closest-fitting molecule across several lists of candidate molecules.
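The idea behind rm, picking the candidate that best fits a CS, can be illustrated with a similarity ranking. The sketch below assumes, purely for illustration, that the CS and each candidate are reduced to sets of feature labels and compared with the Tanimoto (Jaccard) coefficient; the function names and feature labels are hypothetical, and the real rm mode may score candidates differently.

```python
# Hypothetical sketch of the idea behind rm: rank candidates by how
# closely they resemble a characteristic substructure. Molecules and the
# CS are reduced to sets of feature labels (an illustrative assumption)
# and compared with the Tanimoto (Jaccard) coefficient.


def tanimoto(a, b):
    """|A & B| / |A | B| for two feature sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0


def best_match(cs_features, candidate_lists):
    """Return (score, name) of the candidate, across all lists, that is
    most similar to the CS. Each list holds (name, feature_set) pairs."""
    return max(
        (tanimoto(cs_features, feats), name)
        for candidates in candidate_lists
        for name, feats in candidates
    )


if __name__ == "__main__":
    cs = {"C-C", "C-O", "C=O"}
    list1 = [("m1", {"C-C", "C-O"})]
    list2 = [("m2", {"C-C", "C-O", "C=O", "C-N"}), ("m3", {"C-N"})]
    print(best_match(cs, [list1, list2]))
```

Here m2 wins with a score of 0.75 (3 shared features out of 4 in the union). Because results are compared as (score, name) tuples, ties on score are broken by name.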

At a minimum, run python main.py cs FILE_NAME to create a CS; the full list of options can be viewed via python main.py cs -h.
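A two-mode interface like this maps naturally onto argparse subcommands. The sketch below shows how such a parser could be declared; it is not the project's actual main.py. Only FILE_NAME and the -img flag come from this README, and the FILES argument for rm is a hypothetical placeholder.

```python
import argparse

# Hypothetical sketch of a two-mode CLI using argparse subcommands.
# FILE_NAME and -img appear in this README; the rm argument (FILES) is a
# placeholder, not the real main.py interface.


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    modes = parser.add_subparsers(dest="mode")

    cs = modes.add_parser("cs", help="Find characteristic substructure.")
    cs.add_argument("FILE_NAME", help="input file of candidate molecules")
    cs.add_argument("-img", action="store_true", help="also draw the CS as cs.png")

    rm = modes.add_parser("rm", help="Find the best-matching molecule.")
    rm.add_argument("FILES", nargs="+", help="candidate lists (hypothetical)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, parsing cs ../test_data/acetylaminofluorenes.txt -img yields mode "cs", the file name, and img set to True.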

Example

Example files of metabolites and spectral patterns that can be used as inputs are provided in test_data, and results are written to output_data. Here is an example of running one of them:

python main.py cs ../test_data/acetylaminofluorenes.txt

which will create cs.txt, containing the created CS, in output_data/acetylaminofluorenes. Optionally, the -img flag can be specified to also draw the CS as cs.png in the same folder.
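The output location follows from the input file's name. The sketch below reproduces that mapping as inferred from the example (../test_data/acetylaminofluorenes.txt leads to output_data/acetylaminofluorenes/cs.txt); the function name and the default output_root of ../output_data (relative to src, where main.py is run) are assumptions, not the project's code.

```python
import os

# Hypothetical sketch: derive where the cs mode writes its results,
# inferred from the example above. The default output_root is an
# assumption (main.py is run from src, so output_data is one level up).


def output_paths(input_file, output_root=os.path.join("..", "output_data")):
    """Return (cs_txt, cs_png) paths for a given input file."""
    stem = os.path.splitext(os.path.basename(input_file))[0]
    folder = os.path.join(output_root, stem)
    return os.path.join(folder, "cs.txt"), os.path.join(folder, "cs.png")


if __name__ == "__main__":
    print(output_paths("../test_data/acetylaminofluorenes.txt"))
```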

Contributing

Please contact me if you want to contribute to this project. Possible enhancements include improvements to the characteristic substructure and to the heuristic choices.

Authors

Acknowledgments
