Characteristic Substructure of Metabolites Application

This project implements a method (PDF) intended to help identify metabolites. It requires:

A fragmentation pattern of a molecule.
Potential candidate molecules, obtained by looking up the spectral data in a major chemical database or search engine such as HMDB or PubChem.

It then produces a Characteristic Substructure (CS), a molecule that represents all of the input molecules. The CS can optionally be fed into a tool such as CFM-ID, which uses machine learning and heuristics to break a molecule back down into a predicted fragmentation pattern. If that predicted pattern closely matches the original one, the algorithm has produced a good representation.
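The README does not spell out the CS algorithm at this point, so the following is only a deliberately simplified illustration of the underlying idea: treat each molecule as a graph of labelled atoms and bonds, and keep what every candidate has in common. It compares counts of labelled bond types rather than computing a true connected common substructure, and it uses plain Python so it runs without dependencies (the project itself uses NetworkX for its graphs). The input format and the names `bond_signature` and `shared_bonds` are hypothetical.

```python
from collections import Counter


def bond_signature(atoms, bonds):
    """Count the labelled bond types in one molecule graph.

    Hypothetical input format (hydrogens omitted):
      atoms: dict mapping atom id -> element symbol, e.g. {0: "C", 1: "O"}
      bonds: iterable of (atom_id, atom_id, bond_order) tuples
    """
    signature = Counter()
    for a, b, order in bonds:
        # Sort the element pair so C-O and O-C count as the same bond type.
        pair = tuple(sorted((atoms[a], atoms[b])))
        signature[pair + (order,)] += 1
    return signature


def shared_bonds(molecules):
    """Keep each bond type with the smallest count seen across all molecules.

    Requires at least one molecule. Counter's & operator takes the per-key
    minimum and drops non-positive counts, so a bond type missing from any
    molecule disappears from the result.
    """
    signatures = [bond_signature(atoms, bonds) for atoms, bonds in molecules]
    common = signatures[0]
    for sig in signatures[1:]:
        common = common & sig
    return common


# Illustrative molecules (heavy atoms only).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)])
acetic_acid = ({0: "C", 1: "C", 2: "O", 3: "O"},
               [(0, 1, 1), (1, 2, 2), (1, 3, 1)])

common = shared_bonds([ethanol, acetic_acid])
```

For ethanol and acetic acid, the shared bond types are one C-C single bond and one C-O single bond; the C=O double bond of acetic acid is dropped because ethanol lacks it.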

As this is a lot of information, here is a visualisation of how the application is intended to be used.

More details on how the algorithms work and on the general flow of the application are available here.

Getting Started

This section explains the requirements of the application and how to get it running.

Prerequisites

The application has the following dependencies:

Python 2 - the language the application is implemented in.
Python Enum - needed for compatibility purposes.
NetworkX - a graph library that is used to create the Characteristic Substructure.

Optionally

Matplotlib, NumPy and RDKit - required to draw molecules.
PubChemPy - required if you want to do lookups on the PubChem database.
CFM-ID - used to create a fragmentation pattern from a molecule. An older version is included by default for testing purposes; it is a Windows binary, so it will probably not work on Linux distributions.
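How closely a CFM-ID prediction must agree with the original fragmentation pattern is not specified here. One simple way to quantify it, shown as a sketch rather than the project's own scoring, is the fraction of original peaks that have a predicted peak within a fixed m/z tolerance. The function name `match_fraction`, the peak-list format and the 0.01 tolerance are assumptions.

```python
import bisect


def match_fraction(observed, predicted, tol=0.01):
    """Fraction of observed m/z values with a predicted m/z within +/- tol.

    observed, predicted: lists of m/z values (intensities are ignored here).
    tol: absolute tolerance in m/z units (an assumed default).
    """
    if not observed:
        return 0.0
    predicted = sorted(predicted)
    matched = 0
    for mz in observed:
        # The nearest predicted peaks sit on either side of the insertion point.
        i = bisect.bisect_left(predicted, mz)
        neighbours = predicted[max(i - 1, 0):i + 1]
        if any(abs(mz - p) <= tol for p in neighbours):
            matched += 1
    return matched / float(len(observed))


# Toy peak lists for illustration only, not real spectra.
original = [77.039, 105.033, 182.081]
predicted = [77.04, 105.07, 182.08]
score = match_fraction(original, predicted)  # 2 of 3 peaks match
```

Sorting once and using binary search keeps the comparison at O(n log n); a fuller comparison would likely also weight peaks by intensity.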

Installing

To install the application, first install the dependencies listed above.

After that, download the src folder and run src/main.py.

Running

The application takes its inputs from the command line and has two modes:

main.py [-h] {cs,rm}

{cs,rm}
  cs        Find characteristic substructure and optionally use
            fragmentation comparison.
  rm        Find a the best-matching molecule from a list, when building 2
            CS.

The cs mode creates a characteristic substructure from a list of molecules and draws it if the optional drawing libraries are installed. The rm mode is experimental: it builds CSs to find the best-fitting molecule among several lists of candidate molecules.

At a minimum, run python main.py cs FILE_NAME to create a CS; further options can be listed with python main.py cs -h.
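The usage text above has the shape that Python's argparse prints for a parser with subcommands. As a sketch of how such an interface is typically declared, and not a copy of the project's main.py, the parser below defines the two modes, the FILE_NAME argument and the -img flag mentioned in this README. The rm arguments are not documented here, so none are declared, and the attribute names (`mode`, `file_name`, `img`) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented modes of main.py."""
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="mode")

    cs = subparsers.add_parser(
        "cs", help="Find characteristic substructure and optionally use "
                   "fragmentation comparison.")
    # Positional input file, as in "python main.py cs FILE_NAME".
    cs.add_argument("file_name", help="input file of candidate molecules")
    # Single-dash long option; argparse derives the attribute name "img".
    cs.add_argument("-img", action="store_true",
                    help="save the drawing of the CS as cs.png")

    # The rm arguments are not documented in the README, so none are added.
    subparsers.add_parser(
        "rm", help="Find the best-matching molecule from a list, "
                   "when building 2 CS.")
    return parser


args = build_parser().parse_args(
    ["cs", "../test_data/acetylaminofluorenes.txt", "-img"])
```

Using subparsers keeps each mode's options separate, which is why python main.py cs -h can list options specific to cs.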

Example

The test_data folder contains example files of metabolites and spectral patterns that can be used as inputs, and results are written to output_data. Here is an example of running one of them:

python main.py cs ../test_data/acetylaminofluorenes.txt

which creates cs.txt, containing the created CS, in output_data/acetylaminofluorenes and draws an image of it. Optionally, the -img flag can also be given to save the drawing in the same folder as cs.png.
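When scripting around the tool, the result locations follow from the input file name, as in the example above. The helper below is hypothetical and assumes main.py is run from src, so that output_data sits at ../output_data.

```python
import os


def output_paths(input_file, output_root=os.path.join("..", "output_data")):
    """Return the (cs.txt, cs.png) paths expected for a given input file.

    Assumes results go to <output_root>/<input file name without extension>/,
    as in the acetylaminofluorenes example; output_root is an assumption.
    cs.png is only expected when the -img flag was given.
    """
    name = os.path.splitext(os.path.basename(input_file))[0]
    folder = os.path.join(output_root, name)
    return os.path.join(folder, "cs.txt"), os.path.join(folder, "cs.png")


cs_txt, cs_png = output_paths("../test_data/acetylaminofluorenes.txt")
```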

Contributing

Please contact me if you want to contribute to this project. Possible enhancements include improvements to the characteristic substructure and to the heuristic choices.

Authors

NiklasZ

Acknowledgments

The University of Glasgow's Computing Department
CFM-ID
