Skip to content

Repository for the article Statistical-Based Database Fingerprint: Chemical space dependent representation of compound databases

Notifications You must be signed in to change notification settings

DIFACQUIM/SB-DFP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SB-DFP

Statistical-Based Database Fingerprint: Chemical space dependent representation of compound databases

Epigenetic_datasets.tar.gz Contains the 28 datasets described in the work in CSV format.

ZINC12_AllClean.tar.gz.part00 to 07 Contain the following 4 files in CSV format: Not_Processed.csv | Contains 21 structures not processed by rdkit. Repeated_Structures.csv | Contains 154 repeated compounds between ZINC 12 AllClean subset and Epigenetic datasets. Reference_Dataset.csv | Contains 15,403,690 structures used as reference for SB-DFP calculations. Decoys.csv | Contains 1 million compounds used as decoys in similarity searching runs. To uncompress: 1) Join the 8 parts into a single compressed file with: cat ZINC12_AllClean.tar.gz.part* >> ZINC12_AllClean.tar.gz and 2) Extract the files with: tar -xvzf ZINC12_AllClean.tar.gz

SB-DFPCalc.ipynb and SB-DFPCalc.py The Jupyter notebook with the code employed for DFP and SB-DFP calculations with some examples, and the Python script of such notebook.

MACCS.counts and ECFP4.counts Files containing the reference counts for the calculation of SB-DFP based on MACCS and ECFP4 respectively. These files are needed for the execution of the Python code.

DNMT1.csv This file is needed for the execution of the Python code (used as example). This file is also contained in Epigenetic_datasets.tar.gz

Similarity_Searching.csv This file contains the results of all similarity searching calculations performed in the work, including Data set, Method, Fingerprint, Confidence level (if applicable), Recovery Rate and Area Under ROC Curve.

About

Repository for the article Statistical-Based Database Fingerprint: Chemical space dependent representation of compound databases

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published