Improvised CDK Hashed fingerprint

This is an attempt to improve the CDK HashFingerprint (Fingerprinter class).
The idea behind the improved version is taken from my blog post on improved hashing functions and their impact on fingerprints:

http://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/

Command line interface

/*  Test improved CDK FP */

java -jar BenchmarkHashedFingerprinter.jar test/data/mol hash  2  2000
 
/* Test CDK default FP */
 
java -jar BenchmarkHashedFingerprinter.jar test/data/mol cdk  2  2000
   
***************************************
Improved CDK HashedFingerprinter class with 1024 size FP
***************************************
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
200*200     629     189     39182     0     0.995       1.000   0.005   0.11
400*400     2428    972     156600    0     0.994       1.000   0.006   0.37
600*600     4940    2449    352611    0     0.993       1.000   0.007   0.75
800*800     8562    5083    626355    0     0.992       1.000   0.008   1.27
1000*1000   12802   9011    978187    0     0.991       1.000   0.009   2.04
1200*1200   17178   12727   1410095   0     0.991       1.000   0.009   2.94
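
The ACCURACY, TPR and FPR columns can be recomputed from the four raw counts using the standard definitions. The following is a minimal sketch, not code from this repository; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Illustrative helper (hypothetical name, not part of this repository).
final class ConfusionMetrics {

    // Fraction of all compared pairs that were classified correctly.
    static double accuracy(long tp, long fp, long tn, long fn) {
        return (double) (tp + tn) / (tp + fp + tn + fn);
    }

    // True positive rate: TP / (TP + FN).
    static double tpr(long tp, long fn) {
        return (double) tp / (tp + fn);
    }

    // False positive rate: FP / (FP + TN).
    static double fpr(long fp, long tn) {
        return (double) fp / (fp + tn);
    }

    public static void main(String[] args) {
        // Counts taken from the 200*200 row of the table above.
        long tp = 629, fp = 189, tn = 39182, fn = 0;
        System.out.printf(Locale.ROOT, "%.3f %.3f %.3f%n",
                accuracy(tp, fp, tn, fn), tpr(tp, fn), fpr(fp, tn));
        // prints "0.995 1.000 0.005"
    }
}
```

The printed values match the ACCURACY, TPR and FPR entries of the 200*200 row.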

***************************************
Improved New HashedFingerprinter class with 2048 size FP
***************************************

------------------------------------------------------------------------------
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
------------------------------------------------------------------------------
200*200     629     189     39182     0     0.995       1.000   0.005   0.1
400*400     2381    974     156645    0     0.994       1.000   0.006   0.35
600*600     4882    2452    352666    0     0.993       1.000   0.007   0.71
800*800     8484    5085    626431    0     0.992       1.000   0.008   1.19
1000*1000   12710   9014    978276    0     0.991       1.000   0.009   1.93
1200*1200   17070   12730   1410200   0     0.991       1.000   0.009   2.77

***************************************
CDK Default Fingerprinter class with 1024 size FP
***************************************
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
200*200     629     298     39073     0     0.993       1.000   0.008   0.11
400*400     2428    1691    155881    0     0.989       1.000   0.011   0.37
600*600     4940    3765    351295    0     0.990       1.000   0.011   0.74
800*800     8562    7522    623916    0     0.988       1.000   0.012   1.26
1000*1000   12802   13922   973276    0     0.986       1.000   0.014   2.05
1200*1200   17178   19262   1403560   0     0.987       1.000   0.014   2.92



Results:

The improved hashed fingerprinter has higher accuracy
and roughly 30-40% fewer false positives (FPs) than the original CDK Fingerprinter!
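
The false positive comparison can be checked directly from the FP columns of the 1024-bit tables above. The sketch below is only an illustration of that arithmetic; the class and method names are hypothetical and not part of this repository.

```java
import java.util.Locale;

// Illustrative helper (hypothetical name, not part of this repository).
final class FalsePositiveReduction {

    // Relative reduction of false positives against a baseline, in percent.
    static double reductionPercent(long baselineFp, long improvedFp) {
        return 100.0 * (baselineFp - improvedFp) / baselineFp;
    }

    public static void main(String[] args) {
        String[] cases = {"200*200", "400*400", "600*600",
                          "800*800", "1000*1000", "1200*1200"};
        // FP column of the CDK default Fingerprinter table (1024 bits).
        long[] cdkFp = {298, 1691, 3765, 7522, 13922, 19262};
        // FP column of the improved fingerprinter table (1024 bits).
        long[] improvedFp = {189, 972, 2449, 5083, 9011, 12727};

        for (int i = 0; i < cases.length; i++) {
            System.out.printf(Locale.ROOT, "%-10s %.1f%%%n",
                    cases[i], reductionPercent(cdkFp[i], improvedFp[i]));
        }
    }
}
```

On these figures the reduction lies between about 32% (800*800) and about 43% (400*400).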

/* Test new FP with ring matcher */

java -jar BenchmarkHashedFingerprinter.jar test/data/mol hash  1  2000

------------------------------------------------------------------------------
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
------------------------------------------------------------------------------
200*200     629     144     39227     0     0.996       1.000   0.004   0.1
400*400     2381    842     156777    0     0.995       1.000   0.005   0.34
600*600     4882    2161    352957    0     0.994       1.000   0.006   0.71
800*800     8484    4477    627039    0     0.993       1.000   0.007   1.2
1000*1000   12710   7977    979313    0     0.992       1.000   0.008   1.97
1200*1200   17070   11429   1411501   0     0.992       1.000   0.008   2.82

The improved hashed fingerprinter with the ring matcher has higher accuracy
and at least ~40% fewer false positives (FPs) than the original version!
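
The FN column is 0 in every table above. That is what one would expect if the benchmark uses the fingerprints as a substructure pre-screen, where a query can only match a target whose fingerprint contains all of the query's bits. This README does not spell out the procedure, so the sketch below is an assumption about how one pair might be classified, written with plain java.util.BitSet. The class and method names are hypothetical, and the real substructure check is left out and passed in as a boolean.

```java
import java.util.BitSet;

// Illustrative sketch (hypothetical names, not code from this repository).
final class FingerprintScreen {

    // True when every bit set in the query is also set in the target,
    // i.e. the target may contain the query as a substructure.
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);   // keep only query bits absent from target
        return missing.isEmpty();
    }

    // Label one target/query pair, given the screen result and the outcome
    // of an exact substructure search (assumed to be computed elsewhere).
    static String classify(boolean screenPasses, boolean isSubstructure) {
        if (screenPasses) {
            return isSubstructure ? "TP" : "FP";
        }
        return isSubstructure ? "FN" : "TN";
    }

    public static void main(String[] args) {
        BitSet target = new BitSet(1024);
        target.set(1);
        target.set(5);
        target.set(9);

        BitSet query = new BitSet(1024);
        query.set(1);
        query.set(9);

        boolean passes = mayContain(target, query);
        // The screen passes; if the exact search then finds no match,
        // the pair counts as a false positive.
        System.out.println(classify(passes, false));
    }
}
```

Under this assumption, a lower FP count means fewer pairs that pass the screen but then fail the exact substructure search.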

/* Test new FP with bloom filter and ring matcher */

java -jar BenchmarkHashedFingerprinter.jar test/data/mol hashbloom  1  2000