3. Example Usage with predefined Fragments
For this example, the 20 most frequent functional groups among 407270 natural products from the COCONUT database [2] were identified beforehand with the ErtlFunctionalGroupsFinder [1]. These 20 functional groups serve as the user-defined key fragments of the fingerprint (dimensionality = 20). The functional groups of the example molecule veratrum aldehyde are then identified, and the bit and count fragment fingerprints are generated from the key fragments and these molecule fragments.
The first step is to define the user-defined key fragments and pass them to the fingerprinter during initialization:
//user-defined key fragments of the 20 most frequently identified functional groups in 407270 COCONUT natural products
ArrayList<String> tmpKeyFragments = new ArrayList<>(20);
tmpKeyFragments.add("[H]OC");
tmpKeyFragments.add("*O*");
tmpKeyFragments.add("[H]Oc");
tmpKeyFragments.add("C=C");
tmpKeyFragments.add("*OC(*)=O");
tmpKeyFragments.add("*n(*)*");
tmpKeyFragments.add("*C(=O)N(*)*");
tmpKeyFragments.add("*OCO*");
tmpKeyFragments.add("*o*");
tmpKeyFragments.add("O=[cH2]");
tmpKeyFragments.add("*C(*)=O");
tmpKeyFragments.add("*C(=O)O[H]");
tmpKeyFragments.add("*N(*)*");
tmpKeyFragments.add("*F");
tmpKeyFragments.add("*N(*)[H]");
tmpKeyFragments.add("*OC(=O)C=C");
tmpKeyFragments.add("*Cl");
tmpKeyFragments.add("*C(=O)C=C");
tmpKeyFragments.add("[H]N([H])C");
tmpKeyFragments.add("*Br");
//initialize the fingerprinter with the key fragments
FragmentFingerprinter tmpFragmentFingerprinter = new FragmentFingerprinter(tmpKeyFragments);
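The fingerprint arrays shown in this section are consistent with each key fragment occupying the fingerprint position that matches its index in the list above. The following plain-Java sketch, which does not use the library, looks up these positions. The class `KeyFragmentPositions` and the method `buildPositionMap` are hypothetical names chosen for illustration; they are not part of FragmentFingerprinter, and the assumption that list order defines the bit positions is inferred from the example output only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the FragmentFingerprinter library.
class KeyFragmentPositions {

    // The 20 key fragments in the same order as in the example above.
    static final List<String> KEY_FRAGMENTS = List.of(
            "[H]OC", "*O*", "[H]Oc", "C=C", "*OC(*)=O",
            "*n(*)*", "*C(=O)N(*)*", "*OCO*", "*o*", "O=[cH2]",
            "*C(*)=O", "*C(=O)O[H]", "*N(*)*", "*F", "*N(*)[H]",
            "*OC(=O)C=C", "*Cl", "*C(=O)C=C", "[H]N([H])C", "*Br");

    // Maps every key fragment to its zero-based index in the given list.
    // Assumption: the index in the list is the position in the fingerprint.
    public static Map<String, Integer> buildPositionMap(List<String> keyFragments) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < keyFragments.size(); i++) {
            // keep the first index if a fragment string occurs more than once
            positions.putIfAbsent(keyFragments.get(i), i);
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, Integer> positions = buildPositionMap(KEY_FRAGMENTS);
        System.out.println(positions.get("*O*"));     // prints 1
        System.out.println(positions.get("*C(*)=O")); // prints 10
        System.out.println(positions.get("*Cl"));     // prints 16
    }
}
```

Under this assumption, position 16 corresponds to the key fragment `*Cl`.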
Next, molecule fragments are defined and passed to the respective fragment fingerprinter method for generating the bit or count fingerprint:
//molecule fragments in a list to create the bit fingerprint (here: functional groups of veratrum aldehyde)
ArrayList<String> tmpMoleculeFragmentsList = new ArrayList<>();
tmpMoleculeFragmentsList.add("*C(*)=O");
tmpMoleculeFragmentsList.add("*O*");
tmpMoleculeFragmentsList.add("*Cl");
//method call to create the bit fingerprint
IBitFingerprint tmpBitFingerprint = tmpFragmentFingerprinter.getBitFingerprint(tmpMoleculeFragmentsList);
tmpBitFingerprint.cardinality(); //returns 3, the number of positive bits.
tmpFragmentFingerprinter.getBitArray(tmpMoleculeFragmentsList);
// returns bit fingerprint of veratrum aldehyde: [0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0]
//molecule fragments and their frequencies in a HashMap to create the count fingerprint (here: functional groups of veratrum aldehyde and their frequencies)
HashMap<String, Integer> tmpMoleculeFragmentsToFrequenciesMap = new HashMap<>();
tmpMoleculeFragmentsToFrequenciesMap.put("*C(*)=O", 1);
tmpMoleculeFragmentsToFrequenciesMap.put("*O*", 2);
tmpMoleculeFragmentsToFrequenciesMap.put("*Cl", 2);
//method call to create the count fingerprint
ICountFingerprint tmpCountFingerprint = tmpFragmentFingerprinter.getCountFingerprint(tmpMoleculeFragmentsToFrequenciesMap);
tmpCountFingerprint.getCount(16);
tmpFragmentFingerprinter.getCountArray(tmpMoleculeFragmentsToFrequenciesMap);
// returns count fingerprint of veratrum aldehyde: [0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,0]
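To make the relationship between key fragments, molecule fragments, and the two arrays above easier to follow, here is a minimal plain-Java sketch that reproduces both results without the library. It assumes that fragments are compared by exact string equality and that molecule fragments which are not key fragments are ignored; neither assumption is confirmed here for FragmentFingerprinter itself. The class `FragmentFingerprintSketch` and its methods `toBitArray` and `toCountArray` are hypothetical names, not library API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration only; not the FragmentFingerprinter implementation.
class FragmentFingerprintSketch {

    // The 20 key fragments in the same order as in the example above.
    static final List<String> KEY_FRAGMENTS = List.of(
            "[H]OC", "*O*", "[H]Oc", "C=C", "*OC(*)=O",
            "*n(*)*", "*C(=O)N(*)*", "*OCO*", "*o*", "O=[cH2]",
            "*C(*)=O", "*C(=O)O[H]", "*N(*)*", "*F", "*N(*)[H]",
            "*OC(=O)C=C", "*Cl", "*C(=O)C=C", "[H]N([H])C", "*Br");

    // Sets a 1 at the position of every key fragment that occurs in the molecule.
    public static int[] toBitArray(List<String> keyFragments, List<String> moleculeFragments) {
        int[] bits = new int[keyFragments.size()];
        for (String fragment : moleculeFragments) {
            int index = keyFragments.indexOf(fragment); // exact string match (assumption)
            if (index >= 0) {                           // non-key fragments are ignored (assumption)
                bits[index] = 1;
            }
        }
        return bits;
    }

    // Writes the frequency of every key fragment that occurs in the molecule.
    public static int[] toCountArray(List<String> keyFragments, Map<String, Integer> frequencies) {
        int[] counts = new int[keyFragments.size()];
        for (Map.Entry<String, Integer> entry : frequencies.entrySet()) {
            int index = keyFragments.indexOf(entry.getKey());
            if (index >= 0) {
                counts[index] = entry.getValue();
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] bits = toBitArray(KEY_FRAGMENTS, List.of("*C(*)=O", "*O*", "*Cl"));
        System.out.println(Arrays.toString(bits));
        // prints [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]

        int[] counts = toCountArray(KEY_FRAGMENTS, Map.of("*C(*)=O", 1, "*O*", 2, "*Cl", 2));
        System.out.println(Arrays.toString(counts));
        // prints [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0]
    }
}
```

The bit array records only whether a key fragment is present, while the count array keeps how often it occurs; both have the same length as the key fragment list.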
[1] ErtlFunctionalGroupsFinder: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0361-8
[2] COCONUT database: https://coconut.naturalproducts.net/