
3. Example Usage with predefined Fragments

BetuelSevindik1 edited this page Jun 5, 2023 · 6 revisions

Example initialization and usage of the FragmentFingerprinter

For this example, the 20 most frequent functional groups in 407270 natural products from the COCONUT database [2] were identified beforehand with the ErtlFunctionalGroupsFinder [1]. These 20 functional groups serve as the user-defined key fragments of the fingerprint (dimensionality = 20). The functional groups of the example molecule veratrum aldehyde are then identified and used as the molecule fragments. From the key fragments and the molecule fragments, the bit and count fragment fingerprints are generated.
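
The relationship between key fragments and fingerprint positions can be illustrated independently of the library with a minimal plain-Java sketch. It assumes that each key fragment occupies the position given by its order in the key list and that a bit is set when the molecule contains an identical fragment string. The class `BitFingerprintSketch` and its method `toBitArray` are hypothetical names chosen for illustration; they are not part of the FragmentFingerprinter API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the FragmentFingerprinter implementation.
class BitFingerprintSketch {

    // Assumption: the position of a key fragment is its index in the key list;
    // a bit is 1 if the molecule contains an identical fragment string.
    static int[] toBitArray(List<String> keyFragments, List<String> moleculeFragments) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < keyFragments.size(); i++) {
            positions.put(keyFragments.get(i), i);
        }
        int[] bits = new int[keyFragments.size()];
        for (String fragment : moleculeFragments) {
            Integer position = positions.get(fragment);
            if (position != null) {
                bits[position] = 1; // fragments that are not key fragments are ignored
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Shortened key list (4 instead of 20 fragments) to keep the sketch small
        List<String> keys = Arrays.asList("[H]OC", "*O*", "C=C", "*Cl");
        List<String> molecule = Arrays.asList("*O*", "*Cl", "*S*");
        System.out.println(Arrays.toString(toBitArray(keys, molecule)));
        // prints [0, 1, 0, 1]
    }
}
```

The fragment `*S*` is not among the keys and therefore leaves the fingerprint unchanged, which is why the choice of key fragments determines what the fingerprint can express.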

  1. First, define the key fragments and pass them to the fingerprinter during initialization:

    //user-defined key fragments of the 20 most frequently identified functional groups in 407270 COCONUT natural products

    ArrayList<String> tmpKeyFragments = new ArrayList<>(20);
    tmpKeyFragments.add("[H]OC");
    tmpKeyFragments.add("*O*");
    tmpKeyFragments.add("[H]Oc");
    tmpKeyFragments.add("C=C");
    tmpKeyFragments.add("*OC(*)=O");
    tmpKeyFragments.add("*n(*)*");
    tmpKeyFragments.add("*C(=O)N(*)*");
    tmpKeyFragments.add("*OCO*");
    tmpKeyFragments.add("*o*");
    tmpKeyFragments.add("O=[cH2]");
    tmpKeyFragments.add("*C(*)=O");
    tmpKeyFragments.add("*C(=O)O[H]");
    tmpKeyFragments.add("*N(*)*");
    tmpKeyFragments.add("*F");
    tmpKeyFragments.add("*N(*)[H]");
    tmpKeyFragments.add("*OC(=O)C=C");
    tmpKeyFragments.add("*Cl");
    tmpKeyFragments.add("*C(=O)C=C");
    tmpKeyFragments.add("[H]N([H])C");
    tmpKeyFragments.add("*Br");


    //initialize the fingerprinter with the key fragments

    FragmentFingerprinter tmpFragmentFingerprinter = new FragmentFingerprinter(tmpKeyFragments);

  2. Next, molecule fragments are defined and passed to the respective fragment fingerprinter method for generating the bit or count fingerprint:

    //molecule fragments in a list to create the bit fingerprint (here: functional groups of veratrum aldehyde)

    ArrayList<String> tmpMoleculeFragmentsList = new ArrayList<>();
    tmpMoleculeFragmentsList.add("*C(*)=O");
    tmpMoleculeFragmentsList.add("*O*");
    tmpMoleculeFragmentsList.add("*Cl");


    //method call to create the bit fingerprint

    IBitFingerprint tmpBitFingerprint = tmpFragmentFingerprinter.getBitFingerprint(tmpMoleculeFragmentsList);
    tmpBitFingerprint.cardinality(); //returns 3, the number of positive bits
    tmpFragmentFingerprinter.getBitArray(tmpMoleculeFragmentsList);
    // returns bit fingerprint of veratrum aldehyde: [0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0]


    //molecule fragments and their frequencies in a HashMap to create the count fingerprint (here: functional groups of veratrum aldehyde and their frequencies)

    HashMap<String, Integer> tmpMoleculeFragmentsToFrequenciesMap = new HashMap<>();
    tmpMoleculeFragmentsToFrequenciesMap.put("*C(*)=O", 1);
    tmpMoleculeFragmentsToFrequenciesMap.put("*O*", 2);
    tmpMoleculeFragmentsToFrequenciesMap.put("*Cl", 2);


    //method call to create the count fingerprint

    ICountFingerprint tmpCountFingerprint = tmpFragmentFingerprinter.getCountFingerprint(tmpMoleculeFragmentsToFrequenciesMap);
    tmpCountFingerprint.getCount(16); //count at fingerprint position 16 (key fragment "*Cl")
    tmpFragmentFingerprinter.getCountArray(tmpMoleculeFragmentsToFrequenciesMap);
    // returns count fingerprint of veratrum aldehyde: [0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,0]
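
The arrays above can be reproduced without the library in a short plain-Java sketch. It assumes, as the results above suggest, that a key fragment's position is its index in the key list and that the count at a position is the frequency given for that fragment. `CountFingerprintSketch` and `toCountArray` are hypothetical names for illustration and not part of the FragmentFingerprinter API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the FragmentFingerprinter implementation.
class CountFingerprintSketch {

    // The 20 key fragments from step 1, in the same order
    static final List<String> KEY_FRAGMENTS = Arrays.asList(
            "[H]OC", "*O*", "[H]Oc", "C=C", "*OC(*)=O",
            "*n(*)*", "*C(=O)N(*)*", "*OCO*", "*o*", "O=[cH2]",
            "*C(*)=O", "*C(=O)O[H]", "*N(*)*", "*F", "*N(*)[H]",
            "*OC(=O)C=C", "*Cl", "*C(=O)C=C", "[H]N([H])C", "*Br");

    // Assumption: count at position i = frequency of key fragment i, or 0 if absent
    static int[] toCountArray(List<String> keyFragments, Map<String, Integer> fragmentToFrequency) {
        int[] counts = new int[keyFragments.size()];
        for (int i = 0; i < keyFragments.size(); i++) {
            counts[i] = fragmentToFrequency.getOrDefault(keyFragments.get(i), 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> frequencies = new HashMap<>();
        frequencies.put("*C(*)=O", 1);
        frequencies.put("*O*", 2);
        frequencies.put("*Cl", 2);
        System.out.println(Arrays.toString(toCountArray(KEY_FRAGMENTS, frequencies)));
        // prints [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0]
    }
}
```

Replacing every non-zero count with 1 yields the bit array shown for the same molecule, so the bit fingerprint can be read as a thresholded count fingerprint.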


[1] ErtlFunctionalGroupsFinder: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0361-8

[2] COCONUT database: https://coconut.naturalproducts.net/