out of memory when getting weights from molecule consisting of chloridehydrochloride salts #737

biotech7 · 2021-07-19T08:05:24Z

When calculating atomic weights for a molecule that contains chloridehydrochloride salts, the call runs out of heap space. The code is as follows:

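        import org.openscience.cdk.graph.invariant.HuLuIndexTool;
        import org.openscience.cdk.interfaces.IAtomContainer;
        import org.openscience.cdk.interfaces.IChemObjectBuilder;
        import org.openscience.cdk.silent.SilentChemObjectBuilder;
        import org.openscience.cdk.smiles.SmilesParser;

        IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
        SmilesParser smipar = new SmilesParser(bldr);
        // Two disconnected components: the chloride and the organic part.
        IAtomContainer mol = smipar.parseSmiles("Cl.O=S(=O)(C1=CC=C(NN)C(=C1)C)C");
        double[] cdkWeights = new double[0];
        try {
            cdkWeights = HuLuIndexTool.getAtomWeights(mol);
        } catch (Exception e) {
            // OutOfMemoryError is an Error, not an Exception, so it is not caught here.
            System.out.println(e);
        }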

Error:

        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
            at org.openscience.cdk.graph.invariant.HuLuIndexTool.getAtomWeights(HuLuIndexTool.java:135)

It most likely enters an endless loop, which results in the memory overflow.

johnmay · 2021-07-20T13:55:02Z

As a rule of thumb, most algorithms in the CDK-EXTRA modules are a bit wonky and not well maintained. Since this index tool uses shortest paths between nodes, any molecule with more than one component will fail (there is no path between the two components). I don't know enough about the algorithm to say for sure, but a simple fix would be to split the molecule up and process each component on its own (e.g. ConnectivityChecker.partitionIntoMolecules()).
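
To illustrate the splitting idea, here is a minimal plain-Java sketch that groups atom indices into connected components with one breadth-first search per component. It does not use CDK and is not how `ConnectivityChecker` is implemented; the class name `ComponentSplitSketch`, the `BONDS` table, and the atom numbering (heavy atoms of the SMILES from the report, in reading order, with Cl as atom 0) are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ComponentSplitSketch {

    // Hypothetical bond table for the heavy atoms of
    // "Cl.O=S(=O)(C1=CC=C(NN)C(=C1)C)C"; atom 0 (Cl) has no bonds.
    static final int ATOM_COUNT = 14;
    static final int[][] BONDS = {
        {1, 2}, {2, 3}, {2, 4}, {4, 5}, {5, 6}, {6, 7}, {7, 8},
        {8, 9}, {7, 10}, {10, 11}, {11, 4}, {10, 12}, {2, 13}
    };

    /** Builds an undirected adjacency list from atom-index pairs. */
    static List<List<Integer>> adjacency(int atomCount, int[][] bonds) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] b : bonds) {
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
        return adj;
    }

    /** Groups atom indices into connected components, one BFS per component. */
    static List<List<Integer>> partition(List<List<Integer>> adj) {
        boolean[] seen = new boolean[adj.size()];
        List<List<Integer>> components = new ArrayList<>();
        for (int start = 0; start < adj.size(); start++) {
            if (seen[start]) {
                continue; // already assigned to an earlier component
            }
            List<Integer> component = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            seen[start] = true;
            while (!queue.isEmpty()) {
                int v = queue.poll();
                component.add(v);
                for (int w : adj.get(v)) {
                    if (!seen[w]) {
                        seen[w] = true;
                        queue.add(w);
                    }
                }
            }
            components.add(component);
        }
        return components;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = partition(adjacency(ATOM_COUNT, BONDS));
        // Each part could then be handed to the index calculation separately.
        for (List<Integer> part : parts) {
            System.out.println(part.size() + " atoms: " + part);
        }
    }
}
```

For the reported molecule this yields two groups, the lone chloride and the 13-atom organic part, so a shortest-path based calculation would never be asked for a path between them.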

It actually doesn't look too bad to fix. I would probably update FloydAPSP to use -1 to indicate "not connected" rather than 999,999. Multiple BFSs are actually more efficient. The problem is whether these would then still be valid HuLuIndexes. More to the point, why do you need HuLuIndexes? Why not use InChI numbering, for example?
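
The "-1 means not connected" idea can be sketched in plain Java as follows. This is a generic Floyd-Warshall all-pairs shortest-path routine written for illustration, not the code of CDK's FloydAPSP or of the eventual patch; the class name `ApspSketch`, the helper `bondMatrix`, and the toy molecule "Cl.CCO" (atoms numbered 0 to 3) are assumptions.

```java
final class ApspSketch {

    static final int NOT_CONNECTED = -1;

    /** Builds a symmetric bond matrix from atom-index pairs. */
    static boolean[][] bondMatrix(int atomCount, int[][] bonds) {
        boolean[][] bonded = new boolean[atomCount][atomCount];
        for (int[] b : bonds) {
            bonded[b[0]][b[1]] = true;
            bonded[b[1]][b[0]] = true;
        }
        return bonded;
    }

    /** Floyd-Warshall on an unweighted graph; unreachable pairs stay at -1. */
    static int[][] allPairsShortestPaths(boolean[][] bonded) {
        int n = bonded.length;
        int[][] dist = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = (i == j) ? 0 : (bonded[i][j] ? 1 : NOT_CONNECTED);
            }
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    // A route through k exists only if both legs exist.
                    if (dist[i][k] == NOT_CONNECTED || dist[k][j] == NOT_CONNECTED) {
                        continue;
                    }
                    int via = dist[i][k] + dist[k][j];
                    if (dist[i][j] == NOT_CONNECTED || via < dist[i][j]) {
                        dist[i][j] = via;
                    }
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // "Cl.CCO": atom 0 is the isolated Cl, atoms 1-2-3 form the C-C-O chain.
        int[][] dist = allPairsShortestPaths(bondMatrix(4, new int[][] {{1, 2}, {2, 3}}));
        System.out.println(dist[0][2]); // prints -1 (no path across components)
        System.out.println(dist[1][3]); // prints 2
    }
}
```

Callers can then test for `NOT_CONNECTED` explicitly instead of treating a large sentinel such as 999,999 as a real path length. Floyd-Warshall is used here only because it is short; running one BFS per atom gives the same distances for an unweighted graph.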

egonw · 2021-07-20T20:05:19Z

@rajarshi, didn't you use the HuLuIndexTool in the past?

biotech7 · 2021-07-21T01:55:52Z

@johnmay thanks for your advice, I'll try it. I use HuLuIndexTool for a specific goal that InChINumbersTools.getNumbers() cannot achieve, mainly because InChI numbering follows its own nomenclature rules, while HuLuIndexTool only provides unique labels/weights.

johnmay · 2021-07-21T09:23:03Z

> HuLuIndexTool only provides unique labels/weights.

Hmm not sure they're unique :-)

johnmay · 2021-07-21T09:23:42Z

I've got a patch BTW to handle multiple components, not too bad in the end

biotech7 · 2021-07-21T13:06:28Z

Cheers!
In some cases, HuLuIndexTool doesn't generate unique weights for every atom when a molecule has symmetric atoms, e.g. toluene.
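
As a rough illustration of why symmetric atoms end up with equal values, here is a small plain-Java sketch. It uses a Morgan-style extended-connectivity refinement rather than the Hu-Lu weights themselves, and the class name `SymmetrySketch` and the atom numbering for toluene are assumptions made for the example. Any value derived only from the bond graph behaves the same way for symmetry-equivalent atoms.

```java
import java.util.Arrays;

final class SymmetrySketch {

    // Toluene heavy atoms: 0 = methyl C, 1 = ipso C, 2 and 6 = ortho,
    // 3 and 5 = meta, 4 = para.
    static final int[][] TOLUENE_BONDS = {
        {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {6, 1}
    };

    /**
     * Starts from each atom's degree, then replaces every value with the
     * sum of its neighbours' values, repeated for the given number of rounds.
     */
    static int[] extendedConnectivity(int atomCount, int[][] bonds, int rounds) {
        int[] value = new int[atomCount];
        for (int[] b : bonds) {
            value[b[0]]++;
            value[b[1]]++;
        }
        for (int r = 0; r < rounds; r++) {
            int[] next = new int[atomCount];
            for (int[] b : bonds) {
                next[b[0]] += value[b[1]];
                next[b[1]] += value[b[0]];
            }
            value = next;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] ec = extendedConnectivity(7, TOLUENE_BONDS, 2);
        System.out.println(Arrays.toString(ec)); // prints [5, 13, 9, 9, 8, 9, 9]
    }
}
```

The two ortho carbons (atoms 2 and 6) and the two meta carbons (atoms 3 and 5) receive identical values, and no number of further rounds can separate them, because the graph itself does not distinguish them.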

biotech7 · 2021-07-21T13:42:38Z

BTW, does CDK have a module for calculating pKa values of small molecules? I tried searching for this feature, but in vain.

egonw · 2021-08-01T06:32:28Z

> BTW, does CDK have a module for calculating pKa values of small molecules? I tried searching for this feature, but in vain.

Maybe time to revisit https://chem-bla-ics.blogspot.com/2008/10/pka-prediction-or-how-to-convert-jcim.html

(but I think the issue can be closed, correct?)

johnmay closed this as completed Jul 21, 2021

johnmay reopened this Jul 21, 2021

egonw closed this as completed Aug 1, 2021

johnmay mentioned this issue Feb 8, 2022

Compute correct HuLuIndex values and add a test case for a closed bug… #832

Merged
