Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

out of memory when getting weights from molecule consisting of chloridehydrochloride salts #737

Closed
biotech7 opened this issue Jul 19, 2021 · 8 comments · Fixed by #832
Closed

Comments

@biotech7
Copy link

biotech7 commented Jul 19, 2021

when calculating atmoic weights from a molecule which contains chloridehydrochloride salts, it results in heap space problem. codes as follow:

        IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
        SmilesParser smipar = new SmilesParser(bldr);
        IAtomContainer mol = smipar.parseSmiles("Cl.O=S(=O)(C1=CC=C(NN)C(=C1)C)C");
        double[] cdkWeights = new double[0];
        try {
            cdkWeights = HuLuIndexTool.getAtomWeights(mol);
        } catch (Exception e) {
            System.out.println(e);
        }

errors :

         Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.openscience.cdk.graph.invariant.HuLuIndexTool.getAtomWeights(HuLuIndexTool.java:135)

it mostly likes entering in an endless loop resulting in memory overflow.

@johnmay
Copy link
Member

johnmay commented Jul 20, 2021

As a rule of thumb most algorithms in the CDK-EXTRA modules are bit wonky and not maintained well. Since this index tool uses shortest paths between node any molecule with more than one component will fail (since there is no path between the two components). I don't know enough about the algorithm to but a simple fix would be split the molecule up and process each component on it's own (e.g. ConnectivityChecker.partitionIntoMolecules()).

It actually doesn't look too bad to fix, I would probably update FloydAPSP with -1 to indicate not connected rather than 999,999. Multiple BFS's is actually more efficient. The problem is would these then still be valid HuLuIndexes. More to the point why did you need HuLuIndexes, why not use InChI numbering for example.

@egonw
Copy link
Member

egonw commented Jul 20, 2021

@rajarshi, didn't you use the HuluIndexTool in the past?

@biotech7
Copy link
Author

@johnmay thanks for your advice,I'll try it. I use HuLuIndexTool for a specific goal which
InChINumbersTools.getNumbers() cannot reach , ,mainly due to InChI numbering has its own nomenclature rule while HuLuIndexTool only provides unique labels/weights.

@johnmay
Copy link
Member

johnmay commented Jul 21, 2021

HuLuIndexTool only provides unique labels/weights.

Hmm not sure they're unique :-)

@johnmay
Copy link
Member

johnmay commented Jul 21, 2021

I've got a patch BTW to handle multiple components, not too bad in the end

@biotech7
Copy link
Author

cheers!
on some cases, HuLuIndexTool does'nt generate unique weights for every atom when a molecule has symmetric atoms, e.g toluene.

@biotech7
Copy link
Author

btw, does CDK has the module of calculating pKa value for small molecules? trying to search this feature but in vain.

@johnmay johnmay closed this as completed Jul 21, 2021
@johnmay johnmay reopened this Jul 21, 2021
@egonw
Copy link
Member

egonw commented Aug 1, 2021

btw, does CDK has the module of calculating pKa value for small molecules? trying to search this feature but in vain.

Maybe time to revisit https://chem-bla-ics.blogspot.com/2008/10/pka-prediction-or-how-to-convert-jcim.html

(but I think the issue can be close, correct?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants