New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maygen #811
Conversation
… new Atom() directly.
… we can put this in the test/ root
…t for now we only need this function.
…r our IAtomContainer's.
…nning on a background thread.
@@ -2594,21 +2590,8 @@ public void structureGenerator(String localFormula) { | |||
List<int[]> newDegrees = distributeHydrogens(); | |||
|
|||
if (multiThread) { | |||
try { | |||
new ForkJoinPool(size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size is important, it can't be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain, I asked @MehmetAzizYirik. My understanding this code is not doing anything since the parellel stream will not use the forjoin pool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay see here it really does work that way but is quite opaque/brittle - https://stackoverflow.com/questions/30802463/how-many-threads-are-spawned-in-parallelstream-in-java-8. "It should be noted that the Stream APIs use of ForkJoinPool is an implementation detail. Thus, both solutions work with Oracle’s current implementation but are not guaranteed to work everywhere."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why size and not -
ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that is the default anyways for ForkJoinPool in ParallelStream which is cleaner IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnmay we had that conversation with Valentyn about the size parameter, but I believe this needs to be changed like you explained we can use that other option you recommended.
Also, this parallelization actually needs to be enhanced but further tasks not for now. Thus, we dont need to keep it with the ForkJoinPool but can change it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain, I asked @MehmetAzizYirik. My understanding this code is not doing anything since the parellel stream will not use the forjoin pool
It is related for the bug MehmetAzizYirik/MAYGEN#48
In some configuration default value is 79. We need to control it not use default.
import java.util.logging.Level; | ||
import java.util.logging.Logger; | ||
|
||
public class MaygenCLI { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class may be removed. It is useless in the test folder.
|
||
@Test | ||
public void detectAllenes() { | ||
assertFalse(BoundaryConditions.detectAllenes(new int[][] {}, new String[] {})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit tests may be improved.
<dependency> | ||
<groupId>commons-cli</groupId> | ||
<artifactId>commons-cli</artifactId> | ||
<version>1.5.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this dependency.
|
||
@Test | ||
public void testExpectedCountMultithreaded() throws CDKException, IOException, CloneNotSupportedException { | ||
Maygen maygen = new Maygen(SilentChemObjectBuilder.getInstance()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need to define SilentChemObjectBuilder.getInstance()
in constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the structgen module is independent of implementation. Same reason it is provided to SmilesParser. Silent is faster than default which sends out notifications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be set parameter for class and default is SilentChemObjectBuilder.getInstance()
. It is easier to create object with constructor without arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module does not and should not depend on data/silent. It’s just the way CDK is - I would not write it this way if we started again but “this is the way”
* this is only used for formula validating and normalizing a parser would | ||
* be better. */ | ||
private static String replaceEach(String str, String[] from, String[] to) { | ||
if (from.length != to.length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be better to use other implementation with better error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to not use this function at all. Using string munging to validate a formula is not nice. Adding in a third party library for a simple function - no
@MehmetAzizYirik Why is the D2 graph not supported in SMILES?
|
It caused an error as I remember. |
I am reading this correct that the way the fuzzy formulae are handled is by enumerating non-fuzzy formula? Hence why there seems to be some vestigial stats displayed to TSV. |
@johnmay As far as I remember, one of our colleagues recommended not to print SMILES for D2 graphs but I do not remember now the chemical reason she explained about the SMILES of D2 graphs. I am not sure whether there was an error or not. |
@johnmay Yes, we needed the fuzzy formulae for another project, then I coded it like generating al possible standard formulae for given fuzzy formula. Can you please explain the thing about the TSV ? I did not get it. |
Just in the process of refactoring the IO - to me it looks like the TSVOUTPUT mode was probably just for the manuscript. I've left it as it is for now. Is a D2 graph just one with all atoms degree 2 right? So in SMILES C1CC1 but of course you could have other elements. |
@johnmay Just to be sure, I went through the code. Yes the SMILES could be given as string array as what we do in OnSm matrices. But I remember now the comment: "it is just a long list of S or O, and chemists are not interested in these molecules". I left it as how it is when I submitted the paper but in the future later we can change it like what we do with OnSm matrices. Same string array. Also TSV, I remember for benchmarking while writing the paper, Chris recommended it would be better to have TSV option and he added it. That is all about TSV. |
Okay… but then why is it useful in SDF? |
@johnmay There is no TSV usage in SDF, I checked the code but I think I did not understood what you meant. Can you please explain ? |
That was unrelated to the TSV comment sorry. What I'm trying to understand is if (onlyDegree2) {
if (oxygen == 0 || sulfur == 0) {
degree2graph(); // when is this called, why does it only output to SDF and not SMILES?
} else {
distributeSulfurOxygen(normalizedLocalFormula);
}
displayStatistic(startTime, normalizedLocalFormula);
} I presume this is only for custom valences since those are the only valence=2 default elements (well S should really be 2,4,6 but anyways). So my thoughts are something like this:
|
Remaining code smells would need a lot of work (i.e. wrapping up methods with larger number of params in a state object). One I can't work out is where the duplicated block is: https://sonarcloud.io/project/issues?id=cdk_cdk&pullRequest=811&resolved=false&rules=common-java%3ADuplicatedBlocks&types=CODE_SMELL |
It looks good except CLI class in test package and commons-cli dependency. They may be removed. |
That’s fine for now - intention is these will go back into the maygen project repo (preferred) or we can add an assembly descriptor. TBD |
@johnmay we can just do whatever you prefer like you explained above. I am working on the test changes now and will do the pull request to the maygen branch. And about parallelization, yes it needs changes like we talked but for now it can stay as how it is, I guess. |
@johnmay yes john, as you recommended, I will do the changes after the release. |
@johnmay I checked now the structgen package and run the mvn test but no data was written to disk I checked. Did I misunderstand what you meant ? |
@MehmetAzizYirik no I did it all in the end. If you need SDF and SMILES you just write a new consumer that does both. I didn’t do this because on the command line tool it looked like smiles output would only activate if sdf wasn’t specified. i’m very worn down with this PR now and still not 100% happy but it’s good enough now. |
Support both formats at the same time is important. |
You can literally convert one in to the other so no it is really NOT important and a stupid design. But yea it’s still possible to do as I explained. |
I’m turning off notifications on this now - @egonw will take over. |
Thanks a lot for this detailed and helpful PR. |
Sdf also supports coordinates. I think the console application will behave unexpectedly in case you will use both keys sdf and smiles. |
@javadev as I wrote above, if John changed it, then done. I did all my changes before the integration pull request, and after that, I left the code to CDK developers. It is a great pleasure for me if they do any changes on it and no need of any other changes after them. |
I added support for the sdf and smi generator #818 |
@javadev I as the main developer of MAYGEN , I dont want any other changes from someone else while CDK developers are working on it. And I prefer to keep it as how John changed! As it was your task in MAYGEN to clean code or format, if there is a need, we will let you know but besides that please do not resist to the changes of CDK developers, especially you are not in the position to criticize cheminformatics related work. I left the code to them after my changes and it is a pleasure if they enhance the code. |
Added support for the smiles and sdf generator.
Hi @MehmetAzizYirik don't worry. These changes were mentioned by @johnmay in comments. Changes have been approved btw. |
Kudos, SonarCloud Quality Gate passed! |
All, thank you very much for all your hard work, peer review, and positiveness. Very much appreciated! |
@egonw Thanks to John and you for the guidance, this PR and all the changes you did. So happy that MAYGEN is integrated to CDK as planned in my PhD. |
Rebased - maygen PR.