-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrating MAYGEN project to structgen package #767
Conversation
MehmetAzizYirik
commented
Nov 28, 2021
- Integrating maygen source and test files to structgen
- Improved also the SDFWriter class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some initial comments.
*/ | ||
package org.openscience.cdk.structgen.maygen; | ||
|
||
class BoundaryConditions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add JavaDoc.
* No triple bonds | ||
* | ||
* @param mat the adjacency matrix | ||
* @return boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete JavaDoc.
check = true; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace this with return true
?
@@ -0,0 +1,103 @@ | |||
/* | |||
MIT License |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
/* | ||
MIT License | ||
|
||
Copyright (c) 2021 Mehmet Aziz Yirik |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Traditionally, you'd add your email here but consider instead <0000-YOUR-ORCI-D_ID@orcid.org>
import org.openscience.cdk.group.Permutation; | ||
import org.openscience.cdk.interfaces.IAtomContainer; | ||
|
||
class Generation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz add JavaDoc
class Generation { | ||
private final Maygen maygen; | ||
|
||
public Generation(Maygen maygen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz add JavaDoc
this.maygen = maygen; | ||
} | ||
|
||
public void run(int[] degree) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz add JavaDoc
private final SmilesGenerator smilesGenerator = new SmilesGenerator(SmiFlavor.Unique); | ||
private IAtomContainer atomContainer = builder.newInstance(IAtomContainer.class); | ||
|
||
public Maygen() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the valences as an configuration option?
@@ -273,7 +273,6 @@ private void writeMolecule(IAtomContainer container) throws CDKException { | |||
mdlWriter.addSettings(getSettings()); | |||
mdlWriter.write(container); | |||
mdlWriter.close(); | |||
writer.write(stringWriter.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnmay, you may want to check out the changes in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They look fine, I had a similar patching waiting to find time to apply
All looks good to me, I think I may tweak some things like naming once merged (i.e. just keep it under ../cdk/structgen/ rather than add another subpackage). But that can all happen after being applied providing it's before release. |
@johnmay, we wait for a bit more JavaDoc? or also after merge? |
Don't mind - I guess some would be good. Also just seen the MIT license, so we going to dual license Egon? Sounds complicated |
we already have other licenses in the CDK. I agree we don't communicate this well. something to think about. this is why copyright/license headers in the source files help. |
@egonw @johnmay If possible and if needed, we can also change the license as how you wish. I am currently writing the javadocs and will psuh changes to the pull request today. |
@MehmetAzizYirik, okay, getting compile errors: https://github.com/cdk/cdk/runs/4372594354?check_suite_focus=true |
Looks like Maygen.java is trying to import the Commons CLI package. Is the command line component still needed? (build failed) Not a blocker (as Egon says it's a mess already :-)) but I think it's cleaner to relicense as LGPL as it just makes things simpler to keep track of. |
@egonw @johnmay Yes, it was about the maygen class. The compiling issue is solved, Valentyn ( @javadev ) updated the dependencies. I committed that changes. |
Personally, I would prefer CLI tools in a separate module. @johnmay, what are your ideas about that?
Did I say that? :) No, I would not say "mess"... we just have a few source files that are MIT and/or public domain.
Yes, agreed. |
Yes CLI separately - prompts me actually to just add some other CLIs that I have floating around, I’ll try and find time tonight to fix this up. |
I'd love that! |
Not got time tonight but hopefully tomorrow. The dependency changes are very suspect, i.e. structgen depending on cdk-pdb and cdk-group. May be the case but seems odd. Likewise the Vecmath should be transient via the cdk-interfaces since it's mainly for the Point2d etc. As noted, we are a library so adding a dependencies on commons-cli sets this further back. |
Closing - now #801 |
Okay @MehmetAzizYirik reopened this one as requested. I have rebased here and am working on it here: https://github.com/cdk/cdk/tree/maygen. List may grow but here is things I will be doing:
|
@johnmay Thanks John for the detailed list. Since you said " I ll be doing", I dont wanna do changes without informing. For now if that is ok for you, I can take a look at the last three in the list ? @javadev And for the changes Valentyn did in the other pull request, if they are in the format of CDK and if they are good to integrate, maybe he can contribute that changes here ? So with your feedbacks, we will have the clear good format for MAYGEN and make it ready for the integration to the CDK. Thanks a lot! |
Because I needed to rebase due to the large number of changes (related to Log4j and dependency purging) I can't merge back into this branch so will create another PR - sound okay? I think just the test fixing would be good, the atomic number is a nice to have and probably quite involved. |
@johnmay Absolutely okay whatever you recommend. :) I will look at the test fixing. |
Unrelated to anything that needs changing but could you explain this logic. I do most multithreading in C/C++/pthreads now a days so a little rusty on ForkJoinPool but this looks like it creates a redundant thread. try {
new ForkJoinPool(size)
.submit(
() ->
newDegrees
.parallelStream()
.forEach(new Generation(this)::run))
.get();
} catch (InterruptedException | ExecutionException ex) {
if (verbose) {
Logger.getLogger(Maygen.class.getName())
.log(Level.SEVERE, ex, () -> "Formula " + localFormula);
}
Thread.currentThread().interrupt();
} i.e. I believe it's equivalent to this: Thread t = new Thread(() ->
newDegrees
.parallelStream()
.forEach(new Generation(this)::run)
);
t.start();
t.join(); which other than spawning a redundant thread is the same as just calling: newDegrees .parallelStream().forEach(new Generation(this)::run) |
runs faster in the tests :p |
But only slightly |
@MehmetAzizYirik Could you check this commit looks reasonable: a0dfd2b. Providing the logic is correct I can now move the IO outside of Maygen |
We just parallelized MAYGEN fork join after reading some sources but this can be even changed if there is a better option. The logic is, after getting the pre-hydrogen distribution, we have a new list of degrees and we parallelized MAYGEN based on this new degree list. But again, if you think that can be changed, please feel free to do that changes. I am open to learn better methods because I dont think this parallelization is efficient enough. |
@johnmay Thanks a lot John! I went though the changes and looks cleaner and perfect! |
Awesome almost done then, just need to move the IO to a separate functor. The tests could simply be tagged as "slow" in which case they don't get run as part of the main suite. Ideally we want to run as many tests as possible for each build but it's okay if some are only run occasionally. Using an Executor service may be better for the multi-threading but I need to look more at the code and profile. For now removing the redundant ForkJoinPool saves some time on the tests at least. Is it possible, to reuse a Maygen object? May be able to make the tests faster using simple parameterisation. |
In-fact SonarCloud even recommends using Parameterized tests :-). |
Infact the simplest thing to do is not run the tests twice :-) Maygen maygen = new Maygen(SilentChemObjectBuilder.getInstance());
maygen.setFormula("C3O6PH5");
maygen.run();
assertEquals(51323, maygen.getCount());
maygen.setMultiThread(true);
maygen.run();
assertEquals(51323, maygen.getCount()); |
Well actually it is testing both the single and multithread within the same tests rather than making them separate. If you recommend something else for testi multithread, we can modify that is also fine. |
About parallelization, it needs better approach actually and we can change it as how you recommend. About the parameterisation, I agree it will be better if we reuse the maygen object. |
I've made the parameterized tests now and actually allows us to mark the single threaded as slow. And only takes 1s or so to run them all multithreaded |
Now #811 |