Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating MAYGEN project to structgen package #767

Closed
wants to merge 3 commits into from

Conversation

MehmetAzizYirik
Copy link
Contributor

  • Integrating maygen source and test files to structgen
  • Improved also the SDFWriter class

Copy link
Member

@egonw egonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some initial comments.

*/
package org.openscience.cdk.structgen.maygen;

class BoundaryConditions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add JavaDoc.

* No triple bonds
*
* @param mat the adjacency matrix
* @return boolean
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete JavaDoc.

Comment on lines +66 to +67
check = true;
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this with return true ?

@@ -0,0 +1,103 @@
/*
MIT License
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

/*
MIT License

Copyright (c) 2021 Mehmet Aziz Yirik
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Traditionally, you'd add your email here but consider instead <0000-YOUR-ORCI-D_ID@orcid.org>

import org.openscience.cdk.group.Permutation;
import org.openscience.cdk.interfaces.IAtomContainer;

class Generation {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz add JavaDoc

class Generation {
private final Maygen maygen;

public Generation(Maygen maygen) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz add JavaDoc

this.maygen = maygen;
}

public void run(int[] degree) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz add JavaDoc

private final SmilesGenerator smilesGenerator = new SmilesGenerator(SmiFlavor.Unique);
private IAtomContainer atomContainer = builder.newInstance(IAtomContainer.class);

public Maygen() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the valences as an configuration option?

@@ -273,7 +273,6 @@ private void writeMolecule(IAtomContainer container) throws CDKException {
mdlWriter.addSettings(getSettings());
mdlWriter.write(container);
mdlWriter.close();
writer.write(stringWriter.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnmay, you may want to check out the changes in this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They look fine, I had a similar patching waiting to find time to apply

@johnmay
Copy link
Member

johnmay commented Nov 30, 2021

All looks good to me, I think I may tweak some things like naming once merged (i.e. just keep it under ../cdk/structgen/ rather than add another subpackage). But that can all happen after being applied providing it's before release.

@egonw
Copy link
Member

egonw commented Nov 30, 2021

All looks good to me, I think I may tweak some things like naming once merged (i.e. just keep it under ../cdk/structgen/ rather than add another subpackage). But that can all happen after being applied providing it's before release.

@johnmay, we wait for a bit more JavaDoc? or also after merge?

@johnmay
Copy link
Member

johnmay commented Nov 30, 2021

Don't mind - I guess some would be good. Also just seen the MIT license, so we going to dual license Egon? Sounds complicated

@egonw
Copy link
Member

egonw commented Nov 30, 2021

Don't mind - I guess some would be good. Also just seen the MIT license, so we going to dual license Egon? Sounds complicated

we already have other licenses in the CDK. I agree we don't communicate this well. something to think about. this is why copyright/license headers in the source files help.

@MehmetAzizYirik
Copy link
Contributor Author

MehmetAzizYirik commented Nov 30, 2021

Don't mind - I guess some would be good. Also just seen the MIT license, so we going to dual license Egon? Sounds complicated

we already have other licenses in the CDK. I agree we don't communicate this well. something to think about. this is why copyright/license headers in the source files help.

@egonw @johnmay If possible and if needed, we can also change the license as how you wish. I am currently writing the javadocs and will psuh changes to the pull request today.

@MehmetAzizYirik
Copy link
Contributor Author

@johnmay @egonw I committed my changes. Added the javadoc and some other small changes as you requested. If I need to do anything else, please let me know. Thanks!

@egonw
Copy link
Member

egonw commented Nov 30, 2021

@johnmay
Copy link
Member

johnmay commented Nov 30, 2021

Looks like Maygen.java is trying to import the Commons CLI package. Is the command line component still needed? (build failed)

Not a blocker (as Egon says it's a mess already :-)) but I think it's cleaner to relicense as LGPL as it just makes things simpler to keep track of.

@MehmetAzizYirik
Copy link
Contributor Author

MehmetAzizYirik commented Dec 1, 2021

Looks like Maygen.java is trying to import the Commons CLI package. Is the command line component still needed? (build failed)

Not a blocker (as Egon says it's a mess already :-)) but I think it's cleaner to relicense as LGPL as it just makes things simpler to keep track of.

@egonw @johnmay Yes, it was about the maygen class. The compiling issue is solved, Valentyn ( @javadev ) updated the dependencies. I committed that changes.

@egonw
Copy link
Member

egonw commented Dec 1, 2021

Looks like Maygen.java is trying to import the Commons CLI package. Is the command line component still needed? (build failed)

Personally, I would prefer CLI tools in a separate module. @johnmay, what are your ideas about that?

Not a blocker (as Egon says it's a mess already :-))

Did I say that? :) No, I would not say "mess"... we just have a few source files that are MIT and/or public domain.

but I think it's cleaner to relicense as LGPL as it just makes things simpler to keep track of.

Yes, agreed.

@johnmay
Copy link
Member

johnmay commented Dec 1, 2021

Yes CLI separately - prompts me actually to just add some other CLIs that I have floating around, I’ll try and find time tonight to fix this up.

@egonw
Copy link
Member

egonw commented Dec 1, 2021

prompts me actually to just add some other CLIs that I have floating around, I’ll try and find time tonight to fix this up.

I'd love that!

@johnmay
Copy link
Member

johnmay commented Dec 1, 2021

Not got time tonight but hopefully tomorrow. The dependency changes are very suspect, i.e. structgen depending on cdk-pdb and cdk-group. May be the case but seems odd. Likewise the Vecmath should be transient via the cdk-interfaces since it's mainly for the Point2d etc. As noted, we are a library so adding a dependencies on commons-cli sets this further back.

@johnmay
Copy link
Member

johnmay commented Jan 30, 2022

Closing - now #801

@johnmay johnmay closed this Jan 30, 2022
@johnmay johnmay reopened this Jan 31, 2022
@johnmay
Copy link
Member

johnmay commented Jan 31, 2022

Okay @MehmetAzizYirik reopened this one as requested. I have rebased here and am working on it here: https://github.com/cdk/cdk/tree/maygen.

List may grow but here is things I will be doing:

  • rebased on current master
  • avoid new Atom() should use building
  • Pass in the builder, prefer silent builder
  • remove commons-cli, may be able to put in a separate source root
  • remove commons-lang3 usage
  • remove ctab/smiles/sdg compile dependency, IO should be indepandant. Probably needs a "collector" like API
  • stop tests writing files to disk
  • faster tests, we have the "SlowTest" tag for things we don't want to run every build but now takes 9 seconds on my laptop. The entire test suite only takes 60 seconds, so not ideal :-). This probably just means simpler unit tests.
  • Use atomic numbers rather than Strings (optional)

@MehmetAzizYirik
Copy link
Contributor Author

MehmetAzizYirik commented Jan 31, 2022

Okay @MehmetAzizYirik reopened this one as requested. I have rebased here and am working on it here: https://github.com/cdk/cdk/tree/maygen.

List may grow but here is things I will be doing:

  • rebased on current master
  • avoid new Atom() should use building
  • Pass in the builder, prefer silent builder
  • remove commons-cli, may be able to put in a separate source root
  • remove commons-lang3 usage
  • remove ctab/smiles/sdg compile dependency, IO should be indepandant. Probably needs a "collector" like API
  • stop tests writing files to disk
  • faster tests, we have the "SlowTest" tag for things we don't want to run every build but now takes 9 seconds on my laptop. The entire test suite only takes 60 seconds, so not ideal :-). This probably just means simpler unit tests.
  • Use atomic numbers rather than Strings (optional)

@johnmay Thanks John for the detailed list. Since you said " I ll be doing", I dont wanna do changes without informing. For now if that is ok for you, I can take a look at the last three in the list ?

@javadev And for the changes Valentyn did in the other pull request, if they are in the format of CDK and if they are good to integrate, maybe he can contribute that changes here ? So with your feedbacks, we will have the clear good format for MAYGEN and make it ready for the integration to the CDK.

Thanks a lot!

@johnmay
Copy link
Member

johnmay commented Jan 31, 2022

Because I needed to rebase due to the large number of changes (related to Log4j and dependency purging) I can't merge back into this branch so will create another PR - sound okay? I think just the test fixing would be good, the atomic number is a nice to have and probably quite involved.

@MehmetAzizYirik
Copy link
Contributor Author

Because I needed to rebase due to the large number of changes (related to Log4j and dependency purging) I can't merge back into this branch so will create another PR - sound okay? I think just the test fixing would be good, the atomic number is a nice to have and probably quite involved.

@johnmay Absolutely okay whatever you recommend. :) I will look at the test fixing.

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

Unrelated to anything that needs changing but could you explain this logic. I do most multithreading in C/C++/pthreads now a days so a little rusty on ForkJoinPool but this looks like it creates a redundant thread.

try {
    new ForkJoinPool(size)
            .submit(
                    () ->
                            newDegrees
                                    .parallelStream()
                                    .forEach(new Generation(this)::run))
            .get();
} catch (InterruptedException | ExecutionException ex) {
    if (verbose) {
        Logger.getLogger(Maygen.class.getName())
                .log(Level.SEVERE, ex, () -> "Formula " + localFormula);
    }
    Thread.currentThread().interrupt();
}

i.e. I believe it's equivalent to this:

                Thread t = new Thread(() ->
                                        newDegrees
                                                .parallelStream()
                                                .forEach(new Generation(this)::run)
                );
                t.start();
                t.join();

which other than spawning a redundant thread is the same as just calling:

newDegrees .parallelStream().forEach(new Generation(this)::run)

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

newDegrees .parallelStream().forEach(new Generation(this)::run)

runs faster in the tests :p

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

But only slightly

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

@MehmetAzizYirik Could you check this commit looks reasonable: a0dfd2b. Providing the logic is correct I can now move the IO outside of Maygen

@MehmetAzizYirik
Copy link
Contributor Author

Unrelated to anything that needs changing but could you explain this logic. I do most multithreading in C/C++/pthreads now a days so a little rusty on ForkJoinPool but this looks like it creates a redundant thread.

try {
    new ForkJoinPool(size)
            .submit(
                    () ->
                            newDegrees
                                    .parallelStream()
                                    .forEach(new Generation(this)::run))
            .get();
} catch (InterruptedException | ExecutionException ex) {
    if (verbose) {
        Logger.getLogger(Maygen.class.getName())
                .log(Level.SEVERE, ex, () -> "Formula " + localFormula);
    }
    Thread.currentThread().interrupt();
}

i.e. I believe it's equivalent to this:

                Thread t = new Thread(() ->
                                        newDegrees
                                                .parallelStream()
                                                .forEach(new Generation(this)::run)
                );
                t.start();
                t.join();

which other than spawning a redundant thread is the same as just calling:

newDegrees .parallelStream().forEach(new Generation(this)::run)

We just parallelized MAYGEN fork join after reading some sources but this can be even changed if there is a better option. The logic is, after getting the pre-hydrogen distribution, we have a new list of degrees and we parallelized MAYGEN based on this new degree list. But again, if you think that can be changed, please feel free to do that changes. I am open to learn better methods because I dont think this parallelization is efficient enough.

@MehmetAzizYirik
Copy link
Contributor Author

@MehmetAzizYirik Could you check this commit looks reasonable: a0dfd2b. Providing the logic is correct I can now move the IO outside of Maygen

@johnmay Thanks a lot John! I went though the changes and looks cleaner and perfect!

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

Awesome almost done then, just need to move the IO to a separate functor. The tests could simply be tagged as "slow" in which case they don't get run as part of the main suite. Ideally we want to run as many tests as possible for each build but it's okay if some are only run occasionally.

Using an Executor service may be better for the multi-threading but I need to look more at the code and profile. For now removing the redundant ForkJoinPool saves some time on the tests at least.

Is it possible, to reuse a Maygen object? May be able to make the tests faster using simple parameterisation.

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

In-fact SonarCloud even recommends using Parameterized tests :-).

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

Infact the simplest thing to do is not run the tests twice :-)

        Maygen maygen = new Maygen(SilentChemObjectBuilder.getInstance());
        maygen.setFormula("C3O6PH5");
        maygen.run();
        assertEquals(51323, maygen.getCount());
        maygen.setMultiThread(true);
        maygen.run();
        assertEquals(51323, maygen.getCount());

@MehmetAzizYirik
Copy link
Contributor Author

Infact the simplest thing to do is not run the tests twice :-)

        Maygen maygen = new Maygen(SilentChemObjectBuilder.getInstance());
        maygen.setFormula("C3O6PH5");
        maygen.run();
        assertEquals(51323, maygen.getCount());
        maygen.setMultiThread(true);
        maygen.run();
        assertEquals(51323, maygen.getCount());

Well actually it is testing both the single and multithread within the same tests rather than making them separate. If you recommend something else for testi multithread, we can modify that is also fine.

@MehmetAzizYirik
Copy link
Contributor Author

Awesome almost done then, just need to move the IO to a separate functor. The tests could simply be tagged as "slow" in which case they don't get run as part of the main suite. Ideally we want to run as many tests as possible for each build but it's okay if some are only run occasionally.

Using an Executor service may be better for the multi-threading but I need to look more at the code and profile. For now removing the redundant ForkJoinPool saves some time on the tests at least.

Is it possible, to reuse a Maygen object? May be able to make the tests faster using simple parameterisation.

About parallelization, it needs better approach actually and we can change it as how you recommend.

About the parameterisation, I agree it will be better if we reuse the maygen object.

@johnmay
Copy link
Member

johnmay commented Feb 1, 2022

I've made the parameterized tests now and actually allows us to mark the single threaded as slow. And only takes 1s or so to run them all multithreaded

@johnmay
Copy link
Member

johnmay commented Feb 5, 2022

Now #811

@johnmay johnmay closed this Feb 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants