1.5.7 Release Notes

John May edited this page Jul 18, 2014 · 25 revisions

Summary

  • Support for representation, input, and output of extended tetrahedral (e.g. allenes) stereochemistry in SMILES, InChI, and depictions (e.g. molfile).
  • The Kekulisation used in SMILES has been ported to the module 'cdk-standard' and can now be used on IAtomContainer representations.
  • Improved support for multiple namespaces in CML.
  • Quintuple and sextuple bond orders.
  • Improved and validated MMFF atom types based on SMARTS.
  • Modified SMARTS '*' behaviour to match explicit hydrogen atoms. Previously '*' matched every other element but, for hydrogen, only its isotopes. This was inconsistent with respect to protons, bridged hydrogens, and molecular hydrogen. '*' now matches any atom.
  • Bug fixes and maintenance.
  • Maven plugins for code analysis and standardisation.

Maven plugins

Several Maven plugins have been added to the build. These replicate, replace, or improve some functionality from the old Ant build. More may be added in future (e.g. FindBugs and checkstyle/ojdcheck). This section details how to run the plugins and access the reports.

PMD

PMD analyses code style (e.g. variable naming, complexity) and reports potential bugs. PMD was previously used in the Ant build but has now been configured to run through Maven. Currently only production (non-test) code is inspected. The following snippet shows how to run PMD on the 'cdk-silent' module.

cdk/: cd base/silent
cdk/base/silent: ls
cdk/base/silent: mvn pmd:pmd
cdk/base/silent: open target/site/pmd.html 

java-formatter

The CDK is a relatively mature project with many different developers, so many different formatting styles are used in the source code. After patches from different IDEs with different settings, some files have become pretty messy. This release adds a java-formatter that tidies up the code using consistent settings. The formatter is not run automatically but will be applied to the whole code base in the near future.

The formatting settings are in the cdk-build-util project cdk-build-util/.../cdk-formatting-conventions.xml.

To run the formatter on the silent module:

cdk/: cd base/silent
cdk/base/silent: ls
cdk/base/silent: mvn java-formatter:format
[INFO] --- maven-java-formatter-plugin:0.4:format (default-cli) @ cdk-silent ---
[INFO] Using 'UTF-8' encoding to format source files.
[INFO] Number of files to be formatted: 76
[INFO] Successfully formatted: 76 file(s)
[INFO] Fail to format        : 0 file(s)
[INFO] Skipped               : 0 file(s)
[INFO] Approximate time taken: 3s

JaCoCo

JaCoCo is a tool for analysing test coverage. Previously the CDK used custom @TestMethod and @TestClass annotations to couple production code to tests and to inspect whether tests were present. Although functional, this approach had several problems:

  • extra maintenance to keep the annotations in sync
  • clutter on a well tested method, e.g. @TestMethod("testALongName,testBLongName,testCLongName,testDLongName") public void importantMethod(Object obj){}
  • no support for multiple test classes, or at least it was not clear how to do it
  • the analysis could be gamed to indicate a method was tested when it really wasn't
  • coverage of less than 100% was reported as a test failure
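To illustrate how a name-based analysis can be gamed, here is a minimal, self-contained sketch. The `TestMethod` annotation below is a simplified stand-in written for this example, not the CDK's own annotation, and `Production`, `ProductionTest`, and `missingTests` are hypothetical names. The check only confirms that a test with the declared name exists, so an empty test satisfies it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class AnnotationCouplingSketch {

    // Simplified stand-in for a test-coupling annotation (assumption, not the CDK source).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface TestMethod {
        String value(); // comma-separated test method names
    }

    // Hypothetical production class.
    static class Production {
        @TestMethod("testAdd")
        public int add(int a, int b) { return a + b; }

        @TestMethod("testNegate")
        public int negate(int a) { return -a; }
    }

    // Hypothetical test class: testNegate exists but never calls negate().
    static class ProductionTest {
        public void testAdd() {
            if (new Production().add(1, 2) != 3) throw new AssertionError("add");
        }

        public void testNegate() { /* empty: satisfies the name check only */ }
    }

    // Returns declared test names that have no matching method in the test class.
    static List<String> missingTests(Class<?> production, Class<?> tests) {
        List<String> missing = new ArrayList<>();
        for (Method method : production.getDeclaredMethods()) {
            TestMethod coupling = method.getAnnotation(TestMethod.class);
            if (coupling == null) continue; // unannotated methods are ignored here
            for (String name : coupling.value().split(",")) {
                try {
                    tests.getDeclaredMethod(name.trim());
                } catch (NoSuchMethodException e) {
                    missing.add(name.trim());
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Prints "[]": no test is reported missing even though negate() is never exercised.
        System.out.println(missingTests(Production.class, ProductionTest.class));
    }
}
```

A line-level coverage tool would flag `negate` as unexecuted in this situation, which the name check cannot do.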

JaCoCo installs an agent that instruments the code and records exactly which lines are executed and which are missed by the tests. This not only serves as a quality measure but can also guide optimisation: "why isn't that conditional ever hit by my tests, is it possible?".

I'll use the new MMFF atom typing to demonstrate:

cdk/: cd tool/forcefield
cdk/tool/forcefield: ls
cdk/tool/forcefield: mvn jacoco:prepare-agent test
cdk/tool/forcefield: mvn jacoco:report
cdk/tool/forcefield: open target/site/jacoco/index.html

The contribute method determines the number of pi electrons for an element with specified valence (v) and connectivity (x). In the report, two lines are flagged yellow. On inspection, 1 of 4 branches was missed on such a line. There are four branches because the line has two conditions, each with a true and a false outcome (2 × 2 = 4), and one of those outcomes is never reached by the tests.

JaCoCo Report Example
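The branch count can be reproduced without JaCoCo by recording outcomes by hand. The sketch below is only an illustration: `contribute` here is a hypothetical stand-in with an invented condition, not the CDK method, and the `first`/`second` helpers are assumptions that mimic what instrumentation would record. Two calls cover three of the four outcomes; the second condition is never seen to be false.

```java
import java.util.BitSet;

class BranchCoverageSketch {

    // One bit per branch outcome: 0 = first true, 1 = first false,
    //                             2 = second true, 3 = second false.
    static final BitSet hit = new BitSet(4);

    static boolean first(boolean outcome) {
        hit.set(outcome ? 0 : 1);
        return outcome;
    }

    static boolean second(boolean outcome) {
        hit.set(outcome ? 2 : 3);
        return outcome;
    }

    // Hypothetical stand-in with an invented rule (not the CDK implementation).
    static int contribute(int v, int x) {
        // Two conditions on one line give four branch outcomes in total.
        if (first(v == 4) && second(x == 3)) return 1;
        return 0;
    }

    static int missed() {
        return 4 - hit.cardinality();
    }

    static void reset() {
        hit.clear();
    }

    public static void main(String[] args) {
        contribute(4, 3); // first true, second true
        contribute(2, 3); // first false, second not evaluated (short circuit)
        System.out.println("missed branches: " + missed()); // missed branches: 1
    }
}
```

Adding a call such as `contribute(4, 2)` exercises the remaining outcome and brings the missed count to zero.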

IDEs and CI servers (Jenkins) can also integrate the reports directly.

Reporting coverage when the tests are separate to the production code is a little more tricky but possible. Here is an example for the 'cdk-standard' module.

cdk/: cd base/standard
cdk/base/standard: mvn install
cdk/base/standard: cd ../test-standard
cdk/base/test-standard: mvn jacoco:prepare-agent test
cdk/base/test-standard: cd ../standard
cdk/base/standard: mvn jacoco:report
cdk/base/standard: open target/site/jacoco/index.html

Test status

20,965 (+1,004) tests
19 (-4) failures
0 errors

Commits

  56  John May
  17  Egon Willighagen
   8  Mark Williamson

We would also like to acknowledge Mark Vine for suggesting the formatter and JaCoCo plugins.

Reviewers

  43  Egon Willighagen 
  23  John May 

Change log

  • Bumping version for 1.5.7 release. dc92035
  • Added unit tests for some yet untested code, according to Jacoco 684664c
  • Proper setting of the C-terminus in the unit test a0f4394
  • Missed assertion change due to '*' behaviour. fdad832
  • Revert "Circumvent '*' not matching explicit hydrogens." 3fbab65
  • Only create a new Reaction convention if the current convention isn't already CMLR. 58d5b8a
  • ChemModel is also added elsewhere leading to a regression. bc0312e
  • Resolve unit test regressions. Comment added to note the original bug report was wrong. 29d9a75
  • The SMARTS '*' should match any atom, including hydrogen. Hydrogen was not previously matched due to confusion over explicit-H matching in SMARTS. 6999951
  • Remove test reporting "This descriptor is not tested". Test coverage is reported using jacoco. The abstract descriptor test is still running some useful assertions (i.e. hydrogen representation) and so we leave the class in place. 813637a
  • Inline method invocations and avoid using reflection (error prone). 0cf2c60
  • The expected value is for chloropropane and not chlorobutane. The SMILES 'CCCCl' was misread as 'CCCCCl' in commit db9a311. 9dedcca
  • Circumvent '*' not matching explicit hydrogens. 791f353
  • MMFF atom type assignment, unit and validation tests. The atom types are assigned using simple SMARTS patterns and the previously implemented aromatic mapping. This provides correct assignment of all symbolic types in the validation suite. 9125e9d
  • MMFF aromatic types are assigned by updating the existing symbolic types. a55c15f
  • MMFF resource files 4eab06d
  • Additional commit to fix still failing test. The previous example was hitting a heuristic in the code, where there is more than one hydrogen the type is automatically set to non-stereogenic. This was the case for the terminal atoms in this extended tetrahedral example. 1531311
  • These atoms need to be recognised for creating extended tetrahedral (allene) stereo elements. cf00f01
  • Very minor JavaDoc improvements 8fea909
  • Removed unused imports and code 5247f59
  • Use the @cdk.cite mechanism of citing papers 3936c37
  • Configuring jacoco code coverage plugin. Due to separate prod/test modules some extra config is required and the setup cannot be bound to a build phase. When invoked, it is important that the report is built separately, example - 'mvn jacoco:prepare-agent test; mvn jacoco:report'. cc6acb3
  • Added two missing bond order types ac2e575
  • Added command to build source jars. 2f7cf38
  • Improved Mol2 atom type handling. ebb7735
  • Ensure atomic number is set when reading Mol2. b24b4cf
  • Run slow tests by invoking 'mvn test -Pslow-tests'. 6e6cb97
  • Formatting pom. a07aefa
  • Marking several slow running tests across various modules. 4b2f843
  • Mechanism for marking slow running tests with JUnit groups/categories. 2efa760
  • Move new resource file to the module where it is used. e5b04a8
  • Encapsulate CML stack and make package private. 1f0c3ff
  • Include unpaired electrons sooner to catch negative hydrogen count in existing conditionals [Fixes:1343]. 9b9791d
  • Symmetry of heavy atoms (skeleton) independent of explicit hydrogens. d7f8a2c
  • Mask hydrogens in canonical labelling. 17a518e
  • Multiline SD value joined with non OS newline [bug:1337] 5b4c073
  • Automatic code formatting (not yet run), convention has been added to the 'cdk-build-util' project where it can be modified. 5298ddc
  • Update README.md 79b692b
  • Log4J is invoked indirectly through the CDK LoggingTool. 40cba9d
  • Formatting. c869a13
  • More logging improvements to MMFF94PartialCharges 2b1d50e
  • Minor MMFF94charge LOG improvements e2fbf4f
  • Enable logging for MMFF94PartialCharges 54e94fb
  • Remove unused imports bd25e18
  • fix for The import org.openscience.cdk.interfaces.ITetrahedralChirality.Stereo collides with another import statement 8f0a934
  • Use the charge value from the paper for methanol f632e96
  • testPartialTotalChargeDescriptor_Methylamine does not check final atom ee8db9b
  • Be more explicit about source of reference values 3fd7af56
  • Only store a cdk:Formula field if we really found formula 02505c2
  • Make sure to store the reaction set a961337
  • Report the simple name, not the hashcode 5e7f08b
  • Ignore elements from other namespaces (but default to CML if no NS is defined) b1edcbf
  • Also track the modules and thus with their states b0398a0
  • Unit test that shows that namespaces are no longer properly handled when reading CML 733ccd9
  • Plugin repository required for PMD to obtain cdk-build-util. 59ab922
  • Correct PMD configuration. Rulesets are now accessed from cdk-build-util on the classpath. They could also be hosted remotely but this way they can be run offline. 200c603
  • Reinstate some PMD custom rulesets. f7edea1
  • Added the names to the pom.xml too 9db2ef9
  • Added a few missing names 0723f1c
  • Cleaner include/exclude for the core module 02e0f65
  • Restored some error messages f76b171
  • Use a Maven properties file b872a15
  • setProperties removes existing properties. 6cdbca9
  • Existing invokers may depend on adding properties. f2478cd
  • Lots of redundant boiler plate in the debug object updated. 15b9e77
  • Improved method naming, to set or add properties. The addProperties implementation has been simplified and improved to use the existing putAll method on maps and avoid NPEs. be40b52
  • Accurate and efficient assignment of a Kekulé form to a compound with aromatic bond types. e9e9611
  • Improved identification of tetrahedral elements. More improvements are needed, but now any non-planar bonds indicate a tetrahedral centre should be created in 2D depictions. In 3D depictions, we check the environment of each neighbour. The method is currently inadequate and fails to identify interdependent stereocentres (as seen in test) but is required to avoid creating tetrahedral configurations for non-stereogenic atoms (e.g. methane). d5a4c4b
  • Improved winding detection for tetrahedral centres. af43454
  • Explicit hydrogens invert winding. 28217ff
  • Clockwise sort should handle exactly opposite atoms. bdd6514
  • Perceive allene stereochemistry from 2D and 3D depictions. cab1181
  • Perceive extended tetrahedral from 2D/3D coordinates. 77f2fd4
  • Latest version of Beam with support for parsing and encoding extended tetrahedral stereochemistry. 8f1e18d
  • Extended tetrahedral support in SMILES. cd7cb58
  • Extended tetrahedral depiction. fa2b7f2
  • Conversion of extended tetrahedral stereochemistry to/from InChI. 023d216
  • Represent and store extended tetrahedral stereochemistry. df82687
  • Linear placement of cumulated atoms. 3a9104c
  • Radicals assigned to wrong atoms. 593a106
  • Bumping version - open for changes 62d462b
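
The change log entries on setProperties and addProperties distinguish replacing properties from merging them. A minimal sketch of that distinction follows, assuming a plain map-backed holder. `PropertyHolderSketch` is a hypothetical class written for illustration and is not the CDK implementation; in particular, treating a null argument as "nothing to add" is an assumption made here.

```java
import java.util.HashMap;
import java.util.Map;

class PropertyHolderSketch {

    private final Map<Object, Object> properties = new HashMap<>();

    // Replaces every existing property with the supplied ones.
    // Assumption: a null argument simply clears the properties.
    void setProperties(Map<Object, Object> replacement) {
        properties.clear();
        if (replacement != null) properties.putAll(replacement);
    }

    // Merges the supplied properties into the existing ones via Map.putAll.
    // The null guard is needed because putAll(null) throws a NullPointerException.
    void addProperties(Map<Object, Object> additional) {
        if (additional != null) properties.putAll(additional);
    }

    Map<Object, Object> getProperties() {
        return properties;
    }

    public static void main(String[] args) {
        PropertyHolderSketch holder = new PropertyHolderSketch();

        Map<Object, Object> first = new HashMap<>();
        first.put("title", "ethanol");
        Map<Object, Object> second = new HashMap<>();
        second.put("source", "example");

        holder.addProperties(first);
        holder.addProperties(second);   // merged: two entries
        System.out.println(holder.getProperties().size()); // 2

        holder.setProperties(second);   // replaced: one entry
        System.out.println(holder.getProperties().size()); // 1
    }
}
```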