CDK 2.7.1
This page documents the changes for CDK v2.7 and v2.7.1. The patch release (v2.7.1) was made after downstream projects discovered some minor issues with how the new InChI code was organised.
Features
Switch from JNI to JNA InChI.
There are two main technologies for calling native code from Java: JNI (Java Native Interface) and JNA (Java Native Access). JNI requires writing a custom native wrapper which is then bound to Java code, whereas JNA allows you to call the native methods of an existing shared library (SO/DYLIB) directly. In practice this means that to expose the native InChI library in Java with JNI one must first write (and maintain) a native wrapper; with JNA we can simply drop in the InChI SO directly. JNI InChI exposed InChI v1.03 and worked well for many years. Unfortunately that project is no longer maintained, and as newer, more stable versions of InChI were released (now v1.06), an alternative was needed. A few years ago Daniel Lowe started JNA InChI, and he recently made it feature complete and released v1.0.
ChemAxon have also independently used the JNA approach to integrate newer InChI libraries into their tools (slides). It is not clear whether this work has been made publicly available; it is not listed on GitHub/ChemAxon.
Build on Java 17
The Maven plugins were updated to allow building on Java 17.
Verify declared dependencies
The Maven modules were checked for unused declared dependencies and used undeclared dependencies (mvn dependency:analyze).
Organise and restructure test-jar and testdata
CDK was originally built with the Ant build tool. Under this scheme there was one jar for the main/ code and one for the test/ code, and test modules could share and inherit dependencies. To replicate this in Maven we installed and deployed "test-jar" artefacts. The project test code has now been restructured to put all common test code in the "cdk-test" module.
All test data was previously stored in a cdk-testdata module; this data has now been relocated to the test/resources of each module where it is used. Some data had to be duplicated, but the ~18MB test-jar no longer needs to be uploaded to Maven Central.
Remove Guava dependency
We have removed the use of Guava; its functionality could mostly be replaced directly with newer JDK idioms (Function/Predicate/Stream) that were not available when the code was first written.
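As a rough illustration of the kind of substitutions involved, the sketch below shows standard JDK 8 equivalents for a few Guava utilities mentioned in the change log (checkNotNull, checkArgument, Joiner, FluentIterable filtering and Multimap). The class GuavaFreeIdioms and its methods are hypothetical examples written for this page, not CDK code; they only use standard JDK APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical illustration: JDK 8 replacements for some Guava utilities (not CDK code). */
public final class GuavaFreeIdioms {

    // Guava Preconditions.checkNotNull(obj) -> Objects.requireNonNull (JDK 7+)
    public static String describe(Object obj) {
        return Objects.requireNonNull(obj, "obj must not be null").toString();
    }

    // Guava Preconditions.checkArgument(...) has no JDK equivalent,
    // so the condition is simply written out
    public static int ringSize(int n) {
        if (n < 3)
            throw new IllegalArgumentException("ring size must be at least 3: " + n);
        return n;
    }

    // Guava Joiner.on(",").join(parts) -> String.join (JDK 8+)
    public static String joinSymbols(List<String> symbols) {
        return String.join(",", symbols);
    }

    // Guava FluentIterable.from(xs).filter(pred) -> Stream API, with
    // java.util.function.Predicate in place of Guava's Predicate
    public static List<String> filter(List<String> xs, Predicate<String> pred) {
        return xs.stream().filter(pred).collect(Collectors.toList());
    }

    // Guava Multimap.put(key, value) -> Map.computeIfAbsent (JDK 8+)
    public static Map<Integer, List<String>> groupByLength(List<String> words) {
        Map<Integer, List<String>> groups = new HashMap<>();
        for (String w : words)
            groups.computeIfAbsent(w.length(), k -> new ArrayList<>()).add(w);
        return groups;
    }

    public static void main(String[] args) {
        List<String> symbols = Arrays.asList("C", "N", "Cl", "Br", "O");
        System.out.println(joinSymbols(symbols));          // C,N,Cl,Br,O
        System.out.println(filter(symbols, "Cl"::equals)); // [Cl]
        // getOrDefault avoids a null check when a key is missing
        System.out.println(groupByLength(symbols)
                .getOrDefault(3, Collections.emptyList())); // []
    }
}
```

Passing a method reference such as "Cl"::equals to filter shows why moving from Guava's Predicate to java.util.function.Predicate is usually a small change for callers.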
Use XorShift PRNG in ShortestPathFingerprinter (different fingerprint)
Commons Math3 was used in a single place, the ShortestPathFingerprinter, where a Mersenne Twister was used to hash paths. Since this fingerprint method is not widely used and the hashes do not need to be cryptographically secure, a simple Xorshift (https://en.wikipedia.org/wiki/Xorshift) random number generator is now used instead. This allows us to remove the dependency on Commons Math3. It does mean the fingerprint bits have changed; note that the CDK version description is accessible via the Fingerprinter.getVersionDescription() method.
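To illustrate the kind of generator involved, the sketch below implements Marsaglia's 64-bit xorshift (shift triple 13, 7, 17) and uses it to map a path string to a bit position. This is a sketch under stated assumptions, not the ShortestPathFingerprinter implementation: the class XorShift64, the hypothetical helper bitFor and the use of String.hashCode() as the seed are illustrative choices only.

```java
/**
 * Minimal 64-bit xorshift PRNG (Marsaglia, shift triple 13, 7, 17).
 * Illustrative sketch only; not the CDK implementation.
 */
public final class XorShift64 {

    private long state;

    public XorShift64(long seed) {
        // The all-zero state is a fixed point of xorshift (it would only ever
        // return 0), so substitute an arbitrary non-zero constant for seed 0.
        this.state = seed != 0 ? seed : 0x9E3779B97F4A7C15L;
    }

    /** Advances the state and returns the next 64-bit value. */
    public long nextLong() {
        long x = state;
        x ^= x << 13;
        x ^= x >>> 7; // unsigned shift so the sign bit is not smeared
        x ^= x << 17;
        state = x;
        return x;
    }

    /**
     * Returns a value in [0, bound) using an unsigned remainder; the slight
     * modulo bias is acceptable for non-cryptographic hashing.
     */
    public int nextInt(int bound) {
        if (bound <= 0)
            throw new IllegalArgumentException("bound must be positive: " + bound);
        return (int) Long.remainderUnsigned(nextLong(), bound);
    }

    /** Hypothetical helper: maps a path string to a bit index in a fingerprint of the given size. */
    public static int bitFor(String path, int fingerprintSize) {
        XorShift64 rng = new XorShift64(path.hashCode());
        return rng.nextInt(fingerprintSize);
    }

    public static void main(String[] args) {
        System.out.println(new XorShift64(1L).nextLong()); // 1082269761
        System.out.println(bitFor("C-C-O", 1024));
    }
}
```

Because the generator is seeded from the path, the same path always maps to the same bit. Swapping the generator changes which bits are set, which is why fingerprints generated with different CDK versions may not be directly comparable.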
Authors
137 John Mayfield
6 Egon Willighagen
1 dependabot[bot]
Full Change Log
- Bump version ready for development John Mayfield on 2021-12-14
- Bumped the log4j version Egon Willighagen on 2021-12-14
- Make log4j a test-only dependency of the InChI module John Mayfield on 2021-12-16
- Make sure TOTAL_DEGREE works correctly John Mayfield on 2021-12-16
- Additional test for converting "Molecules" to queries using the useful SMARTS theorem: D=X+h. John Mayfield on 2021-12-16
- Updated CMLXOM version Egon Willighagen on 2021-12-16
- Update Mockito to 4.1.0 - fixing test failures (except InChI) on M1 AARCH64. John Mayfield on 2021-12-16
- Add in a Java 17 build John Mayfield on 2021-12-16
- Fix indents John Mayfield on 2021-12-16
- Update Jacoco plugin John Mayfield on 2021-12-16
- Formatting only - to make changes easier to follow John Mayfield on 2021-12-18
- First pass at moving from JNI to JNA InChI - some tests need adjusting. John Mayfield on 2021-12-18
- Fix Test - ignore longer extended tetrahedral - could be a warning. John Mayfield on 2021-12-18
- 5000L (long) default timeout in ms John Mayfield on 2021-12-18
- This is now a status message rather than a log - makes sense John Mayfield on 2021-12-18
- These tests add the same bond twice a1E=a2E which is now a warning - used to be ignored. The tests were wrong John Mayfield on 2021-12-18
- This is now a warning, there is no EOF status. However it should perhaps set a sensible message John Mayfield on 2021-12-18
- More double bonds added twice. John Mayfield on 2021-12-18
- Message is empty string rather than null. John Mayfield on 2021-12-18
- This is the most questionable change, but I believe it to be a bug in InChI 1.03. Using JNI INCHI and setting the chiral flag (on or off) we get "rA:9n...", without it we get "rA:9...". In JNA INCHI we always get "rA:9n..." - this molecule is not chiral so it seems odd that the setting would change anything. Since this is only a change in AuxInfo this is acceptable. John Mayfield on 2021-12-18
- Looks like we can get different timeout messages based on the system? John Mayfield on 2021-12-18
- Bump log4j-core from 2.16.0 to 2.17.0 dependabot[bot] on 2021-12-18
- Some minor version/scope cleanups. Hamcrest should be a test dependency. Make sure we pull in Log4j2 2.17.0. In QSARCML log4j-core should only be in the tests John Mayfield on 2021-12-22
- Move over to Log4J2 configuration - allows us to remove some log4j-1.2 deps John Mayfield on 2021-12-22
- We don't need the Log4J 1.2 API in these locations - unfortunately it still comes in via CMLXOM and JENA but better for now. John Mayfield on 2021-12-22
- Cleanup of the cdk/base modules - using dependency:analyze to ensure all used undeclared dependencies are included and unused declared ones are removed. John Mayfield on 2021-12-22
- Cleanup of dependencies in CDK storage/io modules John Mayfield on 2021-12-22
- Significant dependency cleanup in the descriptor/qsar modules. John Mayfield on 2021-12-22
- Cleanup dependencies in depict/render module John Mayfield on 2021-12-22
- Cleanup dependencies in CDK tool modules. John Mayfield on 2021-12-22
- Cleanup dependencies in the misc/ module John Mayfield on 2021-12-22
- Broken by changes to another module - it implicitly depended on the CDK atomtyping. John Mayfield on 2021-12-22
- Make sure everything that is used is declared in the cdk-legacy module John Mayfield on 2021-12-22
- More cleanup of base/ modules now I've got better at using dependency:analyze John Mayfield on 2021-12-22
- Minor issues of non-test dependencies now a clean build is tested John Mayfield on 2021-12-22
- More left overs - all good now. John Mayfield on 2021-12-22
- This should probably be install instead of test John Mayfield on 2021-12-22
- Looks like some things were folded into JDK 17 John Mayfield on 2021-12-22
- JENA-CORE pulls in a very specific version of XML-APIS. There may well be a conflict but a fix should be to leave it as a transitive dependency in cdk-io John Mayfield on 2021-12-22
- Avoid test-jar dependency for qsarcml - we only need the roundtrip function. John Mayfield on 2021-12-24
- This code is deprecated so we don't need the full Descriptor basic checks. Note it may make sense to move the descriptor interfaces to CDK interfaces, then the descriptor tests can go into cdk-test. John Mayfield on 2021-12-24
- Duplicate some basic generator tests. John Mayfield on 2021-12-24
- We can remove the cdk-fingerprint -> cdk-test-standard:test-jar:test dependency by moving the required classes to where they are needed (they are not needed anywhere else). John Mayfield on 2021-12-24
- Remove cdk-diff dependency on cdk-test, it only used very basic assertion utilities which we can just write out or duplicate. John Mayfield on 2021-12-24
- Now cdk-diff is essentially independent we can completely remove the cdk-test-interfaces module and move all of the abstract tests into the main cdk-test. John Mayfield on 2021-12-24
- Eliminate cdk-test-core dependencies by moving some helper classes to cdk-test John Mayfield on 2021-12-24
- Move the TestMoleculeFactory to cdk-data/main. I'm not super happy about this but seems like the simplest solution. The other being to make the tests pass in a builder, or have the test molecule factory dynamically find the IChemObjectBuilder - that would be good except for now Java really doesn't like reflective access between modules. John Mayfield on 2021-12-24
- First off, let's ignore some trivial tests that check a ChemFile etc. is accepted... we will work out how to add these back in once relocated but the tests aren't THAT useful really. John Mayfield on 2021-12-24
- Demonstrates the tests are perhaps a little odd John Mayfield on 2021-12-24
- Duplicate the expectReader test from the FormatFactory. John Mayfield on 2021-12-24
- Extract out the interfaces for Rgroup queries so we can better separate tests. The Rgroup queries need a rethink in my opinion but this will do for now. John Mayfield on 2021-12-24
- We will be mocking this IChemModel and so no builder will be available - rewrite to avoid creating IAtomContainerSets (and a temp ChemModel). John Mayfield on 2021-12-24
- Mock the ChemObjectIO test cases John Mayfield on 2021-12-24
- Remove test-jar dependencies on cdk-qsar. We can relocate the required interfaces to cdk-interfaces and add an optional setDescriptor method that takes a builder. We can then move it to cdk-test. John Mayfield on 2021-12-24
- One more in cdk-legacy John Mayfield on 2021-12-24
- Duplicate MolecularDescriptorTest for qsarprotein - I would actually vote to just move the two qsarprotein tests into qsarmolecular John Mayfield on 2021-12-24
- Need this dependency since scope=test doesn't pull in the extras John Mayfield on 2021-12-24
- Eliminate testdata dependency on cdk-charges. John Mayfield on 2021-12-29
- Eliminate testdata from cdk-sdg moving all resources to the module (and package) where they are used. John Mayfield on 2021-12-29
- Eliminate testdata from cdk-builder3d moving all resources to the module (and package) where they are used. We need to duplicate one resource also used in cdk-io test. John Mayfield on 2021-12-29
- Eliminate testdata from cdk-formula moving all resources to the module (and package) where they are used. John Mayfield on 2021-12-29
- Eliminate testdata from cdk-pcore moving all resources to the module (and package) where they are used. John Mayfield on 2021-12-29
- Relocate required file from testdata to where it is needed. John Mayfield on 2021-12-29
- Generics fixup John Mayfield on 2021-12-29
- forcefield doesn't need testdata John Mayfield on 2021-12-29
- cdk-qsarcml, cdk-qsarbond and cdk-qsarprotein do not need testdata John Mayfield on 2021-12-29
- Eliminate cdk-cip dependency on testdata, one dupe needed John Mayfield on 2021-12-29
- Eliminate cdk-qsaratomic dependency on testdata John Mayfield on 2021-12-29
- Separate cdk-qsarmolecular from testdata, 4 files need duplicating to other modules as well. John Mayfield on 2021-12-29
- Separate out cdk-fingerprint from testdata, 2 test files needed duplicating. John Mayfield on 2021-12-29
- Some collateral damage - there is a "defeat device" that tries to slurp up all MDL files and run them through the atom typer. Probably will just replace with SMILES or at best an SDfile. For now let's keep it working as we expect. John Mayfield on 2021-12-29
- Split out cdk-legacy from testdata - most of the dupes here are for that one atomtype matcher test John Mayfield on 2021-12-29
- Eliminate testdata from test-valencycheck John Mayfield on 2021-12-29
- Separate out testdata from iordf. John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-inchi. John Mayfield on 2021-12-29
- Eliminate the testdata dependency from PDB John Mayfield on 2021-12-29
- pdbcml is now independent of testdata - due to cdk-pdb changes. John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-libiocml John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-smiles. As before most of the duplications are due to the cdk-core AtomType "MDLfiles" test John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-reaction John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-ctab. John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-ioformats, we need some duplication in core/extra and io. John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-io. John Mayfield on 2021-12-29
- Eliminate testdata dependency from test-extra. John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-test-atomtype John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-test-core. John Mayfield on 2021-12-29
- Eliminate testdata dependency from cdk-test-standard. John Mayfield on 2021-12-29
- The explicit testdata module can now be deleted, all files have been moved where they are needed - remaining files are not used. Note some were copied i.e. for the "mdlFiles" CDKAtomTypeMatcher test. John Mayfield on 2021-12-29
- Move cdk-test test/ code to main/ - we now expect the modules using it to include it via scope=test. John Mayfield on 2021-12-29
- We no longer need to build test-jars as none are used. This saves the additional uploads to central. John Mayfield on 2021-12-29
- Bumped log4j to 2.17.1 Egon Willighagen on 2021-12-30
- Bumped CMLXOM to the latest version Egon Willighagen on 2021-12-31
- We can replace checkNotNull with requireNonNull (since JDK 1.7) John Mayfield on 2022-01-01
- There is no replacement for checkArgument in recent JDKs - however the logic is simple enough that we can just expand out the conditions. John Mayfield on 2022-01-01
- IDE warns about Jacoco version missing from sub-modules (e.g. test-standard) - let's define it once in the parent. John Mayfield on 2022-01-01
- Cleanup some other pom issues reported by the IDE, prerequisites is for plugins, cdk-qsar was duplicated John Mayfield on 2022-01-01
- Replace Guava Predicate/Function with the JDK 1.8 versions. This technically breaks the API but is very easy to update thanks to the :: operator. We will be revisiting the Mappings.java to use the Streams API John Mayfield on 2022-01-01
- Use JDK 1.7 Objects methods John Mayfield on 2022-01-01
- JDK 1.8 has Charsets John Mayfield on 2022-01-01
- Joiner -> String.join, since JDK 1.8 John Mayfield on 2022-01-01
- This was a useful one, the "official" calculation is (int)1+(n/0.75) for the default load factor. However in all these cases we expected n to be small - say 100 or so. In which case 135 vs 200 (2*n) is fine. John Mayfield on 2022-01-01
- Fixup - Objects John Mayfield on 2022-01-01
- Replace the XmlEscape from Guava, this was an unstable API anyway. We really just need >/< to be replaced but have handled the control characters anyway. John Mayfield on 2022-01-01
- Expected HashSet size John Mayfield on 2022-01-01
- Avoid Longs.toArray and just create the array John Mayfield on 2022-01-01
- Replace Guava CharStreams with normal JDK 1.8 streams John Mayfield on 2022-01-01
- Remove Guava caches, one is no longer needed - we have matchRoot now. The second case is technically incorrect as the contents of the IAtomContainer may change so it's not safe to cache here. There is a bug with AtomContainer2.add which will be fixed in another commit - for now we can use the copy constructor. John Mayfield on 2022-01-01
- Replace Lists usage - diamond brackets (1.7+) make things more concise than the old helper function. John Mayfield on 2022-01-01
- Replace usage of Guava Ints utility. Integer.compare helps as well as IntStream John Mayfield on 2022-01-01
- Replace usage of Guava's BiMap John Mayfield on 2022-01-01
- ImmutableSet removal John Mayfield on 2022-01-01
- Replace usages of MultiMap - the computeIfAbsent and getOrDefault APIs of Map now make this construct on vanilla JDK easy. We need a slight tweak to our Cycle comparator since TreeMultimap keeps things sorted by keys and then values. Since the key is just the length we can keep things equivalent by first comparing the length. John Mayfield on 2022-01-01
- We can now use Java 8 streams to filter an iterator - more verbose. John Mayfield on 2022-01-01
- Remove usage of FluentIterable using the Stream API. Some usages are more verbose but OK and perhaps indicate a List should have been consumed/returned in the first place. John Mayfield on 2022-01-01
- Replace Iterators and Iterables usage - a lot of this can be handled with Streams now. Again slightly more verbose, there is one extra case which needs special comment due to mocking. John Mayfield on 2022-01-01
- Here we do the old-school iteration, we cannot call stream().count() because it uses the forEachRemaining method which was not mocked. John Mayfield on 2022-01-01
- Improved Mocking to allow us to use the stream().count() method. John Mayfield on 2022-01-01
- Replace ImmutableMap usage, JDK 9+ has Map.of() which is a direct replacement. However usage is minimal so we can just do it the verbose way. John Mayfield on 2022-01-01
- Remove Guava dependency John Mayfield on 2022-01-01
- Add a test of the HETATM atom types. John Mayfield on 2022-01-03
- Compress the PDB hetatm type_map by not storing the common C.sp2 and H types. John Mayfield on 2022-01-03
- Sort the type_map.txt lexicographically for a slightly smaller JAR footprint - add a comment explaining the common type handling. John Mayfield on 2022-01-03
- We need to also handle residues that are 100% C.sp2 and H John Mayfield on 2022-01-03
- Bumped to CMLXOM 4.0 Egon Willighagen on 2022-01-03
- We don't actually use FreeHEP for SVG output anymore - we have our own SVG draw visitor since it's such a simple format. John Mayfield on 2022-01-03
- Eliminate commons-math3 dependency by reimplementing the MersenneTwister logic. This was only used for an over-complicated PRNG used in the shortest path fingerprinter. A simpler linear-feedback shift PRNG would work just as well here but we want to keep backwards compatibility. Note this fingerprint isn't really very good since the fingerprints are non-transitive and can't be used for substructure screening. John Mayfield on 2022-01-04
- Use a simpler and faster random number generator to hash fingerprint bits. Note we actually set more bits in a test fingerprint so in that particular case the features were better distributed. John Mayfield on 2022-01-04
- Better comment John Mayfield on 2022-01-04
- Fix HTML5 JavaDoc errors (JDK 17). John Mayfield on 2022-01-06
- cdk-test code is now in the main/ so goes through the JavaDoc (will fix later) - but we should make sure it doesn't have errors in it. First off there are many cases of a return on a void - looks like a copy paste error. John Mayfield on 2022-01-06
- Some other errors in cdk-test John Mayfield on 2022-01-06
- Relocate all classes in cdk-test to the package org.openscience.cdk.test.*. John Mayfield on 2022-01-07
- Exclude anything in org.openscience.cdk.test from the JavaDoc John Mayfield on 2022-01-07
- Latest version of JavaDoc plugin John Mayfield on 2022-01-07
- CDK 2.7 John Mayfield on 2022-01-08
- New dev version 2.8-SNAPSHOT John Mayfield on 2022-01-09
- Update README John Mayfield on 2022-01-09
- Use HTTPS for EBI plugin repo John Mayfield on 2022-01-09
- Move the old JNI enums to a separate (optional) module - this means the InChI works fine. If you need the enums you can optionally include the cdk-jniinchi-support module. John Mayfield on 2022-01-10
- Include in the cdk-bundle by default John Mayfield on 2022-01-10
- Deprecate the JNI Inchi return/inputs and provide preferred JNA inchi alternatives. John Mayfield on 2022-01-10
- Match the version in master Egon Willighagen on 2022-01-10
- Make sure we have correct storage order in a bond when reading from InChI. John Mayfield on 2022-01-11
- Version 2.7.1 John Mayfield on 2022-01-11