1.5.4 Release Notes

Egon Willighⓐgen 0000-0001-7542-0286 edited this page Jul 16, 2016 · 29 revisions

The main features of this release include major updates to aromaticity, stereochemistry, isotopes, SMARTS and SMILES handling. A huge thanks to Egon and Stephan for reviewing so many patches, particularly at this busy time of year. There are also many bug fixes, which reduce information loss between formats and fix several regressions.

Download available on Sourceforge.

Feature Summary

This section summarises the new features in this release. Some aspects will be expanded on in one or more blog posts (see Efficient Bits). I still need to update some of the documentation, but the following sections give a summary and example usage.


Rewrite of SMILES parser. Tetrahedral and double-bond stereochemistry is now parsed and can be output by the SMILES generator. The SMILES parser no longer stores molecules internally, allowing the same parser to be used across multiple threads. See also - New SMILES parser behaviour.
IChemObjectBuilder blr    = SilentChemObjectBuilder.getInstance();
SmilesParser       smipar = new SmilesParser(blr);

for (String line : lines) {
    IAtomContainer container = smipar.parseSmiles(line); // throws InvalidSmilesException on bad input
}

Atom types are no longer set automatically by the parser. Implicit hydrogen counts now follow the SMILES specification and provide improved conversion.

IChemObjectBuilder     builder = SilentChemObjectBuilder.getInstance();
SmilesParser           sp      = new SmilesParser(builder);
SmilesGenerator        sg      = new SmilesGenerator();
InChIGeneratorFactory  igf     = InChIGeneratorFactory.getInstance();

IAtomContainer m = sp.parseSmiles("[O]");
System.out.println(sg.create(m));                         // [O]
System.out.println(igf.getInChIGenerator(m).getInchi());  // InChI=1S/O

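// configure atom types and add implicit hydrogens
AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(m);
CDKHydrogenAdder.getInstance(builder).addImplicitHydrogens(m);
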
System.out.println(sg.create(m));                         // O ([OH2])
System.out.println(igf.getInChIGenerator(m).getInchi());  // InChI=1S/H2O/h1H2

Aromatic molecules are automatically kekulised.

IAtomContainer m = smipar.parseSmiles("[nH]1cccc1"); // read as 'N1C=CC=C1'

If a Kekulé structure cannot be assigned to a molecule without changing the formula, an exception is thrown.

IAtomContainer m = smipar.parseSmiles("c1cccc1"); // error!

Turning off kekulisation allows problematic molecules to be parsed, but is not recommended as they will likely cause problems in other procedures.

smipar.kekulise(false);
IAtomContainer m = smipar.parseSmiles("c1cccc1");

Aromatic specification in the input is preserved on the CDK ISAROMATIC flag even if the molecule is not aromatic.

IAtomContainer m1 = smipar.parseSmiles("c1ccccc1"); // 6 aromatic atoms
IAtomContainer m2 = smipar.parseSmiles("c1ccc1");   // 4 aromatic atoms (note not really aromatic)
The SMILES generator has also been rewritten, allowing the generation of different types of SMILES output. The following definitions are used to distinguish the types of output. These follow the Daylight SMILES specification and are used by other toolkits (e.g. Marvin).
Name      Canonical  Stereochemistry  Isotope
Generic   no         no               no
Isomeric  no         yes              yes
Unique    yes        no               no
Absolute  yes        yes              yes
The new paradigm is to use one of the static utilities to create a SmilesGenerator instance. The default instance (new SmilesGenerator()) produces generic SMILES. The method createSMILES has been deprecated and replaced with create(IAtomContainer), which can throw a checked exception.
// non-canonical, no stereochemistry or isotope information
SmilesGenerator smiggen = SmilesGenerator.generic();

// non-canonical, includes stereochemistry and isotope information
SmilesGenerator smiigen = SmilesGenerator.isomeric();

// canonical, no stereochemistry or isotope information
SmilesGenerator smiugen = SmilesGenerator.unique();

// canonical, includes stereochemistry and isotope information
SmilesGenerator smiagen = SmilesGenerator.absolute();

The labelling of the unique SMILES (using an equitable partition) has been optimised and is now very fast. Absolute SMILES currently uses the InChI to canonicalise the SMILES string. There are some problems due to InChI (correctly) delocalising charges in pi-bonding systems that in SMILES have distinct representations.

IMPORTANT: The canonical SMILES generated are different from previous versions of the CDK (1.4.x). There may still be differences in future developer releases, but these will be indicated in the release notes.

Generated SMILES do not include lower-case aromatic symbols. This eliminates problems related to interrupting aromatic systems when reading. Canonical SMILES are written as the same resonance form and in general the storage of structures as aromatic SMILES should be avoided. One possibly valid use for aromatic SMILES is the generation of string-fragment fingerprints (LINGOs). If you wish to write SMILES with lower-case symbols then an aromatic generator can be created as follows.
SmilesGenerator smigen = SmilesGenerator.unique()
                                        .aromatic();

The generator uses the ISAROMATIC flags present on the atoms and bonds. Aromaticity is not re-perceived, so for correct output the same aromaticity model (preferably Daylight's for SMILES) should be applied before generation. Please see the aromaticity section for more information on the new aromaticity API.

List<IAtomContainer> ms = ...; // molecules

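Aromaticity arom = new Aromaticity(ElectronDonation.daylight(),
                                   Cycles.allOrVertexShort());
for (IAtomContainer m : ms) {
    arom.apply(m);                  // set the ISAROMATIC flags using the Daylight model
    String smi = smigen.create(m);  // aromatic SMILES
}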
The method, create(IAtomContainer, int[]) now provides access to the output order of SMILES. This allows persistence of auxiliary atomic meta-data with SMILES output. The following example demonstrates how to append 2D coordinates to a SMILES output.
IAtomContainer  m      = ...; // a molecule with 2D depiction
SmilesGenerator smigen = SmilesGenerator.generic();

int[]  ord = new int[m.getAtomCount()];
String smi = smigen.create(m, ord);

// build auxiliary data
Point2d[] coords = new Point2d[ord.length];
for (int i = 0; i < coords.length; i++)
    coords[ord[i]] = m.getAtom(i).getPoint2d(); // ord[i] is the output position of atom i

// suffix SMILES with coordinates (we use a string here but it could be encoded in binary)
smi += " " + Arrays.toString(coords);
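
To read such a record back, the stored values have to be mapped from output order to the original atom order again. The following is a minimal, library-free sketch of that permutation step. The class and method names (OutputOrder, toOutputOrder, toInputOrder) are hypothetical and not part of the CDK, and strings stand in for coordinates. It assumes, as in the example above, that ord[i] is the position of atom i in the generated SMILES.

```java
final class OutputOrder {

    /** Places values[i] (indexed by atom) at position ord[i] (indexed by SMILES output order). */
    static String[] toOutputOrder(String[] values, int[] ord) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++)
            out[ord[i]] = values[i];
        return out;
    }

    /** Inverse operation: recovers the per-atom order from values stored in output order. */
    static String[] toInputOrder(String[] stored, int[] ord) {
        String[] in = new String[stored.length];
        for (int i = 0; i < stored.length; i++)
            in[i] = stored[ord[i]];
        return in;
    }
}
```

Applying toInputOrder to the result of toOutputOrder with the same ord array returns the original per-atom values, which is the property needed for a round trip.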


Tetrahedral centres can now have an implicit part (hydrogen or lone-pair). Here is an example of obtaining the labelling on a sulfoxide.
IAtomContainer m = smipar.parseSmiles("CCC[S@](C)=O");
for (IStereoElement se : m.stereoElements()) {
    if (se instanceof ITetrahedralChirality) {
        // CIP_CHIRALITY.S
        CIP_CHIRALITY label = CIPTool.getCIPChirality(m,
                                                      (ITetrahedralChirality) se);
    }
}
Stereochemistry can now be preserved between MDL Mol V2000, InChI and SMILES. When reading a mol file stereo elements are automatically created. If you wish to encode stereo elements from another format the StereoElementFactory can be used.
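PDBReader      pdbr = new PDBReader(...);
IAtomContainer m    = pdbr.read(new AtomContainer());

// create stereo elements from the 3D coordinates, replacing any existing elements
StereoElementFactory stereo = StereoElementFactory.using3DCoordinates(m);
m.setStereoElements(stereo.createAll());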
The factory can create elements from 3D and 2D depictions (wedge/hatch bonds). The stereocenters are found in structures using the new Stereocenters API. Stereocenters identifies atoms that can support tetrahedral or double-bond stereochemistry. It can also be used to verify stereocenters: consider [C@H](C)(C)O, where the marked carbon carries two identical methyl groups and so is not a true stereocenter.

Atoms that can support stereochemistry are found using the same rules as detailed in the InChI Technical Manual. The detection of stereocenter topology uses the method described by Razinger et al. A huge thanks to Tim Vandermeersch for helping explain some intricacies of the method. The current implementation isn't yet complete and will only find some symmetric stereocenters, but the coverage is relatively good. Future releases will aim to complete the detection in this class.

IAtomContainer m             = smipar.parseSmiles("C(C)(CC)N");
Stereocenters  stereocenters = Stereocenters.of(m);
for (int i = 0; i < m.getAtomCount(); i++)  {
    stereocenters.elementType(i);      // Tetrahedral, Double-bond, etc.
    stereocenters.stereocenterType(i); // True, Para, Non, Potential
}
Tetrahedral and double-bond stereochemistry is also now depicted by the structure diagram generator. Wedge/hatch bonds can be added to a pre-generated depiction (e.g. stored with SMILES) using the NonPlanarBonds - this class is currently package-private.


Previously the CDK provided three aromaticity implementations as AromaticityCalculator, CDKHueckelAromaticityDetector and DoubleBondAcceptingAromaticityDetector. Primarily the CDKHueckelAromaticityDetector was used internally within the library whilst new users would generally use AromaticityCalculator with it having the most intelligible name.

All three classes are now deprecated, with the functionality unified under a single Aromaticity class. It is well known that apart from "smelling nice" (DW) aromaticity is a loose concept in chemical information processing, with differing opinions on its definition. Approaches generally differ in two algorithmic choices: which atoms can contribute (delocalise) p electrons to the system, and which rings (cycles) are checked against Hückel's rule (4n+2). The new API makes these decisions explicit: an ElectronDonation model determines how many electrons each atom contributes, and a set of Cycles determines where that donation is checked. The usage of the API will be smoother in future releases, making it simpler to use a predefined combination of parameters.
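
The two decisions can be illustrated without the library. The sketch below is not the CDK implementation: the class HueckelSketch, its methods, and the convention that a negative contribution marks an atom that cannot take part are assumptions made for illustration. Given a per-atom electron contribution (the role of the donation model) and one cycle as a list of vertex indices (the role of the cycle set), it sums the contributions around the cycle and tests the 4n+2 rule.

```java
final class HueckelSketch {

    /** True if the electron count equals 4n+2 for some integer n >= 0. */
    static boolean satisfiesHueckel(int electrons) {
        return electrons >= 2 && (electrons - 2) % 4 == 0;
    }

    /**
     * Checks a single cycle. cycle holds vertex indices (each atom once) and
     * contribution[v] is the number of p electrons atom v donates; a negative
     * value (an assumed convention) means the atom cannot participate.
     */
    static boolean isAromaticCycle(int[] cycle, int[] contribution) {
        int sum = 0;
        for (int v : cycle) {
            if (contribution[v] < 0)
                return false;
            sum += contribution[v];
        }
        return satisfiesHueckel(sum);
    }
}
```

With one electron from each of six carbons (a benzene-like ring) the sum is 6 and the check passes; four carbons donating one electron each (a cyclobutadiene-like ring) give 4 and the check fails. Different donation models change only the contribution array, and different cycle sets change only which cycles are tested.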

The current electron donation models are:
  • piBonds() - a simple electron donation model which allows atoms next to cyclic pi bonds to donate a single p electron. Atom types are not required but bond orders should be specified.
  • cdk() - mirrors the model of the CDKHueckelAromaticityDetector, exocyclic groups (e.g. in quinone) are not considered. This model requires atom types.
  • cdkAllowingExocyclic() - mirrors the model of the DoubleBondAcceptingAromaticityDetector, exocyclic groups (e.g. in quinone) are allowed. Ketones contribute 1 p electron, quinone is aromatic, guanine and caffeine are not. This model requires atom types.
  • daylight() - a model similar to that used by Daylight Chemical Information Systems. Atom types are not required but implicit hydrogens and bond orders must be assigned. Ketones contribute 0 p electrons, quinone is not aromatic, guanine and caffeine are.
A good set of cycles to use is allOrVertexShort. This set efficiently provides every possible cycle in the structure (e.g. the large rings in azulene and porphyrin). If the computation of all cycles is intractable, it falls back to using a unique cycle set.
Aromaticity arom = new Aromaticity(ElectronDonation.daylight(),
                                   Cycles.allOrVertexShort());
for (IAtomContainer m : ms) {
    arom.apply(m);
}

The aromaticity of the deprecated CDKHueckelAromaticityDetector can be mimicked with the following configuration.

Aromaticity arom = new Aromaticity(ElectronDonation.cdk(),
                                   Cycles.cdkAromaticSet());
for (IAtomContainer m : ms) {
    AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(m); // required for model
    arom.apply(m);
}

The new aromaticity detector will clear all existing aromatic flags on a structure. This functionality allows us to normalise correctly and ensures the same aromaticity model is applied to all structures.

// m1 has aromatic flags set from input, m2 does not
IAtomContainer m1 = smipar.parseSmiles("c1ccc1");
IAtomContainer m2 = smipar.parseSmiles("C1=CC=C1");

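// m1 still has aromatic flags set, m2 does not
CDKHueckelAromaticityDetector.detectAromaticity(m1);
CDKHueckelAromaticityDetector.detectAromaticity(m2);

// m1 and m2 have no aromatic flags set
arom.apply(m1);
arom.apply(m2);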

(Sub)-structure Matching

A new Pattern API provides matching and mapping of structure queries. It provides a mapping of all matches (matchAll(IAtomContainer)) or the first match (match(IAtomContainer)) and a conditional for convenience (matches(IAtomContainer)). The primary idea is to make the code descriptive but I'll demonstrate some other desirable features. The mappings are provided as a permutation of the query vertices represented as a fixed size array.

There are currently two implementations, Ullmann and VentoFoggia. The other matchers UniversalIsomorphismTester and SMSD could (will) be adapted to use the API.

The following example counts the number of times a substructure query was found in a list of targets.

IAtomContainer query   = ...;
Pattern        pattern = Ullmann.findSubstructure(query);

int hits = 0;
for (IAtomContainer target : targets)
    if (pattern.matches(target))
        hits++;
The mappings returned by the pattern are lazy and generated as needed. Implementing Iterable we can loop over the mappings directly.
for (int[] mapping : pattern.matchAll(target)) {
    // mapping[i] is the index of the target atom matched to query atom i
}


Utilising Guava utilities we can limit and count the number of mappings.

// first 5 matches
for (int[] mapping : FluentIterable.from(pattern.matchAll(target))
                                   .limit(5)) {
}

// does the pattern match exactly 5 times
if (FluentIterable.from(pattern.matchAll(target))
                  .size() == 5) {
}
This lazy generation is useful for matching stereochemistry (on by default) as we can find the first match which has the correct configuration. Double-bond configurations are also matched.
Ullmann.findSubstructure(query)
       .matches(smipar.parseSmiles("[C@@H](CC)(C)O"));         // true! (note neighbour order)
Ullmann.findSubstructure(query)
       .matches(smipar.parseSmiles("[C@H](CC)(C)O"));          // false! (note neighbour order)
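
The remark about neighbour order reflects a general property of tetrahedral SMILES notation: swapping two neighbours of a centre inverts the written winding, so @ with one neighbour order can describe the same configuration as @@ with another. The following library-free sketch shows the parity argument. The class NeighbourOrderParity and its methods are hypothetical, and this is not how the CDK matcher is implemented.

```java
final class NeighbourOrderParity {

    /** Returns +1 if perm (a permutation of 0..n-1) is even, -1 if it is odd. */
    static int parity(int[] perm) {
        int inversions = 0;
        for (int i = 0; i < perm.length; i++)
            for (int j = i + 1; j < perm.length; j++)
                if (perm[i] > perm[j])
                    inversions++;
        return (inversions % 2 == 0) ? 1 : -1;
    }

    /**
     * Compares two descriptions of one centre. perm[i] is the position, in the
     * first neighbour list, of the i-th neighbour of the second list. An even
     * permutation requires the same winding, an odd one the opposite winding.
     */
    static boolean sameConfiguration(boolean clockwiseA, boolean clockwiseB, int[] perm) {
        boolean even = parity(perm) == 1;
        return even == (clockwiseA == clockwiseB);
    }
}
```

For neighbours listed as (H, C, CC, O) with @ and as (H, CC, C, O) with @@, the permutation {0, 2, 1, 3} is odd and the windings differ, so the two descriptions denote the same configuration.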
This will likely be cleaner in the next release but we can filter for the unique atom matches using the UniqueAtomMatches predicate.
for (int[] mapping : FluentIterable.from(Ullmann.findSubstructure(query)
                                                .matchAll(target))
                                   .filter(new UniqueAtomMatches())) {
}
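
The idea behind filtering for unique atom matches can be sketched without the library: two mappings that cover the same set of target atoms (for example, a symmetric query matched in both directions) count as one. The class UniqueAtomMappings below is a hypothetical stand-in and is not the CDK's UniqueAtomMatches predicate. It keeps the first mapping seen for each distinct atom set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueAtomMappings {

    /** Keeps only the first mapping for each distinct set of target atom indices. */
    static List<int[]> filter(Iterable<int[]> mappings) {
        Set<BitSet> seen = new HashSet<>();
        List<int[]> unique = new ArrayList<>();
        for (int[] mapping : mappings) {
            BitSet atoms = new BitSet();
            for (int v : mapping)
                atoms.set(v);           // the order of atoms in the mapping is ignored
            if (seen.add(atoms))        // add() is false if this atom set was seen before
                unique.add(mapping);
        }
        return unique;
    }
}
```

Because the filter holds state (the set of atom sets already seen), a fresh instance is needed for every target in this sketch.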


The real power of defining the Pattern API is that we can optionally add pre-screens to queries. A simple heuristic was already provided by the UniversalIsomorphismTester but the new API makes the approach much more flexible. The following example shows how we can define a pattern which intercepts the match and checks the fingerprints first. Unfortunately the CDK fingerprint generation is currently much slower than the structure matching, but it demonstrates a proof of concept for future releases with a faster fingerprint implementation.
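final Pattern pattern = new Pattern() {
    @Override public int[] match(IAtomContainer target) {
        if (!checkFingerprints(query, target))
            return new int[0];
        return Ullmann.findSubstructure(query).match(target);
    }
    @Override public Iterable<int[]> matchAll(IAtomContainer target) {
        return checkFingerprints(query, target) ? Ullmann.findSubstructure(query).matchAll(target)
                                                : Collections.<int[]>emptyList();
    }
};

for (IAtomContainer target : targets) {
    if (pattern.matches(target)) {
    }
}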
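
The checkFingerprints call above is left undefined in these notes. A common way to implement such a screen, assuming both fingerprints are already available as bit sets, is a subset test: a target can only contain the query if every bit set in the query fingerprint is also set in the target fingerprint. The class FingerprintScreen below is a hypothetical, library-free sketch of that test and says nothing about how the fingerprints themselves are generated.

```java
import java.util.BitSet;

final class FingerprintScreen {

    /**
     * Returns false only when the target cannot contain the query, that is
     * when some bit set in queryFp is missing from targetFp. A true result
     * still has to be confirmed by the actual structure match.
     */
    static boolean mayContain(BitSet queryFp, BitSet targetFp) {
        BitSet missing = (BitSet) queryFp.clone(); // leave the caller's bit set untouched
        missing.andNot(targetFp);                  // bits in the query but not in the target
        return missing.isEmpty();
    }
}
```

The screen can only reject candidates, never accept them, which is why the pattern above still delegates to the structure matcher when the fingerprints pass.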


The CDK SMARTS functionality has been optimised and extended. Firstly, the matchers have been updated to use a SMARTSInvariants class which is attached to queries before matching. Previously there were several initialisation steps (performed by SMARTSQueryTool) and queries could not be used between threads. Isolation of the invariant values into this holder allows us to specify different schemes for matching patterns (i.e. what ring set should '[R6]' check?).

As with structure matching, tetrahedral queries are now supported.
IChemObjectBuilder bldr   = SilentChemObjectBuilder.getInstance();
SmilesParser       smipar = new SmilesParser(bldr);

SMARTSQueryTool sqt = new SMARTSQueryTool("[C@](C)(N)CC", bldr);

sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 1 hit
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 0 hits
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 0 hits

sqt = new SMARTSQueryTool("[C@@](C)(N)CC", bldr);

sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 1 hit
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 0 hits

sqt = new SMARTSQueryTool("[C@?](C)(N)CC", bldr);

sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 1 hit
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 0 hits
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 1 hit
Logical queries can be used on tetrahedral stereochemistry.
sqt = new SMARTSQueryTool("[C!@](C)(N)CC", bldr); // equivalent to @@?

sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 1 hit
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 1 hit

sqt = new SMARTSQueryTool("[C@,N@@](O)(C)(N)CC", bldr);

sqt.matches(smipar.parseSmiles("[C@](O)(C)(N)CC"));    // 1 hit
sqt.matches(smipar.parseSmiles("[C@@](O)(C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("C(O)(C)(N)CC"));       // 0 hits
sqt.matches(smipar.parseSmiles("[N@+](O)(C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("[N@@+](O)(C)(N)CC"));  // 1 hit
sqt.matches(smipar.parseSmiles("[N+](O)(C)(N)CC"));    // 0 hits
Double-bond configuration is also matched.
sqt = new SMARTSQueryTool("C/C=C/C", bldr);

sqt.matches(smipar.parseSmiles("C/C=C/C"));          // 2 hits
sqt.matches(smipar.parseSmiles("C/C=C\\C"));         // 0 hits
sqt.matches(smipar.parseSmiles("C/C(/C)=C(/C)\\C")); // 4 hits
sqt.matches(smipar.parseSmiles("CC=CC"));            // 0 hits
sqt.matches(smipar.parseSmiles("C/C=C(/O)C"));       // 0 hits

sqt = new SMARTSQueryTool("C/C=C\\C", bldr);

sqt.matches(smipar.parseSmiles("C/C=C/C"));          // 0 hits
sqt.matches(smipar.parseSmiles("C/C=C\\C"));         // 2 hits
sqt.matches(smipar.parseSmiles("C/C(/C)=C(/C)\\C")); // 4 hits
sqt.matches(smipar.parseSmiles("CC=CC"));            // 0 hits
sqt.matches(smipar.parseSmiles("C/C=C(/O)C"));       // 2 hits

The query /? is supported but logical operations such as C/C=C!/C or C/C=C/,\C have not yet been implemented.

sqt = new SMARTSQueryTool("C/C=C/?C", bldr);

sqt.matches(smipar.parseSmiles("C/C=C/C"));          // 2 hits
sqt.matches(smipar.parseSmiles("C/C=C\\C"));         // 0 hits
sqt.matches(smipar.parseSmiles("C/C(/C)=C(/C)\\C")); // 4 hits
sqt.matches(smipar.parseSmiles("CC=CC"));            // 2 hits
sqt.matches(smipar.parseSmiles("C/C=C(/O)C"));       // 0 hits
Component level grouping has been added.
sqt = new SMARTSQueryTool("[#8].[#8]", bldr);

sqt.matches(smipar.parseSmiles("O"));    // 0 hits
sqt.matches(smipar.parseSmiles("O=O"));  // 2 hits
sqt.matches(smipar.parseSmiles("OCCO")); // 2 hits
sqt.matches(smipar.parseSmiles("O.O"));  // 2 hits

sqt = new SMARTSQueryTool("([#8].[#8])", bldr);

sqt.matches(smipar.parseSmiles("O"));    // 0 hits
sqt.matches(smipar.parseSmiles("O=O"));  // 2 hits
sqt.matches(smipar.parseSmiles("OCCO")); // 2 hits
sqt.matches(smipar.parseSmiles("O.O"));  // 0 hits

sqt = new SMARTSQueryTool("([#8]).([#8])", bldr);

sqt.matches(smipar.parseSmiles("O"));    // 0 hits
sqt.matches(smipar.parseSmiles("O=O"));  // 0 hits
sqt.matches(smipar.parseSmiles("OCCO")); // 0 hits
sqt.matches(smipar.parseSmiles("O.O"));  // 2 hits
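
The hit counts above follow from two constraints on a candidate mapping: query atoms inside the same parentheses must land in one connected component of the target, and query atoms in different parenthesised groups must land in different components. Ungrouped atoms are unconstrained. The sketch below checks those constraints for one mapping. The class ComponentGrouping and the integer encodings (group 0 meaning ungrouped, one component id per target atom) are assumptions for illustration, not the CDK's internal representation.

```java
import java.util.HashMap;
import java.util.Map;

final class ComponentGrouping {

    /**
     * mapping[i] is the target atom matched to query atom i, queryGroup[i] is
     * the parenthesised group of query atom i (0 = ungrouped) and
     * targetComponent[v] is the connected component of target atom v.
     */
    static boolean consistent(int[] mapping, int[] queryGroup, int[] targetComponent) {
        Map<Integer, Integer> groupToComponent = new HashMap<>();
        Map<Integer, Integer> componentToGroup = new HashMap<>();
        for (int i = 0; i < mapping.length; i++) {
            int group = queryGroup[i];
            if (group == 0)
                continue; // ungrouped atoms may match anywhere
            int component = targetComponent[mapping[i]];
            Integer prevComponent = groupToComponent.putIfAbsent(group, component);
            if (prevComponent != null && prevComponent.intValue() != component)
                return false; // one group spread over two components
            Integer prevGroup = componentToGroup.putIfAbsent(component, group);
            if (prevGroup != null && prevGroup.intValue() != group)
                return false; // two groups matched in the same component
        }
        return true;
    }
}
```

Under this encoding "([#8].[#8])" has groups {1, 1} and is rejected for O.O, while "([#8]).([#8])" has groups {1, 2} and is rejected for OCCO, in line with the zero-hit cases listed above.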

The stereochemistry and component-level grouping will also match correctly in recursive SMARTS.

sqt = new SMARTSQueryTool("[O;D1;$(([a,A]).([A,a]))][CH]=O", bldr);

sqt.matches(smipar.parseSmiles("OC=O.c1ccccc1"));  // 1 hit
sqt.matches(smipar.parseSmiles("OC=O"));           // 0 hits


The IsotopeFactory is now an abstract class (API CHANGE) with two implementations: the XMLIsotopeFactory loads isotopes from the Blue Obelisk Data Repository (BODR) XML, whilst Isotopes uses an optimised binary encoding of the BODR. This not only provides a general improvement in performance but also allows its use on micro architectures. Egon has written an Isotopes app for Android that utilises this new functionality. Isotopes also produces immutable IIsotope instances and is generally preferable.
IsotopeFactory iso = Isotopes.getInstance();
iso.getMajorIsotope(6)
   .getMassNumber(); // 6 = atomic number
The MDL V2000 reader and writer now use the MDL Valence Model to define implicit hydrogen counts on non-query structures. A structure with aromatic bonds (MDL Bond Order 4) is considered a query structure when defining hydrogen counts. Tremendous thanks to Roger Sayle for this valuable contribution to the community - Explicit and Implicit Hydrogens: Taking liberties with valence - Nextmove Software.

Hydrogen atoms can be suppressed in the hash-code - this does not modify the structure but allows correct hash encoding.

MoleculeHashGenerator hashgen = new HashGeneratorMaker().depth(4)
                                                        .elemental()
                                                        .chiral()
                                                        .suppressHydrogens()
                                                        .molecular();

// without suppressHydrogens we would generate 2 hash codes for these structures
IAtomContainer m1 = smipar.parseSmiles("C[C@@H](O)[C@H](O)[C@H](C)O");
IAtomContainer m2 = smipar.parseSmiles("C[C@@]([H])(O)[C@]([H])(O)[C@]([H])(C)O");

long h1 = hashgen.generate(m1);
long h2 = hashgen.generate(m2);


The hash code will also encode IStereoElements (as shown above) allowing identical hash codes from SMILES, InChI and 2D/3D depictions.

A Cycles utility provides a facade of cycles / ring set perception. The Cycles is an optimised representation referring to cycles as paths of vertex indices.

// a set of unique cycles
Cycles cs = Cycles.relevant(m);

Gotcha! - paths (walks) are closed and start and end in the same vertex

Cycles cs = Cycles.all(smipar.parseSmiles("c1ccccc1"));
cs.paths()[0].length; // 7! because the 6 member ring of benzene has the closed-walk {0, 1, 2, 3, 4, 5, 0}
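
Since each path repeats its first vertex at the end, code that needs the ring size or the distinct ring atoms has to drop one entry. A small library-free sketch of that adjustment follows. The class ClosedWalks and its methods are hypothetical helpers, not part of the Cycles API.

```java
import java.util.Arrays;

final class ClosedWalks {

    /** True if the path starts and ends in the same vertex. */
    static boolean isClosed(int[] path) {
        return path.length > 1 && path[0] == path[path.length - 1];
    }

    /** Number of atoms in the ring described by a closed walk. */
    static int ringSize(int[] path) {
        if (!isClosed(path))
            throw new IllegalArgumentException("not a closed walk");
        return path.length - 1;
    }

    /** The ring atoms, each listed once (the repeated final vertex is dropped). */
    static int[] ringAtoms(int[] path) {
        return Arrays.copyOf(path, ringSize(path));
    }
}
```

For the benzene walk {0, 1, 2, 3, 4, 5, 0} this gives a ring size of 6 and the atoms {0, 1, 2, 3, 4, 5}.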

Computing all cycles may throw an intractable exception.

try {
    Cycles cs = Cycles.all(m);
} catch (Intractable e) {
    // fullerene, cyclophane, etc.
}

To obtain a backwards compatible (but inefficient) IRingSet.

IRingSet ringSet = cs.toRingSet();

The bounds of a diagram can now be provided using the Bounds element ensuring atom adjuncts (e.g. hydrogen labels) are not cropped. The bounds will be used by generators in future releases. Alignment of atom symbol labels has been improved with more updates in future.

The GeneralPath API has been updated to allow rendering of filled and colored shapes. This is shown in an example of an improved highlight (future releases).

CDK Highlight

Test Status

This release cleans up the number of test failures and errors. The remaining test failures are primarily found in the forcefield and qsarionpot modules due to legacy code.

20,639 tests
18 failures
18 coverage failures
0 errors


Reviewers

 436  Egon Willighagen 
 132  Stephan Beisken 
  48  John May 


Authors

   448  John May
    66  Egon Willighagen
     2  Rafel Israels
     2  Jonathan Alvarsson
     1  Diego Pedrosa
     1  Magda Oprian

Full Change Log

  • Bumping version number for release. 97b8cea
  • Removing test classes from the main/ source tree. e617299
  • Resolving a test failure in absolute SMILES and including the absolute SMILES tests in the inchi module (inchi is used to canonicalise the SMILES) 40de01f
  • Resolving several JavaDoc errors. 827e573
  • Resolving regressions in ‘cdk-hash’, the way the class needs to be mocked was changed. 40dfc65
  • Removed an artifact file d6766ea
  • Removed a no longer existing package 6a1b466
  • Updated the Eclipse .classpath for Beam 0.4 eee3f7e
  • Don’t ignore tests which now pass. 5ddc419
  • Fail if a molecule was not atom typed - otherwise this cause a runtime exception in the aromaticity CDK model. d7c629b
  • The test assertion had the super and sub structure the wrong way round. 9abeacc
  • Updated SMARTS matcher finger MACCS keys which were missing. As an example bit 56 ([#8R]) was previously missing, this bit tests for an oxygen in a ring. This is the case in the test molecule ‘CC(=O)C1=CC2=C(OC(C)(C)[C@@H](O)[C@@H]2O)C=C1’ and the bit is now correctly set. 83669b2
  • Correcting regression due to a typo. 848617d
  • Resolving regressions in ‘cdk-hash’ we need to provide a non-empty ‘prev’ array when testing the geometry encoder. 53e2fd8
  • Cite Noel’s Universal SMILES article in the InChINumbersTool. c30d0ca
  • Missing header in Absolute SMILES test. 1a8fb6f
  • Documented the new way to create logical atoms. fb6ea68
  • A typo picked up during review. 6148f01
  • The new connectivity checker orders atoms differently. This test depended on the atoms being at a set index which has now changed. 1739e2c
  • Resolves some regressions in the hash module - when we read from MDL now StereoElements are created. This effectively meant each element was doubled in the hash. To avoid this amplifying effect we don’t modify the ‘next’ value but instead set the ‘next’ value to a modified ‘current’ invariant. No matter how many times the value is set, ‘next[i]’ remains the same. e6b1f42
  • Directional labels are now assigned to all substituents of double bonds. d2b13b0
  • Propagate aromatic setting from the SMILES writer to the SMILES generator. cb337a4
  • Generate aromatic SMILES for unit test. 450e8ae
  • Ring connectivity is now always set - it is invariant and useful to have when testing ring membership (logical). ca9317c
  • The aromatic option when generating InChI’s is on a singleton and should be reset. 9c5c0a8
  • The old API method didn’t throw an exception - this is reverted so existing code doesn’t need to be updated. 38f678f
  • Testing absolute SMILES on some molecules with stereo-chemistry. 6f5385d
  • Preliminary tests show we don’t want to pass aromatic flags to InChI. Perhaps this should now be the default as SMILES are kekulised on load. 5089f4a
  • Correct hydrogen labelling and additional transformations when generating absolute SMILES. 2ce25c9
  • Obtain the InChI canonical numbering via reflection, the cdk-inchi module must be present to generate SMILES but is not a dependency of the ‘cdk-smiles’ module. 5c6fdd9
  • Ensure canonical SMILES have the same kekule representation. 9d4fdf0
  • Access SMILES output order. 8b6c397
  • Latest version of beam includes performance improvements, custom sort operations (required for Universal SMILES), ability to resonate a structure and generate a canonical kekule representation (i.e. avoid aromatic SMILES), minor bug fixes. 9b0ba64
  • Rename arbitrary to generic - correct description. 753bb8a
  • More concise name for creating SMILES strings (most common op) the reaction smiles has a name to avoid using overloading. Existing names still work for downstream usage but are deprecated. f888d6c
  • Resolving regressions in cdk-fragment. f05f9ab
  • General SMILES tests are no longer canonical (should not change with canon changes), those which are canonical test for equivalence and did not need changing. ed085f3
  • Configuration options in the SMILES generator. We now use a different canon procedure which will require updates to tests. 44c2471
  • Circumvent CDK quirks of leaving nulls around. baf20fe
  • Beam will now auto-suppress hydrogen counts (library update in a few commits). 17a7826
  • tidy up imports and stray variable from debugging. 8bc1570
  • MDL MACCS bit was taking the majority of the time - we can do it faster without using SMARTS. 62fba11
  • Don’t actually partition the molecule - just count how many component there are. 2922be2
  • Avoid using SMARTQueryTool we can be more efficient as we know the number of matches we need to find. c48b7da
  • A filter for unique atom matches. 15602c7
  • Cache recursive entries - without the caching queries in the MACCS fingerprint runs a lot slower. We use guava cache builder and weak keys allow entries to be GC’d when the AtomContainer has no other reference (i.e. if we are iterating over a file). f237d46
  • Configuring ring membership (i.e. not size based) by default - this is cheap enough we should just do it. 77cf971
  • Load and parse SMARTS when the keys are defined. fb22a0a
  • Different style of comment allows us to skip easier. 4a72e70
  • Lazy loading of MACCS keys. b6cf711
  • Use a fresh set of chem objects for each generic read_chemobject test. Took a while to find the bug due to the same container was being used in different formats. f4d21cf
  • The container being read may already have atoms - we can’t confirm it’s a query and must set the valence correctly. 87a74f5
  • The primary MDL mol file used by the MDLV2000ReaderTest (i.e. SimpleChemObjectReaderTest) was not valid. The original bug (#58) reported that charges were not round tripped. This is because the charge column was not aligned. Adding the ‘M CHG’ of course resolved the issue but the file was not a valid MDL file in the first place. a6c9d17
  • Dependencies not inherited (they are on the classpath but not when tested individually). d4195b4
  • Checked exception now thrown for unset bond orders. 5f985bc
  • Don’t add null stereo elements (unspecified) to the container. 6716abb
  • Automatically read stereo configurations from MDL V2000 format. Can be turned off but is on by default. 9c6fef2
  • A factory to create stereo elements using 2D/3D coordinates. b108c9e
  • Identify atoms supporting tetrahedral and double bond stereo chemistry. Classify the atoms determining if they have constitutionally different ligands (true stereocenter) or have symmetric ligands (para stereocenters). 2276a45
  • Deprecating resolve overlaps - it does not work and should not be used. Its usage has been removed from the SDG. 8507887
  • Read double bond stereo configuration from InChI. 4cc644f
  • Indicate whether a labelling is needed or only the symmetry classes. fe62e9e
  • SMILES generation needs to throw an exception for invalid bond types. The exceptions were already thrown but were unchecked. Ideally IOException would be best but down stream invocations rely on other parts throwing CDKExceptions so this was used instead. 2006536
  • Required dependency (not inherited). bd38292
  • Missing import 89000d8
  • Turn off kekulisation for only these parsers (note wrong variable name was used). 9e175a2
  • Including copyright information 6a8eca1
  • Resolving regressions due to the new connectivity checker. If atoms are removed but the stereo elements (involving the removed atom) are not there can be a null pointer. We now verify that the stereo element atoms are present in the components. Currently stereo elements can not be removed without clearing all stereo centres. d9dbf44
  • Correct layout to represent double-bond stereochemistry and label bonds to indicate tetrahedral configurations. 46a6c47
  • Assign numbers to unlabelled atoms (i.e. hydrogens). Beam will handle the correct hydrogen ordering (CDK API makes this difficult). dfc5a6f
  • Universal SMILES Rule E - don’t start on negatively charged oxygen. 5acf0bb
  • Deprecating old smiles parser option and writing documentation for the new parser. f224628
  • Provide the error message from Beam in full. acc0726
  • Pass correct hydrogen count when generating InChIs + trailing spaces. 4713c25
  • Classes missed when refactoring the bond length parameter. f7cc6c1
  • Compute scaling for disconnected structures - this is required to avoid adjunct label collisions when there is > 1 isolated atom. 8945efa
  • A new rendering element (Bounds) that the renderer will pick up and use to fit the diagram to the required size. A generator can optionally produce a bounds object for the space it requires - when multiple bounds are present the min/max coordinates of all bounds are used. 71de262
  • Don’t set the scale parameter at the same time as the affine transform. The scale needs to be set before generation but the transform can be set after (required for improved bounding). 588ffc8
  • Move bond length parameter from BasicBondGenerator to BasicSceneGenerator. From the changes it is easy to see the bond length is used in relation to scaling compounds - this isn’t really a parameter to change but instead is for normalising coordinate systems. 66fd708
  • More flexible GeneralPath element - allow filled shapes and stroke width specification. Utilities are provided to convert from Java2D API. f61d542
  • Aligning java 2D and cdk rendering element path data types to use arrays. 5cc07ae
  • Store winding rule in general path element. 01b8463
  • Improved stroke caching - we store sub pixels strokes so the graphics context can decide how best to handle. 725e4b8
  • Smoother joins between lines - could be a parameter. 886dc7d
  • Correct scaling of line thickness - we need consider the ZoomFactor and Scale. Both of these are included in the affine transform. 03ab7db
  • Draw lines using double precision coordinates - the graphics context will decide how best to interpret them as pixels. cd198c5
  • Use a rounded rectangle instead of a square to fill in the background of the atom symbol (i.e. obscuring bonds). faa820d
  • Improved bounding box accuracy around atom symbols. 6cff115
  • Including invariant refinement in Canon. 36f346a
  • A utility for ranking indices by a separate array of values. The ranking is built around a sorting procedure with the comparison baked in. 4b676f4
  • Much faster generation of initial invariants, these don’t distinguish all cases but can easily be substituted for better (more expensive) values later. Using the adjacency list representation to avoid using the AtomContainer results in nearly a 80x speed improvement. This also provides correct values - the previous invariants were mixing partial / formal charges and missing explicit hydrogens from the H count. The new values are also picky and complain at unset values - there have been many bug reports with non-canonical SMILES due to missing hydrogens. Also adding the primes to the class for internal use. 047d0db
  • Deprecating the canonical labeller and the prime numbers utility. 73debc8
  • Obtain numbering required by Universal SMILES. 1dcb8f0
  • Separate auxinfo generation from the numbers and allow options to be provided. f20c555
  • If the InChI generation provided a warning this is also okay… 7bbd9c0
  • Ignore aromatic information when converting atoms and bonds. a79ef2d
  • Don’t set geometric configuration for aromatic bonds. f8efa2b
  • Don’t throw an error for low-charge values when determining electron contribution. b40be9a
  • fixup failing unit test 3d029e1
  • Fixup - component grouping in recursive smarts 258503d
  • High-level tests verifying the smarts queries matches stereo correctly. 014577c
  • Rather than try to reason about what chirality specification was accepted we can instead match within the atoms. This makes the chirality matching in logical atoms easier to understand and also allows correct interpretation. Negation is now possible and ‘[!@]’ will match anything that is not anticlockwise (equivalent to ‘[@@?]’). 6a918e9
  • Rename StereoMatchPredicate to StereoMatch. Make the StereoMatcher for SMARTS public (temporary) which allows it to be called from the smiles.smarts package (where SMARTSQueryTool is). Only apply stereo matching to non-query molecules - ideally we would automatically choose which one to apply but at the moment that would cause a cyclic dependency. cf3d78f
  • Match SMARTS geometric (double-bond) stereo queries. 34f6b9a
  • Create double-bond stereo chemistry for queries. Currently functionality is restricted and doesn’t handle logical bond operators. 01b6173
  • Don’t use logical bond operations for the stereo/unspecified representation. This is now the same as how the atom-based chirality is represented. f9c1794
  • Match SMARTS chirality queries. Ideally this would be in with the other predicate but that would cause a cyclic dependency between isomorphism and smarts modules. a79f224
  • Access the chiralities of a matched atom. This currently pulls the required chiralities out of the matched atom which is a little complicated on logic. It might be better to push the tetrahedral specification in and test matching that way. e5efc79
  • Correct configuration of chirality. The suffix logic was incorrect and doesn’t match degree but is instead a permutation number. Usage of @1 = @, @2 = @@ is rare but could be added in later. 72207d1
  • Track correct order of neighbours - ugly but gets the job done. May well rewrite SMARTS in future. f75e026
  • Formatting SMARTS query visitor. 6a53712
  • Match component level grouping in recursive SMARTS - fixes bug #1312. Also spotted that the VF algorithm won't match disconnected queries. For now we put in an easy fix which isn't optimal but gets the job done. 9f9b6ad
  • Parse and test component-level grouping. The parser will also accept larger ring numbers now from 0-99 (conflated commit - sorry). d026e32
  • Moving the SMARTSQueryTool over to the new substructure search. This caused one regression but depict match verified the new value is correct. 8074978
  • Respect component level grouping in substructure queries. ae037b3
  • Parse component-level grouping indication. dfe3c65
  • Despite the conversion - still faster. Also looks like the electron containers / stereo were never partitioned. 75947bc
  • Algorithm for connected components. 437c9a0
  • Simplify parsing of recursive SMARTS allowing multiple levels of recursion and easier handling of ring, components (next) and stereo (later). ce6b61d
  • Cleanup the rest of the logical atom expression (deprecation) and create them using the new operators. cd92f6e
  • Less ‘stringy’ programming. Rather than define an ‘op’ as a string create separate objects. This will simplify the stereo-matching (later). 55c48f5
  • Simplified recursive query - the query is no longer mutable which means the recursive queries are rerun. We may want to reintroduce the cache but it should be ‘safe’ and not depend on the query being mutable. e6325ae
  • Remove specialised initialisation of recursive/hydrogen atoms. 0d3be50
  • Format SMARTS atoms that need to be updated. 2532eee
  • Store the ‘target’ as an invariant required for SMARTS. 8cc06f9
  • Fixup unit tests 3f5b634
  • Stereochemistry integration tests. fefeb68
  • A predicate to filter for (sub)graph-isomorphisms that also match stereochemistry. Implementing it as a predicate means any (sub)structure mapper can be made stereo sensitive. Currently the functionality doesn’t allow partial mappings (MCS) but could be adapted/extended to do this easily. cce5d38
  • Correcting typo - caught by a test. 17b26b7
  • High level tests for substructure matching. We need to put these in the smarts module for now to avoid a cyclic dependency - ‘test-smiles’ already depends on ‘isomorphism’. 394547a
  • Current front end API for Ullmann. 410fd69
  • Missing annotations. 7fc2544
  • Implementation of the mapping state for the Ullmann algorithm. 14874a3
  • The (current) front end access to the Vento-Foggia substructure/isomorphism algorithm. 0b1fc9a
  • Stream matching states as an iterator. 3f7cb67
  • Including tests in module suite. 353c192
  • Concrete implementations of the VF for matching substructures (substate) and identity (state). The tests are quite tricky to write but higher level integration tests will be added later. ca35398
  • Defines the internal state API for (subgraph)-isomorphism mappers and implements an abstract superclass for the Vento-Foggia (VF) algorithm. f3908b8
  • Atom and bond matchers allow us to move the matching logic outside of the (subgraph)-isomorphism mappers. 1ec06d6
  • A small optimisation thought of whilst writing up ring perception. If the graph is biconnected we know how many cycles there should be in the MCB. Adding this check doesn't show much improvement on datasets of chemical compounds but has a dramatic effect on fullerenes. 8a0aac0
  • Do not leave hydrogen counts unset - also load isotope information. 5ad5e0e
  • Use fewer temp arrays - makes sense to compute the invariants as we go. It's still useful to do ringNumber/ringSize this way though. bf9b180
  • Only doing the SSSR when ring properties are tested does provide a decent boost but causes a regression in 'pcore'. The SMARTSQueryTool is mutable and one can change the query and or targets. This means the ring properties may now be calculated when they are actually needed. This optimisation is still useful but we would need new classes which restrict how users set the queries. 9f9370d
  • Okay - let's not depend on the hydrogen counts being set to 0 and instead test the actual bits which have been set. These particular fingerprints cannot be used to substructure filter these molecules. Checking which bits are set documents why the structure key matched the substructure but not the superstructure. e1d2a47
  • Ignoring ring set configuration tests for now - these were mainly to show there are different values. In reality there are other parts which can change also and providing a 'Daylight' or 'OpenSMARTS' scheme is probably better for usability. dd6825d
  • Default aromaticity model is now daylight - to use the CDK model it must be set. The test testBasicAmineOnDrugs_cdkAromaticModel causes an error because one or more atom types are not identified by the CDKAtomTypeMatcher. These atom types are likely to be fixed by another pending patch. ae9527e
  • Don't need to preserve the aromaticity anymore as we can apply the correct model and get the expected matches or non-matches. 54810ce
  • SMARTS extension matching for hybridisation requires atom typing. 2e52d0e
  • Use the atom invariants instead of the formal neighbour count. The first requires atom typing. 090d924
  • Using the new aromaticity model. 59bf1a8
  • Removing the option which was a work around for different aromaticity models. We now have different models and set that instead. 4f83c28
  • A tractable set of cycles for aromaticity perception - required as we're going to use the new aromaticity in the SMARTS Query Tool. c4efc8e
  • Including new test classes in module suite. 4db918c
  • SMARTS queries complain when bond orders are left unset - they're needed to compute the valence. 486aa5c
  • Ensuring implicit hydrogen count is not null. The fingerprint molecules may need more attention. 3d61ec0
  • Simplified identification of unique matches using sets. 38d9fac
  • Type safe lists and renaming atom mapping method. Atom mapping means: given an atom in the query, which atom in the target did it match? This is not provided but instead only the set of atoms which were matched. 11497fb
  • Compute the invariant properties using the new class. dc4098c
  • Set the 'ISINRING' flag on the query container to indicate ring properties are required. c322441
  • Temporary class to allow access between the two SMARTS class trees. They're currently in separate package and the SMARTSAtomInvariants shouldn't really be a public class but for now we need a way to expose functionality. 251852b
  • Incorrect bug report (824) changed the meaning of degree in SMARTS. This should be put back but will likely need discussion first. d82a6c7
  • Clean up of some other matchers which didn't need adapting to the new invariants. 6f44381
  • Adapting and cleaning up several SMARTS query matchers to use the new invariants. The query value is now stored in the class and not on the query atom - this mirrors how other matchers work and generally makes things cleaner. Access to the values was not used (or currently useful). Serialization was removed - the class did not implement the interface and it looks like the UID was just added by default. Serialization was removed rather than fixed as it is rarely useful and there are much better techniques. Several unused classes were removed - these fulfilled the same functionality as others and it was confusing to have two for the same purpose. The RingAtom/SmallestRingAtom can now be done by using different invariants. The DegreeAtom was incorrect, matching charge, and the ExplicitConnectionAtom fulfilled its use case. 2d7c628
  • Access the new atom invariants from every SMARTS atom. If the invariants are required by the matcher they must be set! 7d321e4
  • Storage of SMARTS invariants which replaces the setting of multiple properties with a single type safe data holder. Different invariants (i.e. for rings) can be computed but for now the default daylight implementation is provided. 61ef552
  • Using the pattern *Visitor.java ignores adding a cdk module tag to a generated class - avoid this by specifying the full names of those we actually want to ignore. 7ece0dc
  • Three classes already have explicit @cdk.module annotation 26c28a4
  • Move SMARTSAtoms back to the smarts module. 6d7c4ee
  • Don't repurpose SMARTS query atoms for other uses. We can define our own matchers internally. 1029a5a
  • Minimising the scope of SMARTS parser classes. These classes are specific to the parser and don't need to be exposed in the public API. The atom matchers in isomorphism.matchers.smarts are still public but could also be hidden behind a factory. This commit removes about 50 classes from the JavaDoc making it easier to find other parts. 8f0926d
  • Ensure non-negative height is passed when drawing a rectangle - bug #1163. 408f409
  • Correctly consider the heavy bonds and hydrogen counts. When connectedHeavyAtoms == 2 the full bond list was checked for the aromatic bonds. As the full bond list could contain an explicit hydrogen we should filter it to ensure the aromatic bond checking works correctly. f4a3dbf
  • Don't check for aromatic atoms before checking charge. Carbon, 'C.plus' cation can also be aromatic and so the charge types should be checked first. 8d36482
  • Aromaticity requires atom typing. 7769b45
  • Update header/copyright information 4a410b8
  • Also include the dependencies of the smiles module 8d0a833
  • With the fix, the number of tries needed to get the expected accuracy is much lower 0b6de9a
  • Improved random number generation in a range. b4eb6a8
  • [PATCH] Wrote missing test: Tanimoto on IBitfingerprints d827176
  • Removed a bad test: the matcher uses and requires implicit hydrogens explicitly, and the test does not have them; the testFindMatchingAtomType_IAtomContainer_IAtom() does and works fine 6c8280c
  • Deprecating old implementations. d4c923f
  • Making it clearer how to use the electron donation factory methods and updating the documentation with the new naming. af8b84c
  • Correcting copyright year. f69a8bc
  • Correcting typo in benzene. 777a3eb
  • Including copyright. 7884281
  • Changing factory method names for the CDK models. The factory does not take any attributes. b3b63c6
  • Mark cyclic atoms and bonds before removing non-aromatic atoms. 646cf60
  • Don't remove atoms whilst iterating. f03170a
  • Improved documentation - any more suggestions welcome. 1a4ac10
  • Aromaticity perception using configurable electron donation models and cycles. 5207409
  • Adding a generator for the cycles/rings used by the CDKHueckelAromaticityDetector and DoubleBondAcceptingAromaticityDetector. 8c6c50e
  • Use the edge to bond map for a small performance gain. 4a5ff1e
  • Removing some complexity by using the new edge to bond mapping. aeb0293
  • Utility to simplify bond lookup from index endpoints of the edge. 3359573
  • For the simple cyclic pi bonds aromaticity model - don't allow atoms which are next to two cyclic pi bonds. Documentation wording also improved. b193aa9
  • Not enough electrons - then it cannot participate. 062d720
  • Correcting coverage annotations. 114eb5b
  • Including other aromaticity models in the 'standard' module suite. 84433ff
  • The Daylight model for how many electrons are donated when computing aromaticity. a20922f
  • Correct check for cyclic bonds. 43352de
  • Correct name for compound being tested. 3e04187
  • A simple aromaticity model for MDL/Mol2 file formats. c9a6538
  • Rename ExoCyclicAtomTypeModelTest.java to ExocyclicAtomTypeModelTest.java 6e16157
  • Encoding the existing aromaticity model from CDKHueckelAromaticityDetector and DoubleBondAcceptingAromaticityDetector as a new interface ElectronDonation. This separation makes it easier to use different (or a custom) aromaticity model within the CDK. e28cd17
  • Ensure no modifications whilst iterating. e367ebf
  • Another NPE test: when the atomicNumber doesn't correspond to an isotope list 0debacc
  • Fixed unit tests: now that the addEx|ImplicitHydrogens() doesn't change atom type names anymore, we have to do that explicitly 1fb8356
  • Removing long running test which did not have any assertions. b03b177
  • Don't change anything except the hydrogen count 92577b0
  • Added line separator for multiline extra data (when length < 80 chars). 03d56bb
  • Added additional checks to fix NPE regressions e5a4220
  • Updated the source URI for updates from the SF platform e5e112c
  • javadoc 7c78cc8
  • Added missing testing and some missing JavaDoc 3c5d9bf
  • Some tuning: store atomicNum as byte, and use the actual file size when reading 965b787
  • Binary format for isotope data: no smaller jar, seemingly a small performance improvement 5e154d5
  • Added an index based on the element symbol, further speeding up isotope info lookup; also some further tests for unexpected input 126d85c
  • Moved the CML-based isotope reading to extra c10e3b3
  • Use the BODRIsotopes as much as possible, paving the way to move the XMLIsotopeFactory to the extra module (it must not be removed: the BODRIsotopeDumper class depends on it) 2e5c986
  • Added an abstract class with the shared functionality of BODRIsotopes and the old IsotopeFactory now called XMLIsotopeFactory 694b284
  • Added a .dat based isotope reader + tool to create a .dat file from the BODR .xml 05bdba5
  • Removing print to stdout. 06d535c
  • Use the silent AtomContainerSet, taking the module's test suite time down from about 20 secs to 4 secs on my machine ce2c98c
  • Update test-valencycheck.libdepends cdd817d
  • Update valencycheck.libdepends b1d639c
  • Slow but convenient check for cyclic bonds. 31c9bdf
  • Allow the cyclic vertex searches to test if an edge is cyclic - simplifies some other code, fixes a bug (new RingSearch tests) and allows us to provide a utility in RingSearch (next commit). 201e9b0
  • Simplify efficient creation and conversion (to IRing) of the various cycle sets. e5ca2ec
  • Inlined cyclic molecules for testing the new cycles utility. 536bd94
  • Truncate input from MCB (last vertex is the same which was not expected by new Path()). 9bf8f73
  • Incorrect class name in coverage annotations grrr. f344f7d
  • Path copying done down-stream. fd5e072
  • Minor optimisation to initial cycles allows it to skip a number of breadth-first searches if it is known the graph is biconnected. 9683f93
  • An exception for when a result could not be reached in reasonable time. a40d043
  • Removed an ancient, unused test file 869187f
  • Now the hydrogens must be present - we find that previously this fragmenter was generating both 'C1CCC(C)C1' and 'CC1CCCC1' which are the same molecule so there is 1 less in the fragment count. The canonicalisation also changed in the counting below. a8e2b36
  • Kekule indole doesn't match (check depictmatch) itself but the aromatic form does. 2cb6ede
  • Molecules had null hydrogen counts. 1ce4f8c
  • Example of what you get if you don't redo the atom types - this is an isolated case so it is minimal effort to change the assertion here. The hydrogen count isn't updated when one of the bonds is removed and so the carbon only has 3 bonds. 00d381b
  • Also need to do the necessaries in the ExhaustiveFragment - next commit will show what you get if you don't do this. It might be more desirable but for now this matches the tests. a769754
  • Hydrogen count gives different canonical SMILES. e1b1874
  • Make it easier to inspect what is going on. 8ead845
  • Non-aromatic bonds between aromatic atoms are now correctly generated. 38094a1
  • Now the hydrogen counts are actually there the canonical fragments are different. a87dd79
  • SMILES parser interprets the organic subset correctly - when fragments are made hydrogens are not added. Perhaps they shouldn't be but the tests expect this to be the case. 81e4a5f
  • Aromatic SMILES now written if the molecule is aromatic. 8223db6
  • Bonds go after branching not before (it will still parse okay) but this is typical (see specification). 5605d9c
  • Define configurations using double bond stereo elements. 0aa245f
  • Ensure correct hydrogen counts. c25013a
  • Redundant brackets are not produced. 9e75ad9
  • Define tetrahedral stereo chemistry using the stereo-elements. Note, testCisTransDecalin, was previously trying to be specified with cis/trans ring configuration but is now correctly specified with tetrahedral centres. 3ebcb39
  • Utility for cleanly defining tetrahedral chirality. 5010f0d
  • Generator now produces literal output - note this means the non-aromatic examples are 6 singly bonded aliphatic carbons with 1 hydrogen (as specified). d68ff43
  • Atom typing doesn't preserve aromatic flags on atoms, doing aromaticity perception and then atom typing again (addImplicitHydrogens) removes the flags on the atoms. Moving the implicit hydrogen addition gives the correct output. 8cb9483
  • Bond symbol only written on opening of ring not closure. ad6b7dd
  • Redundant brackets are no longer included in generated SMILES. a1a569f
  • fixCarbonCount gave the wrong hydrogen count on one of the atoms. 91b44a6
  • Ring numbering changed - numbering now starts from the first ring opening instead of the first ring closure. ce59924
  • Bracket atoms must specify the number of hydrogens. 60c7ccf
  • Bond symbol is now only written on the ring open - recommended by OpenSMILES. acb70c5
  • Unknown atom '*' doesn't need brackets - http://www.daylight.com/daycgi/depict?2a 8487de6
  • Intercept pseudo atoms and default null hydrogen counts to 0. The CDK will leave the nulls in place for pseudo atoms so we need this special exception. ded2e30
  • Remove errors due to hydrogen counts being null. 68e654f
  • Actually we also want to include the isotope number if there was no major isotope found. fdeb81d
  • Using Beam to generate the SMILES. ffb129e
  • Updating API calls in SmilesGeneratorTest - no test assertions corrected. 49c5ac9
  • Don't set mass for default isotopes - incompatibilities between what CDK/SMILES define. 4eb0643
  • API changes in other modules required to build - will come back and verify these work as expected later. b96d89d
  • Chiral SMILES = isomeric SMILES - this is now configured by the constructor and there is a single createSMILES method. c9cfcd0
  • Removing existing SmilesGenerator implementation. dab7341
  • Hotfix on beam, ring number 0 .. 99 inclusive - there are 100 rings not 99, doh. 924c292
  • Removing testing of atom types on molecules which no longer exist. 91d812d
  • Remove extra '}' from javadoc. 004ff4f
  • Added a new contributor d4740f1
  • try-catch substituted by @test(expected=InvalidSmilesException) in SmilesParserTest.java (= junior task 16). b4d5758
  • The ring info is not really being used anymore, so removed it aa58610
  • Use the RingSearch instead of the SpanningTree 390144d
  • Updated the .classpath for BEAM 0.3 1bb2324
  • Copyright header for EdgeShortCycles.java 34670dc
  • Copyright header for VertexShortCycles.java 1c9bd45
  • Check null on MCB constructor. af843cc
  • Edge short cycles. 82d9c20
  • Set of shortest cycles through each vertex. 267c8b4
  • Now the aromatic flags are kept we need to explicitly specify the single bond. c137394
  • Preserving the aromatic flags means this molecule doesn't match unless we strip of the flags and reassign them. 4da618a
  • Aromatic flags preserved on load now. 4b2e2d9
  • Dependency inheritance with maven will make updating libraries so much easier. d06a20c
  • Update beam so we can pass aromaticity flags to the CDK but maintain the nice bond order assignments. 6b02563
  • Fail fast on Nina's boron ball - a later fix will actually speed up the fingerprints so this case is trivial but for now we have an exception. 1a30dfb
  • Old parser didn't correctly handle, 'CCC[N+]1=c2c(=C(=O)NC1=O)[nH]cn2' (1676), and would match the SMARTS. New parser gives correct structure and no longer matches (verified by Depict Match). a51d66d
  • Bad modules, bad unit tests. I think what has happened is something in the atom typing or the aromaticity changed but the MCS for these molecules seems correct from inspection (spent over an hour looking at this). The original bug report shows there were different counts due to some flags being set/unset etc and I think this is just a case where we handle things a little better. The first molecule here is actually invalid - someone has used the Daylight-like aromaticity model to generate a molfile. As the MDL aromaticity model doesn't allow lone pair contribution it's impossible to know which nitrogen on the 6 member ring had the hydrogen. Depending on which nitrogen I choose I get different MCS counts. 44eb4ca
  • Additional PathTools method to fail fast if too many paths are generated. 1e0a8e7
  • Encoder factory for DoubleBondStereoChemistry IStereoElements. 3d1464e
  • Unit tests to check for encoding of DoubleBondStereochemistry 5b138ce
  • Moved the } to the correct line (merge conflict) 0898866
  • Remove IAtomParity, all implementations and tests. 89d1667
  • Remove atom parity from CDKToBeam converter. 2cbd12c
  • Remove atom parity test classes 6184484
  • Remove IChemObject builder instance registration. 83eb846
  • Don't test chem object builders for creation of IAtomParity. 829cca3
  • Replaced IAtomContainer test using IAtomParity with ITetrahedralChirality. 12eba45
  • Copy stereo elements correctly. No longer use the single method which just copied IAtomParity. c69d29c
  • Remove test from AtomContainerManipulator 2a6a6fb
  • Load ITetrahedralChirality instead of IAtomParity with InChI. dc60947
  • Don't use IAtomParities when generating InChIs. 4a90c9d
  • Strip trailing white space so git can match up the next commit. 067a378
  • According to my example data, a nitro nitrogen is N.pl3 c058109
  • Added rudimentary detection of the Co.oh atom type 5374cd7
  • Added rudimentary detection of the chromium atom types 3fdfde1
  • Added mappings for Mo 0cb199f
  • Implemented perception of the Sybyl O.co2 atom type 6c18d6c
  • Nitrate oxygens are not O.co2's b1e1995
  • A NO2 nitrogen is N.pl3 in Sybyl 34bb70c
  • Detection of out-of-ring planarity due to pi-pi interaction is out of scope of our algorithms right now cc818d0
  • We'll never agree on aromaticity... what model does Sybyl use anyway? 2e7f1e7
  • Removing print to stdout. 635980b
  • Can also now apply basic CIP rules to sulfinyl. 88ed0c9
  • Removes the safety check for implicit tetrahedral neighbours. 90ea496
  • If the central atom is found in the ligand list replace it with an implicit hydrogen. 0f3e73c
  • Additional unit test of CID 42475007 stereoisomer. 84f1ce6
  • Ignore tests for unimplemented features 012101d
  • The assertion was wrong: with or without a double bond, this thing is aromatic; also added testing that the oxygen is not marked as aromatic f6f7550
  • The SMILES parser no longer automatically recognizes aromaticity (it's OpenSMILES, not old SMILES), but the test was expecting IS_AROMATIC flags, so added perception e64edbd
  • Removed a test of which the SMILES was outright broken db55579
  • Added three tests which cannot be kekulized 97bada1
  • Updated the atom indices: the new parser has the atom orders in the carbonyl bonds the other way around 7a69c17
  • The new SMILES parser just kekulizes by default b99ab31
  • Atom type perception is no longer part of SMILES parsing 1052cfd
  • Aromatic bromine example is invalid - show that we can still load it but if we load it and kekulise (default) an exception is thrown. Also another unit test shows a valid aromatic bromine is kekulised properly. 216ec99
  • Added the ATASaturationCheckerTest to the test suite 0e93958
  • Readded the SMILES dependency as ATASaturationCheckerTest still uses it 62bd8a3
  • Updated the Eclipse .classpath for Beam 0.2 f3cc6ba
  • Removing valency check tests dependence on smiles module. f7c201b
  • Resolving regressions in fingerprint. f946093
  • Resolving regressions in cdk-core. 175dd42
  • Regressions in cip, sdg, signature, qsarmolecule, qsarbond, group, forcefield, inchi, charges, builder3d only required the dependency. 36366f0
  • The order of atoms in bonds changed and so the expected connection table output is different. c413ae5
  • These SMILES strings are not valid molecules. 3870f7a
  • Resolving regressions in standard - 1 remains in the HOSECodeGenerator. 1178391
  • Resolving regressions in smarts module - 1 remaining which needs more attention. 80144c2
  • Atom typing required by other tests in the SMILES module. e200c68
  • Utility to assign single or double flags to a container. 275ec49
  • Access bonds by their connected atoms. c10acba
  • Order of bonds in the container is now different - check bonds by looking up the connected atoms. a0408e3
  • 'Co' is invalid - show we can load it if needed and that also we can kekulise an acrylic molecule with 'Co' at the front. 7e5cc07
  • Beam will (currently) allow bare 'H', 'D' and 'T'. These are common mistakes and it is no extra effort to parse them. Note D and T are auto-corrected to a hydrogen with a mass number '[2H]' and '[3H]'. d653215
  • Many tests which check atom types are present or that aromaticity is set are resolved. 643279a
  • Improved failure message. bcfff03
  • Input was invalid (amine nitrogen); added another test for the invalid case. 34bdd39
  • Bond order sums are now correct (once the molecule is kekulised) 3701865
  • Chiral hydrogens are no longer converted to explicit atoms. bd713f4
  • Typos and check bond orders. fbbab6d
  • Kekule molecule is loaded - but we can apply the aromaticity if required. 24b85d7
  • Static import of assertTrue/assertFalse - no other changes. f8681a3
  • Beam gives the configuration in the absolute order of the atoms - tests were updated to account for this. Also atom objects were tested instead of symbols. c7d06e2
  • Convert tetrahedral centres with an implicit hydrogen or lone-pair to the CDK ITetrahedralChirality. b9f1b49
  • Documenting ITetrahedralChirality with the fact that if the chiral atom is present in the ligands it indicates an implicit hydrogen or lone pair. 9cd20eb
  • Added missing aromatic bond 'C:1C:C:C:C:C1' is not the same as 'C:1:C:C:C:C:C1'. Also added an additional test to show that when loading normally the correct structure 'cyclo-hexane' is obtained. fe698f8
  • Atom typing no longer automatically applied. 9cd2740
  • Convenience method for writing new tests. c7459c1
  • Using Beam to read the SMILES string. 4ca3fef
  • Invert logical condition - new parser will automatically assign bond orders. This also allows us to throw an exception for cases such as invalid pyrrole (c1nccc1) which is one of the existing (and failing) unit tests. a42fc33
  • Encapsulating parser and documenting fields and constructor. 9f3859f
  • Stripping out existing implementation - leaving only the public API. 0800aba
  • Beam version 0.2-SNAPSHOT - some minor tweaks to performance (more likely). e7301b5
  • Reformat to enable better visualisation of changes. eb69a5d
  • Added missing JavaDoc in the render and renderbasic modules 9055d4d
  • Added missing @cdk.githash tags 4779a45
  • Cleared the JavaDoc errors in the datadebug module: mostly explicit @Override and @inheritedDoc, but also a @cdk.githash or two e5d8f49
  • Added missing JavaDoc in the core module 9ede5db
  • Added missing JavaDoc in the cip module 50ac78c
  • Added missing JavaDoc in the atomtype module 9738bd6
  • Resolving regressions in fragment. 932dbed
  • Resolving regressions in qsaratomic. 9dda83d
  • Resolving regressions in qsarionpot. ecc9df4
  • Resolving regressions in reaction. 24bcd58
  • Resolve structgen regressions - we no longer need to add implicit hydrogens. fbeca60
  • Resolving regressions in tautomer. 6a01f6a
  • Resolving regressions in SMSD - it seems that the existing code won't match an aromatic double bond with an aromatic single bond (i.e. they both have to be single). 96794e1
  • All tests pass in the CIP module but the CIPTool needs updating to handle the implicit hydrogen / lone pair scheme. For the meantime we have a sanity check in case such a case is attempted. f30223d
  • Renamed the atom typer method to say ...Types a9f35c6
  • Added the Beam jars to the Eclipse classpath 5bde50a
  • Only set aromatic flags if the atoms are aromatic AND the bond was implicit. c089627
  • Missing license headers and removal of a duplicate line. bf58ffc
  • Noticeable performance improvement by avoiding invoking the factory each time. We cache a template atom/bond/container and then clone these when we need them. On large datasets provides a noticeable difference. 1266d3d
  • Removing redundant bracket. Generally such an error indicates something deeper is wrong. This test was checking hydrogen counts though so we can safely remove it. 89b8c20
  • Iodine only belongs to the aliphatic and not the aromatic subset - it cannot be lower case and this test does not work as intended. A correct example is to have an aliphatic carbon and an aromatic oxygen which make the symbol of Cobalt (non organic subset atom). 068cc92
  • Conversion from Beam to CDK objects. c51d02d
  • Conversion of the CDK object model to Beam. 0076116
  • Including the beam library in the SMILES module. ee1335e
  • Ensure aromaticity is perceived. bf25c14
  • This test (which was passing) actually only has a single H bond acceptor. However we need to give it a Kekulé structure - this test is for HBondAcceptor and not kekulisation so we provided the correct SMILES. 30b9ed1
  • An aromatic nitrogen which isn't connected to a pi bond cannot accept a hydrogen bond. 672dd11
  • Iterate over connected bonds. 762e450
  • An atom cannot be both a nitrogen and an oxygen. 906021b
  • Use atom iterator instead of indices. 145e697
  • Use parameterised generics. fe499e2
  • With the SMILES updated only 5 match. 8bf37f2
  • Correcting SMILES errors. b3d3058
  • Use isotope factory to get the deuterium and tritium isotope values. 6149174
  • Apply valence model after fixing hydrogen isotopes. fd1a5a3
  • Thread unsafe warning for PubChemFingerprinter. 105a379
  • Inline structure loading so that hydrogens are not present. b70324c
  • JavaDoc warnings in MACCS and ESTATE fingerprints. e89f583
  • Hydrogens not added for query structure (aromatic bonds in MDL are query structures). 6740a8e
  • Correct test class annotation and remove print to standard out. 9407f22
  • Additional tests didn't take into account the closed walk - last/first vertex the same. 036e014
  • Including additional tests for norbornane and correcting spelling of naphthalene. e350b8c
  • Algorithm to compute triple short cycles. This includes the ESSSR and envelope rings. 6b658ab
  • Having the hydrogens present when reading the MDL Mol files means the canonicalisation now changes. One can confirm the SMILES are the same but only written differently. Two tests checked that ring closures could include double bond symbols 'C=1C=CC=CC=1'. This is no longer the case as the canonical form doesn't have the double bonds on the closures but the output is correct. 54c8a3a
  • Missing license headers. e532634
  • Use the encoder when chiral() hash codes are wanted. 97ac91e
  • A tetrahedral element encoder factory for encoding the existing CDK stereo element - ITetrahedralChirality. f4ce49f
  • Utility method for defining parities with a set value. 177ca3a
  • Test for encoding tetrahedral stereo elements - we try configuring different elements and compare them to the hash codes generated for 2D representations (i.e. MDL). 49def80
  • Separate the implicit/explicit versions of butan-2-ol between files. a5481d5
  • Including licence header and javadoc mistakes. 55073db
  • Ensuring hydrogen suppression is correct for double bond and extended tetrahedral (allene) stereo chemistry. 7c20719
  • Testing the suppression of hydrogen atoms and preservation of stereo encoding. 98beb96
  • Allow creation of the new hash methods which suppress certain atoms. 7f183b4
  • Including atom suppression in perturbed hash generation. e8cfe0b
  • Using the AtomSuppression when generating seeds. 0406d02
  • Modifications allowing atoms to be suppressed when generating atom based hash code. dfb1e69
  • Class for computing atomic hash codes whilst suppressing certain atoms (i.e. hydrogens). This initial commit is a direct copy of the BasicAtomHashGenerator so that the modifications can be shown (next commit). 37ad045
  • Internal API and implementations for choosing which atoms to suppress in the hash code. ff1b68d
  • Internal API for suppressing vertices in the hash. c89cbaf
  • Correct handling of aromatic type - fixed typo and also check bond order is set to 'UNSET' not null. eff967a
  • Write valence if it does not match the MDL implied valence. The option to 'writeQueryFormatValencies' has been removed - the valence field is a generic field and can always be written. df6471d
  • Use the MDL valence model when reading molecules with the V2000 reader. b15179f
  • MDL valence model. b83e2c8
  • Only add atom mappings when the number of atoms equals that of the query - patch from Roger Sayle. 12beba8
  • The SMARTS pattern '*()*' should not match cyclopropane - example for Roger Sayle. 28d8713
  • Previously disabled test now ignored 599d549
  • Previously disabled test now ignored. 9cdb572
  • Ignoring missing functionality tests to do with isotope handling in CML. 11039c1
  • Correct assertion in atomic tsar descriptor. Assertion was likely incorrect due to using the number instead of the index. Using the index '12' to access atom 13 provides the expected result. The assertions were altered to test all added hydrogens - for which two are adjacent to an aromatic system. 3e63a75
  • Test that when the aromaticity is preserved the expected number of matches is found. d10db9e
  • Altering assertions for what CDK perceives as aromatic. 900e676
  • There are 142 lines in 'data/smiles/drugs.smi' but only 141 non-empty lines. Technically according to the OpenSMILES specification an empty string is a valid SMILES string but the current IteratingSMILESReader skips empty molecules (a good thing). 63ab46d
  • Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive27. 547014d
  • Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive26. c85c215
  • Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive29. 2815d39
  • Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive28. ccb82bd
  • Reusing utilities for RecursiveTests aea2dcc
  • Overlap cutoff to 1/4th average bond length rather than 1/10th. 3eb97e1
  • Use the average bond length of the molecule. 6f25d47
  • Remove global bond length. dbc1b3d
  • Ensure adding and removing templates still works correctly. 9303613
  • Anonymise template molecules and queries when searching for templates to use for layout. d904fc4
  • Ensure that queries produce the expected match, which is dependent on whether Daylight's or CDK's aromaticity model is used. 6367db1
  • Enable/disable automatic atom typing and aromaticity perception. 989b16a
  • Matching with different ring sets produces a different number of matches. 36602fb
  • Using correct ring set in the test. The daylight SMARTS matcher uses SSSR for this example - we can now choose to do this. 4e23c23
  • Using utilities in existing match method. 5ac6afa
  • Utility methods so we can configure the SMARTS matching more easily. 0b203a2
  • Choose which ring set to use in SMARTS matches. a600a15
  • Configurable short cycles in SMARTS matching. de6140f
  • Choose which ring set to use. 41c0914
  • Resolving JavaDoc errors in SMARTSQueryTool. 62247de
  • Adding atoms/bonds to correct molecules. 8ee288b
  • Timeout for long-running test - the test currently errors due to too much GC. Adding a timeout doesn't convert the error to a failure but does allow the 'cdk-standard' suite to run twice as quickly. 59b0ab0
  • Longer variable names. de0db9c
  • Dependencies not inherited by ant. b45017b
  • Reformat - removing tab indents. fa56fc0
  • Extended documentation suggesting better approaches to what the extended connectivity was traditionally used for. Also explicitly made clear that the numbers are not the canonical labelling as described in the original article and merely the extended connectivity which is used in computing the lexicographically smallest unique labelling (canonical). e65f9d2
  • Linear search (getAtomicNumber(IAtom)) is fast enough to avoid precomputing the index map. c669ad7
  • As described in the original publication only use the connectivity value of non-hydrogen atoms. 75b20d1
  • Performance improvements to extended connectivity computation (Morgan numbers). 19a58a0
  • Dependencies not inherited from ioformats. 16e96a3
  • Removing matching state from MDLRXN3000 format. f07a707
  • Allow PubChem Substance to not match PubChem Substances (plural). 5800444
  • Updating PubChem Compound XML format to not match PubChem Compounds (plural). 073aeb4
  • Adapting existing formats to new API. e55f373
  • Moving old API method from interface to abstract super-class. 95050f7
  • Default implementation to easily adapt existing matchers. fba55eb
  • Using new API to guess format f1762d7
  • Replacing mock method with new implementation. 883face
  • Updating abstract test to use new API. 1c8cdb9
  • Required dependencies. 2773154
  • New API for matching ChemFormats - instead of checking line-by-line the entire header is passed. The matcher then indicates whether it matched, where it matched and what format it is. f37ff6e
  • Missing licence header. 84edd5d
  • Also customUnused for the full PMD reports eb7e55a
  • Put the PMD unused/migration reports in a separate folder to not overwrite the full reports 7112168
  • Added missing @cdk.githash tags 47917c7
  • Changed the order: now the copyright/license info is at the top of the file 70a3080
  • Fixed the class JavaDoc syntax, correcting a false fix in commit 4fc48372 7961ccf
  • Ignore the UnusedModifier PMD test 0d878b6
  • Missing full stop/period from exception message assertion. 2820228
  • Updated PMD from 5.0.1 to 5.0.4 c56d265
  • Avoid setting the atomic number twice. Previously invoking 'new Element("H", null)' would still set the atomic number to '1'. Chaining the constructors so that all fields are assigned in the same place avoids this problem. d8712bb
  • Using the bootstrap seems to cause regressions. 0f410dc
  • Change in exception message. 41d3e68
  • JavaDoc error 9bc6af7
  • Throw an exception if the symbol isn't supported. 2eea2b6
  • Expect exception for a symbol we can't calculate the charge for - e.g. 'As' in this case. 59a7b98
  • Test that providing a non-chain as a chain throws an exception. b9c169c
  • Throw an exception when a non-chain is provided. Documentation was also updated. b8f14d0
  • Use model builder to place the test alkanes - the AtomPlacer3D is only for placing chains. 9c06749
  • Aromatic selenium can be parsed from SMILES. The current aromaticity perception doesn't consider Se but that's not related to SMILES parsing. a845e03
  • Required dependency which is inherited from builder3d. e135f0e
  • Object equality null-safe testing. 68bf41e
  • Tests for ChemObject compare() methods. b377fa9
  • Set atomic numbers when creating an element from a symbol c34d8c7
  • If the proper behavior is to throw an exception, then the unit test should expect it a621922
  • Resolves unit test failure (previous commits). When a double bond is found and there are no 2D coordinates (e.g. unspecified configuration) then return '0' for the configuration value. 65d3585
  • Moves assertions on non-null bond length to the same loop. Also puts the getPoint3d() null test before the bond length check. If there is a null atom the bond length check will throw an NPE - better to fail on this. bdb5c39
  • Remove stdout and catch exception to fail the test instead of returning in error. 1a5c89b
  • Using typed list and index (so we can print a better failure message). 49932b9
  • When a reaction references a molecule which is unknown - automatically create one with that Id. This happens when the set of molecules is defined after the reaction. This commit resolves the test in error 'CML2Test.testBug2697568' a6c96d8
  • Exact mass and natural abundance are not preserved on reading/writing CML. As these attributes are boxed primitives they can be null and throw an exception when unboxed by 'assertEquals'. Before checking the values the attributes are now checked for nullity. ac5a3e7
  • Fail test instead of throwing an exception. 06e7224
  • Resolves two long standing errors in 'cdk-extra'. The method should throw an exception when one tries to attach to an invalid atom number (e.g. 7-chlorohexane). The existing error was using the ParseException constructor incorrectly and thus would throw an error. The constructor is for generating error messages to do with syntax. As this is a semantic error the constructor did not function as intended. Simply replacing the use of this constructor with a normal error message resolves the issue. 5bd0f1d
  • Missing dependency for test-qsarcml. c9c965d
  • Run junit as headless. fb59fae
  • SpanningTree documentation 19d5248
  • Added a missing dependency d83c377
  • Setting unspecified bonds when generating an InChI bef3903
  • unit test for bug1295 f10979d
  • Bumped the copyright year to 2013 3dadf0d
  • Bumping version number (note new maven style snapshot version numbering), open for changes. 6f8e2b3
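
The extended connectivity (Morgan number) commits listed above restrict the computation to non-hydrogen atoms. As a rough illustration of that idea, and not the CDK implementation, the following minimal sketch works on a plain adjacency list. The class name `ExtendedConnectivitySketch`, the method `compute`, and the stopping rule (stop once another round no longer increases the number of distinct values) are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: names and stopping rule are assumptions, not CDK API.
public class ExtendedConnectivitySketch {

    /**
     * @param adjacency adjacency[v] lists the neighbours of vertex v
     * @param hydrogen  hydrogen[v] is true if vertex v is a hydrogen (ignored)
     * @return extended connectivity values; hydrogens are left at 0
     */
    static long[] compute(int[][] adjacency, boolean[] hydrogen) {
        int n = adjacency.length;
        long[] current = new long[n];

        // Initial value: the number of non-hydrogen neighbours.
        for (int v = 0; v < n; v++) {
            if (hydrogen[v]) continue;
            for (int w : adjacency[v]) {
                if (!hydrogen[w]) current[v]++;
            }
        }

        int classes = distinct(current, hydrogen);
        while (true) {
            // Each round replaces a value with the sum of its heavy neighbours' values.
            long[] next = new long[n];
            for (int v = 0; v < n; v++) {
                if (hydrogen[v]) continue;
                for (int w : adjacency[v]) {
                    if (!hydrogen[w]) next[v] += current[w];
                }
            }
            int nextClasses = distinct(next, hydrogen);
            // Keep the previous values if this round did not separate more atoms.
            if (nextClasses <= classes) return current;
            current = next;
            classes = nextClasses;
        }
    }

    /** Counts the distinct values among non-hydrogen vertices. */
    static int distinct(long[] values, boolean[] hydrogen) {
        Set<Long> seen = new HashSet<>();
        for (int v = 0; v < values.length; v++) {
            if (!hydrogen[v]) seen.add(values[v]);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // n-pentane heavy atoms 0-1-2-3-4 plus one explicit hydrogen (5) on atom 0.
        int[][] adjacency = {{1, 5}, {0, 2}, {1, 3}, {2, 4}, {3}, {0}};
        boolean[] hydrogen = {false, false, false, false, false, true};
        System.out.println(Arrays.toString(compute(adjacency, hydrogen)));
        // prints [2, 3, 4, 3, 2, 0]
    }
}
```

The values only partition the atoms into classes; as the documentation commit notes, they are not themselves a canonical labelling.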