1.5.4 Release Notes

The main features of this release include major updates to aromaticity, stereochemistry, isotopes, SMARTS and SMILES handling. A huge thanks to Egon and Stephan for reviewing so many patches, particularly at this busy time of year. There are also a lot of bug fixes, which have reduced information loss between formats and fixed several regressions.

Download available on Sourceforge.

Feature Summary
Test Status
Reviewers
Authors
Full Change Log

Feature Summary

This section summarises the new features in this release. Some aspects will be expanded on in more detail in later blog posts (see Efficient Bits). I still need to update some of the documentation, but the following sections give a summary and example usage.

SMILES

The SMILES parser has been rewritten. Tetrahedral and double-bond stereochemistry is now parsed and can be output by the SMILES generator. The parser no longer stores molecules internally, allowing the same parser to be used across multiple threads. See also [New SMILES parser behaviour](http://efficientbits.blogspot.co.uk/2013/12/new-smiles-behaviour-parsing-cdk-154.html).

IChemObjectBuilder blr    = SilentChemObjectBuilder.getInstance();
SmilesParser       smipar = new SmilesParser(blr);

for (String line : lines) {
    IAtomContainer container = smipar.parseSmiles(line);
}

Atom types are no longer set automatically by the parser. Implicit hydrogen counts now follow the SMILES specification and provide improved conversion.

IChemObjectBuilder     builder = SilentChemObjectBuilder.getInstance();
SmilesParser           sp      = new SmilesParser(builder);
SmilesGenerator        sg      = new SmilesGenerator();
InChIGeneratorFactory  igf     = InChIGeneratorFactory.getInstance();
        
IAtomContainer m = sp.parseSmiles("[O]");
System.out.println(sg.create(m));                         // [O]
System.out.println(igf.getInChIGenerator(m).getInchi());  // InChI=1S/O
        
// configure atom types
AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(m);
CDKHydrogenAdder.getInstance(builder).addImplicitHydrogens(m);

System.out.println(sg.create(m));                         // O ([OH2])
System.out.println(igf.getInChIGenerator(m).getInchi());  // InChI=1S/H2O/h1H2

Aromatic molecules are automatically kekulised.

IAtomContainer m = smipar.parseSmiles("[nH]1cccc1"); // read as 'N1C=CC=C1'

If a Kekulé structure cannot be assigned to a molecule without changing the formula, an exception is thrown.

IAtomContainer m = smipar.parseSmiles("c1cccc1"); // error!

Turning off kekulise allows parsing of problematic molecules but is not recommended as they will likely cause problems in other procedures.

smipar.kekulise(false);
IAtomContainer m = smipar.parseSmiles("c1cccc1");

Aromatic specification in the input is preserved on the CDK ISAROMATIC flag even if the molecule is not aromatic.

IAtomContainer m1 = smipar.parseSmiles("c1ccccc1"); // 6 aromatic atoms
IAtomContainer m2 = smipar.parseSmiles("c1ccc1");   // 4 aromatic atoms (note not really aromatic)

The SMILES generator has also been rewritten, allowing the generation of different types of SMILES output. The following definitions are used to distinguish the types of output. These follow the Daylight SMILES specification and are used by other toolkits (e.g. [Marvin](http://www.chemaxon.com/marvin/help/formats/smiles-doc.html#SMILES)).

Name	Canonical	Stereochemistry	Isotope
Generic	no	no	no
Isomeric	no	yes	yes
Unique	yes	no	no
Absolute	yes	yes	yes

The new paradigm is to use one of the static utilities to create a SmilesGenerator instance. The default instance (`new SmilesGenerator()`) produces generic SMILES. The method `createSMILES` has been deprecated and replaced with [`create(IAtomContainer)`](http://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/smiles/SmilesGenerator.html), which can throw a checked exception.

// non-canonical, no stereochemistry or isotope information
SmilesGenerator smiggen = SmilesGenerator.generic();

// non-canonical, includes stereochemistry and isotope information
SmilesGenerator smiigen = SmilesGenerator.isomeric();

// canonical, no stereochemistry or isotope information
SmilesGenerator smiugen = SmilesGenerator.unique();

// canonical, includes stereochemistry and isotope information
SmilesGenerator smiagen = SmilesGenerator.absolute();

The labelling of the unique SMILES (using an equitable partition) has been optimised and it is now very fast. Absolute SMILES currently uses the InChI to canonicalise the SMILES string. There are some problems due to InChI (correctly) delocalising charges in pi-bonding systems that in SMILES have distinct representations.

IMPORTANT: The canonical SMILES generated are different from previous versions of the CDK (1.4.x). There may still be differences in future developer releases but these will be indicated in the release notes.

Generated SMILES do not include lower-case aromatic symbols. This eliminates problems related to interrupting aromatic systems when reading. Canonical SMILES are written as the same resonance form and in general the storage of structures as aromatic SMILES should be avoided. One possibly valid use for aromatic SMILES is the generation of string-fragment fingerprints (LINGOs). If you wish to write SMILES with lower-case symbols then an _aromatic_ generator can be created as follows.

SmilesGenerator smigen = SmilesGenerator.unique()
                                        .aromatic();

The generator uses the ISAROMATIC flags present on the atoms and bonds. Aromaticity is not re-perceived, so for correct output the same aromaticity model (preferably Daylight's for SMILES) should be applied before generation. Please see the Aromaticity section for more information on the new aromaticity API.

List<IAtomContainer> ms = ...; // molecules

Aromaticity arom = new Aromaticity(ElectronDonation.daylight(),
                                   Cycles.allOrVertexShort());
for (IAtomContainer m : ms) {
    arom.apply(m);
    smigen.create(m);
}

The method `create(IAtomContainer, int[])` now provides access to the output order of SMILES. This allows persistence of auxiliary atomic meta-data with SMILES output. The following example demonstrates how to append 2D coordinates to a SMILES output.

IAtomContainer  m      = ...; // a molecule with 2D depiction
SmilesGenerator smigen = SmilesGenerator.generic();

int[]  ord = new int[m.getAtomCount()];
String smi = smigen.create(m, ord);

// build auxiliary data
Point2d[] coords = new Point2d[ord.length];
for (int i = 0; i < coords.length; i++)
    coords[ord[i]] = m.getAtom(i).getPoint2d();

// suffix SMILES with coordinates (we use a string here but it could be encoded in binary)
smi += " " + Arrays.toString(coords);
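The role of the order array can be illustrated without the CDK. The following is a minimal sketch, assuming (as in the example above) that `ord[i]` holds the position of input atom `i` in the generated SMILES. The class `OutputOrderSketch` and the method `reorder` are hypothetical names used only for illustration and are not part of the CDK API.

```java
import java.util.Arrays;

final class OutputOrderSketch {

    // Places data[i] (indexed by input atom) at position ord[i] (output order).
    static String[] reorder(String[] data, int[] ord) {
        String[] out = new String[data.length];
        for (int i = 0; i < data.length; i++)
            out[ord[i]] = data[i];
        return out;
    }

    public static void main(String[] args) {
        String[] labels = {"a", "b", "c"}; // per-atom data in input order
        int[]    ord    = {2, 0, 1};       // atom 0 is written third, atom 1 first, atom 2 second
        System.out.println(Arrays.toString(reorder(labels, ord))); // [b, c, a]
    }
}
```

Any per-atom data (coordinates, labels, charges) can be permuted the same way so that it lines up with the atoms as they appear in the SMILES string.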

Stereochemistry

Tetrahedral centres can now have an implicit part (hydrogen or lone-pair). Here is an example of obtaining the labelling on a sulfoxide.

IAtomContainer m = smipar.parseSmiles("CCC[S@](C)=O");
for (IStereoElement se : m.stereoElements()) {
    if (se instanceof ITetrahedralChirality) {
        // CIP_CHIRALITY.S
        CIP_CHIRALITY label = CIPTool.getCIPChirality(m,
                                                      (ITetrahedralChirality) se);
    }
}

Stereochemistry can now be preserved between MDL Mol V2000, InChI and SMILES. When reading a mol file stereo elements are automatically created. If you wish to encode stereo elements from another format the [`StereoElementFactory`](http://cdk.github.io/cdk/1.5/docs/api/index.html?org/openscience/cdk/stereo/StereoElementFactory.html) can be used.

PDBReader      pdbr = new PDBReader(...);
IAtomContainer m    = pdbr.read(new AtomContainer());
m.setStereoElements(StereoElementFactory.using3DCoordinates(m)
                                        .createAll());
smigen.create(m);

The factory can create elements from 3D and 2D depictions (wedge/hatch bonds). The stereocenters are found in structures using the new [`Stereocenters`](http://cdk.github.io/cdk/1.5/docs/api/index.html?org/openscience/cdk/stereo/Stereocenters.html) API. `Stereocenters` identifies atoms that can support tetrahedral or double-bond stereochemistry. It can also be used to verify stereocenters: consider `[C@H](C)(C)O`, where a configuration is specified on a carbon with two identical methyl substituents.

Atoms that can support stereochemistry are found using the same rules as detailed in the InChI Technical Manual. The detection of stereocenter topology uses the method described by Razinger et al. A huge thanks to Tim Vandermeersch for helping explain some intricacies of the method. The current implementation isn't yet complete and will only find some symmetric stereocenters, but the coverage is relatively good. Future releases will aim to complete the detection in this class.

IAtomContainer m             = smipar.parseSmiles("C(C)(CC)N");
Stereocenters  stereocenters = Stereocenters.of(m);
for (int i = 0; i < m.getAtomCount(); i++)  {
    stereocenters.elementType(i);      // Tetrahedral, Double-bond, etc.
    stereocenters.stereocenterType(i); // True, Para, Non, Potential
    stereocenters.isStereocenter(i); 
}

Tetrahedral and double-bond stereochemistry is also now depicted by the structure diagram generator. Wedge/hatch bonds can be added to a pre-generated depiction (e.g. stored with SMILES) using [`NonPlanarBonds`](https://github.com/cdk/cdk/blob/master/src/main/org/openscience/cdk/layout/NonplanarBonds.java), although this class is currently package-private.

#### Aromaticity

Previously the CDK provided three aromaticity implementations: AromaticityCalculator, CDKHueckelAromaticityDetector and DoubleBondAcceptingAromaticityDetector. CDKHueckelAromaticityDetector was primarily used internally within the library, whilst new users would generally use AromaticityCalculator as it has the most intelligible name.

All three classes are now deprecated, with the functionality unified under a single Aromaticity class. It is well known that apart from "smelling nice" (DW) aromaticity is a bit of a loose concept in chemical information processing, with differing opinions. The basic approaches generally differ algorithmically in two ways: which atoms can contribute (delocalise) p electrons to the system, and which rings (cycles) are checked against Hückel's rule (4n+2). The new API makes these decisions explicit, allowing a choice of the ElectronDonation model for each atom and of the Cycles over which that donation is checked. The usage of the API will be smoother in future releases, making it simpler to use a predefined combination of parameters.
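The two decisions can be sketched in plain Java. This is an illustration of the general idea only, not the CDK implementation: the class `HueckelSketch`, its methods and the `-1` convention for atoms that cannot take part are all assumptions made for the example. Cycles are given as closed walks of vertex indices (first and last vertex equal) and the donation array holds the number of p electrons each atom contributes under whichever model was chosen.

```java
final class HueckelSketch {

    // Hückel's rule: an aromatic cycle has 4n+2 delocalised p electrons (n >= 0).
    static boolean satisfiesHueckel(int electrons) {
        return electrons >= 2 && (electrons - 2) % 4 == 0;
    }

    // cycle:       closed walk of vertex indices (first == last)
    // donation[v]: p electrons contributed by atom v, or -1 if it cannot take part
    static boolean isAromatic(int[] cycle, int[] donation) {
        int sum = 0;
        for (int i = 0; i < cycle.length - 1; i++) {
            int d = donation[cycle[i]];
            if (d < 0)
                return false;
            sum += d;
        }
        return satisfiesHueckel(sum);
    }

    public static void main(String[] args) {
        int[] benzene = {0, 1, 2, 3, 4, 5, 0};
        System.out.println(isAromatic(benzene, new int[]{1, 1, 1, 1, 1, 1}));  // true  (6 electrons)
        int[] pyrrole = {0, 1, 2, 3, 4, 0};
        System.out.println(isAromatic(pyrrole, new int[]{2, 1, 1, 1, 1}));     // true  (6 electrons)
        int[] cyclobutadiene = {0, 1, 2, 3, 0};
        System.out.println(isAromatic(cyclobutadiene, new int[]{1, 1, 1, 1})); // false (4 electrons)
    }
}
```

Changing the donation values corresponds to choosing a different electron donation model, and changing which closed walks are tested corresponds to choosing a different cycle set.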

The current electron donation models are:

- `piBonds()` - a simple electron donation model which allows atoms next to cyclic pi bonds to donate a single p electron. Atom types are not required but bond orders should be specified.
- `cdk()` - mirrors the model of the `CDKHueckelAromaticityDetector`, exocyclic groups (e.g. in quinone) are not considered. This model requires atom types.
- `cdkAllowingExocyclic()` - mirrors the model of the `DoubleBondAcceptingAromaticityDetector`, exocyclic groups (e.g. in quinone) are allowed. Ketones contribute 1 p electron, quinone is aromatic, guanine and caffeine are not. This model requires atom types.
- `daylight()` - a model similar to that used by [Daylight Chemical Information Systems](www.daylight.com). Atom types are not required but implicit hydrogen counts and bond orders must be assigned. Ketones contribute 0 p electrons, quinone is not aromatic, guanine and caffeine are.

A good set of cycles to use is `allOrVertexShort`. This set efficiently provides every possible cycle in the structure (e.g. large rings in azulene and porphyrin). If the computation of all cycles is intractable it _falls back_ to using a unique cycle set.

Aromaticity arom = new Aromaticity(ElectronDonation.daylight(),
                                   Cycles.allOrVertexShort());
for (IAtomContainer m : ms) {
    arom.apply(m);
}

The aromaticity of the deprecated CDKHueckelAromaticityDetector can be mimicked with the following configuration.

Aromaticity arom = new Aromaticity(ElectronDonation.cdk(),
                                   Cycles.cdkAromaticSet());
for (IAtomContainer m : ms) {
    AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(m); // required for model
    arom.apply(m);
}

The new aromaticity detector will clear all existing aromatic flags on a structure. This functionality allows us to normalise correctly and ensure the same aromaticity model is applied to all structures.

// m1 has aromatic flags set from input, m2 does not
IAtomContainer m1 = smipar.parseSmiles("c1ccc1");
IAtomContainer m2 = smipar.parseSmiles("C1=CC=C1");

// m1 still has aromatic flags set, m2 does not
CDKHueckelAromaticityDetector.detectAromaticity(m1);
CDKHueckelAromaticityDetector.detectAromaticity(m2);

// m1 and m2 have no aromatic flags set
arom.apply(m1);
arom.apply(m2);

### (Sub)-structure Matching

A new [`Pattern`](http://cdk.github.io/cdk/1.5/docs/api/index.html?org/openscience/cdk/isomorphism/Pattern.html) API provides matching and mapping of structure queries. It provides a mapping of all matches (`matchAll(IAtomContainer)`) or the first match (`match(IAtomContainer)`) and a conditional for convenience (`matches(IAtomContainer)`). The primary idea is to make the code descriptive but I'll demonstrate some other desirable features. The mappings are provided as a permutation of the query vertices represented as a fixed size array.

There are currently two implementations, Ullmann and VentoFoggia. The other matchers UniversalIsomorphismTester and SMSD could (will) be adapted to use the API.

The following example counts the number of times a substructure query was found in a list of targets.

IAtomContainer query   = ...;
Pattern        pattern = Ullmann.findSubstructure(query);

int hits = 0;
for (IAtomContainer target : targets)
    if (pattern.matches(target))
        hits++;

The mappings returned by the pattern are lazy and generated as needed. Implementing `Iterable` we can loop over the mappings directly.

for (int[] mapping : pattern.matchAll(target)) {
    
}

Utilising Guava utilities we can limit and count the number of mappings.

// first 5 matches
FluentIterable.from(pattern.matchAll(target))
              .limit(5);

// does the pattern match at least 5 times
if (FluentIterable.from(pattern.matchAll(target))
                  .limit(5)
                  .size() == 5) {
}
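The benefit of combining a lazy sequence with a limit can be shown with the standard library alone. The following is a minimal sketch; `LimitedCountSketch`, `countUpTo` and `naturals` are hypothetical names for this example, and the endless sequence simply stands in for a lazily generated set of mappings.

```java
import java.util.Arrays;
import java.util.Iterator;

final class LimitedCountSketch {

    // Counts items but stops pulling from the iterable once 'limit' is reached.
    static int countUpTo(Iterable<?> items, int limit) {
        int n = 0;
        Iterator<?> it = items.iterator();
        while (n < limit && it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    // An endless lazy sequence 0, 1, 2, ... generated only on demand.
    static Iterable<Integer> naturals() {
        return () -> new Iterator<Integer>() {
            private int value = 0;
            @Override public boolean hasNext() { return true; }
            @Override public Integer next()    { return value++; }
        };
    }

    public static void main(String[] args) {
        // terminates even though the sequence never ends
        System.out.println(countUpTo(naturals(), 5));                   // 5
        // fewer items than the limit: the count is the number available
        System.out.println(countUpTo(Arrays.asList("a", "b", "c"), 5)); // 3
    }
}
```

A count equal to the limit therefore means "at least that many" items exist, and no work is spent generating the rest.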

This lazy generation is useful for matching stereochemistry (on by default) as we can find the first match which has the correct configuration. Double-bond configurations are also matched.

Ullmann.findSubstructure(smipar.parseSmiles("[C@H](C)(CC)O"))
       .matches(smipar.parseSmiles("[C@@H](CC)(C)O"));         // true! (note neighbour order)
Ullmann.findSubstructure(smipar.parseSmiles("[C@H](C)(CC)O"))
       .matches(smipar.parseSmiles("[C@H](CC)(C)O"));          // false! (note neighbour order)

This will likely be cleaner in the next release but we can _filter_ for the unique atom matches using the [`UniqueAtomMatches`](http://cdk.github.io/cdk/1.5/docs/api/index.html?org/openscience/cdk/isomorphism/UniqueAtomMatches.html) predicate.

for (int[] mapping : FluentIterable.from(Ullmann.findSubstructure(query)
                                                     .matchAll(target))
                                   .filter(new UniqueAtomMatches())) {

}
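The idea behind such a filter can be sketched as follows, assuming that two mappings count as the same atom match when they cover the same set of target atom indices. This is an illustration written for this note rather than the CDK source, and `UniqueAtomsSketch` and `uniqueAtoms` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueAtomsSketch {

    // Keeps the first mapping seen for each distinct set of target atom indices.
    static List<int[]> uniqueAtoms(Iterable<int[]> mappings) {
        Set<BitSet> seen   = new HashSet<>();
        List<int[]> unique = new ArrayList<>();
        for (int[] mapping : mappings) {
            BitSet atoms = new BitSet();
            for (int v : mapping)
                atoms.set(v);
            if (seen.add(atoms)) // false if this atom set was already covered
                unique.add(mapping);
        }
        return unique;
    }

    public static void main(String[] args) {
        // query 'CC' against target 'CCC': four mappings, two distinct atom sets
        List<int[]> mappings = Arrays.asList(new int[]{0, 1}, new int[]{1, 0},
                                             new int[]{1, 2}, new int[]{2, 1});
        System.out.println(uniqueAtoms(mappings).size()); // 2
    }
}
```

Symmetric queries produce several mappings over the same atoms, which is why counting raw mappings usually overstates the number of distinct hits.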

The real power of defining the `Pattern` API is that we can optionally add pre-screens to queries. A simple heuristic was already provided by the `UniversalIsomorphismTester` but the new API makes the approach much more flexible. The following example shows how we can define a pattern which intercepts the match and checks the fingerprints first. Unfortunately the CDK fingerprint generation is now much slower than the structure matching, but it demonstrates a proof of concept for future releases with a faster fingerprint implementation.

final Pattern pattern = new Pattern() {
    @Override public int[] match(IAtomContainer target) {
        if (!checkFingerprints(query, target))
            return new int[0];
        return Ullmann.findSubstructure(query).match(target);
    }
};

for (IAtomContainer target : targets) {
    if (pattern.matches(target)) {
        ...
    }
}
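The `checkFingerprints` helper is not defined in the example above. One plausible form of the test, assuming the fingerprints have already been computed as `java.util.BitSet` values, is a subset check: a query can only be a substructure of a target if every bit set for the query is also set for the target. The names `FingerprintScreenSketch`, `mayContain` and `bits` are hypothetical.

```java
import java.util.BitSet;

final class FingerprintScreenSketch {

    // True if every bit set in the query fingerprint is also set in the target fingerprint.
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits required by the query but absent from the target
        return missing.isEmpty();
    }

    // Convenience for building small fingerprints in examples.
    static BitSet bits(int... indices) {
        BitSet bs = new BitSet();
        for (int i : indices)
            bs.set(i);
        return bs;
    }

    public static void main(String[] args) {
        BitSet target = bits(1, 2, 4, 7);
        System.out.println(mayContain(target, bits(1, 4))); // true: worth running the full match
        System.out.println(mayContain(target, bits(1, 5))); // false: the match can be skipped
    }
}
```

A `true` result only means the match is possible, so the full substructure search still has to run. A `false` result rules the target out without it.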

#### SMARTS

The CDK SMARTS functionality has been optimised and extended. Firstly, the matchers have been updated to use a SMARTSInvariants class which is attached to queries before matching. Previously there were several initialisation steps (performed by SMARTSQueryTool) and queries could not be used between threads. Isolation of the invariant values into this holder allows us to specify different schemes for matching patterns (i.e. what ring set should '[R6]' check?).

As with structure matching, tetrahedral queries are now supported.

IChemObjectBuilder bldr   = SilentChemObjectBuilder.getInstance();
SmilesParser       smipar = new SmilesParser(bldr);
        
SMARTSQueryTool sqt = new SMARTSQueryTool("[C@](C)(N)CC", bldr);
        
sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 1 hit
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 0 hits
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 0 hits

sqt = new SMARTSQueryTool("[C@@](C)(N)CC", bldr);
        
sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 1 hit
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 0 hits

sqt = new SMARTSQueryTool("[C@?](C)(N)CC", bldr);
        
sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 1 hit
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 0 hits
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 1 hit

Logical queries can be used on tetrahedral stereochemistry.

sqt = new SMARTSQueryTool("[C!@](C)(N)CC", bldr); // equivalent to @@?
        
sqt.matches(smipar.parseSmiles("[C@H](C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("[C@@H](C)(N)CC"));  // 1 hit
sqt.matches(smipar.parseSmiles("C(C)(N)CC"));       // 1 hit

sqt = new SMARTSQueryTool("[C@,N@@](O)(C)(N)CC", bldr);
        
sqt.matches(smipar.parseSmiles("[C@](O)(C)(N)CC"));    // 1 hit
sqt.matches(smipar.parseSmiles("[C@@](O)(C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("C(O)(C)(N)CC"));       // 0 hits
sqt.matches(smipar.parseSmiles("[N@+](O)(C)(N)CC"));   // 0 hits
sqt.matches(smipar.parseSmiles("[N@@+](O)(C)(N)CC"));  // 1 hit
sqt.matches(smipar.parseSmiles("[N+](O)(C)(N)CC"));    // 0 hits

Double-bond configuration is also matched.

sqt = new SMARTSQueryTool("C/C=C/C", bldr);
        
sqt.matches(smipar.parseSmiles("C/C=C/C"));          // 2 hits
sqt.matches(smipar.parseSmiles("C/C=C\\C"));         // 0 hits
sqt.matches(smipar.parseSmiles("C/C(/C)=C(/C)\\C")); // 4 hits
sqt.matches(smipar.parseSmiles("CC=CC"));            // 0 hits
sqt.matches(smipar.parseSmiles("C/C=C(/O)C"));       // 0 hits

sqt = new SMARTSQueryTool("C/C=C\\C", bldr);
        
sqt.matches(smipar.parseSmiles("C/C=C/C"));          // 0 hits
sqt.matches(smipar.parseSmiles("C/C=C\\C"));         // 2 hits
sqt.matches(smipar.parseSmiles("C/C(/C)=C(/C)\\C")); // 4 hits
sqt.matches(smipar.parseSmiles("CC=CC"));            // 0 hits
sqt.matches(smipar.parseSmiles("C/C=C(/O)C"));       // 2 hits

The query /? is supported but logical operations such as C/C=C!/C or C/C=C/,\C have not yet been implemented.

sqt = new SMARTSQueryTool("C/C=C/?C", bldr);
        
sqt.matches(smipar.parseSmiles("C/C=C/C"));          // 2 hits
sqt.matches(smipar.parseSmiles("C/C=C\\C"));         // 0 hits
sqt.matches(smipar.parseSmiles("C/C(/C)=C(/C)\\C")); // 4 hits
sqt.matches(smipar.parseSmiles("CC=CC"));            // 2 hits
sqt.matches(smipar.parseSmiles("C/C=C(/O)C"));       // 0 hits

Component level grouping has been added.

sqt = new SMARTSQueryTool("[#8].[#8]", bldr);
        
sqt.matches(smipar.parseSmiles("O"));    // 0 hits
sqt.matches(smipar.parseSmiles("O=O"));  // 2 hits
sqt.matches(smipar.parseSmiles("OCCO")); // 2 hits
sqt.matches(smipar.parseSmiles("O.O"));  // 2 hits

sqt = new SMARTSQueryTool("([#8].[#8])", bldr);
        
sqt.matches(smipar.parseSmiles("O"));    // 0 hits
sqt.matches(smipar.parseSmiles("O=O"));  // 2 hits
sqt.matches(smipar.parseSmiles("OCCO")); // 2 hits
sqt.matches(smipar.parseSmiles("O.O"));  // 0 hits

sqt = new SMARTSQueryTool("([#8]).([#8])", bldr);
        
sqt.matches(smipar.parseSmiles("O"));    // 0 hits
sqt.matches(smipar.parseSmiles("O=O"));  // 0 hits
sqt.matches(smipar.parseSmiles("OCCO")); // 0 hits
sqt.matches(smipar.parseSmiles("O.O"));  // 2 hits
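The three query forms above differ only in how matched atoms must be distributed over the connected components of the target. A minimal sketch of that check follows. It is an illustration and not the CDK implementation, and the class name, method name and array encodings are assumptions made for this example.

```java
final class ComponentGroupingSketch {

    // mapping[i]    : target atom matched by query atom i
    // queryGroup[i] : group id of query atom i from the '(...)' syntax, 0 if ungrouped
    // component[v]  : connected component id of target atom v
    static boolean satisfies(int[] mapping, int[] queryGroup, int[] component) {
        for (int i = 0; i < mapping.length; i++) {
            for (int j = i + 1; j < mapping.length; j++) {
                if (queryGroup[i] == 0 || queryGroup[j] == 0)
                    continue; // ungrouped atoms are unconstrained
                boolean sameGroup     = queryGroup[i] == queryGroup[j];
                boolean sameComponent = component[mapping[i]] == component[mapping[j]];
                if (sameGroup != sameComponent)
                    return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] mapping = {0, 1};
        int[] water2  = {1, 2}; // 'O.O': two components
        int[] dioxy   = {1, 1}; // 'O=O': one component
        System.out.println(satisfies(mapping, new int[]{0, 0}, water2)); // true:  [#8].[#8]
        System.out.println(satisfies(mapping, new int[]{1, 1}, water2)); // false: ([#8].[#8])
        System.out.println(satisfies(mapping, new int[]{1, 2}, water2)); // true:  ([#8]).([#8])
        System.out.println(satisfies(mapping, new int[]{1, 2}, dioxy));  // false: ([#8]).([#8])
    }
}
```

Atoms in the same group must land in the same component, atoms in different groups must land in different components, and ungrouped atoms may match anywhere.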

The stereochemistry and component-level grouping will also match correctly in recursive SMARTS.

sqt = new SMARTSQueryTool("[O;D1;$(([a,A]).([A,a]))][CH]=O", bldr);
        
sqt.matches(smipar.parseSmiles("OC=O.c1ccccc1"));  // 1 hit
sqt.matches(smipar.parseSmiles("OC=O"));           // 0 hits

Miscellaneous

The `IsotopeFactory` is now an abstract class (**API CHANGE**) with two implementations: the [`XMLIsotopeFactory`](http://cdk.github.io/cdk/1.5/docs/api/index.html?org/openscience/cdk/config/XMLIsotopeFactory.html) loads isotopes from the Blue Obelisk Data Repository (BODR) XML, whilst [`Isotopes`](http://cdk.github.io/cdk/1.5/docs/api/index.html?org/openscience/cdk/config/Isotopes.html) uses an optimised binary encoding of the BODR. This not only provides a general improvement in performance but also allows its use on micro architectures. Egon has written an [Isotopes app](http://chem-bla-ics.blogspot.co.uk/2013/10/isotopes-my-very-first-android-app-hits.html) for Android that utilises this new functionality. `Isotopes` also produces immutable `IIsotope` instances and is generally preferable.

IsotopeFactory iso = Isotopes.getInstance();
iso.getMajorIsotope(6)
   .getMassNumber(); // 6 = atomic number

The MDL V2000 reader and writer now use the [MDL Valence Model](http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/valence.html) to define implicit hydrogen counts on non-query structures. A structure with aromatic bonds (MDL Bond Order 4) is considered a query structure when defining hydrogen counts. Tremendous thanks to Roger Sayle for this valuable contribution to the community - [Explicit and Implicit Hydrogens: Taking liberties with valence - Nextmove Software](http://nextmovesoftware.com/blog/2013/02/27/explicit-and-implicit-hydrogens-taking-liberties-with-valence/).

Hydrogen atoms can be suppressed in the hash-code - this does not modify the structure but allows correct hash encoding.

MoleculeHashGenerator hashgen = new HashGeneratorMaker().depth(4)
                                                        .elemental()
                                                        .chiral()
                                                        .suppressHydrogens()
                                                        .molecular();

// without suppressHydrogens we would generate 2 hash codes for these structures
IAtomContainer m1 = smipar.parseSmiles("C[C@@H](O)[C@H](O)[C@H](C)O");
IAtomContainer m2 = smipar.parseSmiles("C[C@@]([H])(O)[C@]([H])(O)[C@]([H])(C)O");
        
System.out.println(Long.toHexString(hashgen.generate(m1)));
System.out.println(Long.toHexString(hashgen.generate(m2)));

The hash code will also encode IStereoElements (as shown above) allowing identical hash codes from SMILES, InChI and 2D/3D depictions.

A Cycles utility provides a facade for cycle / ring set perception. Cycles is an optimised representation that stores cycles as paths of vertex indices.

// a set of unique cycles
Cycles cs = Cycles.relevant(m);

Gotcha! - paths (walks) are closed and start and end at the same vertex

Cycles cs = Cycles.all(smipar.parseSmiles("c1ccccc1"));
cs.paths()[0].length; // 7! because the 6 member ring of benzene has the closed-walk {0, 1, 2, 3, 4, 5, 0}
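When working with such paths it helps to convert them explicitly. The following plain-Java sketch assumes only the closed-walk convention described above. `ClosedWalkSketch`, `ringSize` and `edges` are hypothetical helper names and not part of the CDK.

```java
final class ClosedWalkSketch {

    // Number of atoms (and bonds) in a cycle stored as a closed walk.
    static int ringSize(int[] closedWalk) {
        return closedWalk.length - 1;
    }

    // Lists each bond of the cycle once as a pair of vertex indices.
    static int[][] edges(int[] closedWalk) {
        int[][] edges = new int[closedWalk.length - 1][];
        for (int i = 0; i < edges.length; i++)
            edges[i] = new int[]{closedWalk[i], closedWalk[i + 1]};
        return edges;
    }

    public static void main(String[] args) {
        int[] benzene = {0, 1, 2, 3, 4, 5, 0};
        System.out.println(ringSize(benzene));     // 6
        System.out.println(edges(benzene).length); // 6, the last edge is {5, 0}
    }
}
```

Repeating the first vertex at the end means every bond, including the ring-closing one, appears as a pair of adjacent entries, so no wrap-around index arithmetic is needed.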

Computing all cycles may throw an intractable exception.

try {
    Cycles cs = Cycles.all(m);
} catch (Intractable e) {
    // fullerene, cyclophane, etc.
}

To obtain a backwards compatible (but inefficient) IRingSet:

IRingSet ringSet = cs.toRingSet();

The bounds of a diagram can now be provided using the Bounds element ensuring atom adjuncts (e.g. hydrogen labels) are not cropped. The bounds will be used by generators in future releases. Alignment of atom symbol labels has been improved with more updates in future.

The GeneralPath API has been updated to allow rendering of filled and colored shapes. This is shown in an example of an improved highlight (future releases).

CDK Highlight

Test Status

This release cleans up the number of test failures and errors. The remaining test failures are primarily found in the forcefield and qsarionpot modules due to legacy code.

20,639 tests
18 failures
18 coverage failures
0 errors

Reviewers

 436  Egon Willighagen 
 132  Stephan Beisken 
  48  John May

Authors

   448  John May
    66  Egon Willighagen
     2  Rafel Israels
     2  Jonathan Alvarsson
     1  Diego Pedrosa
     1  Magda Oprian

Full Change Log

Bumping version number for release. 97b8cea
Removing test classes from the main/ source tree. e617299
Resolving a test failure in absolute SMILES and including the absolute SMILES tests in the inchi module (inchi is used to canonicalised the SMILES) 40de01f
Resolving several JavaDoc errors. 827e573
Resolving regressions in ‘cdk-hash’, the way the class need to be mocked was changed. 40dfc65
Removed an artifact file d6766ea
Removed a no longer existing package 6a1b466
Updated the Eclipse .classpath for Beam 0.4 eee3f7e
Don’t ignore tests which now pass. 5ddc419
Fail if a molecule was not atom typed - otherwise this cause a runtime exception in the aromaticity CDK model. d7c629b
The test assertion had the super and sub structure the wrong way round. 9abeacc
Updated SMARTS matcher finger MACCS keys which were missing. As an example bit 56 ([#8R]) was previously missing, this bit tests for an oxygen in a ring. This is the case in the test molecule ‘CC(=O)C1=CC2=C(OC(C)(C)C@@H[C@@H]2O)C=C1’ and the bit is now correctly set. 83669b2
Correcting regression due to a typo. 848617d
Resolving regressions in ‘cdk-hash’ we need to provide a non-empty ‘prev’ array when testing the geometry encoder. 53e2fd8
Cite Noel’s Universal SMILES article in the InChINumbersTool. c30d0ca
Missing header in Absolute SMILES test. 1a8fb6f
Documented the new way to create logical atoms. fb6ea68
A typo picked up during review. 6148f01
The new connectivity checker orders atoms differently. This test depended on the atoms being at a set index which has now changed. 1739e2c
Resolves some regressions in the hash module - when we read from MDL now StereoElements are created. This effectively meant each element was doubled in the hash. To avoid this amplifying affect we don’t modify the ‘next’ value but instead set the ‘next’ value to a modified ‘current’ invariant. No matter how many times the value is set, ‘next[i]’ remains the same. e6b1f42
Directional labels are now assigned to all substituents of double bonds. d2b13b0
Propagate aromatic setting from the SMILES writer the SMILES generator. cb337a4
Generate aromatic SMILES for unit test. 450e8ae
Ring connectivity is now alway set - it is invariant and useful to have when resting ring membership (logical). ca9317c
The aromatic option when generating InChI’s is on a singleton and should be reset. 9c5c0a8
The old API method didn’t throw and exception - this is reverted so existing code doesn’t need to be updated. 38f678f
Testing absolute SMILES on some molecules with stereo-chemistry. 6f5385d
Preliminary tests show we don’t want to pass aromatic flags to InChI. Perhaps this should now be the default as SMILES are kekulised on load. 5089f4a
Correct hydrogen labelling and additional transformations when generating absolute SMILES. 2ce25c9
Obtain the InChI canonical numbering via reflection, the cdk-inchi module must be present to generate SMILES but is not a dependency of the ‘cdk-smiles’ module. 5c6fdd9
Ensure canonical SMILES have the same kekule representation. 9d4fdf0
Access SMILES output order. 8b6c397
Latest version of beam includes performance improvements, custom sort operations (required for Universal SMILES), ability to resonate a structure and generate a canonical kekule representation (i.e. avoid aromatic SMILES), minor bug fixes. 9b0ba64
Rename arbitrary to generic - correct description. 753bb8a
More concise name for creating SMILES strings (most common op) the reaction smiles has a name to avoid using overloading. Existing names still work for downstream usage but are deprecated. f888d6c
Resolving regressions in cdk-fragment. f05f9ab
General SMILES tests are no longer canonical (should not change with canon changes), those which are canonical test for equivalence and did not need changing. ed085f3
Configuration options in the SMILES generator. We now use a different canon procedure which will require updates to tests. 44c2471
Circumvent CDK quirks of leaving nulls around. baf20fe
Beam will now auto-suppress hydrogen counts (library update in a few commits). 17a7826
tidy up imports and stray variable from debuging. 8bc1570
MDL MACCS bit was taking the majority of the time - we can do it faster without using SMARTS. 62fba11
Don’t actually partition the molecule - just count how many component there are. 2922be2
Avoid using SMARTQueryTool we can be more efficient as we know the number of matches we need to find. c48b7da
A filter for unique atom matches. 15602c7
Cache recursive entries - without the caching queries in the MACCS fingerprint runs a lot slower. We use guava cache builder and weak keys allow entries to be GC’d when the AtomContainer has no other reference (i.e. if we are iterating over a file). f237d46
Configuring ring membership (i.e. not size based) by default - this is cheap enough we should just do it. 77cf971
Load and parse SMARTS when the keys are defined. fb22a0a
Different style of comment allows us to skip easier. 4a72e70
Lazy loading of MACCS keys. b6cf711
Use a fresh set of chem objects for each generic read_chemobject test. Took a while to find the bug due to the same container was being used in different formats. f4d21cf
The container being read may already have atoms - we can’t confirm it’s a query and must set the valence correctly. 87a74f5
The primary MDL mol file used by the MDLV2000ReaderTest (i.e. SimpleChemObjectReaderTest) was not valid. The original bug (#58) reported that charges were not round tripped. This is because the charge column was not aligned. Adding the ‘M CHG’ of course resolved the issue but the file was not a valid MDL file in the first place. a6c9d17
Dependencies not inherited (they are on the classpath but not when tested individually). d4195b4
Checked exception now thrown for unset bond orders. 5f985bc
Don’t add null stereo elements (unspecified) to the container. 6716abb
Automatically read stereo configurations from MDL V2000 format. Can be turned of but is on by default. 9c6fef2
A factory to create stereo elements using 2D/3D coordinates. b108c9e
Identify atoms supporting tetrahedral and double bond stereo chemistry. Classify the atoms determining if they have constitutionally different ligands (true stereocenter) or have symmetric ligands (para stereocenters). 2276a45
Deprecating resolve overlaps - it does not work and should be used. It’s usage has been removed from the SDG. 8507887
Read double bond stereo configuration from InChI. 4cc644f
Indicate with a labelling is needed or only the symmetry classes. fe62e9e
SMILES generation needs to throw an exception for invalid bond types. The exceptions were already thrown but were unchecked. Ideally IOException would be best but down stream invocations rely on other parts throwing CDKExceptions so this was used instead. 2006536
Required dependency (not inherited). bd38292
Missing import 89000d8
Turn off kekulisation for only these parsers (note wrong variable name was used). 9e175a2
Including copyright information 6a8eca1
Resolving regressions due to the new connectivity checker. If atoms are removed but the stereo elements (involving the removed atom) are not there can be a null pointer. We now verifying the stereo element atoms are present in the components. Currently stereo elements can not be removed without clearing all stereo centres. d9dbf44
Correct layout to represent double-bond stereochemistry and label bonds to indicate tetrahedral configurations. 46a6c47
Assign numbers to unlabelled atoms (i.e. hydrogens). Beam will handle the correct hydrogen ordering (CDK API makes this difficult). dfc5a6f
Universal SMILES Rule E - don’t start on negatively charged oxygen. 5acf0bb
Deprecating old smiles parser option and writing documentation for the new parser. f224628
Provide the error message from Beam in full. acc0726
Pass correct hydrogen count when generating InChIs + trailing spaces. 4713c25
Classes missed when refactoring the bond length parameter. f7cc6c1
Compute scaling for disconnected structures - this is required to avoid adjunct label collisions when there is > 1 isolated atom. 8945efa
A new rendering element (Bounds) that the renderer will pick up and use to fit the diagram to the required size. A generator can optionally produce a bounds object for the space it requires - when multiple bounds are present the min/max coordinates of all bounds are used. 71de262
Don’t set the scale parameter at the same time as the affine transform. The scale needs to be set before generation but the transform can be set after (required for improved bounding). 588ffc8
Move bond length parameter from BasicBondGenerator to BasicSceneGenerator. From the changes it is easy to see the bond length is used in relation to scaling compounds - this isn’t really a parameter to change but instead is for normalising coordinate systems. 66fd708
More flexible GeneralPath element - allow filled shapes and stroke width specification. Utilities are provided to convert from Java2D API. f61d542
Aligning java 2D and cdk rendering element path data types to use arrays. 5cc07ae
Store winding rule in general path element. 01b8463
Improved stroke caching - we store sub-pixel strokes so the graphics context can decide how best to handle them. 725e4b8
Smoother joins between lines - could be a parameter. 886dc7d
Correct scaling of line thickness - we need to consider the ZoomFactor and Scale. Both of these are included in the affine transform. 03ab7db
Draw lines using double precision coordinates - the graphics context will decide how best to interpret them into pixels. cd198c5
Use a rounded rectangle instead of a square to fill in the background of the atom symbol (i.e. obscuring bonds). faa820d
Improved bounding box accuracy around atom symbols. 6cff115
Including invariant refinement in Canon. 36f346a
A utility for ranking indices by a separate array of values. The ranking is built around a sorting procedure with the comparison baked in. 4b676f4
Much faster generation of initial invariants, these don’t distinguish all cases but can easily be substituted for better (more expensive) values later. Using the adjacency list representation to avoid using the AtomContainer results in nearly an 80x speed improvement. This also provides correct values - the previous invariants were mixing partial / formal charges and missing explicit hydrogens from the H count. The new values are also picky and complain at unset values - there have been many bug reports with non-canonical SMILES due to missing hydrogens. Also adding the primes to the class for internal use. 047d0db
Deprecating the canonical labeller and the prime numbers utility. 73debc8
Obtain numbering required by Universal SMILES. 1dcb8f0
Separate auxinfo generation from the numbers and allow options to be provided. f20c555
If the InChI generation provided a warning this is also okay… 7bbd9c0
Ignore aromatic information when converting atoms and bonds. a79ef2d
Don’t set geometric configuration for aromatic bonds. f8efa2b
Don’t throw an error for low-charge values when determining electron contribution. b40be9a
fixup failing unit test 3d029e1
Fixup - component grouping in recursive smarts 258503d
High-level tests verifying the SMARTS queries match stereo correctly. 014577c
Rather than try to reason about what chirality specification was accepted we can instead match within the atoms. This makes the chirality matching in logical atoms easier to understand and also allows correct interpretation. Negation is now possible and ‘[!@]’ will match anything that is not anticlockwise (equivalent to ‘[@@?]’). 6a918e9
Rename StereoMatchPredicate to StereoMatch. Make the StereoMatcher for SMARTS public (temporary) which allows it to be called from the smiles.smarts package (where SMARTSQueryTool is). Only apply stereo matching to non-query molecules - ideally we would automatically choose which one to apply but at the moment that would cause a cyclic dependency. cf3d78f
Match SMARTS geometric (double-bond) stereo queries. 34f6b9a
Create double-bond stereo chemistry for queries. Currently functionality is restricted and doesn’t handle logical bond operators. 01b6173
Don’t use logical bond operations for the stereo/unspecified representation. This is now the same as how the atom-based chirality is represented. f9c1794
Match SMARTS chirality queries. Ideally this would be in with the other predicate but that would cause a cyclic dependency between isomorphism and smarts modules. a79f224
Access the chiralities of a matched atom. This currently pulls the required chiralities out of the matched atom, which makes the logic a little complicated. It might be better to push the tetrahedral specification in and test matching that way. e5efc79
Correct configuration of chirality. The suffix logic was incorrect and doesn’t match degree but is instead a permutation number. Usage of @1 = @, @2 = @@ is rare but could be added in later. 72207d1
Track correct order of neighbours - ugly but gets the job done. May well rewrite SMARTS in future. f75e026
Formatting SMARTS query visitor. 6a53712
Match component level grouping in recursive SMARTS - fixes bug #1312. Also spotted that the VF algorithm won't match disconnected queries. For now we put in an easy fix which isn't optimal but gets the job done. 9f9b6ad
Parse and test component-level grouping. The parser will also accept larger ring numbers now from 0-99 (conflated commit - sorry). d026e32
Moving the SMARTSQueryTool over to the new substructure search. This caused one regression but depict match verified the new value is correct. 8074978
Respect component level grouping in substructure queries. ae037b3
Parse component-level grouping indication. dfe3c65
Despite the conversion - still faster. Also looks like the electron containers / stereo were never partitioned. 75947bc
Algorithm for connected components. 437c9a0
Simplify parsing of recursive SMARTS allowing multiple levels of recursion and easier handling of ring, components (next) and stereo (later). ce6b61d
Cleanup the rest of the logical atom expression (deprecation) and create them using the new operators. cd92f6e
Less ‘stringy’ programming. Rather than define an ‘op’ as a string create separate objects. This will simplify the stereo-matching (later). 55c48f5
Simplified recursive query - the query is no longer mutable which means the recursive queries are rerun. We may want to reintroduce the cache but it should be ‘safe’ and not depend on the query being mutable. e6325ae
Remove specialised initialisation of recursive/hydrogen atoms. 0d3be50
Format SMARTS atoms that need to be updated. 2532eee
Store the ‘target’ as an invariant required for SMARTS. 8cc06f9
Fixup unit tests 3f5b634
Stereochemistry integration tests. fefeb68
A predicate to filter for (sub)graph-isomorphisms that also match stereochemistry. Implementing it as a predicate means any (sub)structure mapper can be made stereo sensitive. Currently the functionality doesn’t allow partial mappings (MCS) but could easily be adapted/extended to do this. cce5d38
Correcting typo - caught by a test. 17b26b7
High level tests for substructure matching. We need to put these in the smarts module for now to avoid a cyclic dependency - ‘test-smiles’ already depends on ‘isomorphism’. 394547a
Current front end API for Ullmann. 410fd69
Missing annotations. 7fc2544
Implementation of the mapping state for the Ullmann algorithm. 14874a3
The (current) front end access to the Vento-Foggia substructure/isomorphism algorithm. 0b1fc9a
Stream matching states as an iterator. 3f7cb67
Including tests in module suite. 353c192
Concrete implementations of the VF for matching substructures (substate) and identity (state). The tests are quite tricky to write but higher level integration tests will be added later. ca35398
Defines the internal state API for (subgraph)-isomorphism mappers and implements an abstract superclass for the Vento-Foggia (VF) algorithm. f3908b8
Atom and bond matchers allow us to move the matching logic outside of the (subgraph)-isomorphism mappers. 1ec06d6
A small optimisation thought of whilst writing up ring perception. If the graph is biconnected we know how many cycles there should be in the MCB. Adding this check doesn't show much improvement on datasets of chemical compounds but has a dramatic effect on fullerenes. 8a0aac0
Do not leave hydrogen counts unset - also load isotope information. 5ad5e0e
Use fewer temp arrays - makes sense to compute the invariants as we go. It's still useful to do ringNumber/ringSize this way though. bf9b180
Only doing the SSSR when ring properties are tested does provide a decent boost but causes a regression in 'pcore'. The SMARTSQueryTool is mutable and one can change the query and/or targets. This means the ring properties may now be calculated when they are actually needed. This optimisation is still useful but we would need new classes which restrict how users set the queries. 9f9370d
Okay - let's not depend on the hydrogen counts being set to 0 and instead test the actual bits which have been set. These particular fingerprints cannot be used to substructure filter these molecules. Checking which bits are set documents why the structure key matched the substructure but not the superstructure. e1d2a47
Ignoring ring set configuration tests for now - these were mainly to show there are different values. In reality there are other parts which can change also and providing a 'Daylight' or 'OpenSMARTS' scheme is probably better for usability. dd6825d
Default aromaticity model is now daylight - to use the CDK model it must be set. The test testBasicAmineOnDrugs_cdkAromaticModel causes an error because one or more atom types are not identified by the CDKAtomTypeMatcher. These atom types are likely to be fixed by another pending patch. ae9527e
Don't need to preserve the aromaticity anymore as we can apply the correct model and get the expected matches or non-matches. 54810ce
SMARTS extension matching for hybridisation requires atom typing. 2e52d0e
Use the atom invariants instead of the formal neighbour count. The first requires atom typing. 090d924
Using the new aromaticity model. 59bf1a8
Removing the option which was a work around for different aromaticity models. We now have different models and set that instead. 4f83c28
A tractable set of cycles for aromaticity perception - required as we're going to use the new aromaticity in the SMARTS Query Tool. c4efc8e
Including new test classes in module suite. 4db918c
SMARTS queries complain when bond orders are left unset - they're needed to compute the valence. 486aa5c
Ensuring implicit hydrogen count is not null. The fingerprint molecules may need more attention. 3d61ec0
Simplified identification of unique matches using sets. 38d9fac
Type safe lists and renaming atom mapping method. Atom mapping means given an atom in the query matched which atom in the target. This is not provided but instead only the set of atoms which were matched. 11497fb
Compute the invariant properties using the new class. dc4098c
Set the 'ISINRING' flag on the query container to indicate ring properties are required. c322441
Temporary class to allow access between the two SMARTS class trees. They're currently in separate packages and the SMARTSAtomInvariants shouldn't really be a public class but for now we need a way to expose functionality. 251852b
Incorrect bug report (824) changed the meaning of degree in SMARTS. This should be put back but will likely need discussion first. d82a6c7
Clean up of some other matchers which didn't need adapting to the new invariants. 6f44381
Adapting and cleaning up several SMARTS query matchers to use the new invariants. The query value is now stored in the class and not on the query atom - this mirrors how other matchers work and generally makes things cleaner. Access to the values was not used (or currently useful). Serialization was removed - the class did not implement the interface and it looks like the UID was just added by default. Serialization was removed rather than fixed as it is rarely useful and there are much better techniques. Several unused classes were removed - these fulfilled the same functionality as others and it was confusing to have two for the same purpose. The RingAtom/SmallestRingAtom can now be done by using different invariants. The DegreeAtom was incorrect, matching charge, and the ExplicitConnectionAtom fulfilled its use case. 2d7c628
Access the new atom invariants from every SMARTS atom. If the invariants are required by the matcher they must be set! 7d321e4
Storage of SMARTS invariants which replaces the setting of multiple properties with a single type safe data holder. Different invariants (i.e. for rings) can be computed but for now the default daylight implementation is provided. 61ef552
Using the pattern *Visitor.java ignores adding a cdk module tag to a generated class - avoid this by specifying the full names of those we actually want to ignore. 7ece0dc
Three classes already have explicit @cdk.module annotation 26c28a4
Move SMARTSAtoms back to the smarts module. 6d7c4ee
Don't repurpose SMARTS query atoms for other uses. We can define our own matchers internally. 1029a5a
Minimising the scope of SMARTS parser classes. These classes are specific to the parser and don't need to be exposed in the public API. The atom matchers in isomorphism.matchers.smarts are still public but could also be hidden behind a factory. This commit removes about 50 classes from the JavaDoc making it easier to find other parts. 8f0926d
Ensure non-negative height is passed when drawing a rectangle - bug #1163. 408f409
Correctly consider the heavy bonds and hydrogen counts. When connectedHeavyAtoms == 2 the full bond list was checked for the aromatic bonds. As the full bond list could contain an explicit hydrogen we should filter it to ensure the aromatic bond checking works correctly. f4a3dbf
Don't check for aromatic atoms before checking charge. Carbon, 'C.plus' cation can also be aromatic and so the charge types should be checked first. 8d36482
Aromaticity requires atom typing. 7769b45
Update header/copyright information 4a410b8
Also include the dependencies of the smiles module 8d0a833
With the fix, the number of tries needed to get the expected accuracy is much less 0b6de9a
Improved random number generation in a range. b4eb6a8
[PATCH] Wrote missing test: Tanimoto on IBitfingerprints d827176
Removed a bad test: the matcher uses and requires implicit hydrogens explicitly, and the test does not have them; the testFindMatchingAtomType_IAtomContainer_IAtom() does and works fine 6c8280c
Deprecating old implementations. d4c923f
Making it clearer how to use the electron donation factory methods and updating the documentation with the new naming. af8b84c
Correcting copyright year. f69a8bc
Correcting typo in benzene. 777a3eb
Including copyright. 7884281
Changing factory method names for the CDK models. The factory does not take any attributes. b3b63c6
Mark cyclic atoms and bonds before removing non-aromatic atoms. 646cf60
Don't remove atoms whilst iterating. f03170a
Improved documentation - any more suggestions welcome. 1a4ac10
Aromaticity perception using configurable electron donation models and cycles. 5207409
Adding a generator for the cycles/rings used by the CDKHueckelAromaticityDetector and DoubleBondAcceptingAromaticityDetector. 8c6c50e
Use the edge to bond map for a small performance gain. 4a5ff1e
Removing some complexity by using the new edge to bond mapping. aeb0293
Utility to simplify bond lookup from index endpoints of the edge. 3359573
For the simple cyclic pi bonds aromaticity model - don't allow atoms which are next to two cyclic pi bonds. Documentation wording also improved. b193aa9
Not enough electrons - then it cannot participate. 062d720
Correcting coverage annotations. 114eb5b
Including other aromaticity models in the 'standard' module suite. 84433ff
The Daylight model for how many electrons are donated when computing aromaticity. a20922f
Correct check for cyclic bonds. 43352de
Correct name for compound being tested. 3e04187
A simple aromaticity model for MDL/Mol2 file formats. c9a6538
Rename ExoCyclicAtomTypeModelTest.java to ExocyclicAtomTypeModelTest.java 6e16157
Encoding the existing aromaticity model from CDKHueckelAromaticityDetector and DoubleBondAcceptingAromaticityDetector as a new interface ElectronDonation. This separation makes it easier to use different (or a custom) aromaticity model within the CDK. e28cd17
Ensure no modifications whilst iterating. e367ebf
Another NPE test: when the atomicNumber doesn't correspond to an isotope list 0debacc
Fixed unit tests: now that the addEx|ImplicitHydrogens() doesn't change atom type names anymore, we have to do that explicitly 1fb8356
Removing long running test which did not have any assertions. b03b177
Don't change anything except the hydrogen count 92577b0
Added line separator for multiline extra data (when length < 80 chars). 03d56bb
Added additional checks to fix NPE regressions e5a4220
Updated the source URI for updates from the SF platform e5e112c
javadoc 7c78cc8
Added missing testing and some missing JavaDoc 3c5d9bf
Some tuning: store atomicNum as byte, and use the actual file size when reading 965b787
Binary format for isotope data: no smaller jar, seemingly a small performance improvement 5e154d5
Added an index based on the element symbol, further speeding up isotope info lookup; also some further tests for unexpected input 126d85c
Moved the CML-based isotope reading to extra c10e3b3
Use the BODRIsotopes as much as possible, paving the way to move the XMLIsotopeFactory to the extra module (it must not be removed: the BODRIsotopeDumper class depends on it) 2e5c986
Added an abstract class with the shared functionality of BODRIsotopes and the old IsotopeFactory now called XMLIsotopeFactory 694b284
Added a .dat based isotope reader + tool to create a .dat file from the BODR .xml 05bdba5
Removing print to stdout. 06d535c
Use the silent AtomContainerSet, taking the module's test suite time down from about 20 secs to 4 secs on my machine ce2c98c
Update test-valencycheck.libdepends cdd817d
Update valencycheck.libdepends b1d639c
Slow but convenient check for cyclic bonds. 31c9bdf
Allow the cyclic vertex searches to test if an edge is cyclic - simplifies some other code, fixes a bug (new RingSearch tests) and allows us to provide a utility in RingSearch (next commit). 201e9b0
Simplify efficient creation and conversion (to IRing) of the various cycle sets. e5ca2ec
Inlined cyclic molecules for testing the new cycles utility. 536bd94
Truncate input from MCB (last vertex is the same which was not expected by new Path()). 9bf8f73
Incorrect class name in coverage annotations grrr. f344f7d
Path copying done down-stream. fd5e072
Minor optimisation to initial cycles allows it to skip a number of breadth-first searches if it is known the graph is biconnected. 9683f93
An exception for when a result could not be reached in reasonable time. a40d043
Removed an ancient, unused test file 869187f
Now the hydrogens must be present - we find that previously this fragmenter was generating both 'C1CCC(C)C1' and 'CC1CCCC1' which are the same molecule so there is 1 less in the fragment count. The canonicalisation also changed in the counting below. a8e2b36
Kekule indole doesn't match (check depictmatch) itself but the aromatic form does. 2cb6ede
Molecules had null hydrogen counts. 1ce4f8c
Example of what you get if you don't redo the atom types - this is an isolated case so it is minimal effort to change the assertion here. The hydrogen count isn't updated when one of the bonds is removed and so the carbon only has 3 bonds. 00d381b
Also need to do the necessaries in the ExhaustiveFragment - next commit will show what you get if you don't do this. It might be more desirable but for now this matches the tests. a769754
Hydrogen count gives different canonical SMILES. e1b1874
Make it easier to inspect what is going on. 8ead845
Non-aromatic bonds between aromatic atoms are now correctly generated. 38094a1
Now the hydrogen counts are actually there the canonical fragments are different. a87dd79
SMILES parser interprets the organic subset correctly - when fragments are made hydrogens are not added. Perhaps they shouldn't be but the tests expect this to be the case. 81e4a5f
Aromatic SMILES now written if the molecule is aromatic. 8223db6
Bonds go after branching not before (it will still parse okay) but this is typical (see specification). 5605d9c
Define configurations using double bond stereo elements. 0aa245f
Ensure correct hydrogen counts. c25013a
Redundant brackets are not produced. 9e75ad9
Define tetrahedral stereo chemistry using the stereo-elements. Note, testCisTransDecalin, was previously trying to be specified with cis/trans ring configuration but is now correctly specified with tetrahedral centres. 3ebcb39
Utility for cleanly defining tetrahedral chirality. 5010f0d
Generator now produces literal output - note this means the non-aromatic example is 6 singly bonded aliphatic carbons with 1 hydrogen (as specified). d68ff43
Atom typing doesn't preserve aromatic flags on atoms, doing aromaticity perception and then atom typing again (addImplicitHydrogens) removes the flags on the atoms. Moving the implicit hydrogen addition gives the correct output. 8cb9483
Bond symbol only written on opening of ring not closure. ad6b7dd
Redundant brackets are no longer included in generated SMILES. a1a569f
fixCarbonCount gave the wrong hydrogen count on one of the atoms. 91b44a6
Ring numbering changed - numbering now starts from the first ring opening instead of the first ring closure. ce59924
Bracket atoms must specify the number of hydrogens. 60c7ccf
Bond symbol is now only written on the ring open - recommended by OpenSMILES. acb70c5
Unknown atom '*' doesn't need brackets - http://www.daylight.com/daycgi/depict?2a 8487de6
Intercept pseudo atoms and default null hydrogen counts to 0. The CDK will leave the nulls in place for pseudo atoms so we need this special exception. ded2e30
Remove errors due to hydrogen counts being null. 68e654f
Actually we also want to include the isotope number if there was no major isotope found. fdeb81d
Using Beam to generate the SMILES. ffb129e
Updating API calls in SmilesGeneratorTest - no test assertions corrected. 49c5ac9
Don't set mass for default isotopes - incompatibilities between what CDK/SMILES define. 4eb0643
API changes in other modules required to build - will come back and verify these work as expected later. b96d89d
Chiral SMILES = isomeric SMILES - this is now configured by the constructor and there is a single createSMILES method. c9cfcd0
Removing existing SmilesGenerator implementation. dab7341
Hotfix on beam, ring number 0 .. 99 inclusive - there are 100 rings not 99, doh. 924c292
Removing testing of atom types on molecules which no longer exist. 91d812d
Remove extra '}' from javadoc. 004ff4f
Added a new contributor d4740f1
try-catch substituted by @Test(expected=InvalidSmilesException.class) in SmilesParserTest.java (= junior task 16). b4d5758
The ring info is not really being used anymore, so removed it aa58610
Use the RingSearch instead of the SpanningTree 390144d
Updated the .classpath for BEAM 0.3 1bb2324
Copyright header for EdgeShortCycles.java 34670dc
Copyright header for VertexShortCycles.java 1c9bd45
Check null on MCB constructor. af843cc
Edge short cycles. 82d9c20
Set of shortest cycles through each vertex. 267c8b4
Now the aromatic flags are kept we need to explicitly specify the single bond. c137394
Preserving the aromatic flags means this molecule doesn't match unless we strip off the flags and reassign them. 4da618a
Aromatic flags preserved on load now. 4b2e2d9
Dependency inheritance with Maven will make updating libraries so much easier. d06a20c
Update beam so we can pass aromaticity flags to the CDK but maintain the nice bond order assignments. 6b02563
Fail fast on Nina's boron ball - a later fix will actually speed up the fingerprints so this case is trivial but for now we have an exception. 1a30dfb
Old parser didn't correctly handle, 'CCC[N+]1=c2c(=C(=O)NC1=O)[nH]cn2' (1676), and would match the SMARTS. New parser gives correct structure and no longer matches (verified by Depict Match). a51d66d
Bad modules, bad unit tests. I think what has happened is something in the atom typing or the aromaticity changed but the MCS for these molecules seems correct from inspection (spent over an hour looking at this). The original bug report shows there were different counts due to some flags being set/unset etc and I think this is just a case where we handle things a little better. The first molecule here is actually invalid - someone has used the Daylight-like aromaticity model to generate a molfile. As the MDL aromaticity model doesn't allow lone pair contribution it's impossible to know which nitrogen on the 6 member ring had the hydrogen. Depending on which nitrogen I choose I get different MCS counts. 44eb4ca
Additional PathTools method to fail fast if too many paths are generated. 1e0a8e7
Encoder factory for DoubleBondStereoChemistry IStereoElements. 3d1464e
Unit tests to check for encoding of DoubleBondStereochemistry 5b138ce
Moved the } to the correct line (merge conflict) 0898866
Remove IAtomParity, all implementations and tests. 89d1667
Remove atom parity from CDKToBeam converter. 2cbd12c
Remove atom parity test classes 6184484
Remove IChemObject builder instance registration. 83eb846
Don't test chem object builders for creation of IAtomParity. 829cca3
Replaced IAtomContainer test using IAtomParity with ITetrahedralChirality. 12eba45
Copy stereo elements correctly. No longer use the single method which just copied IAtomParity. c69d29c
Remove test from AtomContainerManipulator 2a6a6fb
Load ITetrahedralChirality instead of IAtomParity with InChI. dc60947
Don't use IAtomParities when generating InChIs. 4a90c9d
Strip trailing white space so git can match up the next commit. 067a378
According to my example data, a nitro nitrogen is N.pl3 c058109
Added rudimentary detection of the Co.oh atom type 5374cd7
Added rudimentary detection of the chromium atom types 3fdfde1
Added mappings for Mo 0cb199f
Implemented perception of the Sybyl O.co2 atom type 6c18d6c
Nitrate oxygens are not O.co2's b1e1995
A NO2 nitrogen is N.pl3 in Sybyl 34bb70c
Detection of out-of-ring planarity due to pi-pi interaction is out of scope of our algorithms right now cc818d0
We'll never agree on aromaticity... what model does Sybyl use anyway? 2e7f1e7
Removing print to stdout. 635980b
Can also now apply basic CIP rules to sulfinyl. 88ed0c9
Removes the safety check for implicit tetrahedral neighbours. 90ea496
If the central atom is found in the ligand list replace it with an implicit hydrogen. 0f3e73c
Additional unit test of CID 42475007 stereoisomer. 84f1ce6
Ignore tests for unimplemented features 012101d
The assertion was wrong: with or without a double bond, this thing is aromatic; also added testing that the oxygen is not marked as aromatic f6f7550
The SMILES parser no longer automatically recognizes aromaticity (it's OpenSMILES, not old SMILES), but the test was expecting IS_AROMATIC flags, so added perception e64edbd
Removed a test of which the SMILES was outright broken db55579
Added three tests which cannot be kekulized 97bada1
Updated the atom indices: the new parser has the atom orders in the carbonyl bonds the other way around 7a69c17
The new SMILES parser just kekulizes by default b99ab31
Atom type perception is no longer part of SMILES parsing 1052cfd
Aromatic bromine example is invalid - show that we can still load it but if we load it and kekulise (default) an exception is thrown. Another unit test shows a valid aromatic bromine is kekulised properly. 216ec99
Added the ATASaturationCheckerTest to the test suite 0e93958
Readded the SMILES dependency as ATASaturationCheckerTest still uses it 62bd8a3
Updated the Eclipse .classpath for Beam 0.2 f3cc6ba
Removing valency check tests dependence on smiles module. f7c201b
Resolving regressions in fingerprint. f946093
Resolving regressions in cdk-core. 175dd42
Regressions in cip, sdg, signature, qsarmolecule, qsarbond, group, forcefield, inchi, charges, builder3d only required the dependency. 36366f0
The order of atoms in bonds changed and so the expected connection table output is different. c413ae5
These SMILES strings are not valid molecules. 3870f7a
Resolving regressions in standard - 1 remains in the HOSECodeGenerator. 1178391
Resolving regressions in smarts module - 1 remaining which needs more attention. 80144c2
Atom typing required by other tests in the SMILES module. e200c68
Utility to assign single or double flags to a container. 275ec49
Access bonds by their connected atoms. c10acba
Order of bonds in the container is now different - check bonds by looking up the connected atoms. a0408e3
'Co' is invalid - show we can load it if needed and that also we can kekulise an acyclic molecule with 'Co' at the front. 7e5cc07
Beam will (currently) allow bare 'H', 'D' and 'T'. These are common mistakes and it is no extra effort to parse them. Note D and T are auto-corrected to a hydrogen with a mass number '[2H]' and '[3H]'. d653215
Many tests which check atom types are present or that aromaticity is set are resolved. 643279a
Improved failure message. bcfff03
Input was invalid (amine nitrogen); added another test for the invalid case. 34bdd39
Bond order sums are now correct (once the molecule is kekulised) 3701865
Chiral hydrogens are no longer converted to explicit atoms. bd713f4
Typos and check bond orders. fbbab6d
Kekule molecule is loaded - but we can apply the aromaticity if required. 24b85d7
Static import of assertTrue/assertFalse - no other changes. f8681a3
Beam gives the configuration in the absolute order of the atoms - tests were updated to account for this. Also atom objects were tested instead of symbols. c7d06e2
Convert tetrahedral centres with an implicit hydrogen or lone-pair to the CDK ITetrahedralChirality. b9f1b49
Documenting ITetrahedralChirality with the fact that if the chiral atom is present in the ligands it indicates an implicit hydrogen or lone pair. 9cd20eb
Added missing aromatic bond 'C:1C:C:C:C:C1' is not the same as 'C:1:C:C:C:C:C1'. Also added an additional test to show that when loading normally the correct structure 'cyclo-hexane' is obtained. fe698f8
Atom typing no longer automatically applied. 9cd2740
Convenience method for writing new tests. c7459c1
Using Beam to read the SMILES string. 4ca3fef
Invert logical condition - new parser will automatically assign bond orders. This also allows us to throw an exception for cases such as invalid pyrrole (c1nccc1) which is one of the existing (and failing) unit tests. a42fc33
Encapsulating parser and documenting fields and constructor. 9f3859f
Stripping out existing implementation - leaving only the public API. 0800aba
Beam version 0.2-SNAPSHOT - some minor tweaks to performance (more likely). e7301b5
Reformat to enable better visualisation of changes. eb69a5d
Added missing JavaDoc in the render and renderbasic modules 9055d4d
Added missing @cdk.githash tags 4779a45
Cleared the JavaDoc errors in the datadebug module: mostly explicit @Override and @inheritDoc, but also a @cdk.githash or two e5d8f49
Added missing JavaDoc in the core module 9ede5db
Added missing JavaDoc in the cip module 50ac78c
Added missing JavaDoc in the atomtype module 9738bd6
Resolving regressions in fragment. 932dbed
Resolving regressions in qsaratomic. 9dda83d
Resolving regressions in qsarionpot. ecc9df4
Resolving regressions in reaction. 24bcd58
Resolve structgen regressions - we no longer need to add implicit hydrogens. fbeca60
Resolving regressions in tautomer. 6a01f6a
Resolving regressions in SMSD - it seems that the existing code won't match an aromatic double bond with an aromatic single bond (i.e. they both have to be single). 96794e1
All tests pass in the CIP module but the CIPTool needs updating to handle the implicit hydrogen / lone pair scheme. For the meantime we have a sanity check in case such a case is attempted. f30223d
Renamed the atom typer method to say ...Types a9f35c6
Added the Beam jars to the Eclipse classpath 5bde50a
Only set aromatic flags if the atoms are aromatic AND the bond was implicit. c089627
Missing license headers and removal of a duplicate line. bf58ffc
Noticeable performance improvement by avoiding invoking the factory each time. We cache a template atom/bond/container and then clone these when we need them. On large datasets provides a noticeable difference. 1266d3d
Removing redundant bracket. Generally such an error indicates something deeper is wrong. This test was checking hydrogen counts though so we can safely remove it. 89b8c20
Iodine only belongs to the aliphatic and not the aromatic subset - it can not be lower case and this test does not work as intended. A correct example is to have an aliphatic carbon and an aromatic oxygen which make the symbol of Cobalt (non organic subset atom). 068cc92
Conversion from Beam to CDK objects. c51d02d
Conversion of the CDK object model to Beam. 0076116
Including the beam library in the SMILES module. ee1335e
Ensure aromaticity is perceived. bf25c14
This test (which was passing) actually only has a single H bond acceptor. However we need to give it a kekule structure - this test is for HBondAcceptor and not kekulisation so we provided the correct SMILES. 30b9ed1
An aromatic nitrogen which isn't connected to a pi bond cannot accept a hydrogen bond. 672dd11
Iterate over connected bonds. 762e450
An atom cannot be both a nitrogen and an oxygen. 906021b
Use atom iterator instead of indices. 145e697
Use parameterised generics. fe499e2
With the SMILES updated only 5 match. 8bf37f2
Correcting SMILES errors. b3d3058
Use isotope factory to get the deuterium and tritium isotope values. 6149174
Apply valence model after fixing hydrogen isotopes. fd1a5a3
Thread unsafe warning for PubChemFingerprinter. 105a379
Inline structure loading so that hydrogens are not present. b70324c
JavaDoc warnings in MACCS and ESTATE fingerprints. e89f583
Hydrogens not added for query structure (aromatic bonds in MDL are query structures). 6740a8e
Correct test class annotation and remove print to standard out. 9407f22
Additional tests didn't take into account the closed walk - last/first vertex the same. 036e014
Including additional tests for norbornane and correcting spelling of naphthalene. e350b8c
Algorithm to compute triple short cycles. This includes the ESSSR and envelope rings. 6b658ab
Having the hydrogens present when reading the MDL Mol files means the canonicalisation now changes. One can confirm the SMILES are the same but only written differently. Two tests checked that ring closures could include double bond symbols 'C=1C=CC=CC=1'. This is no longer the case as the canonical form doesn't have the double bonds on the closures but the output is correct. 54c8a3a
Missing license headers. e532634
Use the encoder when chiral() hash codes are wanted. 97ac91e
A tetrahedral element encoder factory for encoding the existing CDK stereo element - ITetrahedralChirality. f4ce49f
Utility method for defining parities with a set value. 177ca3a
Test for encoding tetrahedral stereo elements - we try configuring different elements and compare them to the hash codes generated for 2D representations (i.e. MDL). 49def80
Separate the implicit/explicit versions of butan-2-ol between files. a5481d5
Including licence header and javadoc mistakes. 55073db
Ensuring hydrogen suppression is correct for double bond and extended tetrahedral (allene) stereo chemistry. 7c20719
Testing the suppression of hydrogen atoms and preservation of stereo encoding. 98beb96
Allow creation of the new hash methods which suppress certain atoms. 7f183b4
Including atom suppression in perturbed hash generation. e8cfe0b
Using the AtomSuppression when generating seeds. 0406d02
Modifications allowing atoms to be suppressed when generating atom based hash code. dfb1e69
Class for computing atomic hash codes whilst suppressing certain atoms (i.e. hydrogens). This initial commit is a direct copy of the BasicAtomHashGenerator so that the modifications can be shown (next commit). 37ad045
Internal API and implementations for choosing which atoms to suppress in the hash code. ff1b68d
Internal API for suppressing vertices in the hash. c89cbaf
Correct handling of aromatic type - fixed typo and also check bond order is set to 'UNSET' not null. eff967a
Write valence if it does not match the MDL implied valence. The option to 'writeQeuryFormatValencies' has been removed - the valence field is a generic field and can always be written. df6471d
Use the MDL valence model when reading molecules with the V2000 reader. b15179f
MDL valence model. b83e2c8
Only add atom mappings when the number of atoms equals that of the query - patch from Roger Sayle. 12beba8
The SMARTS pattern '**()' should not match cyclopropane - example for Roger Sayle. 28d8713
Previously disabled test now ignored 599d549
Previously disabled test now ignored. 9cdb572
Ignoring missing functionality tests to do with isotope handling in CML. 11039c1
Correct assertion in atomic tsar descriptor. Assertion was likely incorrect due to using the number instead of the index. Using the index '12' to access atom 13 provides the expected result. The assertions were altered to test all added hydrogens - for which two are adjacent to an aromatic system. 3e63a75
Test that when the aromaticity is preserved the expected number of matches are found. d10db9e
Altering assertions for what CDK perceives as aromatic. 900e676
There are 142 lines in 'data/smiles/drugs.smi' but only 141 non-empty lines. Technically according to the OpenSMILES specification an empty string is a valid SMILES string but the current IteratingSMILESReader skips empty molecules (a good thing). 63ab46d
Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive27. 547014d
Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive26. c85c215
Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive29. 2815d39
Demonstrating that the recursive mismatch of the test is due to Daylight/CDK aromatic differences - testRecursive28. ccb82bd
Reusing utilities for RecursiveTests aea2dcc
Overlap cutoff to 1/4th average bond length rather than 1/10th. 3eb97e1
Use the average bond length of the molecule. 6f25d47
Remove global bond length. dbc1b3d
Ensure adding and removing templates still works correctly. 9303613
Anonymise template molecules and queries when searching for templates to use for layout. d904fc4
Ensure that queries produce the expected match, which is dependent on whether Daylight's or CDK's aromaticity model is used. 6367db1
Enable/disable automatic atom typing and aromaticity perception. 989b16a
Matching with different ring sets produces a different number of matches. 36602fb
Using correct ring set in the test. The daylight SMARTS matcher uses SSSR for this example - we can now choose to do this. 4e23c23
Using utilities in existing match method. 5ac6afa
Utility methods so we can configure the SMARTS matching easier. 0b203a2
Choose which ring set to use in SMARTS matches. a600a15
Configurable short cycles in SMARTS matching. de6140f
Choose which ring set to use. 41c0914
Resolving JavaDoc errors in SMARTSQueryTool. 62247de
Adding atoms/bonds to correct molecules. 8ee288b
Timeout for long running test - the test currently errors due to too much GC. Adding a timeout doesn't convert the error to a failure but does allow the 'cdk-standard' suite to run twice as quickly. 59b0ab0
Longer variable names. de0db9c
Dependencies not inherited by ant. b45017b
Reformat - removing tab indents. fa56fc0
Extended documentation suggesting better approaches to what the extended connectivity was traditionally used for. Also made explicitly clear that the numbers are not the canonical labelling as described in the original article and merely the extended connectivity which is used in computing the lexicographically smallest unique labelling (canonical). e65f9d2
Linear search (getAtomicNumber(IAtom)) is fast enough to avoid precomputing the index map. c669ad7
As described in the original publication only use the connectivity value of non-hydrogen atoms. 75b20d1
Performance improvements to extended connectivity computation (Morgan numbers). 19a58a0
Dependencies not inherited from ioformats. 16e96a3
Removing matching state from MDLRXN3000 format. f07a707
Allow PubChem Substance to not match PubChem Substances (plural). 5800444
Updating PubChem Compound XML format to not match PubChem Compounds (plural). 073aeb4
Adapting existing formats to new API. e55f373
Moving old API method from interface to abstract super-class. 95050f7
Default implementation to easily adapt existing matchers. fba55eb
Using new API to guess format f1762d7
Replacing mock method with new implementation. 883face
Updating abstract test to use new API. 1c8cdb9
Required dependencies. 2773154
New API for matching ChemFormats - instead of checking line-by-line the entire header is passed. The matcher then indicates whether it matched, where it matched and what format it is. f37ff6e
Missing licence header. 84edd5d
Also customUnused for the full PMD reports eb7e55a
Put the PMD unused/migration reports in a separate folder to not overwrite the full reports 7112168
Added missing @cdk.githash tags 47917c7
Changed the order: now the copyright/license info is at the top of the file 70a3080
Fixed the class JavaDoc syntax, correcting a false fix in commit 4fc48372 7961ccf
Ignore the UnusedModifier PMD test 0d878b6
Missing full stop/period from exception message assertion. 2820228
Updated PMD from 5.0.1 to 5.0.4 c56d265
Avoid setting the atomic number twice. Previously invoking 'new Element("H", null)' would still set the atomic number to '1'. Chaining the constructors so that all fields are assigned in the same place avoids this problem. d8712bb
Using the bootstrap seems to cause regressions. 0f410dc
Change in exception message. 41d3e68
JavaDoc error 9bc6af7
Throw an exception if the symbol isn't supported. 2eea2b6
Expect exception for a symbol we can't calculate the charge for - e.g. 'As' in this case. 59a7b98
Test that providing a non-chain as a chain throws an exception. b9c169c
Throw an exception when a non-chain is provided. Documentation was also updated. b8f14d0
Use model builder to place the test alkanes - the AtomPlacer3D is only for placing chains. 9c06749
Aromatic selenium can be parsed from SMILES. The current aromaticity perception doesn't consider Se but that's not related to SMILES parsing. a845e03
Required dependency which is inherited from builder3d. e135f0e
Object equality null-safe testing. 68bf41e
Tests for ChemObject compare() methods. b377fa9
Set atomic numbers when creating an element from a symbol c34d8c7
If the proper behavior is to throw an exception, then the unit test should expect it a621922
Resolves unit test failure (previous commits). When a double bond is found and there are no 2D coordinates (e.g. unspecified configuration) then return '0' for the configuration value. 65d3585
Moves assertions on non-null bond length to the same loop. Also puts the getPoint3d() null test before the bond length check. If there is a null atom the bond length check will throw an NPE - better to fail on this. bdb5c39
Remove stdout and catch exception to fail the test instead of returning in error. 1a5c89b
Using typed list and index (so we can print a better failure message). 49932b9
When a reaction references a molecule which is unknown - automatically create one with that Id. This happens when the set of molecules is defined after the reaction. This commit resolves the test in error 'CML2Test.testBug2697568' a6c96d8
Exact mass and natural abundance are not preserved on reading/writing CML. As these attributes are boxed primitives they can be null and throw an exception when unboxed by 'assertEquals'. Before checking the values the attributes are now checked for nullity. ac5a3e7
Fail test instead of throwing an exception. 06e7224
Resolves two long standing errors in 'cdk-extra'. The method should throw an exception when one tries to attach to an invalid atom number (e.g. 7-chlorohexane). The existing error was using the ParseException constructor incorrectly and thus would throw an error. The constructor is for generating error messages to do with syntax. As this is a semantic error the constructor did not function as intended. Simply replacing the use of this constructor with a normal error message resolves the issue. 5bd0f1d
Missing dependency for test-qsarcml. c9c965d
Run junit as headless. fb59fae
SpanningTree documentation 19d5248
Added a missing dependency d83c377
setting unspecified bonds when generating an InChI bef3903
unit test for bug1295 f10979d
Bumping version number (note new maven style snapshot version numbering), open for changes. 6f8e2b3
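
The extended connectivity entries above (75b20d1, 19a58a0) can be illustrated with a small stand-alone sketch. This is not the CDK implementation: the class name, the adjacency-list input and the explicit 'rounds' parameter are all assumptions made for the example. It only shows the idea from the log messages - each atom starts with its number of non-hydrogen neighbours, and on each round takes the sum of the values of its non-hydrogen neighbours.

```java
/** Hypothetical helper for illustration only - not part of the CDK API. */
final class ExtendedConnectivitySketch {

    /**
     * Computes extended connectivity values over a plain adjacency list.
     *
     * @param adjacency adjacency[v] lists the neighbours of vertex v
     * @param hydrogen  hydrogen[v] is true if vertex v is a hydrogen
     * @param rounds    number of refinement rounds (assumed parameter)
     */
    static long[] compute(int[][] adjacency, boolean[] hydrogen, int rounds) {
        int n = adjacency.length;
        long[] current = new long[n];

        // Initial value: the number of non-hydrogen neighbours.
        for (int v = 0; v < n; v++) {
            for (int w : adjacency[v]) {
                if (!hydrogen[w]) current[v]++;
            }
        }

        long[] next = new long[n];
        for (int round = 0; round < rounds; round++) {
            // New value: sum of the current values of non-hydrogen neighbours.
            for (int v = 0; v < n; v++) {
                long sum = 0;
                for (int w : adjacency[v]) {
                    if (!hydrogen[w]) sum += current[w];
                }
                next[v] = sum;
            }
            // Swap buffers so the next round reads the values just written.
            long[] tmp = current;
            current = next;
            next = tmp;
        }
        return current;
    }

    public static void main(String[] args) {
        // Isobutane skeleton: vertex 0 is the central carbon.
        int[][] isobutane = {{1, 2, 3}, {0}, {0}, {0}};
        long[] values = compute(isobutane, new boolean[4], 2);
        System.out.println(java.util.Arrays.toString(values));
    }
}
```

Because hydrogens never contribute to a sum, adding an explicit hydrogen to the input leaves the values of the heavy atoms unchanged for the same number of rounds. As the documentation change in e65f9d2 notes, these values are not a canonical labelling on their own.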

