The ChemScanner library attempts to extract and interpret reaction and molecule information from ChemDraw-related file formats: CDX, CDXML, CDX embedded in DOC and DOCX files, and PerkinElmer ELN.
Add this line to your application's Gemfile:

    gem 'chem_scanner'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install chem_scanner
UI for ChemScanner
- Export to Excel and CML.
- Preview of the original scheme.
- Import directly into Chemotion ELN.
- Add a comment to each extracted scheme. These comments also appear in the exports and in the molecules/reactions imported into Chemotion ELN.
To scan/extract a single CDX file:

    require 'chem_scanner'

    cdx = ChemScanner::Cdx.new
    cdx.read('/path/to/cdx/file')

    # Get array of scanned Canonical SMILES
    cdx.molecules.map(&:get_cano_smiles)

    # Get array of scanned Reactions in SMILES
    cdx.reactions.map(&:reaction_smiles)
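If you have a folder of CDX files, the calls above can be wrapped in a small loop. The sketch below is a hypothetical helper (scan_cdx_files is not part of ChemScanner); it only uses new, read, molecules, reactions, get_cano_smiles and reaction_smiles as shown above, and takes the reader class as a parameter so it can be tried with a stand-in object.

```ruby
# Hypothetical helper (not part of ChemScanner): scan several CDX files and
# collect canonical SMILES and reaction SMILES per file path.
# reader_class is expected to behave like ChemScanner::Cdx
# (new, read, molecules, reactions).
def scan_cdx_files(paths, reader_class:)
  paths.each_with_object({}) do |path, results|
    cdx = reader_class.new
    cdx.read(path)
    results[path] = {
      molecules: cdx.molecules.map(&:get_cano_smiles),
      reactions: cdx.reactions.map(&:reaction_smiles)
    }
  end
end

# Example usage with the real library:
#   require 'chem_scanner'
#   results = scan_cdx_files(Dir.glob('/path/to/cdx/dir/*.cdx'),
#                            reader_class: ChemScanner::Cdx)
```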
There are 5 classes, corresponding to the 5 supported file formats: CDX, CDXML, DOC, DOCX and PerkinELN.
- Access "scanned" molecules:

    # Molecules - array of scanned molecules
    cdx.molecules

    # Get array of scanned Canonical SMILES
    cdx.molecules.map(&:get_cano_smiles)

    # Get one molecule
    molecule = cdx.molecules.first

    # Number of scanned molecules
    cdx.molecules.count
- Molecule class:

    # Canonical SMILES
    molecule.get_cano_smiles
    # Molfile
    molecule.get_mdl
    # RDKit RWMol (https://www.rdkit.org/docs/cppapi/classRDKit_1_1RWMol.html)
    molecule.rw_mol
    # Molecule label (bold text near the molecule)
    molecule.label
    # Molecule text (molecule description)
    molecule.text
    # Molecule details (additional information from PerkinElmer ELN)
    molecule.details
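As an illustration of the Molecule accessors, here is a minimal sketch that writes scanned molecules to CSV with Ruby's standard csv library. The helper name molecules_to_csv and the column layout are assumptions for this example; it is not the Excel/CML export offered by the ChemScanner UI.

```ruby
require 'csv'

# Hypothetical helper (not part of ChemScanner): one CSV row per molecule,
# using only get_cano_smiles, label and text from the Molecule class.
# Missing values (nil) become empty fields.
def molecules_to_csv(molecules)
  CSV.generate do |csv|
    csv << %w[smiles label text]
    molecules.each do |molecule|
      csv << [molecule.get_cano_smiles, molecule.label, molecule.text]
    end
  end
end

# Example usage with the real library:
#   File.write('molecules.csv', molecules_to_csv(cdx.molecules))
```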
We are using a ruby-binding version of
RDKit as a dependency of
A reaction consists of 3 groups of molecules: reactants, reagents and products. Each group is an array of molecules, in which each element is an object of the Molecule class. In addition, some abbreviations belonging to the reaction are represented by SMILES. These can be accessed via reagent_smiles:
    reaction = cdx.reactions.first

    # Access extracted structure groups
    reactants = reaction.reactants
    reagents = reaction.reagents
    products = reaction.products
    reagent_smiles = reaction.reagent_smiles
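The three groups map naturally onto the reactants>reagents>products layout of reaction SMILES. The sketch below rebuilds such a string from the groups; compose_reaction_smiles is hypothetical, it assumes reagent_smiles is an array of SMILES strings (or nil), and ChemScanner's own reaction_smiles may order or encode the components differently.

```ruby
# Hypothetical helper (not part of ChemScanner): build
# "reactants>reagents>products", joining the molecules of each group with ".".
# Assumes reaction.reagent_smiles is an array of SMILES strings or nil;
# Array() turns nil into [] and a single string into a one-element array.
def compose_reaction_smiles(reaction)
  reactants = reaction.reactants.map(&:get_cano_smiles)
  reagents = reaction.reagents.map(&:get_cano_smiles) + Array(reaction.reagent_smiles)
  products = reaction.products.map(&:get_cano_smiles)
  [reactants, reagents, products].map { |group| group.join('.') }.join('>')
end
```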
Further manipulation of each group would be similar to
- Reaction properties
The reaction itself has
details properties. All these properties are extracted from the ChemDraw scheme, except the
details field, which holds additional information from PerkinElmer ELN.
- Reaction step
Some multi-step reactions can also be recognized. If a reaction is a multi-step reaction, its steps can be accessed via:
    # Get first scanned reaction
    reaction = cdx.reactions.first

    # Access first step
    step = reaction.steps.first
    step.number # Should be 1
    step.description
    step.time
    step.temperature

    # List reagents SMILES
    step.reagents
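Each step has the following properties, as used in the example above: number, description, time, temperature and reagents (a list of reagent SMILES).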
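A multi-step reaction can then be printed step by step. This is a hypothetical sketch (describe_steps is not a ChemScanner method); it assumes time and temperature are strings or nil and that step.reagents is an array of SMILES strings or nil.

```ruby
# Hypothetical helper (not part of ChemScanner): one readable line per step,
# e.g. "Step 1: stir (2 h, rt) [reagents: CCO]".
def describe_steps(reaction)
  reaction.steps.map do |step|
    # Drop missing or empty conditions (nil.to_s is "")
    conditions = [step.time, step.temperature].map(&:to_s).reject(&:empty?)
    reagents = Array(step.reagents)

    line = "Step #{step.number}: #{step.description}"
    line += " (#{conditions.join(', ')})" unless conditions.empty?
    line += " [reagents: #{reagents.join('.')}]" unless reagents.empty?
    line
  end
end

# Example usage with the real library:
#   puts describe_steps(cdx.reactions.first)
```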
Supported File Formats
The usage and API of the CDX, CDXML and PerkinELN classes are described above. Their outputs are simple
The DOC and DOCX classes are a little different. A DOC or DOCX file can contain more than one embedded ChemDraw scheme, and each embedded scheme is a CDX scheme. ChemScanner attempts to extract all of them and put them into one Hash map, called cdx_map:
    require 'chem_scanner'

    doc = ChemScanner::Doc.new
    doc.read('/path/to/doc/file')

    doc.cdx_map.each do |key, cdx|
      puts cdx.reactions.map(&:reaction_smiles)
    end

    # Access all molecules in all CDXs
    doc.molecules.map(&:get_cano_smiles)

    # Access all reactions in all CDXs
    doc.reactions.map(&:reaction_smiles)
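Because cdx_map holds one CDX object per embedded scheme, per-scheme results can be collected with ordinary Hash methods. summarize_cdx_map below is a hypothetical helper; the keys of cdx_map are not documented here, so it passes them through unchanged.

```ruby
# Hypothetical helper (not part of ChemScanner): for each embedded scheme,
# report how many molecules were found and the reaction SMILES.
# Keys of cdx_map are kept exactly as ChemScanner provides them.
def summarize_cdx_map(cdx_map)
  cdx_map.transform_values do |cdx|
    {
      molecule_count: cdx.molecules.count,
      reactions: cdx.reactions.map(&:reaction_smiles)
    }
  end
end

# Example usage with the real library:
#   summarize_cdx_map(doc.cdx_map).each { |key, info| p [key, info] }
```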
DOCX is a bit different: ChemScanner can extract each CDX together with its preview image from the document.
    require 'chem_scanner'

    docx = ChemScanner::Docx.new
    docx.read('/path/to/docx/file')

    docx.cdx_map.each do |key, cdx_info|
      # Get the CDX scheme
      cdx = cdx_info[:cdx]
      puts cdx.reactions.map(&:reaction_smiles)

      # Preview images, used for ChemScanner UI
      img_ext = cdx_info[:img_ext] # Could be '.png', '.emf'
      img_b64 = cdx_info[:img_b64] # Base64-encoded image
    end

    # Access all molecules in all CDXs
    docx.molecules.map(&:get_cano_smiles)

    # Access all reactions in all CDXs
    docx.reactions.map(&:reaction_smiles)
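The preview entries can be written back to image files, for example to check which embedded scheme a result came from. save_previews is a hypothetical helper; it assumes img_b64 is standard Base64 text (or nil when there is no preview) and that img_ext includes the leading dot, as the comments above suggest.

```ruby
require 'fileutils'

# Hypothetical helper (not part of ChemScanner): decode each preview image
# and write it to out_dir. File names come from the cdx_map keys, whose
# exact form is an assumption, so unsafe characters are replaced with "_".
# Returns the paths that were written.
def save_previews(cdx_map, out_dir)
  FileUtils.mkdir_p(out_dir)
  cdx_map.each_with_object([]) do |(key, cdx_info), written|
    next if cdx_info[:img_b64].nil?

    name = key.to_s.gsub(/[^\w.-]/, '_')
    path = File.join(out_dir, "#{name}#{cdx_info[:img_ext]}")
    # String#unpack1('m') decodes Base64 without requiring the base64 library
    File.binwrite(path, cdx_info[:img_b64].unpack1('m'))
    written << path
  end
end

# Example usage with the real library:
#   save_previews(docx.cdx_map, '/path/to/preview/dir')
```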
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/ComPlat/chem_scanner. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
The gem is available as open source under the terms of the GNU AGPLv3 License.