NCI Yeast Anticancer Drug Screen: the large scale data used in "Large-Scale Graph Mining using Backbone Refinement Classes"

amaunz/data-yeast-ac

Data sources:
http://dtp.nci.nih.gov/yacds/download.html
and
http://dtpsearch.ncifcrf.gov/FTP/OPEN_BABEL_SMILES.TXT

Two activity files were created.
A structure was classified as active if
- at least one strain had a value >= 0.7 (ac_onestrain.class)
- all strains had a value >= 0.7 (ac_allstrains.class)
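
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script that produced the .class files: the function name `classify`, the boolean return values, and the idea that each structure comes with a plain list of per-strain values are assumptions. Only the 0.7 cutoff and the "at least one" / "all" rules come from this README.

```python
from typing import Dict, Sequence

THRESHOLD = 0.7  # cutoff stated in this README


def classify(strain_values: Sequence[float],
             threshold: float = THRESHOLD) -> Dict[str, bool]:
    """Apply both activity rules to the per-strain values of one structure.

    Hypothetical helper: the input layout is an assumption.
    """
    if not strain_values:
        raise ValueError("need at least one strain value")
    return {
        # active if at least one strain reaches the threshold
        "onestrain": any(v >= threshold for v in strain_values),
        # active only if every strain reaches the threshold
        "allstrains": all(v >= threshold for v in strain_values),
    }


if __name__ == "__main__":
    print(classify([0.9, 0.2, 0.75]))
```

A structure that is active under the "all strains" rule is always active under the "one strain" rule, so the all-strains actives are a subset of the one-strain actives.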

|-- README                             this file

|-- ac_alt.smi                         compounds in SMILES format
|-- ac_alt_allstrains.active           sfgm compatible input format (actives)
|-- ac_alt_allstrains.class            lazar and libfminer compatible input format (all)
|-- ac_alt_allstrains.inactive         sfgm compatible input format (inactives)
|-- ac_alt_onestrain.active
|-- ac_alt_onestrain.class
|-- ac_alt_onestrain.inactive

|-- remove.rb                          remove trees without edges in gSpan file
|-- replace.rb                         restore correct numbering from ac.smi
|-- report.rb                          report trees without edges in gSpan file
|-- sdf2gsp.pl                         convert SDF Molfile to gSpan input format

How to create the gSpan input file format (perl and ruby required):

1) Remove the numbering from ac.smi so that ac.no_nr.smi contains one SMILES string per row
2) Run 'babel -d -ismi ac.no_nr.smi -osdf ac.sdf' to convert to SDF (MOL) format
3) Run 'perl sdf2gsp.pl < ac.sdf > ac.no_nr.gsp' to convert to gSpan input format
4) Run 'ruby replace.rb ac.no_nr.gsp ac.smi > ac.f.gsp' to restore the correct numbering from the .smi file
[ optional: run 'ruby report.rb ac.f.gsp' to report trees without edges ]
5) Run 'ruby remove.rb ac.f.gsp > ac.gsp' to remove trees without edges
[ optional: run 'ruby report.rb ac.gsp' to check that no trees without edges remain ]
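
To make "trees without edges" concrete, here is a minimal Python sketch of the report and remove steps. It is not the code of report.rb or remove.rb. It assumes the common gSpan text layout, in which each graph starts with a 't # <id>' line followed by 'v <id> <label>' vertex lines and 'e <from> <to> <label>' edge lines. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Graph = Tuple[str, List[str]]  # (header line, vertex/edge lines)


def split_graphs(lines: Iterable[str]) -> List[Graph]:
    """Group gSpan lines into (header, body) pairs, one per 't' record."""
    graphs: List[Graph] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.split()[0] == "t":
            graphs.append((line, []))
        elif graphs:
            graphs[-1][1].append(line)
    return graphs


def has_edge(body: List[str]) -> bool:
    """True if the graph body contains at least one 'e' line."""
    return any(line.split()[0] == "e" for line in body)


def edgeless_headers(lines: Iterable[str]) -> List[str]:
    """Report step: headers of all graphs that have no edges."""
    return [h for h, body in split_graphs(lines) if not has_edge(body)]


def drop_edgeless(lines: Iterable[str]) -> List[str]:
    """Remove step: keep only graphs that have at least one edge."""
    kept: List[str] = []
    for header, body in split_graphs(lines):
        if has_edge(body):
            kept.append(header)
            kept.extend(body)
    return kept


if __name__ == "__main__":
    sample = ["t # 1", "v 0 6", "v 1 8", "e 0 1 1", "t # 2", "v 0 17"]
    print(edgeless_headers(sample))
    print("\n".join(drop_edgeless(sample)))
```

Graphs without edges typically come from single-atom structures, which is presumably why they are filtered out before mining.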


IMPORTANT: THE DATA HAS BEEN POSTPROCESSED!
COMMENTS:
- Alternative database (hence the 'alt' in the filenames).
- See the file 'bad.txt' for the removed molecules and the reasons for their removal.
