This branch is 1 commit ahead of amaunz:master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
folds
folds2
src
README
bad.txt
bloodbarr.alt.actives
bloodbarr.alt.class
bloodbarr.alt.csv
bloodbarr.alt.gsp
bloodbarr.alt.inactives
bloodbarr.alt.smi
nctrer.actives
nctrer.class
nctrer.csv
nctrer.gsp
nctrer.inactives
nctrer.smi
remove.rb
replace.rb
report.rb
sdf2gsp.pl
yoshida.actives
yoshida.class
yoshida.csv
yoshida.gsp
yoshida.inactives
yoshida.smi

README

This repository contains data from:

Ulrich Rückert and Stefan Kramer, Optimizing Feature Sets for Structured Data.
In Stan Matwin and Dunja Mladenic, editors, 18th ECML. Springer, 2007.
http://www.springerlink.com/content/x22202075j328w46/

Target classes: active (1) and inactive (0)

|-- README                       this file

|-- .smi                         lazar- and libfminer-compatible input format (compounds in SMILES format)
|-- .class                       lazar- and libfminer-compatible input format (target class)
|-- .actives                     sfgm-compatible input format (target class active)
|-- .inactives                   sfgm-compatible input format (target class inactive)
|-- .gsp                         gSpan-compatible input format (compounds)

How to create gSpan format from SMILES format (Perl, Ruby, and Open Babel's 'babel' command required):

1) Remove the numbers from filename.smi so that filename.no_nr.smi contains one SMILES per row.
2) Run 'babel -d -ismi filename.no_nr.smi -osdf filename.sdf' to convert to SDF (MOL) format.
3) Run 'perl sdf2gsp.pl < filename.sdf > filename.no_nr.gsp' to convert to gSpan input format.
4) Run 'ruby replace.rb filename.no_nr.gsp filename.smi > filename.f.gsp' to restore the correct numbering from the .smi file.
   [ optional: run 'ruby report.rb filename.f.gsp' to report trees without edges ]
5) Run 'ruby remove.rb filename.f.gsp > filename.gsp' to remove trees without edges.
   [ optional: run 'ruby report.rb filename.gsp' to report any remaining trees without edges ]
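
Step 1 is the only step above without a ready-made command. The Ruby sketch below shows one way it could be done. It is not part of this repository: the name strip_ids is hypothetical, and the sketch assumes that each line of filename.smi holds a number, whitespace, and then a SMILES string, so check your file before relying on it.

```ruby
# Sketch of step 1: drop the leading compound number from each line of a
# .smi file so that every output row holds exactly one SMILES string.
# ASSUMPTION (not confirmed by this README): each input line has the form
#   <number><whitespace><SMILES>
# strip_ids is a hypothetical helper, not one of the repository's scripts.
def strip_ids(lines)
  # The SMILES string is taken to be the last whitespace-separated field;
  # blank lines yield nil and are dropped by compact.
  lines.map { |line| line.split.last }.compact
end

# Command-line use: ruby strip_ids.rb filename.smi filename.no_nr.smi
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  input, output = ARGV
  File.write(output, strip_ids(File.readlines(input)).join("\n") + "\n")
end
```

Saved under a name such as strip_ids.rb (also hypothetical), the script would produce the filename.no_nr.smi file that step 2 expects.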


IMPORTANT: THE DATA HAS BEEN POSTPROCESSED!
COMMENTS:
- An alternative database was used (hence the 'alt' in filenames).
- See the file 'bad.txt' for the removed molecules and the reasons for removal.
