bow edited this page Jun 29, 2012 · 5 revisions

The purpose of this Wiki is to serve as a 'sandbox' for the future SearchIO documentation. For now, I will try to note the important points that should later be covered in the real documentation.

Outline:

Main SearchIO Intro

  • Main functions: parse, read, to_dict, index, index_db, convert, write
  • Supported programs and formats
  • Object model: QueryResult, Hit, HSP
  • Other things to note: 0-based (not 1-based) coordinates, DNA vs. protein search coordinates, ID and description behavior, shallow vs. deep copies
  • Format-specific usage guide
  • Contributing your parser
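A minimal sketch of the coordinate point, assuming the input is a 1-based, inclusive pair as most search programs print it. `to_zero_based` is a hypothetical helper written for this page, not part of SearchIO; it only illustrates the 0-based, half-open convention (the same one Python slicing uses, `seq[start:end]`).

```python
def to_zero_based(start, end):
    """Convert a 1-based, inclusive (start, end) pair to 0-based, half-open.

    Hypothetical helper for illustration, not part of SearchIO. BLAST
    reports a minus-strand nucleotide match with start > end, so the pair is
    normalised to start < end and the strand is returned separately (for
    protein coordinates the strand value carries no meaning).
    """
    strand = 1 if start <= end else -1
    low, high = min(start, end), max(start, end)
    # 1-based inclusive [low, high] covers the same residues as
    # 0-based half-open [low - 1, high), i.e. Python's seq[low - 1:high].
    return low - 1, high, strand


# A made-up minus-strand match over residues 401..500, printed as 500..401.
start, end, strand = to_zero_based(500, 401)
print(start, end, strand)  # 400 500 -1
print(end - start)         # 100, the number of residues covered
```

One consequence worth documenting: with half-open coordinates, `end - start` is the span length directly, with no `+ 1` correction.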

Format-specific documentation

  • Attributes of common objects (e.g. seq_len or acc)
  • Custom parsing / writing behavior (e.g. parsing custom columns in blast-tab, writing PSL files with headers)

For each of the formats and programs below, we should mention:

  • Which program flavors and versions we support (or know to be supported)
  • The custom attributes present in that format (e.g. HSP.z_score in fasta-m10 or HSP.cluster_num in hmmer-tab)
  • Custom behavior not covered by the main API (e.g. custom blast-tab nonstandard column parsing)
  • Gotchas / tricks that could be useful

BLAST

  • blast-xml ~ dealing with blast-generated Query and/or Hit IDs
  • blast-tab ~ parsing + writing files with custom column order
  • blast-tabc ~ writing files with custom column order
  • blast-text ~ the extent of Biopython's support
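To make "custom column order" concrete: BLAST+ lets the user choose the columns of tabular output (e.g. `-outfmt "6 qseqid sseqid evalue"`), so a reader has to know the column order before it can interpret a row, and a writer can emit the columns in any order. The stdlib sketch below illustrates only that idea; `parse_tab_line` and `format_tab_line` are hypothetical helpers, not SearchIO's blast-tab parser or writer, and the sample row is made up.

```python
# Hypothetical helpers (not SearchIO's API): read and write one tab-separated
# BLAST+ row whose column order is given explicitly, as with -outfmt "6 ...".

# The 12 columns BLAST+ writes for a plain "-outfmt 6".
DEFAULT_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch",
                  "gapopen", "qstart", "qend", "sstart", "send",
                  "evalue", "bitscore"]

# Columns converted to numbers; any other column is kept as a string.
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend",
              "sstart", "send", "qlen", "slen"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_tab_line(line, fields=DEFAULT_FIELDS):
    """Map one row to a dict keyed by column name, in the given order."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    row = {}
    for name, value in zip(fields, values):
        if name in INT_FIELDS:
            row[name] = int(value)
        elif name in FLOAT_FIELDS:
            row[name] = float(value)
        else:
            row[name] = value
    return row


def format_tab_line(row, fields=DEFAULT_FIELDS):
    """Write the selected columns back out, in any order.

    Numbers go through str(), so e.g. a bitscore of 200 comes out as
    "200.0" rather than in BLAST's own formatting.
    """
    return "\t".join(str(row[name]) for name in fields)


# A made-up row, as if written with: -outfmt "6 qseqid sseqid evalue bitscore qlen"
custom = ["qseqid", "sseqid", "evalue", "bitscore", "qlen"]
row = parse_tab_line("query1\tsubj7\t2e-10\t55.1\t120", custom)
print(row["qlen"], row["evalue"])  # 120 2e-10
print(format_tab_line(row, ["sseqid", "qseqid", "qlen"]))
```

Keying rows by column name rather than by position is what lets the reader and the writer use different column orders independently.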

BLAT

  • blat-psl ~ reading files with track lines (used in Ensembl), dealing with non-DNA sequence searches
  • blat-pslx
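The track-line issue can be sketched as a pre-filter: "track" and "browser" lines (and the optional psLayout header) carry no hits and have to be skipped before rows are read. `iter_psl_rows` is a hypothetical stdlib helper, not SearchIO's blat-psl parser; it assumes tab-separated rows with the 21 standard PSL columns, or 23 for PSLX (which adds the query and target sequence columns).

```python
# Hypothetical helper (not SearchIO's parser): yield PSL hit rows as dicts,
# skipping track/browser lines, blank lines and the psLayout header.

PSL_FIELDS = ["matches", "misMatches", "repMatches", "nCount",
              "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
              "strand", "qName", "qSize", "qStart", "qEnd",
              "tName", "tSize", "tStart", "tEnd",
              "blockCount", "blockSizes", "qStarts", "tStarts"]
PSLX_FIELDS = PSL_FIELDS + ["qSeq", "tSeq"]


def iter_psl_rows(lines):
    for line in lines:
        line = line.rstrip()
        # UCSC-style "track"/"browser" lines and blank lines carry no hits.
        if not line or line.startswith(("track", "browser")):
            continue
        cols = line.split("\t")
        # Hit rows start with an integer match count; this also skips the
        # optional psLayout header (title, column names, dashed rule).
        if not cols[0].isdigit():
            continue
        fields = PSLX_FIELDS if len(cols) == len(PSLX_FIELDS) else PSL_FIELDS
        # Values stay strings; PSL start coordinates are already 0-based.
        yield dict(zip(fields, cols))
```

For example, `list(iter_psl_rows(open(path)))` would drop any leading track line and return only the hit rows; converting the numeric columns is left to the caller.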

FASTA

  • fasta-m10 ~ dealing with custom output (full alignment display), dealing with E2() values

HMMER

General note on hmm{from,to} and ali{from,to} coordinates

  • hmmer-text ~ dealing with alignment annotations
  • hmmer-tab
  • hmmer-domtab (hmmscan-domtab, hmmsearch-domtab, phmmer-domtab)
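For the coordinate note, a sketch of where the three coordinate pairs sit in a HMMER3 `--domtblout` row and how they map to 0-based, half-open spans. `domtab_coords` is a hypothetical stdlib helper, not SearchIO's hmmer-domtab parser; it assumes the standard domtblout layout of 22 whitespace-separated columns followed by the target description.

```python
def domtab_coords(line):
    """Return the hmm, ali and env spans of one domtblout data row.

    Hypothetical helper, not SearchIO's parser. HMMER prints these as
    1-based, inclusive from/to pairs; they are returned here as 0-based,
    half-open (start, end) tuples. Comment lines ("#...") are not handled.
    """
    # 22 whitespace-separated columns come before the free-text description
    # of the target, which may itself contain spaces.
    cols = line.split(maxsplit=22)

    def span(i):
        # Columns i and i + 1 hold the "from" and "to" values.
        return int(cols[i]) - 1, int(cols[i + 1])

    return {
        "hmm": span(15),  # position on the profile HMM
        "ali": span(17),  # aligned region on the sequence
        "env": span(19),  # envelope around the domain on the sequence
    }
```

Which side each span belongs to depends on the program: with hmmsearch the profile is the query and the sequences are the hits, so "hmm" refers to the query and "ali"/"env" to the hit; with hmmscan the roles are swapped.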
