Steve Bond edited this page Oct 25, 2016 · 5 revisions

--extract_regions, -er

Description

Pull out sub-sequences from each record. If using a richly annotated format, like GenBank, features are deleted or adjusted appropriately.

Arguments

Positions ( str )

SeqBuddy uses a custom syntax to specify what regions should be extracted from each sequence, and multiple regions can either be passed in as separate arguments or combined into a single comma-separated string.

Single positions: This is the simplest syntax, consisting of a comma-separated list of each position you want extracted.

e.g., "1,2,4,45,79,305"

Ranges: Use two numbers separated by a colon to designate a range of residues, similar to Python slice notation. Unlike Python slices, positions are 1-based and both endpoints are included (for example, "11:100" returns 90 residues). If the left side of the range is blank, the range starts at the first residue; if the right side is blank, the range extends to the final residue. Negative numbers count residues from the end of the sequence.

e.g., "5:200" "400:-1" ":245"
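The sequence lengths in usage examples 1 through 4 are consistent with 1-based, inclusive ranges in which -1 refers to the final residue. The following is a minimal sketch of that interpretation, inferred from the outputs on this page; `resolve_range` is a hypothetical helper, not SeqBuddy's own code.

```python
def resolve_range(spec, seq_len):
    """Return (start, end) as 1-based inclusive positions for a 'left:right' spec.

    Hypothetical helper: the rules are inferred from the example outputs,
    not taken from the SeqBuddy source.
    """
    left, right = spec.split(":")

    def to_position(token, default):
        if token == "":
            return default  # blank side falls back to the sequence boundary
        value = int(token)
        # Negative values count back from the end: -1 is the final residue.
        return seq_len + value + 1 if value < 0 else value

    return to_position(left, 1), to_position(right, seq_len)


SEQ_LEN = 403  # length of the example record on this page
for spec in ("11:100", ":250", "250:", "100:-100"):
    start, end = resolve_range(spec, SEQ_LEN)
    print(spec, "->", start, end, "length", end - start + 1)
```

The printed lengths (90, 250, 154, and 205) match the LOCUS lines of the corresponding example outputs.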

Every Nth residue: Use a forward slash to select ordered but non-contiguous residues, for example every 10th residue. The left side of the slash also accepts colon notation to specify a sub-range within each block of N residues.

e.g., "1/10" "1:10/100"

Example

Input file: Mle-Panxα12.gb

LOCUS       Mle-Panxα12              403 aa                     UNA 02-JAN-2015
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE
  ORGANISM  . . .
            .
FEATURES             Location/Qualifiers
     CDS             1..403
                     /label="ML25997a"
                     /created_by="User"
     TMD1            28..48
     TMD2            131..151
     TMD3            215..235
     TMD4            299..329
ORIGIN
        1 mvidilsgfk gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk
       61 ysgsfsedyc wtqglytike aydlltmnvp ypgvipedmp tciereling grvscpdpet
      121 vkpptrvyhl wyqwvpfyfw laaaafffpy liykhfgvgd lkpliqmlhn pivdegdqnc
      181 maekasmwlf yklnvfmnen tifailtekh rlffivmlvk vlyliisila lyltdemfhi
      241 gsfvsygsew atslpegdne ttlvkdklfp kmvaceikrw gptgleeeqg mcvlapnvin
      301 qylflilwfa iifciacncl svlfaltklv fvlgsykrll asaflkdelh ykhmffnigt
      361 sgrvllqiva tnvsprvfes imanlatkli aerlkgngkg sv*
//

Usage example 1

Extract a range of residues, using the colon (:) operator.

$: sb Mle-Panxα12.gb -er "11:100"

Output

LOCUS       Mle-Panxα12               90 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..90
                     /created_by="User"
                     /label="ML25997a"
     TMD1            18..38
ORIGIN
        1 gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk ysgsfsedyc
       61 wtqglytike aydlltmnvp ypgvipedmp
//

Usage example 2

Leave the left side of the range empty to begin extracting from the start of the sequence.

$: sb Mle-Panxα12.gb -er ":250"

Output

LOCUS       Mle-Panxα12              250 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..250
                     /created_by="User"
                     /label="ML25997a"
     TMD1            28..48
     TMD2            131..151
     TMD3            215..235
ORIGIN
        1 mvidilsgfk gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk
       61 ysgsfsedyc wtqglytike aydlltmnvp ypgvipedmp tciereling grvscpdpet
      121 vkpptrvyhl wyqwvpfyfw laaaafffpy liykhfgvgd lkpliqmlhn pivdegdqnc
      181 maekasmwlf yklnvfmnen tifailtekh rlffivmlvk vlyliisila lyltdemfhi
      241 gsfvsygsew
//

Usage example 3

Leave the right side of the range empty to extract until the end of the sequence.

$: sb Mle-Panxα12.gb -er "250:"

Output

LOCUS       Mle-Panxα12              154 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..154
                     /created_by="User"
                     /label="ML25997a"
     TMD4            50..80
ORIGIN
        1 watslpegdn ettlvkdklf pkmvaceikr wgptgleeeq gmcvlapnvi nqylflilwf
       61 aiifciacnc lsvlfaltkl vfvlgsykrl lasaflkdel hykhmffnig tsgrvllqiv
      121 atnvsprvfe simanlatkl iaerlkgngk gsv*
//

Usage example 4

Use negative numbers to specify distance from the rear of the sequence.

$: sb Mle-Panxα12.gb -er "100:-100"

Output

LOCUS       Mle-Panxα12              205 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..205
                     /created_by="User"
                     /label="ML25997a"
     TMD2            32..52
     TMD3            116..136
     TMD4            200..205
ORIGIN
        1 ptcierelin ggrvscpdpe tvkpptrvyh lwyqwvpfyf wlaaaafffp yliykhfgvg
       61 dlkpliqmlh npivdegdqn cmaekasmwl fyklnvfmne ntifailtek hrlffivmlv
      121 kvlyliisil alyltdemfh igsfvsygse watslpegdn ettlvkdklf pkmvaceikr
      181 wgptgleeeq gmcvlapnvi nqylf
//
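The feature coordinates in the outputs of examples 1 through 4 are consistent with a simple rule for contiguous ranges: clip each feature to the extracted region, drop it if nothing remains, and shift what is left so the region starts at position 1. The sketch below illustrates that rule under those assumptions; `remap_feature` is a hypothetical name and this is not SeqBuddy's implementation.

```python
def remap_feature(feature, region):
    """Shift a (start, end) feature into the coordinates of an extracted region.

    Both tuples are 1-based and inclusive. Returns None when the feature
    falls entirely outside the region. Hypothetical helper for illustration.
    """
    f_start, f_end = feature
    r_start, r_end = region
    clipped_start = max(f_start, r_start)
    clipped_end = min(f_end, r_end)
    if clipped_start > clipped_end:
        return None  # no overlap, so the feature is deleted
    offset = r_start - 1
    return clipped_start - offset, clipped_end - offset


features = {
    "TMD1": (28, 48),
    "TMD2": (131, 151),
    "TMD3": (215, 235),
    "TMD4": (299, 329),
}
region = (100, 304)  # residues selected by "100:-100" on a 403-residue record
for name, span in features.items():
    print(name, remap_feature(span, region))
```

For the "100:-100" region this drops TMD1, shifts TMD2 and TMD3, and truncates TMD4, as in the output of usage example 4.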

Usage example 5

Pull out all hydrophobic residues from the transmembrane domains by specifying individual residues and ranges.

$: sb Mle-Panxα12.gb -er "32,34,35,37,38,42,43" "135,141,151" "215:219,221,222,224:226,228,229,231,233" "305:307,311,312,315,320,322,323,326"

Output

LOCUS       Mle-Panxα12               34 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..34
                     /created_by="User"
                     /label="ML25997a"
     TMD1            1..7
     TMD2            8..10
     TMD3            11..24
     TMD4            25..34
ORIGIN
        1 mvlvlvvvll ivmlvvllii illlliliii lvll
//
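To see why the command in usage example 5 yields 34 residues, the arguments can be expanded into individual positions. This is a rough sketch that handles only explicit positive positions and two-sided ranges; `expand_positions` is a hypothetical helper, and sorting with de-duplication is an assumption about how the positions are combined.

```python
def expand_positions(arguments):
    """Expand comma-separated positions and 'a:b' ranges into a sorted list.

    Hypothetical helper: handles only explicit positive numbers.
    """
    positions = set()
    for argument in arguments:
        for token in argument.split(","):
            if ":" in token:
                first, last = (int(part) for part in token.split(":"))
                positions.update(range(first, last + 1))  # ranges are inclusive
            else:
                positions.add(int(token))
    return sorted(positions)


args = [
    "32,34,35,37,38,42,43",
    "135,141,151",
    "215:219,221,222,224:226,228,229,231,233",
    "305:307,311,312,315,320,322,323,326",
]
print(len(expand_positions(args)))

# First 60 residues of the example record, enough to cover the TMD1 positions.
first_60 = "mvidilsgfkgitpfkgitlddgwdqinrsfmfvlcvlmgtvvtvrqyaggiiscdgftk"
tmd1 = "".join(first_60[p - 1] for p in expand_positions(args[:1]))
print(tmd1)
```

The seven residues picked by the first argument correspond to the start of the extracted sequence and to the TMD1 feature at 1..7.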

Usage example 6

Extract every tenth residue using the forward-slash (/) operator (starting at residue #1).

$: sb Mle-Panxα12.gb -er "1/10"

Output

LOCUS       Mle-Panxα12               40 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..40
                     /created_by="User"
                     /label="ML25997a"
     TMD1            3..4
     TMD2            14..15
     TMD3            22..23
     TMD4            30..32
ORIGIN
        1 klsggkcepp gtlwydncfn hkaiwepwgn alvlhtasig
//

Usage example 7

Extract the first three residues of every ten by mixing the colon (:) and forward-slash (/) operators.

$: sb Mle-Panxα12.gb -er "1:3/10"

Output

LOCUS       Mle-Panxα12              123 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..123
                     /created_by="User"
                     /label="ML25997a"
     TMD1            10..15
     TMD2            40..46
     TMD3            67..72
     TMD4            91..99
ORIGIN
        1 mvigitddgf mftvvgiiys gwtqaydypg tcigrvvkpw yqlaaliylk ppivmaeykl
       61 tifrlfvlyl ylgsfatstt lkmvgptmcv qyliifsvlf vlasaykhsg rtnvimaaer
      121 sv*
//
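The "1:3/10" selection in usage example 7 can be pictured as splitting the sequence into consecutive blocks of ten residues and keeping the first three of each block, including a shorter final block. Below is a minimal sketch of that reading, covering only the colon form with a range that starts at 1; the function name is hypothetical and the bare "1/10" form is not modelled here.

```python
def first_k_of_every_n(k, n, seq_len):
    """Return the 1-based positions of the first k residues in each block of n.

    Hypothetical helper illustrating the "1:k/n" form only.
    """
    return [p for p in range(1, seq_len + 1) if (p - 1) % n < k]


positions = first_k_of_every_n(3, 10, 403)
print(len(positions))

# First 40 residues of the example record.
first_40 = "mvidilsgfkgitpfkgitlddgwdqinrsfmfvlcvlmg"
print("".join(first_40[p - 1] for p in positions if p <= 40))
```

The position count agrees with the 123 aa reported in the example output: 40 full blocks contribute three residues each, and the final three residues (401 to 403) make up the rest.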

Usage example 8

Wacky example to illustrate how flexible the syntax is. NOTE! If an argument begins with a minus sign (-), make sure there is a space between the opening quotation mark and the minus. Otherwise Python interprets the argument as a new flag.

$: sb Mle-Panxα12.gb -er " -5:8/10,45,124" "60:-100,5:42,78,-5" "1/50"

Output

LOCUS       Mle-Panxα12              325 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..325
                     /created_by="User"
                     /label="ML25997a"
     TMD1            25..43
     TMD2            119..139
     TMD3            203..223
     TMD4            287..301
ORIGIN
        1 milsgfkgit pfkgitlddg wdqinrsfmf vlcvlmgtvv rqygdgfkys gsfsedycwt
       61 qglytikeay dlltmnvpyp gvipedmptc ierelinggr vscpdpetvk pptrvyhlwy
      121 qwvpfyfwla aaafffpyli ykhfgvgdlk pliqmlhnpi vdegdqncma ekasmwlfyk
      181 lnvfmnenti failtekhrl ffivmlvkvl yliisilaly ltdemfhigs fvsygsewat
      241 slpegdnett lvkdklfpkm vaceikrwgp tgleeeqgmc vlapnvinqy lfilwacnlt
      301 kykrkdeyfn ilqirvfatk gngks
//
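The leading-space note in usage example 8 reflects how Python's standard `argparse` module classifies arguments. The sketch below uses a stand-in parser with a single `-er` option, which is an assumption made for illustration rather than SeqBuddy's actual parser definition.

```python
import argparse

# Stand-in parser for illustration only; not SeqBuddy's real command-line setup.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-er", "--extract_regions", nargs="+")

# With a leading space, the first region is treated as an ordinary value.
ok = parser.parse_args(["-er", " -5:8/10,45,124", "60:-100,5:42,78,-5", "1/50"])
print(ok.extract_regions)


def parses(argv):
    """Return True if argparse accepts argv, False if it exits with an error."""
    try:
        parser.parse_args(argv)
        return True
    except SystemExit:
        return False


# Without the space, the region looks like an unknown flag and parsing fails.
print(parses(["-er", "-5:8/10,45,124"]))
```

Only an argument that begins with the minus sign is affected; a value such as "60:-100" starts with a digit, so it needs no extra space.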
