Merge pull request #2 from sacdallago/output_4
Add output option 4 for i/o and B/b H/h
BernhoferM committed Aug 10, 2022
2 parents 20988f0 + ff4d90f commit 12af867
Showing 3 changed files with 23 additions and 3 deletions.
21 changes: 19 additions & 2 deletions README.md
@@ -129,11 +129,12 @@ If you run into "out of memory" issues, try reducing the batch size.

## Prediction output

-TMbed supports four different output formats:
+TMbed supports five different output formats:
- `0`: 3-line format with directed segments.
- `1`: 3-line format with undirected segments.
- `2`: Tabular format with directed segments.
- `3`: Tabular format with undirected segments.
- `4`: 3-line format with directed segments and explicit inside/outside prediction (a mix of format `0` and `1`).
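The "mix of format `0` and `1`" can be made concrete with a small sketch. It assumes the label alphabets given by the `pred_map` tables in this commit; the helper names `to_format_0` and `to_format_1` and the sample label string are hypothetical and not part of TMbed.

```python
# Label alphabets, as listed in the pred_map tables of this commit:
#   format 0: B b H h S . .   (directed segments, inside/outside hidden)
#   format 1: B B H H S i o   (undirected segments, inside/outside shown)
#   format 4: B b H h S i o   (directed segments, inside/outside shown)
F4_TO_F0 = str.maketrans({'i': '.', 'o': '.'})
F4_TO_F1 = str.maketrans({'b': 'B', 'h': 'H'})


def to_format_0(labels_f4: str) -> str:
    """Hide the inside/outside labels but keep segment direction."""
    return labels_f4.translate(F4_TO_F0)


def to_format_1(labels_f4: str) -> str:
    """Drop segment direction but keep the inside/outside labels."""
    return labels_f4.translate(F4_TO_F1)


labels = "SSiiBBBooobbbiiHHHoohhh"  # made-up label string, not a real prediction
print(to_format_0(labels))
print(to_format_1(labels))
```

Because format `4` keeps both kinds of information, either of the other two 3-line label strings can be derived from it, but not the other way around.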

Predicted residue classes are encoded by single letters.\
In 3-line format, every protein is represented by three lines: header, sequence, labels.\
@@ -168,7 +169,7 @@ In tabular format, every protein is represented by a table containing sequence,
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSiiiiiiiiiBBBBBBBBBBoooooooooooooooooBBBBBBBBBBBiiiiiBBBBBBBBBBoooooooooooooooooooooooooooooooBBBBBBBBBBiiiiiiiiiBBBBBBoooooooooooooooBBBBBBBBiiiiBBBBBBBBooooooooooooooooooooBBBBBBBBiiiiiiBBBBBBBBooooooooooooooooooooooooooBBBBBBBBiiiiiiiiiiBBBBBBBBBBooooooooooooooooooooooooooooBBBBBBBBBBiiiiiBBBBBBBBooooooooooooooooooooooooooooooooooooooooooooooBBBBBBBBBiiiiiBBBBBBBBooooooooooooooooooooooooooooBBBBBBBBBiiiiiiiiiiiiiiiiiiBBBBBBBBBBoooooooooooooooBBBBBBBBBi
```

-2. `--out-format=2` and `--out-format=3`
+3. `--out-format=2` and `--out-format=3`

- `AA`: Amino acid
- `PRD`: Predicted class label
@@ -195,6 +196,22 @@ In tabular format, every protein is represented by a table containing sequence,
...
```

4. `--out-format=4`

- `B`: Transmembrane beta strand (IN-->OUT orientation)
- `b`: Transmembrane beta strand (OUT-->IN orientation)
- `H`: Transmembrane alpha helix (IN-->OUT orientation)
- `h`: Transmembrane alpha helix (OUT-->IN orientation)
- `S`: Signal peptide
- `i`: Non-Transmembrane, inside
- `o`: Non-Transmembrane, outside

```
>7acg_A|P18895|ALGE_PSEAE
MNSSRSVNPRPSFAPRALSLAIALLLGAPAFAANSGEAPKNFGLDVKITGESENDRDLGTAPGGTLNDIGIDLRPWAFGQWGDWSAYFMGQAVAATDTIETDTLQSDTDDGNNSRNDGREPDKSYLAAREFWVDYAGLTAYPGEHLRFGRQRLREDSGQWQDTNIEALNWSFETTLLNAHAGVAQRFSEYRTDLDELAPEDKDRTHVFGDISTQWAPHHRIGVRIHHADDSGHLRRPGEEVDNLDKTYTGQLTWLGIEATGDAYNYRSSMPLNYWASATWLTGDRDNLTTTTVDDRRIATGKQSGDVNAFGVDLGLRWNIDEQWKAGVGYARGSGGGKDGEEQFQQTGLESNRSNFTGTRSRVHRFGEAFRGELSNLQAATLFGSWQLREDYDASLVYHKFWRVDDDSDIGTSGINAALQPGEKDIGQELDLVVTKYFKQGLLPASMSQYVDEPSALIRFRGGLFKPGDAYGPGTDSTMHRAFVDFIWRF
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSiiiiiiiiiBBBBBBBBBBooooooooooooooooobbbbbbbbbbbiiiiiBBBBBBBBBBooooooooooooooooooooooooooooooobbbbbbbbbbiiiiiiiiiBBBBBBooooooooooooooobbbbbbbbiiiiBBBBBBBBoooooooooooooooooooobbbbbbbbiiiiiiBBBBBBBBoooooooooooooooooooooooooobbbbbbbbiiiiiiiiiiBBBBBBBBBBoooooooooooooooooooooooooooobbbbbbbbbbiiiiiBBBBBBBBoooooooooooooooooooooooooooooooooooooooooooooobbbbbbbbbiiiiiBBBBBBBBoooooooooooooooooooooooooooobbbbbbbbbiiiiiiiiiiiiiiiiiiBBBBBBBBBBooooooooooooooobbbbbbbbbi
```
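A downstream script might read this output roughly as follows. This is a minimal sketch that assumes only the header/sequence/labels layout described above; `read_3_line` and `segments` are hypothetical helpers, not TMbed functions, and the sample record is made up.

```python
from itertools import groupby
from typing import Iterator, List, Tuple


def read_3_line(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, labels) for each complete 3-line record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 2, 3):
        header, sequence, labels = lines[i:i + 3]
        if len(sequence) != len(labels):
            raise ValueError(f"length mismatch in record {header!r}")
        yield header.lstrip('>'), sequence, labels


def segments(labels: str) -> List[Tuple[str, int, int]]:
    """Collapse a label string into (label, start, end) runs, 1-based, inclusive."""
    result = []
    position = 1
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        result.append((label, position, position + length - 1))
        position += length
    return result


# Made-up record for illustration only.
record = ">demo\nMKTAYIAKQRAAAA\nSSiiBBBooobbbi\n"
for header, sequence, labels in read_3_line(record):
    for label, start, end in segments(labels):
        print(header, label, start, end)
```

With format `4`, the case of `B`/`b` and `H`/`h` gives each segment's orientation, and the neighbouring `i`/`o` runs give the side of the membrane directly.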


## Precomputed predictions

4 changes: 3 additions & 1 deletion tmbed/tmbed.py
@@ -232,8 +232,10 @@ def predict(fasta_file: Path = ARGS.fasta,
         pred_map = {0: 'B', 1: 'b', 2: 'H', 3: 'h', 4: 'S', 5: '.', 6: '.'}
     elif out_format in {OutFmt.F1, OutFmt.F3}:
         pred_map = {0: 'B', 1: 'B', 2: 'H', 3: 'H', 4: 'S', 5: 'i', 6: 'o'}
     elif out_format in {OutFmt.F4}:
         pred_map = {0: 'B', 1: 'b', 2: 'H', 3: 'h', 4: 'S', 5: 'i', 6: 'o'}

-    if out_format in {OutFmt.F0, OutFmt.F1}:
+    if out_format in {OutFmt.F0, OutFmt.F1, OutFmt.F4}:
         write_3_line(output_file, proteins, predictions, pred_map)
     elif out_format in {OutFmt.F2, OutFmt.F3}:
         write_tabular(output_file, proteins, predictions, pred_map)
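To illustrate what the new `pred_map` changes, here is a minimal sketch of mapping per-residue class indices to label letters. It assumes predictions are available as plain integer sequences with classes `0` to `6`; the `decode` helper is hypothetical and is not how `write_3_line` is necessarily implemented.

```python
from typing import Dict, Sequence

# Maps copied from the hunk above.
PRED_MAP_F0: Dict[int, str] = {0: 'B', 1: 'b', 2: 'H', 3: 'h', 4: 'S', 5: '.', 6: '.'}
PRED_MAP_F4: Dict[int, str] = {0: 'B', 1: 'b', 2: 'H', 3: 'h', 4: 'S', 5: 'i', 6: 'o'}


def decode(class_indices: Sequence[int], pred_map: Dict[int, str]) -> str:
    """Translate class indices into a label string (hypothetical helper)."""
    return ''.join(pred_map[index] for index in class_indices)


indices = [4, 4, 5, 0, 0, 6, 1, 1, 5]  # made-up class indices
print(decode(indices, PRED_MAP_F0))
print(decode(indices, PRED_MAP_F4))
```

The two maps differ only in classes `5` and `6`: format `0` collapses both to `.`, while format `4` keeps them apart as `i` and `o`.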
1 change: 1 addition & 0 deletions tmbed/utils.py
@@ -33,6 +33,7 @@ class OutFmt(str, Enum):
    F1 = '1'
    F2 = '2'
    F3 = '3'
    F4 = '4'


@dataclass
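Since `OutFmt` mixes in `str`, a command-line value such as `'4'` converts straight to the new member. The sketch below uses a stand-in copy of the enum; the `F0 = '0'` member is assumed from its use in `tmbed.py` because it is not visible in this hunk, and `uses_3_line` is a hypothetical helper.

```python
from enum import Enum


class OutFmt(str, Enum):
    """Stand-in for tmbed.utils.OutFmt, for illustration only."""
    F0 = '0'  # assumed value; this member is outside the hunk shown above
    F1 = '1'
    F2 = '2'
    F3 = '3'
    F4 = '4'


# Formats routed to write_3_line after this commit.
THREE_LINE = {OutFmt.F0, OutFmt.F1, OutFmt.F4}


def uses_3_line(value: str) -> bool:
    """Return True if the given --out-format value selects 3-line output."""
    return OutFmt(value) in THREE_LINE


print(OutFmt('4') is OutFmt.F4)
print(uses_3_line('4'), uses_3_line('2'))
```

An unknown value such as `'5'` makes `OutFmt('5')` raise `ValueError`, so adding `F4` to the enum is what allows `4` to be accepted as a format value at all.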
