### Env setup

In [None]:
conda create -n Bio

In [None]:
mamba install blast

In [None]:
conda install prodigal diamond samtools hmmer

In [None]:
mamba install prodigal prokka diamond biopython

In [None]:
export result=/mnt/d/Lab/Bile-Acid/result

In [1]:
export data=/mnt/d/Lab/Bile-Acid

### Data

https://pubseed.theseed.org//SubsysEditor.cgi?page=ShowSpreadsheet&subsystem=Bile_acids_transformations_HGM

### transeq + Diamond 

### prodigal + HMMer/Diamond 

#### prodigal

In [None]:
prodigal -i my.genome.fna -o gene.coords.gbk -a protein.translations.faa

```shell
Input/Output Parameters

  -i, --input_file:     Specify input file (default stdin).
  -o, --output_file:    Specify output file (default stdout).
  -a, --protein_file:   Specify protein translations file.
  -d, --mrna_file:      Specify nucleotide sequences file.
  -s, --start_file:     Specify complete starts file.
  -w, --summ_file:      Specify summary statistics file.
  -f, --output_format:  Specify output format.
                          gbk:  Genbank-like format (Default)
                          gff:  GFF format
                          sqn:  Sequin feature table format
                          sco:  Simple coordinate output
  -q, --quiet:          Run quietly (suppress logging output).
```

```shell
prodigal -i /mnt/d/Lab/Bile-Acid/rawdata/GCF_000154445.1_ASM15444v1_genomic.fna -o /mnt/d/Lab/Bile-Acid/result/prodigal/GCF_000154445.1_ASM15444v1_genomic.gff -a /mnt/d/Lab/Bile-Acid/result/prodigal/GCF_000154445.1_ASM15444v1_genomic.faa -f gff
```
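To run the same gene-calling step on several genomes, the command above can be assembled programmatically. The sketch below is a minimal helper, not part of the original workflow: the function name `prodigal_command` and its parameters are hypothetical, and it only reuses the `-i`, `-o`, `-a`, and `-f` options shown above.

```python
from pathlib import PurePosixPath


def prodigal_command(genome_fna, out_dir, fmt="gff"):
    """Build the prodigal argument list for one genome (hypothetical helper).

    Output files reuse the genome file name, with the .fna suffix replaced
    by the coordinate format (e.g. .gff) and by .faa for the translations.
    """
    genome = PurePosixPath(genome_fna)
    out = PurePosixPath(out_dir)
    stem = genome.stem  # file name without the trailing .fna
    return [
        "prodigal",
        "-i", str(genome),
        "-o", str(out / f"{stem}.{fmt}"),
        "-a", str(out / f"{stem}.faa"),
        "-f", fmt,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once per genome file, assuming `prodigal` is on the `PATH` of the active environment.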

#### HMMer

In [None]:
# hmmsearch output directory: /mnt/d/Lab/Bile-Acid/result/hmmer

In [None]:
# profile HMM used for the search: /mnt/d/Lab/Bile-Acid/phmm/CBAH.hmm

```shell
hmmsearch --cpu 4 --tblout /mnt/d/Lab/Bile-Acid/result/hmmer/GCF_000154445.1_ASM15444v1_genomic.tbl /mnt/d/Lab/Bile-Acid/phmm/CBAH.hmm /mnt/d/Lab/Bile-Acid/result/prodigal/GCF_000154445.1_ASM15444v1_genomic.faa > /dev/null
```

HMM output table

Use Biopython to parse the `.tbl` output file (`Bio.SearchIO` reads it as the `hmmer3-tab` format).
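If Biopython is not available, the `--tblout` table can also be read with the standard library. This is a minimal sketch of that alternative, not the Biopython route named above. It assumes the usual `hmmsearch --tblout` layout: comment lines start with `#`, followed by 18 whitespace-separated columns and a free-text target description. The names `TblHit` and `parse_tblout` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class TblHit(NamedTuple):
    target: str       # target sequence name (column 1)
    query: str        # query profile name (column 3)
    evalue: float     # full-sequence E-value (column 5)
    score: float      # full-sequence bit score (column 6)
    description: str  # free-text description of the target


def parse_tblout(lines: Iterable[str]) -> Iterator[TblHit]:
    """Yield one TblHit per data line of an hmmsearch --tblout file."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # 18 fixed columns; everything after them is the description
        fields = line.split(None, 18)
        yield TblHit(
            target=fields[0],
            query=fields[2],
            evalue=float(fields[4]),
            score=float(fields[5]),
            description=fields[18].strip() if len(fields) > 18 else "",
        )
```

An open file handle can be passed directly, for example `with open(tbl_path) as fh: hits = list(parse_tblout(fh))`.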

#### Diamond

```shell
diamond blastp -q $result/prodigal/GCF_000154445.1_ASM15444v1_genomic.faa -o $result/diamond/test_out --evalue 1.0 --max-target-seqs 1 --outfmt 6 --db $data/db/BSH_anno.fasta -p 8
```

Parse the BLAST-like tabular output

https://biopython.org/docs/1.75/api/Bio.SearchIO.BlastIO.html

In [2]:
from Bio import SearchIO
diamond_result = list(SearchIO.parse("/mnt/d/Lab/Bile-Acid/result/diamond/test_out", "blast-tab"))  # parse, not read: the file holds many queries
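The `--outfmt 6` table can also be filtered without Biopython. The sketch below assumes the default 12-column layout (`qseqid` through `bitscore`) produced by the Diamond command above. The function name `read_blast6` and the `max_evalue` default of `1e-5` are illustrative assumptions, not values taken from this notebook.

```python
import csv

# Default column order of BLAST/Diamond tabular output (outfmt 6)
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast6(lines, max_evalue=1e-5):
    """Yield one dict per alignment whose E-value is <= max_evalue.

    `lines` is any iterable of tab-separated text lines, such as an open file.
    The cutoff is a hypothetical default; choose one that suits the analysis.
    """
    reader = csv.DictReader(lines, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        # Convert the numeric columns used for filtering and ranking
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        if row["evalue"] <= max_evalue:
            yield row
```

Because the Diamond run above uses `--evalue 1.0`, a stricter cutoff at parse time is one way to discard weak hits afterwards.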

### Blastp

In [2]:
conda activate Bio

In [3]:
blastp -help

USAGE
  blastp [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
    [-negative_taxidlist filename] [-ipglist filename]
    [-negative_ipglist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
    [-soft_masking soft_masking] [-matrix matrix_name]
    [-threshold float_value] [-c

```shell
blastp -out /mnt/d/Lab/Bile-Acid/result/blastp/test.tsv -outfmt 6 -query /mnt/d/Lab/Bile-Acid/result/prodigal/GCF_000154445.1_ASM15444v1_genomic.faa -db /mnt/d/Lab/Bile-Acid/db/BSH_anno.fasta -evalue 0.001 -max_target_seqs 1
```

```shell
blastp -out /mnt/d/Lab/Bile-Acid/result/blastp/test.xml -outfmt 5 -query /mnt/d/Lab/Bile-Acid/result/prodigal/GCF_000154445.1_ASM15444v1_genomic.faa -db /mnt/d/Lab/Bile-Acid/db/BSH_anno.fasta -evalue 0.001
```
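The XML report (`-outfmt 5`) can be summarized with the standard library's `xml.etree.ElementTree`. This is a minimal sketch that assumes the standard BLAST XML element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_def`, `Hsp_evalue`). The function name `first_hits_from_blast_xml` is hypothetical. It records the first-listed hit and first HSP for each query and skips queries with no hits.

```python
import xml.etree.ElementTree as ET


def first_hits_from_blast_xml(xml_text):
    """Return {query_def: (hit_def, evalue)} for the first hit of each query."""
    first_hits = {}
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hit = iteration.find("Iteration_hits/Hit")
        if hit is None:
            continue  # this query produced no hits
        # E-value of the first HSP reported for that hit
        evalue = float(hit.findtext("Hit_hsps/Hsp/Hsp_evalue"))
        first_hits[query] = (hit.findtext("Hit_def"), evalue)
    return first_hits
```

The report text can be loaded with `pathlib.Path(xml_path).read_text()` before calling the function. For very large reports a streaming parser such as `ET.iterparse` would use less memory.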

In [None]:
blastp -db /mnt/d/Lab/Bile-Acid/db/BSH_anno.fasta -query $result/prodigal/GCF_000154445.1_ASM15444v1_genomic.faa -out $result/blastp

### Custom database

`diamond makedb --in aa.fasta -d aa.fasta`

`makeblastdb -in nr.fasta -dbtype prot`

In [None]:
diamond makedb --in BSH_anno.fasta -d BSH_anno.fasta

In [None]:
diamond makedb --in HSDH.fasta -d HSDH

In [None]:
makeblastdb -in BSH_anno.fasta -dbtype prot

### Test data

Clostridium hiranonis DSM 13275 (500633.7)

Clostridium bartlettii DSM 16795 (445973.7)

Ruminococcus gnavus ATCC 29149 (411470.6)

Methanosphaera stadtmanae DSM 3091 (339860.6)

Erysipelotrichaceae bacterium 3_1_53 (658659.3)