# Pynteny: example

In [None]:
from pathlib import Path

from pynteny.src.utils import CommandArgs
from pynteny.src.subcommands import synteny_search, download_hmms, build_database

## Download PGAP profile HMM database

First, let's download [PGAP](https://academic.oup.com/nar/article/49/D1/D1020/6018440)'s profile HMM database from the NCBI webpage. To this end, we will use pynteny's `download` subcommand, which unzips the files and stores them in the specified output directory. The metadata file is then parsed and filtered to remove HMM entries that are not available in the downloaded database, which avoids possible downstream errors.

In [22]:
%%bash

pynteny download --outdir data/hmms --unpack
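
The metadata filtering step described above can be illustrated with a minimal sketch. This is not pynteny's own implementation: the function name `filter_metadata`, the `accession` column key, the `.HMM` file suffix and the accession `TIGR99999.1` are all assumptions made up for illustration.

```python
import tempfile
from pathlib import Path


def filter_metadata(rows, hmm_dir, name_key="accession", suffix=".HMM"):
    """Return only the metadata rows whose HMM file exists in hmm_dir.

    name_key and suffix are assumptions made for this sketch.
    """
    hmm_files = Path(hmm_dir).glob(f"*{suffix}")
    # Strip the suffix so that "TIGR00171.1.HMM" becomes "TIGR00171.1".
    available = {path.name[: -len(suffix)] for path in hmm_files}
    return [row for row in rows if row[name_key] in available]


# Toy demonstration: a temporary directory holding a single HMM file.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "TIGR00171.1.HMM").touch()
demo_rows = [
    {"accession": "TIGR00171.1", "label": "present in the database"},
    {"accession": "TIGR99999.1", "label": "hypothetical, missing entry"},
]
kept = filter_metadata(demo_rows, demo_dir)
print([row["accession"] for row in kept])
```

Only the entry backed by a file survives, so later steps never look up an HMM that was not downloaded.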

## Build peptide sequence database

For this example we are going to use the [MAR reference](https://mmp2.sfb.uit.no/marref/) database (currently version _v7_), a collection of 970 fully sequenced prokaryotic genomes from the marine environment. Specifically, we will use the assembly data file containing the assembled nucleotide sequences.

Our final goal is to build a peptide sequence database in a single FASTA file where each record corresponds to an inferred ORF and displays its positional information (i.e. the ORF number within the parent contig as well as the DNA strand). To this end, we will run pynteny's `build` subcommand, which takes care of:

- Predicting and translating ORFs with prodigal
- Labelling each ORF with a unique identifier and adding positional metadata (with respect to the parent contig)

To follow this example, you should have previously downloaded the assembly data file, `assembly.fa`, from [MAR ref](https://mmp2.sfb.uit.no/marref/).

In [18]:
%%bash

pynteny build \
    --data data/assembly.fa \
    --outfile data/labelled_marref.fasta
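
To make the labelling step more concrete, here is a minimal sketch of how positional metadata could be extracted from a Prodigal protein FASTA header (`>contig_N # start # end # strand # ...`). The label layout produced by `label_orf` is hypothetical and is not necessarily the one pynteny writes; both function names are made up for this example.

```python
def parse_prodigal_header(header):
    """Split a Prodigal FASTA header into its positional fields."""
    fields = header.lstrip(">").split(" # ")
    orf_id, start, end, strand = fields[0], int(fields[1]), int(fields[2]), int(fields[3])
    # Prodigal names each ORF "<contig>_<gene number>".
    contig, _, gene_number = orf_id.rpartition("_")
    return {
        "contig": contig,
        "gene_number": int(gene_number),
        "start": start,
        "end": end,
        "strand": "pos" if strand == 1 else "neg",
    }


def label_orf(header):
    """Build a label carrying the positional metadata (hypothetical layout)."""
    info = parse_prodigal_header(header)
    return "{contig}_{gene_number}_{start}_{end}_{strand}".format(**info)


example = ">contig_7_3 # 120 # 980 # -1 # ID=7_3;partial=00"
print(label_orf(example))  # contig_7_3_120_980_neg
```

Keeping the gene number and strand in the record label is what allows a later search to reason about gene order and orientation within a contig.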

## Search synteny structure in MAR ref

Finally, we are going to use pynteny's `search` subcommand to search for a specific syntenic block within the previously built peptide database. Specifically, we are interested in

In [21]:
%%bash

pynteny search \
    --synteny_struc "" \
    --data "" \
    --outdir "" \
    --hmm_dir "" \
    --hmm_meta "" \
    --gene_ids

In [None]:
search_state = CommandArgs(
    data="data/MG1655.fasta",
    synteny_struc=None,
    hmm_dir="data/hmms",
    hmm_meta="data/hmms/hmm_PGAP_no_missing.tsv",
    outdir=Path("results"),
    prefix="",
    hmmsearch_args=None,
    gene_ids=False,
    logfile=True,
    processes=None,
    unordered=False,
)

synhits = synteny_search(search_state).getSyntenyHits()

In [None]:
search_state.synteny_struc = "<TIGR00171.1 0 <TIGR00170.1 1 <TIGR00973.1"

synhits = synteny_search(search_state).getSyntenyHits()
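
The synteny structure string used above can be broken down with a short sketch. It assumes that the string alternates HMM names with integers giving the maximum number of genes allowed between consecutive HMMs, and that a leading `<` or `>` marks the strand (`<` is taken here to mean the negative strand). These conventions and the function name `parse_synteny_structure` are assumptions for illustration, not pynteny's parser.

```python
def parse_synteny_structure(structure):
    """Split a synteny structure string into HMM names, strands and gaps."""
    tokens = structure.split()
    hmm_tokens = tokens[0::2]  # HMM names, optionally prefixed by < or >
    gap_tokens = tokens[1::2]  # integers between consecutive HMM names
    hmms, strands = [], []
    for token in hmm_tokens:
        if token[0] in "<>":
            # Assumed convention: "<" negative strand, ">" positive strand.
            strands.append("neg" if token[0] == "<" else "pos")
            hmms.append(token[1:])
        else:
            strands.append(None)  # no strand constraint given
            hmms.append(token)
    max_gaps = [int(gap) for gap in gap_tokens]
    return hmms, strands, max_gaps


hmms, strands, max_gaps = parse_synteny_structure(
    "<TIGR00171.1 0 <TIGR00170.1 1 <TIGR00973.1"
)
print(hmms)      # ['TIGR00171.1', 'TIGR00170.1', 'TIGR00973.1']
print(strands)   # ['neg', 'neg', 'neg']
print(max_gaps)  # [0, 1]
```

Under these assumptions, the query asks for three HMM hits on the same strand, with no gene between the first two and at most one gene between the last two.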