PanPhlAn download pangenome 3_0

leonarDubois edited this page Jan 20, 2021 · 6 revisions

Pangenomes are built for species for which at least two reference genomes are available. These files are available on this Dropbox. They can also be downloaded with the panphlan_download_pangenome.py script.

Example:

panphlan_download_pangenome.py -i Eubacterium_rectale

Input

  • -i INPUT_NAME: the name of the species to download (for example, Eubacterium_rectale)

Output

The tar.bz2 archive is downloaded, if available, and extracted to the location given by the --output argument. If no location is provided, the pangenome folder is created in the current working directory.
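
For readers who already have a pangenome archive on disk, the extraction step described above can be approximated with the Python standard library. This is only a sketch, not the code used by panphlan_download_pangenome.py: the function name extract_pangenome and its arguments are hypothetical, and it assumes the archive is a regular tar.bz2 file.

```python
import tarfile
from pathlib import Path


def extract_pangenome(archive_path, output_dir="."):
    """Extract a .tar.bz2 archive into output_dir.

    Hypothetical helper for illustration only. Returns the sorted list of
    top-level names found in the archive (typically the pangenome folder).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    top_level = set()
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive.getmembers():
            parts = Path(member.name).parts
            if parts:
                top_level.add(parts[0])
        # Use the safer "data" extraction filter when this Python provides it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(path=output_dir, filter="data")
        else:
            archive.extractall(path=output_dir)
    return sorted(top_level)
```

Calling extract_pangenome("Eubacterium_rectale.tar.bz2") with no second argument extracts into the current working directory, mirroring the default behaviour described above. The archive file name here is an assumption.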

Output content

The retrieved folder contains the pangenome contigs in a multi-FASTA .fna file, the bowtie2 indexes, an annotation .tsv file mapping gene families (UniRef) to GO, KO, KEGG, Pfam, eggNOG and other annotations, and a pangenome .tsv file containing all the information needed to map genes to sequences.
The columns of this last file are, in order: UniRef90 cluster ID, gene ID, genome ID, contig ID, start position, stop position.
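
The column order of the pangenome .tsv file can be made concrete with a small reader. The sketch below assumes the file is tab-separated with exactly the six columns listed above and no header line; GeneRecord, read_pangenome_tsv and genes_per_family are hypothetical names, not part of PanPhlAn.

```python
import csv
from collections import defaultdict, namedtuple

# One row of the pangenome .tsv file, in the column order given above.
GeneRecord = namedtuple(
    "GeneRecord",
    ["uniref90", "gene_id", "genome_id", "contig_id", "start", "stop"],
)


def read_pangenome_tsv(handle):
    """Parse an open pangenome .tsv file into a list of GeneRecord tuples.

    Assumes tab-separated values and no header line; short rows are skipped.
    """
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 6:
            continue
        uniref90, gene_id, genome_id, contig_id, start, stop = row[:6]
        records.append(
            GeneRecord(uniref90, gene_id, genome_id, contig_id, int(start), int(stop))
        )
    return records


def genes_per_family(records):
    """Group gene IDs by their UniRef90 cluster ID."""
    families = defaultdict(list)
    for record in records:
        families[record.uniref90].append(record.gene_id)
    return dict(families)
```

A typical use would be to open the pangenome .tsv file from the downloaded folder, pass the file handle to read_pangenome_tsv, and then call genes_per_family on the result to see which genes belong to each gene family.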

Help -h

usage: panphlan_download_pangenome.py [-h] -i INPUT_NAME -o OUTPUT [-v] [--retry RETRY] [--wait WAIT]

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Show progress information
  --retry RETRY         Number of retry in pangenome download. Default is 5
  --wait WAIT           Number of second spend waiting between download retries. Default 60
  -o OUTPUT, --output OUTPUT
                        output location

required arguments:
  -i INPUT_NAME, --input_name INPUT_NAME
                        Name of species to download
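
When several species need to be downloaded, the script can be driven from Python. The sketch below only assembles the command line from the flags shown in the help text above (-i, -o, -v, --retry, --wait); the helper names build_download_command and download_pangenome are hypothetical. It treats -o as optional, following the Output section, although the usage line prints it without brackets.

```python
import subprocess


def build_download_command(species, output=None, verbose=False, retry=None, wait=None):
    """Build the argument list for panphlan_download_pangenome.py.

    Hypothetical helper: only flags listed in the script's help text are used,
    and options left as None are omitted so the script's defaults apply.
    """
    command = ["panphlan_download_pangenome.py", "-i", species]
    if output is not None:
        command += ["-o", str(output)]
    if verbose:
        command.append("-v")
    if retry is not None:
        command += ["--retry", str(retry)]
    if wait is not None:
        command += ["--wait", str(wait)]
    return command


def download_pangenome(species, **options):
    """Run the download script; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_download_command(species, **options), check=True)
```

For example, download_pangenome("Eubacterium_rectale", output="pangenomes", retry=3, wait=30) would run the script with three retries and a 30-second wait between them, assuming the script is on the PATH.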