Lep busco painter

Paints chromosomes of lepidopteran genomes with BUSCOs.

Installation

conda create -n buscopaint python=3.9
conda activate buscopaint
conda install -c bioconda samtools
conda install -c conda-forge r-base
conda install -c r r-tidyverse
conda install -c bioconda r-optparse

Running the scripts

1. Assign each BUSCO to a chromosome

buscopainter.py takes the full_table.tsv output files generated by BUSCO for a "reference" genome and a "query" genome, along with an optional prefix (specified with -p, default "buscopainter"). It assigns each BUSCO to a chromosome and states whether it belongs to the dominant group of BUSCOs on that chromosome ('self') or not.

buscopainter.py -r test_data/ilAglIoxx1_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv
buscopainter.py -r test_data/Merian_elements_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv

It will write three TSV files:

  • [PREFIX]_complete_summary.tsv which contains a summary of the chromosomal assignments.
  • [PREFIX]_complete_location.tsv which contains the location and status of all shared complete BUSCOs.
  • [PREFIX]_duplicated_location.tsv which contains the location and status of all duplicated BUSCOs.
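
The assignment step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in buscopainter.py: the function names, the tuple layout and the "other" label for non-dominant BUSCOs are assumptions made for the example. It relies only on the standard BUSCO full_table.tsv layout (comment lines starting with #, then BUSCO id, status, sequence and gene start as the first four columns).

```python
import csv
import io
from collections import Counter, defaultdict


def parse_full_table(handle):
    """Return {busco_id: (sequence, gene_start)} for BUSCOs marked 'Complete'."""
    complete = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#' header lines written by BUSCO.
        if not row or row[0].startswith("#"):
            continue
        # 'Missing' rows only have two columns, so check the length first.
        if len(row) >= 4 and row[1] == "Complete":
            complete[row[0]] = (row[2], int(row[3]))
    return complete


def assign_buscos(reference, query):
    """Label each shared complete BUSCO as 'self' or 'other' (hypothetical label)."""
    shared = sorted(set(reference) & set(query))
    # For every query chromosome, count which reference chromosomes its BUSCOs come from.
    counts = defaultdict(Counter)
    for busco in shared:
        counts[query[busco][0]][reference[busco][0]] += 1
    # The dominant reference chromosome is the most frequent one per query chromosome.
    dominant = {q_chr: c.most_common(1)[0][0] for q_chr, c in counts.items()}
    rows = []
    for busco in shared:
        q_chr, q_pos = query[busco]
        r_chr = reference[busco][0]
        status = "self" if r_chr == dominant[q_chr] else "other"
        rows.append((busco, q_chr, q_pos, r_chr, status))
    return rows


# Tiny made-up tables standing in for two full_table.tsv files.
REFERENCE_TSV = (
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength\n"
    "b1\tComplete\tR1\t100\t200\t+\t500\t100\n"
    "b2\tComplete\tR1\t300\t400\t+\t500\t100\n"
    "b3\tComplete\tR2\t100\t200\t-\t500\t100\n"
    "b4\tMissing\n"
)
QUERY_TSV = (
    "b1\tComplete\tQ1\t10\t20\t+\t500\t100\n"
    "b2\tComplete\tQ1\t30\t40\t+\t500\t100\n"
    "b3\tComplete\tQ1\t50\t60\t+\t500\t100\n"
    "b4\tComplete\tQ2\t10\t20\t+\t500\t100\n"
)

reference = parse_full_table(io.StringIO(REFERENCE_TSV))
query = parse_full_table(io.StringIO(QUERY_TSV))
for row in assign_buscos(reference, query):
    print("\t".join(map(str, row)))
```

In this toy input, two of the three shared BUSCOs on Q1 come from R1, so R1 is the dominant reference chromosome and the BUSCO from R2 is the one flagged as having moved.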

2. Plotting

The [PREFIX]_complete_location.tsv and [PREFIX]_duplicated_location.tsv files can be plotted using plot_buscopainter.R. This plots the chromosomes of the query genome as rectangles and paints the positions of complete/duplicated BUSCOs as lines, coloured by their assigned chromosome in the reference genome. The script has one required argument: the location.tsv file (-f). Optional arguments are:

  • Plot title (-p)
  • Index file (-i) - enables chromosomes to be drawn to size (rather than based on the position of the last ortholog)
  • Merian element mode (-m True) - paint chromosomes with Merian elements rather than query genome orthologs
  • Only plot differences mode (-d True) - only paint orthologs which do not belong to the dominant chromosome based on the reference
  • Custom threshold of orthologs (-n) - minimum number of orthologs on a given query chromosome for it to be displayed (this helps to filter out unplaced scaffolds). Default is >=3 orthologs.

plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1'
plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1' -i ilAglIoxx1.fai -m True -d True
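
To make the effect of the -n and -d options concrete, here is a rough Python sketch of the two filters. It is not taken from plot_buscopainter.R: the function name, the row layout and the order in which the filters are applied (threshold first, then differences) are assumptions for illustration only.

```python
from collections import Counter


def filter_for_plot(rows, minimum=3, differences_only=False):
    """Filter (busco_id, query_chr, position, status) tuples before plotting.

    minimum mirrors -n: drop query chromosomes with fewer orthologs than this.
    differences_only mirrors -d: keep only orthologs whose status is not 'self'.
    """
    rows = list(rows)
    # Count orthologs per query chromosome on the unfiltered input.
    per_chromosome = Counter(row[1] for row in rows)
    kept = [row for row in rows if per_chromosome[row[1]] >= minimum]
    if differences_only:
        kept = [row for row in kept if row[3] != "self"]
    return kept


# Made-up rows: three orthologs on chr1 and a single one on a small scaffold.
example_rows = [
    ("b1", "chr1", 10, "self"),
    ("b2", "chr1", 20, "self"),
    ("b3", "chr1", 30, "other"),
    ("b4", "scaffold_9", 5, "self"),
]
print(filter_for_plot(example_rows))
print(filter_for_plot(example_rows, differences_only=True))
```

With the default threshold the single-ortholog scaffold is dropped, which is the behaviour the -n option is described as providing for unplaced scaffolds.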

Full usage:

Options:
	-f CHARACTER, --file=CHARACTER
		location.tsv file

	-p CHARACTER, --prefix=CHARACTER
		prefix for plot title

	-i CHARACTER, --index=CHARACTER
		genome index file

	-m CHARACTER, --merians=CHARACTER
		use this flag if you are comparing a genome to Merian elements

	-d CHARACTER, --differences=CHARACTER
		only colour orthologs that have moved from the dominant chromosome

	-n NUMBER, --minimum=NUMBER
		minimum number of orthologs 

	-h, --help
		Show this help message and exit

NB: the index file can be generated with samtools faidx <genome.fasta>, which writes <genome.fasta>.fai alongside the FASTA file.
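
The difference between drawing chromosomes from an index file and from ortholog positions can be illustrated as follows. This is a sketch under stated assumptions, not the logic of plot_buscopainter.R: both function names are hypothetical. It uses only the documented .fai layout, in which the first two tab-separated columns are the sequence name and its length.

```python
import io


def read_fai_lengths(handle):
    """Map sequence name to length using the first two columns of a .fai index."""
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        lengths[fields[0]] = int(fields[1])
    return lengths


def lengths_from_orthologs(positions):
    """Fallback without an index: use the largest ortholog position per chromosome."""
    lengths = {}
    for chromosome, position in positions:
        lengths[chromosome] = max(position, lengths.get(chromosome, 0))
    return lengths


# Made-up index content: name, length, offset, linebases, linewidth.
FAI_TEXT = "chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n"
ORTHOLOG_POSITIONS = [("chr1", 900), ("chr1", 400), ("chr2", 450)]

print(read_fai_lengths(io.StringIO(FAI_TEXT)))
print(lengths_from_orthologs(ORTHOLOG_POSITIONS))
```

Without an index, each chromosome appears to end at its last ortholog, so it is drawn shorter than its true length; the index supplies the real sequence lengths.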

Example output

Comparison of two genomes - painting all shared single-copy orthologs.

Comparison of one genome to Merian elements - painting only single-copy orthologs that have moved relative to Merian elements.