charlottewright/lep_fusion_fission_finder
Lep fusion fission finder

A tool to assign ancestral linkage units and/or identify fusion/fission events in Lepidopteran chromosomes based on a set of reference BUSCO genes as markers.

Running the scripts

1.) Find fusions/fissions

lep_fusion_fission_finder.py takes the BUSCO full_table.tsv output files for two species (a query and a reference), along with an optional prefix for the output files (specified with -f, default "fsf"). The default window size for Lepidoptera is 17 BUSCOs, but this can be changed with the -w flag, e.g.:

python3 lep_fusion_fission_finder.py -q test_data/Aglais_io_full_table.tsv -r test_data/Melitaea_cinxia_full_table.tsv -f Aglais -w 17

This will write three files:

  • Aglais_chromosome_assignments.tsv: a summary of the assignments for each scaffold in the query genome. For fused or split chromosomes, their putative origins are listed.

  • Aglais_warnings.tsv: a list of warnings. It flags any contig with fewer BUSCOs than the specified threshold (default: 17) and records the total number of linkage units found if it differs from the expected 31.

  • Aglais_fusion_positions.tsv: for each chromosome that is inferred to be a product of fusion, the start and end position of each ancestral block is reported.
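The kind of check behind the warnings file can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function names are hypothetical, and counting only BUSCOs with status "Complete" is an assumption. It relies on the standard BUSCO full_table.tsv layout (tab-separated, "#" comment lines, with the BUSCO id, status and sequence name in the first three columns).

```python
import csv
import io
from collections import Counter


def count_buscos_per_sequence(handle):
    """Count Complete BUSCOs on each sequence of a BUSCO full_table.tsv."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#' header/comment lines.
        if not row or row[0].startswith("#"):
            continue
        # Missing BUSCOs have only two columns (id, status), so check the length.
        # Assumption: only "Complete" BUSCOs are used as markers.
        if len(row) >= 3 and row[1] == "Complete":
            counts[row[2]] += 1
    return counts


def sequences_below_threshold(counts, threshold=17):
    """Return the sequences that carry fewer BUSCOs than the threshold."""
    return sorted(seq for seq, n in counts.items() if n < threshold)


# Tiny made-up table; real tables contain thousands of rows.
example = io.StringIO(
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength\n"
    "1at7088\tComplete\tchr1\t100\t900\t+\t500.0\t250\n"
    "2at7088\tComplete\tchr1\t2000\t2900\t-\t450.0\t240\n"
    "3at7088\tMissing\n"
    "4at7088\tComplete\tscaffold_9\t50\t700\t+\t300.0\t200\n"
)
counts = count_buscos_per_sequence(example)
flagged = sequences_below_threshold(counts, threshold=2)
print(dict(counts))
print(flagged)
```

Running a check like this on the query table before the main script can show in advance which small scaffolds are likely to end up in the warnings file.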

Full usage:

usage: lep_fusion_fission_finder.py [-h] -r REFERENCE_TABLE -q QUERY_TABLE [-f PREFIX] [-w WINDOW_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  -r REFERENCE_TABLE, --reference_table REFERENCE_TABLE
                        full_table.tsv file for reference species
  -q QUERY_TABLE, --query_table QUERY_TABLE
                        full_table.tsv for query species
  -f PREFIX, --prefix PREFIX
                        Prefix for all output files
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        Number of BUSCOs to be used per window (must be odd)
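One way to picture the role of an odd window size is a centred sliding window: an odd window has a single central BUSCO with the same number of neighbours on each side. The sketch below is an assumption about how such a window could be used (a majority vote over the reference assignments of neighbouring BUSCOs), not a description of the script's actual algorithm. The function name and the unit labels "A" and "B" are hypothetical.

```python
from collections import Counter


def smooth_assignments(labels, window_size=17):
    """Replace each label by the most common label in a centred window.

    labels: reference unit of each BUSCO, in order along a query chromosome.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2  # number of BUSCOs on each side of the centre
    smoothed = []
    for i in range(len(labels)):
        # The window is truncated at the chromosome ends.
        window = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


# A single stray "B" inside a run of "A" is smoothed away,
# while the real switch from "A" to "B" is kept.
labels = ["A", "A", "B", "A", "A", "B", "B", "B", "B"]
smoothed = smooth_assignments(labels, window_size=3)
print(smoothed)
```

In this picture, a larger window tolerates more isolated, misplaced BUSCOs but needs longer blocks before a switch between units is recognised.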

2.) Place fusions/fissions in a phylogenetic context

map_fusion_fissions.py takes the output of lep_fusion_fission_finder.py and infers where fusion/fission events occurred in a given tree.

./map_fusion_fissions.py -i chr_assignments/ -tree spp.treefile -t 1 -o output/ -f test_run

This will write a file called mapped_fusions_fissions.tsv, which contains a list of each fusion/fission event.

Full usage:

usage: map_fusion_fissions_client.py [-h] -i INPUT_DATA [-tree TREE] -o OUTPUT [-f PREFIX] [-t THRESHOLD] [-l LABEL_STATUS]
optional arguments:
 -h, --help            show this help message and exit
 -i INPUT_DATA, --input_data INPUT_DATA
                       path to lep_fusion_fission_finder output
 -tree TREE, --tree TREE
                       Phylogenetic tree
 -o OUTPUT, --output OUTPUT
                       output location relative to working directory
 -f PREFIX, --prefix PREFIX
                       Prefix for all output files
 -t THRESHOLD, --threshold THRESHOLD
                       Threshold for rearrangement to be shared between tips
 -l LABEL_STATUS, --label_status LABEL_STATUS
                       Specify if tree already contains internal node labels

Additional scripts:

adjust_coordinates_of_fusions.py takes the TSV file containing the fusion coordinates (generated by lep_fusion_fission_finder.py) and adjusts the final block of each fused chromosome so that its reported end coordinate is the end of the chromosome (i.e. the chromosome length) rather than the position of the last detected ortholog. This produces an adjusted TSV file that can be used for downstream exploration of fusions.

./adjust_coordinates_of_fusions.py --help
usage: adjust_coordinates_of_fusions.py [-h] -f FUSIONS -i INDEX -p PREFIX

optional arguments:
  -h, --help            show this help message and exit
  -f FUSIONS, --fusions FUSIONS
                        Fusion position output from LFSF
  -i INDEX, --index INDEX
                        index file for genome
  -p PREFIX, --prefix PREFIX
                        prefix for output file
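The adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the script itself: the genome index is assumed to be a samtools-style .fai file (sequence name in column 1, length in column 2), and the block tuples (chromosome, start, end, unit) are a hypothetical stand-in for the rows of the fusion positions file, whose real column layout is not given here.

```python
import io


def read_fai_lengths(handle):
    """Return {sequence name: length} from a samtools-style .fai index."""
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def extend_last_blocks(blocks, lengths):
    """Set the end of the last block on each chromosome to the chromosome length.

    blocks: list of (chromosome, start, end, unit) tuples (hypothetical layout).
    """
    # Find, for each chromosome, the block with the largest end coordinate.
    last_index = {}
    for i, (chrom, _start, end, _unit) in enumerate(blocks):
        if chrom not in last_index or end > blocks[last_index[chrom]][2]:
            last_index[chrom] = i
    adjusted = list(blocks)
    for chrom, i in last_index.items():
        if chrom in lengths:  # leave blocks untouched if the length is unknown
            _, start, _, unit = blocks[i]
            adjusted[i] = (chrom, start, lengths[chrom], unit)
    return adjusted


# Made-up index and blocks for two fused chromosomes.
fai = io.StringIO("chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n")
blocks = [
    ("chr1", 10, 400, "A"),
    ("chr1", 420, 950, "B"),
    ("chr2", 5, 200, "C"),
    ("chr2", 210, 480, "D"),
]
adjusted = extend_last_blocks(blocks, read_fai_lengths(fai))
for block in adjusted:
    print(*block, sep="\t")
```

Only the final block of each chromosome changes; the start coordinates and all earlier blocks are left as reported.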
