title | author | reviewed by | date |
---|---|---|---|
Adephaga_UCE | CR Cardenas | Dr. Jeremy Gauthier | 2023.09.26 |
- double-check the synteny workflow
- add R Markdown/script files
- check grammar in the markdown files
The goal is to map probes to genomes in order to determine the total overlap of all probes used in the Adephaga data, both for UCE characterization and for concatenating/merging multiple UCEs that fall on the same gene (as in Van Dam et al. 2021). In general, folks will only have one probeset to use. But because I am integrating anchored hybrid enrichment data, the Adephaga UCE probes, and the original Coleoptera UCE probes, I need to create a merged dataset (named `joined`).
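Once every probe has been mapped to the same genome, finding the overlap between probesets comes down to interval intersection. The following is a minimal, hypothetical Python illustration of that idea, not one of the repository scripts. It assumes each mapped probe is a `(name, chrom, start, end)` tuple in 0-based, half-open (BED-style) coordinates, and the probe names and coordinates are made up. On real data, `bedtools intersect -wa -wb` on two BED files performs the same comparison much more efficiently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical probe record: (probe_name, chrom, start, end), 0-based half-open (BED-style)
Probe = Tuple[str, str, int, int]


def overlaps(a: Probe, b: Probe) -> bool:
    """True if two probes share at least one base on the same chromosome/scaffold."""
    return a[1] == b[1] and a[2] < b[3] and b[2] < a[3]


def shared_probes(set_a: List[Probe], set_b: List[Probe]) -> Dict[str, List[str]]:
    """Map each probe in set_a to the names of the set_b probes it overlaps."""
    by_chrom: Dict[str, List[Probe]] = defaultdict(list)
    for probe in set_b:
        by_chrom[probe[1]].append(probe)
    hits: Dict[str, List[str]] = {}
    for probe in set_a:
        matches = [other[0] for other in by_chrom[probe[1]] if overlaps(probe, other)]
        if matches:
            hits[probe[0]] = matches
    return hits


if __name__ == "__main__":
    # Toy coordinates, not real mapping results
    adephaga = [("uce-1_p1", "chr1", 100, 220), ("uce-2_p1", "chr2", 500, 620)]
    coleoptera = [("uce-9_p1", "chr1", 200, 320), ("uce-7_p1", "chr3", 10, 130)]
    print(shared_probes(adephaga, coleoptera))  # {'uce-1_p1': ['uce-9_p1']}
```

Probes that appear in no other set's hits are unique to their probeset; the union of all sets, with overlapping probes grouped, is one way to think about the merged `joined` dataset.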
Two important assumptions are made about the genomes and genes used:
- Intergenic sequences close to genes are not treated as promoters, mainly because this information is unknown.
- There is no alternative splicing, and there are no overlapping genes.
- identify which probes are genic or intergenic
- identify the overlap/intersection of the probesets used across datasets
- create a new probeset that integrates all probes for use in Phyluce
- create a list of probes that should be concatenated in a partition file for phylogenetic analysis
- create a script to integrate this list into an existing partition file (e.g., Phyluce output)
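Classifying loci and finding concatenation candidates can be sketched together. Under the no-overlapping-genes assumption, each locus overlaps at most one gene, so a locus is genic if it overlaps any gene interval and intergenic otherwise; genes hit by two or more loci are candidates for concatenation. This is a minimal illustration, not the repository's scripts: the `(name, chrom, start, end)` record layout, the BED-style coordinates, and all gene and locus names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (name, chrom, start, end), 0-based half-open (BED-style)
Interval = Tuple[str, str, int, int]


def classify(
    loci: List[Interval], genes: List[Interval]
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Label each locus genic/intergenic and collect genic loci per gene.

    Relies on the no-overlapping-genes assumption: the first gene hit is taken as the only one.
    """
    genes_by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for gene in genes:
        genes_by_chrom[gene[1]].append(gene)
    labels: Dict[str, str] = {}
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for name, chrom, start, end in loci:
        hit = next((g for g in genes_by_chrom[chrom] if start < g[3] and g[2] < end), None)
        if hit is None:
            labels[name] = "intergenic"
        else:
            labels[name] = "genic"
            by_gene[hit[0]].append(name)
    return labels, dict(by_gene)


def concatenation_groups(by_gene: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Genes hit by two or more loci: candidates for a single merged partition."""
    return {gene: loci for gene, loci in by_gene.items() if len(loci) > 1}


if __name__ == "__main__":
    genes = [("geneA", "chr1", 0, 1000), ("geneB", "chr1", 5000, 9000)]
    loci = [
        ("uce-1", "chr1", 100, 220),
        ("uce-2", "chr1", 800, 920),
        ("uce-3", "chr1", 2000, 2120),
        ("uce-4", "chr1", 6000, 6120),
    ]
    labels, by_gene = classify(loci, genes)
    print(labels)  # uce-3 is the only intergenic locus
    print(concatenation_groups(by_gene))  # {'geneA': ['uce-1', 'uce-2']}
```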
Recommended installation procedure:
conda create --name characterization
conda activate characterization
conda install -c bioconda bedtools blast bwa samtools seqkit
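After installing, a quick way to confirm the environment is complete is to check that each executable is on `PATH`. This is a small, hypothetical helper, not one of the repository scripts. It assumes the `blast` package provides the `blastn` executable, and it checks `natsort` as an importable Python package rather than as an executable. Run it with the `characterization` environment activated.

```python
import importlib.util
import shutil
from typing import List, Sequence

# Executables expected from the conda install above (assumption: `blast` provides blastn)
REQUIRED_TOOLS = ("bedtools", "blastn", "bwa", "samtools", "seqkit")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def natsort_available() -> bool:
    """natsort is a Python package, so check that it is importable instead of on PATH."""
    return importlib.util.find_spec("natsort") is not None


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing from PATH:", ", ".join(missing) if missing else "none")
    print("natsort importable:", natsort_available())
```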
conda install -c anaconda natsort
This pipeline should run on a personal laptop (Windows with the Windows Subsystem for Linux, or Linux; macOS is untested) with sufficient storage, memory, and CPU.
All scripts are in the `scripts` directory.
To follow this workflow, see the complete markdown files for now, or follow the step-by-step workflow:
- Data processing
- Map probes to gene features and intersect
- Integrate probesets
- Concatenate probes by gene
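For the "Concatenate probes by gene" step, one way to apply a locus-to-gene table to an existing partition file is to merge the ranges of loci on the same gene into a single partition. This sketch assumes a RAxML-style partition file with one `MODEL, name = start-end` line per locus, and it assumes the tree-inference program accepts several comma-separated ranges in one partition (check this against its documentation). Phyluce's actual output format, the `locus_to_gene` mapping, and all names here are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple

# Assumed RAxML-style partition line, e.g. "DNA, uce-1 = 1-300" or "GTR+G, uce-1 = 1-300"
PARTITION_LINE = re.compile(r"^\s*([^,=]+?)\s*,\s*([^\s=]+)\s*=\s*(.+?)\s*$")


def merge_partitions(lines: List[str], locus_to_gene: Dict[str, str]) -> List[str]:
    """Collapse partitions of loci on the same gene into one multi-range partition.

    Loci missing from locus_to_gene keep their own partition. The model/data type of
    the first locus seen for a gene is reused for the merged partition.
    """
    merged: Dict[str, Tuple[str, List[str]]] = {}  # insertion-ordered (Python 3.7+)
    for line in lines:
        match = PARTITION_LINE.match(line)
        if match is None:
            continue  # skip blank or unrecognized lines
        model, locus, ranges = match.groups()
        name = locus_to_gene.get(locus, locus)
        entry = merged.setdefault(name, (model, []))
        entry[1].extend(part.strip() for part in ranges.split(","))
    return [f"{model}, {name} = {', '.join(ranges)}" for name, (model, ranges) in merged.items()]


if __name__ == "__main__":
    partitions = ["DNA, uce-1 = 1-300", "DNA, uce-2 = 301-620", "DNA, uce-3 = 621-900"]
    print("\n".join(merge_partitions(partitions, {"uce-1": "geneA", "uce-3": "geneA"})))
    # DNA, geneA = 1-300, 621-900
    # DNA, uce-2 = 301-620
```

Merging at the partition level leaves the concatenated alignment untouched, so only the partition file needs to be regenerated when the locus-to-gene table changes.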
An additional downstream result of this workflow is a test of how using flanking plus core probe regions, flanking regions only, or core probe regions only affects phylogenetic analyses within Phyluce. Initial analyses showed very gappy alignments, likely due to the depth of the evolutionary relationships and the integration of data from a diverse set of genomic data collection methods.
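The three treatments differ only in which part of each locus is kept. A minimal sketch, assuming unaligned contigs and 0-based, half-open coordinates for the core probe region (for example, taken from a probe-to-contig alignment). The function names, and the choice to keep left and right flanks as separate entries so each can be aligned on its own, are hypothetical and not necessarily what the slicing scripts do.

```python
from typing import Dict, Tuple


def slice_locus(seq: str, core_start: int, core_end: int) -> Tuple[str, str, str]:
    """Split a UCE contig into (left flank, core probe region, right flank)."""
    if not 0 <= core_start <= core_end <= len(seq):
        raise ValueError("core coordinates fall outside the contig")
    return seq[:core_start], seq[core_start:core_end], seq[core_end:]


def build_datasets(
    contigs: Dict[str, str], cores: Dict[str, Tuple[int, int]]
) -> Dict[str, Dict[str, str]]:
    """Return the three alternative datasets: flanking+core, flanking only, core only."""
    datasets: Dict[str, Dict[str, str]] = {"flanking+core": {}, "flanking": {}, "core": {}}
    for name, seq in contigs.items():
        left, core, right = slice_locus(seq, *cores[name])
        datasets["flanking+core"][name] = seq
        # Hypothetical choice: keep each flank as its own entry rather than joining them
        datasets["flanking"][f"{name}_left"] = left
        datasets["flanking"][f"{name}_right"] = right
        datasets["core"][name] = core
    return datasets


if __name__ == "__main__":
    datasets = build_datasets({"uce-1": "AAAACCCCGGGG"}, {"uce-1": (4, 8)})
    print(datasets["flanking"])  # {'uce-1_left': 'AAAA', 'uce-1_right': 'GGGG'}
    print(datasets["core"])  # {'uce-1': 'CCCC'}
```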
The following markdown file contains the scripts necessary to "slice" flanking regions from UCE data:
- Slice flanking from core probe region
- Use synteny for a comparison of probe targets between genomes.
- Identify feature changes of probes between genomes (in prep)
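Identifying feature changes between genomes amounts to comparing each locus's label across genomes. A hypothetical sketch, assuming each genome has already been summarized as a `locus -> label` dictionary (e.g. "genic"/"intergenic" or a gene name); it does not perform any synteny analysis itself, and the locus names and labels are made up.

```python
from typing import Dict, List, Tuple


def feature_changes(
    genome_a: Dict[str, str], genome_b: Dict[str, str]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Compare per-locus feature labels between two genomes.

    Returns (changed, unshared): loci whose label differs, and loci mapped in only one genome.
    """
    shared = genome_a.keys() & genome_b.keys()
    changed = {
        locus: (genome_a[locus], genome_b[locus])
        for locus in sorted(shared)
        if genome_a[locus] != genome_b[locus]
    }
    unshared = sorted(genome_a.keys() ^ genome_b.keys())
    return changed, unshared


if __name__ == "__main__":
    genome_a = {"uce-1": "genic", "uce-2": "intergenic", "uce-3": "genic"}
    genome_b = {"uce-1": "genic", "uce-2": "genic", "uce-4": "genic"}
    print(feature_changes(genome_a, genome_b))
    # ({'uce-2': ('intergenic', 'genic')}, ['uce-3', 'uce-4'])
```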