
crcardenas/Adephaga_UCE


title: Adephaga_UCE
author: CR Cardenas
reviewed by: Dr. Jeremy Gauthier
date: 2023.09.26

Adephaga_UCE README

Tasks

  1. double check synteny workflow
  2. add R markdown/script files
  3. check grammar in markdown files

UCE characterization and concatenation for phylogenomic analysis of Adephaga

The goal is to map probes to genomes in order to determine the total overlap of all probes used in the Adephaga data, both for UCE characterization and for concatenating/merging multiple UCEs on the same gene (as in Van Dam et al. 2021). In general, folks will only have one probeset to use. But because I am integrating anchored hybrid enrichment data, Adephaga UCE probes, and the original Coleoptera UCE probes, I need to create a merged dataset (named joined).

Two important assumptions being made about the genome and genes being used:

  1. Intergenic sequences close to genes are not considered promoters, mainly because this information is unknown.
  2. There is no alternative splicing, and there are no overlapping genes.

Goals of this workflow:

  1. identify which probes are genic or intergenic
  2. identify the overlap/intersect of probesets used between datasets
  3. create a new probeset that integrates all probes for use in Phyluce
  4. create a list of probes that should be concatenated in a partition file for phylogenetic analysis
    1. create a script to integrate it based on an existing partition file (e.g., output of Phyluce)
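As a rough illustration of the first goal, the sketch below labels probes as genic or intergenic by checking whether their coordinates overlap any gene interval. It is a plain-Python stand-in for an interval intersection, not the repository's actual scripts; the probe names, gene names, and coordinates are all hypothetical, and intervals are assumed to be 0-based and half-open (BED-like).

```python
from typing import Dict, List, Tuple

# (sequence name, start, end); assumed 0-based, half-open, as in BED files
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals on the same sequence share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_probes(
    probes: Dict[str, Interval], genes: Dict[str, Interval]
) -> Dict[str, List[str]]:
    """Map each probe to the sorted gene names it overlaps ([] means intergenic)."""
    return {
        name: sorted(g for g, giv in genes.items() if overlaps(piv, giv))
        for name, piv in probes.items()
    }


# Hypothetical coordinates, for illustration only.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 900, 1500)}
probes = {
    "uce-1_p1": ("chr1", 120, 240),  # inside geneA
    "uce-2_p1": ("chr1", 450, 570),  # straddles the end of geneA
    "uce-3_p1": ("chr1", 600, 720),  # between the two genes
    "uce-4_p1": ("chr2", 100, 220),  # on a sequence with no genes
}

hits = classify_probes(probes, genes)
for name, gene_list in hits.items():
    label = "genic" if gene_list else "intergenic"
    print(name, label, ",".join(gene_list))
```

This pairwise comparison is quadratic in the number of features, so it only suits small examples; a dedicated interval tool is the better choice for whole genomes.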

Software and packages used

!!! create a json file with the environment for download and easy duplication of the environment

Recommended installation procedure:

conda create --name characterization
conda activate characterization
conda install -c bioconda bedtools blast bwa samtools seqkit 
conda install -c anaconda natsort

This pipeline should run on a personal laptop with sufficient storage, memory, and CPU available. It is expected to work on Linux and on Windows with the Linux subsystem; macOS is untested.

All scripts are found in the scripts directory.

To follow this workflow, for now see the complete markdown files; or follow the step by step workflow:

  1. Data processing
  2. Map probes to gene features and intersect
  3. Integrate probesets
  4. Concatenate probes by gene
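To make the last step more concrete, here is a minimal sketch of how loci that map to the same gene could be merged into a single partition. It assumes a RAxML-style partition line (`DNA, name = start-end`) and a hypothetical locus-to-gene table; the function names, locus names, and coordinates are illustrative and are not taken from the repository's scripts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # 1-based, inclusive alignment columns


def merge_partitions(
    partitions: Dict[str, Range], locus_to_gene: Dict[str, str]
) -> Dict[str, List[Range]]:
    """Group locus ranges by gene; loci without a gene keep their own partition."""
    merged: Dict[str, List[Range]] = defaultdict(list)
    for locus, rng in partitions.items():
        key = locus_to_gene.get(locus, locus)
        merged[key].append(rng)
    return {name: sorted(ranges) for name, ranges in merged.items()}


def format_partition(name: str, ranges: List[Range]) -> str:
    """Write one RAxML-style partition line (assumed output format)."""
    return f"DNA, {name} = " + ", ".join(f"{s}-{e}" for s, e in ranges)


# Hypothetical input: three loci, two of which sit on the same gene.
partitions = {"uce-1": (1, 300), "uce-2": (301, 600), "uce-3": (601, 900)}
locus_to_gene = {"uce-1": "geneA", "uce-3": "geneA"}

merged = merge_partitions(partitions, locus_to_gene)
for name, ranges in merged.items():
    print(format_partition(name, ranges))
```

Keeping unassigned loci under their own name means intergenic UCEs still receive a partition, which seems the safer default when the gene annotation is incomplete.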

Additional Adephaga UCE scripts

An additional downstream use of this workflow is testing how three subsets of the sequence data (flanking plus core probe regions, flanking regions only, and core probe regions only) affect phylogenetic analysis in Phyluce. Initial analyses showed extremely gappy alignments, likely due to the depth of the evolutionary relationships and the integration of a diverse set of genomic data collection methods.

The following markdown files contain the scripts necessary to "slice" flanking regions from UCE data, along with related comparisons:

  1. Slice flanking from core probe region
  2. Use synteny for a comparison of probe targets between genomes.
  3. Identify feature changes of probes between genomes (in prep)
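The idea behind "slicing" can be sketched as splitting each locus into its core probe region and the surrounding flanks. The function below is only an illustration under simple assumptions: the core coordinates are already known, they are 0-based and half-open, and the sequence is a plain string. The name `slice_locus` and the example sequence are hypothetical.

```python
from typing import Tuple


def slice_locus(seq: str, core_start: int, core_end: int) -> Tuple[str, str, str]:
    """Split a locus into (left flank, core probe region, right flank)."""
    if not 0 <= core_start <= core_end <= len(seq):
        raise ValueError("core coordinates fall outside the sequence")
    return seq[:core_start], seq[core_start:core_end], seq[core_end:]


# Hypothetical locus: 3 bp flanks on either side of a 6 bp core.
left, core, right = slice_locus("AAACCCGGGTTT", 3, 9)
flanking_only = left + right          # flanking dataset
core_only = core                      # core-probe-region dataset
flanking_plus_core = left + core + right  # full locus

print(flanking_only, core_only, flanking_plus_core)
```

The three resulting strings correspond to the three datasets compared above; for real alignments the same coordinates would need to be applied to every taxon in the locus.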

About

Characterization of adephaga UCE loci
