
Collection of Scripts used for Population Genetic Study of RNAi Genes Duplications in Drosophila

danangcrysnanto/RNAi-duplication

RNAi genes duplications in Drosophila

These are the supporting scripts that I used to analyze genomic data in my Master's dissertation. A detailed explanation of each script is given below.

Variant Calling

  • I used Bowtie2, samtools v1.4 and GATK v3.5 to call variants in 15-20 genes from short-read population genomic data of Drosophila pseudoobscura (12 strains), Drosophila miranda (12 strains), see McGaugh et al., 2012, and Drosophila athabasca (28 strains), see Miller et al., 2017. Finally, I used fasta_formatter from FastX-Toolkit to organize the fasta files by gene. The script is available here: varcall.sh
  • Linkage information in heterozygous individuals is recovered using FastPhase. I created an R script that parses fasta files into a format suitable for FastPhase, runs FastPhase, and converts the output back into fasta format. The script is available here: FastPhaseIntercovert.R. Credit for the original script goes to Dr Darren Obbard; I modified it so that it is compatible with the current version of FastPhase.
  • To gain information on the effect of weakly deleterious mutations, I created an R script that removes variants with a minor allele frequency (MAF) below 0.15. The script parses fasta files into a matrix, replaces each variant with MAF below 0.15 with the major variant at the corresponding site, and finally converts the matrix back into fasta format. The script is available here: MAFRemover.R
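
The MAF filtering step described above can be sketched in a few lines. This is a minimal Python illustration, not the code in MAFRemover.R (which is written in R). The function name `mask_low_maf` is hypothetical, and the sketch assumes that the sequences are already aligned to equal length, that every allele whose frequency at a site falls below the threshold is replaced, and that gaps or ambiguity codes are counted like any other character.

```python
from collections import Counter

MAF_THRESHOLD = 0.15  # threshold stated in the text


def mask_low_maf(seqs, threshold=MAF_THRESHOLD):
    """Replace alleles rarer than `threshold` with the major allele at each site.

    `seqs` is a list of aligned sequences (strings of equal length).
    Hypothetical helper for illustration only.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    if length == 0:
        return list(seqs)

    n = len(seqs)
    columns = []
    # zip(*seqs) walks the alignment one site (column) at a time.
    for site in zip(*seqs):
        counts = Counter(site)
        major = counts.most_common(1)[0][0]
        # Keep an allele only if its frequency reaches the threshold.
        columns.append([b if counts[b] / n >= threshold else major for b in site])

    # Transpose the columns back into one string per sequence.
    return ["".join(row) for row in zip(*columns)]
```

For example, in an alignment of ten sequences, a variant carried by a single sequence (frequency 0.1) is replaced by the major allele, while a variant carried by two sequences (frequency 0.2) is kept.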

Expression Analysis

Phylogenetic Analysis

  • I collected gene sequences using tBLASTN against a local database. To automate the process, I wrote a script that takes query sequences, runs the BLAST search, and outputs fasta from the BLAST hits (reverse-complementing hits that map in the reverse direction). The script is available here: Reverse-ComplementBLASThits.sh. I also created a script to manually select a region of interest from the sequence database, which I find quite handy when inspecting a sequence and looking at its surrounding sequence. This script is available here: BLAST search
  • For species that only have genomic reads, I used a targeted assembly approach to assemble only the reads that match known RNAi proteins. I identify the reads using Diamond and then assemble them with SPAdes, an assembler designed for single-cell and bacterial data. The script is available here: Diamond_Spades.sh.
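
The reverse-complementing of BLAST hits mentioned above can be illustrated as follows. This is a hedged Python sketch rather than the logic of Reverse-ComplementBLASThits.sh itself. The helper names `reverse_complement` and `extract_hit` are hypothetical, and the sketch assumes 1-based inclusive subject coordinates as in BLAST tabular output, where a subject start greater than the subject end indicates a hit on the reverse strand.

```python
# Translation table for complementing nucleotides (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_hit(subject_seq, sstart, send):
    """Cut a hit out of `subject_seq` using 1-based inclusive coordinates.

    Assumption: sstart > send marks a reverse-strand hit, in which case the
    region is returned reverse-complemented so it reads in the query direction.
    """
    if sstart <= send:
        return subject_seq[sstart - 1:send]
    return reverse_complement(subject_seq[send - 1:sstart])
```

Keeping the strand check inside `extract_hit` means every sequence written to the output fasta is in the same orientation as the query, which simplifies downstream alignment.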
