This repository has been archived by the owner on Mar 1, 2020. It is now read-only.

A buffet of analysis scripts available after running Assemblytics

MariaNattestad/post-Assemblytics

Post-Assemblytics analysis buffet

This repository contains a collection of scripts for additional analysis beyond calling variants in Assemblytics. They are not meant to be run as-is: treat them as notes and code into which you substitute your own files. There is no input validation, so check that each step does what you expect on your own files.

First use Assemblytics at qb.cshl.edu/assemblytics to call variants, download the zip file with all results, and use the .bed file of variants for these analyses.

Analyses available so far:

  • Basic intersection with genes (needs gff annotation file) using bedtools
  • Homopolymer analysis of insertions (needs the query fasta file, the one used in nucmer alignment)
    • this could be adapted to extract deletions as well, or any other features
    • the count_homopolymers.py script looks for homopolymers in a file containing the inserted sequence as well as the flanking sequences.
  • Alu element analysis: use RepeatMasker annotations to intersect deletions, and extract insertion sequences from the query fasta file so they can be searched with BLAST. The extraction from the query/assembly can also be done with bedtools, as in the homopolymer analysis, which may be faster.
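
As a rough illustration of the homopolymer step, the sketch below finds runs of a single repeated base in a sequence string, such as an inserted sequence plus its flanks. It is not the repository's count_homopolymers.py. The function name `find_homopolymers` and the `min_length` threshold of 5 are assumptions made for this example.

```python
import re


def find_homopolymers(sequence, min_length=5):
    """Return (start, base, length) for each single-base run of at least min_length.

    Hypothetical helper for illustration; min_length=5 is an assumed cutoff.
    Positions are 0-based offsets into the sequence string.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    seq = sequence.upper()
    # One base followed by at least (min_length - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence, not real data: a 6-base A run and a 5-base T run.
    for start, base, length in find_homopolymers("ACGTAAAAAAGGCTTTTTC"):
        print(start, base, length)
```

Lowercase (soft-masked) bases are counted like uppercase ones, and runs of N are ignored. Adjust both choices if your assembly uses them differently.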
