Finding novel variations of germline immunoglobulin genes using WGS data

This project is dedicated to finding new immunoglobulin (Ig) gene variants in whole-genome sequencing (WGS) data.


Our main method is alignment: we align reads from WGS data against templates of already known V genes and look for new variants. Currently we use Biopython's built-in local aligner, whose results are similar to those of the EMBOSS aligner.
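To illustrate what a local alignment of a read against a V-gene template computes, here is a minimal pure-Python Smith-Waterman sketch. It is not the project's code (which relies on Biopython's aligner); the function name `local_align` and the scores (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration only.

```python
def local_align(read, template, match=2, mismatch=-1, gap=-2):
    """Hypothetical sketch: Smith-Waterman local alignment, linear gap penalty.

    Returns (score, aligned_read, aligned_template); gaps are written as '-'.
    Scoring parameters are illustrative assumptions, not the project's settings.
    """
    rows, cols = len(read) + 1, len(template) + 1
    # H[i][j] = best score of a local alignment ending at read[i-1] / template[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == template[j - 1] else mismatch)
            # 0 lets a new local alignment start at any cell
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached
    i, j = best_pos
    a_read, a_tmpl = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if read[i - 1] == template[j - 1] else mismatch):
            a_read.append(read[i - 1])
            a_tmpl.append(template[j - 1])
            i, j = i - 1, j - 1
        elif score == H[i - 1][j] + gap:
            a_read.append(read[i - 1])  # gap in the template
            a_tmpl.append("-")
            i -= 1
        else:
            a_read.append("-")  # gap in the read
            a_tmpl.append(template[j - 1])
            j -= 1
    return best, "".join(reversed(a_read)), "".join(reversed(a_tmpl))


if __name__ == "__main__":
    # A read differing from the template by one substitution
    print(local_align("ACGTACGT", "ACGAACGT"))
```

Mismatch columns in the returned alignment are the positions where a read disagrees with the known template, which is the kind of signal a search for new variants looks at. For real V-gene templates, Biopython's compiled aligner is much faster than this quadratic pure-Python loop.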


  1. Mandatory dependencies (required to run the scripts)
    • python >= 3.6
    • Biopython >= 1.70
  2. Optional dependencies (used only for simulation)
    • numpy >= 1.14.0
    • bowtie2 >= 2.2.6
    • samtools >= 1.7
    • bedtools >= 2.25.0
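A quick way to see which of the dependencies listed above are present is a small check script. This is a hypothetical helper, not part of the repository: it only tests presence (the Python version, whether the `Bio` and `numpy` modules can be imported, and whether `bowtie2`, `samtools` and `bedtools` are on `PATH`), and it does not verify the minimum versions of the packages and tools.

```python
import importlib.util
import shutil
import sys

# Hypothetical helper; names below mirror the dependency list.
MANDATORY_MODULES = ["Bio"]  # Biopython is imported as "Bio"
OPTIONAL_MODULES = ["numpy"]
OPTIONAL_TOOLS = ["bowtie2", "samtools", "bedtools"]  # command-line programs


def check_dependencies():
    """Return a dict mapping each dependency to True (found) or False (missing).

    Only presence is checked; minimum package and tool versions are not verified.
    """
    report = {"python>=3.6": sys.version_info >= (3, 6)}
    for name in MANDATORY_MODULES + OPTIONAL_MODULES:
        # find_spec returns None when the module cannot be imported
        report[name] = importlib.util.find_spec(name) is not None
    for tool in OPTIONAL_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```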


To clone the repository into the directory bioinf_vsegments, run in a shell:

git clone


For now, all functionality lives in these scripts:

  • - align sequences
  • - generate full V segments
  • - generate reads with noise from sequence

This layout is subject to change. We plan to make the functionality available from a command-line interface (CLI), so please do not rely on the current scripts until that work is finished.
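The read-generation step (reads with noise from a sequence) can be sketched roughly as follows. This is an illustrative sketch, not the project's script: the function name `simulate_reads`, uniformly random start positions, and substitution-only errors applied independently per base with probability `error_rate` are all assumptions; real WGS error models also include insertions and deletions.

```python
import random

BASES = "ACGT"


def simulate_reads(sequence, read_length, n_reads, error_rate, seed=None):
    """Hypothetical sketch: sample reads from `sequence` and add noise.

    Each read starts at a uniformly random position; every base is then
    substituted independently with probability `error_rate`.
    Returns a list of (start_position, read) tuples.
    """
    if read_length > len(sequence):
        raise ValueError("read_length must not exceed the sequence length")
    rng = random.Random(seed)  # local RNG: a fixed seed gives reproducible reads
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(sequence) - read_length)  # bounds are inclusive
        read = list(sequence[start:start + read_length])
        for k, base in enumerate(read):
            if rng.random() < error_rate:
                # replace the base with a different nucleotide
                read[k] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(read)))
    return reads


if __name__ == "__main__":
    for start, read in simulate_reads("ACGTTGCAAGGCTTAC" * 4, 20, 3, 0.02, seed=42):
        print(start, read)
```

Keeping the start position with each read makes it easy to check later whether the aligner placed the read where it was sampled from; with `error_rate=0` every read is an exact substring of the input.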



This is a raw version in which we are tuning everything on the IgH V segment so that it works properly.

Please wait for the release: everything will be done first for the IgH V segment and afterwards for all other chains.


  • Yana Safonova
  • Andrey Slabodkin
  • Sasha Ilin


I'd like to thank everybody, especially the Bioinformatics Institute.

