Merge pull request #2002 from Clinical-Genomics/release/mip11.1
Release/mip11.1
jemten committed Dec 14, 2022
2 parents 69923a3 + 230147a commit 67561a6
Showing 116 changed files with 5,402 additions and 948 deletions.
84 changes: 75 additions & 9 deletions CHANGELOG.md
@@ -3,6 +3,54 @@
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](http://semver.org/).

## [11.1.0]

- Save raw files from ExpansionHunter
- Run UPD and subsequently chromograph on unaffected children
- Annotate SV variants with the caller that reported the variant
- Produce files for CNV analysis in Gens
- Updated SO terms for new version of VEP
- ExACpLI -> pLI, see [vep issue 108](https://github.com/Ensembl/VEP_plugins/issues/108)
- Use REVEL_score rather than REVEL_rankscore for the ranking algorithm
- Use BWA-mem2 instead of BWA mem for mapping
- Set default annotation overlap for structural variants to 0.5 (previously 0.8), due to change in TIDDIT
- Turn on Stringtie and gffcompare by default
- Run varg on research vcf
- Increase max for coverage calculation to 500x
- Separate list of ranked SO terms for structural variants to ensure that the right SO term gets picked as the most severe for SVs
- Adds option to use bedpe files with svdb query

### Tools

- Arriba: 2.1.0 -> 2.3.0
- Chromograph 1.1.4 -> 1.3.0
- DeepVariant: 1.1.0 -> 1.4.0
- ExpansionHunter: 4.0.2 -> 5.0.0
- GATK: 4.2.2.0 -> 4.2.6.1
- HTSlib: 1.13 -> 1.15.1
- MultiQC: 1.11 -> 1.12
- Peddy: 0.4.3 -> 0.4.8
- Picard: 2.25.0 -> 2.27.2
- SMNCopyNumberCaller 1.1.1 -> 1.1.2
- Star Fusion: 1.10.1 -> 1.11.0
- Stranger: 0.8.0 -> 0.8.1
- Stringtie: 2.1.3b -> 2.2.1
- Tiddit: 2.12.1 -> 3.3.2
- Trimgalore: 0.6.4 -> 0.6.7
- VEP: 104.3 -> 107.0
- svdb: 2.4.0 -> 2.7.0
- vcf2cytosure v0.5.1 -> v0.8

### Databases

- clinvar: 20211010 -> 20220829
- dbnsfp: 4.1a -> 4.3a (grch38 only)
- gnomad: r3.1.1 -> r3.1.2 (grch38 only)
- giab: 3.3.2 -> 4.2.1
- loqusdb dump: 20210921 -> 20220905
- nist: v3.3.2 -> v4.2.1
- vcf2cytosure blacklist: 200520

## [11.0.3]

- Initiate conda prior to activation
@@ -12,6 +60,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
Updates chromograph

### Tools

chromograph 1.1.4 -> 1.1.5

## [11.0.1]
@@ -31,15 +80,16 @@ chromograph 1.1.4 -> 1.1.5

### Tools

cyrius v1.1 -> v1.1.1
deeptrio 1.1.0-gpu -> 1.2.0-gpu
gatk 4.2.0.0 -> 4.2.2.0
glnexus v1.3.1 -> v1.4.1
HmtNote: 0.7.2
htslib: 1.10.2 -> 1.13
multiqc 1.10.1 -> v1.11
star-fusion 1.10.0 -> 1.10.1
vep release_103.1 -> release_104.3
- cyrius v1.1 -> v1.1.1
- deepvariant 1.1.0 -> 1.2.0
- deeptrio 1.1.0 -> 1.2.0
- gatk 4.2.0.0 -> 4.2.2.0
- glnexus v1.3.1 -> v1.4.1
- HmtNote: 0.7.2
- htslib: 1.10.2 -> 1.13
- multiqc 1.10.1 -> v1.11
- star-fusion 1.10.0 -> 1.10.1
- vep release_103.1 -> release_104.3

### References

@@ -63,6 +113,22 @@ vep release_103.1 -> release_104.3

- Updates Chromograph to version 1.1.4

## [10.2.5]

- Allow slurm quality of service flag to be set to 'express'

## [10.2.4]

- Split Star-Fusion alignment and detection into two recipes
- Use temp directory with Star-Fusion
- Resource bump for RNA
- Limit memory for glnexus
- Use non-gpu version of Deepvariant by default

## [10.2.3]

- Updates Chromograph to version 1.1.4

## [10.2.2]

- Adds missing median coverage metrics to metrics deliverable file
10 changes: 5 additions & 5 deletions README.md
@@ -29,7 +29,7 @@ PMID:25495354

## Overview

**MIP is being rewritten in NextFlow as a part of the [nf-core](https://nf-co.re/) project. This repo will mainly receive bugfixes as we are focusing our resources on the new pipeline.**
**You can follow the progress here :point_right: [raredisease](https://github.com/nf-core/raredisease).**

MIP performs whole genome or target region analysis of sequenced single-end and/or paired-end reads from the Illumina platform in fastq\(.gz\) format to generate annotated ranked potential disease causing variants.
@@ -135,7 +135,7 @@ $ cd MIP

```Bash
$ bash mip_install_perl.sh -e [mip] -p [$HOME/miniconda3]
```

##### 3. Test conda and mip installation files (optional, but recommended)

@@ -176,7 +176,7 @@ $ perl t/mip_analyse_rd_dna.test

MIP is called from the command line and takes input from the command line \(precedence\) or falls back on defaults where applicable.

Lists are supplied as repeated flag entries on the command line or in the config using the yaml format for arrays.
Only flags that will actually be used need to be specified, and MIP will check that all required parameters are set before submitting to SLURM.

Recipe parameters can be set to "0" \(=off\), "1" \(=on\) and "2" \(=dry run mode\). Any recipe can be set to dry run mode and MIP will create the sbatch scripts, but not submit them to SLURM. MIP can be restarted from any recipe using the ``--start_with_recipe`` flag and after any recipe using the `--start_after_recipe` flag.
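
The three recipe modes and the two restart flags described above can be modelled in a few lines. The following is only an illustrative Python sketch, not MIP's implementation: the recipe names, the `plan_recipes` helper and the strictly linear recipe order are all hypothetical, and it assumes that recipes before the restart point are simply left out.

```python
from enum import IntEnum


class RecipeMode(IntEnum):
    OFF = 0      # recipe is skipped
    ON = 1       # sbatch script is written and submitted
    DRY_RUN = 2  # sbatch script is written but not submitted


def plan_recipes(modes, start_with=None, start_after=None):
    """Return (recipe, write_script, submit) for every recipe that is handled.

    `modes` maps a (hypothetical) recipe name to its mode, in execution order.
    `start_with` keeps the named recipe and everything after it;
    `start_after` keeps only what follows the named recipe.
    """
    names = list(modes)
    first = 0
    if start_with is not None:
        first = names.index(start_with)
    elif start_after is not None:
        first = names.index(start_after) + 1

    plan = []
    for name in names[first:]:
        mode = RecipeMode(modes[name])
        if mode is RecipeMode.OFF:
            continue
        # A script is written for both ON and DRY_RUN; only ON is submitted.
        plan.append((name, True, mode is RecipeMode.ON))
    return plan


# Hypothetical recipe names, used only to exercise the sketch.
modes = {"align": 1, "call_variants": 2, "annotate": 0, "rank": 1}
print(plan_recipes(modes))
print(plan_recipes(modes, start_after="align"))
```

In this model a dry-run recipe still appears in the plan with `write_script` set and `submit` unset, which mirrors the statement that MIP creates the sbatch scripts without submitting them to SLURM.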
@@ -233,6 +233,6 @@ MIP will place any generated data files in the output data directory specified b
[Miniconda]: http://conda.pydata.org/miniconda.html
[Pedigree file]: https://github.com/Clinical-Genomics/MIP/tree/master/templates/643594-miptest_pedigree.yaml
[Perl]:https://www.perl.org/
[Rank model file]: https://github.com/Clinical-Genomics/MIP/blob/master/templates/rank_model_-v1.33-.ini
[SV rank model file]: https://github.com/Clinical-Genomics/MIP/blob/master/templates/svrank_model_-v1.8-.ini
[Rank model file]: https://github.com/Clinical-Genomics/MIP/blob/master/templates/rank_model_-v1.34-.ini
[SV rank model file]: https://github.com/Clinical-Genomics/MIP/blob/master/templates/svrank_model_-v1.9-.ini
[Qc regexp file]: https://github.com/Clinical-Genomics/MIP/blob/master/templates/qc_regexp_-v1.26-.yaml
24 changes: 10 additions & 14 deletions containers/bootstrapann/Dockerfile
@@ -4,27 +4,23 @@ FROM python:2.7-slim

################## METADATA ######################

LABEL base_image="python:2.7-slim"
LABEL "base_image"="python:2.7-slim"
LABEL version="2"
LABEL software="BootstrapAnn"
LABEL software.version="e557dd3"
LABEL extra.binaries="BootstrapAnn.py"
LABEL maintainer="Clinical-Genomics/MIP"

RUN apt-get update && apt-get install -y git
RUN apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN apt-get update && apt-get install -y --no-install-recommends git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
pip install --no-cache-dir numpy scipy && \
git clone https://github.com/J35P312/BootstrapAnn.git /usr/local/BootstrapAnn

RUN pip install numpy scipy
WORKDIR /usr/local/BootstrapAnn

## Clone git repository
RUN git clone https://github.com/J35P312/BootstrapAnn.git /usr/local/BootstrapAnn

RUN cd /usr/local/BootstrapAnn && git checkout e557dd3

RUN cd /usr/local/BootstrapAnn && \
chmod a+x BootstrapAnn.py

RUN ln --symbolic --force /usr/local/BootstrapAnn/BootstrapAnn.py /usr/local/bin/BootstrapAnn.py
RUN git checkout e557dd3 && \
chmod a+x BootstrapAnn.py && \
ln --symbolic --force /usr/local/BootstrapAnn/BootstrapAnn.py /usr/local/bin/BootstrapAnn.py

WORKDIR /data/
10 changes: 5 additions & 5 deletions containers/chromograph/Dockerfile
@@ -5,9 +5,9 @@ FROM clinicalgenomics/mip_base:2.1
################## METADATA ######################

LABEL base_image="clinicalgenomics/mip_base:2.1"
LABEL version="13"
LABEL version="14"
LABEL software="chromograph"
LABEL software.version="1.1.5"
LABEL software.version="1.3.0"
LABEL extra.binaries="chromograph"
LABEL maintainer="Clinical-Genomics/MIP"

@@ -22,9 +22,9 @@ RUN conda install pip python=3.9 matplotlib && \

WORKDIR /opt/conda/share

RUN wget --no-verbose https://github.com/mikaell/chromograph/archive/refs/tags/v1.1.5.zip && \
unzip v1.1.5.zip && \
cd chromograph-1.1.5 && \
RUN wget --no-verbose https://github.com/mikaell/chromograph/archive/refs/tags/v1.3.0.zip && \
unzip v1.3.0.zip && \
cd chromograph-1.3.0 && \
python -m pip install --no-cache-dir .

WORKDIR /data/
32 changes: 22 additions & 10 deletions containers/expansionhunter/Dockerfile
@@ -1,19 +1,31 @@
################## BASE IMAGE ######################

FROM clinicalgenomics/mip_base:2.1
FROM ubuntu:bionic

################## METADATA ######################

LABEL base_image="clinicalgenomics/mip_base:2.1"
LABEL version="3"
LABEL software="expansionhunter"
LABEL software.version="4.0.2"
LABEL extra.binaries="expansionhunter"
LABEL base_image="ubuntu:bionic"
LABEL version="4"
LABEL software="ExpansionHunter"
LABEL software.version="5.0.0"
LABEL extra.binaries="ExpansionHunter"
LABEL maintainer="Clinical-Genomics/MIP"

RUN conda install -c bioconda expansionhunter=4.0.2
## Install wget
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
curl \
wget \
libreadline-dev && \
apt-get clean && \
apt-get purge && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

## Clean up after conda
RUN /opt/conda/bin/conda clean -ya
WORKDIR /app

WORKDIR /data/
RUN wget -nv https://github.com/Illumina/ExpansionHunter/releases/download/v5.0.0/ExpansionHunter-v5.0.0-linux_x86_64.tar.gz && \
tar -xvf ExpansionHunter-v5.0.0-linux_x86_64.tar.gz && \
rm ExpansionHunter-v5.0.0-linux_x86_64.tar.gz

ENV PATH=/app/ExpansionHunter-v5.0.0-linux_x86_64/bin:${PATH}
4 changes: 4 additions & 0 deletions containers/gens_preproc/Dockerfile
@@ -0,0 +1,4 @@
# syntax=docker/dockerfile:1
FROM clinicalgenomics/htslib:1.13
WORKDIR /bin
COPY . .
131 changes: 131 additions & 0 deletions containers/gens_preproc/generate_gens_data.pl
@@ -0,0 +1,131 @@
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use List::Util qw(sum);
use File::Basename qw(dirname);

if ( $ARGV[0] eq "--version" ) {
print "generate_gens_data.pl 1.0.2\n";
exit 0;
}

my $SCRIPT_ROOT = dirname($0);

my @COV_WINDOW_SIZES = ( 100000, 25000, 5000, 1000, 100 );
my @BAF_SKIP_N = ( 160, 40, 10, 4, 1 );
my @PREFIXES = qw( o a b c d );
my $cov_fn = $ARGV[0];
my $gvcf_fn = $ARGV[1];

my $SAMPLE_ID = $ARGV[2];
my $GNOMAD = $ARGV[3];

my $COV_OUTPUT = $SAMPLE_ID . ".cov.bed";
my $BAF_OUTPUT = $SAMPLE_ID . ".baf.bed";

print STDERR "Calculating coverage data\n";

# Calculate coverage data
open( COVOUT, ">" . $COV_OUTPUT );
for my $i ( 0 .. $#COV_WINDOW_SIZES ) {
generate_cov_bed( $cov_fn, $COV_WINDOW_SIZES[$i], $PREFIXES[$i] );
}
close COVOUT;

print STDERR "Calculating BAFs from gvcf...\n";

# Calculate BAFs
system( $SCRIPT_ROOT. "/gvcfvaf.pl " . "$gvcf_fn $GNOMAD > baf.tmp" );
open( BAFOUT, ">" . $BAF_OUTPUT );
for my $i ( 0 .. $#BAF_SKIP_N ) {
print STDERR "Outputting BAF $PREFIXES[$i]...\n";
generate_baf_bed( "baf.tmp", $BAF_SKIP_N[$i], $PREFIXES[$i] );
}
close BAFOUT;

system("bgzip -f -\@10 $BAF_OUTPUT");
system("tabix -f -p bed $BAF_OUTPUT.gz");
system("bgzip -f -\@10 $COV_OUTPUT");
system("tabix -f -p bed $COV_OUTPUT.gz");
unlink("baf.tmp");

sub generate_baf_bed {
my ( $fn, $skip, $prefix ) = @_;
open( my $fh, $fn );
my $i = 0;
while (<$fh>) {
if ( $i++ % $skip == 0 ) {
chomp;
my @a = split /\t/;
print BAFOUT $prefix . "_"
. $a[0] . "\t"
. ( $a[1] - 1 ) . "\t"
. $a[1] . "\t"
. $a[2] . "\n";
}
}
close $fh;
}

sub generate_cov_bed {

my ( $fn, $win_size, $prefix ) = @_;

open( my $fh, $fn );
my ( $reg_start, $reg_end, $reg_chr, $force_end );
my @reg_ratios;
while (<$fh>) {
next if /^@/ or /^CONTIG/;
chomp;
my ( $chr, $start, $end, $ratio ) = split /\t/;
my $orig_end = $end;
unless ($reg_start) {
$reg_start = $start;
$reg_end = $end;
$reg_chr = $chr;
}

if ( $chr eq $reg_chr ) {
if ( $start - $reg_end < $win_size ) {
push @reg_ratios, $ratio;
$reg_end = $end;
}

# If there is a large gap to the next region, prematurely end region
else {
$force_end = 1;
$end = $reg_end;
}
}
else {
$force_end = 1;
$end = $reg_end;
}
if ( $end - $reg_start + 1 >= $win_size or $force_end ) {
my $mid_point = $reg_start + int( ( $end - $reg_start ) / 2 );
print COVOUT $prefix . "_"
. $reg_chr . "\t"
. ( $mid_point - 1 ) . "\t"
. $mid_point . "\t"
. mean(@reg_ratios) . "\n";
undef $reg_start;
undef $reg_end;
undef $reg_chr;
undef @reg_ratios;
}

if ($force_end) {
$reg_start = $start;
$reg_end = $orig_end;
$reg_chr = $chr;
push @reg_ratios, $ratio;
undef $force_end;
}
}
close $fh;
}

sub mean {
return sum(@_) / @_;
}
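
The multi-resolution BAF output produced by `generate_baf_bed` above can be restated as a short Python sketch. It is a paraphrase for illustration, not part of the commit. It assumes that each row of `baf.tmp` is tab-separated as chromosome, 1-based position, BAF value; that layout is inferred from how the Perl code indexes the split fields, since `gvcfvaf.pl` is not shown here. The function name `baf_bed_lines` is hypothetical.

```python
from typing import Iterable, Iterator

# Same resolution levels as the Perl script: keep every N-th row per prefix.
BAF_SKIP_N = (160, 40, 10, 4, 1)
PREFIXES = ("o", "a", "b", "c", "d")


def baf_bed_lines(rows: Iterable[str], skip: int, prefix: str) -> Iterator[str]:
    """Yield a BED line for rows 0, skip, 2*skip, ... of the input.

    The chromosome is prefixed with the resolution level, and the 1-based
    position is converted to a 0-based, half-open BED interval of length 1.
    """
    for i, row in enumerate(rows):
        if i % skip != 0:
            continue
        chrom, pos, baf = row.rstrip("\n").split("\t")[:3]
        yield f"{prefix}_{chrom}\t{int(pos) - 1}\t{pos}\t{baf}"


# Made-up example rows in the assumed chrom / pos / BAF layout.
rows = [
    "1\t100\t0.5",
    "1\t200\t0.48",
    "1\t300\t1.0",
    "1\t400\t0.0",
    "1\t500\t0.52",
]
for line in baf_bed_lines(rows, skip=4, prefix="c"):
    print(line)
```

Writing all five levels into one file, each with its own chromosome prefix, keeps every zoom level in a single bgzip-compressed, tabix-indexed BED file, so one level can be fetched by querying a region on the prefixed chromosome name.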
