This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Commit

Adding end-to-end tests hg19-chr22 (#61)
Closes: #61
Related-Issue: #61
Projected-Results-Impact: none
holtgrewe committed Sep 12, 2022
1 parent 2bf2810 commit c5ea649
Showing 38 changed files with 527 additions and 6 deletions.
3 changes: 3 additions & 0 deletions .gitattributes
@@ -0,0 +1,3 @@
/tests/**/*.fa* filter=lfs diff=lfs merge=lfs -text
/tests/**/*.ser filter=lfs diff=lfs merge=lfs -text
/tests/**/*.vcf* filter=lfs diff=lfs merge=lfs -text
15 changes: 12 additions & 3 deletions .github/workflows/ci.yml
@@ -49,19 +49,28 @@ jobs:

steps:
- uses: actions/checkout@v2
with:
lfs: true
- name: Check out LFS objects
run: git lfs fetch --all
- name: Set up JDK
uses: actions/setup-java@v2
with:
java-version: ${{ matrix.java }}
distribution: 'adopt'

- name: Build with Maven and submit coverage report
- name: Build JAR with Maven without submitting coverage report
run: >
mvn
--batch-mode
--update-snapshots
clean
test
package
- name: Run end-to-end tests
run: |
cd tests/hg19-chr22
bash run-tests.sh
testing-coverage:
runs-on: ubuntu-latest
@@ -78,7 +87,7 @@ jobs:
java-version: ${{ matrix.java }}
distribution: 'adopt'

- name: Build with Maven and submit coverage report
- name: Run test with Maven and submit coverage report
run: >
mvn
--define repoToken=${{ secrets.COVERALLS_REPO_TOKEN }}
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -2,6 +2,8 @@

## v0.26-SNAPSHOT

- Adding end-to-end tests `hg19-chr22` (#61)

## v0.25

- Fixing unresolved issue with self-test (#51, #56)
12 changes: 12 additions & 0 deletions README.md
@@ -62,3 +62,15 @@ The following will create `varfish-annotator-db-1906.h2.db` and fill it.
```
# mvn com.coveo:fmt-maven-plugin:format -Dverbose=true
```

## Tests

The folder `/tests` contains some data sets that are appropriate for system (aka "end-to-end") tests of the software.

- `hg19-chr22` --
This folder contains examples for annotating GATK HC (small variant) and Delly2 (structural variant) calls on the first 22 Mbp of chr22.
Only variants overlapping `ADA2` and `GAB4` are included.

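You can build each data set with the `build.sh` script in its folder.
The script also documents the provenance of the test data.
It requires Jannovar on your `PATH` as `jannovar` (e.g., installed through bioconda), `samtools`, the htslib tools `tabix` and `bgzip`, plus `wget` and `perl`.
It also reads the source databases and call sets from site-specific absolute paths (`BASEDIR`, `/tmp/Clinvar.tsv`, `/tmp/HgmdPublicLocus.tsv`) that you need to adjust for your environment.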
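Since `build.sh` runs under `set -euo pipefail`, a missing tool only surfaces when the script first calls it, possibly after a long download. A quick check up front avoids that. The following is a minimal sketch; the function name `check_tools` is hypothetical and not part of this commit, and the tool list in the example comment is read off `build.sh`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report every tool from the argument list that is
# not on PATH. Returns 0 if all tools are found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, at the top of tests/hg19-chr22/build.sh:
#   check_tools jannovar samtools tabix bgzip wget perl || exit 1
```

The check reports all missing tools in one pass rather than stopping at the first, so a fresh environment can be fixed in a single round.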
6 changes: 6 additions & 0 deletions tests/hg19-chr22/.gitignore
@@ -0,0 +1,6 @@
*.tmp

chr22.*
chr22_part.fa

/data
Git LFS file not shown
Git LFS file not shown
9 changes: 9 additions & 0 deletions tests/hg19-chr22/Case_1_index.delly2.db-info.tsv-expected
@@ -0,0 +1,9 @@
genomebuild db_name release
GRCh37 clinvar for-testing
GRCh37 exac r1.0
GRCh37 gnomad_exomes r2.1.1
GRCh37 gnomad_genomes r2.1.1
GRCh37 hgmd_public for-testing
GRCh37 thousand_genomes v3.20101123
GRCh37 varfish-annotator 0.26-SNAPSHOT
GRCh37 varfish-annotator-db for-testing
@@ -0,0 +1 @@
case_id set_id sv_uuid refseq_gene_id refseq_transcript_id refseq_transcript_coding refseq_effect ensembl_gene_id ensembl_transcript_id ensembl_transcript_coding ensembl_effect
1 change: 1 addition & 0 deletions tests/hg19-chr22/Case_1_index.delly2.gts.tsv-expected
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
release chromosome chromosome_no bin chromosome2 chromosome_no2 bin2 pe_orientation start end start_ci_left start_ci_right end_ci_left end_ci_right case_id set_id sv_uuid caller sv_type sv_sub_type info num_hom_alt num_hom_ref num_het num_hemi_alt num_hemi_ref genotype
3 changes: 3 additions & 0 deletions tests/hg19-chr22/Case_1_index.delly2.vcf.gz
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/Case_1_index.delly2.vcf.gz.tbi
Git LFS file not shown
9 changes: 9 additions & 0 deletions tests/hg19-chr22/Case_1_index.gatk_hc.db-info.tsv-expected
@@ -0,0 +1,9 @@
genomebuild db_name release
GRCh37 clinvar for-testing
GRCh37 exac r1.0
GRCh37 gnomad_exomes r2.1.1
GRCh37 gnomad_genomes r2.1.1
GRCh37 hgmd_public for-testing
GRCh37 thousand_genomes v3.20101123
GRCh37 varfish-annotator 0.26-SNAPSHOT
GRCh37 varfish-annotator-db for-testing
167 changes: 167 additions & 0 deletions tests/hg19-chr22/Case_1_index.gatk_hc.gts.tsv-expected

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions tests/hg19-chr22/Case_1_index.gatk_hc.vcf.gz
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/Case_1_index.gatk_hc.vcf.gz.tbi
Git LFS file not shown
1 change: 1 addition & 0 deletions tests/hg19-chr22/Clinvar.tsv
@@ -0,0 +1 @@
release chromosome start end bin reference alternative clinvar_version set_type variation_type symbols hgnc_ids vcv summary_clinvar_review_status_label summary_clinvar_pathogenicity_label summary_clinvar_pathogenicity summary_clinvar_gold_stars summary_paranoid_review_status_label summary_paranoid_pathogenicity_label summary_paranoid_pathogenicity summary_paranoid_gold_stars details
Binary file added tests/hg19-chr22/Clinvar.tsv.gz
Binary file not shown.
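`build.sh` first builds a sorted, tabix-indexed copy of `/tmp/Clinvar.tsv` (and likewise of `/tmp/HgmdPublicLocus.tsv`) so that it can extract the ADA2 and GAB4 regions with `tabix`. The index is created with `tabix -S 1 -s 2 -b 3 -e 4`: skip one header line and take the sequence name, start and end from columns 2, 3 and 4, which are `chromosome`, `start` and `end` in the header above. tabix needs the body sorted by these columns while the header stays on top. The sketch below wraps the script's `(head -n 1 ...; tail -n +2 ... | sort ...)` idiom in a function; the name `sort_tsv_keep_header` is hypothetical, and it assumes tab-separated input.

```shell
# Hypothetical helper: print the header line of a tab-separated file, then
# its remaining lines sorted by chromosome (column 2, version order),
# start (column 3, numeric) and end (column 4, numeric).
sort_tsv_keep_header() {
  local in=$1 tab
  tab=$(printf '\t')
  head -n 1 "$in"
  tail -n +2 "$in" | sort -t "$tab" -k2,2V -k3,3n -k4,4n
}

# Example (requires htslib's bgzip and tabix):
#   sort_tsv_keep_header /tmp/Clinvar.tsv | bgzip -c > /tmp/Clinvar.tsv.gz
#   tabix -S 1 -s 2 -b 3 -e 4 -f /tmp/Clinvar.tsv.gz
```

Passing an explicit tab separator to `sort` keeps the column numbering stable even if later columns (such as `summary_clinvar_review_status_label`) contain spaces.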
3 changes: 3 additions & 0 deletions tests/hg19-chr22/ExAC.r1.sites.vep.vcf.gz
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/ExAC.r1.sites.vep.vcf.gz.tbi
Git LFS file not shown
Binary file added tests/hg19-chr22/HgmdPublicLocus.tsv.gz
Binary file not shown.
119 changes: 119 additions & 0 deletions tests/hg19-chr22/build.sh
@@ -0,0 +1,119 @@
#!/usr/bin/env bash

set -euo pipefail
set -x

if [[ ! -e chr22.fa ]]; then
wget -O chr22.fa.gz.tmp https://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/chr22.fa.gz
zcat chr22.fa.gz.tmp >chr22.fa.tmp
mv chr22.fa.tmp chr22.fa
fi

if [[ ! -e chr22_part.fa.fai ]]; then
samtools faidx chr22.fa chr22:1-22,000,000 > chr22_part.fa.tmp
perl -p -i -e 's/^>chr.*/>22/g' chr22_part.fa.tmp
gzip -c chr22_part.fa.tmp >chr22_part.fa.gz
mv chr22_part.fa.tmp chr22_part.fa
samtools faidx chr22_part.fa
fi

# ADA2(hg19): 22:17,660,192-17,680,545
# GAB4(hg19): 22:17,442,827-17,489,112

if [[ ! -e hg19_refseq.ser ]]; then
jannovar -Xmx4096m download -d hg19/refseq --gene-ids 51816 128954 # ADA2 GAB4 (Entrez Gene IDs)
cp data/hg19_refseq.ser hg19_refseq.ser.tmp
mv hg19_refseq.ser.tmp hg19_refseq.ser
fi

if [[ ! -e hg19_ensembl.ser ]]; then
jannovar -Xmx4096m download -d hg19/ensembl --gene-ids ENSG00000093072 ENSG00000215568 # ADA2 GAB4 (Ensembl gene IDs)
cp data/hg19_ensembl.ser hg19_ensembl.ser.tmp
mv hg19_ensembl.ser.tmp hg19_ensembl.ser
fi

REGIONS="22:17,660,192-17,680,545 22:17,442,827-17,489,112"
# Site-specific location of the varfish-db-downloader output; adjust to your environment.
BASEDIR=/fast/groups/cubi/work/projects/2021-07-20_varfish-db-downloader-holtgrewe/varfish-db-downloader

( \
tabix --only-header $BASEDIR/GRCh37/ExAC/r1/download/ExAC.r1.sites.vep.vcf.gz $REGIONS; \
tabix $BASEDIR/GRCh37/ExAC/r1/download/ExAC.r1.sites.vep.vcf.gz $REGIONS \
| sort -k1,1V -k2,2n \
| uniq; \
) \
| bgzip -c \
> ExAC.r1.sites.vep.vcf.gz
tabix -f ExAC.r1.sites.vep.vcf.gz

( \
tabix --only-header $BASEDIR/GRCh37/gnomAD_exomes/r2.1.1/download/gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS; \
tabix $BASEDIR/GRCh37/gnomAD_exomes/r2.1.1/download/gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS \
| sort -k1,1V -k2,2n \
| uniq; \
) \
| bgzip -c \
> gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz
tabix -f gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz

( \
tabix --only-header $BASEDIR/GRCh37/gnomAD_genomes/r2.1.1/download/gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS; \
tabix $BASEDIR/GRCh37/gnomAD_genomes/r2.1.1/download/gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS \
| sort -k1,1V -k2,2n \
| uniq; \
) \
| bgzip -c \
> gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz
tabix -f gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz

( \
tabix --only-header $BASEDIR/GRCh37/thousand_genomes/phase3/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz; \
tabix $BASEDIR/GRCh37/thousand_genomes/phase3/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz $REGIONS \
| sort -k1,1V -k2,2n \
| uniq; \
) \
| bgzip -c \
> ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz
tabix -f ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz

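# /tmp/Clinvar.tsv and /tmp/HgmdPublicLocus.tsv must already exist; this script does not create them.
(head -n 1 /tmp/Clinvar.tsv; tail -n +2 /tmp/Clinvar.tsv | sort -k2,2V -k3,3n -k4,4n) \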
| bgzip -c >/tmp/Clinvar.tsv.gz
tabix -S 1 -b 3 -e 4 -s 2 -f /tmp/Clinvar.tsv.gz
head -n 1 /tmp/Clinvar.tsv >Clinvar.tsv.tmp
tabix /tmp/Clinvar.tsv.gz $REGIONS \
>>Clinvar.tsv.tmp
gzip Clinvar.tsv.tmp
mv Clinvar.tsv.tmp.gz Clinvar.tsv.gz

(head -n 1 /tmp/HgmdPublicLocus.tsv; tail -n +2 /tmp/HgmdPublicLocus.tsv | sort -k2,2V -k3,3n -k4,4n) \
| bgzip -c >/tmp/HgmdPublicLocus.tsv.gz
tabix -S 1 -b 3 -e 4 -s 2 -f /tmp/HgmdPublicLocus.tsv.gz
head -n 1 /tmp/HgmdPublicLocus.tsv >HgmdPublicLocus.tsv.tmp
tabix /tmp/HgmdPublicLocus.tsv.gz $REGIONS \
>>HgmdPublicLocus.tsv.tmp
gzip HgmdPublicLocus.tsv.tmp
mv HgmdPublicLocus.tsv.tmp.gz HgmdPublicLocus.tsv.gz

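# Site-specific location of the VarFish course data (snappy-processing output); adjust to your environment.
BASEDIR=/fast/groups/cubi/work/projects/2022-07-06_VarFish_Course_Data/snappy-processing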
VCF=$BASEDIR/variant_calling/output/bwa.gatk_hc.Case_1_index-N1-DNA1-WGS1/out/bwa.gatk_hc.Case_1_index-N1-DNA1-WGS1.vcf.gz

( \
tabix --only-header $VCF $REGIONS; \
tabix $VCF $REGIONS \
| sort -k1,1V -k2,2n \
| uniq; \
) \
| bgzip -c \
> Case_1_index.gatk_hc.vcf.gz
tabix -f Case_1_index.gatk_hc.vcf.gz

VCF=$BASEDIR/wgs_sv_calling/output/bwa.delly2.Case_1_index-N1-DNA1-WGS1/out/bwa.delly2.Case_1_index-N1-DNA1-WGS1.vcf.gz

( \
tabix --only-header $VCF $REGIONS; \
tabix $VCF $REGIONS \
| sort -k1,1V -k2,2n \
| uniq; \
) \
| bgzip -c \
> Case_1_index.delly2.vcf.gz
tabix -f Case_1_index.delly2.vcf.gz
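`build.sh` repeats one pattern six times: copy the VCF header, fetch the records in `$REGIONS`, sort and de-duplicate them, then bgzip-compress and index the result. The sort/`uniq` step is needed because tabix prints the records of each region in the order the regions are given (here ADA2 before GAB4, although GAB4 lies upstream), and a record overlapping more than one region can be printed more than once. A possible refactoring into functions is sketched below; `sort_vcf_body` and `subset_vcf` are hypothetical names, not part of the commit.

```shell
# Sort VCF body lines by chromosome (version order) and position,
# dropping exact duplicate lines (stdin -> stdout).
sort_vcf_body() {
  sort -k1,1V -k2,2n | uniq
}

# Hypothetical helper: write the header of IN plus its records overlapping
# the given regions to the bgzip-compressed file OUT, then index OUT.
# Usage: subset_vcf IN OUT REGION [REGION ...]   (requires htslib tabix/bgzip)
subset_vcf() {
  local in=$1 out=$2
  shift 2
  {
    tabix --only-header "$in"
    tabix "$in" "$@" | sort_vcf_body
  } | bgzip -c > "$out"
  tabix -f "$out"
}
```

With these, each block in `build.sh` would shrink to a single line such as `subset_vcf "$BASEDIR/GRCh37/ExAC/r1/download/ExAC.r1.sites.vep.vcf.gz" ExAC.r1.sites.vep.vcf.gz $REGIONS`, with `$REGIONS` left unquoted so that it splits into separate region arguments.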
3 changes: 3 additions & 0 deletions tests/hg19-chr22/chr22_part.fa.fai
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/chr22_part.fa.gz
Git LFS file not shown
12 changes: 12 additions & 0 deletions tests/hg19-chr22/db-info.txt-expected
@@ -0,0 +1,12 @@
Table clinvar_var
TABLE RELEASE CHROM COUNT
clinvar_var GRCh37 22 124
Table hgmd_locus
TABLE RELEASE CHROM COUNT
hgmd_locus GRCh37 22 11
Table gnomad_exome_var
TABLE RELEASE CHROM COUNT
gnomad_exome_var GRCh37 22 1437
Table gnomad_genome_var
TABLE RELEASE CHROM COUNT
gnomad_genome_var GRCh37 22 6072
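The `db-stats --parseable` output stored in `db-info.txt-expected` is made of one block per table: a `Table <name>` line, a `TABLE RELEASE CHROM COUNT` header line, and then data rows (here a single GRCh37/chromosome 22 row per table). If you only want the per-table record counts, a small awk filter is enough. This is a sketch that assumes exactly the whitespace-separated layout shown above; `summarize_db_stats` is a hypothetical name.

```shell
# Hypothetical helper: print "<table> <count>" for each data row of
# `db-stats --parseable` output, skipping the "Table ..." and header lines.
# Reads the files given as arguments, or stdin if there are none.
summarize_db_stats() {
  awk '$1 == "Table" || $1 == "TABLE" { next } NF == 4 { print $1, $4 }' "$@"
}

# Example:
#   summarize_db_stats db-info.txt-expected
```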
3 changes: 3 additions & 0 deletions tests/hg19-chr22/gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz.tbi
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz
Git LFS file not shown
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/hg19_ensembl.ser
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/hg19-chr22/hg19_refseq.ser
Git LFS file not shown
90 changes: 90 additions & 0 deletions tests/hg19-chr22/run-tests.sh
@@ -0,0 +1,90 @@
#!/usr/bin/env bash

set -euo pipefail
set -x

JAR=$(ls ../../varfish-annotator-cli/target/varfish-annotator-cli-*.jar | grep -v sources | tail -n 1)

## step 0: help

java -jar $JAR --help >/tmp/the-output
test -s /tmp/the-output

set +e
java -jar $JAR > /tmp/the-output
retcode=$?
set -e
test 1 -eq $retcode
test -s /tmp/the-output

## step 1: init-db

java -jar $JAR init-db \
--release GRCh37 \
--db-release-info varfish-annotator:main \
--db-release-info varfish-annotator-db:for-testing \
--db-path /tmp/out \
\
--ref-path chr22_part.fa \
\
--db-release-info exac:r1.0 \
--exac-path ExAC.r1.sites.vep.vcf.gz \
\
--db-release-info thousand_genomes:v3.20101123 \
--thousand-genomes-path ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz \
\
--db-release-info clinvar:for-testing \
--clinvar-path Clinvar.tsv.gz \
\
--db-release-info gnomad_exomes:r2.1.1 \
--gnomad-exomes-path gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz \
\
--db-release-info gnomad_genomes:r2.1.1 \
--gnomad-genomes-path gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz \
\
--db-release-info hgmd_public:for-testing \
--hgmd-public HgmdPublicLocus.tsv.gz

## step 2: db-stats

java -jar $JAR db-stats --db-path /tmp/out.h2.db --parseable \
> /tmp/db-info.txt
diff /tmp/db-info.txt db-info.txt-expected

## step 3: annotate

java -jar $JAR annotate \
--release GRCh37 \
--input-vcf Case_1_index.gatk_hc.vcf.gz \
--output-gts /tmp/Case_1_index.gatk_hc.gts.tsv \
--output-db-info /tmp/Case_1_index.gatk_hc.db-info.tsv \
--ref-path chr22_part.fa \
--refseq-ser-path hg19_refseq.ser \
--ensembl-ser-path hg19_ensembl.ser \
--db-path /tmp/out.h2.db \
--self-test-chr22-only

diff /tmp/Case_1_index.gatk_hc.gts.tsv Case_1_index.gatk_hc.gts.tsv-expected
diff /tmp/Case_1_index.gatk_hc.db-info.tsv Case_1_index.gatk_hc.db-info.tsv-expected

## step 4: annotate-svs

java -jar $JAR annotate-svs \
--release GRCh37 \
--input-vcf Case_1_index.delly2.vcf.gz \
--output-gts /tmp/Case_1_index.delly2.gts.tsv \
--output-feature-effects /tmp/Case_1_index.delly2.feature-effects.tsv \
--output-db-info /tmp/Case_1_index.delly2.db-info.tsv \
--refseq-ser-path hg19_refseq.ser \
--ensembl-ser-path hg19_ensembl.ser \
--db-path /tmp/out.h2.db \
--self-test-chr22-only

diff /tmp/Case_1_index.delly2.gts.tsv Case_1_index.delly2.gts.tsv-expected
diff /tmp/Case_1_index.delly2.db-info.tsv Case_1_index.delly2.db-info.tsv-expected
diff /tmp/Case_1_index.delly2.feature-effects.tsv Case_1_index.delly2.feature-effects.tsv-expected

## if we reach here, everything is fine

echo "-- ALL TESTS PASSED --"
exit 0
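`run-tests.sh` runs under `set -e`, so the first failing `diff` aborts the script and later comparisons are never reported. If you would rather see every mismatch in one run, the comparisons could go through a helper that records failures and exits only at the end. A minimal sketch; `check_expected` and `FAILED` are hypothetical names, not part of the commit.

```shell
FAILED=0

# Hypothetical helper: compare ACTUAL against EXPECTED, print a unified diff
# on mismatch (or if a file is missing) and remember the failure instead of
# aborting the script. Safe under `set -e` because diff runs as an `if` condition.
check_expected() {
  local actual=$1 expected=$2
  if ! diff -u "$expected" "$actual"; then
    echo "MISMATCH: $actual vs. $expected" >&2
    FAILED=1
  fi
}

# Usage in run-tests.sh:
#   check_expected /tmp/db-info.txt db-info.txt-expected
#   ...
#   if [ "$FAILED" -ne 0 ]; then echo "-- SOME TESTS FAILED --"; exit 1; fi
```

The helper is called in the current shell (not in a subshell), so the updated `FAILED` value is visible to the final check.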
@@ -30,6 +30,9 @@ public static void main(String[] args) {
if ((args == null || args.length == 0)) {
jc.usage();
System.exit(1);
} else if (args.length == 1 && "--help".equals(args[0])) {
jc.usage();
System.exit(0);
}

try {
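With this change, `--help` exits with status 0, while running the CLI without arguments still prints the usage and exits with status 1; step 0 of `run-tests.sh` checks the latter by temporarily switching off `set -e`. An alternative idiom that keeps `set -e` active is to capture the status in an `||` branch, sketched below as a hypothetical `expect_exit_code` helper (not part of the commit).

```shell
# Hypothetical helper: run a command and succeed only if it exits with the
# expected status. Safe under `set -e`, because a command on the left of
# `||` does not trigger errexit.
expect_exit_code() {
  local expected=$1
  shift
  local actual=0
  "$@" || actual=$?
  if [ "$actual" -ne "$expected" ]; then
    echo "expected exit code $expected, got $actual: $*" >&2
    return 1
  fi
}

# Example, mirroring step 0 of run-tests.sh:
#   expect_exit_code 0 java -jar "$JAR" --help >/tmp/the-output
#   expect_exit_code 1 java -jar "$JAR" >/tmp/the-output
```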