
# <span style="color:#006E7F">__Introduction to Oxford Nanopore Data Analysis__ <a class="anchor"></span>  


Created by J. Orjuela (DIADE-IRD), F. Sabot (DIADE-IRD) and G. Sarah (AGAP-INRAE) - September 2021, SouthGreen training

Adapted by J. Orjuela (DIADE-IRD), F. Sabot (DIADE-IRD) - November 2022

# <span style="color:#006E7F">__TP4 - VARIANTS DETECTION__ <a class="anchor" id="data"></span>  
    
# <span style="color: #4CACBC;"> Structural variation with Sniffles</span>  

Sniffles is a structural variation (SV) caller for third-generation sequencing data (PacBio or Oxford Nanopore).

It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
    
Check the Sniffles website https://github.com/fritzsedlazeck/Sniffles/ and its wiki for more details.

## Prepare data

In [3]:
# download the FASTQ files of all clones
cd ~/work/DATA
# download the compressed archive containing all clones
wget --no-check-certificate -rm -nH --cut-dirs=1 --reject="index.html*" https://itrop.ird.fr/ont-training/all_clones_short.tar.gz

--2022-11-10 21:38:45--  https://itrop.ird.fr/ont-training/all_clones_short.tar.gz
Resolving itrop.ird.fr (itrop.ird.fr)... 91.203.35.184
Connecting to itrop.ird.fr (itrop.ird.fr)|91.203.35.184|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 842363643 (803M) [application/x-gzip]
Saving to: ‘all_clones_short.tar.gz’


2022-11-10 21:39:02 (45.3 MB/s) - ‘all_clones_short.tar.gz’ saved [842363643/842363643]

FINISHED --2022-11-10 21:39:02--
Total wall clock time: 18s
Downloaded: 1 files, 803M in 18s (45.3 MB/s)


In [4]:
#decompress it
cd ~/work/DATA
tar zxvf all_clones_short.tar.gz

all_clones_short/
all_clones_short/Clone2.fastq.gz
all_clones_short/Clone6.fastq.gz
all_clones_short/Clone10.fastq.gz
all_clones_short/Clone15.fastq.gz
all_clones_short/Clone18.fastq.gz
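
Once the archive is unpacked, it can be useful to know how many reads each clone contains before mapping. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequences); the function names `count_reads` and `count_reads_all` are made up for this example:

```shell
# Hypothetical helper: number of reads in a gzipped FASTQ file,
# assuming each record spans exactly four lines.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical helper: print one "file <tab> read count" line per FASTQ given as argument.
count_reads_all() {
    for fq in "$@"; do
        printf '%s\t%s\n' "$(basename "$fq")" "$(count_reads "$fq")"
    done
}
```

In this session it could be called as `count_reads_all ~/work/DATA/all_clones_short/*.fastq.gz`.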


In [11]:
# create SNIFFLES folder
mkdir -p ~/work/RESULTS/SNIFFLES/
cd  ~/work/RESULTS/SNIFFLES/

# declare your Clone
CLONE="Clone10"

# symbolic links of reference 
ln -s /home/jovyan/work/DATA/${CLONE}/reference.fasta .
REF="reference.fasta"

ln: failed to create symbolic link './reference.fasta': File exists
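
The `File exists` message above only means that the link was already created by an earlier run of the cell, so it is harmless. A minimal sketch of a re-runnable variant, assuming the same paths as above; the helper name `link_reference` is hypothetical, and `ln -sf` simply replaces an existing link instead of failing:

```shell
# Hypothetical helper: (re)create a symbolic link to the reference in a working directory.
# $1 = path to the reference FASTA, $2 = working directory
link_reference() {
    mkdir -p "$2"
    # -s: symbolic link, -f: replace an existing link instead of failing
    ln -sf "$1" "$2/reference.fasta"
}
```

For this session it would be called as `link_reference ~/work/DATA/${CLONE}/reference.fasta ~/work/RESULTS/SNIFFLES`.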


In [12]:
ls ~/work/DATA/all_clones_short/

[0m[01;32mClone10.fastq.gz[0m  [01;32mClone18.fastq.gz[0m  [01;32mClone6.fastq.gz[0m
[01;32mClone15.fastq.gz[0m  [01;32mClone2.fastq.gz[0m


# <span style="color: #4CACBC;">1. Mapping and SV detection for all CLONES</span>  



### Obtain calls for each sample

Call SV candidates and create an associated .snf file for each sample:

`sniffles --input sample1.bam --snf sample1.snf`


In [46]:
for i in {2,6,10,15,18}
    do
      cd  ~/work/RESULTS/SNIFFLES/
      printf '\n\n============ Clone%s ==============\n\n' "$i"
      CLONE="Clone${i}" # sample name, used to build all file names below
      REF="reference.fasta"
      ONT="/home/jovyan/work/DATA/all_clones_short/${CLONE}.fastq.gz"
      ## Mapping ONT reads (clone) against the reference with minimap2
      ## The read group is in double quotes so that ${CLONE} is expanded by the shell
      ## -a makes minimap2 write SAM, hence the .sam extension
      minimap2 -t 4 -ax map-ont --MD  -R "@RG\tID:${CLONE}\tSM:${CLONE}" ${REF} ${ONT} > ${CLONE}.sam
      ## Sort the alignments and write them as BAM
      samtools sort -@4 -o ${CLONE}_SORTED.bam ${CLONE}.sam
      # index the sorted BAM
      samtools index -@4 ${CLONE}_SORTED.bam
      # Call SV candidates for this sample and write its .snf file
      sniffles -t 4 -i ${CLONE}_SORTED.bam --snf ${CLONE}.snf --allow-overwrite   > ${CLONE}_SV.log
    done

# The option names below are those of Sniffles 1; Sniffles2 (used here) renames several of them, so check `sniffles --help`.
# -s/--min_support	Minimum number of reads that support an SV to be reported. Default: 10
# -l/--min_length	Minimum length of an SV to be reported. Default: 30bp
# -q/--minmapping_qual	Minimum mapping quality of an alignment to be taken into account. Default: 20
# -r/--min_seq_size	Discard a read if none of its segments is larger than this. Default: 2kb

[M::mm_idx_gen::0.036*2.27] collected minimizers
[M::mm_idx_gen::0.045*2.59] sorted minimizers
[M::main::0.045*2.59] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.049*2.46] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.054*2.34] distinct minimizers: 165344 (91.75% are singletons); average occurrences: 1.156; average spacing: 5.336
[M::worker_pipeline::15.786*3.40] mapped 10241 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -t 4 -ax map-ont --MD -R @RG\tID:${CLONE}\tSM:${CLONE} reference.fasta /home/jovyan/work/DATA/all_clones_short/Clone2.fastq.gz
[M::main] Real time: 15.793 sec; CPU: 53.678 sec; Peak RSS: 0.410 GB
[bam_sort_core] merging from 0 files and 4 in-memory blocks...
[M::mm_idx_gen::0.028*2.35] collected minimizers
[M::mm_idx_gen::0.046*2.93] sorted minimizers
[M::main::0.046*2.93] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.049*2.80] mid_occ = 10
[M::mm_idx_st
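
The `CMD` line in the minimap2 log above was produced with the read group written in single quotes, which is why it still shows a literal `${CLONE}`. The shell expands variables inside double quotes but not inside single quotes, so the quoting of the `-R` string decides whether the sample name reaches minimap2. A small illustration using only shell built-ins (the variable names `RG_SINGLE` and `RG_DOUBLE` are ours):

```shell
CLONE="Clone2"

# Single quotes: the text is passed on literally, ${CLONE} is NOT expanded
RG_SINGLE='@RG\tID:${CLONE}\tSM:${CLONE}'
# Double quotes: ${CLONE} is expanded, while \t stays a two-character escape
# that is left for minimap2 to interpret
RG_DOUBLE="@RG\tID:${CLONE}\tSM:${CLONE}"

printf '%s\n' "$RG_SINGLE"
printf '%s\n' "$RG_DOUBLE"
```

Only the second form carries the clone name in the `ID` and `SM` fields of the read group.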

### Count the number of variations

How many SVs were found for each clone?

Check the log files!
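
One way to skim all logs at once is sketched here. It assumes that Sniffles reports its SV counts on lines containing the string "SV"; the exact wording depends on the Sniffles version, so adapt the pattern if needed. The helper name `show_sv_log_lines` is hypothetical:

```shell
# Hypothetical helper: print the last SV-related lines of each Sniffles log.
# $1 = directory containing the Clone*_SV.log files (default: current directory)
show_sv_log_lines() {
    for log in "${1:-.}"/Clone*_SV.log; do
        [ -e "$log" ] || continue          # no log found: skip the unmatched pattern
        echo "== $(basename "$log") =="
        grep -i 'SV' "$log" | tail -n 3    # last three lines mentioning SVs
    done
}
```

Here it would be called as `show_sv_log_lines ~/work/RESULTS/SNIFFLES`.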

### Create a variable containing the snf files names

In [48]:
SNFS=""
for i in {2,6,10,15,18}; do SNFS="$SNFS Clone${i}.snf"; done
echo $SNFS

Clone2.snf Clone6.snf Clone10.snf Clone15.snf Clone18.snf
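
If one of the per-sample runs failed, its `.snf` file may be missing or empty, and the merge step would then stop or silently lack a sample. A minimal sketch of a check that keeps only the usable files; the helper name `existing_snfs` is hypothetical and the file naming (`CloneN.snf`) is the one used above:

```shell
# Hypothetical helper: print the expected .snf files that exist and are non-empty,
# and warn on stderr about the others.
# $1 = directory with the .snf files, remaining arguments = clone numbers
existing_snfs() {
    local dir found f i
    dir="$1"; shift
    found=""
    for i in "$@"; do
        f="$dir/Clone${i}.snf"
        if [ -s "$f" ]; then
            found="$found $f"
        else
            echo "WARNING: $f is missing or empty" >&2
        fi
    done
    echo $found    # unquoted on purpose: drops the leading space
}
```

It could replace the loop above with `SNFS=$(existing_snfs . 2 6 10 15 18)`.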


# <span style="color: #4CACBC;"> 2. Merge the .snf files of all samples into a single VCF</span>  

Run combined calling on multiple .snf files to produce a single multi-sample .vcf:

`sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf`

In [50]:
sniffles --input $SNFS --vcf multisample.vcf --allow-overwrite

Running Sniffles2, build 2.0.7
  Run Mode: combine
  Start on: 2022/11/10 22:37:09
  Working dir: /home/jovyan/work/RESULTS/SNIFFLES
  Used command: /opt/conda/bin/sniffles --input Clone2.snf Clone6.snf Clone10.snf Clone15.snf Clone18.snf --vcf multisample.vcf --allow-overwrite
Opening for writing: multisample.vcf (multi-sample, sorted)
Verified headers for 5 .snf files.
The following samples will be processed in multi-calling:
    Clone2.snf (sample ID in output VCF='Clone2')
    Clone6.snf (sample ID in output VCF='Clone6')
    Clone10.snf (sample ID in output VCF='Clone10')
    Clone15.snf (sample ID in output VCF='Clone15')
    Clone18.snf (sample ID in output VCF='Clone18')

Calling SVs across 5 samples (695 candidates total)...

(Estimating progress); 1/1 tasks done; parallel 0/4; 237 SVs. 
Took 0.08s.

Done.
Wrote 237 called SVs to multisample.vcf (multi-sample, sorted)
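
The number of records in the VCF can be cross-checked against the count reported by Sniffles, and broken down by SV type. A minimal sketch using standard text tools only; both function names are hypothetical, and the `SVTYPE=` key is the one Sniffles writes in the INFO column:

```shell
# Hypothetical helper: total number of SV records (non-header lines) in a VCF file.
count_sv_records() {
    grep -vc '^#' "$1"
}

# Hypothetical helper: number of SV records per SVTYPE in a VCF file.
count_sv_types() {
    # keep data lines, take the INFO column (8th), extract the SVTYPE key, then tally
    grep -v '^#' "$1" | cut -f8 | grep -o 'SVTYPE=[A-Z]*' | sort | uniq -c
}
```

For `multisample.vcf`, `count_sv_records multisample.vcf` should agree with the 237 SVs reported in the log above.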


# Have a look at the VCF file

In [51]:
head -n 100 multisample.vcf | tail -n 5

Reference	189602	Sniffles2.INS.2EM0	N	AAGAAAATGAGAAAGGGCAGGATAGGGCACTGCTTGTAATCTTTACAGATGTTTTGTTAAACAAACATTATTGGTTCAGTAATGATGGTTCATTGTTGTATGATCATTGTACAAATTGCCTCTGATGCCTGGGCAAATTGACTTGATGGGATTTTGTGCTGTCAGTTTGTCAACTCTGCTGTTTATGGTCCTCCTAGTTTTGGTGCCATACCCGTTTATGATTGAGTACTTACTACTACTAAGTGACTGTATCAATGCAGGTAATAACATTGTTTCGATCAGATGTTTATCTTGACAAATAACTTTAATTAACAGATGATGTTGCCTGCCAAGTTTCTCGATATCTCTAATTCAACAGTAAGTAATTCGTTCCTGGCAAATATTCACTCAAATATCTTTTAACTAATTGGCTTATTGAACATAGTCAATTTTAAATTTTAAATTTCATTTTAGAGTTGATTTTATGATTTATTTTATCCTTTTTCTATCTTATCTTTAAAAACTACTAATAATACAAACTATAAATTTTAATCATATACTACTTCCTTTTAATAGATGACAGTGTTCACCTTTTGTCACACAGTTTGATAGTTTGCTTAATAAGTTTACGTAATTATAATTTATTTTTTATGAGTTGTTTTTATCACTCAAAAGTATTTTATGTATGACTTATTAATCTTATACATTTATATAAAATTTTAAATAAAACGCATTATTAAAAATGTGTTCAAAATTAACGGCTTCATCCGCTTAAAACGAGGGAGTATTACTTTAGTCGCTTGATTTATTCCTCTACGACTCATCAGATC	60	PASS	PRECISE;SVTYPE=INS;SVLEN=809;END=189602;SUPPORT=121;COVERAGE=140,121,121,121,137;STRAND=+;AC=2;STDEV_LEN=0;STDEV_POS=0;SUPP_VEC=00001	GT:GQ:DR:D
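
Raw records are hard to read because of the long inserted sequences. The sketch below prints a compact table with position, type, length and `SUPP_VEC` for every record. It assumes that `SUPP_VEC` holds one digit per sample, in the order of the sample columns, so that `00001` in the record above would mean the insertion is supported only by the last sample. The function name `vcf_sv_summary` is hypothetical:

```shell
# Hypothetical helper: print CHROM, POS, SVTYPE, SVLEN and SUPP_VEC for every record of a VCF.
vcf_sv_summary() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                       # skip header lines
        {
            svtype = "NA"; svlen = "NA"; supp = "NA"
            n = split($8, kv, ";")          # INFO is the 8th column, keys separated by ";"
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^SVTYPE=/)   svtype = substr(kv[i], 8)
                if (kv[i] ~ /^SVLEN=/)    svlen  = substr(kv[i], 7)
                if (kv[i] ~ /^SUPP_VEC=/) supp   = substr(kv[i], 10)
            }
            print $1, $2, svtype, svlen, supp
        }' "$1"
}
```

A call such as `vcf_sv_summary multisample.vcf | head` gives a quick overview without the sequence columns.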