# Basecaller

This script performs basecalling to convert fast5 files to fastq files.

# Linode GPU Setup

- Create a GPU VM with Ubuntu 18.04
- Check GPU is available
``` 
ssh root@69.164.211.148
lspci -vnn | grep NVIDIA
```
- Harden the server
- Install the build tools, then download and run the CUDA 11.3.1 installer, which bundles the NVIDIA driver
```
sudo apt-get install build-essential
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run
```
- When the installer prompts, select only the NVIDIA driver out of the 5 installation options
- Check that the driver installed successfully
```
nvidia-smi
```
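The driver check above can also be scripted so that later steps stop early when the GPU is not usable. This is a minimal sketch: `gpu_ready` is a hypothetical helper name, not part of this repository, and it only assumes that `nvidia-smi` is on the `PATH` once the driver is installed.

```shell
#!/usr/bin/env bash
# Sketch: fail early when the NVIDIA driver is not usable.
# gpu_ready is a hypothetical helper name, not part of this repository.
gpu_ready() {
    # nvidia-smi only exists on PATH after the driver has been installed.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: driver not installed" >&2
        return 1
    fi
    # Print the GPU model and driver version; nvidia-smi exits non-zero
    # when the driver cannot communicate with the GPU.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

A typical call would be `gpu_ready || exit 1` at the top of a setup script.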

# Data, GitHub repository and Docker image

### Clone Bioliquid Nanopore repository
```
cd ~
git clone https://github.com/aretiandev/bioliquid-nanopore.git
```

### Get data
The fast5 files should sit in a `fast5` folder inside `~/data`.
```
mkdir ~/data
cd ~/data
s3cmd get -r [data_filepath]
```
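Before basecalling, it can help to confirm that the download produced the expected layout. The sketch below counts the fast5 files under `~/data/fast5`. `count_fast5` is a hypothetical helper, and the `.fast5` extension and the `fast5` folder name are assumptions taken from the description above.

```shell
#!/usr/bin/env bash
# Sketch: confirm that <data_dir>/fast5 exists and count the fast5 files in it.
# count_fast5 is a hypothetical helper; data_dir defaults to ~/data.
count_fast5() {
    local data_dir="${1:-$HOME/data}"
    if [ ! -d "$data_dir/fast5" ]; then
        echo "missing folder: $data_dir/fast5" >&2
        return 1
    fi
    # Search recursively, since the download may contain subfolders.
    find "$data_dir/fast5" -type f -name '*.fast5' | wc -l | tr -d ' '
}
```

Running `count_fast5 ~/data` prints the number of files found, or fails if the folder is missing.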

### Docker
- Install Docker
- Pull image
``` 
docker pull yufernando/bioaretian:guppy-gpu
```
- Run image
```
cd ~
docker run -d -v "$PWD":/home/jovyan/work -e GRANT_SUDO=yes --user root --name bioaretian yufernando/bioaretian:guppy-gpu
```
- Step into container
```
docker exec -it bioaretian /bin/bash
```


# Run Guppy

To run the Guppy basecaller, execute the `basecaller.sh` script; on a GPU instance, pass the `gpu` option. The full set of options is:
- `gpu`: Guppy on GPU with the high-accuracy algorithm
- `fast`: Guppy on CPU with the fast algorithm
- No option: Guppy on CPU with the high-accuracy algorithm

```
./basecaller.sh gpu
```
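To illustrate how the three options could map to Guppy arguments, here is a hypothetical sketch of the mode dispatch. It is not the contents of the real `basecaller.sh`, which may differ. The function name `guppy_mode_args`, the config file names, and the `fast5` and `basecall` paths are all assumptions; the configs available in a given installation can be listed with `guppy_basecaller --print_workflows`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the mode dispatch; the real basecaller.sh may differ.
# Config names are assumptions: check `guppy_basecaller --print_workflows`.
HAC_CONFIG="dna_r10.3_450bps_hac.cfg"
FAST_CONFIG="dna_r10.3_450bps_fast.cfg"

# Print the guppy_basecaller arguments for a mode: gpu, fast, or empty.
guppy_mode_args() {
    local mode="${1:-}"
    case "$mode" in
        gpu)  echo "--config $HAC_CONFIG --device cuda:0" ;;  # GPU, high accuracy
        fast) echo "--config $FAST_CONFIG" ;;                 # CPU, fast
        "")   echo "--config $HAC_CONFIG" ;;                  # CPU, high accuracy
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Dry run: print the command instead of executing it.
# The input and output paths are placeholders.
echo guppy_basecaller --input_path fast5 --save_path basecall $(guppy_mode_args gpu)
```

Printing the command first makes it easy to review the arguments before committing to a long basecalling run.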

# Concatenate

Concatenate all reads into a single fastq file.

```
cat ~/work/basecall-fast-0531/pass/*fastq > ~/work/basecall-fast-0531/basecall-0531.fastq
```
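A quick sanity check after concatenating is to compare read counts before and after. The sketch below assumes standard 4-line FASTQ records, with sequences not wrapped across lines; `count_reads` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count FASTQ reads, assuming every record spans exactly 4 lines.
# count_reads is a hypothetical helper, not part of this repository.
count_reads() {
    # NR accumulates across all files given on the command line.
    awk 'END { print int(NR / 4) }' "$@"
}

# Example usage (paths from the concatenation step above):
# parts=$(count_reads ~/work/basecall-fast-0531/pass/*fastq)
# merged=$(count_reads ~/work/basecall-fast-0531/basecall-0531.fastq)
# [ "$parts" -eq "$merged" ] && echo "OK: $merged reads"
```

If the two counts differ, the merged file is probably truncated or was written while basecalling was still in progress.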

# Notes

Notes from Marc Tormo Puiggros. 

```
### Basecalling for each run: guppy/4.5.2-cpu
guppy_basecaller  --input_path $path --save_path BaseCall --flowcell FLO-MIN111 --kit SQK-LSK110 --min_qscore 7 -r --fast5_out --records_per_fastq 0 --cpu_threads_per_caller 8 --num_callers 4

### Merge the fastq files from the 3 runs:
cat /scratch/lab_genomica/mtormo/202103*/BaseCall/pass/*fastq > bioliquid_3runs.fastq

### Mapping (tested with two versions, 2.11 and 2.18, with identical results): minimap2/2.11-foss-2016b and the Ensembl GRCh38 assembly
minimap2 -x map-ont -t 32 -a /homes/users/mtormo/lab_genomica/Genomes/hsapiens_hg38-GRCh38_ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.mmi bioliquid_3runs.fastq > bioliquid_3runs.sam

### Convert to BAM and sort: SAMtools/1.6-foss-2016b
samtools view -bSh bioliquid_3runs.sam > bioliquid_3runs.bam
samtools sort -@ 32 bioliquid_3runs.bam > bioliquid_3runs.sort.bam
samtools index bioliquid_3runs.sort.bam
samtools flagstat bioliquid_3runs.sort.bam > bioliquid_3runs.sort.bam.flag

# High-accuracy model: use it when you want higher quality, e.g. when you are interested in SNPs
```
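The last step of the notes writes a `samtools flagstat` report. As a minimal sketch, the mapped-read percentage can be pulled out of that report for a quick quality check. `mapped_percent` is a hypothetical helper, and it assumes flagstat's default text format, where the mapped line reads like `N + M mapped (P% : N/A)`.

```shell
#!/usr/bin/env bash
# Sketch: extract the mapped percentage from a samtools flagstat report.
# mapped_percent is a hypothetical helper; it assumes the default text format.
mapped_percent() {
    # Match only the line whose 4th field is exactly "mapped", which skips
    # lines such as "primary mapped" or "with mate mapped to a different chr".
    awk '$4 == "mapped" { gsub(/[(%]/, "", $5); print $5; exit }' "$1"
}

# Example usage (file name from the notes above):
# mapped_percent bioliquid_3runs.sort.bam.flag
```

A low percentage usually points to a wrong reference or poor-quality reads, so it is worth checking before any downstream analysis.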
