# JM-lab virome pipeline: tutorial
This Jupyter notebook gives an overview of the commands needed for the primary analysis of raw NGS data, with explanations. It is intended as a learning tool for new PhD students, master's students, interns, etc.

**For submitting jobs to the HPC, see the _JM lab HPC notebook_.**

```bash
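# List the unique sample names (the part of each file name before the first underscore)
ls *.fastq.gz | cut -f1 -d '_' | sort -u > names.txt
# Concatenate the per-lane files of each sample into one R1 and one R2 file
while read -r line; do cat "${line}"_*_*_R1_*.fastq.gz > "${line}.R1.fastq.gz"; done < names.txt
while read -r line; do cat "${line}"_*_*_R2_*.fastq.gz > "${line}.R2.fastq.gz"; done < names.txt
# Remove the original per-lane files
rm *L00*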
``` 
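
The merging step above can be tried on throwaway data before running it on real reads. The following is a minimal sketch, assuming Illumina-style file names such as `sampleA_S1_L001_R1_001.fastq.gz`; the sample name `sampleA`, the read contents and the temporary directory are made up for illustration. It relies on the fact that concatenated gzip files form a valid gzip stream.

```bash
# Dry run of the lane-merging step in a throwaway directory.
# All file names and reads below are hypothetical test data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create two lanes of one fake sample, one read (4 FASTQ lines) per file.
for lane in L001 L002; do
    for read in R1 R2; do
        printf '@read_%s\nACGT\n+\nIIII\n' "$lane" \
            | gzip > "sampleA_S1_${lane}_${read}_001.fastq.gz"
    done
done

# Same merging logic, restricted to the R1 files when collecting sample names.
ls *_R1_*.fastq.gz | cut -f1 -d '_' | sort -u > names.txt
while read -r line; do
    cat "${line}"_*_*_R1_*.fastq.gz > "${line}.R1.fastq.gz"
    cat "${line}"_*_*_R2_*.fastq.gz > "${line}.R2.fastq.gz"
done < names.txt

# Each merged file should hold the reads of both lanes (2 reads = 8 lines).
r1_lines=$(gzip -dc sampleA.R1.fastq.gz | wc -l | tr -d ' ')
r2_lines=$(gzip -dc sampleA.R2.fastq.gz | wc -l | tr -d ' ')
echo "R1 lines: $r1_lines, R2 lines: $r2_lines"
```

If the R1 and R2 line counts differ for a real sample, a lane file is probably missing or misnamed, and it is worth checking this before removing the per-lane files.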

### Logging in to the teaching server
For this tutorial we can work on the teaching server of gbiomed (bmw.gbiomed.kuleuven.be). Everyone with a u- or r-number can connect to this server by ssh-ing to *__'your_r/u-number'@bmw.gbiomed.kuleuven.be__* and entering their intranet password.

Perform the following actions in the terminal.

In [None]:
ssh 'your_r/u-number'@bmw.gbiomed.kuleuven.be

Next, enter the password of your KU Leuven account.

## Installing all necessary software

### Miniconda
Miniconda is a minimal installer for conda, a package manager with which you can install a lot of (bioinformatics) software (link).
1. Create a new directory in your data folder and move into it:

In [None]:
cd ~/data
mkdir software
cd software
pwd

2. Download the Miniconda installer with `wget`. Next, run the installation script (`-b` runs the installer in batch mode, without prompts, and `-p` sets the path where Miniconda is installed). When Miniconda is installed, activate it by sourcing the initialization script. This simply sets a couple of shell environment variables and defines the `conda` command as a shell function. Finally, `conda init` modifies your shell startup file so that conda is also available in new terminal sessions. More information: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html

In [None]:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/luna.kuleuven.be/'your_u-number'/data/software/miniconda
source /home/luna.kuleuven.be/'your_u-number'/data/software/miniconda/bin/activate
conda init

3. When installing new software with conda, the best practice is to create a new conda environment for each part of a project you are working on.

In this tutorial we will run the virome pipeline, so we create one conda environment that contains all the software the pipeline needs. We then activate this environment to make the software available for use.


In [None]:
conda create -y --name virome_pipeline python
conda activate virome_pipeline
conda install -y -c conda-forge -c bioconda krona samtools bwa-mem2 bowtie2 spades trimmomatic bedtools
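
Once the environment is active, it can be useful to confirm that each tool is actually callable. The following is a minimal sketch using the shell built-in `command -v`. The helper name `check_tools` is hypothetical, and the executable names (for example `ktImportText` for Krona and `spades.py` for SPAdes) are assumptions about what the bioconda packages install, so adjust them if your installation differs.

```bash
# Report, for each given command name, whether it can be found on the PATH.
# check_tools is a hypothetical helper, not part of conda or the pipeline.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Return success only when every tool was found.
    [ "$missing" -eq 0 ]
}

# Assumed executable names for the packages installed above.
check_tools ktImportText samtools bwa-mem2 bowtie2 spades.py trimmomatic bedtools \
    || echo "Some tools are missing: is the virome_pipeline environment active?"
```

A tool reported as missing usually means that the environment was not activated in the current terminal or that the package installs its executable under a different name.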

### From another source
Besides Anaconda/Miniconda there are a lot of other ways to install software (`pip`, building from source, installing binaries, cloning from GitHub, etc.).

As the newest version of Diamond is not available through conda, we can install it from GitHub by following the instructions (link).

In [None]:
wget <diamond_link>
tar -xvzf <diamond_tarball>

Now we still need to put the diamond executable in our `$PATH` so that we can call it from anywhere on the command line. To do this, we make a `bin` subdirectory in our `software` directory, create a symlink to the diamond executable in this `bin` directory, and finally add `bin` to our `$PATH` by exporting it in the `.bashrc` file.

In [None]:
cd ~/data/software/
mkdir bin
cd bin/
ln -s <path_to_diamond_executable> .

Next, open the `.bashrc` file with `nano` and add the following line to the bottom of the file:
```bash
export PATH="$PATH:$HOME/data/software/bin"
```
After you `source` your `.bashrc` file, you should be able to call diamond from anywhere.

In [None]:
source ~/.bashrc
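
The symlink and `$PATH` mechanism described above can be tried safely in a throwaway directory before touching `.bashrc`. The following is a minimal sketch; the tool name `mytool` and all directories are hypothetical stand-ins for the diamond executable and `~/data/software/bin`.

```bash
# Throwaway stand-ins for the software directory and its bin subdirectory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/mytool_release" "$demo_dir/bin"

# A tiny fake executable playing the role of the downloaded binary.
printf '#!/bin/sh\necho "mytool works"\n' > "$demo_dir/mytool_release/mytool"
chmod +x "$demo_dir/mytool_release/mytool"

# Symlink the executable into bin, using an absolute path as the link target.
ln -s "$demo_dir/mytool_release/mytool" "$demo_dir/bin/mytool"

# Append bin to PATH. Double quotes let the shell expand $PATH;
# single quotes would store the literal text '$PATH' instead.
export PATH="$PATH:$demo_dir/bin"

# The tool can now be called by name from any directory.
command -v mytool
mytool_output=$(mytool)
echo "$mytool_output"
```

Because the `export` here is typed in the terminal rather than written to `.bashrc`, the change only lasts for the current session, which makes it a low-risk way to test the setup.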