<img align="right" src="https://rega.kuleuven.be/cev/viralmetagenomics/pictures/lovm/image_preview" height="10%" width="10%" />

# JM-lab virome pipeline: tutorial
This Jupyter notebook gives an overview of the commands needed for the primary analysis of raw NGS data, with explanations. It is intended as a learning tool for new PhD students, master's students, interns, etc. Basic command-line knowledge is required to complete this tutorial; [this video](https://www.youtube.com/watch?v=oxuRxtrO2Ag) or [this one](https://www.youtube.com/watch?v=SkB-eRCzWIU) should suffice.

This tutorial can be followed by running the commands directly from the terminal or within this Jupyter notebook. To run a Jupyter notebook on the teaching server, copy it to the `~/data/jupyternotebooks/` folder, open an internet browser, navigate to [bmw.gbiomed.kuleuven.be](bmw.gbiomed.kuleuven.be) and log in with your KU Leuven credentials. You should then be able to run the notebook. You can find more information on how to transfer files to a remote server [below](#1.2-Installing-an-SFTP-client).

#### Overview:

* [Part 1: Connecting](#Part-1:-Connecting)
* [Part 2: Pipeline](#Part-2:-Start-pipeline)

---
## **Part 1: Connecting**
### 1.1 Logging into the teaching server
To be able to perform analysis on a Linux machine or a server, a connection needs to be made through the shell of the operating system. Through this shell we can use the command-line interface to execute tasks.

For this tutorial we can work on the teaching server of gbiomed (bmw.gbiomed.kuleuven.be). Everyone with a u- or r-number from KU Leuven can connect to this remote server by using `ssh` ([more info](https://searchsecurity.techtarget.com/definition/Secure-Shell)) with your KU Leuven credentials.

macOS and Linux users can proceed directly to the next step, as these operating systems have a terminal and `ssh` installed natively.

Windows users, on the other hand, will have to install an SSH client ([PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html) or [Bitvise](https://www.bitvise.com/ssh-client-download)) or a Linux (Ubuntu) subsystem for Windows. The latter is preferred because it is part of Windows, and the next steps will then be the same as for Linux or Mac users. Instructions on how to install Ubuntu on Windows can be found [here](https://ubuntu.com/tutorials/ubuntu-on-windows#1-overview); be aware that this requires at least an x86 PC running Windows 10 (Fall Creators Update, Oct 2017).

**Perform the following actions in a terminal (command-line interface):**

<span style="color:red">Replace uXXXXXXX with your u- or r-number. </span>
```bash
ssh uXXXXXXX@bmw.gbiomed.kuleuven.be
```

Next, enter the password of your KU Leuven account and you're in!
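Typing the full user and host name for every connection gets tedious. The following is a minimal sketch of an SSH host alias, assuming you want to call the server `teaching` (an arbitrary, hypothetical alias name) and that `uXXXXXXX` is replaced with your own u- or r-number. It writes the entry to a temporary file so you can inspect it without touching your real SSH configuration.

```bash
# Write a host alias to a temporary file (the alias name "teaching" is hypothetical)
ssh_config_demo=$(mktemp)
cat > "$ssh_config_demo" <<'EOF'
Host teaching
    HostName bmw.gbiomed.kuleuven.be
    User uXXXXXXX
EOF
# Show the entry that was written
cat "$ssh_config_demo"
```

Once these lines are appended to `~/.ssh/config`, `ssh teaching` should behave like the full command above. The entry can also be tried without installing it by passing the file explicitly: `ssh -F <file> teaching`.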

<b><span style="color:red"> Important:</span> Always avoid working in your `home` directory as this has limited storage and will cause trouble when it's full!</b> Therefore, on the teaching server, work in the `data` directory, which is a symbolic link to `/mnt/storage/uXXXXXXX`. Here, you will have enough space to store raw data, results and databases.

### 1.2 Installing an SFTP client

In order to transfer files from your local computer to a remote server you need an SFTP (SSH or Secure File Transfer Protocol) client. A commonly used SFTP client is [FileZilla](https://filezilla-project.org/), which you can download and install from the FileZilla project website.

*Note: Bitvise is an SSH and SFTP client in one, so if you're using Bitvise there is no need to install FileZilla.*

Once you have installed and opened FileZilla, it should look like this:

<img src="images/Filezilla_start.png" height="60%" width="60%" style="left" />

At the top of the window, fill in the host (bmw.gbiomed.kuleuven.be), your u/r-number and your KU Leuven password, and set the port to 22 (see image above).

Once you click connect, you can transfer files between your computer and the remote server simply by dragging them from one side to the other.

<img src="images/Filezilla_transfer.png" height="60%" width="60%" style="left" />

<span style="color:red"> Important:</span> FileZilla will connect to your `home` directory by default, but do not copy files here! Instead, copy them to the `data` directory.
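If you prefer the command line over a graphical client, `scp` (installed together with `ssh`) can copy files over the same connection. The sketch below only builds and prints the command so you can check it before running it; the file name is a hypothetical example and `uXXXXXXX` stands for your own u- or r-number.

```bash
remote_user="uXXXXXXX"                       # replace with your own u- or r-number
remote_host="bmw.gbiomed.kuleuven.be"
local_file="sample_S1_L001_R1_001.fastq.gz"  # hypothetical file name
# A remote path without a leading slash is relative to your home directory,
# so "data/" targets the data directory rather than home itself.
upload_cmd="scp ${local_file} ${remote_user}@${remote_host}:data/"
# Print the command for inspection; run it from your local terminal when it looks right
echo "$upload_cmd"
```

Swapping the two arguments (remote path first, local path second) copies in the opposite direction, from the server to your computer.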

### 1.3 Installing all necessary software
#### 1.3.1 Miniconda
(Mini)conda is a package manager from which you can install a lot of (bioinformatics) software. More info on conda can be found [here](https://docs.conda.io/projects/conda/en/latest/).

**Perform the following steps to install all the software we will need along the pipeline:**
1. Create a new `software` directory in your data folder and move into that directory:

In [1]:
cd ~/data
mkdir software
cd software
pwd


2. Download the Miniconda installer with `wget`. Next, run the installation script (`-b` makes the installation run silently and `-p` provides the path where Miniconda is installed). When Miniconda is installed, activate conda by sourcing the initialization script; this simply sets a couple of shell environment variables and defines the `conda` command as a shell function. More information can be found in the [installer guidelines](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).

In [2]:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/data/software/miniconda
source $HOME/data/software/miniconda/bin/activate
conda init
source ~/.bashrc

*Notice that* `$HOME` *and* `~/` *both point to your `home` directory.*
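There is one caveat to this equivalence that is easy to trip over: the shell only expands `~` when it is unquoted, whereas `$HOME` is also expanded inside double quotes. A small sketch to see the difference (the variable names are illustrative only):

```bash
unquoted=~/data          # the shell expands ~ to your home directory
with_home="$HOME/data"   # $HOME is expanded inside double quotes
quoted="~/data"          # a quoted ~ is NOT expanded and stays a literal "~"
echo "$unquoted"
echo "$with_home"
echo "$quoted"
```

The first two lines of output should be identical, while the third still starts with a literal `~`. When a path has to go inside double quotes, `$HOME` is therefore the safer choice.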

3. When installing new software with conda, the best practice is to create a new conda environment for each project you are working on. In this tutorial we will run the virome pipeline, so we will create a conda environment that contains all the software the pipeline needs. We then need to activate this environment to make the software available for use.


In [None]:
conda create -y --name virome_pipeline python
conda activate virome_pipeline
conda install -y -c conda-forge -c bioconda krona samtools bwa-mem2 bowtie2 spades trimmomatic bedtools
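After the installation finishes, it can be useful to confirm that the tools are actually reachable from the active environment. The helper below is a minimal sketch: `missing_tools` is a hypothetical function name, and the executable names passed to it (for example `spades.py` and `ktImportTaxonomy`) are the ones these packages normally provide, but treat them as assumptions and adjust them if your installation differs.

```bash
# Print every command from the argument list that cannot be found on the PATH
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# No output means every listed executable was found
missing_tools samtools bwa-mem2 bowtie2 spades.py trimmomatic bedtools ktImportTaxonomy
```

If a name is printed, check that the `virome_pipeline` environment is activated before reinstalling anything.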

##### Downloading taxonomy database for Krona
Krona is installed, but we still need to run `ktUpdateTaxonomy.sh` to download the taxonomy database, as the installation message below explains:

```console
Krona installed.  You still need to manually update the taxonomy databases before Krona can generate taxonomic reports. The update script is ktUpdateTaxonomy.sh. 
The default location for storing taxonomic databases is /home/luna.kuleuven.be/u0140985/data/software/miniconda/envs/virome_pipeline/opt/krona/taxonomy

If you would like the taxonomic data stored elsewhere, simply replace
this directory with a symlink.  For example:
```
```bash
rm -rf /home/luna.kuleuven.be/u0140985/data/software/miniconda/envs/virome_pipeline/opt/krona/taxonomy
mkdir /path/on/big/disk/taxonomy
ln -s /path/on/big/disk/taxonomy /home/luna.kuleuven.be/u0140985/data/software/miniconda/envs/virome_pipeline/opt/krona/taxonomy
ktUpdateTaxonomy.sh
```

In [None]:
ktUpdateTaxonomy.sh

#### 1.3.2 From another source

The bioconda version of a tool often lags a few versions behind the most current release. If you really want the most recent version, you will need to install the software and its dependencies manually.

Besides Anaconda/Miniconda, there are many other ways to install software (`pip`, compiling from source, unpacking binaries, installing from a GitHub repository, etc.).

As the latest version of [Diamond](https://github.com/bbuchfink/diamond) (a sequence aligner for protein and translated DNA searches) is not available through `conda`, we can install it from GitHub by following the [installation instructions](https://github.com/bbuchfink/diamond/wiki).

In [None]:
cd ~/data/software/
mkdir diamond
cd diamond
wget https://github.com/bbuchfink/diamond/releases/download/v2.0.6/diamond-linux64.tar.gz
tar -xzf diamond-linux64.tar.gz

Now we still need to make the `diamond` executable reachable through our `PATH` variable, so we can call the `diamond` command from anywhere on the command line. This can be done by making a `bin` subdirectory in `~/data/software/`, creating a symlink to the `diamond` executable in `~/data/software/bin/`, and finally adding this directory to our `PATH` in the `.profile` file.

In [None]:
cd ~/data/software
mkdir bin
cd bin/
ln -s ~/data/software/diamond/diamond .

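Next, open the `.profile` file in your home directory with `nano` (a text editor) and add the following line to the bottom of the file. Use `$HOME` rather than `~` here, because `~` is not expanded inside double quotes:
```bash
PATH="$HOME/data/software/bin:$PATH"
```
More documentation on where and how to set the `PATH` variable can be found in these two topics: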
* https://superuser.com/questions/183870/difference-between-bashrc-and-bash-profile/183980#183980 
* https://unix.stackexchange.com/questions/26047/how-to-correctly-add-a-path-to-path


When you `source` your `.profile` file, you should now be able to call `diamond`.

In [None]:
source ~/.profile
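The symlink-plus-`PATH` mechanism used for `diamond` can be rehearsed safely in a throwaway directory. The sketch below uses a hypothetical stand-in tool called `mytool` and temporary paths created with `mktemp`, so nothing in your real `software` directory or `.profile` is touched.

```bash
# Throwaway directory that mimics the ~/data/software layout
path_demo=$(mktemp -d)
mkdir -p "$path_demo/software/mytool" "$path_demo/software/bin"

# A tiny stand-in executable (hypothetical tool, not part of the pipeline)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$path_demo/software/mytool/mytool"
chmod +x "$path_demo/software/mytool/mytool"

# Symlink the executable into bin/ and put bin/ at the front of PATH
ln -s "$path_demo/software/mytool/mytool" "$path_demo/software/bin/mytool"
PATH="$path_demo/software/bin:$PATH"

# The tool can now be called by name from any directory
tool_output=$(mytool)
echo "$tool_output"
```

The change to `PATH` made here only lasts for the current shell session, which is why the tutorial adds the line to `.profile` for `diamond`.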

Let's check out the version of diamond you have installed!

In [None]:
diamond version

---
## **Part 2: Start pipeline**

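The raw reads of a sample can be split over several files (here one per sequencing lane, `L00*`). The commands below merge them into a single R1 and a single R2 file per sample and then remove the per-lane files.

```bash
# Collect the unique sample names (the part of each file name before the first underscore)
ls *_L00*.fastq.gz | cut -f1 -d '_' | sort -u > names.txt
# Concatenate the per-lane files of each sample into one file per read direction
while read -r line; do cat "${line}"_*_*_R1_*.fastq.gz > "${line}.R1.fastq.gz"; done < names.txt
while read -r line; do cat "${line}"_*_*_R2_*.fastq.gz > "${line}.R2.fastq.gz"; done < names.txt
# Remove the original per-lane files
rm *L00*
```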
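Because the last command deletes files, it may be worth rehearsing the merging step on tiny fake data first. The sketch below does this in a temporary directory. The sample names (`sampleA`, `sampleB`) and the file-name layout `<sample>_S1_L00N_R1_001.fastq.gz` are assumptions chosen to match the patterns in the commands above. The listing is restricted to the per-lane files so that `names.txt` itself cannot be picked up as a sample name.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create fake data: two samples, two lanes, two read directions, one record each
for sample in sampleA sampleB; do
    for lane in L001 L002; do
        for read in R1 R2; do
            printf '@%s_%s_%s\nACGT\n+\nIIII\n' "$sample" "$lane" "$read" \
                | gzip > "${sample}_S1_${lane}_${read}_001.fastq.gz"
        done
    done
done

# Same logic as the pipeline step: list sample names, merge lanes, clean up
ls *_L00*.fastq.gz | cut -f1 -d '_' | sort -u > names.txt
while read -r name; do
    cat "${name}"_*_*_R1_*.fastq.gz > "${name}.R1.fastq.gz"
    cat "${name}"_*_*_R2_*.fastq.gz > "${name}.R2.fastq.gz"
done < names.txt
rm ./*_L00*

# Concatenated gzip files are still valid gzip, so the merged file decompresses normally
merged_lines=$(gzip -dc sampleA.R1.fastq.gz | wc -l | tr -d ' ')
echo "sampleA.R1 contains $merged_lines lines"
```

Each fake file holds one 4-line FASTQ record, so the merged R1 file of a sample sequenced on two lanes should report 8 lines.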
