---
title: "Preparation"
editor: visual
jupyter: python3
---


# Before you start the pipeline

Before running either pipeline (manual or automatic), create a new run folder (e.g. Viro_Run_0001). This folder will hold all the files from that particular run.

For example:

``` bash
cd /mnt/viro0002-data/sequencedata/processed/Diagnostics_metagenomics/ 
mkdir Viro_Run_0001
```

::: callout-tip
## Tip

**"cd"** means change directory (move into a folder) and **"mkdir"** means make directory (create a new folder).
:::
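If you prefer to do this in one step, the sketch below wraps the same `mkdir` and `cd` commands in a small shell function. The name `make_run_dir` is only an illustration, not one of the lab's helper scripts. It uses `mkdir -p` so that the command does not fail if the folder already exists.

``` bash
# Hypothetical helper (not one of the lab's scripts): create a run folder
# if it does not exist yet, then move into it.
# Usage: make_run_dir <base_folder> <run_name>
make_run_dir() {
    local base="$1" run="$2"
    if [ -z "$base" ] || [ -z "$run" ]; then
        echo "Usage: make_run_dir <base_folder> <run_name>" >&2
        return 1
    fi
    # -p: no error if the folder already exists
    mkdir -p "$base/$run" && cd "$base/$run"
}
```

For example: `make_run_dir /mnt/viro0002-data/sequencedata/processed/Diagnostics_metagenomics Viro_Run_0001`.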

Before you perform any analysis, you need to activate the right **conda environment**. A conda environment is an isolated software setup that contains all the necessary software dependencies and helper scripts. We will use the environment "nanopore_diagnostics" for most of our analysis (except for assembly and Nextstrain). To activate this environment, copy the following into your terminal:

``` bash
conda activate nanopore_diagnostics
```
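To double-check which environment is active: `conda activate` sets the `CONDA_DEFAULT_ENV` variable to the name of the environment (by default, the name also appears in brackets at the start of your prompt). The hypothetical `require_env` function below is a minimal sketch that stops with a message if the expected environment is not active.

``` bash
# Hypothetical helper: warn if the expected conda environment is not active.
# conda activate stores the active environment name in CONDA_DEFAULT_ENV.
require_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Active environment: '${CONDA_DEFAULT_ENV:-none}', expected: '$expected'" >&2
        echo "Run: conda activate $expected" >&2
        return 1
    fi
    echo "OK: $expected is active"
}
```

For example, `require_env nanopore_diagnostics`. You can also run `conda env list`, which marks the active environment with an asterisk.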

::: callout-important
## Important

For the automatic pipeline, continue with **Metagenomics Automation**.
:::

## Manual Pipeline


Copy all barcode folders of interest from the raw nanopore folder, along with the report file, into the newly created run folder (e.g. Viro_Run_0001). Each barcode folder contains all the fastq.gz files generated for that barcode during the run.

For example:

``` bash
cp -r /mnt/viro0002-nanopore/GRIDIon_Viro_Run_0001/barcode01 /mnt/viro0002-data/sequencedata/processed/Diagnostics_metagenomics/Viro_Run_0001/
```

::: callout-tip
## Tip

**"cp"** means copy. The **-r** (recursive) option is needed to copy a folder together with everything inside it.
:::
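When several barcodes are of interest, a loop saves retyping the `cp` command. The `copy_barcodes` function below is a hypothetical sketch that takes the same source and destination folders as the example above, followed by a list of barcode names, and reports any barcode folder it cannot find. The report file still needs to be copied separately, since its name depends on the run.

``` bash
# Hypothetical helper: copy several barcode folders in one go.
# Usage: copy_barcodes <raw_run_folder> <processed_run_folder> barcode01 barcode02 ...
copy_barcodes() {
    if [ "$#" -lt 3 ]; then
        echo "Usage: copy_barcodes <raw_run_folder> <processed_run_folder> <barcode>..." >&2
        return 1
    fi
    local src="$1" dest="$2" bc status=0
    shift 2
    for bc in "$@"; do
        if [ -d "$src/$bc" ]; then
            # -r copies the folder and everything inside it
            if cp -r "$src/$bc" "$dest/"; then
                echo "Copied $bc"
            else
                status=1
            fi
        else
            echo "Not found: $src/$bc" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example: `copy_barcodes /mnt/viro0002-nanopore/GRIDIon_Viro_Run_0001 /mnt/viro0002-data/sequencedata/processed/Diagnostics_metagenomics/Viro_Run_0001 barcode01 barcode02`.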

![Folder example](Preparation2.jpg)

If you are following the manual pipeline, repeat the following steps for each barcode. To keep track of your analysis, you can open Notepad++ and create a command log for each barcode of each run. Save this log in the Diagnostics_metagenomics folder.
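As a complement to the Notepad++ log, the terminal can also record commands for you. The `run_logged` function and the `LOG_FILE` variable below are hypothetical names used only for this sketch: each command is appended to the log with a timestamp and the folder it was run in, and is then executed. Use an absolute path for `LOG_FILE`, since you will change folders during the analysis; the log file name in the comment is only an example.

``` bash
# Hypothetical helper: write each command to a log file, then run it.
# LOG_FILE is an assumed variable; set it to an absolute path first, e.g.
# LOG_FILE=/mnt/viro0002-data/sequencedata/processed/Diagnostics_metagenomics/Viro_Run_0001_barcode01_log.txt
run_logged() {
    if [ -z "${LOG_FILE:-}" ]; then
        echo "Set LOG_FILE to the path of your command log first." >&2
        return 1
    fi
    # One line per command: date and time, current folder, command
    printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$PWD" "$*" >> "$LOG_FILE"
    "$@"
}
```

For example, `run_logged mkdir raw`. A redirection such as `> all_reads.fastq` is handled by the shell rather than by the function, so it is not written to the log; note those commands manually.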

::: callout-warning
## Warning

Remember to always update the path to match the specific run number and barcode of interest.
:::

First, move into the barcode folder and check its contents:

``` bash
cd /mnt/viro0002-data/sequencedata/processed/Diagnostics_metagenomics/Viro_Run_0001/barcode01/
ls
```
::: callout-tip
## Tip

**"ls"** means list all the files in the current folder.
:::

Next, the fastq reads need to be concatenated (or combined) into one file to streamline analysis.

``` bash
zcat *.fastq.gz > all_reads.fastq
```

::: callout-tip
## Tip

**"zcat"** decompresses fastq.gz files and concatenates their contents. The wildcard \* matches any characters, so \*.fastq.gz selects every file ending in .fastq.gz. To concatenate uncompressed fasta or fastq files, use **cat** instead.
:::
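Before moving the fastq.gz files, it is worth checking that no reads were lost during concatenation. The sketch below assumes the usual four-line FASTQ layout (header, sequence, `+`, quality), so the number of reads is the number of lines divided by four. `check_concat` is a hypothetical name; run it inside the barcode folder before the fastq.gz files are moved into raw. It uses `gzip -dc`, which does the same as `zcat`.

``` bash
# Sketch: compare the number of reads before and after concatenation.
# Assumes 4-line FASTQ records; run inside the barcode folder, before
# the .fastq.gz files are moved into raw/.
check_concat() {
    local in_lines out_lines
    # gzip -dc decompresses to standard output, like zcat
    in_lines=$(gzip -dc ./*.fastq.gz | wc -l)
    out_lines=$(wc -l < all_reads.fastq)
    echo "Reads in fastq.gz files:  $(( in_lines / 4 ))"
    echo "Reads in all_reads.fastq: $(( out_lines / 4 ))"
    if (( in_lines != out_lines )); then
        echo "Mismatch: re-run the zcat step." >&2
        return 1
    fi
}
```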

To organize your workspace, move all fastq.gz files into a new folder named raw within the current directory. This folder can be deleted once your analysis is complete, because the original fastq.gz files remain stored in the raw nanopore directory (see figure above).

``` bash
mkdir raw
mv *.fastq.gz raw
```

::: callout-tip
## Tip

**"mv"** means move.
:::
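To avoid moving the input files before the combined file has been created, the `mkdir` and `mv` step can be wrapped in a small check. `tidy_raw` is a hypothetical name for this sketch; it only moves the fastq.gz files if all_reads.fastq exists and is not empty.

``` bash
# Hypothetical helper: move the .fastq.gz files into raw/ only once
# all_reads.fastq exists and is not empty. Run inside the barcode folder.
tidy_raw() {
    # -s: true if the file exists and has a size greater than zero
    if [ ! -s all_reads.fastq ]; then
        echo "all_reads.fastq is missing or empty; run the zcat step first." >&2
        return 1
    fi
    mkdir -p raw   # -p: no error if raw/ already exists
    mv ./*.fastq.gz raw/
}
```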

Create the directories needed for the next chapters:

``` bash
mkdir Kraken2_standard
```

::: callout-tip
## Tip

Type **pwd** if you ever want to check what folder you are currently in. **"pwd"** means print working directory.
:::

::: callout-warning
## Warning

Throughout this pipeline, the output file names are usually kept the same, so you don't need to change them for each run or barcode. This also means that if you run a command from the wrong folder, the new output will overwrite the existing files with the same name!
:::
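One way to reduce this risk is to check the current folder before running a step. The hypothetical `check_here` function below compares the end of the working directory path with the run and barcode you expect, and stops with a message if they do not match.

``` bash
# Hypothetical helper: confirm you are inside the expected barcode folder
# before running a step that writes fixed output file names.
# Usage: check_here <run_name> <barcode>
check_here() {
    local run="$1" barcode="$2"
    case "$PWD" in
        */"$run"/"$barcode")
            echo "OK: $PWD" ;;
        *)
            echo "Wrong folder: $PWD (expected .../$run/$barcode)" >&2
            return 1 ;;
    esac
}
```

For example, `check_here Viro_Run_0001 barcode01 && zcat *.fastq.gz > all_reads.fastq` only runs the concatenation if you are in the right folder. Bash also has a built-in `set -o noclobber` option, which makes `>` refuse to overwrite an existing file; it only protects shell redirections, not files written directly by programs.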