This tutorial demonstrates how to run a variant calling workflow with Camber, which simplifies configuring and executing genomics pipelines at scale. Variant calling is a key step in analyzing whole-genome and targeted sequencing data: it identifies germline and somatic mutations that help in studying genetic diseases, cancer, and other biological questions. In this example, we use the [nf-core/sarek](https://nf-co.re/sarek/3.5.1/) pipeline, a standard workflow for variant detection that supports tumor/normal comparisons and joint variant calling.

The first step is to import the camber package:

In [None]:
import camber


Here's an example of how to configure and execute a job:

* `command`: The full Nextflow command to run the [nf-core/sarek](https://nf-co.re/sarek/3.5.1/parameters/) pipeline.

    * `--input`: `"./samplesheet.csv"`: the path to `samplesheet.csv`, relative to the current notebook. If you use local FASTQ files, the FASTQ locations listed inside `samplesheet.csv` are relative paths as well.

    * `--outdir`: `"./outputs"`: the directory where the job's output data is stored.

    * `--tools`: `"freebayes"`: the variant calling tool to run (here, FreeBayes).

    * `-r`: `"3.5.1"`: the pipeline release to run, pinned here to sarek 3.5.1.

* `node_size`=`"SMALL"`: the [node size](https://docs.cambercloud.com/docs/engines-pricing/#cpu-engine-sizes) used to run the job.

* `num_nodes`=`4`: the number of nodes across which workflow tasks run in parallel when possible.
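The `--input` samplesheet can also be generated from Python. The sketch below assumes the FASTQ-input column layout described in the sarek usage documentation (`patient`, `sex`, `status`, `sample`, `lane`, `fastq_1`, `fastq_2`, where `status` is 0 for normal and 1 for tumor). `write_samplesheet`, the sample names, and the FASTQ paths are hypothetical placeholders, so check the column list against the sarek 3.5.1 docs before relying on it.

```python
import csv
from pathlib import Path

# Assumed sarek FASTQ-input columns; "sex" and "status" are optional in sarek
# (status: 0 = normal, 1 = tumor). Verify against the sarek 3.5.1 usage docs.
SAREK_COLUMNS = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]


def write_samplesheet(rows, path="./samplesheet.csv"):
    """Hypothetical helper: write a sarek samplesheet from dicts keyed by SAREK_COLUMNS."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=SAREK_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical tumor/normal pair; the FASTQ locations are relative paths.
rows = [
    {"patient": "patient1", "sex": "XX", "status": 0, "sample": "normal_sample",
     "lane": "lane_1", "fastq_1": "data/normal_R1.fastq.gz", "fastq_2": "data/normal_R2.fastq.gz"},
    {"patient": "patient1", "sex": "XX", "status": 1, "sample": "tumor_sample",
     "lane": "lane_1", "fastq_1": "data/tumor_R1.fastq.gz", "fastq_2": "data/tumor_R2.fastq.gz"},
]
```

Calling `write_samplesheet(rows)` would write `./samplesheet.csv` next to the notebook, matching the `--input` value used in the command.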

In [None]:
command = "nextflow run nf-core/sarek \
    --input ./samplesheet.csv \
    --outdir ./outputs \
    --tools freebayes \
    -r 3.5.1"
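The trailing backslashes inside the `command` string are Python line continuations, so `command` ends up as a single line with extra runs of spaces between arguments. If you prefer to assemble the command from parameters, here is a minimal sketch; `build_sarek_command` is a hypothetical helper (not part of Camber) that uses `shlex.join` (Python 3.8+) to quote each argument, for example output paths containing spaces.

```python
import shlex


def build_sarek_command(input_csv, outdir, tools, release="3.5.1"):
    """Hypothetical helper: assemble `nextflow run nf-core/sarek` as one shell-quoted string."""
    args = [
        "nextflow", "run", "nf-core/sarek",
        "--input", input_csv,
        "--outdir", outdir,
        # sarek's --tools takes a comma-separated list, e.g. "freebayes,strelka"
        "--tools", ",".join(tools),
        # -r pins the pipeline release
        "-r", release,
    ]
    return shlex.join(args)


sarek_command = build_sarek_command("./samplesheet.csv", "./outputs", ["freebayes"])
```

Passing `command=sarek_command` to `camber.nextflow.create_job` would run the same pipeline configuration as the hand-written string.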

In [None]:
nf_sarek_job = camber.nextflow.create_job(
    command=command,
    node_size="SMALL",
    num_nodes=4,
)


Check the [job status](https://docs.cambercloud.com/docs/reference/job-attributes/#status):

In [None]:
nf_sarek_job.status
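Reading `nf_sarek_job.status` once gives a snapshot. To block until the job finishes, you could poll it in a loop. The sketch below is generic: `wait_for_status` is a hypothetical helper, and it is an assumption that `nf_sarek_job.status` returns a fresh value on each access and that the state names in the usage line match Camber's; verify both against the job status reference.

```python
import time


def wait_for_status(get_status, terminal_states, poll_seconds=30.0, timeout_seconds=None):
    """Hypothetical helper: poll get_status() until it returns one of terminal_states."""
    start = time.monotonic()
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        # Give up if an optional timeout has elapsed
        if timeout_seconds is not None and time.monotonic() - start > timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

For example, `wait_for_status(lambda: nf_sarek_job.status, {"COMPLETED", "FAILED"})` would return once the job reaches either of those (assumed) states.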


To monitor job execution, you can view the job logs in real time with the `read_logs` method:

In [None]:
nf_sarek_job.read_logs()


When the job is done, you can browse and download its results in two ways:

1. View the data directly in the notebook environment by opening the `--outdir` directory in the root of your notebook container:
<p style="text-align:left;"><img src="https://raw.githubusercontent.com/CamberCloud-Inc/demos/refs/heads/main/30-applications/02-genomics/variant-calling/images/notebook_sarek_outputs.png" alt="image" width="33%" /></p>

2. Open the Stash UI and navigate to the `--outdir` directory:
<p style="text-align:left;"><img src="https://raw.githubusercontent.com/CamberCloud-Inc/demos/refs/heads/main/30-applications/02-genomics/variant-calling/images/stash_ui_sarek_outputs.png" alt="image" width="100%" /></p>

The VCF files produced by variant calling are in the `variant_calling` subdirectory of `--outdir` and can be downloaded or analyzed further directly within this notebook.
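To take a first look at the results from the notebook, the sketch below lists the compressed VCFs under `<outdir>/variant_calling` and counts the variant records in each using only the standard library. It assumes the VCFs are `*.vcf.gz` files somewhere below `variant_calling` (sarek typically nests them by tool and sample). `find_vcfs` and `count_variants` are illustrative helpers; a dedicated tool such as pysam or bcftools is a better fit for real analysis.

```python
import gzip
from pathlib import Path


def find_vcfs(outdir="./outputs"):
    """Return every *.vcf.gz under <outdir>/variant_calling, sorted by path."""
    return sorted(Path(outdir, "variant_calling").rglob("*.vcf.gz"))


def count_variants(vcf_path):
    """Count data records (non-header, non-empty lines) in a compressed VCF."""
    count = 0
    # gzip can read bgzip (BGZF) output because BGZF is a series of gzip members
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count
```

For example, `for vcf in find_vcfs("./outputs"): print(vcf.name, count_variants(vcf))` prints one line per VCF. Index files such as `*.vcf.gz.tbi` are skipped because they do not match the glob.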

***Note:** Files and folders saved in the `demos` directory are temporary and are reset after each JupyterHub session. To store your data permanently, change the value of `--outdir` to a location outside the `demos` directory.*