RNA sequencing (RNA-seq) is a key technique in modern biology, used to quantify gene expression, detect alternative splicing, and understand transcriptional changes under different conditions, whether in development, disease, or response to treatment. This notebook demonstrates a full RNA-seq analysis workflow powered by [Nextflow](https://www.nextflow.io/) and the [nf-core/rnaseq pipeline](https://nf-co.re/rnaseq/), and shows how Camber simplifies and scales reproducible cloud-based analysis.

The first step is to import the `nextflow` module from the `camber` package:

In [1]:
from camber import nextflow


Here's an example of how to set up the configuration and run a job:
- `pipeline="nf-core/rnaseq"`: the pipeline to run.
- `engine_size="MICRO"`: the [engine size](https://docs.cambercloud.com/docs/engines-pricing/#cpu-engine-sizes) used to run the job.
- `num_engines=4`: the number of engines that run workflow tasks in parallel.

[Pipeline parameters](https://nf-co.re/rnaseq/3.18.0/parameters/) must be defined in the `params` argument. For the pipeline to work as expected, note that:
- `"--input": "./samplesheet.csv"`: the path to the `samplesheet.csv` file, relative to the current notebook. If the samplesheet points to local FASTQ files, their locations inside `samplesheet.csv` are relative as well.
- `"--outdir": "/camber_outputs"`: the location where the job's output data is stored.
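
The samplesheet passed to `--input` can be written by hand or generated from Python. The sketch below assumes the column layout documented for nf-core/rnaseq (`sample`, `fastq_1`, `fastq_2`, `strandedness`). The helper name `write_samplesheet`, the sample names, and the FASTQ paths are hypothetical placeholders, not files that ship with this notebook.

```python
import csv
from pathlib import Path

# Column layout assumed from the nf-core/rnaseq samplesheet documentation.
COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def write_samplesheet(rows, path="samplesheet.csv"):
    """Write one CSV row per sequencing run; missing fields are left empty."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        for row in rows:
            writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return path


# Hypothetical samples; replace the names and relative paths with your own.
example_rows = [
    {
        "sample": "control_rep1",
        "fastq_1": "./fastq/control_rep1_R1.fastq.gz",
        "fastq_2": "./fastq/control_rep1_R2.fastq.gz",
        "strandedness": "auto",
    },
    {
        # Single-end sample: fastq_2 is left empty.
        "sample": "treated_rep1",
        "fastq_1": "./fastq/treated_rep1_R1.fastq.gz",
        "strandedness": "auto",
    },
]

if __name__ == "__main__":
    write_samplesheet(example_rows, "samplesheet.csv")
```

Keeping the FASTQ paths relative matches the note above about how local files are resolved.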

In [None]:
nf_rnaseq_job = nextflow.create_job(
    pipeline="nf-core/rnaseq",
    engine_size="MICRO",
    num_engines=4,
    params={
        "--input": "./samplesheet.csv",
        "--outdir": "/camber_outputs",
        "-r": "3.18.0",
        "--fasta": "stash://public/fastq/rnaseq/ITAG2.3_genomic_Ch6.fasta",
        "--gtf": "stash://public/fastq/rnaseq/ITAG_pre2.3_gene_models_Ch6.gtf",
        "--aligner": "star_rsem",
        "--skip_biotype_qc": "true",
    },
)


Check the [job status](https://docs.cambercloud.com/docs/reference/job-attributes/#status):

In [None]:
nf_rnaseq_job.status
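
Instead of re-running the status cell by hand, the check can be wrapped in a small polling loop. This is only a sketch: `wait_for_job` is a hypothetical helper, not part of the Camber API, and the terminal state names are assumptions that should be checked against the job-attributes documentation linked above. It also assumes that reading `nf_rnaseq_job.status` returns the latest state each time.

```python
import time


def wait_for_job(
    get_status,
    terminal_states=("COMPLETED", "FAILED", "CANCELLED"),  # assumed names
    poll_seconds=30.0,
    timeout_seconds=None,
    sleep=time.sleep,
):
    """Call get_status() repeatedly until it returns a terminal state."""
    waited = 0.0
    while True:
        status = str(get_status())
        if status.upper() in terminal_states:
            return status
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"job still in state {status!r} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the job created above (assumes the attribute refreshes on access):
# final_status = wait_for_job(lambda: nf_rnaseq_job.status, timeout_seconds=3600)
```

Passing the status lookup as a callable keeps the helper independent of the job object, so it can be tried out with a stub before it is pointed at a real job.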


View job logs online:

In [None]:
nf_rnaseq_job.read_logs()


When the job is done, you can browse and download its results and logs in two ways:

1. Browse the data directly in the notebook environment:
<p style="text-align:left;"><img src="https://raw.githubusercontent.com/CamberCloud-Inc/demos/refs/heads/main/30-applications/02-genomics/rnaseq/images/notebook_rnaseq_outputs.png" alt="image" width="50%" /></p>

2. Go to the Stash UI:
<p style="text-align:left;"><img src="https://raw.githubusercontent.com/CamberCloud-Inc/demos/refs/heads/main/30-applications/02-genomics/rnaseq/images/stash_ui_rnaseq_outputs.png" alt="image" width="100%" /></p>

By running this RNA-seq pipeline on Camber, you’ve executed a reproducible, community-standard nf-core workflow in the cloud with minimal infrastructure overhead. The same approach scales to larger analyses by adjusting `engine_size` and `num_engines`.