# Use Nextflow to run workflows using Google Batch, Part II
Here we build on Part I: we download some real data using the SRA Toolkit and then submit an nf-core methylseq job to Google Batch.

## 1. Optional: Setup the environment
If you did not complete Part I, set up your environment here. Otherwise, skip to the next section.

### Create a bucket

In [None]:
#make sure you change this name, it needs to be globally unique
%env BUCKET=gbatch-api-nextflow
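
Because bucket names must be globally unique, one alternative to editing the name by hand is to append a short random suffix. The cell below is a minimal sketch of that idea: `make_bucket_name` is a hypothetical helper, not part of the original tutorial, and the prefix is simply the example name from the cell above. If you run it, it replaces the `BUCKET` value set above.

```python
import os
import uuid

def make_bucket_name(prefix: str = "gbatch-api-nextflow") -> str:
    """Return the prefix plus an 8-character random hex suffix.

    The suffix contains only lowercase hex digits, so the result stays
    within the characters allowed in bucket names.
    """
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{suffix}"

bucket_name = make_bucket_name()
# Setting the variable in the kernel's environment makes $BUCKET visible
# to later `!` shell commands, in the same way as %env.
os.environ["BUCKET"] = bucket_name
print(bucket_name)
```

Note the printed name somewhere, since a new random suffix is generated every time the cell runs.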

In [None]:
#will only create the bucket if it doesn't yet exist
! gsutil ls gs://$BUCKET > /dev/null 2>&1 || gsutil mb gs://$BUCKET

In [None]:
#set versioning on the bucket so it can overwrite old files
! gsutil versioning set on gs://$BUCKET

### Install mambaforge
You can also use the default installed conda, but mamba is so much faster! 

In [None]:
! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge

In [None]:
#add to your path
import os
os.environ["PATH"] += os.pathsep + os.environ["HOME"]+"/mambaforge/bin"

### Install other dependencies 

In [None]:
#First install java
!sudo apt update
!sudo apt-get install default-jdk -y
!java -version

In [None]:
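#Specify the Nextflow version and platform
#Note: each `!` line runs in its own subshell, so these exports do not carry over to the lines below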
! export NXF_VER=21.10.0
! export NXF_MODE=google
#Install Nextflow, make it executable, and update it
! curl https://get.nextflow.io | bash
! chmod +x nextflow
! ./nextflow self-update

In [None]:
#Install SRAtools to download data
! mamba install -c bioconda -c conda-forge sra-tools==2.11.0 -y
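
Before moving on, it can help to confirm that the tools installed above are discoverable on your `PATH`. The following is a minimal sketch using the standard library; `find_missing_tools` is a hypothetical helper name, and the tool list is an assumption based on the commands used later in this notebook. Nextflow is not included because it is installed into the current directory and invoked as `./nextflow`.

```python
import shutil

def find_missing_tools(tools):
    """Return the tool names that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# java is needed by Nextflow; prefetch and fasterq-dump come from sra-tools
missing = find_missing_tools(["java", "prefetch", "fasterq-dump"])
print("Missing tools:", missing if missing else "none")
```

If anything is reported as missing, re-run the matching install cell and the cell that adds mambaforge to your path.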

## 2. Download data with SRA tools
If you want more practice with SRA tools, check out our [SRA-focused notebook](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/tutorials/notebooks/SRADownload/SRA-Download.ipynb).

In [None]:
#set up directory structure
!mkdir -p data/raw_fastq

First, download the compressed .sra file.

In [None]:
%%time
! prefetch -O data/raw_fastq -f yes SRR067701 --location GCP -v 

Now convert the compressed .sra file to fastq. It will take about two minutes, so be patient. 

In [None]:
%%time
! fasterq-dump -f -e 8 -m 24G -O data/raw_fastq data/raw_fastq/SRR067701/SRR067701.sra

In [None]:
#compress the fastq files
! gzip data/raw_fastq/SRR067701.fastq
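
Before submitting the job, it can be worth confirming that the compressed file is intact. The cell below is a rough sketch of such a check, assuming the standard four-lines-per-record FASTQ layout; `count_fastq_reads` is a hypothetical helper and is not part of the SRA Toolkit. It reads the whole file, so expect it to take a while on a large download.

```python
import gzip
import os

def count_fastq_reads(path: str) -> int:
    """Count reads in a gzipped FASTQ file, assuming four lines per record."""
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for n_lines, line in enumerate(handle, start=1):
            # The first line of every four-line record should start with '@'
            if n_lines % 4 == 1 and not line.startswith("@"):
                raise ValueError(f"line {n_lines} is not a FASTQ header: {line[:30]!r}")
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; the file may be truncated")
    return n_lines // 4

fastq_path = "data/raw_fastq/SRR067701.fastq.gz"
# Only run the check if the file is where the gzip step should have left it
if os.path.exists(fastq_path):
    print(count_fastq_reads(fastq_path), "reads")
else:
    print(f"{fastq_path} not found; check the download and conversion steps")
```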

## 3. Run methylseq with Google Batch

Ensure you include the following in your command:
- nf-core tool version [-r]
- Add fastq.gz file input [--input]
- Reference genome [--genome] (you do not need a local copy; nf-core uses iGenomes and will pull in the correct reference files)
- Config file location [-c]
- Desired profile [-profile]
- Other flags such as:
    - Whether the fastq file is single-end [--single_end]
    - The maximum CPUs and memory to use [--max_cpus, --max_memory]

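You can reuse the nextflow.config from Part I. Note that the command below passes `-c nextflow-methyseq.config`, so make sure the file name you use matches. Since our fastq file is fairly large, the run may take some time to finish.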

In [None]:
!./nextflow run nf-core/methylseq -r 1.6.1 \
    --input 'data/raw_fastq/SRR067701.fastq.gz' \
    --genome GRCh38 \
    --single_end \
    -c nextflow-methyseq.config \
    -profile gbatch \
    --max_cpus 32 \
    --max_memory '110.GB'
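
If you expect to rerun the pipeline with different inputs or resource limits, the flags in the command above can also be assembled in Python. This is only a sketch: `build_methylseq_command` is a hypothetical helper, and its defaults are simply copied from the cell above.

```python
import shlex

def build_methylseq_command(fastq, config, genome="GRCh38", revision="1.6.1",
                            profile="gbatch", single_end=True,
                            max_cpus=32, max_memory="110.GB"):
    """Return the argument list for the nf-core/methylseq run shown above."""
    cmd = ["./nextflow", "run", "nf-core/methylseq", "-r", revision,
           "--input", fastq, "--genome", genome]
    if single_end:
        # Only add the flag for single-end reads
        cmd.append("--single_end")
    cmd += ["-c", config, "-profile", profile,
            "--max_cpus", str(max_cpus), "--max_memory", max_memory]
    return cmd

cmd = build_methylseq_command("data/raw_fastq/SRR067701.fastq.gz",
                              "nextflow-methyseq.config")
# Print the command for review before running it
print(shlex.join(cmd))
```

Once the printed command looks right, you could launch it with `subprocess.run(cmd, check=True)`, or paste it into a `!` cell as above.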

#### Check to see if files are in your output directory bucket
If you skipped Section 1, go back and run its first cell, which assigns your bucket name to the `BUCKET` variable.

In [None]:
!gsutil ls gs://$BUCKET/methyl-seq/outdir

__Optional__: View your MultiQC HTML file

In [None]:
!gsutil cp -r gs://$BUCKET/methyl-seq/outdir/MultiQC/multiqc_report.html .

In [None]:
from IPython.display import IFrame

IFrame(src='multiqc_report.html', width=900, height=600)

## 4. Clean Up

If you want to clean up all resources associated with this tutorial, then:
+ delete your bucket with `gsutil rm -r gs://$BUCKET`
+ delete this VM in either Vertex AI or Compute Engine