# Use Nextflow to run workflows using the Cloud Life Sciences API Part I

## Overview
Here we are going to walk through submitting simple jobs directly to the Life Sciences API, then dive into interacting with the API using Nextflow. We will run some basic Hello World jobs, then move to a more complex [nf-core Methylseq workflow](https://nf-co.re/methylseq). 

<div class="alert alert-block alert-danger"> <b>Warning:</b> The Google Life Sciences API is deprecated and will no longer be available on the platform by July 8, 2025. Please switch to the <a href="../../GoogleBatch/nextflow/Part1_GBatch_Nextflow.ipynb">Google Batch Nextflow tutorials</a>. </div>

## Learning Objectives
+ Learn to use Nextflow on Google Cloud
+ Learn to submit Nextflow jobs to Google Life Sciences API

## Prerequisites
Make sure that Cloud Life Sciences, Compute Engine, and Cloud Storage APIs are all enabled.

You also want to make sure your Compute Engine default service account has the following roles:

- `lifesciences.workflowsRunner`
- `iam.serviceAccountUser`
- `serviceusage.serviceUsageConsumer`
- `storage.objectAdmin`

Your service account should already have these roles assigned, but if not, reach out to Support to have your account updated.
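
If you can export your project's IAM policy, the role check above can be sketched in Python. This is a minimal sketch, not an official tool: the `missing_roles` helper, the example policy, and the service account address are all hypothetical, and the sketch assumes the policy JSON has the `bindings` / `role` / `members` shape produced by `gcloud projects get-iam-policy <PROJECT> --format=json`.

```python
import json

# Roles from the prerequisites, written with the "roles/" prefix used in IAM policies.
REQUIRED_ROLES = (
    "roles/lifesciences.workflowsRunner",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageConsumer",
    "roles/storage.objectAdmin",
)


def missing_roles(policy, member, required=REQUIRED_ROLES):
    """Return the required roles that `member` is not directly bound to in `policy`."""
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return [role for role in required if role not in granted]


# Hypothetical policy and service account, for illustration only.
sa = "serviceAccount:123-compute@developer.gserviceaccount.com"
example_policy = json.loads("""
{
  "bindings": [
    {"role": "roles/lifesciences.workflowsRunner",
     "members": ["serviceAccount:123-compute@developer.gserviceaccount.com"]},
    {"role": "roles/storage.objectAdmin",
     "members": ["serviceAccount:123-compute@developer.gserviceaccount.com"]}
  ]
}
""")

print(missing_roles(example_policy, sa))
```

The check only looks at direct bindings, so a broader role that already includes the same permissions would still be reported as missing.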


## Get Started

### Install packages and setup your environment

#### Create a bucket

In [None]:
# make sure you change this name, it needs to be globally unique
%env BUCKET=gls-api-nextflow

In [None]:
# will only create the bucket if it doesn't yet exist
! gsutil ls gs://$BUCKET > /dev/null 2>&1 || gsutil mb gs://$BUCKET

In [None]:
# enable versioning on the bucket so that overwritten files are kept as older versions
! gsutil versioning set on gs://$BUCKET
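
Because bucket names must be globally unique, one option is to append a random suffix to the name. The sketch below is one way to do that; `unique_bucket_name` and `is_valid_bucket_name` are hypothetical helpers, and the validation covers only a subset of the Cloud Storage naming rules (names without dots, 3 to 63 characters, lowercase letters, digits, dashes and underscores, starting and ending with a letter or digit).

```python
import re
import uuid

# Subset of the Cloud Storage naming rules (dot-free names only).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if `name` satisfies the subset of rules described above."""
    return _BUCKET_RE.fullmatch(name) is not None


def unique_bucket_name(prefix="gls-api-nextflow"):
    """Append a random 8-character hex suffix so a name collision is unlikely."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


name = unique_bucket_name()
print(name, is_valid_bucket_name(name))
```

To use the generated name in later cells, you could assign it with `os.environ["BUCKET"] = name` before running the `gsutil` commands.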

#### Install dependencies

In [None]:
# First install Java
!sudo apt update
!sudo apt-get install default-jdk -y
!java -version

In [None]:
# Specify the Nextflow version and platform
# (%env sets the variable for the whole notebook; `! export` would only last for that one line)
%env NXF_VER=21.10.0
%env NXF_MODE=google
# Install Nextflow, make it executable, and update it
! curl https://get.nextflow.io | bash
! chmod +x nextflow
! ./nextflow self-update
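
Commands started with `!` inherit the environment of the notebook kernel, so a quick way to confirm that `NXF_VER` and `NXF_MODE` will be visible to Nextflow is to look them up from Python. The `missing_env` helper below is a hypothetical convenience, not part of Nextflow.

```python
import os


def missing_env(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# An empty list means both variables are set in the kernel's environment.
print(missing_env(["NXF_VER", "NXF_MODE"]))
```

If either name is printed, set it again in the kernel (for example with `%env`) before running `./nextflow`.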

## Submit Hello World to the API

In [None]:
! gcloud beta lifesciences pipelines run \
    --location us-central1 \
    --regions us-east1 \
    --logging gs://$BUCKET/hello_world.log \
    --command-line 'echo "hello world!"'

### Check job status
To check the job status, copy the operation ID from the gcloud output, which has the form:

`Running [projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID]`

The status output is hard to parse at first: the most recent action is at the top, so read from the bottom up to follow the job in order. If you have an error, it should be towards the top. Even for this simple job, it may take a few minutes to finish all operations, so keep checking until it says `done: true` at the top.
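
The operation ID is the last path segment of the `Running [...]` line. As a small sketch, a regular expression can pull it out so you do not have to copy it by hand; `parse_operation` is a hypothetical helper and the project number in the example line is made up.

```python
import re

_OPERATION_RE = re.compile(
    r"projects/(?P<project>[^/\]\s]+)"
    r"/locations/(?P<location>[^/\]\s]+)"
    r"/operations/(?P<operation>[^/\]\s]+)"
)


def parse_operation(line):
    """Return the project, location and operation ID found in `line`, or None."""
    match = _OPERATION_RE.search(line)
    return match.groupdict() if match else None


# Hypothetical output line, for illustration only.
line = "Running [projects/123456789/locations/us-central1/operations/10485099716669037373]."
print(parse_operation(line)["operation"])
```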

In [None]:
# set your operation ID here
%env ID=10485099716669037373

In [None]:
! gcloud beta lifesciences operations describe $ID
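
If you add `--format=json` to the `describe` command, the result can be summarized in Python instead of read by eye. The sketch below assumes the operation JSON carries a `done` flag, an optional `error` object with a `message`, and a `metadata.events` list ordered newest first, as described above. `summarize_operation` and the example events are hypothetical.

```python
def summarize_operation(op):
    """Return (state, descriptions), with event descriptions ordered oldest first."""
    if not op.get("done"):
        state = "running"
    elif "error" in op:
        state = "failed: " + op["error"].get("message", "unknown error")
    else:
        state = "succeeded"
    events = op.get("metadata", {}).get("events", [])
    # Assumption: events arrive newest first, so reverse them to read in order.
    return state, [event.get("description", "") for event in reversed(events)]


# Hypothetical, heavily trimmed operation for illustration only.
example = {
    "done": True,
    "metadata": {
        "events": [
            {"description": "Worker released"},
            {"description": "Started running the command"},
            {"description": "Worker assigned"},
        ]
    },
}

state, descriptions = summarize_operation(example)
print(state)
for description in descriptions:
    print("-", description)
```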

### View your output

In [None]:
! gsutil ls gs://$BUCKET/

In [None]:
! gsutil cp gs://$BUCKET/hello_world.log .
! cat hello_world.log

## Run Nextflow Locally

### Nextflow 101

A working Nextflow workflow is built from several files:

- __Main file__: A `.nf` file that holds the processes and channels. It describes the inputs, the outputs, the shell script with your commands, any conditions, and the workflow block, which acts like a recipe book for Nextflow. For Snakemake users, processes are the equivalent of 'rules'.
    - __Process__: Contains channels and scripts that can be executed on a Linux server, such as bash commands.
    - __Channel__: Provides the way processes communicate with each other. For example, inputs and outputs are channels that point a process to where data is or should be located.
- __Config file__: The `.config` file contains parameters and one or more profiles. Each profile can set a different executor (e.g. the LS API), software environment (e.g. conda or docker), memory or machine type, output directory, working directory, and more!
- __Docker file__: Contains the dependencies and environments that the Nextflow workflow needs to run.
- __Schema file__: Schema files are optional, structured JSON files that describe the usage and commands of your workflow. You might have seen this information when running a command with the `--help` flag.
    

### Run a nextflow 'Hello World' process locally

We are going to first run Hello World locally using the main file called hello.nf.

It should look like this:

```
#!/usr/bin/env nextflow
nextflow.enable.dsl=2 

params.str = 'Hello World'

process sayHello {
  input:
  val str

  output:
  stdout

  """
  echo $str > hello.txt
  cat hello.txt
  """
}
workflow {
  sayHello(params.str) | view
}
```

In [None]:
! ./nextflow run hello.nf --str 'Hello!'
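
The run above prints `Hello!` rather than `Hello World` because a double-dash option on the command line (`--str`) overrides the matching `params.str` default in the script. The Python sketch below only illustrates that precedence. It is not how Nextflow parses arguments, and `resolve_params` is a hypothetical helper that handles `--name value` pairs only.

```python
def resolve_params(defaults, argv):
    """Overlay `--name value` pairs from `argv` on top of the script defaults."""
    params = dict(defaults)
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--") and i + 1 < len(argv):
            params[arg[2:]] = argv[i + 1]
            i += 2
        else:
            i += 1
    return params


defaults = {"str": "Hello World"}  # mirrors params.str in hello.nf
print(resolve_params(defaults, ["--str", "Hello!"]))
print(resolve_params(defaults, []))
```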

## Submit Nextflow Job to the Life Sciences API
Create and modify your own config file to include a 'gls' profile block that tells Nextflow to submit the job to the API instead of running it locally.

The config file allows Nextflow to use executors like the Life Sciences API. In this tutorial the config file is named __'nextflow.config'__. Open this file and update the `<VARIABLES>` that are specific to your account.
- Make sure that your region is a region included in the LS API!
- Specify your working directory bucket and output directory bucket
- Specify the machine type you would like to use, ensuring that there is enough memory and cpus for the workflow
    - Otherwise LS API will automatically use 1 CPU

```
profiles{
  gls{
      process.executor = 'google-lifesciences'
      workDir = 'gs://<BUCKET>/methyl-seq'
      google.location = 'us-central1'
      google.region  = 'us-central1'
      google.project = '<YOUR_PROJECT>'
      params.outdir = 'gs://<BUCKET>/methyl-seq/outdir'
      process.machineType = 'c2-standard-30'
     }
}
```

__Note:__ Make sure your working directory and output directory are different! Life Sciences creates temporary files in the working directory within your bucket, and these take up space, so once your pipeline has completed successfully feel free to delete them.
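
Before submitting, it can help to confirm that no `<VARIABLES>` were left unfilled and that the two directories differ. The sketch below does both on the config text with simple regular expressions. `unfilled_placeholders` and `setting` are hypothetical helpers, `my-bucket` is a made-up bucket name, and the pattern matching assumes single-quoted values written one per line as in the profile above.

```python
import re


def unfilled_placeholders(config_text):
    """Return the distinct <UPPER_CASE> placeholders still present, in order of appearance."""
    seen = []
    for name in re.findall(r"<([A-Z_]+)>", config_text):
        if name not in seen:
            seen.append(name)
    return seen


def setting(config_text, key):
    """Return the single-quoted value assigned to `key`, or None if it is absent."""
    pattern = rf"^\s*{re.escape(key)}\s*=\s*'([^']*)'"
    match = re.search(pattern, config_text, re.MULTILINE)
    return match.group(1) if match else None


# Hypothetical, partly filled-in config for illustration only.
config = """
profiles{
  gls{
      process.executor = 'google-lifesciences'
      workDir = 'gs://my-bucket/methyl-seq'
      google.project = '<YOUR_PROJECT>'
      params.outdir = 'gs://my-bucket/methyl-seq/outdir'
     }
}
"""

print(unfilled_placeholders(config))
print(setting(config, "workDir") != setting(config, "params.outdir"))
```

To check your real file, read it with `open("nextflow.config").read()` and pass the text to the same functions.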

### Optional: Listing nf-core tools with docker and viewing their commands
Using the command below, you can see all the tools that nf-core holds and their versions and latest releases.

In [None]:
! docker run nfcore/tools list

You can view the commands for methylseq (or any other specified nf-core tool) by using the `--help` flag.

In [None]:
! ./nextflow run nf-core/methylseq -r 1.6.1 --help

### Run Methylseq with the test profile

The 'test' profile uses a small dataset allowing you to ensure the workflow works with your config file without long runtimes. Ensure you include:
- Version of the nf-core tool [-r]
- Location of the config file [-c]

In [None]:
! ./nextflow run nf-core/methylseq -r 1.6.1 -profile test,gls -c nextflow-methyseq.config

In the output above, the text inside the __[ ]__ to the left of each process is a __tag__ you can search for in Life Sciences, and the part before the __/__ corresponds to the __temporary directories__ within your working directory. Feel free to delete the temporary directories once your workflow has successfully completed.
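
To locate a task's temporary directory, the bracketed tag can be joined onto the working directory. The sketch below assumes progress lines of the form `[xx/yyyyyy] process > name ...`; `task_work_dir_prefix`, the sample line, and `my-bucket` are hypothetical. Nextflow shortens the tag in its output, so the result is a prefix of the real directory name, not the full path.

```python
import re

_TASK_RE = re.compile(
    r"^\[(?P<tag>[0-9a-f]{2}/[0-9a-f]{6})\]\s+process\s+>\s+(?P<name>\S+)"
)


def task_work_dir_prefix(log_line, work_dir):
    """Return (process name, work directory prefix) for a progress line, or None."""
    match = _TASK_RE.match(log_line.strip())
    if match is None:
        return None
    return match.group("name"), f"{work_dir.rstrip('/')}/{match.group('tag')}"


# Hypothetical progress line, for illustration only.
line = "[7b/3753ff] process > fastqc (SRR389222_sub1) [100%] 1 of 1"
print(task_work_dir_prefix(line, "gs://my-bucket/methyl-seq"))
```

Listing the returned prefix with a trailing wildcard, for example `gsutil ls -d gs://my-bucket/methyl-seq/7b/3753ff*`, should show the full directory for that task.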

## Conclusion
Congrats! You are done with Part I. If you want to keep going and learn how to use the Methylseq workflow with real data, then move to Part II. If not, then feel free to clean up your resources. 

## Clean up
If you want to clean up all resources associated with this tutorial then 
+ delete your bucket with `gsutil rm -r gs://$BUCKET`
+ delete this VM in either Vertex AI or Compute Engine