# 1. Submit a Job directly to the Life Sciences API

### 1.1 Specify the output directory [--logging], region, location, and the command

Remember to point `--logging` at a path inside your own bucket.

In [None]:
!gcloud beta lifesciences pipelines run \
    --location us-central1 \
    --regions us-east1 \
    --logging gs://nextflowdemobucket/zy-test/example.log \
    --command-line 'echo "hello world!"'

### 1.2 Check job status

In [None]:
# To check the job status, copy the operation ID from the gcloud output into the command below.
# The gcloud output contains a line of the form:
# Running [projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID]
!gcloud beta lifesciences operations describe 17977600789736315778
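
The operation ID can also be pulled out of the `Running [...]` line programmatically instead of being copied by hand. The following is a minimal sketch, assuming the gcloud output contains a line in the form shown in the comment above; `extract_operation_id` is a hypothetical helper name and `my-project` is a placeholder, neither is part of gcloud.

```python
import re

# Matches the path printed by gcloud:
# Running [projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID]
OPERATION_PATTERN = re.compile(
    r"Running \[projects/[^/\]]+/locations/[^/\]]+/operations/([^\]\s/]+)\]"
)


def extract_operation_id(gcloud_output: str) -> str:
    """Return the OPERATION_ID part of the 'Running [...]' line (hypothetical helper)."""
    match = OPERATION_PATTERN.search(gcloud_output)
    if match is None:
        raise ValueError("no operation path found in the gcloud output")
    return match.group(1)


# Placeholder project name; the ID is the one used in the cell above.
sample = "Running [projects/my-project/locations/us-central1/operations/17977600789736315778]"
print(extract_operation_id(sample))
```

The returned string can then be passed to `gcloud beta lifesciences operations describe`.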

### 1.3 View your output

In [None]:
!gsutil cp gs://nextflowdemobucket/zy-test/example.log .
!cat example.log

# 2. Submit a Job to the Life Sciences API through Nextflow

### 2.1 Create a Nextflow Service Account

- Enable the Cloud Life Sciences, Compute Engine, and Cloud Storage APIs by searching for each of the GCP products and clicking <img src="/images/service_account_5.jpg" width="50" height="50">
- Click the navigation menu <img src="/images/service_account_6.jpg" width="20" height="20">, go to 'IAM & Admin', then click 'Service Accounts'

<img src="/images/service_account_1.jpeg" width="250" height="50">

- Select  <img src="/images/service_account_2.jpeg" width="150" height="30">

- Type in 'nextflow-service-account' as the service account name and press 'Done'
- On the 'IAM & Admin' menu, click 'IAM', then click <img src="/images/service_account_3.jpeg" width="20" height="20"> next to the nextflow service account

- Add the following roles and click 'Save':
<img src="/images/service_account_4.jpg" width="400" height="400"> 

__Roles:__

    - lifesciences.workflowsRunner
    - iam.serviceAccountUser
    - serviceusage.serviceUsageConsumer
    - storage.objectAdmin
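
The same four roles can also be granted from the command line rather than through the console. The sketch below only builds the `gcloud projects add-iam-policy-binding` commands as strings so you can review them before running anything. The project ID and service account email are placeholders, and `binding_commands` is a hypothetical helper name.

```python
import shlex

# The four roles listed above, written as full role IDs.
ROLES = [
    "roles/lifesciences.workflowsRunner",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageConsumer",
    "roles/storage.objectAdmin",
]


def binding_commands(project_id, service_account_email):
    """Return one gcloud command string per role (hypothetical helper)."""
    member = f"serviceAccount:{service_account_email}"
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}", f"--role={role}",
        ])
        for role in ROLES
    ]


# Placeholder values; substitute your own project ID and service account email.
for command in binding_commands(
    "Your_Project_ID",
    "nextflow-service-account@Your_Project_ID.iam.gserviceaccount.com",
):
    print(command)
```

Each printed line can be pasted into a terminal, or run from a notebook cell with a leading `!`.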

### 2.2 Add Service Account to Notebook Permissions

When creating a notebook, you can edit the permissions to use the nextflow service account.
- Using the 'IAM & Admin' menu on the left, click 'Service Accounts' (if you aren't there already), locate your nextflow service account, and copy the entire email address
- Start creating your notebook and edit the Permissions section by unchecking 'Use Compute Engine default service account' and entering your service account email.

<img src="/images/service_account_7.jpg" width="400" height="400">
- Then click 'Create'


<div class="alert alert-block alert-danger">
    <i class="fa fa-exclamation-circle" aria-hidden="true"></i>
    <b>WARNING: </b>  Please <b>do not create a service account key</b>, even if a tutorial instructs you to. Downloaded keys are generally not considered secure. They are often accessible to every user of a machine, which makes them easy to steal. A stolen key does not expire by default, so it may be used indefinitely unless the project owner revokes or regenerates it.
</div>

### 2.3 Nextflow 101

Nextflow relies on several files to define and run a workflow:

- __Main file__: A .nf file that holds the processes and channels (which describe the input, output, and shell script for each step), the workflow block (which acts like a recipe book telling Nextflow how the steps fit together), and any conditions. For Snakemake users, processes are roughly equivalent to 'rules'.
    - __Process__: Contains channels and scripts that can be executed on a Linux server, such as bash commands.
    - __Channel__: The way processes communicate with each other. For example, input and output channels point a process to where data is or should be located.
- __Config file__: The .config file contains parameters and one or more profiles. Each profile can set a different executor or software environment (e.g. LS API, conda, docker), memory or machine type, output directory, working directory, and more!
- __Docker file__: Contains the dependencies and environments needed for the Nextflow workflow to run.
- __Schema file__: Schema files are optional, structured JSON files that describe the usage and parameters of your workflow. You might have seen this information when running a command with the '--help' flag.
    

### 2.4 Install Nextflow

In [None]:
# First install Java
!sudo apt update
!sudo apt-get install default-jdk -y
!java -version

In [None]:
# Specify the platform (%env persists across cells; `!export` only affects a throwaway subshell)
%env NXF_MODE=google
# Install Nextflow, make it executable, and update it
!curl https://get.nextflow.io | bash
!chmod +x nextflow
!./nextflow self-update

### 2.5 Scripting and running a nextflow 'Hello World' process

- Create a .nf file in the terminal (in this example, `hello.nf`)
- Be sure to include _#!/usr/bin/env nextflow_ and _nextflow.enable.dsl=2_ at the top of your script
- Add a process named sayHello
- Add input, output, and script sections to the process
    - In our example, the input receives the string, the script writes the string to a file and then prints the contents of that file, and the output captures what the script prints (stdout).
- At the end, write the workflow block that sets the order of your workflow
    - In our example, we run the sayHello process and pipe its output to 'view', which prints it to the screen

It should look something like this:

```
#!/usr/bin/env nextflow
nextflow.enable.dsl=2 

params.str = 'Hello World'

process sayHello {
  input:
  val str

  output:
  stdout

  script:
  """
  echo $str > hello.txt
  cat hello.txt
  """
}
workflow {
  sayHello(params.str) | view
}
```

In [None]:
!./nextflow run hello.nf --str 'Hello!'

In [None]:
# The work directory hash differs for every run; copy the path from your own Nextflow output
!cat /home/jupyter/work/34/35a36d596ad458c0d2dc2e2a8193f9/hello.txt
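
Because the hashed directory name changes with every run, it can be easier to search the work directory than to copy the path by hand. This is a minimal sketch, assuming the usual two-level layout of a local Nextflow work directory (`work/<2 characters>/<remaining hash>/`), as in the path above; `find_task_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_task_files(work_dir, filename):
    """Return every copy of `filename` two levels below `work_dir`, newest first."""
    matches = Path(work_dir).glob(f"*/*/{filename}")
    return sorted(matches, key=lambda path: path.stat().st_mtime, reverse=True)


# Example usage (path taken from the cell above):
# for path in find_task_files("/home/jupyter/work", "hello.txt"):
#     print(path, path.read_text())
```

The first entry in the returned list is the file written by the most recent run.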

### 2.6 Create and modify your own config file to include a 'gls' profile block

The config file allows Nextflow to use executors such as the Life Sciences API. In this tutorial the config file is named __'test_hw2.config'__.
- Make sure that your region is one supported by the LS API!
- Create the bucket ahead of time using the `mb` command (e.g., `gsutil mb gs://Your_Bucket_Name`)
    - Specify your working directory and your output directory within the bucket, and make sure they are separate
- Specify the machine type you would like to use, ensuring that it has enough memory and CPUs for the workflow
    - Otherwise the LS API will automatically use 1 CPU

```
profiles{
  gls{
      process.executor = 'google-lifesciences'
      workDir = 'gs://Your_Bucket_Name/methyl-output'
      google.location = 'us-central1'
      google.region  = 'us-central1'
      google.project = 'Your_Project_ID'
      params.outdir = 'gs://Your_Bucket_Name/methyl-tmp'
      process.machineType = 'c2-standard-30'
     }
}
```

<div class="alert alert-block alert-info">
    <i class="fa fa-lightbulb-o" aria-hidden="true"></i>
    <b>Tip: </b> Make sure your working directory and output directory are different! Life Sciences creates temporary files in the working directory within your bucket, and these take up space. Once your pipeline has completed successfully, feel free to delete the temporary files.
</div>

If you run into memory issues, try increasing your boot disk size by adding the following parameter to your gls profile:

`google.lifeSciences.bootDiskSize=100.GB`
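
The checklist above can be turned into a small generator that writes the profile for you and refuses to use the same location for the working and output directories. This is only a sketch: `render_gls_profile` is a hypothetical helper, its default values are copied from the example profile above, and the project and bucket names are placeholders.

```python
def render_gls_profile(project, bucket, work_subdir="methyl-output",
                       out_subdir="methyl-tmp", location="us-central1",
                       region="us-central1", machine_type="c2-standard-30"):
    """Return the text of a 'gls' profile block (hypothetical helper)."""
    work_dir = f"gs://{bucket}/{work_subdir}"
    out_dir = f"gs://{bucket}/{out_subdir}"
    # The tutorial asks for separate working and output directories.
    if work_dir == out_dir:
        raise ValueError("workDir and params.outdir must point to different locations")
    settings = [
        ("process.executor", "google-lifesciences"),
        ("workDir", work_dir),
        ("google.location", location),
        ("google.region", region),
        ("google.project", project),
        ("params.outdir", out_dir),
        ("process.machineType", machine_type),
    ]
    body = "\n".join(f"    {key} = '{value}'" for key, value in settings)
    return "profiles {\n  gls {\n" + body + "\n  }\n}\n"


# Placeholder names; replace them with your own project ID and bucket.
print(render_gls_profile("Your_Project_ID", "Your_Bucket_Name"))
```

The printed text can be saved as your config file and passed to Nextflow with `-c`.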

### Optional: Listing nf-core tools with docker and viewing their commands
Using the command below, you can see all the tools that nf-core holds and their versions/latest releases.

In [None]:
!docker run nfcore/tools list

You can view commands for methylseq (or any other specified nf-core tool) by using the [--help] flag

In [None]:
!./nextflow run nf-core/methylseq -r 1.6.1 --help

### 2.7 Test Methylseq

The 'test' profile uses a small dataset allowing you to ensure the workflow works with your config file without long runtimes. Ensure you include:
- Version of the nf-core tool [-r]
- Location of the config file [-c]

In [None]:
!./nextflow run nf-core/methylseq -r 1.6.1 -profile test,gls -c nextflow.config

In the output above, the value inside the __[ ]__ to the left of each process is a __tag__ you can search for in Life Sciences, and the text before the __/__ corresponds to the __temporary directories__ within your working directory. Feel free to delete the temporary directories once your workflow has successfully completed.
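
To locate the temporary directory that belongs to a given tag, the tag can be joined onto the working directory path. The sketch below assumes progress lines that begin with a tag such as `[34/35a36d]` (two characters, a slash, then six more); the sample line is illustrative rather than copied from a real run, and `work_dir_prefix` is a hypothetical helper name.

```python
import re

# A tag at the start of a progress line, e.g. "[34/35a36d]".
TAG_PATTERN = re.compile(r"^\[([0-9a-f]{2})/([0-9a-f]{6})\]")


def work_dir_prefix(progress_line, work_dir):
    """Return the path prefix of the task directory, or None if the line has no tag."""
    match = TAG_PATTERN.match(progress_line.strip())
    if match is None:
        return None
    first, second = match.groups()
    return f"{work_dir.rstrip('/')}/{first}/{second}"


# Illustrative line and placeholder bucket name.
line = "[34/35a36d] process > sayHello [100%] 1 of 1"
print(work_dir_prefix(line, "gs://Your_Bucket_Name/methyl-output"))
```

Appending `*` to the returned prefix gives a pattern that `gsutil ls` can expand to the full directory name.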

### 2.8 Run Methylseq with real-world data

#### a. Importing FASTQ files with Mambaforge

Installing mambaforge

In [None]:
!curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
!bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge

Installing SRA-tools

In [None]:
!mambaforge/bin/mamba install -c bioconda sra-tools -y

Downloading the SRA single-celled fastq file

In [None]:
!mambaforge/bin/fasterq-dump --threads 4 --progress SRR067701
!gzip SRR067701.fastq

#### b. Run Methylseq with your own profile and data. Ensure you include:
- nf-core tool version [-r]
- Input fastq.gz file [--input]
- Reference genome [--genome] (no need to have it on hand; nf-core uses iGenomes and will pull in the correct reference file)
- Config file location [-c]
- Desired profile [-profile]
- Other flags, such as:
    - Whether the FASTQ file is single-ended
    - The maximum CPUs and memory to use


In [None]:
!./nextflow run nf-core/methylseq -r 1.6.1 \
    --input 'SRR067701.fastq.gz' \
    --genome GRCh37 \
    --single_end \
    -c nextflow.config \
    -profile gls \
    --max_cpus 32 \
    --max_memory '110.GB'
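
Long multi-line shell commands are easy to break with a stray space after a backslash. One alternative is to assemble the arguments as a Python list and hand them to `subprocess`. The sketch below only builds and prints the command; `methylseq_command` is a hypothetical helper whose defaults mirror the flags used in the cell above.

```python
import shlex


def methylseq_command(input_path, genome, config, profile, revision="1.6.1",
                      single_end=True, max_cpus=32, max_memory="110.GB"):
    """Return the nextflow invocation as a list of arguments (hypothetical helper)."""
    args = ["./nextflow", "run", "nf-core/methylseq", "-r", revision,
            "--input", input_path, "--genome", genome]
    if single_end:
        args.append("--single_end")
    args += ["-c", config, "-profile", profile,
             "--max_cpus", str(max_cpus), "--max_memory", max_memory]
    return args


command = methylseq_command("SRR067701.fastq.gz", "GRCh37", "nextflow.config", "gls")
print(shlex.join(command))

# To launch the run from Python instead of a `!` cell:
# import subprocess
# subprocess.run(command, check=True)
```

Because each argument is a separate list element, no shell quoting or line continuation is involved.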


#### c. Check to see if files are in your output directory bucket

In [None]:
!gsutil ls gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1

__Optional__: View your MultiQC HTML file

In [None]:
!gsutil cp -r gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/MultiQC/multiqc_report.html .

In [None]:
from IPython.display import IFrame

IFrame(src='multiqc_report.html', width=900, height=600)

---

#### To run Nextflow or any bash command in an R kernel, use `system("your bash command", intern = TRUE)`

In [None]:
# Example
system('./nextflow run nf-core/methylseq -r 1.6.1 -profile test,gls -c nextflow.config', intern = TRUE)

In [None]:
# View the MultiQC HTML file in R
IRdisplay::display_html('<iframe src="multiqc_report.html" width="1000" height="1000"></iframe>')