# 1. Submit a Job directly to the Life Sciences API

### 1.1 Specify the output directory [--logging], region, location, and the command

Remember to point `--logging` at a path inside your own bucket.

In [57]:
!gcloud beta lifesciences pipelines run \
    --location us-central1 \
    --regions us-east1 \
    --logging gs://nextflowdemobucket/zy-test/example.log \
    --command-line 'echo "hello world!"'

Running [projects/25874201052/locations/us-central1/operations/17977600789736315778].
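The operation ID needed in the next step is the last path segment of the `Running [...]` line printed above. As a minimal sketch, assuming you have that line as a Python string, it could be split into its parts as shown here. The helper name `parse_operation` is hypothetical and not part of gcloud.

```python
import re

# Shape of the line gcloud prints after submitting a pipeline:
# Running [projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID].
RUNNING_LINE = re.compile(
    r"Running \[projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/operations/(?P<operation>[^\]]+)\]"
)


def parse_operation(line):
    """Return (project, location, operation_id) from a 'Running [...]' line."""
    match = RUNNING_LINE.search(line)
    if match is None:
        raise ValueError(f"no operation path found in: {line!r}")
    return match.group("project"), match.group("location"), match.group("operation")


# The line copied from the output above
line = "Running [projects/25874201052/locations/us-central1/operations/17977600789736315778]."
project, location, operation_id = parse_operation(line)
print(operation_id)
```

The printed value is what gets passed to `gcloud beta lifesciences operations describe`.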


### 1.2 Check job status

In [58]:
# To check the job status, pass the operation ID from the gcloud output to the command below.
# The output has the form: Running [projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID]
!gcloud beta lifesciences operations describe 17977600789736315778

metadata:
  '@type': type.googleapis.com/google.cloud.lifesciences.v2beta.Metadata
  createTime: '2022-07-27T18:22:48.187635Z'
  events:
  - description: Worker "google-pipelines-worker-a90f18302bb692140844568b5f65ee4c"
      assigned in "us-east1-b" on a "n1-standard-1" machine
    timestamp: '2022-07-27T18:22:57.589812819Z'
    workerAssigned:
      instance: google-pipelines-worker-a90f18302bb692140844568b5f65ee4c
      machineType: n1-standard-1
      zone: us-east1-b
  pipeline:
    actions:
    - commands:
      - -c
      - echo "hello world!"
      entrypoint: bash
      imageUri: google/cloud-sdk:slim
    - alwaysRun: true
      commands:
      - /bin/sh
      - -c
      - gsutil -m -q cp /google/logs/output gs://nextflowdemobucket/zy-test/example.log
      imageUri: google/cloud-sdk:slim
    resources:
      regions:
      - us-east1
      virtualMachine:
        bootDiskSizeGb: 10
        bootImage: projects/cos-cloud/global/images/family/cos-stable
        labels:
         

### 1.3 View your output

In [59]:
!gsutil cp gs://nextflowdemobucket/zy-test/example.log .
!cat example.log

Copying gs://nextflowdemobucket/zy-test/example.log...
/ [1 files][   13.0 B/   13.0 B]                                                
Operation completed over 1 objects/13.0 B.                                       
hello world!


# 2. Submit a Job to the Life Sciences API through Nextflow

### 2.1 Create a Nextflow Service Account

- Enable the Cloud Life Sciences, Compute Engine, and Cloud Storage APIs by searching each of the GCP products and clicking <img src="images/service_account_5.jpg" width="50" height="50">
- Click the navigation menu <img src="images/service_account_6.jpg" width="20" height="20">, go to IAM then click Service Accounts

<img src="images/service_account_1.jpeg" width="250" height="50">

- Select  <img src="images/service_account_2.jpeg" width="150" height="30">

- Type in 'nextflow-service-account' as the service account name and press 'Done'
- On the 'IAM & Admin' menu, click 'IAM', then click <img src="images/service_account_3.jpeg" width="20" height="20"> next to the nextflow service account

- Add the following roles and click 'Save':
<img src="images/service_account_4.jpg" width="400" height="400"> 

__Roles:__

    - lifesciences.workflowsRunner
    - iam.serviceAccountUser
    - serviceusage.serviceUsageConsumer
    - storage.objectAdmin
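The console steps above are the method this tutorial uses. For readers who prefer the command line, the sketch below only prints the equivalent `gcloud projects add-iam-policy-binding` commands, one per role, so they can be reviewed before anything is run. It assumes the account was named 'nextflow-service-account' as above; `Your_Project_ID` is a placeholder and `binding_commands` is a hypothetical helper.

```python
# Full role IDs for the four roles listed above
ROLES = [
    "roles/lifesciences.workflowsRunner",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageConsumer",
    "roles/storage.objectAdmin",
]


def binding_commands(project_id, account_name="nextflow-service-account"):
    """Build one gcloud command string per role; nothing is executed here."""
    # Service account emails follow NAME@PROJECT_ID.iam.gserviceaccount.com
    member = f"serviceAccount:{account_name}@{project_id}.iam.gserviceaccount.com"
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={member} --role={role}"
        for role in ROLES
    ]


# Placeholder project ID (assumption): replace with your own
for command in binding_commands("Your_Project_ID"):
    print(command)
```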

### 2.2 Add Service Account to Notebook Permissions

When creating a notebook, you can edit the permissions so that it uses the nextflow service account.
- In the 'IAM & Admin' menu on the left, click 'Service Accounts' (if you aren't there already), locate your nextflow service account, and copy its full email address.
- Start creating your notebook and edit the Permissions section: uncheck 'Use Compute Engine default service account' and enter your service account email.

<img src="images/service_account_7.jpg" width="400" height="400">
- Then click 'Create'


<div class="alert alert-block alert-danger">
    <i class="fa fa-exclamation-circle" aria-hidden="true"></i>
    <b>WARNING: </b>  Please <b>do not create a service account key</b>, even if a tutorial instructs you to. Keys of this kind are generally not considered secure. They are often accessible to every user of a project, which makes them easy to steal. A stolen key has no expiration, so it may be used indefinitely unless the project owner revokes or regenerates it.
</div>

### 2.3 Nextflow 101

Nextflow relies on several files to define a working workflow:

- __Main file__: A .nf file that holds the processes and channels, which describe the inputs, outputs, and shell script for your commands, together with the workflow block, which acts like a recipe book telling Nextflow what to run, and any conditions. For Snakemake users, processes are roughly equivalent to 'rules'.
    - __Process__: Contains input and output channels and a script that can be executed on a Linux server, such as bash commands.
    - __Channel__: The means by which processes communicate with each other. For example, input and output are channels that point a process to where data is or should be located.
- __Config file__: The .config file contains parameters and one or more profiles. Each profile can set a different executor or software environment (e.g. LS API, conda, docker), memory or machine type, output directory, working directory, and more!
- __Docker file__: Contains the dependencies and environments needed for the Nextflow workflow to run.
- __Schema file__: Schema files are optional, structured JSON files that describe the usage and commands your workflow will execute. You might have seen this information when running a command with the '--help' flag.
    

### 2.4 Install Nextflow

In [6]:
#First install java
!sudo apt update
!sudo apt-get install default-jdk -y
!java -version

Hit:2 http://packages.cloud.google.com/apt cloud-sdk-buster InRelease
Hit:3 http://packages.cloud.google.com/apt google-cloud-packages-archive-keyring-buster InRelease
Hit:4 http://packages.cloud.google.com/apt gcsfuse-buster InRelease
Hit:1 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
Get:5 http://packages.cloud.google.com/apt google-compute-engine-buster-stable InRelease [5526 B]
Hit:6 http://deb.debian.org/debian buster InRelease
Hit:7 http://security.debian.org/debian-security buster/updates InRelease
Hit:8 https://cloud.r-project.org/bin/linux/debian buster-cran40/ InRelease
Hit:9 http://deb.debian.org/debian buster-updates InRelease
Hit:10 http://deb.debian.org/debian buster-backports InRelease
Get:11 https://download.docker.com/linux/debian buster InRelease [54.0 kB]
Fetched 59.5 kB in 1s (66.3 kB/s)

In [1]:
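# Specify the platform. %env sets the variable in the notebook kernel so later `!` commands inherit it
# (a `!export ...` line only affects its own short-lived subshell).
%env NXF_MODE=google
# Install Nextflow, make it executable, and update it
!curl https://get.nextflow.io | bash
!chmod +x nextflow
!./nextflow self-update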

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14624  100 14624    0     0  88726      0 --:--:-- --:--:-- --:--:-- 89170
Downloading nextflow dependencies. It may require a few seconds, please wait ..
      N E X T F L O W
      version 22.04.5 build 5708
      created 15-07-2022 16:09 UTC 
      cite doi:10.1038/nbt.3820
      http://nextflow.io


Nextflow installation completed. Please note:
- the executable file `nextflow` has been created in the folder: /home/jupyter
- you may complete the installation by moving it to a directory in your $PATH

Downloading nextflow dependencies. It may require a few seconds, please wait ..
      N E X T F L O W
      version 22.04.5 build 5708
      created 15-07-2022 16:09 U

### 2.5 Scripting and running a nextflow 'Hello World' process

- Create a .nf file in the terminal (this example uses hello.nf)
- Be sure to include _#!/usr/bin/env nextflow_ and _nextflow.enable.dsl=2_ at the top of your script
- Add a process named sayHello
- Add input, output, and script sections to the process
    - In our example, the input receives the string, the script writes the string to a file and then prints the content of that file, and the output captures what the script prints (stdout).
- At the end, write the workflow block, which sets the order of your workflow
    - In our example we run the sayHello process and pipe its output to 'view', which asks Nextflow to print it to the screen.

It should look something like this:

```
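#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Default greeting; override it on the command line with --str
params.str = 'Hello World'

process sayHello {
  input:
  val str

  output:
  stdout

  // Write the string to a file in the task directory, then print the file
  script:
  """
  echo $str > hello.txt
  cat hello.txt
  """
}

workflow {
  // Run sayHello on the parameter and print its stdout
  sayHello(params.str) | view
}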
```

In [1]:
!./nextflow run hello.nf --str 'Hello!'

N E X T F L O W  ~  version 22.04.5
Launching `hello.nf` [soggy_cuvier] DSL2 - revision: f5c6efda4a
executor >  local (1)
[cf/4cfacf] process > sayHello [100%] 1 of 1 ✔
Hello!



In [1]:
!cat /home/jupyter/work/34/35a36d596ad458c0d2dc2e2a8193f9/hello.txt

hello
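The bracketed hash in the run log (for example `[cf/4cfacf]`) is a shortened form of the task directory path under `work/`, which is how a path like the one above can be located. A minimal sketch of that lookup for a local run, assuming the default `work` directory layout; `find_task_dirs` is a hypothetical helper, not a Nextflow command.

```python
import glob
import os


def find_task_dirs(short_hash, work_dir="work"):
    """Return the task directories under work_dir that match a short hash like 'cf/4cfacf'."""
    prefix, rest = short_hash.split("/", 1)
    # The log shows only the first characters of the second path segment
    pattern = os.path.join(work_dir, prefix, rest + "*")
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))


# Hash copied from the run log above; the result is empty if the directory is absent
print(find_task_dirs("cf/4cfacf", work_dir="/home/jupyter/work"))
```

Each matching directory holds the files the task wrote, such as hello.txt in this example.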


### 2.6 Create and modify your own config file to include a 'gls' profile block

The config file allows Nextflow to use executors like the Life Sciences API. In this tutorial the config file is named __'test_hw2.config'__. Whatever name you choose, pass that file to Nextflow with the `-c` flag.
- Make sure that your region is one supported by the LS API!
- Create the bucket ahead of time using the `mb` command (e.g., `gsutil mb gs://Your_Bucket_Name`)
    - Specify your working directory and output directory within the bucket, and make sure they are separate
- Specify the machine type you would like to use, ensuring that it has enough memory and CPUs for the workflow
    - Otherwise the LS API will automatically use 1 CPU

```
profiles{
  gls{
      process.executor = 'google-lifesciences'
      workDir = 'gs://Your_Bucket_Name/methyl-output'
      google.location = 'us-central1'
      google.region  = 'us-central1'
      google.project = 'Your_Project_ID'
      params.outdir = 'gs://Your_Bucket_Name/methyl-tmp'
      process.machineType = 'c2-standard-30'
     }
}
```

<div class="alert alert-block alert-info">
    <i class="fa fa-lightbulb-o" aria-hidden="true"></i>
    <b>Tip: </b> Make sure your working directory and output directory are different! Life Sciences creates temporary files in the working directory within your bucket, and these take up space, so once your pipeline has completed successfully feel free to delete them.
</div>
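As a quick sanity check on the tip above, the sketch below compares the two bucket paths before a run. Treating a directory nested inside the other as a conflict is an assumption made here for safety, not a documented Nextflow rule, and `paths_overlap` is a hypothetical helper.

```python
def normalize(uri):
    """Validate a gs:// URI and make it end with exactly one slash."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got: {uri!r}")
    return uri.rstrip("/") + "/"


def paths_overlap(work_dir, out_dir):
    """True if the two paths are identical or one is nested inside the other."""
    a, b = normalize(work_dir), normalize(out_dir)
    return a.startswith(b) or b.startswith(a)


# Values taken from the 'gls' profile above
work_dir = "gs://Your_Bucket_Name/methyl-output"
out_dir = "gs://Your_Bucket_Name/methyl-tmp"
print("overlap" if paths_overlap(work_dir, out_dir) else "separate")
```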

If you run into memory or disk-space issues, try increasing your boot disk size by adding the following parameter to your GLS profile:

`google.lifeSciences.bootDiskSize=100.GB`

### Optional: Listing nf-core tools with docker and viewing their commands
Using the command below you can see all the pipelines that nf-core offers and their latest releases.

In [2]:
!docker run nfcore/tools list


                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'

    nf-core/tools version 2.4.1 - https://nf-co.re


┏━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Pipeline    ┃       ┃      Latest ┃             ┃             ┃ Have latest  ┃
┃ Name        ┃ Stars ┃     Release ┃    Released ┃ Last Pulled ┃ release?     ┃
┡━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ sarek       │   180 │         3.0 │  1 week ago │           - │ -            │
│ viralrecon  │    73 │         2.5 │ 2 weeks ago │           - │ -            │
│ rnafusion   │    76 │       2.1.0 │ 2 weeks ago │           - │ -            │
│ isoseq      │     4 │       1.1.0 │ 2 weeks ago │           - │ -            │
│ fetchngs    │    58 │         1.7 │ 4 we

You can view the commands for methylseq (or any other nf-core pipeline) by using the [--help] flag.

In [3]:
!./nextflow run nf-core/methylseq -r 1.6.1 --help

N E X T F L O W  ~  version 22.04.5
Launching `https://github.com/nf-core/methylseq` [tiny_sanger] DSL1 - revision: 03972a686b [1.6.1]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/methylseq v1.6.1
------------------------------------------------------

Typical pipeline command:

  nextflow run nf-core/methylseq --input '*_R{1,2}.fastq.gz' -profile docker

Input/output options
  --input                           [string]  Input FastQ files.
  --single_end                      [boolean] Specifies that the input is single-end reads.
  --outdir               

### 2.7 Test Methylseq

The 'test' profile uses a small dataset allowing you to ensure the workflow works with your config file without long runtimes. Ensure you include:
- Version of the nf-core tool [-r]
- Location of the config file [-c]

In [4]:
!./nextflow run nf-core/methylseq -r 1.6.1 -profile test,gls -c nextflow.config

N E X T F L O W  ~  version 22.04.5
Launching `https://github.com/nf-core/methylseq` [grave_noether] DSL1 - revision: 03972a686b [1.6.1]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/methylseq v1.6.1
------------------------------------------------------

Core Nextflow options
  revision                  : 1.6.1
  runName                   : grave_noether
  container                 : nfcore/methylseq:1.6.1
  launchDir                 : /home/jupyter
  workDir                   : /zy-test
  projectDir      

In the run output, the text inside the __[ ]__ to the left of each process is a __tag__ you can search for in Life Sciences, and the part before the __/__ corresponds to the __temporary directories__ within your working directory. Feel free to delete the temporary directories once your workflow has successfully completed.

### 2.8 Run Methylseq with real-world data

#### a. Importing FASTQ files with Mambaforge

Installing Mambaforge

In [35]:
!curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
!bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 88.1M  100 88.1M    0     0  47.6M      0  0:00:01  0:00:01 --:--:-- 66.4M
PREFIX=/home/jupyter/mambaforge
Unpacking payload ...
Extracting "pybind11-abi-4-hd8ed1ab_3.tar.bz2"
Extracting "pysocks-1.7.1-py39hf3d152e_5.tar.bz2"
Extracting "_openmp_mutex-4.5-2_gnu.tar.bz2"
Extracting "icu-70.1-h27087fc_0.tar.bz2"
Extracting "_libgcc_mutex-0.1-conda_forge.tar.bz2"
Extracting "cryptography-37.0.2-py39hd97740a_0.tar.bz2"
Extracting "libffi-3.4.2-h7f98852_5.tar.bz2"
Extracting "mamba-0.22.1-py39hfa8f2c8_1.tar.bz2"
Extracting "conda-package-handling-1.8.1-py39hb9d737c_1.tar.bz2"
Extracting "reproc-14.2.3-h7f98852_0.tar.bz2"
Extracting "pycosat-0.6.3-py39hb9d737c_1010.tar.bz2"
Ext

Installing SRA-tools

In [36]:
!mambaforge/bin/mamba install -c bioconda sra-tools -y


                  __    __    __    __
                 /  \  /  \  /  \  /  \
                /    \/    \/    \/    \
███████████████/  /██/  /██/  /██/  /████████████████████████
              /  / \   / \   / \   / \  \____
             /  /   \_/   \_/   \_/   \    o \__,
            / _/                       \_____/  `
            |/
        ███╗   ███╗ █████╗ ███╗   ███╗██████╗  █████╗
        ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗
        ██╔████╔██║███████║██╔████╔██║██████╔╝███████║
        ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║
        ██║ ╚═╝ ██║██║  ██║██║ ╚═╝ ██║██████╔╝██║  ██║
        ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝  ╚═╝

        mamba (0.22.1) supported by @QuantStack

        GitHub:  https://github.com/mamba-org/mamba
        Twitter: https://twitter.com/QuantStack

█████████████████████████████████████████████████████████████


Looking for: ['sra-tools']

[?25l[2K[0G[+] 0.0s
[2K[1A[2K[0G[+] 0.1s
bioconda/linux-64    [33m━━━━━━━━━━

Downloading the SRA single-celled fastq file

In [37]:
!fasterq-dump --threads 4 --progress SRR067701

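`fasterq-dump` writes uncompressed FASTQ, while the run in the next step passes `SRR067701.fastq.gz` as input. A minimal sketch of the compression step in Python, assuming the download produced `SRR067701.fastq` in the current directory (the usual name for a single-end run); `gzip_file` is a hypothetical helper.

```python
import gzip
import os
import shutil


def gzip_file(path):
    """Compress path to path + '.gz' and return the new file name."""
    gz_path = path + ".gz"
    # Stream the copy so large FASTQ files are never loaded into memory at once
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


fastq = "SRR067701.fastq"  # assumed output name from the download above
if os.path.exists(fastq):
    print(gzip_file(fastq))
```

The original file is left in place, so remove it afterwards if disk space is tight.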

#### b. Run Methylseq with your own profile and data. Ensure you include:
- nf-core pipeline version [-r]
- Input fastq.gz file [--input]
- Reference genome [--genome] (no need to have it on hand; nf-core uses iGenomes and will pull in the correct reference file)
- Config file location [-c]
- Desired profile [-profile]
- Other flags such as:
    - Whether the fastq file is single-ended [--single_end]
    - The maximum CPUs and memory to use [--max_cpus, --max_memory]
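One detail worth noticing in the list above is that Nextflow's own options take a single dash (`-r`, `-c`, `-profile`), while pipeline parameters take two (`--input`, `--genome`). The sketch below assembles the argument list in Python to make that split explicit. The function `methylseq_command` is a hypothetical helper, and the values are the ones used in this tutorial.

```python
import shlex


def methylseq_command(revision, reads, genome, config, profile,
                      single_end=False, max_cpus=None, max_memory=None):
    """Build the argument list for a methylseq run without executing it."""
    argv = [
        "./nextflow", "run", "nf-core/methylseq",
        # Single-dash options are handled by Nextflow itself
        "-r", revision, "-c", config, "-profile", profile,
        # Double-dash options are parameters passed to the pipeline
        "--input", reads, "--genome", genome,
    ]
    if single_end:
        argv.append("--single_end")
    if max_cpus is not None:
        argv += ["--max_cpus", str(max_cpus)]
    if max_memory is not None:
        argv += ["--max_memory", max_memory]
    return argv


argv = methylseq_command(
    "1.6.1", "SRR067701.fastq.gz", "GRCh37", "nextflow.config", "gls",
    single_end=True, max_cpus=32, max_memory="110.GB",
)
print(shlex.join(argv))
```

Building a list rather than one long string avoids shell-quoting mistakes if the list is later handed to `subprocess.run`.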


In [9]:
!./nextflow run nf-core/methylseq -r 1.6.1 \
    --input 'SRR067701.fastq.gz' \
    --genome GRCh37 \
    --single_end \
    -c nextflow.config \
    -profile gls \
    --max_cpus 32 \
    --max_memory '110.GB'


N E X T F L O W  ~  version 22.04.3
Launching `https://github.com/nf-core/methylseq` [stoic_booth] DSL1 - revision: 03972a686b [1.6.1]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/methylseq v1.6.1
------------------------------------------------------

Core Nextflow options
  revision              : 1.6.1
  runName               : stoic_booth
  container             : nfcore/methylseq:1.6.1
  launchDir             : /home/jupyter
  workDir               : /zy-test/methyl-seq
  projectDir            : 

#### c. Check to see if files are in your output directory bucket

In [11]:
!gsutil ls gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1

gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/MultiQC/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/bismark_alignments/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/bismark_deduplicated/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/bismark_methylation_calls/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/bismark_reports/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/bismark_summary/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/fastqc/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/pipeline_info/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/preseq/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/qualimap/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/trim_galore/
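To confirm that the folders you care about are present, the listing can be checked programmatically. A minimal sketch, assuming the `gsutil ls` output is available as a string; `listed_names` is a hypothetical helper, and the two folder names checked are only examples taken from the listing above.

```python
def listed_names(listing, prefix):
    """Return the entry names directly under prefix in gsutil ls output."""
    prefix = prefix.rstrip("/") + "/"
    names = set()
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the line for the prefix itself
        if line.startswith(prefix) and line != prefix:
            names.add(line[len(prefix):].rstrip("/"))
    return names


# A few lines copied from the listing above
listing = """
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/MultiQC/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/bismark_alignments/
gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/pipeline_info/
"""
names = listed_names(listing, "gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1")
missing = {"MultiQC", "pipeline_info"} - names
print(sorted(missing))
```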


__Optional__: View your MultiQC HTML file

In [39]:
!gsutil cp -r gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/MultiQC/multiqc_report.html .

Copying gs://nextflowdemobucket/zy-test/methyl-seq/methylseq-1/MultiQC/multiqc_report.html...
/ [1 files][  1.2 MiB/  1.2 MiB]                                                
Operation completed over 1 objects/1.2 MiB.                                      


In [46]:
from IPython.display import IFrame

IFrame(src='multiqc_report.html', width=900, height=600)

---

#### To run Nextflow or any bash command in an R kernel, use system("your bash command", intern = TRUE)

In [None]:
#example 
system('./nextflow run nf-core/methylseq -r 1.6.1 -profile test,gls -c nextflow.config', intern = TRUE)

In [None]:
# View the MultiQC HTML file in R
IRdisplay::display_html('<iframe src="multiqc_report.html" width="1000" height="1000"></iframe>')