## **Disclaimer**:
**Use this Notebook at your own discretion. We are not responsible for any unexpected behavior (user error or otherwise). Before running this Notebook, make sure that any files you want to keep are bound to the Data Model or saved to a more permanent location.**

### What is this Notebook?
This Notebook offers a simple way to delete the Workflow intermediates generated by submissions launched in a single Terra workspace.

### What does it do?
This Notebook uses the `mop` command from FISS, a programmatic interface to FireCloud that provides Python bindings to the API, to remove unwanted or unnecessary intermediate files. Outputs from a Workflow can include final products, such as VCFs, as well as files produced by intermediate tasks. Intermediates that are no longer needed keep incurring storage costs for as long as they remain in the bucket. This Notebook removes the need to delete intermediates manually, submission by submission, while letting you keep the outputs you require.

### How does it do it?
Every submission, which can contain one or more workflows, is assigned a submissionID that names the “directory” in the Google bucket where output files are copied. Using the workspace name and the Terra billing project, a list of submissionIDs is generated, and the intermediate files within each submission directory are deleted.
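To make the bucket layout concrete, here is a minimal sketch that extracts the submissionID from a bucket path. It assumes that the submissionID is the first path component after the bucket name and that it is formatted as a UUID, which is how paths appear in the `mop` log output. The helper name `submission_id_of` is hypothetical and is not part of FISS.

```python
import uuid


def submission_id_of(gs_url):
    """Return the submissionID that prefixes a bucket object path, or None.

    Hypothetical helper for illustration; not part of FISS.
    """
    if not gs_url.startswith("gs://"):
        raise ValueError("expected a gs:// URL, got %r" % gs_url)
    # Drop the scheme, then separate the bucket name from the object path.
    _bucket, _, object_path = gs_url[len("gs://"):].partition("/")
    first_component = object_path.split("/", 1)[0]
    try:
        # Assumption: a submission "directory" is named with a UUID.
        return str(uuid.UUID(first_component))
    except ValueError:
        # Not a UUID, so the object does not live inside a submission directory.
        return None
```

Files whose first path component is not a UUID (for example, notebooks or manually uploaded data) yield `None` under this assumption.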

### What gets deleted? 
1. All Workflow output files except logs are deleted, unless they are bound to the Data Model. To bind outputs to the Data Model, select Defaults from the Outputs section of the Workflow before selecting “Launch Analysis”.

### What gets left behind?
1. Files uploaded to the Google bucket that do not live inside a submission “directory” will NOT be deleted.
2. Log files (stderr, stdout, *.log) within a submission “directory” will NOT be deleted.
3. Submission folders/“directories” will NOT be deleted - only the contents.
4. Notebooks in the Google bucket will NOT be deleted.
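
The rules in the two lists above can be summarized in a short sketch. This is an illustration of the selection logic as described here, not the actual FISS implementation. The function names, the UUID test for a submission “directory”, and the exact log patterns are assumptions.

```python
import fnmatch
import uuid

# Assumption: these patterns cover the log files that are left behind.
LOG_PATTERNS = ("stderr", "stdout", "*.log")


def in_submission_dir(object_path):
    """True if the path's first component looks like a submissionID (a UUID)."""
    first, sep, _rest = object_path.partition("/")
    if not sep:
        return False  # a top-level object, not inside any "directory"
    try:
        uuid.UUID(first)
        return True
    except ValueError:
        return False


def select_deletions(object_paths, referenced_paths):
    """Return the paths that the rules above would mark for deletion."""
    referenced = set(referenced_paths)
    to_delete = []
    for path in object_paths:
        file_name = path.rsplit("/", 1)[-1]
        if not in_submission_dir(path):
            continue  # uploads and notebooks outside submissions are kept
        if any(fnmatch.fnmatchcase(file_name, p) for p in LOG_PATTERNS):
            continue  # log files are kept
        if path in referenced:
            continue  # outputs bound to the Data Model are kept
        to_delete.append(path)
    return to_delete
```

Paths here are relative to the bucket root. A file is selected only if it passes all three checks, which mirrors the order of the lists above.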

### What should you do before using this Notebook?
1. Bind any outputs that should not be deleted to the Data Model. A file that is NOT bound to the Data Model will be removed.
2. Copy any files you want to keep but have not bound to the Data Model to a secondary location.
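
One way to carry out the second step is to copy files with `gsutil cp` before running the cleanup. The sketch below only builds the commands and does not run them. The function name `backup_commands` and the destination prefix are hypothetical, so substitute a bucket you control.

```python
def backup_commands(gs_urls, dest_prefix):
    """Build one `gsutil cp` command per file, preserving each object path.

    Hypothetical helper for illustration; dest_prefix is a gs:// location
    outside the submission directories, chosen by the user.
    """
    dest_prefix = dest_prefix.rstrip("/")
    commands = []
    for url in gs_urls:
        if not url.startswith("gs://"):
            raise ValueError("expected a gs:// URL, got %r" % url)
        # Keep the path below the source bucket so copies do not collide.
        _bucket, _, object_path = url[len("gs://"):].partition("/")
        commands.append(["gsutil", "cp", url, dest_prefix + "/" + object_path])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where `gsutil` is installed and authenticated.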

### What should you do to run this Notebook?
1. Copy this Notebook into the workspace where you want to remove intermediates generated from launched submissions.
2. Open the Notebook and create a Runtime Environment if necessary.
3. After the Notebook is open, select Cell > Run All.
4. You will be prompted to confirm before deletion begins. Type Yes or No and press Enter.


In [1]:
# Look up the FISS version that pip reports and pin the install to that exact version
version_fiss = !pip show firecloud | grep "^Version" | sed 's/^Version: //'
version_fiss = version_fiss[0]
install_cmd = "firecloud==" + version_fiss

!pip install $install_cmd

# Check version of fiss installed in runtime environment
fiss_nb = !fissfc -v
fiss_nb = fiss_nb[0]

# Stop if the runtime environment's FISS version differs from the version pip reports
if fiss_nb != version_fiss:
    raise Exception("Notebook FISS version ({fiss_nb}) is not up to date ({version_fiss}). Stopping deletion.".format(fiss_nb=fiss_nb, version_fiss=version_fiss))



**The next cell deletes workflow intermediate files. You will be prompted to confirm whether you want to proceed with deleting all intermediate files.**
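
Before deleting anything, it may be useful to preview what would be removed. The sketch below builds an argument list for a preview run. It assumes that the installed FISS version supports a `--dry-run` flag on `mop`, so check `fissfc mop -h` in your runtime first. The helper name `mop_args` is hypothetical.

```python
def mop_args(workspace, namespace, dry_run=True):
    """Build the argument list for `fissfc mop` on one workspace.

    Hypothetical helper. Assumption: `mop` accepts `--dry-run` in the
    installed FISS version; confirm with `fissfc mop -h` before relying on it.
    """
    args = ["fissfc", "-V", "mop", "-w", workspace, "-p", namespace]
    if dry_run:
        args.append("--dry-run")
    return args

# Usage, inside a Terra notebook runtime:
#   from firecloud.fiss import main as fiss_func
#   fiss_func(mop_args(WORKSPACE_NAME, WORKSPACE_NAMESPACE))
```

With `dry_run=False`, the list matches the arguments used in the deletion cell.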

In [None]:
import os

from firecloud.__about__ import __version__
from firecloud.fiss import main as fiss_func

# Print the FISS version in use
print(__version__)

# These environment variables are read from the notebook runtime
BUCKET = os.environ['WORKSPACE_BUCKET']
WORKSPACE_NAME = os.environ['WORKSPACE_NAME']
WORKSPACE_NAMESPACE = os.environ['WORKSPACE_NAMESPACE']

# Run `fissfc -V mop` against this workspace (-w) and billing project (-p)
args = ["fissfc", "-V", "mop", "-w", WORKSPACE_NAME, "-p", WORKSPACE_NAMESPACE]
fiss_func(args)

0.16.25
Retrieving workspace information...
Reproducibility_Case_Study_Tetralogy_of_Fallot_fissMop_sushma -- gs://fc-37bf0981-4a76-4321-81e4-37fbb863fe85
gsutil ls -l gs://fc-37bf0981-4a76-4321-81e4-37fbb863fe85/**
Found 1261 files in bucket fc-37bf0981-4a76-4321-81e4-37fbb863fe85
Getting annotations for participant entities...
Getting annotations for participant_set entities...
Found 6 referenced files in workspace Reproducibility_Case_Study_Tetralogy_of_Fallot_fissMop_sushma
Found 100 files to delete:
  24.57 MiB  gs://fc-37bf0981-4a76-4321-81e4-37fbb863fe85/c626a6fc-86ce-434f-b3fe-1fad04ee804e/CallSingleSampleGvcfGATK4/22d1b88b-927e-43c8-87cb-324457da9ac5/call-HaplotypeCaller/shard-16/HG00096.synthetic.exome.mutated.g.vcf.gz
   8.95 MiB  gs://fc-37bf0981-4a76-4321-81e4-37fbb863fe85/c626a6fc-86ce-434f-b3fe-1fad04ee804e/CallSingleSampleGvcfGATK4/22d1b88b-927e-43c8-87cb-324457da9ac5/call-HaplotypeCaller/shard-21/HG00096.synthetic.exome.mutated.g.vcf.gz
  40.54 KiB  gs://fc-37bf0981-4a7