# Quickstart for the Seven Bridges Platform
### Overview
To introduce you to the major features of the Seven Bridges Platform, this Quickstart walks you through a [Whole Exome Sequencing analysis](https://igor.sbgenomics.com/public/apps#workflow/sevenbridges/public-apps/whole-exome-sequencing-gatk-2-3-9-lite). This API tutorial mirrors the [tutorial for the visual interface](http://docs.sevenbridges.com/docs/quickstart).

This tutorial makes use of the [sevenbridges-python bindings](http://sevenbridges-python.readthedocs.io/en/latest/installation/).

### Prerequisites
 1. You need an account on the Seven Bridges Platform. ([Sign up here](https://www.sbgenomics.com/login) for free.)
 2. You need your **authentication token**, and you need to pass this credential to the API. See <a href="Setup_API_environment.ipynb">**the tutorial on setting up the API environment**</a> for details.
 
## Imports
Below, we import the official sevenbridges-python bindings, which provide the _Api_ class.

In [None]:
import sevenbridges as sbg

## Initialize the object
The _Api_ object needs to know your **auth\_token** and the API endpoint (URL) to connect to. Here we assume both are stored under a named profile in the .sbgrc file in your home directory. For other options, see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.
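To show what "a named profile" means, here is a minimal sketch that reads one profile from a credentials file using only Python's standard `configparser`. It assumes an INI-style layout with one section per profile holding `api_endpoint` and `auth_token` keys; the section names and tokens below are placeholders, so check the sevenbridges-python documentation for the exact file location and format your version expects. The `read_profile` helper is hypothetical and is not part of the bindings.

```python
import configparser

# Hypothetical credentials file contents (INI format assumed); tokens are placeholders.
SAMPLE_CONFIG = """
[sbpla]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = <your-token-here>

[cgc]
api_endpoint = https://cgc-api.sbgenomics.com/v2
auth_token = <another-token>
"""


def read_profile(text, profile):
    """Return (api_endpoint, auth_token) for the named profile section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError('profile %r not found' % profile)
    section = parser[profile]
    return section['api_endpoint'], section['auth_token']


endpoint, token = read_profile(SAMPLE_CONFIG, 'sbpla')
print(endpoint)  # https://api.sbgenomics.com/v2
```

The profile name is simply the section header, which is why changing `prof` switches both the endpoint and the token at once.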

In [None]:
# [USER INPUT] specify the profile (section) name from your .sbgrc file, e.g. 'sbpla' for the Seven Bridges Platform
prof = 'sbpla'

config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)

## Create a project
_Projects_ are the foundation of any analysis on the Platform. We can either work inside an existing project or create a new one. Here we **create a new project**, but first **check that it doesn't already exist**, which demonstrates both listing and creating projects. The project name, billing group (we will use our free credits in the **Pilot Fund** billing group), and project description will be sent in our API call.

We start by listing your billing groups and, to check for duplicates, your projects. Next, we build the dictionary of project details that will be sent to the API as JSON. It should include:
* **billing_group** *Billing group* that will be charged for this project
* **description**   (optional) Project description
* **name**   Name of the project, may be *non-unique*<sup>1</sup>
* **type**   (optional) If you set this, it must always be 'v2'. Other project types may summon a pale horse on the horizon.
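The field rules above can be captured in a small validation step before anything is sent. This is a minimal sketch using a hypothetical helper, `build_project_payload`, and a placeholder billing group ID; it only assembles and checks a plain dictionary and does not call the API.

```python
import json


def build_project_payload(name, billing_group, description=None, project_type=None):
    """Collect the project fields listed above into a dict (hypothetical helper)."""
    if not name:
        raise ValueError('name is required')
    if not billing_group:
        raise ValueError('billing_group is required')
    payload = {'name': name, 'billing_group': billing_group}
    if description is not None:
        payload['description'] = description
    if project_type is not None:
        # Per the note above, the only accepted value is 'v2'
        if project_type != 'v2':
            raise ValueError("type must be 'v2'")
        payload['type'] = project_type
    return payload


# 'my-billing-group-id' is a placeholder, not a real billing group ID
payload = build_project_payload('Azzurri', 'my-billing-group-id',
                                description='Created from the API quickstart')
print(json.dumps(payload, sort_keys=True))
```

Optional fields are only added when given, so the JSON body stays minimal.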

**After** creating the project, you can re-check the project list and get *additional* details assigned by the Platform, including:

* **id**     _Unique_ identifier for the project, generated based on Project Name
* **href**   Address<sup>2</sup> of the project.
* **flag**   (unimportant) this is set by the object constructor, here always 'longList':False 
* **tags**   List of tags, currently NOT used. 

<sup>1</sup> Please **don't** use non-unique *project names*. However, if you insist, the backend will allow it and assign a unique `id` to your project. This `id` is known as a [short name](http://docs.sevenbridges.com/docs/the-api#section-project-short-names).

<sup>2</sup> This is the address from which you can retrieve this resource using the API.

#### PROTIPS
 * A detailed _recipe_ for creating projects is [here](../../Recipes/SBPLAT/projects_makeNew.ipynb)
 * Detailed documentation of this REST request is available [here](http://docs.sevenbridges.com/docs/create-a-new-project)

In [None]:
# [USER INPUT] Set project name here:
new_project_name = 'Azzurri'                          
      
    
# What are my funding sources?
billing_groups = api.billing_groups.query()

# Pick the last group in the list (arbitrary)
print(billing_groups[-1].name +
      ' will be charged for computation and storage (if applicable) for your new project')

# Set up the information for your new project
new_project = {
        'billing_group': billing_groups[-1].id,
        'description': """A project created by the API recipe (projects_makeNew.ipynb).
                      This also supports **markdown**
                      _Pretty cool_, right?
                   """,
        'name': new_project_name
}

# Check if this project already exists: LIST all projects and look for a name match
my_project = [p for p in api.projects.query(limit=100).all()
              if p.name == new_project_name]

if my_project:    # a non-empty list is truthy, so a matching project was found
    print('A project with the name (%s) already exists, please choose a unique name'
          % new_project_name)
    raise KeyboardInterrupt
else:
    # CREATE the new project
    my_project = api.projects.create(name=new_project['name'],
                                     billing_group=new_project['billing_group'],
                                     description=new_project['description'])

    # (re)list all projects, and get your new project
    my_project = [p for p in api.projects.query(limit=100).all()
                  if p.name == new_project_name][0]

    print('Your new project %s has been created.' % my_project.name)
    if getattr(my_project, 'description', None):  # only print a description if one was entered
        print('Project description: %s \n' % my_project.description)

## Copy input files from the _Public Reference Files_ repository
[Public Reference Files](http://docs.sevenbridges.com/docs/file-repositories) is a repository of files maintained by the Seven Bridges Platform. It contains the latest and most frequently used reference genomes and annotation files, so you won't have to upload your own reference files every time you run a task. You can access this repository via the API as you would a project.

Below, we first list the files already in your project and the requested files within the Public Reference Files repository<sup>3</sup>, then copy any missing files from Public Reference Files to your project. The list of file names to copy is hard-coded based on the tutorial.

The critical information for this POST request is the **file_id**. Note that you are allowed to copy the same file as many times as you like. However, duplicates automatically receive a prefix (\_1\_, \_2\_, etc.), depending on how many times you copy the file.
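To make the duplicate-prefix rule concrete, the sketch below predicts the name a copy would receive given the names already in the target project. It is a hypothetical helper that simply mirrors the rule described above (first duplicate gets `_1_`, the next `_2_`, and so on); the Platform's actual numbering may differ, for example after some copies have been deleted.

```python
def predicted_copy_name(name, existing_names):
    """Guess the name a copied file would receive (hypothetical helper).

    Assumes the prefix rule described above: the first duplicate gets '_1_',
    the next '_2_', and so on, using the smallest unused number.
    """
    if name not in existing_names:
        return name
    n = 1
    while '_%d_%s' % (n, name) in existing_names:
        n += 1
    return '_%d_%s' % (n, name)


existing = {'WES_human_Illumina.pe_1.fastq'}
print(predicted_copy_name('WES_human_Illumina.pe_1.fastq', existing))
# _1_WES_human_Illumina.pe_1.fastq
```

This is also why the cell below checks for an existing name before copying: skipping avoids accumulating prefixed duplicates.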

#### PROTIPS 
 * A detailed _recipe_ for copying Public Files is [here](../../Recipes/SBPLAT/files_copyFromPublicReference.ipynb)
 * Detailed documentation of this REST request is available [here](http://docs.sevenbridges.com/docs/copy-a-file)

<sup>3</sup> Remember, files are only accessible **within** a project; here, that is the Public Reference Files project.

In [None]:
# [USER INPUT] Set the source project ID and the names of the files to copy here:
source_project_id = 'admin/sbg-public-data'
files_list = ['WES_human_Illumina.pe_1.fastq',
              'WES_human_Illumina.pe_2.fastq'
              ]


# LIST all files in the target and source projects
my_file_names = [f.name for f in
                 api.files.query(limit=100, project=my_project).all()]
source_files = api.files.query(limit=100, project=source_project_id,
                               names=files_list)

# Copy files if they don't exist in my_project
for f in source_files.all():
    if f.name in my_file_names:
        print('File (%s) already exists in your project, skipping' % f.name)
    else:
        print('File (%s) does not exist in Project (%s); copying now' %
              (f.name, my_project.name))

        new_file = f.copy(project=my_project, name=f.name)

        # re-list files in target project to verify the copy worked
        # (a different loop variable keeps the outer `f` from being shadowed)
        my_files = [mf.name for mf in
                    api.files.query(limit=100, project=my_project.id).all()]

        if f.name in my_files:
            print('Successfully copied one file!')
        else:
            print('Something went wrong...')

## What is the meaning of this?
Files are great, but without **metadata** they can be hard to manage. So here we are going to add metadata to these files: one field that is _needed for the task_ and one to show _generality_.

We've already listed all your files in the last cell. Here we will check the metadata for each one. A **detail** call for a file returns the following *attributes*:
* **created_on** File creation date
* **id**     _Unique_ identifier for the file
* **name**   Name of the file, note this **is** metadata and can be _changed_
* **href**   Address<sup>4</sup> of the file.
* **modified_on** File modification date
* **metadata** Dictionary of metadata
* **origin**  Will link back to a *task* if this is an output file
* **project** Project the file is in
* **size** File size in bytes
* **flag**   (unimportant) this is set by the object constructor, here always 'longList':False 

The **metadata** dictionary is both _changeable_ and _expandable_, but initially sparse, containing only:
* sample_id
* platform
* paired_end
* library_id

<sup>4</sup> This is the address from which you can retrieve this resource using the API.
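Because the metadata dictionary is both changeable and expandable, an update either adds new keys or overwrites existing ones. The sketch below shows that merge on plain dictionaries with a hypothetical helper, `merge_metadata`; the starting values are invented for illustration, while the two added fields are the ones this tutorial sets. It does not call the API or save anything.

```python
def merge_metadata(current, updates):
    """Return (merged, added_keys, changed_keys) without mutating `current`."""
    merged = dict(current)
    added = sorted(k for k in updates if k not in current)
    changed = sorted(k for k in updates
                     if k in current and current[k] != updates[k])
    merged.update(updates)
    return merged, added, changed


# Hypothetical starting metadata, using the four default fields listed above
current = {'sample_id': 'sample_A', 'platform': 'Illumina',
           'paired_end': '1', 'library_id': 'lib_A'}
# The two fields this tutorial adds
updates = {'platform_unit_id': '1', 'hasFlair': 'True'}

merged, added, changed = merge_metadata(current, updates)
print(added)    # ['hasFlair', 'platform_unit_id']
print(changed)  # []
```

Reporting added and changed keys separately makes it easy to spot an update that unintentionally overwrites a field.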

#### PROTIPS
 * A detailed _recipe_ for retrieving file details is [here](../../Recipes/SBPLAT/files_detailOne.ipynb).
 * Detailed documentation of this REST request is available [here](http://docs.sevenbridges.com/docs/get-file-details).

In [None]:
my_files = api.files.query(project=my_project, limit=100)

# Metadata fields to add: one needed by the task, one to show generality
md = {
    'platform_unit_id': '1',
    'hasFlair': 'True'
}

for f in my_files.all():
    single_file = api.files.get(id=f.id)
    print('You have selected file %s (size %s [bytes]). \n' %
          (single_file.name, single_file.size))
    print('The metadata in this file was: \n %s' % single_file.metadata)

    # Add (or overwrite) each field, then save the changes to the Platform
    for k in md:
        single_file.metadata[k] = md[k]
    single_file.save()

    print('After the update, file metadata is: \n %s \n' % single_file.metadata)

## Copy reference files from the _Public Reference_
This is the same operation as in **Copy input files from the _Public Reference Files_ repository**; only the file names differ.
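Since this cell repeats the earlier copy logic, the decision step can be factored into a small pure function. The sketch below, using a hypothetical helper `plan_copies` and invented file listings, splits the wanted names into files to copy, files already present, and files not found in the source. The last group is worth reporting because a mistyped name simply produces no match in the query.

```python
def plan_copies(wanted, source_names, target_names):
    """Split wanted names into (to_copy, already_present, not_found), keeping order."""
    source = set(source_names)
    target = set(target_names)
    to_copy = [n for n in wanted if n in source and n not in target]
    already_present = [n for n in wanted if n in target]
    not_found = [n for n in wanted if n not in source and n not in target]
    return to_copy, already_present, not_found


wanted = ['dbsnp_137.b37.vcf', '1000G_phase1.indels.b37.vcf', 'human_g1k_v37_decoy.fasta']
# Hypothetical listings; in the notebook these would come from api.files.query(...)
source_names = ['dbsnp_137.b37.vcf', 'human_g1k_v37_decoy.fasta']
target_names = ['human_g1k_v37_decoy.fasta']

to_copy, present, not_found = plan_copies(wanted, source_names, target_names)
print(to_copy)    # ['dbsnp_137.b37.vcf']
print(present)    # ['human_g1k_v37_decoy.fasta']
print(not_found)  # ['1000G_phase1.indels.b37.vcf']
```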

In [None]:
# Files to copy
ref_files = ['dbsnp_137.b37.vcf',
             '1000G_phase1.indels.b37.vcf',
             'Mills_and_1000G_gold_standard.indels.b37.sites.vcf',
             'human_g1k_v37_decoy.breakpoints.bed',
             'snpEff_v3_6_GRCh37.75.zip',
             'human_g1k_v37_decoy.fasta'
            ]


# LIST all files in the target and source projects
my_file_names = [f.name for f in
                 api.files.query(limit=100, project=my_project).all()]
source_files = api.files.query(limit=100, project=source_project_id,
                               names=ref_files)

# Copy files if they don't exist in my_project
for f in source_files.all():
    if f.name in my_file_names:
        print('File (%s) already exists in your project, skipping' % f.name)
    else:
        print('File (%s) does not exist in Project (%s); copying now' %
              (f.name, my_project.name))

        new_file = f.copy(project=my_project, name=f.name)

        # re-list files in target project to verify the copy worked
        # (a different loop variable keeps the outer `f` from being shadowed)
        my_files = [mf.name for mf in
                    api.files.query(limit=100, project=my_project.id).all()]

        if f.name in my_files:
            print('Successfully copied one file!')
        else:
            print('Something went wrong...')

## Copy a Public App
Seven Bridges maintains a [repository of publicly available apps](http://docs.sevenbridges.com/docs/public-apps) suitable for many different types of data analysis. These public apps, including tools and workflows, can be accessed via the API as part of the Public Reference project. They can also be found by querying with the *visibility* property set to `public`. Below, we use the second method to find the app.

We first list the apps already in your project, then find the app among the public apps, and finally copy it from the public apps to my\_project. Here we also explicitly set _'limit':100_ inside the _query_, which reduces the number of requests the auto-pagination feature has to make.
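To see why a larger `limit` helps, the sketch below simulates offset/limit paging: it keeps requesting pages until a page comes back shorter than `limit`. This is an illustration under that assumption, not sevenbridges-python's actual pagination code; `fetch_all`, `fake_fetch_page`, and the 250-item listing are hypothetical.

```python
def fetch_all(fetch_page, limit=100):
    """Collect all items by requesting `limit`-sized pages until a short page arrives."""
    items, offset, requests = [], 0, 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        requests += 1
        items.extend(page)
        if len(page) < limit:
            break
        offset += limit
    return items, requests


# Hypothetical server-side listing of 250 apps
all_apps = ['app_%d' % i for i in range(250)]


def fake_fetch_page(offset, limit):
    return all_apps[offset:offset + limit]


items, requests = fetch_all(fake_fetch_page, limit=100)
print(len(items), requests)  # 250 3
items_small, requests_small = fetch_all(fake_fetch_page, limit=25)
print(requests_small)        # 11
```

Each request is a network round trip, so fetching 100 items per page instead of 25 cuts the number of calls by roughly a factor of four.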

The critical information for this POST request is the **app_id**. Note that you are **not** allowed to copy the same app **and** assign it the same name more than once; copying it under a different name is fine. Here we match on the app's **name**. Alternatively, you can match based on the app's **ID**, as shown in [apps_copyFromMyProject](../../Recipes/SBPLAT/apps_copyFromMyProject.ipynb).


### Warning on the NAME argument
When copying apps, it is also possible to specify the **name** using

```python
my_new_app = public_app.copy(project=my_project, name=public_app.name)
```

However, we caution that this can lead to conflicts with similarly named apps. Unless you are _certain_ that you want to set a specific, **different** app name, _please_ omit the name argument; the copy will still inherit the name of the source app.

```python
    my_new_app = public_app.copy(project = my_project)
```


<sup>6</sup> Note that setting the **name** of an app actually changes its **id**. We are working on fixing this inconsistency.

In [None]:
# [USER INPUT] Set app name here
a_name = 'Whole Exome Sequencing GATK 2.3.9.-lite'


my_apps = api.apps.query(project = my_project.id, limit=100)
public_app = [a for a in api.apps.query(visibility='public', limit=100).all() \
                 if a.name == a_name][0]

duplicate_app = [a for a in my_apps.all() if a.name == public_app.name]

if duplicate_app:
    print('App already exists in your project; using the existing copy')
    new_app = duplicate_app[0]  # ensures the task cell below still has an app to run
else:
    print('App (%s) does not exist in Project (%s); copying now' %
          (a_name, my_project.name))

    new_app = public_app.copy(project=my_project.id)

    # re-list apps in target project to verify the copy worked
    my_app_names = [a.name for a in
                    api.apps.query(project=my_project.id, limit=100).all()]

    if a_name in my_app_names:
        print('Successfully copied one app!')
    else:
        print('Something went wrong...')

## Build & Start tasks
Here, we look up the input and reference files in our project and assign them to the app's six inputs. File inputs are passed as a _File_ object (or a _list_ of _File_ objects). This app has no configuration (non-file) inputs; for apps that do, you pass the values directly, for example:

```python
inputs = {
    'num_repetitions' : 8,
    'mask' : False, 
    'file_name' : 'output_backup.txt'
}
```
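When a task is created, file inputs have to be turned into something the REST API can accept. We assume the bindings do something similar internally; the sketch below only illustrates the assumed shape, a `{'class': 'File', 'path': <file id>}` object per file, applied recursively to lists, with configuration values passed through unchanged. `FileRef`, `serialize_inputs`, and the file IDs are hypothetical stand-ins, not sevenbridges-python classes.

```python
from dataclasses import dataclass


@dataclass
class FileRef:
    """Hypothetical stand-in for a sevenbridges File object."""
    id: str
    name: str


def serialize_inputs(inputs):
    """Convert File-like values (or lists of them) into {'class': 'File', 'path': id} dicts."""
    def convert(value):
        if isinstance(value, FileRef):
            return {'class': 'File', 'path': value.id}
        if isinstance(value, (list, tuple)):
            return [convert(v) for v in value]
        return value  # plain configuration values pass through unchanged
    return {key: convert(value) for key, value in inputs.items()}


inputs = {
    'Known_SNPs': [FileRef('file-id-1', 'dbsnp_137.b37.vcf')],
    'Target_BED': FileRef('file-id-2', 'human_g1k_v37_decoy.breakpoints.bed'),
    'num_repetitions': 8,
}
print(serialize_inputs(inputs))
```

This is one reason to pass real lists of files for array inputs: a plain list is unambiguous to serialize, whereas a lazy query result may not be.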

In [None]:
task_name = 'task created with quickstart.ipynb'

# get the file objects (File inputs expect a File or a list of Files)
known_snps = api.files.query(project=my_project, limit=100,
                             names=[ref_files[0]])[0]
# both known-indel files: 1000G_phase1 and Mills_and_1000G_gold_standard
known_indels = list(api.files.query(project=my_project, limit=100,
                                    names=ref_files[1:3]).all())
target_bed = api.files.query(project=my_project, limit=100,
                             names=[ref_files[3]])[0]
snpeff_db = api.files.query(project=my_project, limit=100,
                            names=[ref_files[4]])[0]
input_ref = api.files.query(project=my_project, limit=100,
                            names=[ref_files[5]])[0]

# convert the query result into a plain list of File objects
fastq = list(api.files.query(project=my_project, limit=100,
                             names=files_list).all())

inputs = {
    'Known_Indels': known_indels,
    'Target_BED': target_bed,
    'Known_SNPs': [known_snps],
    'SnpEff_Database': snpeff_db,
    'input_tar_with_reference': input_ref,
    'FASTQ': fastq
}

my_task = api.tasks.create(name=task_name, project=my_project, \
                           app=new_app, inputs=inputs)

# Check for errors and warnings
if my_task.errors:
    print(my_task.errors)
# elif my_task.warnings:        # feature is in staging
#     print(my_task.warnings)
else:
    print('Your task is good to go, launching!')
    
    # Start the task
    my_task.run()

## Print task status
Here we fetch the execution details of the task we just started and print its status. The returned object contains many more details than the status alone.

In [None]:
details = my_task.get_execution_details()
print('Your task is in %s status' % (details.status))

## Wait for task completion
A simple loop that polls the task until it finishes.
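The same polling idea can be written as a reusable function that stops on any terminal state and can optionally give up after a fixed number of checks. In this sketch, `wait_for_status` is hypothetical, and the set of terminal states (`COMPLETED`, `FAILED`, `ABORTED`) is an assumption. The status source and the sleep function are injected, so the example runs without the API. In the notebook, `get_status` could be `lambda: my_task.get_execution_details().status`.

```python
import time

# Assumed terminal task states
TERMINAL_STATES = {'COMPLETED', 'FAILED', 'ABORTED'}


def wait_for_status(get_status, interval=600, max_checks=None, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state, and return that state."""
    checks = 0
    while True:
        status = get_status()
        checks += 1
        if status in TERMINAL_STATES:
            return status
        if max_checks is not None and checks >= max_checks:
            raise TimeoutError('still %s after %d checks' % (status, checks))
        sleep(interval)


# Simulated status sequence; record sleeps instead of actually waiting
slept = []
statuses = iter(['QUEUED', 'RUNNING', 'COMPLETED'])
final = wait_for_status(lambda: next(statuses), interval=600, sleep=slept.append)
print(final, slept)  # COMPLETED [600, 600]
```

Treating `ABORTED` as terminal matters: a loop that only checks for `COMPLETED` and `FAILED` would poll an aborted task forever.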

In [None]:
# [USER INPUT] Set loop time (seconds):
loop_time = 600


from time import sleep

while True:
    print('Pinging SBPLAT for task completion...')
    details = my_task.get_execution_details()
    if details.status == 'COMPLETED':
        print('Task has completed, life is beautiful')
        break
    elif details.status in ('FAILED', 'ABORTED'):
        print('Task %s, cannot continue' % details.status.lower())
        raise KeyboardInterrupt
    else:
        sleep(loop_time)

## Get task outputs
Here we retrieve the task again, now that it has completed, and print its outputs.
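Task outputs map each output port to a file, a list of files, or nothing. The sketch below flattens such a mapping into `(port, file name)` pairs so that the results are easier to scan. `FileRef`, `output_files`, and the port and file names are hypothetical stand-ins chosen for illustration; they are not the actual outputs of this workflow.

```python
from dataclasses import dataclass


@dataclass
class FileRef:
    """Hypothetical stand-in for a sevenbridges File object."""
    id: str
    name: str


def output_files(outputs):
    """Flatten an outputs dict into (port, file_name) pairs, skipping non-file values."""
    pairs = []
    for port in sorted(outputs):
        value = outputs[port]
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, FileRef):
                pairs.append((port, item.name))
    return pairs


outputs = {
    'vcf': FileRef('out-1', 'sample.vcf'),
    'bams': [FileRef('out-2', 'sample.bam'), FileRef('out-3', 'sample.bai')],
    'summary': None,
}
print(output_files(outputs))
# [('bams', 'sample.bam'), ('bams', 'sample.bai'), ('vcf', 'sample.vcf')]
```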

In [None]:
my_details = api.tasks.get(id = my_task.id)
print(my_details.outputs)

That’s it! We've executed a data analysis and obtained some results. We encourage you to try this procedure for yourself before getting started on your own data analyses. You can also visit our [API documentation](http://docs.sevenbridges.com/v1.0/page/api) to learn more about the Seven Bridges Platform and bringing your own tools.