In [None]:
import gp

In [None]:
# Environment variables
DATA_DIR = '../data/mutsigcv/LUSC.maf/'

In [None]:
# Create a GenePattern server proxy instance
gpserver = gp.GPServer('https://cloud.genepattern.org/gp','athon','v6giHMa87v6XbWB')

# Obtaining GPTask by module name
msigcv = gp.GPTask(gpserver, "MutSigCV")

# Load the full parameter data
msigcv.param_load()

In [None]:
# Get the list of tasks
task_list = gpserver.get_task_list()
for task in task_list:
    print(task.get_name())

In [None]:
def get_details(module):
    # Print the module name
    print( module.get_name() )

    # Print the module LSID
    print( module.get_lsid() )

    # Print the module version
    print( module.get_version() )

    # Print the description
    print( module.get_description() )
    
get_details(msigcv)

In [None]:
def iterate_params(module):
    for param in module.get_parameters():  # Loop through each parameter
        print( param.get_name() )          # Print the parameter's name
        print( param.get_type() )          # Print the parameter's type (text, number, file, etc.)
        print( param.get_description() )   # Print the parameter's description
        print( param.get_default_value() ) # Print the parameter's default value
        print( param.is_optional() )       # Print whether the parameter is optional
        print( '' )                        # Leave a blank line between printed parameters
        
iterate_params(msigcv)

In [None]:
def get_test_data():
    import urllib.request
    import zipfile
    
    testfile = "http://software.broadinstitute.org/cancer/cga/sites/default/files/data/tools/mutsig/LUSC.MutSigCV.input.data.v1.0.zip"
    urllib.request.urlretrieve(testfile, "LUSC.zip")
    
    # The context manager closes the archive even if extraction fails
    with zipfile.ZipFile("LUSC.zip", 'r') as zip_ref:
        zip_ref.extractall("LUSC.maf")

In [None]:
# Create the GPJobSpec from the MutSigCV task loaded above
job_spec = msigcv.make_job_spec()

# Loop through all the parameters and set their default values
for param in msigcv.get_parameters():
    # If the parameter has a default value, set that value
    if param.get_default_value() is not None:
        job_spec.set_parameter(param.get_name(), param.get_default_value())

def upload_and_set(file_name, file_path, parameter):
    """
    Upload a local file to gpserver and set its URL as the value of parameter.
    """

    # Upload the input file under the given name
    uploaded_file = gpserver.upload_file(file_name, file_path)

    # Attach the uploaded file's URL to the correct parameter
    job_spec.set_parameter(parameter, uploaded_file.get_url())

    

# Set the coverage table file
job_spec.set_parameter("coverage.table.file",
                       "shared_data/example_files/MutSigCV_1.3/exome_full192.coverage.txt")

# Set the output filename base
job_spec.set_parameter("output.filename.base", 'LUSC_msigcv')

# Attach the input file to the correct parameter
job_spec.set_parameter("input.filename",
                       "http://software.broadinstitute.org/cancer/cga/sites/default/files/data/tools/mutsig/LUSC.MutSigCV.input.data.v1.0.zip")

In [None]:
job_spec.params

In [None]:
# This will return the job object and continue execution even if the job isn't finished
job = gpserver.run_job(job_spec, False)

Some parameters come with a list of valid choices. These parameters can be identified by calling the is_choice_param() method. Additionally, choice parameters have a number of other methods available for working with the choice list. Calling these methods on a non-choice parameter will result in an error being thrown. An example is given below.

In [None]:
# Get the module's parameters and loop through each one
params_list = msigcv.get_parameters()
for param in params_list:
    if param.is_choice_param():        # If the parameter is a choice param
        print( param.get_name() )      # Print the parameter's name
        
        choices = param.get_choices()  # Get a list of valid choices 
        for choice in choices:         # Print the label and value for each choice
            print( choice['label'] + " = " + choice['value'] )
            
        # Print the default selected value for each choice
        print( param.get_choice_selected_value() )
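
Because the choice-list methods raise an error on non-choice parameters, it can help to guard them behind is_choice_param() before validating a value. The following is a minimal sketch of that idea, not part of the GenePattern library: the helper name `check_choice` is hypothetical, and it assumes each choice is a dict with a `'value'` key, as in the loop above.

```python
def check_choice(param, value):
    """Return True if value is acceptable for param.

    Non-choice parameters accept any value here, so the choice-only
    methods are never called on them.
    """
    # Guard: only choice parameters support get_choices()
    if not param.is_choice_param():
        return True

    # Assumption: each choice is a dict with a 'value' key
    valid_values = [choice['value'] for choice in param.get_choices()]
    return value in valid_values
```

A caller could run this check on a parameter before passing the value to set_parameter(), so that an invalid choice is caught locally rather than after the job is submitted.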

## Creating a Job Specification

In order to run a GenePattern job from Python, you must first obtain a GPJobSpec object from the correct GPTask object and then set the appropriate parameters for the job. For many parameters their default values will suffice. For others, you will want to set a specific value.

Below is code showing how to obtain a GPJobSpec object and how to iterate over the parameters, setting them to their default values.

To set a specific value for a parameter, the set_parameter() method should be called. In the code below you will set the *input.filename* parameter to point to a publicly available dataset. This data should suffice for the purposes of this tutorial.

In [None]:
# Attach the input file to the correct parameter
job_spec.set_parameter("input.filename", 
                       "https://software.broadinstitute.org/cancer/software/genepattern/data/all_aml/all_aml_test.gct")

Data files can be uploaded by calling GPServer.upload_file(). This returns a GPFile object, and the parameter can then be set to the URL of that object. The upload_and_set() helper defined earlier wraps these two steps; it is not called in this tutorial.

## Submitting Your First Job

Once the GPJobSpec is ready, it can be used to launch a GenePattern job. This will return a GPJob object, representing the specific job that was just launched. A code example of how to do this is below.

Why are we passing *False* in as a parameter, you ask? By default the run_job() method will halt code execution of your Python script until the job has finished running in GenePattern. For long running jobs, however, this may not be desirable. By optionally passing in False as a parameter, the method will return as soon as the job is submitted, allowing the Python program to continue.

For the purposes of this tutorial, it is better that you do not have to wait. If you did want to submit the job and wait for it to complete, however, the code is below (albeit commented out).

In [None]:
#  This will halt execution until the job is complete
# job = gpserver.run_job(job_spec)

### Querying for Job Status

When a GenePattern job is submitted, it passes through several states: pending, running, and then either complete or error. At any time after a job has been submitted, its status can be checked by calling get_status_message(). Similarly, its completion can be checked by calling is_finished(). Examples of both are shown below.

In [None]:
# Prints a brief description of the job's current state
print( job.get_status_message() )

# Queries the server and returns True if the job is complete, False otherwise
print( job.is_finished() )

Finally, if at any point you decide that you just want to wait until the job is complete, you can always call wait_until_done().

In [None]:
job.wait_until_done()
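
Where an unbounded wait is not acceptable, the two status calls above can be combined into a polling loop with a time limit. This is only a sketch: the helper name `wait_with_timeout` and its default intervals are assumptions made for illustration, and it relies solely on is_finished() and get_status_message() as described in this section.

```python
import time


def wait_with_timeout(job, timeout_seconds=600, poll_seconds=10):
    """Poll job.is_finished() until it returns True or the timeout elapses.

    Returns True if the job finished in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Each call queries the server for the job's current state
        if job.is_finished():
            return True
        if time.monotonic() >= deadline:
            return False
        # Report progress, then pause before the next query
        print(job.get_status_message())
        time.sleep(poll_seconds)
```

A long poll interval keeps the load on the server low; a return value of False means only that the time limit passed, not that the job failed.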

## Working with Output Files

Finally, once the job is complete and assuming there were no errors, a list of its output files may be obtained by making the get_output_files() call shown below. 

This will return a list of GPFile objects, each containing methods to download or read the contents of the file.

In [None]:
# Get a list of output files
output_list = job.get_output_files()  

for file in output_list:     # Loop through each output file
    print( file.get_url() )  # Print the URL to the file
    data = file.read()       # Read the data in the file 

Once the contents of a data file have been assigned to a variable, they may be used in conjunction with other common Python libraries, such as matplotlib, pandas, numpy or scipy.

The code below will print out the contents of the last output file assigned to the *data* variable in the previous code block.

In [None]:
print ( data )
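
As one illustration of working with the contents further, the sketch below parses the text into rows using only the standard library. It assumes the output file is tab-delimited text with a header row, which this tutorial does not confirm for any particular output file; the helper name `parse_tab_delimited` is hypothetical.

```python
import csv
import io


def parse_tab_delimited(data):
    """Parse tab-delimited text (with a header row) into a list of dicts."""
    # Accept either bytes or str, since the type returned by read() may vary
    if isinstance(data, bytes):
        data = data.decode('utf-8')

    # Assumption: the first line holds the column names
    reader = csv.DictReader(io.StringIO(data), delimiter='\t')
    return list(reader)
```

If pandas is installed, `pandas.read_csv(io.StringIO(data), sep='\t')` would load the same text into a DataFrame instead.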

This concludes the tutorial on how to work with GenePattern using the GenePattern Python library. For more information, please see the [GenePattern Programmer's Guide](http://www.broadinstitute.org/cancer/software/genepattern/programmers-guide).