# Demo of Synapse API

This notebook demonstrates how to upload files, create tables, add annotations, and modify permissions for the NCANDA Project using the Synapse API for Python.

## Log onto Synapse

The following code logs onto Synapse, reading the username and password from a YAML config file in the home directory.

In [1]:
import os
import synapseclient
import yaml

syn = synapseclient.Synapse()

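config_path = os.path.join(os.path.expanduser("~"), ".server_config/synapse.cfg")
with open(config_path) as config_file:
    config = yaml.safe_load(config_file)  # yaml.load without a Loader fails on PyYAML >= 6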

syn.login(config.get('user'), config.get('password'))

Welcome, abonil91!
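Because `config.get()` returns `None` for a missing key, a typo in `synapse.cfg` only shows up as a failed login. A minimal sketch, assuming the YAML file holds top-level `user` and `password` keys as the cell above expects, that fails early with a clearer message (`require_credentials` is a hypothetical helper, not part of `synapseclient`):

```python
def require_credentials(config):
    """Return (user, password) from a parsed config dict, or raise if either is missing.

    Hypothetical helper: it checks the same two keys the login cell reads.
    """
    missing = [key for key in ("user", "password") if not config.get(key)]
    if missing:
        raise ValueError("synapse.cfg is missing: " + ", ".join(missing))
    return config["user"], config["password"]

# Usage with the notebook's objects:
# user, password = require_credentials(config)
# syn.login(user, password)
```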


## Uploading Data to the NCANDA Project

This section shows how to upload a file onto the NCANDA Project using the API.

First, the necessary libraries are imported.

In [2]:
from synapseclient import Project, Folder, File

The code to create a project:

    project = Project('My uniquely named project')
    project = syn.store(project)

The code to create a folder:

    data_folder = Folder('Data', parent=project)
    data_folder = syn.store(data_folder)

The code to add a file:

    test_entity = File('/path/to/data/file.xyz', description='Fancy new data', parent=data_folder)
    test_entity = syn.store(test_entity)

### Example Upload 

In [3]:
# Select the project by its Synapse ID
project = syn.get('syn3565171')

# Create a folder in the project
data_folder = Folder('Example_Upload', parent=project)
data_folder = syn.store(data_folder)

# Upload the file into the new folder
example = File('./examplefile.csv', description="Example CSV File Upload", parent=data_folder)
example = syn.store(example)


##################################################
 Uploading file to Synapse storage 
##################################################
Uploaded Chunks [####################]100.00%     23.0bytes/23.0bytes ./examplefile.csv Done...
Upload completed in 3 seconds.


## Creating Tables

This section demonstrates how to create and upload tables to the NCANDA Project using the Synapse API for Python.

First, the necessary libraries are imported.

In [4]:
from synapseclient import Schema, Column, Table, Row, RowSet, as_table_columns

Next, define the columns and build a table schema with the ***Column()*** and ***Schema()*** classes.

In [9]:
cols = [
    Column(name='SUBJECT_ID', columnType='STRING', maximumSize=10),
    Column(name='SEX', columnType='STRING', enumValues=['M', 'F'], maximumSize=1),
    Column(name='PROTOCOL', columnType='INTEGER',enumValues=[ 1, 2, 3, 4]),
    Column(name='YEAR', columnType='INTEGER'),
    Column(name='MISSED_VISIT', columnType='BOOLEAN')]

schema = Schema(name='Subjects_Demo', columns=cols, parent=project)
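The `maximumSize` and `enumValues` arguments constrain what each column accepts. The plain-Python sketch below mirrors the schema above as dictionaries and checks a row locally before upload. `COLUMN_RULES` and `validate_row` are hypothetical and only approximate the checks Synapse performs on its side.

```python
# Hypothetical local mirror of the Column definitions above.
COLUMN_RULES = [
    {"name": "SUBJECT_ID", "type": str, "max_size": 10},
    {"name": "SEX", "type": str, "max_size": 1, "enum": {"M", "F"}},
    {"name": "PROTOCOL", "type": int, "enum": {1, 2, 3, 4}},
    {"name": "YEAR", "type": int},
    {"name": "MISSED_VISIT", "type": bool},
]

def validate_row(row, rules=COLUMN_RULES):
    """Return a list of problems found in one row (an empty list means it looks valid)."""
    if len(row) != len(rules):
        return ["expected %d values, got %d" % (len(rules), len(row))]
    problems = []
    for value, rule in zip(row, rules):
        name = rule["name"]
        # bool is a subclass of int, so reject it explicitly for INTEGER columns.
        if not isinstance(value, rule["type"]) or (rule["type"] is int and isinstance(value, bool)):
            problems.append("%s: wrong type %s" % (name, type(value).__name__))
            continue
        if "max_size" in rule and len(value) > rule["max_size"]:
            problems.append("%s: longer than %d characters" % (name, rule["max_size"]))
        if "enum" in rule and value not in rule["enum"]:
            problems.append("%s: %r not in allowed values" % (name, value))
    return problems

print(validate_row(["Y000025", "F", 1, 3, False]))  # []
```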

Finally, we combine the schema with the data in ./TableExampleFile.csv into a table and store it on Synapse.

In [10]:
table = Table(schema, "./TableExampleFile.csv")
table = syn.store(table)

Uploaded Chunks [####################]100.00%     161.0bytes/161.0bytes ./TableExampleFile.csv Done...
Upload completed in 3 seconds.
 [####################]100.00%     1/1  Done...
    

The ***Table()*** function takes two arguments: a schema object and the data, which can be any of:

* path to a CSV file
* Pandas DataFrame
* RowSet object
* list of lists where each of the inner lists is a row
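These forms carry the same rows. As a standard-library illustration, the sketch below turns a list of lists into CSV text with a header row, the shape assumed for the CSV-file form (presumably how ./TableExampleFile.csv is laid out). The header names come from the schema above; `rows_to_csv` is a hypothetical helper, and writing booleans as lowercase `true`/`false` is an assumption about what the upload expects.

```python
import csv
import io

HEADERS = ["SUBJECT_ID", "SEX", "PROTOCOL", "YEAR", "MISSED_VISIT"]

def rows_to_csv(rows, headers=HEADERS):
    """Serialize a list of row lists to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    for row in rows:
        # Assumption: booleans are written as lowercase true/false.
        writer.writerow([str(v).lower() if isinstance(v, bool) else v for v in row])
    return buffer.getvalue()

rows = [["Y000025", "F", 1, 3, False],
        ["Z900909", "M", 4, 2, False]]
print(rows_to_csv(rows))
```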



The following example shows how to create a table from a pandas DataFrame.

In [27]:
import pandas as pd

df = pd.read_csv("./TableExampleFile.csv", index_col=False)
schema = Schema(name='Subjects_Demo', columns=as_table_columns(df), parent=project)
table = syn.store(Table(schema, df))

Uploaded Chunks [####################]100.00%     161.0bytes/161.0bytes /tmp/tmpP1UMtP Done...
Upload completed in 9 seconds.
 [####################]100.00%     1/1  Done...
    

### Changing Data

Once the schema is set, the table can be changed by **appending** new rows or **updating** existing ones.

The following is an example of **appending** new rows.

In [28]:
table = syn.store(Table(table.schema.id, "./TableExampleFile.csv"))

Uploaded Chunks [####################]100.00%     161.0bytes/161.0bytes ./TableExampleFile.csv Done...
Upload completed in 3 seconds.
 [####################]100.00%     1/1  Done...
    

In [29]:
new_rows = [["Y000025", "F", 1, 3, False],
            ["Z900909", "M", 4, 2, False]]
table = syn.store(Table(schema, new_rows))

Uploaded Chunks [####################]100.00%     48.0bytes/48.0bytes /tmp/tmp8jDsXi Done...
Upload completed in 3 seconds.
 [####################]100.00%     1/1  Done...
    

**Updating** rows requires an etag, which identifies the most recent change set, plus the row IDs and version numbers of each row to be modified. We get these by querying the table before updating. Keeping the change set limited to rows that actually change makes processing faster.
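To keep a change set small, compare the queried rows with their edited versions and send only those that differ. A standard-library sketch, assuming each row is keyed by its (row ID, version) pair, which the query results carry; `changed_rows` is a hypothetical helper and the rows below are toy values. With pandas, the same idea is a row-wise comparison of the original and edited DataFrames.

```python
def changed_rows(before, after):
    """Return the rows in `after` whose values differ from the same row in `before`.

    Both arguments map a (row_id, version) key to a dict of column values.
    """
    return {key: values for key, values in after.items()
            if before.get(key) != values}

# Toy data: only the second row was edited.
before = {(1, 1): {"SUBJECT_ID": "A", "YEAR": 2},
          (2, 1): {"SUBJECT_ID": "B", "YEAR": 3}}
after = {(1, 1): {"SUBJECT_ID": "A", "YEAR": 2},
         (2, 1): {"SUBJECT_ID": "B", "YEAR": 4}}
print(changed_rows(before, after))  # {(2, 1): {'SUBJECT_ID': 'B', 'YEAR': 4}}
```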

In [30]:
results = syn.tableQuery("select * from %s where PROTOCOL=1" %table.schema.id)
df = results.asDataFrame()
# One new YEAR value per returned row; the list length must match len(df)
df['YEAR'] = [3,2,2,3,2,2,3,3,4]

# Note: the etag comes from the query results. Without it, the update fails with an "Invalid etag" error.

 [####################]100.00%     1/1  Done...
Downloaded   [####################]100.00%     417.0bytes/417.0bytes query_results.csv Done...
    

In [31]:
table = syn.store(Table(schema, df, etag=results.etag))

Uploaded Chunks [####################]100.00%     277.0bytes/277.0bytes /tmp/tmpHuf2OD Done...
Upload completed in 3 seconds.
 [####################]100.00%     1/1  Done...
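The etag acts as an optimistic-concurrency check: an update is accepted only if it names the table's current change set, so an edit computed from stale query results is rejected. The toy in-memory class below is hypothetical and not how Synapse is implemented; it only illustrates that rule. In the notebook, the real etag comes from `results.etag`.

```python
import uuid

class ToyTable:
    """In-memory stand-in for a table that tags every accepted change set with an etag."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.etag = uuid.uuid4().hex

    def query(self):
        # Return a snapshot of the rows together with the etag they were read at.
        return dict(self.rows), self.etag

    def update(self, changes, etag):
        # Reject updates computed against an older change set.
        if etag != self.etag:
            raise ValueError("Invalid etag")
        self.rows.update(changes)
        self.etag = uuid.uuid4().hex  # every accepted change set gets a new etag
        return self.etag

table_demo = ToyTable({1: {"YEAR": 2}})
rows, etag = table_demo.query()
table_demo.update({1: {"YEAR": 3}}, etag)    # accepted
# table_demo.update({1: {"YEAR": 4}}, etag)  # would raise ValueError: the etag is now stale
```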
    

### Changing Table Structure

Columns can be added with the ***addColumn()*** or ***addColumns()*** methods of the Schema object:

In [32]:
visit_date_column = syn.store(Column(name='VISIT_DATE', columnType='DATE'))
schema.addColumn(visit_date_column)
schema = syn.store(schema)

In [33]:
# Renaming or otherwise modifying a column involves removing the column and adding a new one
cols = syn.getTableColumns(schema)
for col in cols:
    if col.name == "VISIT_DATE":
        schema.removeColumn(col)
# Add the replacement column, which takes the place of VISIT_DATE
dov_column = syn.store(Column(name='DOV', columnType='DATE'))
schema.addColumn(dov_column)
schema = syn.store(schema)

### Table attached files

Synapse tables support a special column type called ‘File’, whose values are file handles, identifiers of files stored in Synapse. The demo code linked below shows how to upload files to Synapse, associate them with a table, and read them back later.

This feature could be a way to store scan data on Synapse.

Demo code is available here:
http://python-docs.synapse.org/Table.html#table-attached-files

### Deleting Rows

Query for the rows you want to delete and call syn.delete on the results:

In [34]:
results = syn.tableQuery("select * from %s where PROTOCOL=4" % table.schema.id)
syn.delete(results.asRowSet())

 [####################]100.00%     1/1  Done...
Downloaded   [####################]100.00%     121.0bytes/121.0bytes query_results.csv Done...
    

### Deleting Tables

Deleting the schema deletes the whole table and all rows:

In [26]:
syn.delete(schema)

## Permissions and Controlling Access 

By default, data sets in Synapse are private to the user account.

The following method returns the permissions a specific user has on a specific data set:

    Synapse.getPermissions(entity, principalId=None)
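`getPermissions` returns a list of permission strings such as `'READ'` or `'DOWNLOAD'`, so checking for one access type is a membership test. A minimal sketch; `has_access` is a hypothetical helper, and the principal ID in the usage comment is a placeholder.

```python
def has_access(syn, entity, principal_id, access_type="READ"):
    """Return True if `principal_id` holds `access_type` on `entity`.

    `syn` is a logged-in synapseclient.Synapse object; getPermissions returns
    a list of permission strings for the given user or group.
    """
    return access_type in syn.getPermissions(entity, principalId=principal_id)

# Usage (12345 is a placeholder principal ID):
# has_access(syn, 'syn3565171', 12345, 'DOWNLOAD')
```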

The following method sets the permissions a specific user has on a specific data set:

    Synapse.setPermissions(entity, principalId=None, accessType=[u'READ'], modify_benefactor=False, warn_if_inherits=True, overwrite=True)
    
Parameters:

* entity – An Entity or Synapse ID to lookup or modify
* principalId – Identifier of a user or group
* accessType – Type of permission to be granted
* modify_benefactor – Set to True when modifying a benefactor’s ACL
* warn_if_inherits – Set to False when creating a new ACL. Otherwise, trying to modify the ACL of an Entity that inherits its ACL results in a warning
* overwrite – By default this function overwrites existing permissions for the specified user. Set this flag to False to add new permissions nondestructively.
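The `overwrite` flag decides whether the requested access types replace or extend what the user already holds. The sketch below models that documented behavior on plain lists (`resulting_access` is a hypothetical helper, not library code) and wraps the real call in a hypothetical `add_access`, which passes `overwrite=False` so existing permissions are kept.

```python
def resulting_access(current, requested, overwrite=True):
    """Model the access types a user ends up with after setPermissions."""
    if overwrite:
        return sorted(set(requested))             # existing permissions are replaced
    return sorted(set(current) | set(requested))  # new permissions are added alongside

def add_access(syn, entity, principal_id, access_types):
    """Grant extra access types without removing the ones the user already has."""
    return syn.setPermissions(entity, principalId=principal_id,
                              accessType=list(access_types), overwrite=False)

print(resulting_access(["READ"], ["DOWNLOAD"]))                   # ['DOWNLOAD']
print(resulting_access(["READ"], ["DOWNLOAD"], overwrite=False))  # ['DOWNLOAD', 'READ']
```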
