SYNPY-714 #548

Merged (2 commits) on May 25, 2018
39 changes: 25 additions & 14 deletions synapseclient/table.py
@@ -26,8 +26,23 @@

project = syn.get('syn123')

To create a Table, you first need to create a Table :py:class:`Schema`. This
defines the columns of the table::
First, let's load some data. Let's say we had a file, genes.csv::

Name,Chromosome,Start,End,Strand,TranscriptionFactor
foo,1,12345,12600,+,False
arg,2,20001,20200,+,False
zap,2,30033,30999,-,False
bah,1,40444,41444,-,False
bnk,1,51234,54567,+,True
xyz,1,61234,68686,+,False

To create a Table::

table = build_table('My Favorite Genes', project, "/path/to/genes.csv")
syn.store(table)

:py:func:`build_table` derives the Table :py:class:`Schema`, which defines the columns of the table, from the data you pass in.
To create a table with a custom :py:class:`Schema`, first create the :py:class:`Schema`::

cols = [
Column(name='Name', columnType='STRING', maximumSize=20),
@@ -39,16 +54,6 @@

schema = Schema(name='My Favorite Genes', columns=cols, parent=project)
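For contrast with the hand-written column list, here is a rough sketch of how column types could be guessed from the genes.csv content. It uses only the standard library; `guess_column_type`, `guess_columns`, and the inference rules are hypothetical and are not how synapseclient's `as_table_columns` works. The `BOOLEAN` and `DOUBLE` type names are assumed here; only `STRING` and `INTEGER` appear in this diff.

```python
import csv
import io

# The sample file from the docstring, inlined so the sketch is self-contained.
GENES_CSV = """\
Name,Chromosome,Start,End,Strand,TranscriptionFactor
foo,1,12345,12600,+,False
arg,2,20001,20200,+,False
zap,2,30033,30999,-,False
bah,1,40444,41444,-,False
bnk,1,51234,54567,+,True
xyz,1,61234,68686,+,False
"""


def _all_parse(cells, parser):
    """Return True if every cell can be converted by `parser`."""
    try:
        for cell in cells:
            parser(cell)
    except ValueError:
        return False
    return True


def guess_column_type(cells):
    """Guess a type name for one column from its string cells (hypothetical rules)."""
    if cells and all(cell in ("True", "False") for cell in cells):
        return "BOOLEAN"
    if cells and _all_parse(cells, int):
        return "INTEGER"
    if cells and _all_parse(cells, float):
        return "DOUBLE"
    return "STRING"


def guess_columns(csv_text):
    """Return one {'name', 'columnType'} dict per CSV column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return [
        {"name": name, "columnType": guess_column_type([row[i] for row in body])}
        for i, name in enumerate(header)
    ]


columns = guess_columns(GENES_CSV)
for column in columns:
    print(column["name"], column["columnType"])
```

A custom Schema remains useful when guessed types are too loose, for example to cap a string column with `maximumSize`.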

Next, let's load some data. Let's say we had a file, genes.csv::

Name,Chromosome,Start,End,Strand,TranscriptionFactor
foo,1,12345,12600,+,False
arg,2,20001,20200,+,False
zap,2,30033,30999,-,False
bah,1,40444,41444,-,False
bnk,1,51234,54567,+,True
xyz,1,61234,68686,+,False

Let's store that in Synapse::

table = Table(schema, "/path/to/genes.csv")
@@ -1116,12 +1121,17 @@ def build_table(name, parent, values):
:param name: the name for the Table Schema object
:param parent: the project in Synapse to which this table belongs
:param values: an object that holds the content of the table, either:
- a string holding the path to a CSV file
- a Pandas `DataFrame <http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe>`_

:return: a Table object suitable for storing

Example::

path = "/path/to/file.csv"
table = build_table("simple_table", "syn123", path)
table = syn.store(table)

import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=["c", "d", "e"]))
@@ -1134,13 +1144,14 @@ def build_table(name, parent, values):
except:
pandas_available = False

if not isinstance(values, pd.DataFrame):
if not isinstance(values, pd.DataFrame) and not isinstance(values, six.string_types):
raise ValueError("Values of type %s are not yet supported." % type(values))
if not pandas_available:
raise ValueError("pandas package is required.")
cols = as_table_columns(values)
schema = Schema(name=name, columns=cols, parent=parent)
return Table(schema, values)
headers = [SelectColumn.from_column(col) for col in cols]
Review comment (Contributor, Author):

Per @zimingd, we should rename SelectColumn to something that reflects what the class actually represents: type information for each column of the CSV file. SelectColumn used to be used only for CSVs retrieved from Synapse; in that case, it represented the types of the columns in the SQL SELECT clause (including aggregates).

return Table(schema, values, headers = headers)
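The new `headers` argument carries per-column name and type information alongside the data. A minimal sketch of that mapping follows. `ColumnSpec` and `header_from_column` are hypothetical stand-ins for illustration; they are not synapseclient's `Column` or `SelectColumn.from_column`, whose internals are not shown in this diff.

```python
from collections import namedtuple

# Stand-in for a column definition; NOT synapseclient's Column class.
ColumnSpec = namedtuple("ColumnSpec", ["name", "column_type"])


def header_from_column(col):
    """Keep only the name and type of a column (hypothetical helper)."""
    return {"name": col.name, "columnType": col.column_type}


cols = [ColumnSpec("a", "INTEGER"), ColumnSpec("b", "STRING")]

# One header entry per column, in the same order as the columns.
headers = [header_from_column(col) for col in cols]
print(headers)
```

The resulting dictionaries have the same shape as the `headers` expected by the unit tests in this PR.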


def Table(schema, values, **kwargs):
27 changes: 24 additions & 3 deletions tests/unit/unit_test_tables.py
@@ -736,18 +736,39 @@ def test_RowSetTable_len():
def test_build_table__with_pandas_DataFrame():
df = pd.DataFrame(dict(a=[1, 2, 3], b=["c", "d", "e"]))
table = build_table("test", "syn123", df)
assert len(table)==3
for i, row in enumerate(table):
assert row[0]==(i+1)
assert row[1]==["c", "d", "e"][i]
assert len(table)==3

headers = [
{'name': 'a', 'columnType': 'INTEGER'},
{'name': 'b', 'columnType': 'STRING'}
]
assert_equals(headers, table.headers)
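This test mixes bare `assert` statements with `assert_equals`. A small sketch of why an equality helper tends to give more useful failures: it uses the standard library's `unittest.TestCase.assertEqual` as a stand-in for `assert_equals`, and the helper names `failure_message`, `bare_assert`, and `equality_assert` are made up for this example.

```python
import unittest


def failure_message(check):
    """Run `check` and return the AssertionError text, or None if it passed."""
    try:
        check()
    except AssertionError as err:
        return str(err)
    return None


def bare_assert(actual, expected):
    # A plain assert carries no information about the two values.
    assert actual == expected


def equality_assert(actual, expected):
    # assertEqual builds a message that includes both values.
    unittest.TestCase().assertEqual(actual, expected)


bare_message = failure_message(lambda: bare_assert(2, 3))
detailed_message = failure_message(lambda: equality_assert(2, 3))

print("bare assert:", repr(bare_message))
print("assertEqual:", repr(detailed_message))
```

When run as a plain script, the bare assert typically reports nothing beyond the failing line, while the equality helper names both values.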

def test_build_table__with_non_pandas_DataFrame():
assert_raises(ValueError, build_table, "test", "syn123", [])
def test_build_table__with_csv():
string_io = StringIOContextManager('a,b\n'
'1,c\n'
'2,d\n'
'3,e')
with patch.object(synapseclient.table, "as_table_columns",
return_value = [Column(name = "a", columnType = "INTEGER"),
Column(name = "b", columnType = "STRING")]),\
patch.object(io, "open", return_value = string_io):
table = build_table("test", "syn123", "some_file_name")
assert len(table) == 3
for col, row in enumerate(table):
assert row[0] == (col + 1)
Review comment (Contributor, Author):

Use assert_equals instead, because assert does not show the difference between the two values.

assert row[1] == ["c", "d", "e"][col]
headers = [
{'name': 'a', 'columnType': 'INTEGER'},
{'name': 'b', 'columnType': 'STRING'}
]
assert_equals(headers, table.headers)
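`test_build_table__with_csv` avoids touching the filesystem by patching `io.open` to return an in-memory buffer. A comparable pattern using only the standard library is sketched below. `read_rows` is a hypothetical reader written for this example and is not part of synapseclient; the patch target `builtins.open` is likewise an assumption that fits this reader, not the target used in the PR.

```python
import csv
import io
from unittest.mock import mock_open, patch

CSV_TEXT = "a,b\n1,c\n2,d\n3,e"


def read_rows(path):
    """Hypothetical reader: return the CSV rows at `path` as lists of strings."""
    with open(path) as handle:
        return list(csv.reader(io.StringIO(handle.read())))


# While the patch is active, open() returns a mock file whose read() yields CSV_TEXT,
# so no file named "some_file_name" needs to exist.
with patch("builtins.open", mock_open(read_data=CSV_TEXT)):
    rows = read_rows("some_file_name")

print(rows)
```

Patching the name that the code under test actually looks up is the important part; a patch applied to a different module's `open` would leave the real file access in place.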

def test_build_table__with_dict():
assert_raises(ValueError, build_table, "test", "syn123", dict(a=[1, 2, 3], b=["c", "d", "e"]))
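`test_build_table__with_dict` expects a `ValueError` for values that are neither a path string nor a DataFrame. A sketch of that kind of guard, written without pandas so it runs anywhere. `check_supported` and its error message are illustrative assumptions, not `build_table`'s code.

```python
# The check in build_table also accepts pandas DataFrame; it is left out here
# so that the sketch has no third-party dependency.
SUPPORTED_TYPES = (str,)


def check_supported(values):
    """Return `values` unchanged if its type is supported, else raise ValueError."""
    if not isinstance(values, SUPPORTED_TYPES):
        raise ValueError("Unsupported values type: %s" % type(values))
    return values


print(check_supported("/path/to/genes.csv"))

try:
    check_supported(dict(a=[1, 2, 3], b=["c", "d", "e"]))
except ValueError as err:
    print("rejected:", err)
```

Failing early on unsupported types keeps the error close to the caller's mistake instead of surfacing later during schema construction.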


class TestTableQueryResult():