-
Notifications
You must be signed in to change notification settings - Fork 90
Extended order_by and bugfixes #135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,13 @@ | ||
from collections import OrderedDict | ||
from functools import wraps | ||
import itertools | ||
import re | ||
from .blob import unpack | ||
import numpy as np | ||
from datajoint import DataJointError | ||
from . import key as PRIMARY_KEY | ||
from collections import abc | ||
|
||
|
||
def prepare_attributes(relation, item): | ||
if isinstance(item, str) or item is PRIMARY_KEY: | ||
|
@@ -20,46 +25,57 @@ def prepare_attributes(relation, item): | |
raise DataJointError("Index must be a slice, a tuple, a list, a string.") | ||
return item, attributes | ||
|
||
class FetchQuery: | ||
def copy_first(f): | ||
@wraps(f) | ||
def ret(*args, **kwargs): | ||
args = list(args) | ||
args[0] = args[0].__class__(args[0]) # call copy constructor | ||
return f(*args, **kwargs) | ||
|
||
def __init__(self, relation): | ||
""" | ||
return ret | ||
|
||
""" | ||
self.behavior = dict( | ||
offset=0, limit=None, order_by=None, descending=False, as_dict=False, map=None | ||
) | ||
self._relation = relation | ||
class Fetch: | ||
def __init__(self, relation): | ||
if isinstance(relation, Fetch): # copy constructor | ||
self.behavior = dict(relation.behavior) | ||
self._relation = relation._relation | ||
else: | ||
self.behavior = dict( | ||
offset=0, limit=None, order_by=None, as_dict=False | ||
) | ||
self._relation = relation | ||
|
||
|
||
@copy_first | ||
def from_to(self, fro, to): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is the pair of methods ( relation.fetch.offset(300).limit(100)()
relation.fetch.from_to(300,400)() There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could introduce another method called There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is no standalone There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's the reason why I put in the |
||
self.behavior['offset'] = fro | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's in line with the index convention of python and I think it makes a lot of sense. If you want to split an interval a,b at c you can say from_to(a,c) and from_to(c,b) instead of from_to(a,c) and from_to(c+1,b). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Python index convention does not use the language 'from' and 'to', which implies including the first and last values. I really think we should revert to SQL's There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fine with offset_by and limit_to. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Either There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FYI, SQL alchemy also uses the method name There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What I don't like is that |
||
self.behavior['limit'] = to - fro | ||
return self | ||
|
||
def order_by(self, order_by): | ||
self.behavior['order_by'] = order_by | ||
@copy_first | ||
def order_by(self, *args): | ||
if len(args) > 0: | ||
self.behavior['order_by'] = self.behavior['order_by'] if self.behavior['order_by'] is not None else [] | ||
namepat = re.compile(r"\s*(?P<name>\w+).*") | ||
for a in args: # remove duplicates | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why do we need to remove duplicates? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Users could do something like this:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Each method call creating separate One thing that SQLAlchemy supports that may be nice is to call There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If SQL sorts from last to first, this means that the first attribute is the major sort key and the others are subordinate keys. The question is, what do we expect from a call
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would think that it would actually make sense to make later keys the subordinate ones, in line with how MySQL handles There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's not get carried away and overcomplicate things. Let's do this:
|
||
name = namepat.match(a).group('name') | ||
pat = re.compile(r"%s(\s*$|\s+(\S*\s*)*$)" % (name,)) | ||
self.behavior['order_by'] = [e for e in self.behavior['order_by'] if not pat.match(e)] | ||
self.behavior['order_by'].extend(args) | ||
return self | ||
|
||
@copy_first | ||
def as_dict(self): | ||
self.behavior['as_dict'] = True | ||
|
||
def ascending(self): | ||
self.behavior['descending'] = False | ||
return self | ||
|
||
def descending(self): | ||
self.behavior['descending'] = True | ||
return self | ||
|
||
def apply(self, f): | ||
self.behavior['map'] = f | ||
return self | ||
|
||
def limit_by(self, limit): | ||
@copy_first | ||
def limit_to(self, limit): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not think that making code read like a sentence is an advantage. Expressiveness and brevity are better than sounding like English sentences. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's clear that we have different opinions on this. Let's have a vote and stick with what the majority prefers. I suggest that the two options are 1. |
||
self.behavior['limit'] = limit | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the reason for grouping the settings in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe this is to make copying of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And it's easier to insert them into the cursor as kwargs. It distinguishes the fetch behavior from other properties of the object, like |
||
return self | ||
|
||
@copy_first | ||
def set_behavior(self, **kwargs): | ||
self.behavior.update(kwargs) | ||
return self | ||
|
@@ -78,9 +94,7 @@ def __call__(self, **kwargs): | |
""" | ||
behavior = dict(self.behavior, **kwargs) | ||
|
||
cur = self._relation.cursor(offset=behavior['offset'], limit=behavior['limit'], | ||
order_by=behavior['order_by'], descending=behavior['descending'], | ||
as_dict=behavior['as_dict']) | ||
cur = self._relation.cursor(**behavior) | ||
|
||
heading = self._relation.heading | ||
if behavior['as_dict']: | ||
|
@@ -92,22 +106,15 @@ def __call__(self, **kwargs): | |
for blob_name in heading.blobs: | ||
ret[blob_name] = list(map(unpack, ret[blob_name])) | ||
|
||
if behavior['map'] is not None: | ||
f = behavior['map'] | ||
for i in range(len(ret)): | ||
ret[i] = f(ret[i]) | ||
|
||
return ret | ||
|
||
def __iter__(self): | ||
""" | ||
Iterator that returns the contents of the database. | ||
""" | ||
behavior = self.behavior | ||
behavior = dict(self.behavior) | ||
|
||
cur = self._relation.cursor(offset=behavior['offset'], limit=behavior['limit'], | ||
order_by=behavior['order_by'], descending=behavior['descending'], | ||
as_dict=behavior['as_dict']) | ||
cur = self._relation.cursor(**behavior) | ||
|
||
heading = self._relation.heading | ||
do_unpack = tuple(h in heading.blobs for h in heading.names) | ||
|
@@ -126,10 +133,10 @@ def keys(self, **kwargs): | |
""" | ||
Iterator that returns primary keys. | ||
""" | ||
b = dict(self.behavior, **kwargs) | ||
if 'as_dict' not in kwargs: | ||
kwargs['as_dict'] = True | ||
yield from self._relation.project().fetch.set_behavior(**kwargs) | ||
|
||
b['as_dict'] = True | ||
yield from self._relation.project().fetch.set_behavior(**b) | ||
|
||
def __getitem__(self, item): | ||
""" | ||
|
@@ -146,7 +153,7 @@ def __getitem__(self, item): | |
single_output = isinstance(item, str) or item is PRIMARY_KEY or isinstance(item, int) | ||
item, attributes = prepare_attributes(self._relation, item) | ||
|
||
result = self._relation.project(*attributes).fetch() | ||
result = self._relation.project(*attributes).fetch(**self.behavior) | ||
return_values = [ | ||
np.ndarray(result.shape, | ||
np.dtype({name: result.dtype.fields[name] for name in self._relation.primary_key}), | ||
|
@@ -158,8 +165,7 @@ def __getitem__(self, item): | |
return return_values[0] if single_output else return_values | ||
|
||
|
||
class Fetch1Query: | ||
|
||
class Fetch1: | ||
def __init__(self, relation): | ||
self._relation = relation | ||
|
||
|
@@ -202,4 +208,4 @@ def __getitem__(self, item): | |
else result[attribute][0] | ||
for attribute in item | ||
) | ||
return return_values[0] if single_output else return_values | ||
return return_values[0] if single_output else return_values |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,6 +39,26 @@ class Subject(dj.Manual): | |
def prepare(self): | ||
self.insert(self.contents, ignore_errors=True) | ||
|
||
@schema | ||
class Language(dj.Lookup): | ||
|
||
definition = """ | ||
# languages spoken by some of the developers | ||
|
||
entry_id : int | ||
--- | ||
name : varchar(40) # name of the developer | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, this table is not in 3rd normal form. I know it's just a test example but we might as well show good examples. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can change it later. |
||
language : varchar(40) # language | ||
""" | ||
|
||
contents = [ | ||
(0, 'Fabian', 'English'), | ||
(1, 'Edgar', 'English'), | ||
(2, 'Dimitri', 'English'), | ||
(3, 'Dimitri', 'Ukrainian'), | ||
(4, 'Fabian', 'German'), | ||
(5, 'Edgar', 'Japanese'), | ||
] | ||
|
||
@schema | ||
class Experiment(dj.Imported): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
from operator import itemgetter, attrgetter | ||
import itertools | ||
from nose.tools import assert_true | ||
from numpy.testing import assert_array_equal, assert_equal | ||
import numpy as np | ||
|
||
from . import schema | ||
import datajoint as dj | ||
|
||
|
||
class TestFetch: | ||
def __init__(self): | ||
self.subject = schema.Subject() | ||
self.lang = schema.Language() | ||
|
||
def test_getitem(self): | ||
"""Testing Fetch.__getitem__""" | ||
|
||
np.testing.assert_array_equal(sorted(self.subject.project().fetch(), key=itemgetter(0)), | ||
sorted(self.subject.fetch[dj.key], key=itemgetter(0)), | ||
'Primary key is not returned correctly') | ||
|
||
tmp = self.subject.fetch(order_by=['subject_id']) | ||
|
||
for column, field in zip(self.subject.fetch[:], [e[0] for e in tmp.dtype.descr]): | ||
np.testing.assert_array_equal(sorted(tmp[field]), sorted(column), 'slice : does not work correctly') | ||
|
||
subject_notes, key, real_id = self.subject.fetch['subject_notes', dj.key, 'real_id'] | ||
# | ||
np.testing.assert_array_equal(sorted(subject_notes), sorted(tmp['subject_notes'])) | ||
np.testing.assert_array_equal(sorted(real_id), sorted(tmp['real_id'])) | ||
np.testing.assert_array_equal(sorted(key, key=itemgetter(0)), | ||
sorted(self.subject.project().fetch(), key=itemgetter(0))) | ||
|
||
for column, field in zip(self.subject.fetch['subject_id'::2], [e[0] for e in tmp.dtype.descr][::2]): | ||
np.testing.assert_array_equal(sorted(tmp[field]), sorted(column), 'slice : does not work correctly') | ||
|
||
def test_order_by(self): | ||
"""Tests order_by sorting order""" | ||
langs = schema.Language.contents | ||
|
||
for ord_name, ord_lang in itertools.product(*2 * [['ASC', 'DESC']]): | ||
cur = self.lang.fetch.order_by('name ' + ord_name, 'language ' + ord_lang)() | ||
langs.sort(key=itemgetter(2), reverse=ord_lang == 'DESC') | ||
langs.sort(key=itemgetter(1), reverse=ord_name == 'DESC') | ||
for c, l in zip(cur, langs): | ||
assert_true(np.all(cc == ll for cc, ll in zip(c, l)), 'Sorting order is different') | ||
|
||
def test_order_by_default(self): | ||
"""Tests order_by sorting order with defaults""" | ||
langs = schema.Language.contents | ||
|
||
cur = self.lang.fetch.order_by('language', 'name DESC')() | ||
langs.sort(key=itemgetter(1), reverse=True) | ||
langs.sort(key=itemgetter(2), reverse=False) | ||
|
||
for c, l in zip(cur, langs): | ||
assert_true(np.all([cc == ll for cc, ll in zip(c, l)]), 'Sorting order is different') | ||
|
||
def test_order_by_direct(self): | ||
"""Tests order_by sorting order passing it to __call__""" | ||
langs = schema.Language.contents | ||
|
||
cur = self.lang.fetch(order_by=['language', 'name DESC']) | ||
langs.sort(key=itemgetter(1), reverse=True) | ||
langs.sort(key=itemgetter(2), reverse=False) | ||
for c, l in zip(cur, langs): | ||
assert_true(np.all([cc == ll for cc, ll in zip(c, l)]), 'Sorting order is different') | ||
|
||
def test_limit_to(self): | ||
"""Test the limit_to function """ | ||
langs = schema.Language.contents | ||
|
||
cur = self.lang.fetch.limit_to(4)(order_by=['language', 'name DESC']) | ||
langs.sort(key=itemgetter(1), reverse=True) | ||
langs.sort(key=itemgetter(2), reverse=False) | ||
assert_equal(len(cur), 4, 'Length is not correct') | ||
for c, l in list(zip(cur, langs))[:4]: | ||
assert_true(np.all([cc == ll for cc, ll in zip(c, l)]), 'Sorting order is different') | ||
|
||
def test_from_to(self): | ||
"""Test the from_to function """ | ||
langs = schema.Language.contents | ||
|
||
cur = self.lang.fetch.from_to(2, 6)(order_by=['language', 'name DESC']) | ||
langs.sort(key=itemgetter(1), reverse=True) | ||
langs.sort(key=itemgetter(2), reverse=False) | ||
assert_equal(len(cur), 4, 'Length is not correct') | ||
for c, l in list(zip(cur, langs[2:6])): | ||
assert_true(np.all([cc == ll for cc, ll in zip(c, l)]), 'Sorting order is different') | ||
|
||
def test_iter(self): | ||
"""Test iterator""" | ||
langs = schema.Language.contents | ||
|
||
cur = self.lang.fetch.order_by('language', 'name DESC') | ||
langs.sort(key=itemgetter(1), reverse=True) | ||
langs.sort(key=itemgetter(2), reverse=False) | ||
for (_, name, lang), (_, tname, tlang) in list(zip(cur, langs)): | ||
assert_true(name == tname and lang == tlang, 'Values are not the same') | ||
|
||
def test_keys(self): | ||
"""test key iterator""" | ||
langs = schema.Language.contents | ||
langs.sort(key=itemgetter(1), reverse=True) | ||
langs.sort(key=itemgetter(2), reverse=False) | ||
|
||
cur = self.lang.fetch.order_by('language', 'name DESC')['entry_id'] | ||
cur2 = [e['entry_id'] for e in self.lang.fetch.order_by('language', 'name DESC').keys()] | ||
|
||
keys, _, _ = list(zip(*langs)) | ||
for k, c, c2 in zip(keys, cur, cur2): | ||
assert_true(k == c == c2, 'Values are not the same') | ||
|
||
def test_fetch1(self): | ||
key = {'entry_id': 0} | ||
true = schema.Language.contents[0] | ||
|
||
dat = (self.lang & key).fetch1() | ||
for k, (ke, c) in zip(true, dat.items()): | ||
assert_true(k == c == (self.lang & key).fetch1[ke], 'Values are not the same') | ||
|
||
def test_copy(self): | ||
"""Test whether modifications copy the object""" | ||
f = self.lang.fetch | ||
f2 = f.order_by('name') | ||
assert_true(f.behavior['order_by'] is None and len(f2.behavior['order_by']) == 1, 'Object was not copied') | ||
|
||
def test_overwrite(self): | ||
"""Test whether order_by overwrites duplicates""" | ||
f = self.lang.fetch.order_by('name DeSc ') | ||
f2 = f.order_by('name') | ||
assert_true(f2.behavior['order_by'] == ['name'], 'order_by attribute was not overwritten') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for
copy_first
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We thought that it would make more sense for each operation on the
fetch
object to return a newfetch
object rather than modifying the object in place. Sincefetch
object in itself is very cheap, this shouldn't be a real over head.copy_first
is a decorator that modifies the decorated method to work on and modify the copy of the object, supporting this idea.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use
copy.copy
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy.copy
does not copy the dictbehavior
. The functiondeepcopy
would copy_relation
as well which seemed like an overkill. That's why I introduced a copy constructor and used a decorator to tell the function, that I'd likeself
copied. Having the elements ofbehavior
as object attributes wouldn't solve the problem, since some of the values ofbehavior
are lists.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why do we need to copy the object in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i have looked into other modules and, yes, things like these usually copy the object. I am merging this PR.