Merged
Commits
78 commits
f8a98dd
simplified Fetch, fixed a bug in autopopulate.progress, fixed a bug i…
dimitri-yatsenko Apr 8, 2016
fb0d260
minor cleanup
dimitri-yatsenko Apr 8, 2016
4fbb9f3
simplified fetch, removed slice indexing
dimitri-yatsenko May 8, 2016
8706bd6
Merge branch 'master' of https://github.com/datajoint/datajoint-python
dimitri-yatsenko May 8, 2016
3240966
implemented aliased foreign keys.
dimitri-yatsenko May 18, 2016
57e46a7
simplified the handling of dependencies
dimitri-yatsenko May 19, 2016
4a7d6cf
restored a test that was accidentally removed in last commit
dimitri-yatsenko May 19, 2016
956ac5f
a few simplifications. ERD is not yet fully functional.
dimitri-yatsenko May 20, 2016
1f364be
removed Schema.table2class
dimitri-yatsenko May 20, 2016
a264772
dependencies are loaded more efficiently for individual tables.
dimitri-yatsenko May 20, 2016
88a90b9
Completed ERD layout with relabeled nodes.
dimitri-yatsenko May 23, 2016
db3b534
debugged the spawning of missing relation classes, now in Schema.spaw…
dimitri-yatsenko May 24, 2016
a7d9c06
renamed BaseRelation._prepare to prepare and made it possible to bypa…
dimitri-yatsenko May 24, 2016
9f12677
fixed a bug from the previous commit
dimitri-yatsenko May 24, 2016
da30a2b
bugfix in Schema.spawn_missing_classes
dimitri-yatsenko May 24, 2016
c707dc3
improved ERD layout
dimitri-yatsenko May 24, 2016
f24af64
added ERD algebra: union, intersection, difference
dimitri-yatsenko May 24, 2016
cbb09aa
added color to the ERD
dimitri-yatsenko May 25, 2016
9b3a655
ERD can now omit part tables.
dimitri-yatsenko May 25, 2016
edd91a1
UserRelation subclasses now keep the order of their member declaration
dimitri-yatsenko May 26, 2016
3201653
bugfix, improved ERD appearance
dimitri-yatsenko May 29, 2016
9c4cc07
Merge branch 'master' of https://github.com/datajoint/datajoint-python
dimitri-yatsenko May 29, 2016
08fb275
added nose tests for dependencies and ERDs
dimitri-yatsenko May 29, 2016
1d565ff
Merge branch 'master' of https://github.com/datajoint/datajoint-python
dimitri-yatsenko May 29, 2016
fc2ed59
expanded unit tests for dependencies and ERD
dimitri-yatsenko May 29, 2016
1612b62
Merge branch 'master' of https://github.com/datajoint/datajoint-python
dimitri-yatsenko May 31, 2016
9a3dc48
removed utils.group_by -- use dj.U instead.
dimitri-yatsenko May 31, 2016
ec71731
added a test for handling an undecorated class
dimitri-yatsenko May 31, 2016
454a4fd
removed indexing fetch by an int. Added more tests.
dimitri-yatsenko May 31, 2016
c1e103b
fixed duplicate nose test function names
dimitri-yatsenko May 31, 2016
50ea24b
added a test for unauthorized database access
dimitri-yatsenko May 31, 2016
4f44b4f
improved ERD appearance: better colors, dashed lines for non-primary …
dimitri-yatsenko May 31, 2016
494ea16
travis user is now "datajoint" rather than root and has limited privi…
dimitri-yatsenko May 31, 2016
a1bb98d
bugfix from previous commit
dimitri-yatsenko May 31, 2016
838b75e
bugfix from previous commit
dimitri-yatsenko May 31, 2016
37104b7
ERD layout now minimizes edge crossings
dimitri-yatsenko Jun 1, 2016
43ee432
fixed syntax error for Python3.4
dimitri-yatsenko Jun 1, 2016
e3aa42e
fixed another bug specific to Python3.4 and earlier versions
dimitri-yatsenko Jun 1, 2016
bcc1834
another Python3.4 compatibility fix
dimitri-yatsenko Jun 1, 2016
cee2f14
added test for schema.drop()
dimitri-yatsenko Jun 1, 2016
7b44378
minor
dimitri-yatsenko Jun 1, 2016
b66b8ee
minor
dimitri-yatsenko Jun 1, 2016
184d9ad
added copy constructor to ERD, added tests for ERD algebra.
dimitri-yatsenko Jun 1, 2016
218150c
bugfix from previous commit
dimitri-yatsenko Jun 1, 2016
6a25bda
removed the option to omit part tables from ERD
dimitri-yatsenko Jun 1, 2016
a396e61
simplified ERD constructor.
dimitri-yatsenko Jun 1, 2016
e241cf9
minor cleanup in BaseRelation.insert
dimitri-yatsenko Jun 1, 2016
84a7d2c
bugfix from previous commit
dimitri-yatsenko Jun 1, 2016
ea11a30
minor cleanup, added test
dimitri-yatsenko Jun 1, 2016
5374177
minor simplification in tests
dimitri-yatsenko Jun 1, 2016
decbc32
removed lazy table declaration -- it was not used: all tables are now…
dimitri-yatsenko Jun 1, 2016
f6a3d0b
simplified UserRelation subclasses
dimitri-yatsenko Jun 1, 2016
54018c1
removed the prepare callback. Initial contents is now entered only u…
dimitri-yatsenko Jun 1, 2016
47088e1
bugfix in `schema.spawn_missing_classes`
dimitri-yatsenko Jun 2, 2016
f4b170e
improved handling of undeclared relation classes
dimitri-yatsenko Jun 2, 2016
a2d06b0
renamed populated_from to poprel.
dimitri-yatsenko Jun 2, 2016
f310627
RelationalOperand._repr_html_ now prints the table comment as the title
dimitri-yatsenko Jun 2, 2016
13814dd
added tutorial in IPython notebooks, added more tests.
dimitri-yatsenko Jun 3, 2016
8d4c107
updated README to reference tutorial
dimitri-yatsenko Jun 3, 2016
b2bb987
updated the tutorial links from main page
dimitri-yatsenko Jun 3, 2016
9a8b7a5
updated tutorial notebooks with crosslinks
dimitri-yatsenko Jun 3, 2016
946b9f0
edited tutorials
dimitri-yatsenko Jun 3, 2016
14db2b7
updated tutorial
dimitri-yatsenko Jun 3, 2016
262eceb
updated tutorial-notebooks, removed demos
dimitri-yatsenko Jun 3, 2016
5b8e2f1
added test for aliased foreign key
dimitri-yatsenko Jun 6, 2016
cf1ce1a
renamed the argument `left` to `keep_all_rows` in aggregate
dimitri-yatsenko Jun 6, 2016
e1d70dd
added tests for relation U.
dimitri-yatsenko Jun 6, 2016
0fce887
ERD fonts are scalable with an input argument
dimitri-yatsenko Jun 7, 2016
f149964
All relations now have copy constructors. Also, RelationalOperand no …
dimitri-yatsenko Jun 9, 2016
bb216f9
Redesigned Relation U as a subclass of RelationalOperand
dimitri-yatsenko Jun 9, 2016
6be9bd8
minor cleanup
dimitri-yatsenko Jun 9, 2016
2540730
added tests for relation U restriction
dimitri-yatsenko Jun 9, 2016
a5141a8
bug fix in restrict
dimitri-yatsenko Jun 9, 2016
6a6fc3c
minor cleanup
dimitri-yatsenko Jun 9, 2016
6f2f2a0
fixed #209: foreign key attributes are now emphasized in HTML represe…
dimitri-yatsenko Jun 10, 2016
075204e
replaced `poprel` with `key_source`
dimitri-yatsenko Jun 11, 2016
3f094b9
removed the use of abstract classes, made Part a UserRelation
dimitri-yatsenko Jun 11, 2016
486a5d6
Updates to the `contents` property of user relations are now synced t…
dimitri-yatsenko Jun 11, 2016
3 changes: 2 additions & 1 deletion .travis.yml
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
sudo: required
language: python
env:
- DJ_TEST_HOST="127.0.0.1" DJ_TEST_USER="root" DJ_TEST_PASSWORD="" DJ_HOST="127.0.0.1" DJ_USER="root" DJ_PASSWORD=""
- DJ_TEST_HOST="127.0.0.1" DJ_TEST_USER="datajoint" DJ_TEST_PASSWORD="datajoint" DJ_HOST="127.0.0.1" DJ_USER="datajoint" DJ_PASSWORD="datajoint"
python:
- "3.4"
- "3.5"
services: mysql
before_install:
- sudo apt-get -qq update
- sudo apt-get install -y libblas-dev liblapack-dev libatlas-dev gfortran
- mysql -e "create user 'datajoint'@'%' identified by 'datajoint'; GRANT ALL PRIVILEGES ON \`djtest\_%\`.* TO 'datajoint'@'%';" -uroot
install:
- travis_wait 30 pip install -r requirements.txt
- pip install nose nose-cov python-coveralls
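The `GRANT` statement in this hunk limits the `datajoint` test user to schemas whose names match `djtest\_%`. In MySQL database-level grants, `_` and `%` act as wildcards, so the underscore is escaped to match literally. A minimal sketch of that matching rule follows; `mysql_pattern_to_regex` is a hypothetical helper written for this illustration and is not part of DataJoint or MySQL.

```python
import re


def mysql_pattern_to_regex(pattern):
    """Translate a MySQL database-name pattern (as used in GRANT) to a regex.

    Hypothetical helper for illustration only.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            # A backslash makes the next character match literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == '%':
            out.append('.*')   # any run of characters, including none
        elif ch == '_':
            out.append('.')    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile('^' + ''.join(out) + '$')


# The pattern from the GRANT statement, without the shell-escaped backticks.
allowed = mysql_pattern_to_regex(r'djtest\_%')
```

With this pattern, a schema named `djtest_schema` is covered by the grant, while `djtestXschema` or `production` is not, which is what keeps the test user away from non-test schemas.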
24 changes: 24 additions & 0 deletions README.md
@@ -15,6 +15,7 @@ DataJoint for Python is a high-level programming interface for relational databa

DataJoint was initially developed in 2009 by Dimitri Yatsenko in Andreas Tolias' Lab for the distributed processing and management of large volumes of data streaming from regular experiments. Starting in 2011, DataJoint has been available as an open-source project adopted by other labs and improved through contributions from several developers.


## Quick start guide
The current pip version is ancient. We will update it as soon as datajoint release 1.0 is out.
~~To install datajoint using `pip` just run:~~
@@ -27,3 +28,26 @@ pip install datajoint

~~However, please be aware that DataJoint for Python is still undergoing major changes, and thus what's available on PyPI via `pip` is in **pre-release state**!~~

## Tutorial
1. [Setup](tutorial-notebooks/Primer00.ipynb)
1. [Connect](tutorial-notebooks/Primer01.ipynb)
1. [Create a schema, define a table](tutorial-notebooks/Primer02.ipynb)
1. [Dependencies](tutorial-notebooks/Primer03.ipynb)
1. [Schemas as Python modules](tutorial-notebooks/Primer04.ipynb)
1. [Lookup tables](tutorial-notebooks/Primer05.ipynb)
1. [Queries 1: restrictions and joins](tutorial-notebooks/Primer06.ipynb)
1. [Dependencies 2: non-primary](tutorial-notebooks/Primer07.ipynb)
1. [Queries 2: projections](tutorial-notebooks/Primer08.ipynb)
1. [Dependencies 3: aliased foreign keys](tutorial-notebooks/Primer09.ipynb)
1. [Computations](tutorial-notebooks/Primer10.ipynb)
1. [Parameterized Computations](tutorial-notebooks/Primer11.ipynb)
1. [Master-part relationships](tutorial-notebooks/Primer12.ipynb)
1. [Understanding transactions](tutorial-notebooks/Primer13.ipynb)
1. [Job management for distributed computation](tutorial-notebooks/Primer14.ipynb)
1. [Projection and aggregation](tutorial-notebooks/Primer15.ipynb)
1. [Relation U](tutorial-notebooks/Primer16.ipynb)
1. [Dependencies 4: mapped dependencies](tutorial-notebooks/Primer17.ipynb)
1. [Representing graphs](tutorial-notebooks/Primer18.ipynb)
1. [Customizing computations](tutorial-notebooks/Primer19.ipynb)
1. [BOSS interface](tutorial-notebooks/Primer20.ipynb)
1. [Web interfaces](tutorial-notebooks/Primer21.ipynb)
15 changes: 11 additions & 4 deletions datajoint/__init__.py
@@ -6,18 +6,25 @@

DataJoint is free software under the LGPL License. In addition, we request
that any use of DataJoint leading to a publication be acknowledged in the publication.

Please cite:
http://biorxiv.org/content/early/2015/11/14/031658
http://dx.doi.org/10.1101/031658
"""

import logging
import os

__author__ = "Dimitri Yatsenko, Edgar Walker, and Fabian Sinz at Baylor College of Medicine"
__version__ = "0.2"
__version__ = "0.2.1"
Contributor

Shouldn't that be 1.0? I thought you killed off all issues for milestone 1.0.

Member Author

I just incremented the version. When we have a real 1.0 candidate, we can call it 0.9.1 or something.

__date__ = "June 1, 2016"
__all__ = ['__author__', '__version__',
'config', 'conn', 'kill',
'Connection', 'Heading', 'BaseRelation', 'FreeRelation', 'Not', 'schema',
'Manual', 'Lookup', 'Imported', 'Computed', 'Part',
'AndList', 'OrList']
'AndList', 'OrList', 'ERD', 'U']

print('DataJoint', __version__, '('+__date__+')')


class key:
@@ -65,12 +72,12 @@ class DataJointError(Exception):

logger.setLevel(log_levels[config['loglevel']])


# ------------- flatten import hierarchy -------------------------
from .connection import conn, Connection
from .base_relation import BaseRelation
from .user_relations import Manual, Lookup, Imported, Computed, Part
from .relational_operand import Not, AndList, OrList
from .relational_operand import Not, AndList, OrList, U
from .heading import Heading
from .schema import Schema as schema
from .kill import kill
from .erd import ERD
40 changes: 22 additions & 18 deletions datajoint/autopopulate.py
@@ -3,6 +3,7 @@
import logging
import datetime
import random
from pymysql import OperationalError
from .relational_operand import RelationalOperand, AndList
from . import DataJointError
from .base_relation import FreeRelation
@@ -12,41 +13,40 @@
logger = logging.getLogger(__name__)


class AutoPopulate(metaclass=abc.ABCMeta):
class AutoPopulate:
"""
AutoPopulate is a mixin class that adds the method populate() to a Relation class.
Auto-populated relations must inherit from both Relation and AutoPopulate,
Contributor

Why are we changing this to MATLAB style? I liked the more verbose version, `populated_from`.

Member Author

When I write tutorials and record presentations, I have to use poprel as a noun and it's much easier to discuss than populated_from. It's a new term that we introduce in datajoint, which has specific uses. It has been used in MATLAB from the early versions, so changing it there is not practical. Naming it the same way in both languages helps.

Contributor
@fabiansinz, Jun 8, 2016

It breaks all computed relation classes that have been written so far.

Member Author

Yes, we can either fix them or provide populated_from alias for backward compatibility.

Member Author

Most computed tables rely on the default poprel and are not affected. Very few tables should modify it.

Contributor

A good portion of my computed and imported tables rely on this feature for more efficient computation. Let's not make things complicated by providing several ways of doing the same thing. Calling it populated_from was a team decision, it is descriptive, and the ease with which something can be used in discussions should not be a primary reason to choose property names. If most computed tables rely on the default value, why not change it in Matlab?

Member Author
@dimitri-yatsenko, Jun 9, 2016

MATLAB does not yet have a default popRel implemented, so every computed class has to define one.

The poprel feature needs to be documented fairly well. Try using populated_from in a sentence. It is easy to define a poprel in the first paragraph and then use it throughout. From the name alone, no one will understand what populated_from actually is, so the name is not that descriptive. Might as well give it a name and describe it in the tutorial.
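The backward-compatibility option raised in this thread (keeping `populated_from` working after the rename) could be sketched roughly as below. This is a toy illustration under stated assumptions, not DataJoint code: `AutoPopulateSketch`, `keys_to_populate`, `Legacy`, and `Modern` are hypothetical names, and plain lists of dicts stand in for relations. The new name is written as `key_source`, which is what the diff in this PR ends up using.

```python
import warnings


class AutoPopulateSketch:
    """Toy stand-in for the populate mixin; not the DataJoint class."""

    @property
    def key_source(self):
        # Default source of keys. A list of dicts stands in for a relation.
        return [{'animal_id': 1}, {'animal_id': 2}]

    def keys_to_populate(self):
        # Honor a legacy override if some subclass still defines the old name.
        legacy = getattr(type(self), 'populated_from', None)
        if legacy is not None:
            warnings.warn('populated_from is deprecated; define key_source instead',
                          DeprecationWarning, stacklevel=2)
            return self.populated_from
        return self.key_source


class Legacy(AutoPopulateSketch):
    """A class written against the old property name."""

    @property
    def populated_from(self):
        return [{'animal_id': 7}]


class Modern(AutoPopulateSketch):
    """A class that relies on the default key source."""
```

Here `Legacy().keys_to_populate()` still returns the overridden keys and emits a `DeprecationWarning`, while `Modern()` uses the default. The cost, as noted in the thread, is that two names then exist for the same concept.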

must define the property populated_from, and must define the callback method _make_tuples.
must define the property `key_source`, and must define the callback method _make_tuples.
"""
_jobs = None
_populated_from = None

@property
def populated_from(self):
def key_source(self):
"""
:return: the relation whose primary key values are passed, sequentially, to the
`_make_tuples` method when populate() is called. The default value is the
join of the parent relations. Users may override to change the granularity
or the scope of populate() calls.
"""
if self._populated_from is None:
self.connection.dependencies.load()
parents = [FreeRelation(self.target.connection, rel) for rel in self.target.parents]
self.connection.dependencies.load(self.full_table_name)
parents = self.target.parents(primary=True)
if not parents:
raise DataJointError('A relation must have parent relations to be able to be populated')
ret = parents.pop(0)
self._populated_from = FreeRelation(self.connection, parents.pop(0)).proj()
while parents:
ret *= parents.pop(0)
self._populated_from = ret
self._populated_from *= FreeRelation(self.connection, parents.pop(0)).proj()
return self._populated_from
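As the docstring says, the default key source is the join of the parent relations, and the new lines project each parent onto its primary key before joining. A minimal sketch of that idea with plain Python data follows. It assumes natural-join semantics on shared attribute names; `proj`, `join`, and `default_key_source` are hypothetical helpers for illustration and do not reproduce DataJoint's implementation.

```python
def proj(rows, primary_key):
    """Keep only the primary-key attributes, dropping duplicate keys."""
    seen = []
    for row in rows:
        key = {name: row[name] for name in primary_key}
        if key not in seen:
            seen.append(key)
    return seen


def join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    result = []
    for a in left:
        for b in right:
            if all(a[name] == b[name] for name in a.keys() & b.keys()):
                result.append({**a, **b})
    return result


def default_key_source(parents):
    """parents: list of (rows, primary_key) pairs, joined one after another."""
    if not parents:
        raise ValueError('A relation must have parent relations to be populated')
    parents = list(parents)
    rows, primary_key = parents.pop(0)
    source = proj(rows, primary_key)
    while parents:
        rows, primary_key = parents.pop(0)
        source = join(source, proj(rows, primary_key))
    return source


scans = [{'animal_id': 1, 'scan': 1, 'depth': 100},
         {'animal_id': 1, 'scan': 2, 'depth': 150}]
methods = [{'method': 'fast', 'note': 'default'},
           {'method': 'slow', 'note': 'careful'}]
keys = default_key_source([(scans, ['animal_id', 'scan']), (methods, ['method'])])
```

Because the two toy parents share no attributes, `keys` holds every scan paired with every method, and non-key attributes such as `depth` and `note` are absent.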

@abc.abstractmethod
def _make_tuples(self, key):
"""
Derived classes must implement method _make_tuples that fetches data from tables that are
above them in the dependency hierarchy, restricting by the given key, computes dependent
attributes, and inserts the new tuples into self.
"""
raise NotImplementedError('Subclasses of AutoPopulate must implement the method "_make_tuples"')
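This hunk drops `abc.ABCMeta` and `@abc.abstractmethod` and keeps the `NotImplementedError`. The diff does not state the motivation, so the sketch below only illustrates the behavioral difference between the two approaches. All class names and `failure_point` are hypothetical.

```python
import abc


class AbstractPopulate(metaclass=abc.ABCMeta):
    """Old style: the method is declared abstract."""

    @abc.abstractmethod
    def _make_tuples(self, key):
        ...


class PlainPopulate:
    """New style: an ordinary class whose default method raises."""

    def _make_tuples(self, key):
        raise NotImplementedError('Subclasses must implement the method "_make_tuples"')


class IncompleteAbstract(AbstractPopulate):
    pass


class IncompletePlain(PlainPopulate):
    pass


def failure_point(cls):
    """Report where an incomplete subclass fails: at instantiation or at call."""
    try:
        obj = cls()
    except TypeError:
        return 'instantiation'
    try:
        obj._make_tuples({})
    except NotImplementedError:
        return 'call'
    return 'none'
```

With the abstract base, a subclass that forgets `_make_tuples` cannot be instantiated at all. With the plain base, it can be instantiated and queried, and fails only when `_make_tuples` is actually called.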

@property
def target(self):
@@ -58,10 +58,10 @@ def target(self):

def populate(self, *restrictions, suppress_errors=False, reserve_jobs=False, order="original"):
"""
rel.populate() calls rel._make_tuples(key) for every primary key in self.populated_from
rel.populate() calls rel._make_tuples(key) for every primary key in self.key_source
for which there is not already a tuple in rel.

:param restrictions: a list of restrictions each restrict (rel.populated_from - target.proj())
:param restrictions: a list of restrictions each restrict (rel.key_source - target.proj())
:param suppress_errors: suppresses error if true
:param reserve_jobs: if true, reserves job to populate in asynchronous fashion
:param order: "original"|"reverse"|"random" - the order of execution
@@ -73,10 +73,10 @@ def populate(self, *restrictions, suppress_errors=False, reserve_jobs=False, ord
if order not in valid_order:
raise DataJointError('The order argument must be one of %s' % str(valid_order))

todo = self.populated_from
todo = self.key_source
if not isinstance(todo, RelationalOperand):
raise DataJointError('Invalid populated_from value')
todo.restrict(AndList(restrictions))
raise DataJointError('Invalid key_source value')
todo = todo & AndList(restrictions)

error_list = [] if suppress_errors else None
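The change from `todo.restrict(...)` to `todo = todo & ...` matters if `restrict` modifies the object in place, which is the assumption made here. In that case, restricting the object returned by `key_source` would leave the restriction attached to a cached, shared relation. The toy class below illustrates the difference; `ToyRelation` is hypothetical and only mimics the two calling styles.

```python
import copy


class ToyRelation:
    """Hypothetical stand-in for a relation with accumulated restrictions."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.restrictions = []

    def restrict(self, condition):
        """In place: appends to this object's own restriction list."""
        self.restrictions.append(condition)
        return self

    def __and__(self, condition):
        """Non-mutating: restricts a copy and leaves self untouched."""
        result = copy.copy(self)
        result.restrictions = list(self.restrictions)
        return result.restrict(condition)

    def fetch(self):
        return [row for row in self.rows
                if all(condition(row) for condition in self.restrictions)]


source = ToyRelation([{'animal_id': 1}, {'animal_id': 2}])
todo = source & (lambda row: row['animal_id'] == 1)
```

After `todo = source & ...`, `source.fetch()` still returns both rows. Calling `source.restrict(...)` instead would narrow `source` for every later caller.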

@@ -104,7 +104,11 @@ def populate(self, *restrictions, suppress_errors=False, reserve_jobs=False, ord
try:
self._make_tuples(dict(key))
except Exception as error:
self.connection.cancel_transaction()
try:
self.connection.cancel_transaction()
except OperationalError:
pass

if reserve_jobs:
jobs.error(self.target.table_name, key, error_message=str(error))
if not suppress_errors:
@@ -118,14 +122,14 @@ def populate(self, *restrictions, suppress_errors=False, reserve_jobs=False, ord
jobs.complete(self.target.table_name, key)
return error_list
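The control flow visible in these hunks (skip keys already populated, roll back on failure, tolerate a failing rollback, and either re-raise or collect the error) can be sketched as follows. This is a simplified illustration, not the method itself: job reservation and transactions are omitted, `RollbackFailed` stands in for `pymysql.OperationalError`, and the parts of the real method hidden by the diff are assumed rather than known.

```python
class RollbackFailed(Exception):
    """Stand-in for pymysql.OperationalError in this sketch."""


def populate(key_source, done, make_tuples, cancel_transaction, suppress_errors=False):
    """Call make_tuples(key) for every key not yet in `done`."""
    error_list = [] if suppress_errors else None
    for key in key_source:
        if key in done:
            continue  # already populated
        try:
            make_tuples(key)
        except Exception as error:
            try:
                cancel_transaction()
            except RollbackFailed:
                pass  # rollback itself failed; keep handling the original error
            if not suppress_errors:
                raise
            error_list.append((key, str(error)))
        else:
            done.append(key)
    return error_list


def make(key):
    if key['animal_id'] == 2:
        raise ValueError('bad scan')


def broken_rollback():
    raise RollbackFailed('lost connection')


done = [{'animal_id': 1}]
errors = populate([{'animal_id': 1}, {'animal_id': 2}, {'animal_id': 3}],
                  done, make, broken_rollback, suppress_errors=True)
```

The inner `try` is the point of the hunk above: without it, an exception raised by the rollback would replace the original error from `_make_tuples` and abort the loop even when `suppress_errors` is set.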

def progress(self, restriction=None, display=True):
def progress(self, *restrictions, display=True):
Contributor

Why not make **kwrestrictions as well?

Member Author

Example? We do not have a concept of keyword restrictions in the restrict operator.

Contributor

Yes, we do, every time we restrict by a dictionary.

Member Author
@dimitri-yatsenko, Jun 8, 2016

A restriction can be a dict, which is not the same as `**kwargs`.

Contributor

I think this syntax would be useful: rel.progress(animal_id=5000)

Member Author
@dimitri-yatsenko, Jun 8, 2016

It's interesting but not used anywhere else yet. Yes, we could add this syntax, but then `progress` should not take other named arguments such as `display`, or should prefix them (e.g. `_display`) to avoid name collisions.

Member Author
@dimitri-yatsenko, Jun 8, 2016

I am hesitant to do that since `progress` has a signature similar to that of `populate`, which takes many more named arguments.

Member Author

I agree. It looks elegant enough that we should implement it for both progress and populate. Name collisions should be obvious to users and they will avoid them. Does this look confusing?

ScanInfo().populate(animal_id=5014, reserve_jobs=True, suppress_errors=True)

Contributor

I'd use the keyword arguments in a different order for clarity, but it would be fine with me. I think we should either allow *restrictions and **kwrestrictions or neither and have a simple restrictions argument.

Member Author

Okay, let's use a simple `restrictions` argument. It will be cleaner.
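For reference, the signature debated in this thread and the name-collision concern can be illustrated with a toy function. This sketch shows the proposal only, not code adopted in the PR: the function returns the collected restrictions instead of querying anything, and `kwrestrictions` is the hypothetical parameter name used in the thread.

```python
def progress(*restrictions, display=True, **kwrestrictions):
    """Toy signature only: collect restrictions the way the thread proposes."""
    combined = list(restrictions)
    if kwrestrictions:
        # The keyword form becomes a single dict restriction.
        combined.append(dict(kwrestrictions))
    return combined, display


positional = progress({'animal_id': 5000})
keyword = progress(animal_id=5000)
# An attribute that shares a name with an option is consumed by the option
# and never reaches the restriction.
collision = progress(display=False)
```

The first two calls are equivalent, which is the appeal of the keyword form. The third shows the collision: a table attribute named `display` could not be restricted this way.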

"""
report progress of populating this table
:return: remaining, total -- tuples to be populated
"""
todo = self.populated_from & restriction
todo = self.key_source & AndList(restrictions)
total = len(todo)
remaining = len(todo - self.target.project())
remaining = len(todo - self.target.proj())
if display:
print('%-20s' % self.__class__.__name__, flush=True, end=': ')
print('Completed %d of %d (%2.1f%%) %s' %
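The arithmetic behind `progress` (total keys in the key source, minus those already present in the target) can be sketched as below. This is a simplified illustration with lists of dicts in place of relations; `count_progress` and `format_progress` are hypothetical helpers, and the trailing `%s` field of the original format string is left out because its argument is not visible in the diff.

```python
def count_progress(key_source, populated):
    """Return (remaining, total) for keys that still need to be populated."""
    total = len(key_source)
    remaining = len([key for key in key_source if key not in populated])
    return remaining, total


def format_progress(name, remaining, total):
    """Format a report line in the style of the print calls in the diff."""
    completed = total - remaining
    # Assumption: report 0% when the key source is empty, to avoid dividing by zero.
    percent = 100.0 * completed / total if total else 0.0
    return '%-20s: Completed %d of %d (%2.1f%%)' % (name, completed, total, percent)


key_source = [{'animal_id': 1}, {'animal_id': 2}]
populated = [{'animal_id': 1}]
remaining, total = count_progress(key_source, populated)
line = format_progress('ScanInfo', remaining, total)
```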