-
Notifications
You must be signed in to change notification settings - Fork 77
Reorganize project while attempting to maintain as much backwards compatibility as possible #64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…patibility as possible This commit is intended to reorganize the project to increase code reuse and make the majority of the behavior more consistent. This implements changes so that every call, whether inside of a batch or async execution or using a database instance directly, will return a Job construct with a lazily loading result. This enables the use of asynchronous http requests as well as other futures-based http responses (like multiprocessing.futures for example). However, this is not enabled by default- it requires setting the old_behavior flag to False to maintain as much backwards compatibility as possible. This commit does make a few changes. Any behavioral changes (mostly around asynchronous job handling and user management) are marked with # TODO CHANGED . The other potential changes involve differences in import statements. I could use some guidance as to which classes are public facing and should be imported in the __init__.py file and which are intended to be internal.
It looks like travis itself is having some issues right now. These changes pass all of the tests on my machine, and prior to the squash passed all the tests on travis. Im not sure how to request a rerun. |
Coverage problems are mostly in the lock.py file, which has python 2 and python 3 specific sections. Not sure how to mark these properly. |
Hi @dancollins34, Thank you very much for the PR! I am overloaded with work lately so I will get back to you in about a week or two. And yes don't worry about the coverage or tests. Best, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great. I have a few concerns however:
- We should use absolute imports over relative ones.
- While I can see your intentions, I don't see a huge benefit to some of refactor, especially when they break backwards compatibility (i.e. import paths). If we have to change existing tests just to accommodate restructuring, then I don't think we are going in the right direction. We could add imports to init.py files in some cases, but I feel more often than not they just bloat the code without adding much to streamlining our development process.
For example, I don't think the api
directory is necessary.
from arango.database import Database
from arango.aql import AQL
from arango.async import AsyncJob
Is (arguably) more intuitive, easier to remember, and simpler to use for the users than
from arango.api.databases.base import BaseDatabase
from arango.api.wrappers.aql import AQL
from arango.jobs.async import AsyncJob
Also regarding your question, while the recommended/intended way is to access everything from the ArangoClient
, I want to provide users flexibility to use w/e they want. I do not want to hide/limit the import scopes. This adds to why I don't want to complicate the import paths.
I do like how things are structured inside that directory though. So perhaps you can just move them all up a level, and revert things that cannot be without breaking compatibility. I guess what I want to say is "prioritize the user experience".
Lastly, if we do end up changing the paths, then we have to make changes to the doc .rst files as well so their autodoc generation is not broken.
Thanks!
from arango.client import ArangoClient # noqa: F401 | ||
from arango.exceptions import ArangoError # noqa: F401 | ||
from arango.utils.lock import RLock # noqa: F401 | ||
from .request import Request # noqa: F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use relative import? Otherwise please change to arango.request
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I stated in the post below, I only used relative import for the init.py files, but can change them if you'd like.
@@ -0,0 +1 @@ | |||
from .api import APIWrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again please avoid relative import for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
def __init__(self, handler, response=None, connection=None, | ||
return_result=True): | ||
BaseJob.__init__(self, handler, None, | ||
job_id="ASYNC_NOT_YET_ASSIGNED", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of using magic numbers or strings. Why can't this just be None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To differentiate between when return_result is False, and when the id has not yet been created due to lazy loading. End users will never see this id if they only use the id property as opposed to _job_id
http_client=None, | ||
enable_logging=True, | ||
logger=None, | ||
old_behavior=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter name is unclear. Please change to something like async_ready=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's an acceptable name.
self._old_behavior = old_behavior | ||
|
||
if old_behavior: | ||
self._default_job = OldJob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename to something that gives more context (e.g OldJob
to something like SyncJob
and BaseJob
to AsyncJob
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here's the context for the names. It isn't accurate to describe BaseJob as AsyncJob- there is nothing inherently asynchronous about its behavior. All it does is lazy loading of variables to prevent accessing the response until result is called. This is a requirement for asynchronous dispatch, but is not asynchronous in and of itself. The difference between the two is that OldJob is a mask which has the same interface as the rest of the Jobs to enable using the class in the same manner, but instead of an instance of BaseJob, it returns the result of BaseJob.result(raise_errors=True) to maintain the same behavior as previously. OldJob may not be the best name, but I don't think SyncJob is good either.
@@ -4,8 +4,8 @@ | |||
from six import string_types | |||
|
|||
from arango import ArangoClient | |||
from arango.collections.edge import EdgeCollection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks backwards compatibility. I don't see a great need to have an api
module in the middle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll remove the api module.
|
||
x = txn.commit() | ||
|
||
assert len(x[0]) == len(test_docs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what's going on here with the indexing 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the changes I mentioned from PR#59. The transaction will now return its results, take a look at arango/connections/executions/transaction.py to see how. However, the result is returned as a one element cursor. Hence the 0 index, to get the first result, which is the result of the entire transaction.
@@ -1,4 +1,4 @@ | |||
from arango.version import VERSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I would prefer to keep this in version.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep will do
@@ -7,7 +7,7 @@ def arango_version(client): | |||
"""Return the major and minor version of ArangoDB. | |||
|
|||
:param client: The ArangoDB client. | |||
:type client: arango.ArangoClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this break backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my IDE made that change automatically. I've created a new class, SystemDatabase, which inherits directly from BaseDatabase and implements all of the functions which are required to be called on _system. ArangoClient still exists, but it now inherits from SystemDatabase and creates the base connection required from it's constructor kwargs, while SystemDatabase takes a connection in its constructor like BaseDatabase. Take a look at arango/client.py
@@ -139,7 +137,10 @@ def test_batch_insert_context_manager_no_commit_on_error(): | |||
except ValueError: | |||
assert len(col) == 0 | |||
assert job1.status() == 'pending' | |||
assert job1.result() is None | |||
# TODO CHANGED Behavior: Result of jobs without a response fails on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which jobs are those exactly? Is this because of the way you changed the responses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a slight change in the behavior. I think it's a bit unclear for it to return None, especially since it would return None if return_result was false. If the job was never sent to the server, I feel that calling result() on it should throw an error. If you believe I'm wrong I can look for a way to revert this behavior.
Hello, So, just providing the general comments here, I'll also provide specific comments on each of the code sections you pointed out. Regarding your point about imports, you'll notice that relative imports only exist within the init.py files. I use these because it allows for modules (folders) to be relocated and rearranged without any change to their contents. A relative import to .some_file will still reference the same file if the directory is renamed from a to b, whereas a.some_file will have to be changed to b.some_file everywhere. If you would prefer to use absolute imports for everything however, I can change it to that. Regarding the api directory comments, sure I have no problem with flattening this level of the structure. Regarding the documentation, I didn't think about that but yes the documentation will need to be changed to prevent the auto doc links from breaking. I'll do that once the changes are finalized to avoid having to write 87 versions of the file. -Daniel |
closed for pr 66 |
This is probably not ready to be fully merged. It is a large reorganization, and it probably has some issues/inconsistencies and I would appreciate any comments.
This commit is intended to reorganize the project to increase code reuse and make the majority of the behavior more consistent. This implements changes so that every call, whether inside of a batch or async execution or using a database instance directly, will return a Job construct with a lazily loading result. This enables the use of asynchronous http requests as well as other futures-based http responses (like multiprocessing.futures for example). However, this is not enabled by default- it requires setting the old_behavior flag to False to maintain as much backwards compatibility as possible.
This commit does make a few changes. Any behavioral changes (mostly around asynchronous job handling and user management) are marked with # TODO CHANGED . The other potential changes involve differences in import statements. I could use some guidance as to which classes are public facing and should be imported in the init.py file and which are intended to be internal.