Skip to content

Reorganize project while attempting to maintain as much backwards compatibility as possible #64

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dancollins34
Copy link

This is probably not ready to be fully merged. It is a large reorganization, and it probably has some issues/inconsistencies and I would appreciate any comments.

This commit is intended to reorganize the project to increase code reuse and make the majority of the behavior more consistent. This implements changes so that every call, whether inside of a batch or async execution or using a database instance directly, will return a Job construct with a lazily loading result. This enables the use of asynchronous http requests as well as other futures-based http responses (like multiprocessing.futures for example). However, this is not enabled by default- it requires setting the old_behavior flag to False to maintain as much backwards compatibility as possible.

This commit does make a few changes. Any behavioral changes (mostly around asynchronous job handling and user management) are marked with # TODO CHANGED . The other potential changes involve differences in import statements. I could use some guidance as to which classes are public facing and should be imported in the init.py file and which are intended to be internal.

…patibility as possible

This commit is intended to reorganize the project to increase code reuse and make the majority of the behavior more consistent.  This implements changes so that every call, whether inside of a batch or async execution or using a database instance directly, will return a Job construct with a lazily loading result.  This enables the use of asynchronous http requests as well as other futures-based http responses (like multiprocessing.futures for example).  However, this is not enabled by default- it requires setting the old_behavior flag to False to maintain as much backwards compatibility as possible.

This commit does make a few changes.  Any behavioral changes (mostly around asynchronous job handling and user management) are marked with # TODO CHANGED .  The other potential changes involve differences in import statements.  I could use some guidance as to which classes are public facing and should be imported in the __init__.py file and which are intended to be internal.
@dancollins34
Copy link
Author

It looks like travis itself is having some issues right now. These changes pass all of the tests on my machine, and prior to the squash passed all the tests on travis. Im not sure how to request a rerun.

@coveralls
Copy link

coveralls commented Nov 26, 2017

Coverage Status

Coverage decreased (-0.7%) to 98.413% when pulling 3bdebdb on dancollins34:dev-new-commits into 31c356e on joowani:dev.

@dancollins34
Copy link
Author

Coverage problems are mostly in the lock.py file, which has python 2 and python 3 specific sections. Not sure how to mark these properly.

@joowani
Copy link
Contributor

joowani commented Nov 27, 2017

Hi @dancollins34,

Thank you very much for the PR! I am overloaded with work lately so I will get back to you in about a week or two. And yes don't worry about the coverage or tests.

Best,
Joohwan

Copy link
Contributor

@joowani joowani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great. I have a few concerns however:

  • We should use absolute imports over relative ones.
  • While I can see your intentions, I don't see a huge benefit to some of refactor, especially when they break backwards compatibility (i.e. import paths). If we have to change existing tests just to accommodate restructuring, then I don't think we are going in the right direction. We could add imports to init.py files in some cases, but I feel more often than not they just bloat the code without adding much to streamlining our development process.

For example, I don't think the api directory is necessary.

from arango.database import Database
from arango.aql import AQL
from arango.async import AsyncJob

Is (arguably) more intuitive, easier to remember, and simpler to use for the users than

from arango.api.databases.base import BaseDatabase
from arango.api.wrappers.aql import AQL
from arango.jobs.async import AsyncJob

Also regarding your question, while the recommended/intended way is to access everything from the ArangoClient, I want to provide users flexibility to use w/e they want. I do not want to hide/limit the import scopes. This adds to why I don't want to complicate the import paths.

I do like how things are structured inside that directory though. So perhaps you can just move them all up a level, and revert things that cannot be without breaking compatibility. I guess what I want to say is "prioritize the user experience".

Lastly, if we do end up changing the paths, then we have to make changes to the doc .rst files as well so their autodoc generation is not broken.

Thanks!

from arango.client import ArangoClient # noqa: F401
from arango.exceptions import ArangoError # noqa: F401
from arango.utils.lock import RLock # noqa: F401
from .request import Request # noqa: F401
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use relative import? Otherwise please change to arango.request.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in the post below, I only used relative import for the init.py files, but can change them if you'd like.

@@ -0,0 +1 @@
from .api import APIWrapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again please avoid relative import for consistency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

def __init__(self, handler, response=None, connection=None,
return_result=True):
BaseJob.__init__(self, handler, None,
job_id="ASYNC_NOT_YET_ASSIGNED",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of using magic numbers or strings. Why can't this just be None?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To differentiate between when return_result is False, and when the id has not yet been created due to lazy loading. End users will never see this id if they only use the id property as opposed to _job_id

http_client=None,
enable_logging=True,
logger=None,
old_behavior=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter name is unclear. Please change to something like async_ready=False.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's an acceptable name.

self._old_behavior = old_behavior

if old_behavior:
self._default_job = OldJob
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to something that gives more context (e.g OldJob to something like SyncJob and BaseJob to AsyncJob).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here's the context for the names. It isn't accurate to describe BaseJob as AsyncJob- there is nothing inherently asynchronous about its behavior. All it does is lazy loading of variables to prevent accessing the response until result is called. This is a requirement for asynchronous dispatch, but is not asynchronous in and of itself. The difference between the two is that OldJob is a mask which has the same interface as the rest of the Jobs to enable using the class in the same manner, but instead of an instance of BaseJob, it returns the result of BaseJob.result(raise_errors=True) to maintain the same behavior as previously. OldJob may not be the best name, but I don't think SyncJob is good either.

@@ -4,8 +4,8 @@
from six import string_types

from arango import ArangoClient
from arango.collections.edge import EdgeCollection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks backwards compatibility. I don't see a great need to have an api module in the middle.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll remove the api module.


x = txn.commit()

assert len(x[0]) == len(test_docs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's going on here with the indexing 0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the changes I mentioned from PR#59. The transaction will now return its results, take a look at arango/connections/executions/transaction.py to see how. However, the result is returned as a one element cursor. Hence the 0 index, to get the first result, which is the result of the entire transaction.

@@ -1,4 +1,4 @@
from arango.version import VERSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I would prefer to keep this in version.py.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep will do

@@ -7,7 +7,7 @@ def arango_version(client):
"""Return the major and minor version of ArangoDB.

:param client: The ArangoDB client.
:type client: arango.ArangoClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this break backward compatibility?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my IDE made that change automatically. I've created a new class, SystemDatabase, which inherits directly from BaseDatabase and implements all of the functions which are required to be called on _system. ArangoClient still exists, but it now inherits from SystemDatabase and creates the base connection required from it's constructor kwargs, while SystemDatabase takes a connection in its constructor like BaseDatabase. Take a look at arango/client.py

@@ -139,7 +137,10 @@ def test_batch_insert_context_manager_no_commit_on_error():
except ValueError:
assert len(col) == 0
assert job1.status() == 'pending'
assert job1.result() is None
# TODO CHANGED Behavior: Result of jobs without a response fails on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which jobs are those exactly? Is this because of the way you changed the responses?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slight change in the behavior. I think it's a bit unclear for it to return None, especially since it would return None if return_result was false. If the job was never sent to the server, I feel that calling result() on it should throw an error. If you believe I'm wrong I can look for a way to revert this behavior.

@dancollins34
Copy link
Author

Hello,

So, just providing the general comments here, I'll also provide specific comments on each of the code sections you pointed out.

Regarding your point about imports, you'll notice that relative imports only exist within the init.py files. I use these because it allows for modules (folders) to be relocated and rearranged without any change to their contents. A relative import to .some_file will still reference the same file if the directory is renamed from a to b, whereas a.some_file will have to be changed to b.some_file everywhere. If you would prefer to use absolute imports for everything however, I can change it to that.

Regarding the api directory comments, sure I have no problem with flattening this level of the structure.

Regarding the documentation, I didn't think about that but yes the documentation will need to be changed to prevent the auto doc links from breaking. I'll do that once the changes are finalized to avoid having to write 87 versions of the file.

-Daniel

@dancollins34
Copy link
Author

closed for pr 66

@dancollins34 dancollins34 deleted the dev-new-commits branch January 17, 2018 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants