Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for asynchronous client #39

Merged
merged 1 commit into from
Mar 18, 2016
Merged

Conversation

turu
Copy link
Contributor

@turu turu commented Jan 15, 2016

In a nutshell: This pull request adds, in a fully backward compatible way, a new optional, asynchronous client to the library.

Why this matters?
Druid can be very useful in many scenarios different than ad-hoc analysis of online data - f.e it can be successfully used to create flexible and interactive analytics dashboards or monitors. Applying asynchronous programming style is often a go-to architectural choice, when designing such systems - especially, when they are implemented in single threaded languages like python. The lack of asynchronous client can therefore be a serious problem. This pull request fixes it.

What has been done?

  • PyDruid client has been refactored to facilitate addition of a new client
    • extracted Query and QueryBuilder abstractions (tests migrated as well)
    • extracted common BaseDruidClient class
  • AsyncPyDruid client implemented, which mirrors functionality of synchronous PyDruid, but works in an asynchronous way
    • the client uses an asynchronous http client
    • it is optional in the same fashion as support for pandas export
  • Added minimal set of integration tests for both clients.
  • setup.py enhanced to use pip extras. As a result, both pandas export and async support are explicitly declared as optional features (see updated readme for more info)
  • All new (and old as well) classes are documented. Examples of usage added both to docs and readme.

(accidentally ;)) Fixed issues:

  1. PyDruid not extending object #29
  2. fa3c366 fixes the non-deterministic behavior of test_export_tsv. The issue originated from the fact that python dictionaries are unordered.

A note on implementation of AsyncPyDruid:
Asynchronous client uses elements of Tornado framework. This was a conscious choice based on the fact that Tornado is stable and mature, and integrates very well with python’s 3.5 native async support (async, await keywords) as well as other asynchronous frameworks. As a result one is not locked into using Tornado for the whole of his/her app.
See: http://www.tornadoweb.org/en/stable/guide/coroutines.html#python-3-5-async-and-await

A note on backward compatibility:
As a result of refactoring, clients are reusable. This is achieved through clients’ query methods returning Query objects filled with results, rather than result dictionary. However, this change is fully backward compatible - clients still expose the same (now deprecated) interface allowing one to access details of the most recently executed query. Query objects on the other hand, act as wrappers around result dictionary, giving users the same dictionary interface they had before, enriched to include methods for exporting results and accessing query details.

A note on python 2.x <-> 3.x compatibility:
The changes have been tested and should work reliably both in python 2.x and 3.x

Additional note on the authors:
Changes introduced by this pull request have been originally developed in https://github.com/turu/pydruid by me (github.com/turu), and github.com/l-j. Since then, they were forked to our organization (github.com/Dreamlab) and they are already used in production. We both signed the CLA.

Best regards,
Piotr Turek

@fjy
Copy link
Member

fjy commented Jan 21, 2016

@turu Thanks for this contrib. Do you mind filling out the CLA here: druid.io/community/cla.html

@turu
Copy link
Contributor Author

turu commented Jan 23, 2016

@fjy I already did. Can you verify that you have my CLA registered?

@drcrallen
Copy link
Member

Note there's a commit from https://github.com/l-j in here

@drcrallen
Copy link
Member

and there's a cla on file for @l-j and @turu

@turu
Copy link
Contributor Author

turu commented Jan 30, 2016

@fjy @drcrallen Is there any update on the status of this pr?

@fjy
Copy link
Member

fjy commented Feb 4, 2016

@turu apologies for delay, will ask people to finish reviewing this

@drcrallen @xvrl please review

@fjy
Copy link
Member

fjy commented Feb 20, 2016

@turu can we resolve the merge conflicts and I will merge this

@@ -0,0 +1,157 @@
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the new license header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the year in license header, from 2013 to 2016. Is that ok? If there is a completely new header somewhere, which is supposed to be used, could you point me to it?

@turu
Copy link
Contributor Author

turu commented Mar 7, 2016

@fjy I resolved merge conflicts. To the best of my knowledge, everything should be ok. All tests are passing.

@xvrl
Copy link
Member

xvrl commented Mar 10, 2016

@turu Thanks for the comprehensive description, a few questions:

  • you mentioned something about backwards compatibility of Query objects, can you think of a case where someone's existing code might break because of that change? It might be worthwhile to include it in the release notes once we publish a new version
  • you've been running those changes in production. Are you still using the synchronous client in production as well? Just asking since that may reduce the chance of regressions with the synchronous client.
  • lastly, would you mind squashing commits before we can merge this, we like to keep commit history relatively clean to make it easier for everyone to track change history

@xvrl
Copy link
Member

xvrl commented Mar 10, 2016

Otherwise I'm 👍 on this one, assuming someone has tested that the existing client still works well in practice.

* refactored and tested client abstraction
* improved setup.py enabling pip extras
* fixed python3 compatibility
* fixed issues with test_export_tsv testcase
* provided backward compatibility

Fixes druid-io#29
@turu
Copy link
Contributor Author

turu commented Mar 13, 2016

@xvrl Regarding the points you mentioned:

  • from current user's perspective, the main change is the fact that instead of returning a plain dictionary from client's methods, we are now returning a Query object. However, this object extends collections.MutableSequence and acts as a wrapper around the dictionary. As a result, the change should be completely transparent to the end user. All in all, I think I would recommend the same kind of caution as with every upgrade of external dependency - no less, no more.
  • we have been using the synchronous client, during development of our system and continue to do so, when experimenting with stuff. It's more convenient to use than the asynchronous in such scenarios. Actually, I've just made a small experiment, by running one of our services on the side, replacing async client with the synchronous one and leaving all the logic intact. It worked as expected.
  • should be better now

fjy added a commit that referenced this pull request Mar 18, 2016
Add support for asynchronous client
@fjy fjy merged commit 9373190 into druid-io:master Mar 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants